Develop AWS Glue Jobs Locally Using PyCharm and Docker on Windows - step by step

  • Published 6 Sep 2024

COMMENTS • 76

  • @IWasBoredSo
    @IWasBoredSo 1 year ago +2

    you just saved me a few bucks that I was spending on Glue during some experiments and learning! Good to have that kind of content on youtube and possibility to support you :)

    • @DataEngUncomplicated
      @DataEngUncomplicated 1 year ago

      Wow, Thanks for the direct support through buying me a coffee, I really appreciate the support Hubert! I'm happy that you were able to save on compute costs!

  • @kandikondakarthik1432
    @kandikondakarthik1432 1 year ago +2

    Wow, simply amazing video. Very well explained and detailed information. Please keep doing the great work!

  • @dougkfarrell
    @dougkfarrell 3 months ago +1

    This is fantastic! I'm new to AWS Glue and was really struggling to get traction developing an ETL script. Being able to develop locally, I don't really care about the costs, but the ability to debug, get feedback, and just the turnaround time to try things is amazing. Again, thanks.
    I'd like to ask you more questions, how can I do that?

    • @DataEngUncomplicated
      @DataEngUncomplicated 3 months ago +1

      Thanks, feel free to post your questions here. Me or someone else might be able to help you out!

    • @dougkfarrell
      @dougkfarrell 3 months ago

      @DataEngUncomplicated Thanks! I'm using Glue ETL to read two different CSV files into DynamicFrames, then normalize and union them. I need to run some SQL against an existing RDS MySQL database to query records and figure out whether I need to update or insert data. Is there a good (as in fast) way to iterate over the normalized, unioned DynamicFrame and read from and write to an RDS MySQL database?
      Thanks in advance for any help!
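
One possible approach to the question above, as a minimal sketch rather than a tested Glue recipe: let MySQL decide between insert and update with `INSERT ... ON DUPLICATE KEY UPDATE`, and send rows in batches instead of querying each record first. The table `members`, its columns, and the helper names are hypothetical. `cursor` is assumed to be any DB-API style cursor from a MySQL driver you install yourself, and the table is assumed to have a primary or unique key on `id`.

```python
from itertools import islice

# Hypothetical table and columns, for illustration only.
# Newer MySQL 8 releases deprecate VALUES(col) here in favor of row aliases.
UPSERT_SQL = (
    "INSERT INTO members (id, name, email) VALUES (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE name = VALUES(name), email = VALUES(email)"
)


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def upsert_rows(rows, cursor, batch_size=500):
    """Send rows to MySQL in batches and return how many rows were sent."""
    sent = 0
    for chunk in batched(rows, batch_size):
        params = [(row["id"], row["name"], row["email"]) for row in chunk]
        cursor.executemany(UPSERT_SQL, params)
        sent += len(params)
    return sent
```

In a Glue job this could be called once per partition, for example from `dynamic_frame.toDF().foreachPartition(...)`, opening one database connection per partition rather than one per row.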

  • @bartoszturkowyd3608
    @bartoszturkowyd3608 1 year ago +1

    Oh, such a great timing for such a great tutorial! Thank you very much! ❤

    • @DataEngUncomplicated
      @DataEngUncomplicated 1 year ago

      Thanks for your kind words! I'm glad it was helpful! I recommend this way to develop glue jobs.

  • @Fight3211
    @Fight3211 1 year ago +3

    Would love a similar tutorial for VScode :)

    • @DataEngUncomplicated
      @DataEngUncomplicated 1 year ago +3

      You're the second person that has requested this! Do you think more folks use vs code? I am considering making a video soon.

    • @ahm_mask5161
      @ahm_mask5161 1 year ago +2

      I was literally thinking the same thing

    • @waleayeni
      @waleayeni 11 months ago

      yes please

    • @user-cj4ug8pv3z
      @user-cj4ug8pv3z 11 months ago

      please >

    • @DataEngUncomplicated
      @DataEngUncomplicated 9 months ago

      I have just uploaded the video configuring aws glue with VS Code and docker. Thanks for the suggestion! ua-cam.com/video/__j-SyopVBs/v-deo.html

  • @ahm_mask5161
    @ahm_mask5161 1 year ago +1

    Loved the video. I would have loved it more if it was in VS Code. Also, if you could make an ETL tutorial using Glue locally, that would be awesome.

    • @DataEngUncomplicated
      @DataEngUncomplicated 1 year ago +1

      Thanks, I will make a video with vs code since there seems to be some demand for this! Yup I'm also working on some tutorials using glue locally in my next couple of videos

  • @herleyshaori
    @herleyshaori 7 months ago +1

    This video helps me.

  • @prabhathkota107
    @prabhathkota107 4 months ago +1

    Very much helpful. Thanks

  • @abhishekgarg6301
    @abhishekgarg6301 10 months ago +1

    great tutorial, will you be creating the same with visual studio code?

    • @DataEngUncomplicated
      @DataEngUncomplicated 10 months ago +1

      I will be as soon as I come back from vacation!

    • @DataEngUncomplicated
      @DataEngUncomplicated 9 months ago

      I have just uploaded the video configuring aws glue with VS Code and docker. Thanks for the suggestion! ua-cam.com/video/__j-SyopVBs/v-deo.html

  • @giorgosstamatakis7144
    @giorgosstamatakis7144 1 year ago +1

    Great video, I would like to ask if anyone else experienced the following issue.
    When I add the glue-libs repo as a new content root, PyCharm stops recognising the pyspark imports as valid. Moreover, the window visible in 6:10 (showing the available python packages) is empty. Any ideas on what could have gone wrong?

    • @maximilianrausch5193
      @maximilianrausch5193 6 months ago

      I am having the same issue (no packages shown as available). Any ideas how to fix it?

    • @DataEngUncomplicated
      @DataEngUncomplicated 6 months ago

      Thanks, is your docker container running? That's the first thing I would check to make sure it's not a problem finding the docker container on your machine

    • @maximilianrausch5193
      @maximilianrausch5193 6 months ago +1

      @DataEngUncomplicated I updated PyCharm to the newest version and it resolved the issue.

  • @Patrick-ig3cn
    @Patrick-ig3cn 1 year ago +1

    Amazing tutorial! You explained the whole process extremely clearly.
    Quick question, do you know if this is also possible to set up in VSCode?

    • @DataEngUncomplicated
      @DataEngUncomplicated 1 year ago +5

      Thanks Patrick! Great I'm glad it made sense. Yes! It is also possible to set up in vs code! I don't use vscode but I could make a video if enough people think it would be useful.

    • @Patrick-ig3cn
      @Patrick-ig3cn 1 year ago +1

      Thanks for the reply, if there is demand for it I'd be extremely grateful!
      Otherwise thanks so much again for the tutorial, it's extremely enlightening on the whole process!

    • @DataEngUncomplicated
      @DataEngUncomplicated 1 year ago

      @Patrick-ig3cn you're welcome! I highly recommend developing Glue jobs locally. I have another video coming out tomorrow briefly explaining the benefits.

    • @DataEngUncomplicated
      @DataEngUncomplicated 9 months ago

      I have just uploaded the video for setting it up with VS Code. Thanks for the suggestion! ua-cam.com/video/__j-SyopVBs/v-deo.html

  • @mackfarshi8289
    @mackfarshi8289 8 months ago

    Thank you so much for this video. Very well explained and helpful. I was wondering if there is a way that we can also resolve "SparkContext" error in the import or link to a video you explain it. really appreciate it.

  • @user-rh1xc4qp6u
    @user-rh1xc4qp6u 9 months ago

    Great video! Really nice!
    I am struggling to find out how to set "--additional-python-modules". Anyone else? I can't find anything related to it for a local run :(

  • @asishb
    @asishb 11 months ago +1

    Hello ! I dont have Professional version of PyCharm. Is there any way that you can explain how to configure using VS Code or free version of PyCharm ?

    • @DataEngUncomplicated
      @DataEngUncomplicated 11 months ago +3

      Hey! Sorry you need the professional version of pycharm for it to work. I plan on making a tutorial for vs code soon.

    • @DataEngUncomplicated
      @DataEngUncomplicated 9 months ago +1

      I have just uploaded the video configuring aws glue with VS Code and docker. Thanks for the suggestion! ua-cam.com/video/__j-SyopVBs/v-deo.html

  • @nguyentonggiang1994
    @nguyentonggiang1994 1 year ago

    Nice video. You've got a thumbs up from me. However, I got trouble when installing extra python libraries to the glue container. Could you please guide me how to install external python library to this glue container? Thanks a lot.

    • @DataEngUncomplicated
      @DataEngUncomplicated 1 year ago

      Thanks, that's a good question, I will have to get back to you. I'm sure you can do it by going into the docker container and installing them directly in there but I wonder if there is an easier way to do this.
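
Installing directly inside the container, as suggested above, could look roughly like the sketch below when run with the container's own Python interpreter. The helper names are hypothetical and the package name in the usage note is only an example. Packages installed this way live in that container, so they would likely need reinstalling if the container is recreated.

```python
import subprocess
import sys


def pip_install_command(packages, target=None):
    """Build a pip command that targets the interpreter running this script."""
    command = [sys.executable, "-m", "pip", "install"]
    if target:
        # Optional: install into a specific directory instead of site-packages.
        command += ["--target", target]
    return command + list(packages)


def install_packages(packages, target=None):
    """Run pip and raise CalledProcessError if the installation fails."""
    subprocess.run(pip_install_command(packages, target), check=True)
```

For example, `install_packages(["openpyxl"])` would install one extra library into the interpreter that executes the script.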

  • @yashsrivastava14
    @yashsrivastava14 7 months ago

    Can you also show how to setup default credentials in the docker container?

    • @DataEngUncomplicated
      @DataEngUncomplicated 7 months ago

      Hi, yes I cover this in the video. You have to set the credential path in the docker image.
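
The video sets the credential path through PyCharm's Docker settings. Outside the IDE, the same idea could be sketched as a `docker run` argument list that mounts the host's `.aws` folder into the container. This is a sketch under assumptions: the container path `/home/glue_user/.aws`, the image tag, and the `pyspark` command are taken from AWS's published glue-libs image and may differ for your setup, and the function name is hypothetical.

```python
from pathlib import Path


def glue_docker_args(
    profile,
    aws_dir=None,
    image="amazon/aws-glue-libs:glue_libs_4.0.0_image_01",  # assumed image tag
    container_aws_dir="/home/glue_user/.aws",  # assumed path inside the image
    command=("pyspark",),
):
    """Build a `docker run` argument list that shares AWS credentials."""
    host_dir = str(aws_dir) if aws_dir else str(Path.home() / ".aws")
    return [
        "docker", "run", "-it", "--rm",
        "-v", f"{host_dir}:{container_aws_dir}",  # mount the credentials folder
        "-e", f"AWS_PROFILE={profile}",           # choose the named profile
        image,
        *command,
    ]
```

The resulting list can be passed to `subprocess.run(...)` or joined into a single command line for a terminal.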

  • @prabhathkota107
    @prabhathkota107 4 months ago

    Docker option not available in PyCharm community edition I guess

  • @aabbassp
    @aabbassp 10 months ago

    Thanks for the video! Amazing.
    Can you deploy this to AWS somehow automatically or you need to do it manually?

    • @DataEngUncomplicated
      @DataEngUncomplicated 10 months ago

      Yes! You can deploy this automatically in many ways: using Terraform, CDK, or a CloudFormation template.

  • @kkos
    @kkos 9 months ago

    Great Video! All works, however we cannot use Docker API you're using in tutorial. I've tried to connect to Docker daemon using SSH. I can run Glue Job, but cannot run debugger. Getting ConnectionRefusedError: [Errno 111] Connection refused. Did you manage to make debugger work for Docker SSH?

  • @maximilianrausch5193
    @maximilianrausch5193 10 months ago

    Amazing video

  • @user-qm3lq4dv6g
    @user-qm3lq4dv6g 9 months ago

    Nice video. I want to know how to access tables of the glue catalog that belongs to the related aws account? if run spark.sql('show databases') in your script, will all databases of the online catalog be shown?

    • @DataEngUncomplicated
      @DataEngUncomplicated 9 months ago

      It should as long as your profile has the permission to see the databases

  • @guyfridman4426
    @guyfridman4426 9 months ago

    Thank you, any chance to do the same tutorial on Mac ?

  • @ahkamnaseek2850
    @ahkamnaseek2850 11 months ago

    Hi, can you please tell me your exact PyCharm version? Docker is not working correctly with the new PyCharm version. I tried with 2023.1.4 and it worked.

    • @DataEngUncomplicated
      @DataEngUncomplicated 11 months ago

      Sure! My version is 2023.1. Strange, hopefully they fix the issue in the latest version. I wonder if anyone else is experiencing the same issue you have encountered?

  • @Dickandsongs
    @Dickandsongs 5 months ago

    Great tutorial. Would be even more useful if you would explain, how to add additional libraries to the run.

    • @Dickandsongs
      @Dickandsongs 5 months ago

      and the AWS configuration didn't work...

  • @LearningNewThings0407
    @LearningNewThings0407 7 months ago

    The sample data used in the script should be present in my personal aws accounts s3 bucket ?

    • @DataEngUncomplicated
      @DataEngUncomplicated 7 months ago

      For testing locally or deployment to use on the AWS Glue Service? For testing locally you do not need it in your personal aws account s3 bucket if you are running docker locally.

    • @LearningNewThings0407
      @LearningNewThings0407 7 months ago

      @DataEngUncomplicated I am trying to test Glue locally. I have Docker running locally. I am not sure about the "Update Docker Container Settings" step. Why do we need to provide AWS credentials, and why are IAM permissions required specifically for this testing? My understanding is that these credentials and permissions are used to connect to and use services on AWS, but since we are running it locally, do we still need to provide AWS credentials? Also, if I don't have an AWS account set up yet, does that mean I cannot run AWS Glue locally either?

    • @DataEngUncomplicated
      @DataEngUncomplicated 7 months ago

      Good question. If you need to connect to data in an S3 bucket for testing, then you need to pass in credentials. If not, then you don't need to pass in any profile and can skip this step. It's not a requirement.

    • @LearningNewThings0407
      @LearningNewThings0407 7 months ago

      @DataEngUncomplicated thank you so much for confirming this. So is the data file "memberships.json" used in this example located in the Docker image running locally? In the code the path points to an S3 location. Please let me know if this assumption is correct.

    • @DataEngUncomplicated
      @DataEngUncomplicated 7 months ago

      No, the member.json file is coming from s3 and I needed an iam role that had permission to access that s3 bucket which is why I had to pass the credential file into the docker image. The data is being moved from s3 into the docker container when I run the code. Hopefully this helps clarify things.
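
The distinction made in this thread (local files need no AWS profile, S3 paths do) can be expressed as a small check. This is only a sketch: the function name and the example paths are hypothetical, and it covers just the S3 URI schemes listed in the code.

```python
from urllib.parse import urlparse

# URI schemes commonly used for S3 locations in Spark and Glue scripts.
S3_SCHEMES = {"s3", "s3a", "s3n"}


def needs_aws_credentials(path):
    """Return True when the path points at S3 rather than the local disk."""
    return urlparse(path).scheme.lower() in S3_SCHEMES


# Hypothetical example paths.
print(needs_aws_credentials("s3://example-bucket/input/memberships.json"))
print(needs_aws_credentials("/home/glue_user/workspace/memberships.json"))
```

A script that reads only paths of the second kind can run in the container without a credentials file being mounted.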

  • @ahkamnaseek2850
    @ahkamnaseek2850 10 months ago

    Did you try to install additional python packages to the image?? From the IDE it’s not allowing.

    • @DataEngUncomplicated
      @DataEngUncomplicated 10 months ago

      No I did not. You probably need to go into the Docker image and install it that way rather than through the UI. Have you tried that?

    • @ahkamnaseek2850
      @ahkamnaseek2850 10 months ago

      @DataEngUncomplicated we can't log in to the image directly, right? What I did was run the default image as a container, log in to it, install the library, and build the container as a new image. Then I pointed PyCharm to it. Now the library is visible in PyCharm, but the import statement fails when running the code. I don't know why 😌

  • @ricardoroa5874
    @ricardoroa5874 1 year ago

    I dont have the AWS Connection window, I need to install something additional on pycharm?

    • @ricardoroa5874
      @ricardoroa5874 1 year ago

      I just installed the AWS CLI and it works! Thanks, great tutorial!

    • @DataEngUncomplicated
      @DataEngUncomplicated 1 year ago +1

      Hi Ricardo, Sorry I must have missed that pre-requisite. Thanks for flagging this for others! I'm glad you got it working! It's going to make development much better

    • @gouravroy4573
      @gouravroy4573 3 months ago

      @DataEngUncomplicated I am not getting the AWS connection window even after installing the AWS CLI. I am using PyCharm Professional edition.

  • @errrbrrr3821
    @errrbrrr3821 11 months ago

    please make also for vs code

    • @DataEngUncomplicated
      @DataEngUncomplicated 9 months ago

      I have just uploaded the video configuring aws glue with VS Code and docker. Thanks for the suggestion! ua-cam.com/video/__j-SyopVBs/v-deo.html

  • @brunoniello2019
    @brunoniello2019 10 months ago

    i use vs code :(

    • @DataEngUncomplicated
      @DataEngUncomplicated 10 months ago

      I'll make a video setting it up with vs code

    • @DataEngUncomplicated
      @DataEngUncomplicated 9 months ago

      I have just uploaded the video configuring aws glue with VS Code and docker. Thanks for the suggestion! ua-cam.com/video/__j-SyopVBs/v-deo.html