AWS Hands-On: ETL with Glue and Athena

  • Published Oct 3, 2024
  • In this video, I'll show you how to use AWS Glue to run an ETL Job against data in an S3 bucket, then save the transformed data in another S3 Bucket, and finally use AWS Athena to query the data.
    WHO Data: covid19.who.in...
  • Science & Technology

COMMENTS • 41

  • @ChimDashi · 11 days ago

    Great content. Clear, concise, and informative.

  • @heisenberg0121 · 4 months ago +1

    Thank you!! It helped me clarify AWS Glue.

  • @nicknick65 · 5 months ago +1

    brilliant: very well explained and easy to understand, thank you

  • @shaktiman-x7y · 3 months ago

    Thank you sir, pretty good demo and clear and effective explanation.

  • @rockyrocks2049 · 1 year ago +2

    Greatly explained video. I tried to follow other videos and ended up with errors, because most of them don't explain which IAM role and permissions need to be created before jumping into the crawler and the Glue job, so thanks a lot for explaining everything from scratch. If you could also explain in what situations we need to take care of VPC, subnets, internet access, and routing before creating a Glue job, that would be really great; I have seen people set this up in some videos and I don't know whether it's actually required or not. Also, please explain custom policy creation and custom PySpark code to develop an SCD Type 2 job, with a static lookup from a lookup table to the source table data mapping. In Azure, SCD Type 2 job development is quite easy because readily available transformations like NotExist and Create Key are provided. Thanks a lot @Cumulus.

  • @RicardoPorteladaSilva · 6 months ago +2

    totally excellent! thank you!

  • @khaledabouelella · 1 year ago +1

    Excellent explanation, Thank you

  • @krishj8011 · 4 months ago +1

    nice tutorial

  • @mazharulboni5419 · 7 months ago +1

    well explained. thank you

  • @mackshonayi943 · 1 year ago +1

    Great tutorial thank you so much

    • @cumuluscycles · 1 year ago

      Thanks for the comment. I'm glad it was helpful!

  • @mejiger · 1 year ago +1

    clean explanation; thanks

  • @jgojiz · 17 days ago

    brilliant!

  • @aabbassp · 1 year ago +1

    Thanks for the video.

  • @fifthnail · 1 year ago +1

    10:46 I had a similar issue when following what you were doing with the compression type. I selected GZIP and everything was zipped as GZIP; however, when I tried deselecting it with Compression Type "None", it defaulted back to GZIP. My guess is that you were NOT using GZIP originally, THEN for your tutorial you started using GZIP, and then it defaulted back to "None". To resolve it, I needed to delete the original data target S3 bucket and set up the target from scratch. My guess is the script code was not updating for some reason when the setting changed.

    • @cumuluscycles · 1 year ago +1

      Thanks for this, I'll have to go and test it out!
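
[Editor's note] The thread above boils down to this: compression is applied explicitly at write time, so if the generated script still writes GZIP, the UI setting alone won't change the output. A toy, Glue-free sketch (nothing here is from the video; `write_records` is a made-up helper) showing that the writer, not the file name, decides whether output is gzipped:

```python
import gzip
import os
import tempfile

# Toy illustration (not a Glue job): compression must be applied by the
# code that writes the file. A script that keeps calling gzip.open() will
# keep producing GZIP output regardless of what a UI dropdown says.
def write_records(path, records, compression="none"):
    data = ("\n".join(records) + "\n").encode("utf-8")
    if compression == "gzip":
        with gzip.open(path, "wb") as f:
            f.write(data)
    else:
        with open(path, "wb") as f:
            f.write(data)

tmp = tempfile.mkdtemp()
write_records(os.path.join(tmp, "plain.csv"), ["a,1", "b,2"])
write_records(os.path.join(tmp, "zipped.csv.gz"), ["a,1", "b,2"], compression="gzip")

# GZIP files start with the magic bytes 0x1f 0x8b
with open(os.path.join(tmp, "zipped.csv.gz"), "rb") as f:
    print(f.read(2) == b"\x1f\x8b")  # True
with open(os.path.join(tmp, "plain.csv"), "rb") as f:
    print(f.read(2) == b"\x1f\x8b")  # False
```

Deleting and recreating the target, as the commenter did, forces Glue to regenerate the write step, which is consistent with the stale-script theory.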

  • @nagrotte · 7 months ago

    Great content🙏

  • @sags3112 · 1 year ago

    awesome video... great one

  • @dfelton316 · 9 months ago +1

    What if there are multiple data sources? Are there separate databases for each source? Can multiple data sources be placed into the same database?
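
[Editor's note] On the question above: a single Glue Data Catalog database can hold tables from several sources; each crawled S3 prefix becomes its own table. A hedged sketch (every name below is a placeholder, not from the video) of a dict mirroring the keyword arguments of boto3's `glue.create_crawler()`:

```python
# Hypothetical crawler definition (all names are placeholders): one crawler
# can target several S3 paths, and every resulting table lands in the same
# Glue database, so separate databases per source are a choice, not a
# requirement. This dict mirrors boto3's glue.create_crawler() arguments.
crawler_definition = {
    "Name": "multi_source_crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "DatabaseName": "analytics_db",
    "Targets": {
        "S3Targets": [
            {"Path": "s3://example-raw-bucket/who_data/"},
            {"Path": "s3://example-raw-bucket/second_source/"},
        ]
    },
}
print(len(crawler_definition["Targets"]["S3Targets"]))  # 2
```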

  • @sgyakkala · 2 months ago

    @cumuluscycles Thanks for your video. I followed it and generated the output file, but I see that multiple partitioned output files are generated instead of a single output file. I want to generate only a single output file, and I am totally clueless about where the mistake is. Is there any config setting I am missing? Please help me.
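
[Editor's note] This is expected behavior, not a mistake: Spark, which Glue runs on, writes one output file per partition. A commonly used workaround is to reduce to a single partition before writing (in PySpark, something like `dyf.toDF().coalesce(1)`), at the cost of write parallelism. A toy, Spark-free sketch of the one-file-per-partition behavior (`write_partitions` is a made-up stand-in, not a Glue API):

```python
import os
import tempfile

# Toy illustration (no Spark involved): each partition is written as its
# own part file, which is why a multi-partition Glue job emits many files.
def write_partitions(rows, n_partitions, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    # Round-robin the rows into n_partitions buckets, like Spark partitions.
    buckets = [rows[i::n_partitions] for i in range(n_partitions)]
    for i, bucket in enumerate(buckets):
        with open(os.path.join(out_dir, f"part-{i:05d}.csv"), "w") as f:
            f.writelines(line + "\n" for line in bucket)
    return sorted(os.listdir(out_dir))

rows = [f"row-{i}" for i in range(10)]
multi = write_partitions(rows, 4, tempfile.mkdtemp())   # four part files
single = write_partitions(rows, 1, tempfile.mkdtemp())  # one file, like coalesce(1)
print(len(multi), len(single))  # 4 1
```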

  • @rubulroy55 · 1 year ago

    If we want to use S3 in Glue, shouldn't the IAM role have been for the S3 service, since the IAM role is used in Glue? I'm confused, am I missing something 😕

  • @rockyrocks2049 · 1 year ago

    Also @Cumulus, while creating a job for the prod environment, what prerequisites do we need to take care of in terms of the job, the policy, and the crawler? Please explain that as well. For the policy we have now added Power User, but in prod I think we need to narrow down our access. Please explain that if possible. Thanks once again.

  • @ARATHI2000 · 6 months ago

    @Cumulus, great tutorial, thank you so much. In my case, I noticed that the generated schema is in array form, not individual column names; the columns are wrapped into an array. Any thoughts? Thx again!

    • @cumuluscycles · 6 months ago

      I'm glad you found the video useful. I just ran through the process again and my schema was generated with columns, so I'm really not sure why yours was in array form. Maybe someone else will comment if they experience the same.

  • @AliTwaij · 1 year ago

    excellent, thank you

  • @AvaneeshThakurRana · 1 year ago

    Thank you for this video. Will I also be able to use Glue to run an ETL job for data in AWS RDS, then save the data to S3 and use Athena to query it?

    • @cumuluscycles · 1 year ago

      Hi. You should be able to get data from RDS using a Glue Connection. Give this a read: docs.aws.amazon.com/glue/latest/dg/connection-properties.html

    • @MrDottyrock · 1 year ago

      @@cumuluscycles Can you connect to an on-prem database to run ETL outside AWS?

    • @cumuluscycles · 1 year ago

      @@MrDottyrock Give the following a read and see if it helps: aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/
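
[Editor's note] For readers following the RDS thread above: the Glue connection docs the author links describe JDBC-style connection properties. As a hedged sketch (the endpoint, database, table, and credentials below are invented), these are roughly the options passed to `glueContext.create_dynamic_frame.from_options` for a MySQL-flavored RDS source:

```python
# Hypothetical JDBC connection options for reading an RDS table with Glue.
# Every value below is a placeholder, not something from the video; real
# jobs should pull credentials from Secrets Manager, not hard-code them.
connection_options = {
    "url": "jdbc:mysql://mydb.abc123.us-east-1.rds.amazonaws.com:3306/covid",
    "dbtable": "who_cases",
    "user": "glue_user",
    "password": "example-password",
}

# In an actual Glue script this would be used roughly as:
#   dyf = glueContext.create_dynamic_frame.from_options(
#       connection_type="mysql",
#       connection_options=connection_options,
#   )
# followed by a write to S3 and an Athena query over the crawled output.
print(sorted(connection_options))
```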

  • @ulhaqz · 1 year ago

    Hi! Great video.
    Can you please help me with the following:
    I am stuck at 7:28 where you create a job. For the output I am selecting an empty S3 bucket, similar to you, but I am prompted to pick an object. I have tried uploading a CSV and a TXT file, but they are not recognized as objects, so I get an error and cannot proceed any further. Thanks a lot!

    • @cumuluscycles · 1 year ago +1

      Hmmm... That's odd; since you're specifying an output bucket, you shouldn't need to specify an object in the bucket. The only thing I can think of is that, when specifying the path to some buckets, I've had to add a slash at the end of the bucket name. I know I didn't have to do that in the video, but it may be worth a try. If you figure it out, can you post here in case others run into this?

    • @ulhaqz · 1 year ago +1

      @@cumuluscycles Thanks for the reply. What worked for me was to create a folder in the bucket and select it ... And there is a new GUI in place too, though I switched to the old one to match the instructions in the video.

  • @suryatejasingasani256 · 1 year ago

    Hi bro, I have a doubt: I have a DataStage job exported to an XML file, and I want to convert the XML file into a Glue job. How can I do that?

    • @cumuluscycles · 1 year ago

      Hi. I haven’t done this before, but this info may help you: docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-xml-home.html
