AWS Tutorials - AWS Glue Job Optimization Part-1

  • Published 15 Sep 2024

COMMENTS • 29

  • @worldupdates.
    @worldupdates. 1 year ago +2

    Keep it up sir.

  • @VishalSharma-hv6ks
    @VishalSharma-hv6ks 2 years ago +2

    Very Informative..

  • @sanooosai
    @sanooosai 6 months ago +1

    thank you sir

  • @Abubakar91718
    @Abubakar91718 2 years ago +3

    Appreciates

  • @Books_Stories_Poetry
    @Books_Stories_Poetry 1 year ago +1

    Spark follows lazy evaluation, and when it prepares the plan to fetch the data it will take the filter into consideration. A pushdown predicate does not make any sense.

  • @arunr2265
    @arunr2265 2 years ago +2

    Nice video. But one question: if we use the filter condition while loading the data, won't Spark's Catalyst Optimizer push down the filter and read fewer rows? Is the predicate syntax used only in Glue, or can Apache Spark use it too?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline  2 years ago

      It is an Apache Spark level thing; the purpose is to scan and load only as much data as is required (a usage sketch follows after this thread). Please check this link - jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-Optimizer-PushDownPredicate.html

    • @cheluvesha
      @cheluvesha 1 year ago

      Here are a few reasons why you may want to explicitly specify pushdown predicates in AWS Glue:
      Improved performance: By explicitly specifying pushdown predicates in AWS Glue, you can control which filtering conditions are pushed down to the storage layer. This can result in improved query performance, as the storage layer will only return the data that is actually needed for the query.
      Fine-grained control: By explicitly specifying pushdown predicates in AWS Glue, you can have fine-grained control over the filtering conditions that are pushed down to the storage layer. This can be useful when you want to optimize a specific query or set of queries.
      Troubleshooting: By explicitly specifying pushdown predicates in AWS Glue, you can make it easier to troubleshoot performance issues. For example, if a query is not performing as expected, you can check whether the pushdown predicates are properly specified and being used effectively.
      In conclusion, while Apache Spark's Catalyst Optimizer can automatically push down filtering conditions to the storage layer, explicitly specifying pushdown predicates in AWS Glue can provide additional benefits in terms of performance, control, and troubleshooting.
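
A minimal usage sketch of the pushdown option discussed in this thread, passed as the push_down_predicate argument to glueContext.create_dynamic_frame.from_catalog; the database, table, and partition column names are hypothetical placeholders, not the dataset used in the video.

```python
# Hedged sketch: "sales_db", "orders", and the year/month partition columns
# are hypothetical placeholders, not the dataset used in the video.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Without a pushdown predicate, every partition under the table's S3 location
# is listed and read, and filtering happens later in memory.
all_orders = glueContext.create_dynamic_frame.from_catalog(
    database="sales_db",
    table_name="orders",
)

# With push_down_predicate, only the S3 partitions matching the expression
# are listed and read, so less data ever reaches the job.
recent_orders = glueContext.create_dynamic_frame.from_catalog(
    database="sales_db",
    table_name="orders",
    push_down_predicate="year == '2024' and month == '09'",
)
```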

  • @abeeya13
    @abeeya13 2 months ago

    Can this be used to read only a few columns from an S3 bucket?

  • @mesaquestbsb
    @mesaquestbsb 1 year ago +1

    Hey, very nice video! I have some questions. Does this approach apply to Delta tables? I work with a table of more than 150 million rows, from which I need to delete almost 50 million rows weekly and load new data. What is your suggestion for deletion, considering I use partitioning?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline  1 year ago

      Pushdown is designed for partitioned data. You should try the Delta Lake framework in AWS Glue for your purpose (a sketch follows after this thread). Details - docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html
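
As a rough illustration of the suggestion above, here is a hedged sketch of a partition-scoped delete with Delta Lake in a Glue job. It assumes the job already has Delta Lake enabled (for example via the datalake-formats job parameter described in the linked documentation); the S3 path, partition column, and value are hypothetical placeholders.

```python
# Hedged sketch: the S3 path, partition column, and value are hypothetical.
# Assumes the Glue job has Delta Lake enabled (see the linked AWS docs).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog")
    .getOrCreate()
)

# Deleting by the partition column lets Delta Lake drop whole partitions'
# data files instead of rewriting the entire ~150M-row table.
spark.sql("""
    DELETE FROM delta.`s3://my-bucket/orders_delta/`
    WHERE load_week = '2024-W37'
""")
```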

  •  1 year ago +1

    Hi, one question on a different point: is it possible to create a table in the Glue catalog from a Glue job, with S3 source/target data? One option is that the table already exists in the Glue catalog, but is there another way to create it dynamically?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline  1 year ago +1

      Technically possible. You can use the Python boto3 AWS SDK in the Glue job to check for the existence of a table; if it does not exist, you simply start the crawler to create the table catalog (a sketch follows below).
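
A minimal sketch of the "check whether the table exists, otherwise start the crawler" approach described in the reply above, using boto3; the database, table, and crawler names are hypothetical placeholders.

```python
# Hedged sketch: database, table, and crawler names are hypothetical.
import boto3

glue = boto3.client("glue")

def ensure_table(database: str, table: str, crawler: str) -> None:
    """Start the crawler only if the catalog table does not exist yet."""
    try:
        glue.get_table(DatabaseName=database, Name=table)
        print(f"Table {database}.{table} already exists in the Glue catalog.")
    except glue.exceptions.EntityNotFoundException:
        print(f"Table not found; starting crawler {crawler} to create it.")
        glue.start_crawler(Name=crawler)

ensure_table("sales_db", "orders", "orders_crawler")
```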

  • @mohdshoeb5101
    @mohdshoeb5101 2 years ago +1

    Can you please tell me how to optimize reading a large number of files every hour? I have 10-12 Glue jobs, and every job reads 10 tables in every run.

    • @AWSTutorialsOnline
      @AWSTutorialsOnline  2 years ago

      Please check Part-5, where I explained batching. It might help. Please let me know.

  • @abhijeetpatil-k5y
    @abhijeetpatil-k5y 1 year ago +1

    Sir, which IAM role did you use for AWS Glue and also for the Jupyter notebook?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline  1 year ago

      This link might help - docs.aws.amazon.com/glue/latest/dg/create-an-iam-role-notebook.html

  • @abhishek822
    @abhishek822 1 year ago +1

    Does pushdown predicate work for a JDBC source?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline  1 year ago

      No, it is designed for S3 bucket-based partitions or Hive metadata only.

  • @ericguymon5418
    @ericguymon5418 2 years ago +1

    What method in either Lake Formation or Glue would you recommend for daily ingestion of a database with, say, 15 tables, 5 of which could add 1 million rows daily?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline  2 years ago

      You can use an AWS Glue job for it. Please check this link - aws-dojo.com/workshoplists/workshoplist33/

  • @PakIslam2012
    @PakIslam2012 2 years ago

    Isn't glueContext.create_dynamic_frame also lazily evaluated, like a Spark DataFrame? That would mean it should not end up loading the entire 410 records in the initial code.

    • @AWSTutorialsOnline
      @AWSTutorialsOnline  2 years ago

      You are right, it is lazily loaded. It is more about in-memory processing size: do you want to process 410 records in memory to filter them down to 160 records, or do you want to process 160 records right from the beginning? (A comparison sketch follows after this thread.)

    • @Ady_Sr
      @Ady_Sr 9 months ago

      You are right. I think it would never load 410 records in the first place; due to lazy evaluation it will only pick the filtered records, not all of them, even if the filter is in the 2nd statement.
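
To make the trade-off discussed in this thread concrete, here is a hedged sketch contrasting filtering a DynamicFrame in memory with pushing the predicate down to the partition scan. Both reads are lazy; the difference is how much data the job ultimately scans. The database, table, and column names are hypothetical placeholders.

```python
# Hedged sketch: names are hypothetical; both frames are evaluated lazily.
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import Filter

glueContext = GlueContext(SparkContext.getOrCreate())

# Option 1: read the whole table, then filter in memory. Evaluation is lazy,
# but when an action runs, every partition under the table location is
# scanned (all 410 records) before the lambda reduces them to ~160.
full = glueContext.create_dynamic_frame.from_catalog(
    database="sales_db",
    table_name="orders",
)
filtered_in_memory = Filter.apply(frame=full, f=lambda row: row["year"] == "2024")

# Option 2: push the predicate down so only matching S3 partitions are
# scanned, and the job works with ~160 records from the start.
filtered_at_source = glueContext.create_dynamic_frame.from_catalog(
    database="sales_db",
    table_name="orders",
    push_down_predicate="year == '2024'",
)
```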
