AWS Glue PySpark: Flatten Nested Schema (JSON)
- Published Nov 6, 2022
- This is a technical tutorial on how to flatten or unnest JSON arrays into columns in AWS Glue with PySpark. The video walks through how to use the relationalize transform and how to join the resulting dynamic frames together for further analysis or for writing to another location.
The Script and Example: github.com/AdrianoNicolucci/d...
Sample data: github.com/AdrianoNicolucci/d...
#aws #awsglue
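For reference, here is a minimal sketch of the relationalize-then-join pattern the video walks through. The database, table, staging path, and key names below are placeholders, not the exact script from the repo:

    from awsglue.context import GlueContext
    from awsglue.transforms import Join
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read the nested JSON into a DynamicFrame (catalog names are assumptions)
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="customer_orders"
    )

    # relationalize splits the nested structure into a collection of flat frames:
    # "root" plus one frame per nested array, e.g. "root_orders"
    dfc = dyf.relationalize("root", "s3://my-temp-bucket/tmp/")

    root = dfc.select("root")
    orders = dfc.select("root_orders")

    # Join the child frame back to the root on the foreign key that
    # relationalize generates (the array column becomes an id in root)
    joined = Join.apply(root, orders, "orders", "id")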
This is a truly amazing channel helping people to understand and learn more about ETL and cloud computing.
Thanks so much!
Wow! Thanks for the Super Thanks, James! I don't get many of these as a small channel, so it is very much appreciated!
@@DataEngUncomplicated Don't know if you can include a PayPal link, so that you wouldn't have to pay the commission to YouTube.
Really good tutorial.
Thanks! I'm glad it was helpful.
Hi, can we set or change the column order when transforming a JSON file while loading the metadata?
Great 👍
Thank you! Cheers!
Hi, this is a great video. I have a question: what happens if we don't have a common column to join the different datasets on? Is there any workaround?
Hi Najwan, if you are using this method to flatten, there has to be a common field if I recall, since relationalize creates a one-to-many relationship. Unless you are asking about flattening a JSON with a different method?
Amazing, these kinds of scenarios present themselves more often than not as a data engineer.
Can we run Python code within the same interactive session notebook?
So for instance, we could run Python code to pull JSON data from an API (using requests) or from an SQS destination queue, then relationalize it with PySpark code?
Thanks! You could do that if your source is an SQS queue, but using PySpark is complete overkill for processing small datasets. I would just run your Python code on a Lambda function.
I don't know the exact schema of the input table. Would you have any dynamic approach for the same scenario instead of hard-coding the column names?
Hi, sorry for the late reply. You can create a Python function to retrieve the column names from your DataFrame. Once you have them, you can dynamically pass them to this function, along the lines sketched below.
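Something like this, as a hypothetical sketch: leaf_column_names is a made-up helper, and spark_df stands in for whatever DataFrame you end up with (e.g. dynamic_frame.toDF()). Note this flattens struct fields only; arrays would still need relationalize or explode:

    from pyspark.sql.functions import col
    from pyspark.sql.types import StructType

    def leaf_column_names(schema, parent=""):
        """Recursively collect fully qualified names of all non-struct fields."""
        names = []
        for field in schema.fields:
            name = f"{parent}.{field.name}" if parent else field.name
            if isinstance(field.dataType, StructType):
                names.extend(leaf_column_names(field.dataType, name))
            else:
                names.append(name)
        return names

    # spark_df is assumed to be your DataFrame, e.g. dynamic_frame.toDF();
    # select every leaf field and rename "a.b" to "a_b" to avoid ambiguity
    flat_df = spark_df.select(
        [col(n).alias(n.replace(".", "_")) for n in leaf_column_names(spark_df.schema)]
    )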
Hi, your channel is really awesome and helpful. I was wondering if it is possible to join two different JSON files stored in separate S3 buckets.
I believe this shouldn't be an issue. As long as you have the prerequisite permissions in place for both S3 buckets, I think you can just append the S3 URI to the paths array in connection_options.
@@andrewwatson6473 Great, thanks man for the help, I appreciate it.
Hi Clayton, thanks for your feedback! Yes, this is totally possible! Once you read the data into separate DataFrames, you can use the join transform to join them, along these lines:
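A rough sketch of what that could look like, assuming the job role can read both buckets; the bucket paths and the "customer_id" join key are placeholders:

    from awsglue.context import GlueContext
    from awsglue.transforms import Join
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read each JSON dataset from its own bucket (paths are assumptions)
    orders = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://bucket-one/orders/"]},
        format="json",
    )
    customers = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://bucket-two/customers/"]},
        format="json",
    )

    # Join the two DynamicFrames on a shared key
    joined = Join.apply(orders, customers, "customer_id", "customer_id")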
@@DataEngUncomplicated Hi, thanks. Another issue I am having: what if I have to join data on an attribute that appears multiple times with a different timestamp?
@@claytonvanderhaar3772 Hi Clayton, this sounds like a data problem. I'm not sure whether you want to join on this timestamp or not, but you could convert your timestamp from a datetime to a date, for example:
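As a hedged sketch of that suggestion, assuming Spark DataFrames df_a and df_b with a hypothetical event_ts timestamp column:

    from pyspark.sql.functions import to_date

    # Derive a date column from the timestamp so both sides join on the day,
    # not the exact datetime (column names are assumptions)
    df_a = df_a.withColumn("event_date", to_date("event_ts"))
    df_b = df_b.withColumn("event_date", to_date("event_ts"))
    joined = df_a.join(df_b, on="event_date", how="inner")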
Hi, I have a completely nested JSON file. When I run a crawler on it, I get only one schema, with the column name "array" and the data type array, and inside that array data type the column names and data types are present. Is that correct?
Hi, do you want to elaborate on your problem? I don't understand the question.
Hi, please provide the dataset you used; that would be great.
Sure Shrikant! I'll upload it to my GitHub repo in the morning, which you can find in the description of the video.
Please see link for sample dataset: github.com/AdrianoNicolucci/dataenguncomplicated/blob/main/aws_glue/sample_data/customer_orders_with_addresses.json
@@DataEngUncomplicated Thanks a lot for providing the dataset :)