this is worthy to watch.... The speed I picked up after following you is unbelievable. thank you soo muchh for this amazing content and no doubt your explanation is finest ever I have seen.
Thank you for your kind words ☺️
You are doing an amazing job brother. Keep it up. Thanks for all your contributions to data engineering tutorials.
Thank you ☺️
@@WafaStudies Brother, can you try to upload the videos a bit more quickly, if you don't mind?
@@tarigopulaayyappa Will try to upload faster 😇
@@WafaStudies Thank you very much.
Awesome video, I can thoroughly understand it.
Thank you 😊
Good video.
Thanks Maheer.
Welcome 🤗
Thank you Maheer, you are doing great work. Have you prepared material for these videos, I mean slides or anything like that?
Thanks very much for the tutorial :) I have a query regarding reading JSON files.
I have an array of structs where each struct has a different structure/schema.
Based on a certain property value of the struct, I apply a filter to get that nested struct. However, when I display it using printSchema, it contains fields that do not belong to that object but are somehow being associated with it from the schemas of the other structs. How can I fix this issue?
Nice video! How can we remove duplicates from an array column?
Thanks a lot for sharing, Maheer. Can we create any trial account for practice? As of now, I think Microsoft does not provide a free community trial subscription.
Please drop the notebook details in the description so that it will be easy for us to refer to, or you can share them in a GitHub repository.
Explained the usage of the explode(), split(), array() & array_contains() functions with ArrayType columns in PySpark.
----------------------------------------
data = [(1,'Maheer',['dotnet','azure']),(2,'Wafa',['java','aws'])]
schema = ['id', 'name', 'skills']
df = spark.createDataFrame(data=data,schema=schema)
df.display()
df.printSchema()
-----
#explode()
from pyspark.sql.functions import explode,col
df.show()
df1 = df.withColumn('skill', explode(col('skills')))
df1.show()
-------------------------------------------
data = [(1,'Maheer','dotnet,azure'),(2,'Wafa','java,aws')]
schema = ['id', 'name', 'skills']
df = spark.createDataFrame(data=data,schema=schema)
df.display()
df.printSchema()
-----
#split()
from pyspark.sql.functions import split,col
df.show()
df1 = df.withColumn('skills_array',split('skills',','))
df1.show()
--------------------------------------------
data = [(1,'Maheer','dotnet','azure'),(2,'Wafa','java','aws')]
schema = ['id', 'name', 'primaryskill', 'secondaryskill']
df = spark.createDataFrame(data=data,schema=schema)
df.display()
df.printSchema()
------
#array()
from pyspark.sql.functions import array,col
df.show()
df1 = df.withColumn('skillsArray', array(col('primaryskill'), col('secondaryskill')))
df1.show()
---------------------------------------------
data = [(1,'Maheer',['dotnet','azure']),(2,'Wafa',['java','aws'])]
schema = ['id', 'name', 'skills']
df = spark.createDataFrame(data=data,schema=schema)
df.display()
df.printSchema()
------
from pyspark.sql.functions import array_contains,col
df.show()
df1 = df.withColumn('HasJavaSkill',array_contains('skills',value='java'))
df1.show()
-------------------------------------------------
Thank You Wafa..😁😊
Welcome 🤗
Good content
When you used array(), what happens if the number of skills differs between records?
In the case of split, what will happen if we give the delimiter as | instead of ,?
Sir, how can we explode more than 2 columns, or even something like 150?
@WafaStudies
Are there any other ways to explode the array without the explode command?
I ask because I made a script with the explode command, but the performance is really bad and I'm looking for another way to do this.
Thank you!
Thank you ❤️
Welcome 🤗
0:48 It feels like you've mixed soap into the water here; please explain it properly, it's getting very confusing.
For me, I am not sure why it was not working. I changed the script, and then I got both the skills and skill columns:
from pyspark.sql.functions import explode, col
# Sample data
data = [(1, 'abhishek', ['dotnet', 'azure']), (2, 'abhi', ['java', 'aws'])]
schema = ['id', 'name', 'skills']
# Create DataFrame
df = spark.createDataFrame(data, schema)
df.show()
# Apply explode function on the "skills" column and rename the exploded column
df1 = df.withColumn('skill', explode(col('skills'))).select('id', 'name', 'skills', 'skill')
df1.show()
Thanks, sir.
Welcome
Completed