30
69 069

Handle or Fill Null Values Using PySpark Dynamically | Real Time Scenario | #pyspark #dataengineers

21:25

ADF UNTIL ACTIVITY || REAL TIME SCENARIO || VERIFY COUNT OF RECORDS || #azuredatafactory

30:13

Lakehouse Arch || DWH v/s DATALAKE v/s DELTALAKE || #dataengineering #databricks

24:17

ADF COPYDATACTIVITY || Copy Behavior || Quote & Escape Characters || Hands On || #dataengineering

31:46

Data Validation with Pyspark || Rename columns Dynamically ||Real Time Scenario

23:17

Pyspark with YAML File || Part-3 || Real Time Scenario || #pyspark #python #interviewquestions

32:44

Rename Columns in PySpark || Quick Tips for Renaming Columns in PySpark DataFrames || #pyspark

In this Video we discussed about Rename the column spaces in the df using pyspark dynamically , in order to create the Delta Tables. Column names with spaces are not allowed while creating Delta Tables
Source Link ::
drive.google.com/file/d/1VfTu9TAE_wkyVa35iw0f95nXNB-Mppfi/view?usp=sharing
#PySpark
#BigData
#DataScience
#Python
#DataEngineering
#ApacheSpark
#MachineLearning
#DataAnalytics
#Coding
#Programming
#TechTutorial
#DataTransformation
#ETL
#DataProcessing
#CodeWithMe

Відео

Handle or Fill Null Values Using PySpark Dynamically | Real Time Scenario | #pyspark #dataengineers

21:25

Handle or Fill Null Values Using PySpark Dynamically | Real Time Scenario | #pyspark #dataengineers

Переглядів 3927 місяців тому

In this video, we dive into the essential techniques for handling and filling null values dynamically using PySpark. In this tutorial, you will learn: How to identify and handle null values in PySpark DataFrames. Techniques to dynamically fill null values based on various conditions. Practical examples and step-by-step code demonstrations. Notebook link ::: drive.google.com/file/d/1oHJTDblzt2fi...

ADF UNTIL ACTIVITY || REAL TIME SCENARIO || VERIFY COUNT OF RECORDS || #azuredatafactory

30:13

ADF UNTIL ACTIVITY || REAL TIME SCENARIO || VERIFY COUNT OF RECORDS || #azuredatafactory

Переглядів 1747 місяців тому

In this video we will understand the Privilege's of Using Until Activity in ADF CREATE TABLE dev.TableMetadata ( TableName NVARCHAR(128), LastRowCount INT ); Initialize the metadata for your table if not already done IF NOT EXISTS (SELECT 1 FROM dev.TableMetadata WHERE TableName = 'YourTableName') BEGIN INSERT INTO dev.TableMetadata (TableName, LastRowCount) VALUES ('YourTableName', 0); END CRE...

Lakehouse Arch || DWH v/s DATALAKE v/s DELTALAKE || #dataengineering #databricks

24:17

Lakehouse Arch || DWH v/s DATALAKE v/s DELTALAKE || #dataengineering #databricks

Переглядів 2628 місяців тому

In this video we discussed about Difference between DWH v/s DATALAKE v/s DELTALAKE . In the next part we will see practical Drawbacks of DATALAKE Link for notes:: drive.google.com/file/d/10gbSmYnNUThWYHWCIZR9vWR1pJiVm14x/view?usp=sharing #dataanalytics #azuredataengineer #databricks #pyspark #datawarehouse #datalake #sql

ADF COPYDATACTIVITY || Copy Behavior || Quote & Escape Characters || Hands On || #dataengineering

31:46

ADF COPYDATACTIVITY || Copy Behavior || Quote & Escape Characters || Hands On || #dataengineering

Переглядів 2878 місяців тому

In this Video we will discuss about the Quote char, Escape Char and Copy Behavior in ADF Copy DataActivity considering source as a CSV file. Note :: If u leave the Copy Behavior as empty by default it takes "Preserve Hierarchy" Notes Link :: drive.google.com/file/d/1yVsU1HsdShe2On21LBKyurR4JfPOWDO9/view?usp=sharing #dataengineering #azuredataengineer #azuredatabricks #pyspark #database #databricks

Data Validation with Pyspark || Rename columns Dynamically ||Real Time Scenario

23:17

Data Validation with Pyspark || Rename columns Dynamically ||Real Time Scenario

Переглядів 3309 місяців тому

In this video explained about how we can rename the columns for the selected columns from source file dynamically with pyspark. To execute this dynamically we used Metadata files Important Links : Meta columns : drive.google.com/file/d/1EWxcWNpG52rznjK2MnRo9jUGQpfpnxyl/view?usp=sharing MetaFiles : drive.google.com/file/d/1szbTXZuDxYk2Hk6kk_VttEoetZBdQj4E/view?usp=sharing Source Files 4 wheelers...

Pyspark with YAML File || Part-3 || Real Time Scenario || #pyspark #python #interviewquestions

32:44

Pyspark with YAML File || Part-3 || Real Time Scenario || #pyspark #python #interviewquestions

Переглядів 26610 місяців тому

In this video will see how to read sql server tables from YAML file and create a pyspark df on top of that part2 link: ua-cam.com/video/aQlazXrjgrU/v-deo.html part1 link : ua-cam.com/video/ujoF2Wd_2T0/v-deo.htmlsi=kV48HUg88exWVJY2 Playlist link: ua-cam.com/play/PLWhMEKuFLBt8Kt-Y2DOeTzFtAOxzQdwTe.html #pyspark #dataengineering #pythonprogramming #sql #spark #databricks #dataanalytics

Pyspark with YAML File || Part-2 || Real Time Scenario || #pyspark #python #interviewquestions

15:30

Pyspark with YAML File || Part-2 || Real Time Scenario || #pyspark #python #interviewquestions

Переглядів 29911 місяців тому

In this video will see the issue or error for the part-1 and who to read csv sources from YAML file part1 link : ua-cam.com/video/ujoF2Wd_2T0/v-deo.htmlsi=kV48HUg88exWVJY2 Playlist link: ua-cam.com/play/PLWhMEKuFLBt8Kt-Y2DOeTzFtAOxzQdwTe.html #pyspark #dataengineering #pythonprogramming #sql #spark #databricks #dataanalytics

Pyspark with YAML file || Part-1 || Pyspark Real Time Scenario || #pyspark

33:32

Pyspark with YAML file || Part-1 || Pyspark Real Time Scenario || #pyspark

Переглядів 89911 місяців тому

In this video , we will consider YAML file as a Config file, Read Sources mentioned in the YAML file to load the data playlist link : ua-cam.com/play/PLWhMEKuFLBt8Kt-Y2DOeTzFtAOxzQdwTe.html #pyspark #databricks #dataanalytics #spark #interviewquestions #pythonprogramming #dataengineering #databricks #yaml

Data Insertion in DimDate || Using SQL || Real Time Scenario

32:12

Data Insertion in DimDate || Using SQL || Real Time Scenario

Переглядів 26311 місяців тому

In this Video we will discus about Basic Date functions and how to use those to insert data into DimDate in the DataWareHouse Model recursively .Once after 90days completed again re-run the script code link: drive.google.com/file/d/1gyIQMOtVjHNTqwzMdT0jMj5yJSQWOv_c/view?usp=sharing #pyspark #sqlserver #dataengineering #dataanalytics #datawarehouse #sql

Data Validation using pyspark || Handle Unexpected Records || Real Time Scenario ||

22:54

Data Validation using pyspark || Handle Unexpected Records || Real Time Scenario ||

Переглядів 47111 місяців тому

This video tells you about , how we can handle the unexpected records using pyspark on top the source data frame playlist link: ua-cam.com/play/PLWhMEKuFLBt8Kt-Y2DOeTzFtAOxzQdwTe.html&si=tg-Du5LOsXUe8-Ju code : drive.google.com/file/d/1Z r3KePT0uI_WpvKJSKN0GGK8vWZdq5/view?usp=sharing #pyspark #databricks #dataanalytics #data #dataengineering

Data Validations using Pyspark || Filtering Duplicate Records || Real Time Scenarios

29:49

Data Validations using Pyspark || Filtering Duplicate Records || Real Time Scenarios

Переглядів 50911 місяців тому

This video tells about How we can filter out or handle the duplicate records using Pyspark in a Dynamic Way... #azuredatabricks #dataengineering #dataanalysis #pyspark #pythonprogramming #dataengineering #dataanalysis #pyspark #python #sql Playlist Link: ua-cam.com/play/PLWhMEKuFLBt8Kt-Y2DOeTzFtAOxzQdwTe.html&si=oRPexgefXxT0R8Y7

Data Validation Using Pyspark || ColumnPositionComparision ||

24:18

Data Validation Using Pyspark || ColumnPositionComparision ||

Переглядів 65011 місяців тому

How we can develop a function or script using Pyspark to compare ColumnPosition while dumping data into raw layer from stage or source layer. #pyspark #databricks #dataanalytics #spark #interviewquestions #pythonprogramming #dataengineering linkedin : www.linkedin.com/in/lokeswar-reddy-valluru-b57b63188/

Data Validation with Pyspark || Schema Comparison || Dynamically || Real Time Scenario

28:44

Data Validation with Pyspark || Schema Comparison || Dynamically || Real Time Scenario

Переглядів 2,1 тис.Рік тому

In this Video we covered how we can perform quick data validation like Schema comparison between source and Target: In the next video we will look into Date/TimeStamp format check and duplicate count check . Column Comparison link : ua-cam.com/video/U9QqTh9ynAM/v-deo.html #dataanalytics #dataengineeringessentials #azuredatabricks #dataanalysis #pyspark #pythonprogramming #sql #databricks #PySpa...

Data Validation with Pyspark || Real Time Scenario

37:34

Data Validation with Pyspark || Real Time Scenario

Переглядів 9 тис.Рік тому

In this video will discuss about , how we are going to perform data validation with pyspark Dynamically Data Sources Link: drive.google.com/drive/folders/10aEhm5xcazOHgOGzRouc8cDZw4X8KC0o?usp=sharing #pyspark #databricks #dataanalytics #data #dataengineering

Implementing Pyspark Real Time Application || End-to-End Project || Part-5 || HiveTable ||MYSQL

42:35

Implementing Pyspark Real Time Application || End-to-End Project || Part-5 || HiveTable ||MYSQL

Переглядів 4,6 тис.Рік тому

Implementing Pyspark Real Time Application || End-to-End Project || Part-5 || HiveTable ||MYSQL

Introduction to Spark [Part-1] || Spark Architecture || How does it works internally !!

47:58

Introduction to Spark [Part-1] || Spark Architecture || How does it works internally !!

Переглядів 882Рік тому

Introduction to Spark [Part-1] || Spark Architecture || How does it works internally !!

Implementing Pyspark Real Time Application || End-to-End Project || Part-4

1:09:10

Implementing Pyspark Real Time Application || End-to-End Project || Part-4

Переглядів 2,8 тис.Рік тому

Implementing Pyspark Real Time Application || End-to-End Project || Part-4

Implementing Pyspark Real Time Application || End-to-End Project || Part-3||

33:28

Implementing Pyspark Real Time Application || End-to-End Project || Part-3||

Переглядів 2,4 тис.Рік тому

Implementing Pyspark Real Time Application || End-to-End Project || Part-3||

Implementing Pyspark Real Time Application || End-to-End Project || Part-2

50:30

Implementing Pyspark Real Time Application || End-to-End Project || Part-2

Переглядів 4,3 тис.Рік тому

Implementing Pyspark Real Time Application || End-to-End Project || Part-2

Implementing Pyspark Real Time Application || End-to-End Project || Part-1

1:27:21

Implementing Pyspark Real Time Application || End-to-End Project || Part-1

Переглядів 28 тис.Рік тому

Implementing Pyspark Real Time Application || End-to-End Project || Part-1

Implementing SCD-Type2 in ADF||Part2-Updated

39:59

Implementing SCD-Type2 in ADF||Part2-Updated

Переглядів 1 тис.Рік тому

Implementing SCD-Type2 in ADF||Part2-Updated

Implementing SCD-Type2 in Azure Data Factory Dynamically ||Part-1

59:01

Implementing SCD-Type2 in Azure Data Factory Dynamically ||Part-1

Переглядів 1,4 тис.Рік тому

Implementing SCD-Type2 in Azure Data Factory Dynamically ||Part-1

Implementing FBS Project with Azure Part-3 ||Azure Data Engineer End-To End Project

40:22

Implementing FBS Project with Azure Part-3 ||Azure Data Engineer End-To End Project

Переглядів 549Рік тому

Implementing FBS Project with Azure Part-3 ||Azure Data Engineer End-To End Project

Implementing FBS Project with Azure Part-2 ||Azure Data Engineer End-To End Project

52:11

Implementing FBS Project with Azure Part-2 ||Azure Data Engineer End-To End Project

Переглядів 749Рік тому

Implementing FBS Project with Azure Part-2 ||Azure Data Engineer End-To End Project

Implementing FBS Project with Azure Part-1 ||Azure Data Engineer End-To End Project

1:03:08

Implementing FBS Project with Azure Part-1 ||Azure Data Engineer End-To End Project

Переглядів 3,6 тис.Рік тому

Implementing FBS Project with Azure Part-1 ||Azure Data Engineer End-To End Project

Excel Multiple Sheets to Azure SQL Dynamically || Using Azure Data Factory || Data Factory Pipelines

22:09

Excel Multiple Sheets to Azure SQL Dynamically || Using Azure Data Factory || Data Factory Pipelines

Переглядів 647Рік тому

Excel Multiple Sheets to Azure SQL Dynamically || Using Azure Data Factory || Data Factory Pipelines

Full Load Data Pipeline Using Azure Data Factory Part 1 || Azure Data Factory || Data Engineering

34:24

Full Load Data Pipeline Using Azure Data Factory Part 1 || Azure Data Factory || Data Engineering

Переглядів 502Рік тому

Full Load Data Pipeline Using Azure Data Factory Part 1 || Azure Data Factory || Data Engineering

Incremental Data Loading Part - 2 || For Multiple Tables Using Azure Data Factory

40:27

Incremental Data Loading Part - 2 || For Multiple Tables Using Azure Data Factory

Переглядів 678Рік тому

Incremental Data Loading Part - 2 || For Multiple Tables Using Azure Data Factory

Incremental Data Loading Part 1 || For Single Table Using Azure Data Factory

36:55

Incremental Data Loading Part 1 || For Single Table Using Azure Data Factory

Переглядів 1,1 тис.Рік тому

Incremental Data Loading Part 1 || For Single Table Using Azure Data Factory

КОМЕНТАРІ

@gaganaulakh41 18 днів тому
I hope the datasets being used in this project are available now? Hope this project is still doable with source data?
@samarthjadhav7103 21 день тому
Could you please demonstrate how to implement scd type 3 in ADF
@hemanthreddymanda1221 Місяць тому
Bro upload more videos with azure pipelines in different scenarios.
@pseshadri9364 2 місяці тому
the above config tbl_datasorce where i should create it can you please confirm it in azuresql or any other Db?
@RahulEternalLearner 2 місяці тому
Amazing content. Thank you for sharing. This time youtube didn't show repeated ads. Thank you youtube.
@SurendraGv 2 місяці тому
Hi can you please share me your telegram id i have doubt on Sql
@sudhanvenna3607 4 місяці тому
Good content! Explained very well , Pls try to upload more content actively.
@kotireddy8648 5 місяців тому
I've seen nearly all of your videos, and they're fantastic. Could you please assist me by giving me any resources or materials so I may learn more?
@kotireddy8648 5 місяців тому
I am interested to continue career as azure data engineer with around 2 years of experience tech stack of azure cloud, pyspark,spark,python,sql.
@kotireddy8648 5 місяців тому
can you please give me the github sourcecode for practise?
@kotireddy8648 5 місяців тому
valuable and good content😍
@sachinmittal5308 5 місяців тому
Hello Sir, Link is not working to download the full code from google drive?
@sarveshkumar-tq4fn 5 місяців тому
hi I am getting error when using df.show() for final dataframe
@ComedyXRoad 5 місяців тому
do we apply these techniques for delta tables also
@sainadhvenkata 6 місяців тому
@dataspark Could you please provide those data links again because those link got expired
@tejathunder 6 місяців тому
sir, please upload continuation for this project.
@samar8136 6 місяців тому
Now it is possible to save delta table with column name having spaces: Rename and drop columns with Delta Lake column mapping
@DataSpark45 6 місяців тому
Renamed columns or removed the spaces and then created
@shaasif 6 місяців тому
thank you so much for your real time project explanation on 5 parts it's really awesome..can you please upload remaining multiple files and file name concept video
@DataSpark45 6 місяців тому
Hi actually that concept covered in the Data validation playlist. By creating metadata files. Thanks
@shaasif 6 місяців тому
@@DataSpark45 can you share you email id i want to communicate with you
@amandoshi5803 7 місяців тому
source code ?
@DataSpark45 6 місяців тому
def SchemaComparision(controldf, spsession, refdf): try: #iterate controldf and get the filename and filepath for x in controldf.collect(): filename = x['filename'] #print(filename) filepath = x['filepath'] #print(filepath) #define the dataframes from the filepaths print("Data frame is creating for {} or {}".format(filepath, filename)) dfs = spsession.read.format('csv').option('header', True).option('inferSchema', True).load(filepath) print("DF Created for {} or {}".format(filepath, filename)) ref_filter = refdf.filter(col('SrcFileName') == filename) for x in ref_filter.collect(): columnNames = x['SrcColumns'] refTypes = x['SrcColumnType'] #print(columnNames) columnNamesList = [x.strip().lower() for x in columnNames.split(",")] refTypesList = [x.strip().lower() for x in refTypes.split(",")] #print(refTypesList) dfsTypes = dfs.schema[columnNames].dataType.simpleString() #StringType() : string , IntergerType() : int dfsTypesList = [x.strip().lower() for x in dfsTypes.split(",")] # columnName : Row id, DataFrameType : int, reftype: int missmatchedcolumns = [(col_name, df_types, ref_types) for (col_name, df_types, ref_types) in zip(columnNamesList, dfsTypesList, refTypesList) if dfsTypesList != refTypesList] if missmatchedcolumns : print("schema comparision has been failed or missmatched for this {}".format(filename)) for col_name, df_types, ref_types in missmatchedcolumns: print(f"columnName : {col_name}, DataFrameType : {df_types}, referenceType : {ref_types}") else: print("Schema comaprision is done and success for {}".format(filename)) except Exception as e: print("An error occured : ", str(e)) return False
@maheswariramadasu1301 7 місяців тому
Highly underrated channel and need more videos
@ArabindaMohapatra 7 місяців тому
I just started watching this playlist. I'm hoping to learn how to deal with schema-related issues in real time.Thanks
@DataSpark45 7 місяців тому
Thanks a million bro
@gregt7725 7 місяців тому
That is great - but how to handle deletion from Source ? Actually I do not understand why after sucessful changes/inserts - any deleletion of sorurce (e.g row number 2) creates duplicated rows of previously changed records.( last_updated_date do it - but why )
@DataSpark45 7 місяців тому
Hi, can you please share the details or pic where u have the doubt
@erwinfrerick3891 7 місяців тому
Great explain, very clearly, this video very helpfull for me
@DataSpark45 7 місяців тому
Glad to hear that!
@ChetanSharma-oy4ge 8 місяців тому
how can i find this code? is there any repo where you have uploaded it.?
@DataSpark45 7 місяців тому
Sorry to say this bro , unfortunately we lost those files
@waseemMohammad-qx7ix 8 місяців тому
thank you for making this project it has helped me a lot.
@maheswariramadasu1301 8 місяців тому
This video really help me because tomorrow I have to explain about this topics I am searching in the UA-cam for the best explanation.This video helps me to know about from scratch
@mohitupadhayay1439 8 місяців тому
Amazing content. Keep a playlist for Real time scenarios for Industry.
@mohitupadhayay1439 8 місяців тому
Very underrated channel!
@DataSpark45 8 місяців тому
Thanks a million bro
@ajaykiranchundi9979 8 місяців тому
Very helpful! Thank you
@shahnawazahmed7474 8 місяців тому
I'm looking for ADF training will u provide that! how can I contact you?Thanks
@DataSpark45 8 місяців тому
Hi, u can contact me through LinkedIn Lokeswar Reddy Valluru
@MuzicForSoul 8 місяців тому
sir, can you please also show us the run failing, you are only showing passing case, when I tested by swaping the columns in dataframe it is still not failing because the set still have them in same order.
@DataSpark45 8 місяців тому
Set values will come from reference df .so it always a constant one
@pranaykumar581 8 місяців тому
Can you provide me the source data file?
@DataSpark45 8 місяців тому
Hi in the description i provided the link bro
@MuzicForSoul 8 місяців тому
why we have to do ColumnPositionComparision? shouldn't the column name comparison you did earlier catch this?
@irfanzain8086 8 місяців тому
Bro, thanks a lot! Great explaination 👍 Can you share part 2?
@vamshimerugu6184 8 місяців тому
Sir Can you make a video on how to connect adls to DataBricks using Service principle
@DataSpark45 8 місяців тому
Thanks for asking, will do that one for sure .
@Sankarabellamkonda 8 місяців тому
This video helped me a lot.hope we can expect more real time scenarios like this
@SuprajaGSLV 8 місяців тому
This really helped me understand the topic better. Great content!
@DataSpark45 8 місяців тому
Glad to hear it!
@SuprajaGSLV 8 місяців тому
Could you please upload a video on differences between Data Lake vs Data warehouse vs Delta Tables
@DataSpark45 8 місяців тому
Thanks a million. That will do for sure
@Lucky-eo8cl 8 місяців тому
Good explanation bro👏🏻. It's Really helpful
@DataSpark45 8 місяців тому
Glad to hear that
@vamshimerugu6184 8 місяців тому
I think schema comparison is the important topic in pyspark . Great explanation sir ❤
@DataSpark45 8 місяців тому
thank you bro
@vamshimerugu6184 8 місяців тому
Great explanation ❤.Keep upload more content on pyspark
@DataSpark45 8 місяців тому
Thank you, I will
@saibhargavreddy5992 8 місяців тому
I found this very useful as I had a similar issue with data validations. It helped a lot while completing my project.
@DataSpark45 8 місяців тому
Glad it helped!
@maheswariramadasu1301 8 місяців тому
This video helps me to understand the multiple even triggers in adf
@DataSpark45 8 місяців тому
Glad to hear that
@maheswariramadasu1301 9 місяців тому
It's help me a lot for learning pyspark easily
@0adarsh101 9 місяців тому
can i use databricks community edition?
@DataSpark45 9 місяців тому
Hi, You can use databricks, then you have to play around dbutils.fs methods in order to get the list / file path as we did in get_env.py file. Thank you
@SaadAhmed-js5ew 9 місяців тому
where's your parquet file located?
@DataSpark45 9 місяців тому
Hi, r u talking about source parquet file! It's under source folder
@OmkarGurme 10 місяців тому
while working with databricks we dont need to start a spark session right ?
@DataSpark45 9 місяців тому
No need brother, we can continue with out defining spark session, i just kept for practice
@listentoyourheart45 10 місяців тому
Nice explanation sir
@kaushikvarma2571 10 місяців тому
is this continuation of part - 2? In part-2 we have neve discussed about test.py and udfs.py
@DataSpark45 10 місяців тому
yes, test.py here i used for just to run the functions and about udfs.py please watch it from 15:oo min onwards
@kaushikvarma2571 10 місяців тому
To solve header error, replace csv code to this "elif file_format == 'csv': df = spark.read.format(file_format).option("header",True).option("inferSchema",True).load(file_dir)"
@memesmacha61 7 місяців тому
Thank ypu bro
@charangowdamn8661 10 місяців тому
Hi sir how can I reach to you can you ?
@charangowdamn8661 10 місяців тому
Hi sir how can I reach to you can you please share your mail id or how can I connect to you
@DataSpark45 10 місяців тому
you can reach me on linkedin id : valluru lokeswar reddy

DataSpark

КОМЕНТАРІ