Bro cooked. From the history, to the technical design and demo! Hats off!
Don't forget to hit that Subscribe Button for more amazing content :)
Get ready with project.
Please also upload GCP data engineering End-to-End project
You deserve much more than 1000 buddy. I learn so much from your channel
lets get that.
Are you from Gujarat?
Thanks for actually explaining spark, instead of making general comments or assuming we know the basics. Great video. Thumbs up, subscribed.
Agreed. I watched like 5 videos prior to this one that made wild assumptions about what I knew.
That was an extremely good explanation. Not only explained the theory but also practical examples.
00:00 Big Data and Hadoop
01:25 Hadoop processed data in batches and was slower due to disk storage; Apache Spark solves these limitations.
02:43 Apache Spark is a fast and efficient data processing framework.
04:11 Apache Spark is a powerful tool for processing and analyzing Big Data.
05:42 An Apache Spark application consists of a driver process and executor processes.
07:02 Spark DataFrames are distributed across multiple computers and require partitioning for parallel execution.
08:24 Spark transformations are lazy; an action triggers execution and produces the final output.
09:40 Spark allows converting DataFrames into temporary views and executing SQL queries on top of them (see the sketch below).
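For readers who want to tie these chapters together, here is a minimal PySpark sketch of the same workflow; the file name tips.csv and its columns are hypothetical stand-ins for whatever dataset the video uses:

```python
from pyspark.sql import SparkSession

# The driver process builds the session; executor processes do the work
spark = SparkSession.builder.appName("spark-in-10-minutes").getOrCreate()

# DataFrames are split into partitions that executors process in parallel
df = spark.read.csv("tips.csv", header=True, inferSchema=True)  # hypothetical file
df = df.repartition(4)

# Transformations are lazy; show() is the action that triggers execution
high_tips = df.filter(df.tip > 5)
high_tips.show()

# Register the DataFrame as a temporary view and run SQL on top of it
df.createOrReplaceTempView("tips")
spark.sql("SELECT AVG(tip) AS avg_tip FROM tips").show()
```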
Best explanation of Spark in 10 minutes. It's like Feynman explaining physics. Excellent job!
I understood the concept clearly within 10 min. Now I have a great understanding and knowledge of Apache Spark. This is the best Spark video I have gone through. It's a clear and top-notch explanation of each of the topics.
I never knew I could recall so much in just under 10min...
Wonderful content and well explained keeping it simple...
Glad you liked it
You are doing a fabulous job of making Data analytics so easy for everyone. Thank you so very much. God bless you!
This was insanely good. Thanks for explaining the basics so clearly. Now I can learn deeper more comfortably.
I was waiting for this. Please share an end to end project using Spark.
Yes
Waiting for the same... right from Spark installation locally as well as on a cloud platform.
Please upload ASAP.
Yes, if possible can you please also share it using PySpark..
Such nice content!
What a man you are!
You have covered everything in Spark in just 10 mins. I wonder how you made this video; the effort you put into it is wonderful. Thank you for sharing such nice content in such a simple manner!!
I usually stay away from content titled learn/master/excel X in Y minutes, and would definitely have done the same had I come across this by myself. I watched it only because my friend shared it with me. Now I feel lucky after watching this, as I could finally wrap my head around Spark.
Subscribed.
The best Spark tutorial I have ever gone through. Thanks a lot Darshil.
Wow, thanks!
The first very clear video about spark that I have seen.
Excellent explanation: clear, concise, and straight to the point.
As many others already said, a fantastic and informative video on Spark. Nice context by providing the history of Hadoop. Nice pace too, not too fast, not too slow!
Thanks For Explaining in 10 Min 🙌
crystal clear explanation! loved it❤
I hadn't understood Apache Spark since my undergraduate days, until I found this gem.
Saw a bunch of your roadmap videos back in my freshman year, and now I'm back here prepping for my DS internship, thanks!
The job description had spark/mapreduce which brings me here : )
I tried to replicate the code block at 10:13.
Can we use tips.filter(filterA & filterB)? This applies both filters at the same time and does not create intermediate results, whereas tips.filter(filterA) will create some DataFrame, which is then filtered by filterB.
Please correct me if I'm wrong, thanks!
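A minimal sketch to check this, assuming a tips DataFrame like the one at 10:13 (the column names here are made up). Because transformations are lazy and Catalyst's optimizer combines adjacent filters, both forms compile to the same physical plan, so neither creates a materialized intermediate DataFrame:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("filter-demo").getOrCreate()

# Hypothetical stand-in for the tips DataFrame from the video
tips = spark.createDataFrame(
    [(16.99, 1.01, "No"), (10.34, 1.66, "Yes"), (21.01, 3.50, "No")],
    ["total_bill", "tip", "smoker"],
)

filterA = F.col("total_bill") > 15
filterB = F.col("smoker") == "No"

combined = tips.filter(filterA & filterB)       # one call, both conditions
chained = tips.filter(filterA).filter(filterB)  # two lazy transformations

# Both plans show a single fused Filter node, so the choice is stylistic
combined.explain()
chained.explain()
```

In short: tips.filter(filterA) only returns a new lazy plan, not data, so chaining filters does not pay for an intermediate result.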
Apache Spark -- core concepts explained in such simple language..
Wonderful job 👍👍👍
I'm just getting started on a group CNN project with friends, and we are dealing with a huge dataset of MRI scans, so I was thinking about platforms that could handle lots of data without depleting my disk lol. Thank you so much for breaking down how Apache Spark works compared to Hadoop, I really appreciate it! 😊
superb man.. didn't waste any time.. great explanation..
Thanks for this video, very informative and easy to understand with the examples you gave.
What a video! I really understood Apache Spark, which I could not at my university.
Thanks for this. Currently reading Spark: The Definitive Guide. Looking forward to the full tutorial.
Coming soon!
Nice video
Well explained, and the presentation is good.
I really understood the software quickly, thanks man
amazing explanation!! Thank you!
You nailed it man! Amazing information that I am using for my DE interviews.
Very good intro to Spark. I've started my data science journey and it really helps.
So in just 10 mins, I got to know about Big Data, Hadoop, Spark, PySpark, and how I can write code in PySpark.
Wow, that's what a good explanation should be like!
Brief and informative . Thanks 👍
You nailed it Bro in just 10 mins 😊
Impressive explanation of spark. Making it easy for every beginner to understand.
Glad it was helpful!
One of the best videos! You really explained it in a very precise and easy way. Love it!
Great introduction. Thank you so much.
Wonderfully explained in just 10 mins.
Alright, but need a full tutorial on this topic, if you can.
Working on it!
@DarshilParmar thank you, please upload it ASAP
@DarshilParmar please upload
@DarshilParmar this is what heroes do. Kudos to you Darshil
Thank you!! So helpful
Clear and concise explanation
Good job Darshil. Appreciate the work.
Amazing, you explained everything in detail with examples. Best video on YouTube to learn about Spark. 👏
Nice Video, Thank You.
Understood the video very well, without any prior knowledge of Apache Spark.
Glad it was helpful
Super explanation bro, I got many answers in one video 🥳🥳
Very nice video. Thank you!
You explained the content simple and clear. Thank you for this video.
Really very nice explanations..
To the point, quick, simple and comprehensive knowledge sharing!
I appreciate that!
Waiting for a full Apache Spark course from you.
Wonderful summary!
Awesome video mate! well done.
Thank you so much for this explanation; you've outlined it quite clearly before I've even had any experience using Spark, so thank you! If you could slow down your explanation a bit though, that would be helpful.
It's a 10-min series; you can check out my courses for a more in-depth guide.
This explanation is very gooooooooooooooooooooooooood
Thank u
you are welcome mate!
really good explanation
Super🎉
Waiting for full tutorial
Very soon
Thank u i got the basics
Very well explained! Thank you!
Hello Darshil,
This is great content ! A little bit too much information, hehe. Now it should be digested :)
Thank you sir 👍
Nice explanation.. please do a series on Spark.
I have a course on Spark, please check description
Nice Explanation, Thank you
very nicely explained
Best tutorial ❤❤ all in one
Very well explained , thank you very much
As simple as that.. Liked
Very well explained😊
Great video bro
Really productive video.
Superb one! Can we expect a full tutorial on Spark!?
Yes, coming soon!
@@DarshilParmar Ah nice then 😍
Great video! Thank you
Just Amazing😇Thank you
An excellent video on Apache Spark. Covered almost everything. A very helpful video for beginners like me.
Very good video
You explained so many things in 10 minutes 🫡🫡🫡
Thank you man
Fantastic explanation… 👏👏 the way you take your audience through the flow of explaining these concepts is very effective👌
Thanks a lot 😊
Wonderful video you explained everything perfectly
Darshil Sir, I had a query regarding the memory management concept of Spark.
As per my understanding, Spark uses its execution memory to store intermediate data, and it shares this with storage memory if needed. It can also utilize off-heap memory for storing extra data.
1) Does it access off-heap memory after filling up storage memory?
2) What if it fills up off-heap memory too? Does it wait till GC clears up the on-heap part, or spill the extra data to disk?
Now, in a wide transformation, Spark either sends the data back to disk or transfers it over the network, say for a join operation.
Is the part about sending data back to disk the same as above, where Spark has the option to spill data to disk on filling up on-heap memory?
Please do clarify my above queries, sir. I feel like breaking my head, as I couldn't make headway through it yet even after referring to a few materials.
In Spark, memory management involves both on-heap memory and off-heap memory. Let me address your queries regarding Spark's memory management:
1. Off-heap memory usage: By default, Spark primarily uses on-heap memory for storing data and execution metadata. However, Spark can also utilize off-heap memory for certain purposes, such as caching and data serialization. Off-heap memory is typically used when the data size exceeds the available on-heap memory or when explicit off-heap memory is configured. It is not used as an overflow for storage memory.
2. Filling up off-heap memory: Off-heap memory is allocated outside the JVM heap, so the garbage collector does not manage it. If the configured off-heap region fills up, Spark's unified memory manager can spill spillable execution structures (for example, during sorts and aggregations) to disk and evict cached storage blocks; if neither frees enough space, the job fails with an out-of-memory error. It is only on-heap memory that depends on the JVM's garbage collector, so heavy on-heap pressure can show up as GC-related performance degradation or OOM errors.
Thanks,
ChatGPT
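For anyone experimenting with these knobs, here is a minimal sketch of the configuration the answer above refers to; the values are illustrative, not recommendations:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("memory-config-demo")
    # Fraction of (heap - 300 MB) shared by execution and storage memory
    .config("spark.memory.fraction", "0.6")
    # Portion of that unified region protected for cached (storage) blocks
    .config("spark.memory.storageFraction", "0.5")
    # Off-heap memory is opt-in and must be sized explicitly, in bytes
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", str(1024 * 1024 * 1024))  # 1 GiB
    .getOrCreate()
)
```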
A very very good video. Thanks, you are doing a really great job!
Darshil, I want to learn data engineering from scratch. I don't know anything about any of this, so where do I start? Which course should I take?
My Python & SQL for Data Engineering is a good place to start - learn.datawithdarshil.com/
Really good content.
What is the difference between Apache Spark and Kafka?? Which one should be used for data analysis?
Amazing concise detailed explanation with great editing. Such a great way of presenting a hard topic in an easy manner. Love your comparisons with teamwork, puzzles etc. So impressed. Big thumbs up and subscribe from me. Eager to see your other videos. Thanks!
Excellent Explanation...
Excellent video Darshil. Clear and concise! Subscribed!
So is pandas similar to Spark, where pandas is more suitable for single-node data processing while Spark is for distributed data processing?
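Broadly yes, with the caveat that pandas is eager and keeps everything in one machine's RAM, while Spark is lazy and partitions data across executors. A small sketch of the same aggregation in both, using toy data and made-up column names:

```python
import pandas as pd
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("pandas-vs-spark").getOrCreate()

rows = [("a", 1), ("b", 2), ("a", 3)]

# pandas: eager, single-machine, everything lives in local RAM
pdf = pd.DataFrame(rows, columns=["key", "value"])
print(pdf.groupby("key")["value"].sum())

# PySpark: lazy, partitioned across executors; nothing runs until an action
sdf = spark.createDataFrame(rows, ["key", "value"])
sdf.groupBy("key").agg(F.sum("value").alias("value")).show()
```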
Can you make one of these vids on lakesail's pysail?
Great video, thanks :)
Thank you very much; it's a very nice primer to refresh the concepts once in a while. Thank you for your contributions 👍
It was really helpful. Thanks.
Nice video!
Amazing video. Please share the project doc😊
Very brief and informative video
Thank you for this video, I liked it: simple, clear, and short! Perfect :)
Nice overview.
This is a great explanation
Thanks Darshil