Apache Spark End-To-End Data Engineering Project | Apple Data Analysis

  • Published 11 Jul 2024
  • Dive into the world of big data processing with our PySpark Practice playlist. This series is designed for both beginners and seasoned data professionals looking to sharpen their Apache Spark skills through scenario-based questions and challenges.
    Each video provides step-by-step solutions to real-world problems, helping you master PySpark techniques and improve your data-handling capabilities. Whether preparing for a job interview or just learning more about Spark, this playlist is your go-to resource for practical, hands-on learning. Join us to become a PySpark expert!
    In this video, we use Databricks to build multiple ETL pipelines with PySpark, the Python API of Apache Spark.
    We read from sources such as CSV, Parquet, and Delta tables, and use the Factory Pattern to create the reader classes. The Factory Pattern is one of the most widely used low-level design patterns in Data Engineering pipelines that involve multiple sources.
    We then use the PySpark DataFrame API and Spark SQL to write the business transformation logic. In the loader part, we load the data in two ways: one into a Data Lake and the other into a Data Lakehouse.
    While solving the problems, we also demonstrate the most frequently asked PySpark #interview problems. We discuss and demonstrate many concepts, such as broadcast join, partition by and bucketing, SparkSession, window functions like LAG and LEAD, Delta tables, and more.
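As a rough illustration of the Factory Pattern reader described above, here is a minimal sketch. It is not the code from the video's notebooks: the class names, the `get_data_source` function, and the choice to read Delta sources by table name are all assumptions made for this example. The `spark` argument is expected to be an active SparkSession; only the standard `spark.read.format(...).load(...)` and `spark.read.table(...)` calls are used.

```python
from typing import Any, Dict, Type


class DataSource:
    """Base reader: holds the Spark session and a path or table name."""

    def __init__(self, spark: Any, location: str) -> None:
        self.spark = spark
        self.location = location

    def get_data_frame(self) -> Any:
        raise NotImplementedError("Subclasses must implement get_data_frame()")


class CSVDataSource(DataSource):
    def get_data_frame(self) -> Any:
        # Assumption: the CSV files have a header row.
        return self.spark.read.format("csv").option("header", True).load(self.location)


class ParquetDataSource(DataSource):
    def get_data_frame(self) -> Any:
        return self.spark.read.format("parquet").load(self.location)


class DeltaDataSource(DataSource):
    def get_data_frame(self) -> Any:
        # Assumption: `location` is a registered table name, not a file path.
        return self.spark.read.table(self.location)


# Hypothetical registry mapping a source type to its reader class.
_READERS: Dict[str, Type[DataSource]] = {
    "csv": CSVDataSource,
    "parquet": ParquetDataSource,
    "delta": DeltaDataSource,
}


def get_data_source(data_type: str, spark: Any, location: str) -> DataSource:
    """Factory: return the reader that matches `data_type`."""
    try:
        reader_cls = _READERS[data_type.lower()]
    except KeyError:
        raise ValueError(f"Not implemented for data_type: {data_type}") from None
    return reader_cls(spark, location)
```

With this shape, pipeline code only calls `get_data_source(kind, spark, location).get_data_frame()`, and supporting a new source means adding one class and one registry entry.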
    After watching, please let us know your thoughts.
    Stay tuned to this playlist for all upcoming videos.
    𝗝𝗼𝗶𝗻 𝗺𝗲 𝗼𝗻 𝗦𝗼𝗰𝗶𝗮𝗹 𝗠𝗲𝗱𝗶𝗮:
    🔅 Topmate (For collaboration and Scheduling calls) - topmate.io/ankur_ranjan
    🔅 LinkedIn - / thebigdatashow
    🔅 Instagram - / ranjan_anku
    Databricks notebooks link: download the zip folder, extract it, and then open the HTML files as notebooks in the community version of Databricks.
    🔅 Recommended Link for DataBricks community version login, after signing up:
    community.cloud.databricks.com/
    🔅 Ankur's Notebook source files
    drive.google.com/file/d/15FBg...
    🔅 Input table files
    drive.google.com/drive/folder...
    For practising different Data Engineering interview questions, go to the community section of our YouTube page.
    / @thebigdatashow
    Narrow vs Wide Transformation
    Short Article link:
    • Post
    Question 1:
    • Post
    Question 2:
    • Post
    Question 3:
    • Post
    Question 4:
    • Post
    Question 5:
    • Post
    Question 6:
    • Post
    Question 7:
    • Post
    Question 8:
    • Post
    Question 9:
    • Post
    Question 10:
    • Post
    Broadcast Join in #apachespark
    Small article link:
    • Post
    MCQs list
    1. / @thebigdatashow
    2. / @thebigdatashow
    3. / @thebigdatashow
    4. / @thebigdatashow
    5. / @thebigdatashow
    Check the COMMUNITY section for a full list of questions.
    Chapters
    00:00 - Project Introduction
    12:04 - How to use Databricks for any Pyspark/Spark Project?
    25:09 - Low-Level Design Code
    40:39 - Job, Stages, and Action in Spark
    45:22 - Designing a code base for the Spark Project
    51:40 - Applying first business Logic in the transformer class
    57:34 - Difference between Lag & Lead window function
    01:28:42 - Broadcast Join in Apache Spark/pyspark
    01:47:50 - Difference between Partitioning and Bucketing in Apache Spark/pyspark
    02:07:00 - Detailed Summary of the first pipeline
    02:14:00 - Second pipeline Goal
    02:24:57 - collect_set() and collect_list() in Spark/pyspark
    02:48:53 - Detailed Summary of the second pipeline
    02:51:03 - Why Delta Lake when we already have a Data Lake?
    02:54:51 - Summary
    #databricks #delta #pyspark #practice #dataengineering #apachespark #problemsolving
    #spark #bigdata #interviewquestions #sql #datascience #dataanalytics
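To make the "Lag & Lead" chapter listed above a little more concrete, here is a plain-Python sketch of what the two window functions return for rows that are already ordered. It is only an illustration of the semantics, not the video's code; the `purchases` list is hypothetical data. In PySpark the equivalent functions are `pyspark.sql.functions.lag` and `pyspark.sql.functions.lead`, applied over an ordered `Window`.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def lag(values: Sequence[T], offset: int = 1) -> List[Optional[T]]:
    """For each row, the value `offset` rows earlier (None if there is none).

    Assumes offset >= 0.
    """
    return [values[i - offset] if i >= offset else None for i in range(len(values))]


def lead(values: Sequence[T], offset: int = 1) -> List[Optional[T]]:
    """For each row, the value `offset` rows later (None if there is none).

    Assumes offset >= 0.
    """
    n = len(values)
    return [values[i + offset] if i + offset < n else None for i in range(n)]


# Hypothetical products bought by one customer, already ordered by date.
purchases = ["iPhone", "AirPods", "MacBook"]
previous_purchase = lag(purchases)   # looks backwards: first row has no predecessor
next_purchase = lead(purchases)      # looks forwards: last row has no successor
```

LAG answers "what came before this row?" and LEAD answers "what comes after it?"; both return a null at the edge of the partition where no such row exists.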
  • Science & Technology

COMMENTS • 133

  • @mohinraffik5222
    @mohinraffik5222 3 days ago +1

    Appreciate your great effort in sharing your knowledge, brother!👍

  • @shafimahmed7711
    @shafimahmed7711 1 month ago +2

    Thank you for taking the time and patience to prepare this video. This will definitely help many.

  • @SaivarunNamburi
    @SaivarunNamburi 1 month ago +1

    Really amazing end-to-end DE project, learned a lot in these 3 hours

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. This motivates me to make more and better videos. Please check the community section/tab of the channel. We have created and collected over 1000 of the most asked Data Engineering questions. We have made all these questions in the form of MCQs so that you can solve them and learn from them.
      Search our Channel Name - The Big Data Show, on YouTube -> Go to the channel -> Then click on the community tab

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      After creating the project, try creating a GitHub repository and share the link to the repo along with your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. It will help you grow your network and find a job by showcasing your skills through a full Data Engineering project.

  • @RupeshPatel-ry6jb
    @RupeshPatel-ry6jb 2 months ago +2

    Thank you for doing this project; it is quite an enriching learning experience. I would love to see more of these kinds of videos in the future. Keep up the great work!

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. This motivates me to make more and better videos. Please check the community section/tab of the channel. We have created and collected over 1000 of the most asked Data Engineering questions. We have made all these questions in the form of MCQs so that you can solve them and learn from them.
      Search our Channel Name - The Big Data Show, on YouTube -> Go to the channel -> Then click on the community tab

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Once you finish the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @shouviksharma7621
    @shouviksharma7621 1 month ago +1

    This is a great demonstration, appreciate the team's effort for putting together an awesome end-to-end project.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago +1

      Thank you Shouvik.
      I have also started a playlist named "Kafka for Data Engineers". Do check it out in your free time.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago +1

      After creating the project, please create a GitHub repository and share the link to the repository as well as your LinkedIn profile. I will give a shout out to your profile on LinkedIn. This will help you grow your network and showcase your skills as a full Data Engineering project, which can help you in finding a job.

  • @LalitSharma-up5hl
    @LalitSharma-up5hl 2 months ago +6

    Good project learning experience, Ankur. It took me around 10 hours to debug and write the code even after watching you step by step. Nice way to explain complex logic.

    • @TheBigDataShow
      @TheBigDataShow 2 months ago +2

      Great job! This is the best way to learn. The ten hours you spent will always help you write production-ready pipelines. Debugging is an art that requires patience. Merely following the steps won't help as much as implementing them yourself after seeing the steps. This is the true way of learning and ensures that you won't forget the code flow.
      Don't forget to check out our channel's community section/tab. I have created over 1000 Data Engineering questions for practicing and improving your skills.

    • @muhammadsamir2243
      @muhammadsamir2243 1 month ago +1

      please share your github code

    • @TheBigDataShow
      @TheBigDataShow 1 month ago +1

      @@muhammadsamir2243 Please check the description of the video. You will be able to find the link to all the notebooks in the form of HTML files. You will be able to import them into any Python notebook editor. Open the HTML files in Chrome; it will give you the import option.
      Also, please check the community section/tab of the channel. We have created and collected over 1000 of the most asked Data Engineering questions. We have made all these questions in the form of MCQs so that you can solve them and learn from them.
      Search our Channel Name - The Big Data Show, on YouTube -> Go to the channel -> Then click on the community tab

    • @TheBigDataShow
      @TheBigDataShow 1 month ago +1

      After completing the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

    • @LalitSharma-up5hl
      @LalitSharma-up5hl 1 month ago +1

      @@TheBigDataShow sure

  • @RubyshaKhan-um5yg
    @RubyshaKhan-um5yg 2 months ago +2

    Excited to watch

    • @TheBigDataShow
      @TheBigDataShow 2 months ago

      We are also very excited to release it. I hope my hard work pays off and many aspiring Data Engineers create their Data Engineering project after watching it.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. This motivates me to make more and better videos. Please check the community section/tab of the channel. We have created and collected over 1000 of the most asked Data Engineering questions. We have made all these questions in the form of MCQs so that you can solve them and learn from them.
      Search our Channel Name - The Big Data Show, on YouTube -> Go to the channel -> Then click on the community tab

  • @jjayeshpawar
    @jjayeshpawar 2 months ago +1

    Thanks for sharing!!!

    • @TheBigDataShow
      @TheBigDataShow 2 months ago

      My pleasure!!

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. Once you finish the project, please create a GitHub repository and share the link with me, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @ww_4776
    @ww_4776 2 months ago +2

    Thanks for doing such videos❤

    • @TheBigDataShow
      @TheBigDataShow 2 months ago

      My pleasure 😊

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. Once you finish the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @PraveenKumarBN
    @PraveenKumarBN 2 months ago +1

    This Channel is simply amazing 😍 Keep coming up with great content on Data Engineering like this

    • @TheBigDataShow
      @TheBigDataShow 2 months ago

      Sure

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. This motivates me to make more and better videos. Please check the community section/tab of the channel. We have created and collected over 1000 of the most asked Data Engineering questions. We have made all these questions in the form of MCQs so that you can solve them and learn from them.
      Search our Channel Name - The Big Data Show, on YouTube -> Go to the channel -> Then click on the community tab

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thanks for your kind words. Once you complete the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @anshusharaf2019
    @anshusharaf2019 2 months ago +4

    Excited to learn and implement real-time, Thanks #The_Big_Data_show

    • @TheBigDataShow
      @TheBigDataShow 2 months ago +4

      We are also very excited, just like you, to release it.
      You can solve more than 1000 Data Engineering questions that I have created on the Community page/tab/section of our YouTube channel.
      I have collected all those questions from different interviews that my friends have given in recent times.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago +1

      Thank you for your kind words. Once you finish the project, please create a GitHub repository and share the link to the repository along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @omkarm7865
    @omkarm7865 2 months ago +1

    Excited to complete this

    • @TheBigDataShow
      @TheBigDataShow 2 months ago +1

      Great 🤞

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. This motivates me to make more and better videos. Please check the community section/tab of the channel. We have created and collected over 1000 of the most asked Data Engineering questions. We have made all these questions in the form of MCQs so that you can solve them and learn from them.
      Search our Channel Name - The Big Data Show, on YouTube -> Go to the channel -> Then click on the community tab

  • @swapnilbop
    @swapnilbop 1 month ago +1

    Appreciate your efforts.. keep it up ❤

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thanks a lot 😊

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. Once you finish the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @rohithb65
    @rohithb65 2 months ago +2

    Excited to learn

    • @TheBigDataShow
      @TheBigDataShow 2 months ago

      We are also excited just like you to release the full Apache Spark End-to-end pipeline. Please click on the bell icon to not miss the notification before the start of the live premiere. It will go live at 2:30 PM IST.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. Once you finish the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @pradeepbehera3092
    @pradeepbehera3092 1 month ago +1

    I was searching for something like this for a long time. Thank you for putting this together. Already learning a lot from you. I would love to connect with you.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words Pradeep. You can connect with me on TopMate by scheduling one call from my calendar 🗓️ there. You can find the link 🖇️ in the description of the video.

    • @pradeepbehera3092
      @pradeepbehera3092 1 month ago

      @@TheBigDataShow Will do thanks !!

    • @TheBigDataShow
      @TheBigDataShow 1 month ago +1

      After you finish the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @princeyjaiswal45
    @princeyjaiswal45 2 months ago +1

    Great 👍

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. This motivates me to make more and better videos. Please check the community section/tab of the channel. We have created and collected over 1000 of the most asked Data Engineering questions. We have made all these questions in the form of MCQs so that you can solve them and learn from them.
      Search our Channel Name - The Big Data Show, on YouTube -> Go to the channel -> Then click on the community tab

  • @syedkamran4121
    @syedkamran4121 2 months ago +1

    Exciting

    • @TheBigDataShow
      @TheBigDataShow 1 month ago +1

      Thank you for your kind words. Once you finish the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @shabnampathan8975
    @shabnampathan8975 1 month ago

    Appreciate your efforts thank you

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. This motivates me to make more and better videos. Please check the community section/tab of the channel. We have created and collected over 1000 of the most asked Data Engineering questions. We have made all these questions in the form of MCQs so that you can solve them and learn from them.
      Search our Channel Name - The Big Data Show, on YouTube -> Go to the channel -> Then click on the community tab

    • @shabnampathan8975
      @shabnampathan8975 1 month ago

      @@TheBigDataShow Thank you for your reply. I will do that; it will help me with interview preparation. Thank you so much again for putting so much effort into creating videos with high-quality content.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. Once you finish the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

    • @shabnampathan8975
      @shabnampathan8975 1 month ago

      @@TheBigDataShow I will do that

  • @dhruvingandhi1114
    @dhruvingandhi1114 14 days ago

    Hello, I am getting an error when reading the delta table that is in default, at 01:21:50:
    IllegalArgumentException: Path must be absolute: default.customer_delta_table_persist
    Please help me with that.
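One possible cause of this message, offered as an assumption because the code that raised it is not shown in the comment, is passing a table name such as `default.customer_delta_table_persist` to a reader call that expects a file path (for example `.load(...)`). Table names are normally read with `spark.read.table(...)`. The sketch below picks the call based on what the string looks like; the `read_delta` helper and the prefix list are hypothetical and not part of the video's notebooks.

```python
from typing import Any

# Prefixes treated as storage paths; this list is an assumption, extend as needed.
_PATH_PREFIXES = ("/", "dbfs:/", "s3://", "s3a://", "abfss://", "gs://")


def looks_like_path(source: str) -> bool:
    """Return True if `source` starts with one of the known path prefixes."""
    return source.startswith(_PATH_PREFIXES)


def read_delta(spark: Any, source: str) -> Any:
    """Read a Delta source given either a storage path or a table name.

    `spark` is expected to be an active SparkSession.
    """
    if looks_like_path(source):
        # A storage location: use the path-based reader.
        return spark.read.format("delta").load(source)
    # Anything else is treated as a table name such as "default.my_table".
    return spark.read.table(source)
```

If the table itself no longer exists (for example after a cluster was removed), this change alone will not help and the table has to be recreated first.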

  • @ashwinraje6520
    @ashwinraje6520 1 month ago +1

    Just completed this project after a lot of debugging. Got to learn about factory design pattern.
    Is this pattern typically used in the production environments? Thank you Ankur for creating such a quality project!

    • @TheBigDataShow
      @TheBigDataShow 1 month ago +1

      Yes, a lot. Try learning builder, singleton, and companion low-level designs now.

  • @anshusharaf2019
    @anshusharaf2019 1 month ago

    Hey Ankur, Anshu this side. First of all, thanks for your amazing effort. I'm a little confused about the source files (the extraction part). You explained in the video that we used sources like CSV, Parquet, and Delta Table, but these are file formats in which the data is kept as a source. What, then, is the actual source of the data? For example, if we have some ABC database and I export the data as CSV, Parquet, or other file formats, my data source would be the ABC database. Is that the right way to think about it?
    @Ankur

  • @TheBigDataShow
    @TheBigDataShow 2 months ago +4

    Please find the link to all input files.
    drive.google.com/drive/folders/1G46IBQCCi5-ukNDwF4KkX4qHtDNgrdn6?usp=sharing
    Please let me know if you can access it or not.

    • @vlogsofsiriii
      @vlogsofsiriii 1 month ago

      Hi Ankur. I am not able to download the files

  • @tanayvaswani-24blue
    @tanayvaswani-24blue 12 days ago +1

    Can you do even a small project using Kafka?

    • @TheBigDataShow
      @TheBigDataShow 12 days ago

      Give me some time. I am already planning it, but I currently have less time because my startup is in its initial days. I will upload it.
      I have already uploaded some Kafka videos. Please check the "Kafka for Data Engineers" playlist.

  • @maazahmedansari4334
    @maazahmedansari4334 8 days ago +1

    Getting this error in the first pipeline:
    AnalysisException: Failed to merge fields 'customer_id' and 'customer_id'
    Any suggestion would be appreciated. Thank you

    • @TheBigDataShow
      @TheBigDataShow 8 days ago

      @@maazahmedansari4334 Please share some more of your code snippet for debugging. Have you created a GitHub repo for it?

  • @AshiChaudhary-lc8tk
    @AshiChaudhary-lc8tk 1 month ago +1

    Hi Ankur, very excited to go through the video, also, are you planning to implement through AWS as well, would be helpful

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Yes, stay tuned

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. I am already working on one video involving AWS.
      Once you finish the video and complete project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

    • @AshiChaudhary-lc8tk
      @AshiChaudhary-lc8tk 1 month ago

      @@TheBigDataShow That sounds amazing, sure will do soon.

  • @manibaddireddy5477
    @manibaddireddy5477 1 month ago +2

    Great explanation, but I have a small concern about the datasets being small.

  • @codjawan
    @codjawan 2 months ago +1

    Hey Ankur bhai, big thanks for this project. I was eagerly waiting for a project video from your channel. I hope this helps experienced candidates explain it as a real-time project in interviews.

    • @TheBigDataShow
      @TheBigDataShow 2 months ago +1

      Thank you for your kind words :)

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. This motivates me to make more and better videos. Please check the community section/tab of the channel. We have created and collected over 1000 of the most asked Data Engineering questions. We have made all these questions in the form of MCQs so that you can solve them and learn from them.
      Search our Channel Name - The Big Data Show, on YouTube -> Go to the channel -> Then click on the community tab

    • @codjawan
      @codjawan 1 month ago +1

      I'm already learning from it. Unbelievable work by you; creating 1000+ MCQs is a challenging and boring task, but thank you so much Ankur Bhai for creating this series. I'm 100% sure that no one on YouTube has created this many MCQs.
      Thanks again and hats off to you for this incredible work.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago +1

      @@codjawan Thank you. Keep motivating us and we will keep making valuable content

    • @TheBigDataShow
      @TheBigDataShow 1 month ago +1

      Thank you for your kind words. Once you finish the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @ParthGulavani
    @ParthGulavani 2 months ago +1

    Hey Ankur, thanks for the great information.
    I had one issue pop up: the initial run command to run other notebooks is not working for me. I am using the exact same command and file name. All my notebooks are under the appleAnalysis folder. Can you please suggest a solution for this? For now, I am running the entire notebook code before the main file as a workaround.

    • @TheBigDataShow
      @TheBigDataShow 2 months ago

      Check whether all the notebooks are attached to the same cluster, and then check the command `%run "./name_of_notbook"`.
      I have even provided all my notebooks in the description of the video. I have exported them in the form of HTML. Could you try importing that and then match your code? If the issue persists then kindly let us know.

    • @LalitSharma-up5hl
      @LalitSharma-up5hl 2 months ago +1

      Try adding the run commands in different cells and it will resolve; even I was facing the same issue.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      @@LalitSharma-up5hl 👏👏

  • @Amarjeet-fb3lk
    @Amarjeet-fb3lk 1 month ago +1

    Thanks for this video.
    But I think in real time we would be processing a very large amount of data,
    so it would be great if you could make a video on processing large amounts of data with all the optimisation techniques we can use.
    Thanks in advance.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for the kind words. However, it is important to understand that Apache Spark mostly behaves the same with large volumes of data.
      For learning, it is better to stick to the fundamentals. Optimization techniques like broadcast join, salting, and skew handling are not limited to large data.
      These are just techniques that can be implemented with any volume of data. One just has to keep an open mind when implementing them. There is no need to memorize them by watching. Just implement them, and you will be pretty comfortable even in real-world work.
      I hope you will understand this and start implementing instead of waiting for large data.
      I did not choose a large dataset for this demonstration because Spark would take more time on every run, which would increase the length of the video.
      To learn a technology like Apache Spark, one has to keep one's imagination open and not memorize everything by watching a demo. It is better to implement.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thank you for your kind words. Once you finish the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @dante421
    @dante421 1 month ago +1

    Will I be able to switch into data engineering after watching and practicing the project? Will I be able to tell my interviewer that I did this project at my current company?

    • @TheBigDataShow
      @TheBigDataShow 25 days ago

      Yes, but you have to work hard and learn all the concepts. Just completing one project will not help you get a job. You have to learn multiple technologies and frameworks to get into the Data Engineering domain.

  • @technomissilecraft4532
    @technomissilecraft4532 2 months ago

    How do we schedule the pipeline? Thanks. What do we use in industry to schedule jobs?

    • @TheBigDataShow
      @TheBigDataShow 2 months ago

      Mostly, Data Engineers use Airflow or Astronomer (enterprise version of Airflow). In the Databricks environment, people also use Workflows.
      Workflows is not available in the community version of Databricks.

  • @Sri-jf4sf
    @Sri-jf4sf 2 months ago +1

    Can you please make deployment as well?

  • @amaanabdul5195
    @amaanabdul5195 2 months ago +3

    Same here

    • @TheBigDataShow
      @TheBigDataShow 2 months ago +1

      Please click on the bell 🔔 icon so you don't miss the notification before the start of the video. We are as excited as you to make this video live.

    • @TheBigDataShow
      @TheBigDataShow 1 month ago

      Thanks for your kind words. After completing the project, please create a GitHub repository and share the link to the repository, as well as a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @0adarsh101
    @0adarsh101 2 months ago +3

    Can we get the next project on real-time data using Kafka or something like that?

    • @TheBigDataShow
      @TheBigDataShow 2 months ago +2

      Already planning this.

    • @0adarsh101
      @0adarsh101 1 month ago +1

      On YouTube there are some projects, but they are very simple. Please plan one complex project with a proper problem statement and solution. It is a request.😊

    • @TheBigDataShow
      @TheBigDataShow 1 month ago +1

      This was our first end-to-end project. Some more complex projects are already in the pipeline.

    • @0adarsh101
      @0adarsh101 1 month ago

      @@TheBigDataShow Thanks

  • @014_amitdwivedi6
    @014_amitdwivedi6 28 days ago +1

    Sir, in the first pipeline I am getting an error that 'str' object has no attribute 'write'.

    • @TheBigDataShow
      @TheBigDataShow 27 days ago

      Share the code snippet where you are getting the error. Have you searched for it on StackOverflow?

    • @dante421
      @dante421 25 days ago

      Sir, can you please reply to my question @@TheBigDataShow

  • @gagansingh3481
    @gagansingh3481 1 month ago

    Where can we learn PySpark with Databricks, from scratch to advanced?

  • @yudhveersingh8177
    @yudhveersingh8177 2 months ago +1

    Sir, is your full PySpark course available?

    • @TheBigDataShow
      @TheBigDataShow 2 months ago +1

      Not a full course yet, but we are releasing Apache Spark interview questions one by one. You can find an initial video in this playlist.
      ua-cam.com/video/NhYGVUuUVFg/v-deo.html&pp=gAQBiAQB

    • @saikatduttaece50
      @saikatduttaece50 1 month ago

      @@TheBigDataShow Tell me, even a full course won't teach this much, and still people want a course.

  • @shubhamkashid6919
    @shubhamkashid6919 1 month ago +1

    Please break down the video into topics.

  • @srutishriyasahu1556
    @srutishriyasahu1556 1 month ago +1

    Hi! There is a file path error when I give the table name of the customer delta table; it asks me for an absolute file name after I give the delta table name, at 1:22. Please help me out.

    • @TheBigDataShow
      @TheBigDataShow  1 month ago

      Hi Sruti,
      Could you please point out the timestamp in the video?
      In the Community Edition of Databricks, the Delta table is deleted once your cluster is deleted, which happens automatically after some time. I have explained this in the later part of the video.
      You can always start a brand-new cluster and create the Delta table again, either through the Databricks UI or from a notebook.
      While recreating it, you might have to delete the original files that sit behind the Delta table.
      If you move forward in the video, you will see that I have demonstrated all these steps.

    • @srutishriyasahu1556
      @srutishriyasahu1556 1 month ago +1

      I have already created a new cluster and created the Delta table from there, but it still shows me this error: IllegalArgumentException: Path must be absolute: default.customer_delta_table_persistt. The timestamp is 1:22.
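
A note on this error: Spark raises `Path must be absolute` when a string is interpreted as a file path. A likely cause (an assumption, since the failing line is not quoted) is that the table name was passed to a path-based call such as `load(...)`, whereas registered tables are read by name with `spark.read.table(...)`. A minimal sketch, with a hypothetical helper `read_delta` that chooses the call from the shape of the string:

```python
def looks_like_path(source):
    """Return True if source looks like an absolute path or a URI."""
    return source.startswith("/") or ":/" in source


def read_delta(spark, source):
    """Read a Delta source given a table name or an absolute path.

    Hypothetical helper for illustration; `spark` is an active SparkSession.
    """
    if looks_like_path(source):
        # Path-based read, e.g. "dbfs:/FileStore/tables/customer"
        return spark.read.format("delta").load(source)
    # Name-based read, e.g. "default.customer_delta_table_persistt"
    return spark.read.table(source)
```

The path check is deliberately crude; the point is only that a name such as `default.customer_delta_table_persistt` should go through the name-based reader, not through `load`.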

    • @TheBigDataShow
      @TheBigDataShow  1 month ago

      @srutishriyasahu1556 No worries, move ahead in the video and you will be able to solve it. The customer tables are only used after our first problem statement, at around 1 hour 45 minutes, and I have demonstrated how to solve this there. Don't worry; move forward with the demonstration.

    • @srutishriyasahu1556
      @srutishriyasahu1556 1 month ago +1

      Ok thank you sir 😊

  • @sangramshinde8599
    @sangramshinde8599 2 months ago +1

    Sir, will you do the complete project today itself?

    • @TheBigDataShow
      @TheBigDataShow  2 months ago

      Yes, I have created a video of more than 3 hours demonstrating two pipelines using Apache Spark today. After it goes live, you can find the video in our PySpark Practice - Tutorial playlist.
      And I am not "Sir" 😄 I am just Ankur. Just "Ankur" is fine.

    • @TheBigDataShow
      @TheBigDataShow  1 month ago

      Thank you for your kind words. This motivates me to make more and better videos. Please check the Community tab of the channel. We have created and collected over 1000 of the most asked Data Engineering questions, all in the form of MCQs, so that you can solve them and learn from them.
      Search for our channel name, The Big Data Show, on UA-cam -> go to the channel -> then click on the Community tab.

    • @TheBigDataShow
      @TheBigDataShow  1 month ago

      Thank you for your kind words. Once you finish the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @SurajJidge
    @SurajJidge 2 months ago +1

    Could you please provide the dataset links?

    • @nishabansal2978
      @nishabansal2978 2 months ago +2

      drive.google.com/drive/u/0/mobile/folders/1G46IBQCCi5-ukNDwF4KkX4qHtDNgrdn6?usp=sharing
      This is the link for the dataset. If you face any access issue, please mention it in a comment.

    • @TheBigDataShow
      @TheBigDataShow  2 months ago

      Please check the link that Nisha has shared, and let us know whether it is accessible or not.

    • @TheBigDataShow
      @TheBigDataShow  1 month ago

      Thank you for your kind words & hope you were able to access the datasets. Once you finish the project, please create a GitHub repository and share the link to the repository, along with a link to your LinkedIn profile. I will give a shout-out to your profile on LinkedIn. This will help you expand your network and showcase your skills through a complete Data Engineering project, which can assist you in finding a job.

  • @maazahmedansari4334
    @maazahmedansari4334 8 days ago

    I replied under my previous question, but the reply does not seem to be visible, so I am posting again.
    I am getting this error in the first pipeline:
    AnalysisException: Failed to merge fields 'customer_id' and 'customer_id'
    Any suggestion would be appreciated. Thank you.
    Please find the code I am trying to follow along with here:
    github.com/maaz-ahmed-ansari/apple-product-analysis/tree/main

    • @maazahmedansari4334
      @maazahmedansari4334 4 days ago

      The 2nd pipeline is working as expected. I am still racking my brain over the 1st pipeline. Can someone suggest how to resolve the above error?
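
A note on this error: `Failed to merge fields` is typically reported when a column has one data type in the existing Delta table and a different one in the DataFrame being written. That cause is an assumption here, since the linked code was not inspected. A minimal sketch that compares two `DataFrame.dtypes` lists (pairs of column name and type string) with a hypothetical helper `find_type_conflicts`; the sample types are invented for illustration:

```python
def find_type_conflicts(existing_dtypes, incoming_dtypes):
    """Return {column: (existing_type, incoming_type)} for mismatched columns.

    Both arguments are lists of (name, type) pairs, the shape returned by
    DataFrame.dtypes. Hypothetical helper for illustration.
    """
    existing = dict(existing_dtypes)
    conflicts = {}
    for name, incoming_type in incoming_dtypes:
        existing_type = existing.get(name)
        # Only columns present on both sides with different types conflict.
        if existing_type is not None and existing_type != incoming_type:
            conflicts[name] = (existing_type, incoming_type)
    return conflicts


# Invented example: the table stores customer_id as string, the new data as int.
table_dtypes = [("customer_id", "string"), ("customer_name", "string")]
new_dtypes = [("customer_id", "int"), ("customer_name", "string")]
print(find_type_conflicts(table_dtypes, new_dtypes))
```

If a conflict shows up, the usual options are to cast the column to the table's type before writing, for example `df.withColumn("customer_id", col("customer_id").cast("string"))`, or to recreate the table with the intended schema.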