Designing a Data Pipeline | What is Data Pipeline | Big Data | Data Engineering | SCALER

  • Published 7 Feb 2025
  • Watch Shashank Mishra (Data Engineer III, Expedia) explain how to create a data pipeline in this exclusive tutorial video. Check out our FREE masterclasses by leading industry experts now: www.scaler.com...
    When designing data pipelines, there are several elements to consider, and early decisions have significant consequences for future performance.
    However, before we can grasp the design process, we must first understand what a data pipeline is.
    A data pipeline is a series of components that automatically gather, organise, transfer, transform, and process data from one place to another, so that the data arrives in a usable state and enterprises can build a data-driven culture.
    Put more simply, a data pipeline is a series of steps that ingest raw data from many sources and transport it to a destination for storage and analysis.
    The most important thing is to have a flexible, scalable data pipeline that can adapt to new use cases for data while scaling as your data volume grows.
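The definition above (ingest raw data from several sources, convert it, and deliver it to a destination) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record fields `id` and `amount`, the two in-memory sources, and the list standing in for a warehouse are hypothetical and do not come from the video.

```python
from typing import Iterable, Iterator

Record = dict  # hypothetical record shape: {"id": int, "amount": str | None}


def ingest(sources: Iterable[Iterable[Record]]) -> Iterator[Record]:
    """Gather raw records from every source, one after another."""
    for source in sources:
        yield from source


def transform(records: Iterable[Record]) -> Iterator[Record]:
    """Drop records with a missing amount and convert the rest to float."""
    for record in records:
        if record.get("amount") is None:
            continue
        yield {**record, "amount": float(record["amount"])}


def load(records: Iterable[Record], destination: list) -> int:
    """Append records to the destination and return how many were written."""
    count = 0
    for record in records:
        destination.append(record)
        count += 1
    return count


# Two hypothetical sources and an in-memory stand-in for the destination.
orders_db = [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": None}]
orders_api = [{"id": 3, "amount": "7"}]
warehouse: list = []

loaded = load(transform(ingest([orders_db, orders_api])), warehouse)
print(loaded, warehouse)
```

Because each stage is a generator, records stream through one at a time, which is one simple way to keep such a sketch from loading everything into memory as volume grows.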
    A data engineer is someone who acts as a gatekeeper and facilitator of data transit and storage. They create data reservoirs and are critical in managing such reservoirs.
    Data engineering as a profession is fast gaining ground in the data science ecosystem, and because of the strong demand for data engineers, even FAANG companies are willing to pay a premium for qualified people.
    The following topics are covered in this video on designing a data pipeline:
    00:00 - Introduction
    00:57 - Understanding of data domains
    02:57 - Choosing data sources
    04:43 - Determine the data ingestion strategy
    08:37 - Design the data processing plan
    11:11 - Set up storage for the pipeline output
    13:19 - Plan the data workflow
    14:42 - Monitoring and governance tools
    16:22 - Designing a demo data pipeline
    --------------------------------------- About SCALER -------------------------------------------------
    A transformative tech school, creating talent with impeccable skills. Upskill and Create Impact.
    Learn more about Scaler: bit.ly/3PqyUyS
    #datapipeline #dataengineering #bigdata #SCALER

COMMENTS • 61

  • @SCALER
    @SCALER  2 years ago +3

    Check out our FREE masterclasses by leading industry experts now: bit.ly/3Apojjv

    • @ankitKumar-js1ow
      @ankitKumar-js1ow 2 years ago +2

      I think Scaler should have a separate course for data engineering, with DSA, system design, and industry-level content, as most people are working in data engineering rather than data science.
      Waiting for such a quality course to move into a product-based company.

    • @sandeepdash5652
      @sandeepdash5652 11 months ago

      @ankitKumar-js1ow So far they do not have a plan or module for data engineering. They are simply not interested. And what they do have for DE is just not digestible.

  • @akhilcoder
    @akhilcoder 2 years ago +29

    Regular content. Can be easily searched over internet.

    • @coding3438
      @coding3438 1 year ago

      Haha

    • @sandeepdash5652
      @sandeepdash5652 11 months ago

      Paid content is terrible.

    • @sanilkumarbarik9151
      @sanilkumarbarik9151 3 months ago

      @sandeepdash5652 is it?

    • @sandeepdash5652
      @sandeepdash5652 3 months ago

      @sanilkumarbarik9151 For data engineers it's horrible. Not worth the time and money if you join to learn DE.

  • @surajkinagimath9700
    @surajkinagimath9700 3 days ago +1

    Great Content

  • @ArunSingh-rk7mm
    @ArunSingh-rk7mm 2 years ago +3

    Thank you for talking about a demo pipeline, this could come in handy in interviews.

  • @NasimKhan-vu8oi
    @NasimKhan-vu8oi 7 months ago +1

    Excellent presentation. Presented very nicely, concisely, and to the point.

  • @AkashKumar-kx9vj
    @AkashKumar-kx9vj 2 years ago +1

    Shashank just makes everything so easy to understand

  • @shaistaqureshi8408
    @shaistaqureshi8408 2 years ago +1

    I just wanna say thank you for this video

  • @arunsundar3739
    @arunsundar3739 9 months ago +1

    helps to see the big picture, thank you very much :)

  • @madhivanandurai452
    @madhivanandurai452 1 month ago +1

    Good one thanks

  • @NehaSingh-wp4mf
    @NehaSingh-wp4mf 11 months ago

    Very well explained, and all the important topics were covered. Thank you for your efforts. Very helpful.

    • @SCALER
      @SCALER  11 months ago

      Thanks! Glad this was helpful! 😃

  • @daniyaqureshi6201
    @daniyaqureshi6201 2 years ago +1

    Thank you for the brilliant video

  • @AmitSharma-xv6sh
    @AmitSharma-xv6sh 1 year ago

    This is a really detailed and great explanation of end-to-end data pipeline architecture. Hats off to your hard work in putting this video out there for us, brother. It will definitely clear up doubts about how pipelines work in data migration, ingestion, and integration projects.
    Thanks a lot. 🙏

    • @SCALER
      @SCALER  1 year ago

      Thanks! Glad this was helpful! 😃

  • @healthificteam8465
    @healthificteam8465 2 years ago

    Can't wait!

  • @Rk-mv8sz
    @Rk-mv8sz 2 years ago +1

    Good content. Thank you 🙏

  • @shrutiikarla1055
    @shrutiikarla1055 2 years ago

    Thank you scaler

  • @marksun6420
    @marksun6420 1 year ago +1

    Thanks

  • @StartDataLate
    @StartDataLate 8 months ago

    Here is a summary:
    00:57 - Understanding of data domains (example: finance data terminology, relationships between entities, primary keys, foreign keys; give the business side a clear picture of what data engineers can provide)
    02:57 - Choosing data sources (examples: SQL database, distributed file system, API, sensor data, data generated by web applications)
    04:43 - Determine the data ingestion strategy (full load or incremental load)
    08:37 - Design the data processing plan (real-time processing or batch processing)
    11:11 - Set up storage for the pipeline output (Amazon S3 or HDFS for a data lake; AWS Redshift or Hive for a data warehouse; or write back to transactional databases)
    13:19 - Plan the data workflow (schedulers: Apache Airflow, Apache NiFi, Azkaban)
    14:42 - Monitoring and governance tools (alerts for pipeline failures; tools: Kibana, Grafana, DataDog, PagerDuty)
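The two ingestion strategies named at 04:43, full load and incremental load, can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline built in the video: the `orders` and `orders_copy` tables, the `updated_at` watermark column, and the in-memory SQLite databases are all hypothetical stand-ins for a real source and destination.

```python
import sqlite3


def full_load(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Full load: wipe the destination and copy every source row again."""
    dst.execute("DELETE FROM orders_copy")
    rows = src.execute("SELECT id, amount, updated_at FROM orders").fetchall()
    dst.executemany("INSERT INTO orders_copy VALUES (?, ?, ?)", rows)
    return len(rows)


def incremental_load(src, dst, watermark: str):
    """Incremental load: copy only rows changed after the last watermark."""
    rows = src.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Upsert, so rows updated at the source overwrite their older copies.
    dst.executemany("INSERT OR REPLACE INTO orders_copy VALUES (?, ?, ?)", rows)
    new_watermark = rows[-1][2] if rows else watermark
    return len(rows), new_watermark


# Hypothetical source and destination, both in memory.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
dst.execute("CREATE TABLE orders_copy (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-02")],
)

n_full = full_load(src, dst)

# Later, one row is added and one is changed at the source.
src.execute("INSERT INTO orders VALUES (3, 30.0, '2024-01-03')")
src.execute("UPDATE orders SET amount = 25.0, updated_at = '2024-01-04' WHERE id = 2")

n_inc, watermark = incremental_load(src, dst, "2024-01-02")
total = dst.execute("SELECT COUNT(*) FROM orders_copy").fetchone()[0]
print(n_full, n_inc, watermark, total)
```

The watermark approach only sees changes that bump `updated_at`; deletes at the source go unnoticed, which is one reason a periodic full load is sometimes kept alongside it.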

  • @krishnasaksena2364
    @krishnasaksena2364 2 years ago

    Thanks scaler! 🔥

  • @MarkyGoldstein
    @MarkyGoldstein 6 months ago

    Well presented, thanks

  • @TheSoumyakole
    @TheSoumyakole 1 year ago +2

    How can NoSQL (specifically Cassandra, MongoDB) be good for ad-hoc analytical queries, as mentioned at 12:05?

  • @umakantyadav9972
    @umakantyadav9972 2 years ago +1

    Thanks, Shashank, for explaining in a very understandable manner.
    But I have one question: you have not discussed the staging area?

  • @endpermia
    @endpermia 1 year ago

    Thank you! This was really helpful and well-explained.

    • @SCALER
      @SCALER  1 year ago

      Happy to hear that! 🙌🏼

  • @tamannamam3563
    @tamannamam3563 2 years ago

    I easily understand this video

  • @saniyasharif9861
    @saniyasharif9861 2 years ago

    Brilliant video again

  • @obiradaniel
    @obiradaniel 2 years ago

    Thank you.

  • @panktikhurana8906
    @panktikhurana8906 2 years ago +1

    Awesome content 🙂

  • @it3374
    @it3374 1 year ago +2

    Please show one pipeline built hands-on... There isn't a single video on YouTube that demonstrates actually creating a big data pipeline...

  • @FaizanKhan-ct7pc
    @FaizanKhan-ct7pc 2 years ago +1

    As a data engineer, should you know all of these tech before getting a job or is it acquired during one?

    • @Watson22j
      @Watson22j 1 year ago

      You can easily get an entry-level job in data engineering if you know SQL well, plus basic Python, basic cloud, and Hadoop architecture.

  • @avshekraj
    @avshekraj 1 year ago

    Thank you for the nice explanation

    • @SCALER
      @SCALER  1 year ago +1

      Happy to hear that! 🙌🏼

  • @asishjoshi5774
    @asishjoshi5774 2 years ago

    very nice.. thanks a ton!

  • @divyanshtayal5077
    @divyanshtayal5077 2 years ago

    Make more videos, Gurudev. Thank you very much

  • @ruthmk
    @ruthmk 10 months ago

    Double like 👍🏽
    Thank you

  • @cutipy433
    @cutipy433 2 years ago

    Very nice content

  • @ramangupta6159
    @ramangupta6159 2 years ago +1

    Grafana is a really good monitoring tool

  • @abhisekchowdhury8584
    @abhisekchowdhury8584 2 years ago

    Awesome Video

  • @justdataengineer3138
    @justdataengineer3138 2 years ago

    When will a complete Data Engineering course be launched by Scaler?

  • @saniyapoetry8386
    @saniyapoetry8386 2 years ago

    Very nice 🙂

  • @saibabatelagamsetty2538
    @saibabatelagamsetty2538 2 years ago

    Really good Content

  • @nandlaljaiswal7217
    @nandlaljaiswal7217 2 years ago +1

    Need full course for Data Engineer

  • @Sameerkhan-kt5jj
    @Sameerkhan-kt5jj 2 years ago

    More Data engineering related content please

  • @prachiipandeyy
    @prachiipandeyy 2 years ago

    🔥🔥🔥

  • @piyushjain419
    @piyushjain419 2 years ago +1

    Scaler knows what us students are searching for on google before an exam lol

  • @shanayakhan839
    @shanayakhan839 2 years ago

    Redshift is already set up on the cloud; what about Hive?

  • @parisreview4651
    @parisreview4651 2 years ago

    You guys did a great job.

  • @PankajKumar-vv5db
    @PankajKumar-vv5db 2 years ago

    Here the data source is MySQL. What if data were coming in from multiple sources?

  • @bangalibangalore2404
    @bangalibangalore2404 2 years ago

    The data modelling part was missed, I guess

  • @ashutoshrai5342
    @ashutoshrai5342 2 years ago +1

    Bumb explanation. What he is explaining is based on his own experience. It's not at all generic. He himself needs to improve.

  • @nemodbuniversity
    @nemodbuniversity 2 years ago

    Half-baked knowledge

  • @sheenagupta896
    @sheenagupta896 2 years ago +1

    Thank you for talking about a demo pipeline, this could come in handy in interviews.

  • @fazaila2047
    @fazaila2047 2 years ago

    Grafana is a really good monitoring tool