What is Data Pipeline? | Why Is It So Popular?

  • Published 16 Dec 2024

COMMENTS • 128

  • @livestriming289
    @livestriming289 2 months ago +5

    00:01 Data pipelines automate data collection, transformation, and delivery.
    00:38 Data pipeline involves stages like collect, ingest, store, compute, and consume.
    01:18 Data pipeline captures live data feeds for real-time tracking.
    01:56 Data pipeline involves batch and stream processing of ingested data.
    02:40 Data pipeline tools like Apache Flink and Google Cloud are used for real-time processing of data streams.
    03:23 Data is transformed for analysis in the storage phase.
    04:05 Data pipelines enable various end users to leverage data for predictive modeling and business intelligence tools.
    04:47 Data pipeline enables continuous learning and improvement using machine learning models.
    Crafted by Merlin AI.
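
The five stages listed in the summary above (collect, ingest, store, compute, consume) can be sketched as a chain of small functions. This is only a toy illustration, not what the video shows: the function names, the sample records, and the in-memory list standing in for storage are all assumptions made for the example.

```python
from collections import defaultdict


def collect():
    """Collect: gather raw events from a source (hard-coded sample data here)."""
    return [
        {"user": "a", "amount": "10.5"},
        {"user": "b", "amount": "3"},
        {"user": "a", "amount": "bad"},  # malformed record, dropped at ingest
        {"user": "a", "amount": "4.5"},
    ]


def ingest(raw_events):
    """Ingest: parse and validate records, skipping any that fail."""
    for event in raw_events:
        try:
            yield {"user": event["user"], "amount": float(event["amount"])}
        except (KeyError, ValueError):
            continue


def store(events):
    """Store: persist events (a plain list stands in for a lake or warehouse)."""
    return list(events)


def compute(stored):
    """Compute: aggregate the total amount per user."""
    totals = defaultdict(float)
    for event in stored:
        totals[event["user"]] += event["amount"]
    return dict(totals)


def consume(totals):
    """Consume: format the aggregates for a report or dashboard."""
    return [f"{user}: {total:.2f}" for user, total in sorted(totals.items())]


def run_pipeline():
    return consume(compute(store(ingest(collect()))))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

In a real pipeline each stage would usually be a separate system, so the function boundaries here only mark where those hand-offs would happen.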

  • @ryanbent9368
    @ryanbent9368 1 month ago +9

    I work as a PM in data enablement, and this video was amazing for understanding each component in a data pipeline.

  • @petenjs3500
    @petenjs3500 6 months ago +80

    3:13 typo *AWS Glue.
    Love these vids, thanks!

    • @Nonenone-rj9yp
      @Nonenone-rj9yp 6 months ago +3

      bruh had me googling what's AWS glow

  • @robbrooks956
    @robbrooks956 20 days ago +2

    just learned more in 5 minutes than I learned in 5 years. instant subscribe. thank you!

  • @SudhanvaDixit
    @SudhanvaDixit 6 months ago +52

    0:49 Shouldn't the last one be 'Consume'?

    • @TrusePkay
      @TrusePkay 2 months ago

      Yeah, it's an error, but the video has already been published. They cannot go back and edit it from the beginning.

  • @mrseanpaul81
    @mrseanpaul81 6 months ago +4

    I love the short video format, as I can dive deeper on topics and terms I am interested in on my own time :)

  • @prasenjeetrathore
    @prasenjeetrathore 6 months ago +4

    Amazing explanation, so far the easiest-to-digest video about data pipelines.

  • @Iamine1981
    @Iamine1981 2 months ago

    Great video. Showed me the fundamentals of data pipelines and processes from collection to consumption. There are so many tools/applications extensively used for data processing at various stages that I have never heard of, or only encounter in job descriptions, and since I am not a data specialist, I had no idea about them! Thanks for putting these short summaries online. Helpful for people like myself!

  • @TrusePkay
    @TrusePkay 2 months ago +1

    The best animated introduction to data pipelines in just five minutes.

  • @gavins1910
    @gavins1910 7 days ago

    Great overview of data pipelines! Thanks!

  • @atomtamadas
    @atomtamadas 6 months ago +4

    Spark is widely used in stream processing too, not only batch; see Spark Structured Streaming.

    • @patrickm.4754
      @patrickm.4754 5 months ago +2

      Apache Flink is better suited to stream processing, even though both can do stream and batch processing.
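
The batch versus stream distinction discussed in this thread can be shown without any framework. The sketch below is plain Python, not Spark or Flink code; the event data and function names are assumptions for illustration. The batch function sees a complete, bounded dataset and returns one final answer, while the stream function keeps running state and emits an updated result after every event.

```python
from collections import defaultdict

# Hypothetical click events: (page, count)
EVENTS = [("page_a", 1), ("page_b", 1), ("page_a", 1)]


def batch_counts(events):
    """Batch: process the whole bounded dataset, then return one result."""
    counts = defaultdict(int)
    for key, n in events:
        counts[key] += n
    return dict(counts)


def stream_counts(event_iter):
    """Stream: update state per event and emit the running count each time."""
    counts = defaultdict(int)
    for key, n in event_iter:
        counts[key] += n
        yield key, counts[key]


if __name__ == "__main__":
    print(batch_counts(EVENTS))
    for update in stream_counts(iter(EVENTS)):
        print(update)
```

Both functions share the same aggregation logic, which is roughly why engines such as Spark and Flink can offer both modes; they differ in when results become available and in how the state is managed.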

  • @rohithreddy75
    @rohithreddy75 1 month ago

    Your channel is a blessing.

  • @user-data_junkie
    @user-data_junkie 6 months ago +12

    What do you use to create these animations/infographics?

    • @knighthawk095
      @knighthawk095 6 months ago

      I think it could be either Figma or canvas.

    • @user-data_junkie
      @user-data_junkie 5 months ago

      @Biostatistics is there a video out there that shows how that is done in PowerPoint? I see these data-like infographics a lot these days.

    • @Biostatistics
      @Biostatistics 5 months ago +3

      @user-data_junkie it says in the description of this video that he used Adobe Illustrator and After Effects. 😊

    • @user-data_junkie
      @user-data_junkie 5 months ago +1

      @Biostatistics thanks. I did check at the time and did not see anything. Appreciate the update.

  • @mwanthidaniel1254
    @mwanthidaniel1254 6 months ago +8

    Which tool do you use to create these animated presentations?

    • @jay51200
      @jay51200 6 months ago

      Trade secret 😂

    • @chrisalmighty
      @chrisalmighty 5 months ago

      I also want to know what he uses to create the presentation illustrations. They look neat

  • @gus473
    @gus473 6 months ago +3

    💯 Looking like your channel is on track for 1 million subscribers by year end! Great stuff! 😎✌️

  • @lumielz1495
    @lumielz1495 1 month ago

    This video was amazing for understanding, thank you 🤗🤗

  • @ChrisPatt
    @ChrisPatt 5 months ago +1

    Loved how simply you explained this complicated concept! Also, what are your thoughts on Irys, the world's only provenance layer ensuring data integrity and accountability?

  • @ttehir
    @ttehir 6 months ago +55

    Why do we mostly talk about data pipelines for BI or ML when many times we also need it for functional applications?

    • @personalbranddata
      @personalbranddata 6 months ago +3

      Those functional applications should likely use the same data platform; the only difference is how you're serving the transformed result. What difference do you think should be talked about, then?

    • @manishshaw1002
      @manishshaw1002 6 months ago +17

      Functional applications most likely consume very small amounts of data, while BI and AI/ML models require far more, likely GB to TB of data, to work with.
      There's no practical way to load 1 GB of data into your web app or SQL; it just makes your app clogged and slow.

    • @JB-ve8sk
      @JB-ve8sk 6 months ago

      Because more and more non-traditionally technical business roles are leveraging data for business intelligence - so the demand for understanding these concepts is greater there (than in complex application architectures where more traditional technical skill accumulates).

    • @deadohiosky1701
      @deadohiosky1701 6 months ago

      Just call it messaging and you’re good to go

    • @JustinLietz
      @JustinLietz 5 months ago

      @manishshaw1002 this isn't always true. At the health insurance company I work at, we have functional applications that internal users and providers use to view data about members, and there are vast amounts of data streaming to and from these applications.

  • @aamirsuleman9815
    @aamirsuleman9815 5 months ago

    I think you meant AWS Glue at 3:18. Appreciate these informative videos.

  • @zobaidulkaziex
    @zobaidulkaziex 6 months ago +5

    Very good discussion

  • @Captplanet23
    @Captplanet23 5 months ago +1

    Why is Apache Flink not an option for batch processing? As I understand it, it makes more sense to use the same computation frameworks when doing both, so why not use Flink for both given Flink can support batch jobs?

  • @mdhalima5682
    @mdhalima5682 2 months ago

    Thank you very much. Mind-blowing explanation.

  • @immanuelt613
    @immanuelt613 6 months ago +1

    Top quality work as always

  • @thetatso9462
    @thetatso9462 1 month ago

    thanks for the knowledge you share

  • @uplifting_sounds
    @uplifting_sounds 6 months ago +2

    I like your presentations. What do you use to make them?

    • @chrisalmighty
      @chrisalmighty 5 months ago

      I also want to know what he uses to create the presentation illustrations. They look neat

    • @LUISITOTHETROLL
      @LUISITOTHETROLL 3 months ago

      @chrisalmighty Adobe Illustrator and After Effects

  • @husseineldeeb
    @husseineldeeb 6 months ago +1

    Amazing video. Thanks for your great efforts!

  • @jaykukreja7125
    @jaykukreja7125 6 months ago

    Love it. This jargon is clear now.

  • @harsh5402
    @harsh5402 4 months ago

    One-stop-shop video. Loved it ♥

  • @EricaErica-ey2nb
    @EricaErica-ey2nb 1 month ago

    Love the presentation. Can you recommend a resource for making something like it?

  • @kartikmahajan4405
    @kartikmahajan4405 4 months ago

    this was very useful. thanks for sharing.

  • @fabiodonascimentopatao8544
    @fabiodonascimentopatao8544 4 months ago

    Very Good Video!! Easy to get!

  • @chrisc9725
    @chrisc9725 5 months ago

    Fantastic video and graphics, what program do you use to animate your graphics? It's great stuff.

  • @heangsok862
    @heangsok862 2 months ago +1

    Suppose we have 100 microservices deployed as different AWS Lambda functions. Out of these, more than 30 Lambda functions need to write data to MongoDB Atlas. Each of these 30 functions is triggered simultaneously via SNS (Simple Notification Service), and each function will be invoked 200,000 times to process different data.
    Given this setup, the MongoDB Atlas connection limit will likely be exhausted due to the large number of simultaneous requests.
    What would be the best approach to handle this scenario without running into connection problems with MongoDB Atlas? Could you create a video for this scenario, sir?
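
One pattern often suggested for this kind of fan-in problem is to put a queue between the many producers and the database, and let a small, fixed number of writers drain it in batches, so the connection count is bounded by the number of writers rather than by the number of invocations. The sketch below simulates that idea in plain Python with threads. `FakeDatabase`, `writer`, and `run` are hypothetical names invented for the example, and nothing here calls AWS or MongoDB; mapping it onto a managed queue with capped consumer concurrency is an assumption, not something stated in the thread.

```python
import queue
import threading


class FakeDatabase:
    """Hypothetical stand-in for a database that has a connection limit."""

    def __init__(self):
        self.lock = threading.Lock()
        self.open_connections = 0
        self.peak_connections = 0
        self.rows = []

    def connect(self):
        with self.lock:
            self.open_connections += 1
            self.peak_connections = max(self.peak_connections, self.open_connections)

    def close(self):
        with self.lock:
            self.open_connections -= 1

    def insert_many(self, batch):
        with self.lock:
            self.rows.extend(batch)


def writer(db, q, batch_size):
    """Hold one long-lived connection and write queued items in batches."""
    db.connect()
    try:
        batch = []
        while True:
            item = q.get()
            if item is None:  # sentinel: flush what is left and stop
                break
            batch.append(item)
            if len(batch) >= batch_size:
                db.insert_many(batch)
                batch = []
        if batch:
            db.insert_many(batch)
    finally:
        db.close()


def run(num_messages=1000, num_writers=3, batch_size=50):
    db = FakeDatabase()
    q = queue.Queue()
    threads = [
        threading.Thread(target=writer, args=(db, q, batch_size))
        for _ in range(num_writers)
    ]
    for t in threads:
        t.start()
    for i in range(num_messages):  # producers only enqueue, never touch the DB
        q.put({"id": i})
    for _ in threads:  # one sentinel per writer
        q.put(None)
    for t in threads:
        t.join()
    return db


if __name__ == "__main__":
    db = run()
    print(len(db.rows), db.peak_connections)
```

The trade-off is latency: writes become asynchronous, so this fits workloads that can tolerate a short delay between the event and the database write.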

  • @Paul__xa1r
    @Paul__xa1r 5 months ago

    Important information about refunds: what a joy

  •  6 months ago

    Is GA4 considered a data stream? And BigQuery a storage and transformation tool?

  • @chrisalmighty
    @chrisalmighty 5 months ago

    Video illustrations look neat. What tool did you use to create the presentation illustrations?

  • @debajitkataki2085
    @debajitkataki2085 1 month ago

    3:13, what is AWS Glow? Typo?

  • @raj_kundalia
    @raj_kundalia 6 months ago

    Thank you for doing this!

  • @vlplbl85
    @vlplbl85 6 months ago

    Great video. Small remark: the AWS service for ETL is called AWS Glue, not Glow

  • @sreenivasreddypallerla9941
    @sreenivasreddypallerla9941 6 months ago

    Very informative! But how do you do all these animations? What product do you use?

  • @bladethirst1
    @bladethirst1 6 months ago

    Maybe some examples of a simplified pipeline for a specific application would make this video even better.

  • @marcgentner1322
    @marcgentner1322 6 months ago

    So I need to build a way to retrieve many, many emails, categorize them with an ML model, and then save them in the right system. Do I build this with Kafka and PySpark? Or how can this be done easily?

    • @jay51200
      @jay51200 6 months ago

      Kafka dear

  • @jordanfarr3157
    @jordanfarr3157 6 months ago

    Always so so good

  • @yongguangli3304
    @yongguangli3304 6 months ago

    May I ask how these beautiful diagrams are drawn? They're great!

  • @JasonLayton
    @JasonLayton 5 months ago

    Intimidating!

  • @mePrafull
    @mePrafull 2 months ago

    Thanks!

  • @AdityaTyagi-c1m
    @AdityaTyagi-c1m 3 months ago

    I want to learn system design for data pipelines.
    Could you please suggest how to proceed? Which books?

  • @rishiraj2548
    @rishiraj2548 6 months ago +2

    Thanks

  • @markwallstrom9994
    @markwallstrom9994 6 months ago +1

    No mention of Apache Iceberg and such technology?

  • @mikedepacina8588
    @mikedepacina8588 6 months ago +1

    AWS Glow or AWS Glue?

  • @saratpoluri
    @saratpoluri 5 months ago

    Bravo!

  • @Mr.Andrew.
    @Mr.Andrew. 6 months ago +2

    Your diagram had compute arrows twice when you verbally said compute and consume for the last two phases.

  • @helikopter1231
    @helikopter1231 1 month ago

    Why would ETL here be considered as real time when ETL is slower as you need to transform every single extraction before you load it into a db warehouse?

  • @VikramPatilvp
    @VikramPatilvp 6 months ago +1

    Looks like your examples are only AWS or Google stack. Why not cover examples from MS Azure stack as well?

  • @VishnuVijayan7
    @VishnuVijayan7 6 months ago

    No mention of the data lakehouse.

  • @hoomanmohammadi-c6l
    @hoomanmohammadi-c6l 5 months ago

    You have an error in the diagram: it shows compute twice; it should be compute and consume.

  • @padamatimypalyadav
    @padamatimypalyadav 1 month ago +1

    ❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤

  • @arvindraj8877
    @arvindraj8877 3 months ago

    Tomorrow I have an interview :)

  • @Ashley-gi1vt
    @Ashley-gi1vt 20 days ago

    Do data analysts build data pipelines?

  • @eddielim8888
    @eddielim8888 6 months ago

    AWS Glow or Glue?

  • @internetexplorer1593
    @internetexplorer1593 6 months ago +1

    Leaving out all Azure tools... really a shame

    • @scottedmiston6566
      @scottedmiston6566 6 months ago

      Maybe it's intentional. Many serious data scientists aren't fond of the Azure UI for big data pipelines.

    • @JB-ve8sk
      @JB-ve8sk 6 months ago

      Microsoft training has that covered

  • @williamchurch711
    @williamchurch711 5 months ago

    So basically a data pipeline is similar to a system flowchart?

  • @jay51200
    @jay51200 6 months ago

    "Trade Secret" name of the tool used to create the animations ...😂

  • @andreslasvegas30
    @andreslasvegas30 6 months ago +1

    I don't know why, but the gain of the microphone is too high. There is a little background noise and it's a bit noticeable; keep it in check.
    Great video, as always in the channel.

  • @zyladd6176
    @zyladd6176 20 days ago

    This seems so complicated

  • @slayerzerg
    @slayerzerg 2 months ago

    AWS Glue*

  • @dimitrikalinin3301
    @dimitrikalinin3301 6 months ago

    AWS Glue, not Glow

  • @checkerist
    @checkerist 6 months ago

    apache hive logo is on acid

  • @thesimplicitylifestyle
    @thesimplicitylifestyle 6 months ago

    😎🤖

  • @vickyg1877
    @vickyg1877 6 months ago

    REST API

  • @albinantony4998
    @albinantony4998 6 months ago

    looks like you need to change the mic you are currently using. there is some crackling noise when you talk.

  • @johnsmith21123
    @johnsmith21123 6 months ago +3

    Hadoop is dead

    • @praveens2272
      @praveens2272 6 months ago +1

      Why, what's the reason?

    • @JohnS-er7jh
      @JohnS-er7jh 6 months ago +5

      They said that about mainframe computers 30 years ago, but they are still here and in production. Large organizations are not going to adopt the latest solutions for all their data needs (for instance, for data that isn't accessed that often or for specific use cases, or they might have support staff who are more familiar with legacy tools and don't see the need to adopt the latest methods at the moment). So I can guarantee Hadoop is NOT completely dead.

    • @angryktulhu
      @angryktulhu 6 months ago +3

      Lol it’s not dead at all, and its ecosystem tools are still widely used

    • @shilashm5691
      @shilashm5691 6 months ago +1

      😂 Most use HDFS as a data lake. When you say Hadoop is dead, be precise and say MapReduce is dead, because the Hadoop ecosystem is large and still functioning.

    • @personalbranddata
      @personalbranddata 6 months ago

      @shilashm5691 Most use AWS S3 as storage for their data lake, others Azure Data Lake Storage. MapReduce is dead and HDFS is on the brink of obscurity as well. I pity those who still have to work with some in-house HDFS from the darkest and most painful era of data engineering (the Hadoop era).

  • @tetratessera8825
    @tetratessera8825 2 months ago

    I like your content a lot, but you have a lot of mistakes, not only in this video but also in the others:
    mislabeling and duplication. It can get very confusing for a beginner. Similarly, if you are using acronyms, I would recommend explaining them or at least stating the full name.
