Designing a Data Pipeline | What is Data Pipeline | Big Data | Data Engineering | SCALER
- Published 7 Feb 2025
- Watch Shashank Mishra (Data Engineer III, Expedia) explain how to create a data pipeline in this exclusive tutorial video. Check out our FREE masterclasses by leading industry experts now: www.scaler.com...
When designing data pipelines, there are several elements to consider, and early decisions have significant consequences for future performance.
However, before we can grasp the design process, we must first understand what a data pipeline is.
A data pipeline is a series of components that automatically gather, organise, transfer, transform, and process data as it moves from one place to another, so that the data arrives in a usable state and enterprises can build a data-driven culture.
Put in simpler words,
A data pipeline is a series of activities that ingest raw data from many sources and transport it to a destination for storage and analysis.
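To make this definition concrete, here is a minimal sketch of that ingest, transform, and load flow in Python. It is not the pipeline from the video: the two sources (a CSV string and a list of dicts standing in for an API response), the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the storage destination.

```python
import csv
import io
import sqlite3

# Hypothetical raw inputs standing in for two different sources
# (for example a CSV export and an API that returns JSON-like dicts).
CSV_SOURCE = "order_id,amount\n1,19.99\n2,5.00\n"
API_SOURCE = [{"order_id": 3, "amount": "42.50"}]


def extract() -> list[dict]:
    """Ingest raw records from every source into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(API_SOURCE)
    return rows


def transform(rows: list[dict]) -> list[tuple[int, float]]:
    """Normalise types so every record has the same usable shape."""
    return [(int(r["order_id"]), float(r["amount"])) for r in rows]


def load(records: list[tuple[int, float]], conn: sqlite3.Connection) -> None:
    """Write the cleaned records to the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    # INSERT OR REPLACE keeps re-runs from duplicating rows with the same key.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> None:
    """Chain the three stages: source -> transform -> destination."""
    load(transform(extract()), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn)
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```

Keeping each stage as a separate function is what lets a pipeline adapt: a new source only changes `extract`, and a new destination only changes `load`.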
The most important thing is to have a flexible, scalable data pipeline that can adapt to new use cases for data while scaling as your data volume grows.
A data engineer is someone who acts as a gatekeeper and facilitator of data transit and storage. They build data reservoirs and play a critical role in managing them.
Data Engineering, as a profession, is fast gaining pace in the data science ecosystem, and because of the strong demand for data engineers, even FAANG companies are willing to spend a fortune on qualified individuals.
The following topics are covered in this video on designing a data pipeline:
00:00 - Introduction
00:57 - Understanding of data domains
02:57 - Choosing data sources
04:43 - Determine the data ingestion strategy
08:37 - Design the data processing plan
11:11 - Set up storage for the pipeline output
13:19 - Plan the data workflow
14:42 - Monitoring and governance tools
16:22 - Designing a demo data pipeline
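The workflow-planning step (13:19) comes down to deciding which tasks depend on which. Schedulers such as Apache Airflow model a pipeline as a directed acyclic graph (DAG) of tasks; the sketch below shows the same idea with Python's standard-library `graphlib` rather than any scheduler's API. The task names are hypothetical placeholders for the stages listed above.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of tasks it depends on.
PIPELINE = {
    "extract_mysql": set(),
    "extract_api": set(),
    "transform": {"extract_mysql", "extract_api"},
    "load_warehouse": {"transform"},
    "quality_checks": {"load_warehouse"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


def run_in_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark them finished so their dependents unlock
    return waves


if __name__ == "__main__":
    print(run_in_waves(PIPELINE))
    # [['extract_api', 'extract_mysql'], ['transform'], ['load_warehouse'], ['quality_checks']]
```

Tasks in the same wave could run in parallel, while a cycle in the graph makes `TopologicalSorter` raise `CycleError`, which is why pipeline graphs must be acyclic.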
--------------------------------------- About SCALER -------------------------------------------------
A transformative tech school, creating talent with impeccable skills. Upskill and Create Impact.
Learn more about Scaler: bit.ly/3PqyUyS
#datapipeline #dataengineering #bigdata #SCALER
Check out our FREE masterclasses by leading industry experts now: bit.ly/3Apojjv
I think Scaler should have a separate, industry-level course for data engineering, including DSA and system design, as more people work in the data engineering field than in data science.
Waiting for such a quality course to move into a product-based company.
@ankitKumar-js1ow So far they do not have a plan/module for Data Engineering. They are simply not interested, and what they do have for DE is just not digestible.
Regular content. Can easily be found on the internet.
Haha
Paid content is terrible.
@sandeepdash5652 Is it?
@sanilkumarbarik9151 For data engineers it's horrible... not worth the time and money if you join to learn DE.
Great Content
Thank you for talking about a demo pipeline, this could come in handy in interviews.
Excellent presentation. Presented very nicely, concisely, and to the point.
Shashank just makes everything so easy to understand
I just wanna say thank you for this video
helps to see the big picture, thank you very much :)
Good one thanks
Very well explained and all important topics were covered, thank you for your efforts. Very helpful.
Thanks! Glad this was helpful! 😃
Thank you for brilliant video
This is a really, really detailed and great explanation of end-to-end data pipeline architecture. Hats off to your hard work and for putting this video out there for us, brother. It will definitely clear up doubts and give a clear picture of how pipelines work in data migration/ingestion/integration projects.
Thanks a lot. 🙏
Thanks! Glad this was helpful! 😃
Can't wait!
Good content . Thank you🙏
Thank you scaler
Thanks
Here is a summary:
00:57 - Understanding of data domains (example: finance data terminology, what the relationships are, primary keys, foreign keys; give the business side a clear picture of what data engineers can provide)
02:57 - Choosing data sources (examples: SQL databases, distributed file systems, APIs, sensor data, data generated by web applications)
04:43 - Determine the data ingestion strategy (full load or incremental load)
08:37 - Design the data processing plan (real-time or batch processing)
11:11 - Set up storage for the pipeline output (Amazon S3 or HDFS for a data lake; AWS Redshift or Hive for a data warehouse; or dump back into transactional databases)
13:19 - Plan the data workflow (schedulers: Apache Airflow, Apache NiFi, Azkaban)
14:42 - Monitoring and governance tools (alerts when the pipeline fails; tools: Kibana, Grafana, Datadog, PagerDuty)
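The difference between a full load and an incremental load (04:43) can be sketched with a watermark: the pipeline remembers the latest `updated_at` value it has copied and, on the next run, reads only rows changed after it. This is a minimal sketch, assuming a source table with an ISO-8601 `updated_at` column; the `customers` table, its rows, and the function names are hypothetical, and SQLite stands in for the source database.

```python
import sqlite3


def setup_source() -> sqlite3.Connection:
    """Hypothetical source table; updated_at holds ISO-8601 strings."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
    src.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Asha", "2025-02-01T10:00:00"), (2, "Ravi", "2025-02-03T09:30:00")],
    )
    return src


def full_load(src: sqlite3.Connection) -> list[tuple]:
    """Full load: re-read every row on every run."""
    return src.execute("SELECT id, name, updated_at FROM customers ORDER BY id").fetchall()


def incremental_load(src: sqlite3.Connection, watermark: str) -> tuple[list[tuple], str]:
    """Incremental load: read only rows changed after the stored watermark and
    return the new watermark to persist for the next run."""
    rows = src.execute(
        "SELECT id, name, updated_at FROM customers WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Same-format ISO-8601 strings sort chronologically, so the last row is the newest.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


if __name__ == "__main__":
    src = setup_source()
    print(len(full_load(src)))  # full load: every row, every run
    rows, wm = incremental_load(src, "1970-01-01T00:00:00")  # first run: all rows
    src.execute(
        "UPDATE customers SET name = 'Asha K', updated_at = '2025-02-07T08:00:00' WHERE id = 1"
    )
    rows, wm = incremental_load(src, wm)  # later run: only the changed row
    print(rows, wm)
```

A timestamp watermark is only one way to load incrementally. It does not see hard deletes, and rows committed late with a timestamp at or below the stored watermark are skipped, so production pipelines often add an overlap window or use change data capture instead.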
Thanks scaler! 🔥
Well presented, thanks
How can NoSQL databases (specifically Cassandra and MongoDB) be good for ad-hoc analytical queries, as mentioned at 12:05?
Thanks, Shashank, for explaining in a very understandable manner.
But I have one question: why didn't you discuss the staging area?
Thank you! This was really helpful and well-explained.
Happy to hear that! 🙌🏼
I understood this video easily.
Brilliant video again
Thank you.
Awesome content 🙂
Please show us one pipeline built hands-on... There isn't a single video on YouTube that actually builds a big data pipeline and demonstrates it.
As a data engineer, should you know all of these tech before getting a job or is it acquired during one?
You can easily get an entry-level job in data engineering if you know SQL well, plus basic Python, basic cloud, and Hadoop architecture.
Thank you for the nice explanation.
Happy to hear that! 🙌🏼
very nice.. thanks a ton!
Make more videos, Gurudev. Thank you very much.
Double like 👍🏽
Thank you
Very nice content
Grafana is a really good monitoring tool
Awesome Video
When will a complete Data Engineering course be launched by Scaler?
Very nice 🙂
Really good Content
Need full course for Data Engineer
More Data engineering related content please
🔥🔥🔥
Scaler knows what we students are searching for on Google before an exam lol
Redshift is already set up on the cloud, what about Hive?
You guys did a great job.
Here the data source is MySQL; what if data were coming in from multiple sources?
I guess the data modelling part was missed.
Bumb explanation. What he is explaining is based on his experience. It's not at all generic. He himself needs to improve.
Half-baked knowledge.