Anirvan Decodes
India
Joined Jan 14, 2022
Hi All,
I am Anirvan Sen, and I love to talk about SQL, Python, Apache Spark, and Data Engineering.
My passion is teaching complex software engineering topics in a very simplified way.
That is why the channel is called "Anirvan Decodes": we decode complex topics and make them simple.
I am starting this YouTube channel to build a community where we learn from each other and grow together.
Join me to learn different Software Engineering skills in a fun and simple way.
See you in my videos!
Handling Late Arriving Data in Spark Structured Streaming with Watermarks
Handling Late Arriving Data in Spark Structured Streaming with Watermarks
In this video, we explore what late-arriving data is and how to handle it in Spark Structured Streaming using watermarks and the state store (a minimal code sketch follows the chapter list and code link below).
📽️ Chapters to Explore
0:00 Introduction
0:13 What is late data
3:29 State store
5:36 RocksDB state store
5:59 Handle late data using watermarks
💻 Code is available on GitHub: github.com/anirvandecodes/Spark-Structured-Streaming-with-Kafka
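For readers who want a concrete starting point before watching, here is a minimal PySpark sketch of the watermark technique covered in the chapters. The broker address, topic name, schema, and paths are illustrative assumptions, not the exact code from the repository above.

```python
# Minimal watermark sketch (illustrative: broker, topic, schema and paths
# are assumptions; see the GitHub repo above for the code used in the video).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import (DoubleType, StringType, StructField,
                               StructType, TimestampType)

spark = SparkSession.builder.appName("late-data-watermark").getOrCreate()

schema = StructType([
    StructField("event_time", TimestampType()),
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumption
    .option("subscribe", "events")                        # assumption
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Events arriving more than 10 minutes behind the latest event_time seen
# so far are treated as too late; their window state can then be dropped
# from the state store instead of growing forever.
late_safe_counts = (
    events.withWatermark("event_time", "10 minutes")
    .groupBy(window("event_time", "5 minutes"), "device_id")
    .count()
)

query = (
    late_safe_counts.writeStream.outputMode("update")
    .format("console")
    .option("checkpointLocation", "/tmp/chk/late-data")   # assumption
    .start()
)
```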
🌟 Stay Connected and Level Up Your Data Engineering Skills!
🔔 Subscribe Now: www.youtube.com/@anirvandecodes?sub_confirmation=1
🤝 Let's Connect: www.linkedin.com/in/anirvandecodes/
🎥 Explore Playlists Designed for You:
🚀 Spark Structured Streaming with Kafka: ua-cam.com/play/PLGCTB_rNVNUNbuEY4kW6lf9El8B2yiWEo.html
🛠️ DBT (Data Build Tool): ua-cam.com/play/PLGCTB_rNVNUON4dyWb626R4-zrLtYfVLa.html
🌐 Apache Spark for Everyone: ua-cam.com/play/PLGCTB_rNVNUOigzmGI6zN3tzveEqMSIe0.html
📌 Love the content? Show your support! Like, share, and subscribe to keep learning and growing in data engineering. 🚀
Song: Dawn
License: Creative Commons (CC BY 3.0) creativecommons.org/licenses/by/3.0
open.spotify.com/artist/5ZVHXQZAIn9WJXvy6qn9K0
Music powered by BreakingCopyright: breakingcopyright.com
Views: 46
Videos
Streaming Aggregates in Spark : Tumbling vs Sliding Windows with Kafka
63 views · 1 month ago
Streaming Aggregates in Spark - Tumbling vs Sliding Windows with Kafka In this video, we break down the concept of streaming aggregates in Spark Structured Streaming and explain the difference between tumbling and sliding windows. Using Kafka as the data source, we demonstrate how to effectively process and aggregate real-time data. 📽️ Chapters to Explore 0:00 Introduction 0:20 Use Case for str...
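A minimal sketch of the two window types this video contrasts; it uses Spark's built-in rate source as a stand-in stream (in the video the source is Kafka), and the column names are assumptions.

```python
# Tumbling vs sliding windows on a stand-in rate-source stream.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window

spark = SparkSession.builder.appName("windows-demo").getOrCreate()

events = (
    spark.readStream.format("rate").option("rowsPerSecond", 10).load()
    .withColumnRenamed("timestamp", "event_time")
)

# Tumbling window: fixed, non-overlapping 5-minute buckets;
# each event lands in exactly one window.
tumbling = events.groupBy(window("event_time", "5 minutes")).count()

# Sliding window: 5-minute windows starting every 1 minute;
# each event can fall into several overlapping windows.
sliding = events.groupBy(window("event_time", "5 minutes", "1 minute")).count()

query = (
    tumbling.writeStream.outputMode("complete")
    .format("memory").queryName("tumbling_counts")
    .start()
)
```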
Spark Structured Streaming Sinks and foreachBatch
72 views · 1 month ago
Spark Structured Streaming Sinks and foreachBatch Explained This video explores the different sinks available in Spark Structured Streaming and how to use the powerful foreachBatch sink for custom processing. 📽️ Chapters to Explore 0:00 Introduction 0:25 Types of sinks 0:50 Memory Sink 2:08 Kafka Sink 4:00 Delta Sink 4:30 toTable does not support update mode 4:53 How to use foreachBatch 💻 Code ...
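A short sketch of the foreachBatch pattern this video covers: the function receives each micro-batch as a regular DataFrame, so any batch sink can be used. The output and checkpoint paths are assumptions.

```python
# foreachBatch sketch: each micro-batch arrives as a plain DataFrame,
# so any batch API works (JDBC, Delta MERGE, writing to several sinks...).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("foreachbatch-demo").getOrCreate()
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

def write_batch(batch_df, batch_id):
    # batch_df is a normal (non-streaming) DataFrame for this micro-batch.
    batch_df.write.mode("append").format("parquet").save("/tmp/out/rate")

query = (
    stream.writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "/tmp/chk/foreachbatch")
    .start()
)
```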
Spark Structured Streaming Output Mode | Append| Update | Complete Modes
49 views · 1 month ago
Spark Structured Streaming Output Modes Explained In this video, we explore the output modes in Spark Structured Streaming, a crucial concept for controlling how processed data is written to sinks. Understanding these modes helps in designing efficient and accurate streaming pipelines. 📽️ Chapters to Explore 0:00 Introduction 0:45 Complete Mode 2:30 Update Mode 4:24 Append Mode 5:32 How to use ...
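For context, a small sketch contrasting the three output modes on an aggregation, using a rate source as a stand-in stream; the bucketing column is an illustrative assumption.

```python
# Output-mode sketch on a simple aggregation over a rate source.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("output-modes-demo").getOrCreate()
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
counts = stream.groupBy((stream.value % 10).alias("bucket")).count()

# complete: rewrite the whole result table on every trigger (aggregations only).
q = counts.writeStream.outputMode("complete").format("console").start()

# update: emit only the rows whose aggregate changed in this trigger.
# q = counts.writeStream.outputMode("update").format("console").start()

# append: emit only rows that will never change again; for aggregations
# this requires a watermark, while plain projections/filters default to it.
# q = stream.writeStream.outputMode("append").format("console").start()
```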
Spark Structured Streaming Checkpoint
62 views · 1 month ago
Understanding Spark Structured Streaming Checkpoints In this video, we dive deep into checkpoints in Spark Structured Streaming and their critical role in ensuring fault-tolerant and stateful stream processing. 📽️ Chapters to Explore 0:00 Introduction 0:20 Why Checkpoint is required? 1:50 How to define checkpoint 2:15 Content of a checkpoint folder 4:26 Kafka offset information in checkpoint 5:...
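A brief sketch of how a checkpoint location is attached to a query; the sink and paths are assumptions rather than the video's exact code.

```python
# Checkpoint sketch: the checkpoint folder stores source offsets, commit
# logs and operator state, so a restarted query resumes exactly where it
# left off. Use one dedicated folder per query.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("checkpoint-demo").getOrCreate()
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

query = (
    stream.writeStream
    .format("parquet")
    .option("path", "/tmp/out/checkpoint-demo")
    .option("checkpointLocation", "/tmp/chk/checkpoint-demo")
    .start()
)
```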
Spark Structured Streaming Trigger Types
68 views · 1 month ago
In this video, we dive into Spark Structured Streaming trigger modes, a key feature for managing how your streaming queries process data. Whether you're working with real-time data pipelines, ETL jobs, or low-latency applications, understanding trigger modes is essential to optimize your Spark jobs. 📽️ Chapters to Explore 0:00 Introduction 0:40 Why Do We Need Trigger Types? 1:50 Default Trigger ...
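A hedged sketch of the main trigger options (the availableNow trigger assumes Spark 3.3+; paths are illustrative).

```python
# Trigger-mode sketch on a rate source.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trigger-demo").getOrCreate()
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

writer = (stream.writeStream.format("console")
          .option("checkpointLocation", "/tmp/chk/trigger-demo"))

# Default micro-batch: the next batch starts as soon as the previous ends.
query = writer.start()

# Fixed interval: one micro-batch every 30 seconds.
# query = writer.trigger(processingTime="30 seconds").start()

# One-shot: process everything currently available, then stop
# (handy for scheduled jobs).
# query = writer.trigger(availableNow=True).start()
```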
Spark Structured Streaming Introduction
102 views · 1 month ago
Welcome to this introduction to Spark Structured Streaming! In this video, we’ll break down the basics of Spark Structured Streaming and explain why it’s one of the most powerful tools for real-time data processing. Song: Dawn License: Creative Commons (CC BY 3.0) creativecommons.org/licenses/by/3.0 open.spotify.com/artist/5ZVHXQZAIn9WJXvy6qn9K0 Music powered by BreakingCopyright: breakingcopyr...
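As a first taste of the API, here is the classic word-count "hello world" of Structured Streaming; the socket host and port are assumptions (feed it with `nc -lk 9999`).

```python
# Word counts over a socket stream, queried with the ordinary DataFrame API.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("hello-streaming").getOrCreate()

lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```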
Databricks Setup for Spark Structured Streaming
112 views · 1 month ago
In this tutorial, we’ll guide you through setting up Databricks for Spark Structured Streaming, enabling you to start building and running real-time streaming applications with ease. Databricks offers a powerful platform for big data processing, and Spark Structured Streaming makes it easy to process streaming data with Spark’s DataFrame API. By the end of this video, you’ll have your environme...
Kafka Consumer Tutorial - Complete Guide with Code Example
587 views · 1 month ago
In this in-depth Kafka Consumer tutorial, we’ll walk through everything you need to know to start building and configuring Kafka consumer applications. From understanding core concepts to exploring detailed configurations and implementing code, this video is your one-stop guide to Kafka consumers. Here's what you'll learn: Kafka Consumer Basics: Get an overview of Kafka consumers, how they work...
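A minimal consumer sketch using the confluent-kafka Python client; the broker address, topic, and group id are assumptions rather than the video's exact code.

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "demo-group",
    "auto.offset.reset": "earliest",  # start from the beginning if no offset
})
consumer.subscribe(["events"])

try:
    while True:
        msg = consumer.poll(1.0)  # wait up to 1 s for a record
        if msg is None:
            continue
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        print(msg.topic(), msg.partition(), msg.offset(),
              msg.value().decode("utf-8"))
finally:
    consumer.close()  # commits final offsets and leaves the group
```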
Kafka Producer Tutorial - Complete Guide with Code Example
68 views · 1 month ago
Welcome to this comprehensive Kafka Producer tutorial! In this video, we’ll dive deep into the fundamentals of Kafka producers and cover everything you need to know to get started building your own producer applications. Here's what we'll cover: Kafka Producer Basics: Learn what a Kafka producer is and how it fits into the Kafka ecosystem. Producer Workflow: Understand the steps for sending mes...
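A minimal producer sketch with the confluent-kafka Python client, including the asynchronous delivery-callback workflow the video describes; broker and topic names are assumptions.

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Invoked asynchronously once the broker acks (or rejects) the record.
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()}[{msg.partition()}]@{msg.offset()}")

for i in range(10):
    producer.produce("events", key=str(i), value=f"message-{i}",
                     callback=on_delivery)
    producer.poll(0)  # serve delivery callbacks from earlier produce calls

producer.flush()  # block until every buffered message is delivered
```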
Setting Confluent Cloud for Kafka and walkthrough
222 views · 2 months ago
☁️ Setting Up Confluent Cloud for Kafka | Spark Structured Streaming Series ☁️ In this video, we’re walking through the steps to set up Confluent Cloud for a seamless Kafka experience! Confluent Cloud offers a fully managed Kafka service, making it easier than ever to get started with real-time streaming without the hassle of self-managing Kafka infrastructure. Join me as we cover everything yo...
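A sketch of what a client config for Confluent Cloud typically looks like once the cluster is created; the endpoint and API key/secret below are placeholders you copy from the cluster's client-settings page.

```python
from confluent_kafka import Producer

conf = {
    "bootstrap.servers": "<your-cluster>.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<API_KEY>",     # placeholder
    "sasl.password": "<API_SECRET>",  # placeholder
}

producer = Producer(conf)
producer.produce("demo-topic", value=b"hello from Confluent Cloud")
producer.flush()
```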
Kafka Fundamentals Part-2
41 views · 2 months ago
In this video, we’ll dive into the essential roles of Kafka Producers and Consumers, the backbone of any Kafka-powered streaming application. Whether you're just starting with Kafka or brushing up on streaming concepts, this session will break down how data is sent to and retrieved from Kafka, making real-time streaming possible. What We’ll Cover: Kafka Producers: Learn how Kafka Producers send ...
Kafka Fundamentals Part -1
102 views · 2 months ago
🦅 Bird's Eye View of Kafka Components | Spark Structured Streaming Series 🦅 In this video, we’ll take a high-level look at the critical components that make Apache Kafka a robust and reliable platform for real-time data streaming. Whether you're new to Kafka or want to solidify your understanding of its architecture, this video will provide a clear overview of Kafka’s inner workings and how eac...
Understanding Apache Kafka for Real-Time Streaming
114 views · 2 months ago
🎬 Understanding Apache Kafka for Real-Time Streaming | Spark Structured Streaming Series 🎬 In this video, we’ll explore the basics of Apache Kafka to understand why it’s become the go-to solution for real-time data streaming. Whether you’re new to streaming or looking to expand your data engineering skills, this session will introduce the core concepts of Kafka and how it powers modern streamin...
Spark Structured Streaming with Kafka playlist launch
233 views · 2 months ago
🔥 Welcome to my new YouTube series: “Spark Structured Streaming with Kafka!” 🔥 In this series, we’re diving deep into the powerful combination of Apache Kafka and Spark Structured Streaming to master real-time data processing. 🚀 Get ready to learn all about building scalable, fault-tolerant streaming applications for real-world scenarios like financial transactions, fraud detection, and more! Wh...
DBT Tutorial: Snapshot - SCD type 2 in DBT
564 views · 4 months ago
DBT Tutorial: DBT Tests | Generic and Singular Tests
1.1K views · 10 months ago
DBT Tutorial: How to generate automatic documentation in DBT
694 views · 10 months ago
DBT Tutorial: How to use Target | Deploy project to different environment
410 views · 10 months ago
DBT Tutorial: Templating using Jinja
992 views · 10 months ago
DBT Tutorial: Project and Environment variables
1.3K views · 11 months ago
DBT Tutorial: Incremental Model | Updates, Appends, Merge
4.6K views · 11 months ago
DBT Tutorial : How to structure DBT project
722 views · 11 months ago
DBT Tutorial : How does DBT run your query ?
754 views · 11 months ago
DBT Tutorial : Everything you need to know about Sources and Models
1K views · 11 months ago
DBT Tutorial : Setting up your first dbt project
1.5K views · 11 months ago
Spark Skewed Data Problem: How to Fix it Like a Pro
1.2K views · 1 year ago
How to add new columns to the existing incremental model
Are you getting an error while adding the new column in the configuration?
But not at column level
Yes, but dbt Cloud has that.
Super simple, great video
Glad you liked it, please share it with your friends!
Is this playlist enough to get full dbt knowledge?
Yes, watch the complete playlist. You will have more than enough knowledge to complete projects.
Your way of explaining is really good.
Glad you enjoyed it! Do share with your network.
I need a complete video on PySpark
Sure, will add more
Can you create a project video where IoT device data is processed using Kafka streams in real time? That would be great. Thanks in advance 😊
Thanks for the idea! Will try to create some project videos.
I have a query that you might be able to solve. I am saving data from Kafka to TimescaleDB, but for each offset's messages I need to query the DB to get the userId associated with the IoT sensor. So one query is executed for each offset processed, causing a max-connections error. Any solution for that? (For now I added Redis + connection pooling, but I don't think it will solve it in the long term.) 2. As the single table grows to 30-40 GB of data, inserts get slower in TimescaleDB; what should we do to make them fast?
Thanks in advance
You should try to batch your queries (the task is to minimize the DB calls), or you can copy the data from the DB to Databricks. Check out this one: ua-cam.com/video/n0RS7DB_y9s/v-deo.html
Just subscribed to your channel
Thank you! Please share the playlist with your LinkedIn network, which will help this channel grow.
Nice video bro
Thank you so much
It's short and sweet and very descriptive. I installed as per the video, but I encountered an error: Error from git --help: Could not find command, ensure it is in the user's PATH and that the user has permissions to run it: "git". Please let me know how to resolve this error.
Thank you. The git path is not set properly; check out this one: ua-cam.com/video/lt9oDAvpG4I/v-deo.html. Please share the playlist with your network, which will help this channel grow.
@@anirvandecodes Thank you so much
Great content, thank you for sharing! Special respect for the GitHub link
Thank you! Please share the playlist with your LinkedIn network so that it reaches a wider audience.
Hi, how do I set up Kafka? Do we have any video on this?
Yes, check out this video to set up Kafka on Confluent Cloud: ua-cam.com/video/miN4WLiJnRE/v-deo.html Playlist link: ua-cam.com/play/PLGCTB_rNVNUNbuEY4kW6lf9El8B2yiWEo.html
Hey, great series, thanks. How can I make the producer produce faster?
There are a few configuration changes you can make. Batching: set linger.ms (5-50 ms) and increase batch.size (32-128 KB). Serialization: opt for efficient formats like Avro or Protobuf. Partitions & proximity: add more partitions and deploy near the Kafka brokers. In production, people generally use more scalable solutions than a plain Python producer app; check out this: docs.confluent.io/platform/current/connect/index.html Do share the playlist with your LinkedIn community.
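To make the batching advice above concrete, a hedged sketch with the confluent-kafka Python client (values are illustrative; batch.size needs a reasonably recent librdkafka/confluent-kafka build):

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",  # assumption
    "linger.ms": 20,            # let records accumulate up to 20 ms per batch
    "batch.size": 65536,        # allow larger (64 KB) batches, as suggested
    "compression.type": "lz4",  # cheap compression also lifts throughput
})
```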
14 completed ❤
You are making great progress! Please share with your friends and colleagues.
13 completed ❤
12 completed ❤
11 completed ❤
10 completed ❤
9 completed ❤
8 completed ❤
6 completed ❤
5 completed ❤
4 completed ❤
3 completed ❤
3 completed ❤
2 completed ❤
1 completed ❤
I have copied the yml file into the staging and marts folders, and I am getting a conflict asking to rename the yml sources. How do we effectively define sources in the models?
Can you share the complete error text and project structure?
@@anirvandecodes So in your video you pasted the yml file containing the sources into all 3 folders. Since the source is the same for all 3 files, I just pasted the model sql files inside the folders and kept the yml file outside the folders, and this resolved the error. I believe with the new dbt version you cannot have 2 yml files with the same source referencing the same table at the same folder level. Currently my folder structure looks like: models/staging/staging_employee_details.sql, models/intermediate/intermediate_employee_details.sql, models/marts/marts_employee_details.sql, models/employee_source.yml. In the video you pasted the yml file into each of the 3 folders (staging, intermediate, marts), which gives a naming_conflict_error. Your videos have been very informative; I went through the whole playlist while struggling to install dbt on my system and understand it. Thank you so much! 😄😄
@ I think you might have the same source name mentioned in two places; take a look into that.
@@anirvandecodes dbt found two sources with the name "employee_source_EMPLOYEE". Since these resources have the same name, dbt will be unable to find the correct resource when looking for source("employee_source", "EMPLOYEE"). To fix this, change the name of one of these resources: - source.dbt_complete_project.employee_source.EMPLOYEE (models\marts\marts_employee_source.yml) - source.dbt_complete_project.employee_source.EMPLOYEE (models\staging\stg_employee_source.yml)
Should the source name always be unique?
Bro, what if I don't want to share my data with Confluent? Can we do the Confluent Kafka setup on premises?
Absolutely. They call it self-managed Kafka; check this out: www.confluent.io/get-started/?product=self-managed
Yes, you can do that. For that you need to install the Confluent Platform on premises, and you have to install and maintain the Kafka clusters and other artifacts by yourself. This is the self-managed offering.
Best dbt playlist, man! I searched a lot throughout YouTube; no one comes close to this clarity of explanation!
Made my day, thank you! Do share with your network.
@@anirvandecodes you deserve it man!
How to see the column lineage?
dbt Core does not have any out-of-the-box column-level lineage. You can explore column lineage in dbt Cloud, or check out this one: tobikodata.com/column_level_lineage_for_dbt.html
Hi @anirvan, thanks for your detailed explanation of dbt concepts, which has helped me a lot.
Glad to hear that! Please share the content with your network.
Completed the tutorials, and I loved them. Please create more tutorial playlists for more topics.
Thank you for the support. Yes, I will be publishing content on Spark Structured Streaming with Kafka.
Loved your video; it cleared my doubts about sources and models and how we create sources.
Glad it was helpful! Do share with your network.
Hi, I ran dbt debug from the command prompt and it worked well, but when I run it from PyCharm I get an error: The term 'dbt' is not recognized as the name of a cmdlet, function, script file.
Looks like this is a PyCharm path-related issue. Try to check whether the path is picked up properly in PyCharm, or select a different terminal such as Git Bash; you can get more info on Google.
This error generally comes when the path is not added to the system. Try Stack Overflow or ChatGPT, and you can also try it with Git Bash.
Hey my man, just wanna say thanks for this whole series you did. Extremely helpful to people who are specifically looking for guidance in this new tool. Really appreciate your hard work man.
Thank you so much, it really made my day :)
Hello, I have a question: dbt doesn't recognize my model as incremental. I am using incremental modeling to take a snapshot of a table's row count and insert it to build a time-series table containing the row count for every day.
I will upload one video on snapshots soon, check that out!
How to achieve an incremental insert in dbt without allowing duplicates based on specific columns?
You can apply DISTINCT in SQL to remove the duplicates, or use any other strategy to remove the duplicates.
Please teach Snowflake-dbt integration and how dbt works in the entire SCD type 2 process. Thank you.
Sure, will create one video on it.
Man, amazing work. Can't wait... Subscribed! Do keep the videos coming, please!
Thanks! Will do!
I was looking for this and you are like a saviour. Thanks
Glad I could help
Hiii
hello
Are you teaching dbt?
Yes, I have a complete dbt playlist here: ua-cam.com/play/PLGCTB_rNVNUON4dyWb626R4-zrLtYfVLa.html
Can you please share the dbt models as well?
Sorry, I lost the model file.
Awesome tutorials, keep the good work going! When can we expect tutorials on other tools like Airflow, Airbyte, etc.?
Thank you so much. I have two more dbt videos to complete the playlist; will plan after that.
Nice explanation
Keep watching
Very nicely explained
Thank you so much 🙂
How to deploy code from DEV to QA to PRD? Please make a video on this. Thank you.
Yes, I am in the process of making a video on how to deploy a dbt project to the cloud. Stay tuned!
Hey Anirvan, thanks for clearly explaining. I am currently learning dbt and I came across this question: can we keep multiple WHERE conditions in an incremental load?
Yes, definitely. Think of it as a SQL query with which you are filtering the data.
@@anirvandecodes Hello Anirvan, any code snippet or format suggestion from your end?