The Data Guy
  • 411
  • 530 770

Videos

Snowflake Vs. AWS RedShift Vs. GCP BigQuery Vs. Azure Synapse for Data Warehousing!
422 views · 1 day ago
In this video, I'll compare and contrast some of the most popular data warehouse providers so you can choose the right tool for your use case! Join My Discord for Any Questions or Code: discord.gg/JkjvyYmFcx
How to Run Talend Tasks Using Apache Airflow and Create a Talend Operator!
115 views · 1 day ago
In this video, I'll show you how you can leverage the Talend REST API to trigger tasks in Talend from Apache Airflow! Join My Discord for Any Questions or Code: discord.gg/JkjvyYmFcx
How to Collect and Visualize Lineage Data from your Data Pipelines with Apache Airflow!
310 views · 14 days ago
In this video, I'll show you how you can collect and visualize lineage data generated by your Apache Airflow DAGs, so that you can get a 360-degree view of your data! Join My Discord for Any Questions or Code: discord.gg/JkjvyYmFcx
How to Set Up a Data Lake in Production! Data Lake Best Practices Guide
360 views · 14 days ago
In this video, I'll give you a full guide to all the things you'll need to consider when designing your own data lake to prevent it from becoming a data swamp! Join My Discord for Any Questions or Code: discord.gg/JkjvyYmFcx
How to Use Polars, the Modern Pandas Alternative! Getting Started with Polars for Python!
338 views · 14 days ago
In this video, I'll show you how you can get started with a modern alternative to Pandas, Polars! Join My Discord for Any Questions or Code: discord.gg/JkjvyYmFcx
How to Build a Production ML Pipeline with Apache Airflow, Databricks, Kafka, and MLFlow!
407 views · 21 days ago
In this video, I'll show you how you can build a complete MLOps pipeline to train a model on an ongoing basis using production-grade tools, so you can build a model that is up to snuff with modern organizations! Join My Discord for Any Questions or Code: discord.gg/JkjvyYmFcx
How to Use Apache Flink and Apache Kafka to Do Real Time Stream Processing!
775 views · 21 days ago
In this video, I'll show you how you can use Apache Flink and Apache Kafka to do real-time stream processing!
Data Engineer Zero to Hero Guide!
572 views · 21 days ago
In this video, I'll give you a roadmap of all the things you'll need to learn to get started in Data Engineering!
How to Develop Spark Scripts Locally Before Deploying Them to a Databricks Cluster!
389 views · 1 month ago
In this video, I'll show you how you can develop a Spark script locally before deploying the code to a remote Databricks cluster.
How to Build an ETL Pipeline with Airbyte, Apache Airflow, and Snowflake!
340 views · 1 month ago
In this video, I'll show you how you can use Airbyte to pull data out of Salesforce before uploading it into Snowflake, all managed by Airflow!
How to Use AWS Lambda and Apache Airflow to Create an ETL and Machine Learning Pipeline!
272 views · 1 month ago
In this video, I'll show you how you can use Apache Airflow and AWS Lambda functions to create a robust, parallel ETL and Machine Learning pipeline!
How to Build an ELT Pipeline with AWS Redshift, Apache Airflow and dbt!
952 views · 1 month ago
In this video, I'll show you how you can build an Extract, Load, Transform (ELT) pipeline with AWS Redshift, Apache Airflow, and dbt!
dbt Core Vs. SQLMesh for SQL Transformations!
701 views · 1 month ago
In this video, I'll break down two of the biggest fully open source tools on the market for SQL transformations, dbt Core and SQLMesh! www.getdbt.com/ sqlmesh.readthedocs.io/en/stable/
How to Build Auto-Refreshing Analytics Pipelines with Microsoft SQL Server, PowerBI & Apache Airflow
294 views · 1 month ago
In this video, I'll show you how you can build an auto-refreshing PowerBI dashboard using Airflow, MS SQL Server, and Azure Blob Storage!
Apache Flink Vs. Apache Spark Vs. Apache Storm: Which Data Processing Tool is Right for You!
542 views · 1 month ago
Collibra Vs. Monte Carlo Vs. Atlan: Data Lineage/Catalog Tools Compared!
289 views · 1 month ago
How to Build an ETL Pipeline to Process Google Ads Data with Apache Airflow and BigQuery!
367 views · 1 month ago
How to Build an ETL Pipeline with a Couchbase Database and Apache Airflow!
166 views · 1 month ago
Apache Cassandra Vs. Redis Vs. MongoDB: NoSQL Databases Compared!
523 views · 1 month ago
How to Build an ETL Pipeline with Apache Kafka and RedPanda Connect!
317 views · 1 month ago
OLTP vs. OLAP Databases! OLTP and OLAP Databases Explained and Compared!
994 views · 2 months ago
How to Get Started with SQLMesh, a dbt Alternative!
1.1K views · 2 months ago
How to Build an ETL Pipeline with Google BigQuery and Apache Airflow!
520 views · 2 months ago
How to Use Ray and Apache Airflow for Heavy ML/AI Processing Workloads!
284 views · 2 months ago
How to Get Started with Soda for Data Quality Checks!
552 views · 2 months ago
Data Engineering Interview Guide! How to Get a Data Engineering Job!
837 views · 2 months ago
How to Build a BigQuery Ingestion Pipeline from APIs, SFTP Servers, and Pub/Sub with Airflow!
267 views · 2 months ago
End-to-End Parallel ETL Pipeline with Airflow, Snowflake, and S3 Buckets!
478 views · 3 months ago
Best Practices for dbt With Real World Examples!
663 views · 3 months ago

COMMENTS

  • @goldydoesyt · 1 day ago

    Awesome vid

  • @joestrinka9738 · 1 day ago

    This is just reading the website with added motion sickness. Terrible content.

  • @techienomadiso8970 · 7 days ago

    Kafka should be compared to RabbitMQ, while Flink should be compared to Spark.

  • @uchihadayne6506 · 7 days ago

    We couldn’t see the dog! 🥹 Thanks for the vid. Have you ever looked at Palantir as an option for an overall EDP?

  • @KKKBarracuda · 7 days ago

    Thank you for the video. I am just starting to learn Airflow and this is great knowledge. It would be great if you could do an in-depth video about Airflow executors, and another in-depth video about Airflow architecture, including the secrets backend and the metadata database. I'm a bit confused about the purpose and practical use of a secrets backend, since there is already a database.

  • @ccc_ccc789 · 8 days ago

    Thanks! You're doing a good job!

  • @cdgtopnp · 8 days ago

    Man, you provided so much info but documented none of it. If the background were a slide deck instead of a static image, the video would feel much more structured and engaging. Nonetheless, you are a great orator and I hope your channel grows exponentially!!

  • @sblowes · 9 days ago

    Great explanation. Please invest in a can of WD-40.

  • @michael_day · 11 days ago

    The more I hear about SQLMesh, the more I'm convinced. My org much prefers open source and thus we need a fuller solution from the get-go.

  • @yayif7699 · 12 days ago

    Hi this is helpful!! Do you have a github profile?

  • @hugoclarke3284 · 13 days ago

    Are ya winning JSON

  • @FhariyaAseem · 13 days ago

    Amazing!!

  • @santoshkumargouda6033 · 15 days ago

    Please make a video on medallion architectures in data pipelines.

  • @nixondanielhutahaean44 · 16 days ago

    Can you explain how Airflow is used in production? Like how teams deploy DAGs, collaborate on building DAGs, and other production things?

    • @razor-b2d · 3 days ago

      Different clouds have their own managed Airflow versions, e.g., Google Cloud Composer.

  • @murugesanrajasekaran5032 · 17 days ago

    Thanks. Is there any GitHub link you can share to get the code snippets used in this example

  • @ballettyishappy2254 · 17 days ago

    thank you sir

  • @shresthaupadhyay5739 · 17 days ago

    Hey, curious me wants to know: can we transfer 150 million records of data with that?

  • @JacobThorwarth · 17 days ago

    Just getting started and developing a huge interest in the field of Data Engineering. I never leave comments, but I think your content is simply amazing and invaluable. I have learnt so incredibly much from you, and I cannot thank you enough for your effort and insight. Greetings from Germany ❤

  • @razor-b2d · 18 days ago

    Can you run it

  • @ken-zlai · 18 days ago

    Excellent video but the audio is a bit quiet :)

  • @whramijg · 19 days ago

    so you came up with this all by yourself?

  • @groundingtiming · 22 days ago

    Yes, could you please upload it to Git? It makes it easier to follow along.

  • @Achilles585 · 22 days ago

    Could you upload it to Git?

  • @luisrc99 · 24 days ago

    Thanks for this video! 🔥🔥🔥

  • @rahuldsouza1985 · 25 days ago

    What about DB2?

  • @canhnguyen9960 · 25 days ago

    Can you give me the source code in the video?

  • @jaimernandez94 · 26 days ago

    Hello, I use CDK to create all my infra. Then I have an Airflow DAG that runs some tasks, including a Lambda function created by CDK. I'm able to trigger the Lambda from the DAG, but I'm not able to wait for the Lambda's callback so that the rest of the DAG's tasks continue only after the Lambda finishes. Any idea?

    • @thedataguygeorge · 17 days ago

      You should use a sensor to detect the completion of the Lambda job as an intermediary step, since the operator itself won't wait.

    • @jaimernandez94 · 16 days ago

      @thedataguygeorge hey, thanks a lot. Btw, I wasn't able to do this with any type of sensor whatsoever. The solution for me was to override the LambdaInvokeFunctionOperator in order to be able to modify the tcp_keepalive, read_timeout and connect_timeout. A bit weird that there is not something out of the box that allows us to do it without complicating things this much. Happy to share if you are interested!

  • @rubayetsabbirfaruque3629 · 27 days ago

    I keep running into an error with the execution path. Even though I entered the container with astro dev bash and saw the dbt_venv, Cosmos can't seem to find it.

  • @BinPham-x1k · 29 days ago

    you da goat my guy

  • @itzcallmepro4963 · 29 days ago

    Can you recommend resources for topics such as Airflow, dbt, and Spark?

    • @thedataguygeorge · 17 days ago

      Not sure about dbt/spark but for Airflow check out this link! academy.astronomer.io/

  • @jeffrey6124 · 29 days ago

    Wazzup! Captain America with eye glasses 🤓 I was searching for "Data Architecture" and Google recommended your video .... only your video!!! 🤩 say hi to Data Dog 😍

    • @thedataguygeorge · 17 days ago

      Wow, crazy, I'm the only one out there! Thanks so much for the kind words, the Data Dog says hi right back!

  • @sohanmsoni · 1 month ago

    Thanks for the detailed comparison video, but now I would like to go deeper into Kafka Streams vs. Flink. Are they competing services, and from the same owner (Apache) at that? And on a lighter note, time to service that chair or get a new one 😊

    • @thedataguygeorge · 17 days ago

      Apache is just an umbrella open source organization, so semi-competing projects but also designed differently, and thank you, saving up!

  • @jay_wright_thats_right · 1 month ago

    I feel like I'm being sold something. No doubt, thank you for your efforts.

  • @boseashish · 1 month ago

    God bless you!!! may you get success

  • @LuigiMolinaro · 1 month ago

    You need a new chair :D

    • @thedataguygeorge · 17 days ago

      Hahahaha yes this comment section has made that clear

  • @PengyuHou · 1 month ago

    Hello this is Pengyu from the Chronon team. Great job on explaining the concepts and even getting a demo working!

    • @thedataguygeorge · 17 days ago

      Thanks so much Pengyu, thanks for making a great project!

  • @infamousprince88 · 1 month ago

    Very useful information! Are there end to end (or zero to hero) videos you’d recommend to get up to speed with this domain? I’ve been in and around data analytics for some years and have the Python, SQL, BI/Tableau portion. Just would like to see the Data Modeling, DBT, engineering, data source integration aspects

  • @Wakeful_Being · 1 month ago

    Also, I think the start and stop bits might be switched at 4:26.

  • @Wakeful_Being · 1 month ago

    thank you!!

  • @joefitzy · 1 month ago

    Thanks for the video, but when it came to DBT/Cosmos it was pretty unclear what was happening.

    • @thedataguygeorge · 17 days ago

      Apologies, didn't spend too much time since I have other videos on it but great point, will improve in the future

  • @user-xx3zp3qr1k · 1 month ago

    Nice guide! But I cannot figure out how to customize this installation. I would like to deploy Airbyte on my Postgres. My previous installation was through a docker-compose file, and now everything has changed! What can I do to "personalize" this tool like I did before with the compose file? I can't find much in the official documentation! Thank you very much!

    • @thedataguygeorge · 17 days ago

      You can customize the docker compose file for Airbyte too! They don't have great docs on how to do it but follows the same paradigms as other dockerized apps

  • @anibara · 1 month ago

    I am just surprised with the quality of content you put out on regular basis. Thanks a lot and yes please do more AWS content.

  • @phethosilas8781 · 1 month ago

    Is there a way to reach you? I would love to be mentored by you. All the way from South Africa

  • @PhilipPetersen-c1j · 1 month ago

    Shoutout Mr data guy 🙌

  • @dongtandung9671 · 1 month ago

    do you have this on a repo so that we can take a look at the whole thing?

  • @itzcallmepro4963 · 1 month ago

    Great! Any good sources to learn Airflow in depth?

  • @TheMustafa-b8j · 1 month ago

    Nice content! Can you share the link to the GitHub repo for the project?

  • @hungdohoangminh2064 · 1 month ago

    Can we do that with a PostgreSQL Flexible Server within a VNet?

  • @alonbrim · 1 month ago

    Great video! Thanks a lot for the clear explanations!
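In the Lambda thread above, the suggested fix is to add a sensor that waits for the Lambda job to finish, because the invoking operator itself won't wait. A sensor is essentially a poke loop: check a condition, sleep, and repeat until the condition holds or a timeout expires. The sketch below shows that loop in plain Python. `wait_until_done` and `is_done` are hypothetical names for illustration, not Airflow's API, although Airflow sensors do expose similarly named `poke_interval` and `timeout` parameters. In a real DAG, `is_done` would be whatever check tells you the Lambda has finished; the thread doesn't say what that check is, so it is left abstract here.

```python
import time
from typing import Callable


def wait_until_done(
    is_done: Callable[[], bool],
    poke_interval: float = 5.0,
    timeout: float = 300.0,
) -> int:
    """Hypothetical helper: poll is_done() until it returns True (a sensor-style poke loop).

    Returns how many pokes it took. Raises TimeoutError if the next sleep
    would run past the deadline without the condition becoming true.
    """
    deadline = time.monotonic() + timeout
    pokes = 0
    while True:
        pokes += 1
        if is_done():  # one "poke": check whether the upstream job has finished
            return pokes
        if time.monotonic() + poke_interval > deadline:
            raise TimeoutError(f"condition not met after {pokes} pokes")
        time.sleep(poke_interval)  # wait before checking again


# Simulated job that reports "finished" on the third check.
states = iter([False, False, True])
print(wait_until_done(lambda: next(states), poke_interval=0.01, timeout=1.0))  # 3
```

The point of the design is that the wait lives in its own step: the invoking task can return immediately, and the downstream tasks only run once the poke loop sees the job as complete or fails on timeout.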