The Data Guy
United States
Joined Apr 6, 2013
Your one stop shop for all your Data needs! Have a hard problem you'd like solved but don't know how? Send it to me at gyatesofficial@gmail.com and I'll make a video on it!
www.linkedin.com/in/george-yates/
How to Run Apache Airflow in Production! Best Practices for Running Apache Airflow at Scale!
In this video, I'll go through all the best practices you'll want to consider when choosing how to run and manage Airflow in production, and at large scale!
599 views
Videos
Snowflake Vs. AWS RedShift Vs. GCP BigQuery Vs. Azure Synapse for Data Warehousing!
422 views · 1 day ago
In this video, I'll compare and contrast some of the most popular data warehouse providers so you can choose the right tool for your use case! Join My Discord for Any Questions or Code: discord.gg/JkjvyYmFcx
How to Run Talend Tasks Using Apache Airflow and Create a Talend Operator!
115 views · 1 day ago
In this video, I'll show you how you can leverage the Talend REST API to trigger tasks in Talend from Apache Airflow!
How to Collect and Visualize Lineage Data from your Data Pipelines with Apache Airflow!
310 views · 14 days ago
In this video, I'll show you how you can collect and visualize lineage data generated by your Apache Airflow DAGs, so that you can get a 360-degree view of your data!
How to Set Up a Data Lake in Production! Data Lake Best Practices Guide
360 views · 14 days ago
In this video, I'll give you a full guide to everything you'll need to consider when designing your own data lake to keep it from becoming a data swamp!
How to Use Polars, the Modern Pandas Alternative! Getting Started with Polars for Python!
338 views · 14 days ago
In this video, I'll show you how you can get started with Polars, a modern alternative to Pandas!
How to Build a Production ML Pipeline with Apache Airflow, Databricks, Kafka, and MLFlow!
407 views · 21 days ago
In this video, I'll show you how you can build a complete MLOps pipeline to train a model on an ongoing basis using production-grade tools, so you can build a model that is up to snuff with modern organizations!
How to Use Apache Flink and Apache Kafka to Do Real Time Stream Processing!
775 views · 21 days ago
In this video, I'll show you how you can use Apache Flink and Apache Kafka to do real-time stream processing!
Data Engineer Zero to Hero Guide!
572 views · 21 days ago
In this video, I'll give you a roadmap of all the things you'll need to learn to get started in Data Engineering!
How to Develop Spark Scripts Locally Before Deploying Them to a Databricks Cluster!
389 views · 1 month ago
In this video, I'll show you how you can develop a Spark script locally before deploying the code to a remote Databricks cluster.
How to Build an ETL Pipeline with Airbyte, Apache Airflow, and Snowflake!
340 views · 1 month ago
In this video, I'll show you how you can use Airbyte to pull data out of Salesforce before uploading it into Snowflake, all managed by Airflow!
How to Use AWS Lambda and Apache Airflow to Create an ETL and Machine Learning Pipeline!
272 views · 1 month ago
In this video, I'll show you how you can use Apache Airflow and AWS Lambda functions to create a robust, parallel ETL and Machine Learning pipeline!
How to Build an ELT Pipeline with AWS Redshift, Apache Airflow and dbt!
952 views · 1 month ago
In this video, I'll show you how you can build an Extract, Load, Transform pipeline with AWS, Apache Airflow, and dbt!
dbt Core Vs. SQLMesh for SQL Transformations!
701 views · 1 month ago
In this video, I'll break down two of the biggest fully open-source tools on the market for SQL transformations, dbt Core and SQLMesh! www.getdbt.com/ sqlmesh.readthedocs.io/en/stable/
How to Build Auto-Refreshing Analytics Pipelines with Microsoft SQL Server, PowerBI & Apache Airflow
294 views · 1 month ago
In this video, I'll show you how you can build an auto-refreshing PowerBI dashboard using Airflow, MS SQL Server, and Azure Blob Storage!
Apache Flink Vs. Apache Spark Vs. Apache Storm: Which Data Processing Tool is Right for You!
542 views · 1 month ago
Collibra Vs. Monte Carlo Vs. Atlan: Data Lineage/Catalog Tools Compared!
289 views · 1 month ago
How to Build an ETL Pipeline to Process Google Ads Data with Apache Airflow and BigQuery!
367 views · 1 month ago
How to Build an ETL Pipeline with a Couchbase Database and Apache Airflow!
166 views · 1 month ago
Apache Cassandra Vs. Redis Vs. MongoDB: NoSQL Databases Compared!
523 views · 1 month ago
How to Build an ETL Pipeline with Apache Kafka and RedPanda Connect!
317 views · 1 month ago
OLTP vs. OLAP Databases! OLTP and OLAP Databases Explained and Compared!
994 views · 2 months ago
How to Get Started with SQLMesh, a dbt Alternative!
1.1K views · 2 months ago
How to Build an ETL Pipeline with Google BigQuery and Apache Airflow!
520 views · 2 months ago
How to Use Ray and Apache Airflow for Heavy ML/AI Processing Workloads!
284 views · 2 months ago
How to Get Started with Soda for Data Quality Checks!
552 views · 2 months ago
Data Engineering Interview Guide! How to Get a Data Engineering Job!
837 views · 2 months ago
How to Build a BigQuery Ingestion Pipeline from APIs, SFTP Servers, and Pub/Sub with Airflow!
267 views · 2 months ago
End-to-End Parallel ETL Pipeline with Airflow, Snowflake, and S3 Buckets!
478 views · 3 months ago
Best Practices for dbt With Real World Examples!
663 views · 3 months ago
Awesome vid
This is just reading the website with added motion sickness. Terrible content.
Kafka should be compared to RabbitMQ, while Flink should be compared to Spark.
We couldn’t see the dog! 🥹 thanks for the vid. Ever look at Palantir as an option to an overall edp?
Thank you for the video. I am just starting to learn Airflow and this is great knowledge. It would be great if you could do an in-depth video on Airflow executors and another in-depth video on Airflow architecture, including the secrets backend and metadata database. I'm kind of confused about the purpose and practical use of a secrets backend, since there is already a database.
Thanks! You're doing a good job!
Man, you provided so much info but documented none of it. If the background were a slide deck instead of a static image, the video would feel much more structured and engaging. Nonetheless, you are a great orator, and I hope your channel grows exponentially!!
Great explanation. Please invest in a can of WD-40.
The more I hear about SQLMesh, the more I'm convinced. My org much prefers open source and thus we need a fuller solution from the get-go.
Hi this is helpful!! Do you have a github profile?
Are ya winning JSON
Amazing!!
Please make a video on medallion architectures in data pipelines.
Can you explain how Airflow is run in production? Like how teams deploy DAGs, collaborate on building DAGs, and other production things.
Different clouds have their own managed Airflow versions, e.g. Google Cloud Composer.
Thanks. Is there any GitHub link you can share to get the code snippets used in this example
thank you sir
You're very welcome!
Hey, curious me wants to know: can we transfer 150 million records of data with that?
Definitely!
Just getting started and developing a huge interest in the field of Data Engineering. I never leave comments, but I think your content is simply amazing and invaluable. I have learned so much from you, and I cannot thank you enough for your effort and insight. Greetings from Germany ❤
Thank you so much from New York!
Can you run it?
Excellent video but the audio is a bit quiet :)
Thank you for the heads up!
so you came up with this all by yourself?
All by reading several articles online lol
Yes, could you please upload it to Git? It makes it easier to follow along.
Could you upload it to Git?
Thanks for this video! 🔥🔥🔥
No problem, my pleasure
What about DB2?
Never heard of it!
Can you give me the source code in the video?
Hello, I use CDK to create all my infra. Then I have an Airflow DAG that runs some tasks, including a Lambda function created by CDK. I'm able to trigger the Lambda from the DAG, but I'm not able to wait for the Lambda's callback so that the rest of the tasks in the DAG continue only after the Lambda finishes. Any ideas?
You should use a sensor to detect the completion of the lambda job as an intermediary step since the operator itself won't wait
@thedataguygeorge hey, thanks a lot. Btw, I wasn't able to do this with any type of sensor whatsoever. The solution for me was to override the LambdaInvokeFunctionOperator in order to be able to modify the tcp_keepalive, read_timeout and connect_timeout. A bit weird that there is not something out of the box that allows us to do it without complicating things this much. Happy to share if you are interested!
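The sensor suggestion above boils down to a poke loop: check whether the Lambda's work is done, sleep, and check again until a timeout expires. Below is a minimal plain-Python sketch of that pattern, written without any Airflow or AWS calls so it doesn't assume a particular provider version. `wait_for_completion` and `fake_lambda_finished` are hypothetical names, and how a real job signals completion (an output file, a status record) is an assumption left to the reader.

```python
import time
from typing import Callable


def wait_for_completion(
    is_done: Callable[[], bool],
    poke_interval: float = 30.0,
    timeout: float = 900.0,
) -> bool:
    """Re-run `is_done` every `poke_interval` seconds until it returns True.

    Raises TimeoutError if the condition is still false after `timeout`
    seconds, similar to how a sensor fails when its timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poke_interval)


# Hypothetical stand-in for a real status check (e.g. looking for the
# output the Lambda writes when it finishes). It succeeds on the 3rd poke.
calls = {"n": 0}


def fake_lambda_finished() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_for_completion(fake_lambda_finished, poke_interval=0.01, timeout=5.0))  # True
```

In Airflow itself, a sensor (for example a custom `BaseSensorOperator` subclass whose `poke` method performs the check) plays the role of this loop, with `poke_interval` and `timeout` as its knobs.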
I keep running into an error with the execution path. Even though I entered the container with astro dev bash and saw the dbt_venv, Cosmos can't seem to find it.
Thank you so much and will do!
you da goat my guy
Thanks big dawg!
Can you recommend resources for topics such as Airflow, dbt, and Spark?
Not sure about dbt/spark but for Airflow check out this link! academy.astronomer.io/
Wazzup! Captain America with eye glasses 🤓 I was searching for "Data Architecture" and Google recommended your video .... only your video!!! 🤩 say hi to Data Dog 😍
Wow, crazy, I'm the only one out there! Thanks so much for the kind words, the Data Dog says hi right back!
Thanks for the detailed comparison video, but now I'd like to go deeper into Kafka Streams vs. Flink. Are they competing services, and from the same owner (Apache)? And on a lighter note, time to service that chair or get a new one 😊
Apache is just an umbrella open source organization, so semi-competing projects but also designed differently, and thank you, saving up!
I feel like I'm being sold something. No doubt, thank you for your efforts.
I don't work for either org, all unbiased!
God bless you!!! May you find success
Thank you so much!
You need a new chair :D
Hahahaha yes this comment section has made that clear
Hello this is Pengyu from the Chronon team. Great job on explaining the concepts and even getting a demo working!
Thanks so much Pengyu, thanks for making a great project!
Very useful information! Are there end to end (or zero to hero) videos you’d recommend to get up to speed with this domain? I’ve been in and around data analytics for some years and have the Python, SQL, BI/Tableau portion. Just would like to see the Data Modeling, DBT, engineering, data source integration aspects
Working on creating some myself right now!
Also, I think the start and stop bits might be switched @4:26
thank you!!
Thanks for the video, but when it came to DBT/Cosmos it was pretty unclear what was happening.
Apologies, didn't spend too much time since I have other videos on it but great point, will improve in the future
Nice guide! But I cannot figure out how to customize this installation. I would like to deploy Airbyte on my Postgres; my previous installation was through a docker-compose file, and now everything has changed! What can I do to "personalize" this tool like I did before with the compose file? I can't find much in the official documentation! Thank you very much!
You can customize the docker compose file for Airbyte too! They don't have great docs on how to do it but follows the same paradigms as other dockerized apps
I am just surprised with the quality of content you put out on regular basis. Thanks a lot and yes please do more AWS content.
Is there a way to reach you? I would love to be mentored by you. All the way from South Africa
Yes! Join the Data Guy discord I just created!
discord.gg/JkjvyYmFcx
Thank you
Shoutout Mr data guy 🙌
do you have this on a repo so that we can take a look at the whole thing?
Great, any good sources to learn Airflow in depth?
Nice content. Can you share the link to the project's GitHub repo?
Can we do that with a PostgreSQL flexible server within a VNet?
Definitely!
Great video ! Thanks a lot for the clear explanations!
Thanks, glad you found it helpful!