DataEngineerOne
United States
Joined Apr 10, 2020
DataEngineerOne is all about helping people understand how to do data engineering.
Handling Zip Files in Kedro Using the de1 python package!
In this video I demonstrate how to install and use the brand new `de1` package and the ZipFileDataSet in order to handle zip files in a pipeline.
Check out the package here: github.com/dataengineerone/de1-python
Views: 874
Videos
Let's Look at Kedro 0.17.0!
738 views · 3 years ago
Kedro was released last month with the newest minor version to date: 0.17.0! In this video we explore some of the newest features.
How to Lazily Evaluate Chunks of a Big Pandas DataFrame
1.8K views · 3 years ago
When we have a case where we wish to do a group by, usually we rely on a big beefy machine to do the processing. Using the technique described in this video, one can instead save on memory and save on resources by skillfully chunking the data one wishes to process, and cleaning up after oneself when done.
How To Make Project Docs in Kedro With Pydocs and `kedro build-docs`
908 views · 3 years ago
In this video we show a simple tutorial for creating documentation for your kedro projects. We'll be exploring how we can add Pydocs to our functions and how, using kedro's built in `build-docs` feature, we automatically can pull all of that information into a beautiful HTML documentation library.
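The docstrings that `kedro build-docs` pulls into the generated HTML are ordinary Python docstrings on your node functions. A minimal, hypothetical example (the function and field names are illustrative, not from the video):

```python
def clean_orders(orders):
    """Drop orders that are missing a customer ID.

    Args:
        orders: An iterable of order dicts, each possibly holding a
            "customer_id" key.

    Returns:
        A list of only the orders whose "customer_id" is present.
    """
    return [o for o in orders if o.get("customer_id") is not None]
```

Documentation generators render the docstring alongside the function signature, so the Args/Returns sections above become part of the project docs for free.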
Creating Custom Kedro Starters for Your Boilerplate Code
663 views · 3 years ago
In this video I show you how to create a custom starter so that you can share the joy of not having to write boilerplate code.
What is Kedro? Why is it useful? A Non-Technical Intro to Kedro
6K views · 4 years ago
In this video I explain what kedro is and why it is useful for non-technical people!
Deployable REST Enabled Data Pipelines with Flask, Docker, Kedro
2.4K views · 4 years ago
Turning your pipeline into a REST API has never been easier, thanks to Flask and Docker. In this video, I show you how!
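The Flask side of this setup can be sketched in a few lines: expose one POST endpoint that takes JSON parameters and triggers a pipeline run. This is an illustrative skeleton only; `run_pipeline` is a hypothetical stand-in for however you invoke your pipeline (e.g. via Kedro's session API), not the video's actual code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_pipeline(params):
    # Placeholder for a real pipeline run; here we just echo the
    # parameters back so the endpoint is self-contained.
    return {"status": "success", "params": params}

@app.route("/run", methods=["POST"])
def run():
    # Trigger the pipeline with JSON parameters from the request body
    result = run_pipeline(request.get_json(silent=True) or {})
    return jsonify(result)
```

Wrapping this app in a Docker image then gives you a deployable REST-enabled pipeline, which is the pattern the video walks through.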
How To Create Dataset Save and Load Hooks
699 views · 4 years ago
Dataset Save and Load hooks don't actually exist in Kedro! With this tutorial, I show you how to utilize Dataset Transformers in conjunction with existing hooks to create functions that can act as effective dataset hooks. TIMESTAMPS: 0:00 Introduction 0:22 There are no Dataset Hooks! 0:49 Why do we want them? 1:08 What are Transformers? 1:38 Writing a simple Transformer 3:02 Add the Transformer...
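The transformer pattern the video leans on can be illustrated without Kedro at all: an object that wraps a dataset's load and save callables, running arbitrary code before and after each I/O operation. The class below is a plain-Python sketch of that idea (the method signatures loosely mirror Kedro's old transformer interface but are not copied from it):

```python
class LoggingTransformer:
    """Wrap a dataset's load/save calls so extra code runs around each
    I/O operation, acting like a before/after dataset hook."""

    def load(self, name, load_fn):
        print(f"before load: {name}")
        data = load_fn()          # delegate to the real loader
        print(f"after load: {name}")
        return data

    def save(self, name, save_fn, data):
        print(f"before save: {name}")
        save_fn(data)             # delegate to the real saver
        print(f"after save: {name}")
```

Anything you would want in a "dataset hook" (timing, logging, validation) goes in the before/after sections around the delegated call.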
Using Streamlit To Make GUIs for your Kedro Parameters!
2.5K views · 4 years ago
In this video, we show you how to use streamlit to control your kedro parameters. Streamlit is a webapp generating library that allows you to modify parameters and view data on the fly. It's a fantastic system, and combining it with kedro makes it even better! Today's episode is brought to you by kedro.community, the newest open community, bringing data pipeliners from around the world, togethe...
Creating Shared Catalogs for your Kedro Projects on GitHub
608 views · 4 years ago
As kedro becomes more and more popular, the need to share your data catalog will become ever more likely. Thanks to kedro's new hooks, it makes it super easy to share catalog entries between project teams. In this episode, I show you how this can be accomplished. The code for today's video can be found here: gist.github.com/tamsanh/2075e293a089e76baa24cf29e3c566f1 TIMESTAMPS: 0:00 Introduction ...
Let's Look at Kedro 0.16.5 Release Notes
303 views · 4 years ago
The Newest Kedro was Released! 0.16.5 Congrats to the Kedro team for another awesome release. Snarky Canadian's "pyproject.toml" Explanation: snarky.ca/what-the-heck-is-pyproject-toml/ TIMESTAMPS: 0:00 Intro 0:40 Pipeline to Hooks Transition 1:44 Initial thoughts on Hooks 3:04 Standardizing pyproject.toml 4:00 Disabling Plugins with the .kedro.yml Configuration 4:31 Not Totally Backwards Compat...
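For context on the `pyproject.toml` standardization mentioned in the timestamps, a minimal file looks roughly like this. The `[build-system]` table is the PEP 518 part; the `[tool.kedro]` keys shown are illustrative of where Kedro later moved its project settings, not taken from the 0.16.5 release itself.

```toml
# Build metadata standardized by PEP 518
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

# Tool-specific settings live under [tool.*]; Kedro eventually moved
# its project configuration here (key names are illustrative)
[tool.kedro]
package_name = "my_project"
project_name = "My Project"
```

The point of the standard is that every tool reads its configuration from one well-known file instead of inventing its own (like the `.kedro.yml` discussed later in the video).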
The Complete Beginner's Guide to Kedro - How to use Kedro 0.16.4
19K views · 4 years ago
This video brought to you by kedro.community/! The newest and nicest place to talk and learn with other data pipeliners. This is the first video a kedro newbie should watch, if they wish to understand how to use kedro! In this complete guide, you'll learn about Pipelines, Nodes, DataSets, Catalogs, Parameters, and how you can leverage all of these in your kedro project. We'll be walking through...
How To Customize Your Kedro CLI Options
642 views · 4 years ago
In this video we cover how to add a custom CLI command for kedro! We add a "cool-run" command which will run multiple pipelines for us with a single run. You can use this method to create all sorts of different configurations for your pipelines. TIMESTAMPS: 0:00 Intro 0:28 We will be editing the kedro_cli.py file's click 1:46 Explaining how the normal 'run' command works 2:20 Overview of our new...
How to Create and Reuse Pipelines with "Package and Pull" CLI
520 views · 4 years ago
The Pipeline CLI command has great options to enable pipeline reuse. In this episode, we take a closer look on what pipeline reuse looks like, and the caveats of reuse. TIMESTAMPS: 0:00 Intro 0:39 Looking at the Kedro Pipeline Command 1:11 `kedro pipeline create` for Creating Pipelines 2:40 Prepare the Pipeline for Packaging 3:17 Quick Look at Other Pipeline Options 3:29 `kedro pipeline describ...
How to Get/Write Data from/to a SQL Database
2.2K views · 4 years ago
Data Engineering is a tough job, and it can be made tougher by complex, difficult to understand data pipelines. In this series, we will be covering Kedro and how to use it to make data pipelines easier to read, write, and maintain. In this video we cover: Accessing SQL Data: * Use SQLTableDataSet to load and save entire DataFrames * Use if_exists parameter to manage table behavior. * Use SQLQue...
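Under the hood, table-style SQL datasets behave like pandas' `to_sql`/`read_sql` pair, including the `if_exists` behavior mentioned above. A self-contained sketch using SQLite (the function names are illustrative, not SQLTableDataSet itself):

```python
import sqlite3
import pandas as pd

def save_table(df, table, con, if_exists="replace"):
    # Mirrors a table dataset's save: `if_exists` controls whether an
    # existing table is replaced, appended to, or raises an error.
    df.to_sql(table, con, if_exists=if_exists, index=False)

def load_table(table, con):
    # Mirrors a table dataset's load: read the whole table back.
    return pd.read_sql(f"SELECT * FROM {table}", con)
```

With `if_exists="append"` repeated saves accumulate rows, while the default `"replace"` gives you idempotent re-runs, which is usually what a pipeline wants.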
Finishing our SG API Pipeline with Chronocoding - How I Write Pipes Part V
397 views · 4 years ago
Parallelize Pipeline Processing With Sub Node Parallelization - How I Write Pipes Part IV
1.1K views · 4 years ago
Using Component Pipelines to Optimize Data Science Iteration - How I Write Pipes Part III
581 views · 4 years ago
Adding our New Nodes to Our Pipeline - How I Write Data Pipelines - Part II
846 views · 4 years ago
How I Write Data Pipelines - Part I
1.7K views · 4 years ago
How to Combine Multiple CSV Files into a Single DataFrame
3.7K views · 4 years ago
How to Setup PySpark for your Kedro Pipeline
2.3K views · 4 years ago
Let's Take a Look at the Kedro 0.16.3 Release Notes!
382 views · 4 years ago
How To Import Pipelines in Other Python Scripts
740 views · 4 years ago
How To Use a Parameter Range to Generate Pipelines Automatically
1.6K views · 4 years ago
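The core trick here is generating one pipeline (or one named variant plus its parameters) per value in a range, instead of hand-writing each one. A plain-Python sketch of that idea, with hypothetical names:

```python
def make_variants(base_name, param_name, values):
    """For each value, produce a (pipeline_name, params) pair.

    Sketches the pattern of generating pipeline variants from a
    parameter range rather than writing them out by hand.
    """
    return [
        (f"{base_name}_{param_name}_{v}", {param_name: v})
        for v in values
    ]
```

Each generated `(name, params)` pair can then be registered as its own pipeline, so adding a new parameter value means extending the range rather than copying code.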
How to Begin Profiling Your Data with Pandas Profiling
757 views · 4 years ago
Two Tricks to Optimize your Kedro Jupyter Flow
851 views · 4 years ago
Advanced Configuration with TemplatedConfigLoader
740 views · 4 years ago
How to Contribute New Code back to Kedro
227 views · 4 years ago
If you are facing the error "No such command 'run'": still in the venv, cd <kedro directory>, then pip install -r requirements.txt. After that the run command will work.
An example of how to get data from parallel workers and join that data would have been great.
kedro-wings last commit Jul 28, 2020, so basically dead
Would be great to have synchronization in the opposite direction, and automatic notebook creation with all of the dependencies.
this channel awesome! Thank you
That helped me a lot! Thank You!
How can I pass parameters from the Kedro CLI into catalog.yml? Any reference would be helpful.
awesome!
Can I have the codebase for this video?
Is there any newer method to use structured streaming with Kedro? Do you have any other suggestions?
What benefit does it give in comparison to Prefect?
Thanks for the video!
How can we develop an engine which automatically determines which rules apply to a given dataset?
To be able to make a full series on kedro while constantly smiling, your dealer must be lit! Hook me up please! (pun unintended)
How do we call an async/await function within a Kedro node?
Excellent tutorial! Now I can get all the benefits of using parameters, modular pipelines, and viz. This has made a positive change in my workflow. In my work I complement it with "kedro catalog create --pipeline name_pipeline" and everything flows very smoothly. Thanks a lot for sharing!
Great video, and a great example of a custom command, as I often need to run multiple pipelines. Quick question: how do you find the kedro_cli.py file? I've created a new kedro project but there's no file there like you've shown.
Great approach! What to do when your latest partition also needs historical partitions to calculate variables such as "total sales last 6 months" but prior partitions had their variables already computed?
Kedro seems to be full of compatibility issues. Better not to use such a tool.
compatibility issues with what?
Thanks for the video. kedro 0.18.2 has no 'load_context' function in the module 'kedro.framework.context.context'; what shall we do?
Good high-level explanation. I think things are broken with the new versions, though; an updated tutorial is needed.
Is it possible for Kedro to run every node/function in a different environment (conda, pip, Docker) within one pipeline?
I need some help with setting up and running kedro on aws EMR
can anyone help?
Will you please help me code deployable REST-enabled data pipelines with Django and Kedro?
Traceback (most recent call last):
  File "C:\Users\sduque\.conda\envs\kedro-tutorial\lib\site-packages\kedro\framework\cli\cli.py", line 682, in load_entry_points
    entry_point_commands.append(entry_point.load())
  File "C:\Users\sduque\.conda\envs\kedro-tutorial\lib\site-packages\pkg_resources\__init__.py", line 2458, in load
    return self.resolve()
  File "C:\Users\sduque\.conda\envs\kedro-tutorial\lib\site-packages\pkg_resources\__init__.py", line 2464, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "C:\Users\sduque\.conda\envs\kedro-tutorial\lib\site-packages\kedro_viz\launchers\cli.py", line 8, in <module>
    from kedro.framework.cli.project import PARAMS_ARG_HELP
ImportError: cannot import name 'PARAMS_ARG_HELP' from 'kedro.framework.cli.project' (C:\Users\sduque\.conda\envs\kedro-tutorial\lib\site-packages\kedro\framework\cli\project.py)
Error: Loading global commands from kedro-viz = kedro_viz.launchers.cli:commands
Help!
Great stuff
How to combine CSVs without pandas?
Can we build a Redis + Celery backed queue runner?
Very useful! Just getting into Kedro. Excited to try this tomorrow :)
Thank you for the tutorial! Could you please share the documentation steps? I also have a question: how do I link the environment with kedro over an SSH connection in PyCharm?
Can you tell us the versions of kedro, Python, and kedro-great used in this video, please?
Great how happy and smiling you are all the time! Thanks for the demo! 😀 Is it possible to save the raw data into two new tables, one which gets only the success rows and one which gets the failure rows? (So that the next ETL steps can run on a "validation_passed_table".)
The Kedro documentation on their site is utter shit. Heck their "hello world" tutorial doesn't even work. They don't give examples on how to implement features....just really hard to get into it
You're the goat lmk if you need a job
Hi, Awesome video! If we are not going to use credentials.yml file in production, how are we going to access the required credentials?
Hey, great videos, really enjoying them! I had a question. I tried using transcoding for the Byte64DataSet as well, but I get the following error:
kedro.pipeline.pipeline.OutputNotUniqueError: Output(s) ['iris_scatter_plot'] are returned by more than one nodes. Node outputs must be unique.
I simply changed my catalog file to look like this:
iris_scatter_plot@base64:
  type: iris.io.base64_data_set.Base64DataSet
  filepath: data/iris64.txt
What exactly is the issue here?
Hi, great video! It's really a great series of tutorial videos on kedro; I rewatch a bunch of them whenever needed. As I understand it, you refactored one node by wrapping it with a function that provides it with the datasets in the given dict, and thus a new node is created. But say I have a pipeline with multiple nodes that does ETL work on a txt file, and now I want to pass a bunch of txt files in a folder to the pipeline. Is the only available solution to wrap all my nodes (or pure Python functions) into one? Because then in kedro-viz I would only see one node, i.e. the one encompassing all the others.
Crystal clear, thanks for that kind of video.
Kedro is unable to save a PySpark model; it says TypeError: cannot pickle '_thread.RLock' object. Is there any method to save a PySpark model?
Hi, I'm working with Kedro but when I tried to visualize with kedro viz it is empty. In the PyCharm terminal, there is a warning saying that the catalog and parameters files are empty (which is not the case, both have yml files) Any idea why kedro viz can not see the yml files? Thanks!
Thank you so much for the tip by changing "script" to "module", was stuck at setting the run profile for a while and it works like a charm!
Best business description I have ever seen from data engineer!😍
Thank you for sharing! This is very useful. My team benefits from your work a lot, by the way.
You have created great stuff, thanks for all your efforts! In the entire video series I missed a few advanced concepts, like Kedro plugins. Why don't you make a video on creating Kedro plugins?
This was a big help, thank you so much! And I love your energy!
Really great video! Thanks for taking the time to craft a very easy-to-follow demo, Tam!
Please don't say "KLI". It's called a "C" (letter C pronounced "see") "L" (letter L) "I" (pronounced "eye"). Thanks!
Kedro is interesting! But while creating a new project I don't see the option asking for the example pipeline. Why is that so? Could you please help me?
Really enjoying your Kedro tutorials! Is the kedro-wings usage different in Kedro version 0.17.5? There is no `run.py` file.
What if you don’t want to save the table to SQL, but instead want to save the metadata of the dataset in SQL and store the dataframe itself on S3 (if it is a large dataframe), similar to how one might save an image for a profile? This is important when you have multiple (time-series) datasets you want to save.