Discuss this video with other data pipeliners at discourse.kedro.community/t/the-complete-beginners-guide-to-kedro-de1-youtube/35!
Clone the repo right here: git clone github.com/tamsanh/kedro-introduction-tutorial
Really great video! Thanks for taking the time to craft a very easy-to-follow demo, Tam!
Great tutorial. At 3:45 how did you get that subset of history commands to appear at the bottom with "r" and "u" highlighted? When I start to type "run" my terminal doesn't do that awesome sub-setting of history commands
Aha! Yes, what I did was I pressed "CTRL+r" to open up the reverse history search. Then, I also have a special version of it installed called fzf history. It's a fun tool!
If you are facing the Error: No such command 'run' do:
Still in the venv - cd
pip install -r requirements.txt
After that the run command will work.
Great video, thanks a lot!
Very good video. Thanks
Good high-level explanation. I think things are broken with newer versions, though. We need an updated tutorial.
Hello DataEngineerOne, thanks for this. Just one thing: after cloning your Git repo, creating the virtual environment, and installing Kedro in it, I get this error when I use the 'run' command:
''Error: No such command 'run'.''
Since it's a project-specific Kedro command, it should work here, but it doesn't. Do you know why?
Facing same problem
Thanks for the video. Can you cover kedro build-docs ?
Sure! That's a great suggestion
Can you please explain what this survival breakdown module is?
Do you have any ideas on integrating Kedro with AWS SageMaker?
Great video! Thanks for it. Most of my transformations occur in a relational database, Teradata or Hive. Can I use Kedro to transform data using the database as the execution engine? Sometimes I have big tables that don't fit in memory, so I need to use databases.
Aha, great question. This is something that I've talked about with the kedro team, and they recommend using dbt. Personally, I think we're just missing the proper abstraction, and we should be able to get pretty far if we were to build something similar to Spark dataframes, but for database tables. Of course, I am unfamiliar with such things.
The alternative is to pass around a SQL connection through your nodes, and use `pandas.read_sql` with the connection. That's not too bad, I think.
If you're using Hive, though, you should be able to take advantage of Spark, to some degree.
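To make the "pass a SQL connection through your nodes" idea concrete, here is a minimal sketch. It is only an illustration under assumptions: an in-memory SQLite database stands in for Teradata or Hive, and the table name `passengers`, the helper `make_connection`, and the function `survival_by_sex` are all made up for the example. The point is that the database does the aggregation and only the small result comes back through `pandas.read_sql`.

```python
import sqlite3

import pandas as pd


def make_connection() -> sqlite3.Connection:
    """Hypothetical stand-in for a real Teradata/Hive connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE passengers (sex TEXT, survived INTEGER)")
    conn.executemany(
        "INSERT INTO passengers VALUES (?, ?)",
        [("female", 1), ("female", 1), ("male", 0), ("male", 1)],
    )
    conn.commit()
    return conn


def survival_by_sex(conn: sqlite3.Connection) -> pd.DataFrame:
    """Plain function that could be wrapped as a node.

    The GROUP BY runs inside the database, so the full table is never
    loaded into memory; only the aggregated rows are returned.
    """
    query = """
        SELECT sex, AVG(survived) AS survival_rate
        FROM passengers
        GROUP BY sex
        ORDER BY sex
    """
    return pd.read_sql(query, conn)


conn = make_connection()
result = survival_by_sex(conn)
print(result)
```

If a result is still too large for memory, `pandas.read_sql` also accepts a `chunksize` argument and then yields DataFrames piece by piece. How the connection object gets handed to the node inside a Kedro project is left open here, since that depends on the Kedro version and project setup.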
Great video! Would you use Kedro for the experimental part of a data science project (with Notebooks)? Or it's more when you want to wrap things in a src that you would use Kedro? Thanks!
Great question! I actually love using kedro for the very beginning, since it has such convenient functions for loading data. Then, I use Jupyter to explore using the kedro integration. Finally, at the end, I copy code from jupyter into kedro so that I may create the data assets I'd like to reuse. After that, I repeat using the newly created data assets to explore and experiment with, in Jupyter notebook. Does that make sense?
@DataEngineerOne I'm just starting out with it. I've used MLflow for its tracking capability but never used a tool for the workflow. Once I get more familiar with it, I'll let you know if I'm doing something similar to you! :) Thanks!
How to understand a data landscape of a company if you do not have Studio?
What about adding transcripts to your videos? That would be really helpful for taking notes. YouTube can generate them for you.
Hello, Tam!
I would like to know if it is possible to use Kedro as a serverless tool on Azure Cloud.
Hello DataEngineerOne, Thanks so much for this great video! I followed through your steps and I encountered this error traceback while executing kedro run --pipeline final-pipeline
on pycharm terminal
kedro.io.core.DataSetError: Failed while saving data to data set MemoryDataSet().
TransformNode instances can not be copied. Consider using frozen() instead.
I would like some help on how to resolve this please :)
Thanks again for this video, Kedro really is amazing!
Aha! Yes, this is an unfortunate problem with the matplotlib Figure: it doesn't allow itself to be copied, and the default MemoryDataSet tries to copy whatever it stores. You must create a matplotlib.MatplotlibWriter DataSet as the output. Please add one to your catalog, with the same name as your final-pipeline output. :)
Hi, I downloaded the project from Git and am getting an error in the hooks.py file while listing or running the pipelines. PyCharm is showing an error in "from kit.pipelines import (
hello_world,
survival_breakdown,
gender_survival_breakdown,
class_gender_survival_breakdown,
)" line of code which is in hooks.py file. Kindly help to resolve this error.
Hi Amrinder! Could you please post your entire error in kedro.community?
beautiful voice
The Kedro documentation on their site is utter shit. Heck their "hello world" tutorial doesn't even work. They don't give examples on how to implement features....just really hard to get into it
Please don't say "KLI". It's called a "C" (letter C pronounced "see") "L" (letter L) "I" (pronounced "eye"). Thanks!
useless.....