Follow Up Notes:
Chin Hwee's talk got cut off, but here's a link to her notes: ep2020.europython.eu/talks/8DboZjY-speed-up-your-data-processing/
Also, the girl snacking in the background teaches yoga. Here's her video on arm balancing! ua-cam.com/video/bFlxi8iAozs/v-deo.html
Great video man!
Just subbed!
Hey DE1,
Have you thought about using Hydra with a data science pipeline in Kedro?
I tried but didn't manage to get it to work.
Hi Siavash, which Hydra are you referring to?
@DataEngineerOne Hydra as in Facebook's Hydra for config management. Kedro's config is great, but I was trying to use Hydra because of some of the good features it has. The problem is that Hydra also has its own CLI, so when you launch a pipeline with kedro run, the two command-line interfaces clash.
Hey, this is cool! Thanks a lot for sharing it; I'll have to give it a go. I'm almost positive that Kedro can be hooked up with Hydra with some ProjectContext tweaks, but without trying it out myself, I can't say for certain. Could you describe a little more about which Hydra features you use most in your projects? I'm definitely going to make a video about it, so whatever you can help me understand about your use case will directly influence that future video. :)
@DataEngineerOne Hydra has a feature that changes the current working directory to a new directory created for each run. It saves all the configs related to the run in that directory, and any artifact you save goes there too. That gives you nice versioning of every run.
In addition to that, it has the ability to have multiple configurations for a certain group of parameters (multiple yaml files which have the same keys) and the config is chosen based on the master yaml which points to the desired config.
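
To make the per-run directory idea concrete, here is a minimal plain-Python sketch. It is not Hydra's implementation and uses none of Hydra's API; it only imitates the behaviour described above with the standard library. The function name start_run and the config.json file name are hypothetical, chosen for illustration. The date/time folder layout mirrors Hydra's default outputs/<date>/<time> pattern.

```python
import json
import os
import tempfile
from datetime import datetime
from pathlib import Path


def start_run(base_dir, config, now):
    """Create a timestamped run directory, snapshot the config, and chdir into it.

    Hypothetical helper: imitates the per-run working directory idea,
    it is not part of Hydra or Kedro.
    """
    run_dir = Path(base_dir).resolve() / now.strftime("%Y-%m-%d") / now.strftime("%H-%M-%S")
    run_dir.mkdir(parents=True)
    # Keep a copy of the config next to the artifacts it produced.
    (run_dir / "config.json").write_text(json.dumps(config, indent=2))
    # After this, relative paths written by the run land inside run_dir.
    os.chdir(run_dir)
    return run_dir


if __name__ == "__main__":
    original_cwd = Path.cwd()
    with tempfile.TemporaryDirectory() as tmp:
        try:
            run_dir = start_run(tmp, {"model": "small"}, datetime.now())
            # An "artifact" saved with a relative path ends up in the run folder.
            Path("metrics.txt").write_text("accuracy: n/a\n")
            print(sorted(p.name for p in run_dir.iterdir()))
        finally:
            os.chdir(original_cwd)
```

Because the working directory changes, code that expects paths relative to the project root (as a Kedro project typically does) would need absolute paths, which may be part of why combining the two tools is awkward.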
@SiavashSakhavi Cool! I see, I see. Kedro does have versioning as part of its datasets, but that's on a dataset-by-dataset basis, and not all datasets support it. I can see the utility in having it all cut and dried, in one folder.
Interesting, so that master yaml file can point to multiple yaml files and choose among them to create the desired config? Am I understanding that correctly? If that's the case, that does make it a little easier to manage than the current environment paradigm Kedro uses.
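
The config-group idea discussed above can be sketched in plain Python. This is an illustration under assumptions, not Hydra's API: the dictionaries stand in for yaml files, and the names CONFIG_GROUPS, PRIMARY, and compose are hypothetical. Each group holds interchangeable options with the same keys, and the master config selects one option per group.

```python
from copy import deepcopy

# Hypothetical config groups: each option in a group has the same keys.
# In a real project these would be separate yaml files, one per option.
CONFIG_GROUPS = {
    "model": {
        "small": {"layers": 2, "hidden": 64},
        "large": {"layers": 12, "hidden": 768},
    },
    "dataset": {
        "csv": {"path": "data/raw.csv", "format": "csv"},
        "parquet": {"path": "data/raw.parquet", "format": "parquet"},
    },
}

# The "master" config: names the default option for each group,
# plus any top-level values of its own.
PRIMARY = {"defaults": {"model": "small", "dataset": "csv"}, "seed": 42}


def compose(primary, groups, overrides=None):
    """Build one config by picking a single option from each group."""
    choices = {**primary["defaults"], **(overrides or {})}
    config = {k: deepcopy(v) for k, v in primary.items() if k != "defaults"}
    for group, option in choices.items():
        if option not in groups.get(group, {}):
            raise KeyError(f"unknown option {option!r} for group {group!r}")
        config[group] = deepcopy(groups[group][option])
    return config


if __name__ == "__main__":
    print(compose(PRIMARY, CONFIG_GROUPS))
    # Swap one group at run time, leaving the other defaults alone.
    print(compose(PRIMARY, CONFIG_GROUPS, overrides={"model": "large"}))
```

So the master file does not merge every yaml file in a group; it picks one per group, and a single override swaps that choice without touching the rest.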