MLOps Tutorial #1: Intro to Continuous Integration for ML

DVCorg

Published Dec 21, 2024

COMMENTS • 109

@dvcorg8370 2 years ago ⁺³
Please note we have deprecated the dvcorg/cml-py3 container image.
You can get the same results with:
- container: docker://dvcorg/cml-py3:latest
+ steps:
+ - uses: actions/checkout@v3
+ - uses: iterative/setup-tools@v1
@nagarjunavarikoti8160 4 years ago ⁺²⁷
You made a complex topic sound very simple with your easy walkthrough steps! Please keep up the good work.
@dvcorg8370 4 years ago ⁺²
really appreciate it, Nagarjuna! Always feel free to let us know if there's a topic you'd like to see :)
@sakshamgulati1578 3 years ago
@dvcorg8370 could you please make a video on how to make unit tests for models in MLOps?
@BudiArsana 1 year ago ⁺¹
That diff report in pull request is awesome, thank you for sharing. I will try to use this technique in the future.
@malcolmdecuire7529 3 years ago ⁺³
Starting in ML from a non-CS background was already hard enough, but Elle came thru and just made me smile and feel better about this complex subject.
I'm rewatching this entire series again. After looking at udemy, coursera, and even a few other websites there isn't someone talking about how to go from making ML projects on ur laptop to production environment.
Honestly, I'm grateful for the inspiration and I'm more committed to this self-learning route.
@phanikirans4728 3 years ago
I doff my hat to you, Elle... for a very crisp, easy-to-understand, and uncluttered explanation of MLOps...
@DaredevilGotU 4 years ago ⁺⁸
This is so cool. I Loved it. We can use this for writing test cases in PRs. Thank you.
@091carl 3 years ago ⁺²
Wow, incredible clarity in your presentations. Thanks for all the great work, Elle!
@t.ganesh1692 4 years ago ⁺³
Thank you for the excellent tutorial Elle and @DVCorg!
@sayakpaul3152 4 years ago ⁺⁷
Excellent walkthrough! Would be cool to incorporate experiment tracking tools like Weights and Biases to automatically report metrics. But for starters, this is really a job well done!
@itsravimalhotra3 3 years ago ⁺¹
Wow. This was soo good. She made it so easy to understand.
@MLOps 4 years ago ⁺⁸
Soo cool to see this Elle! thank you for sharing and teaching us a thing or two in the community!
@stopznak86 7 months ago ⁺¹
Great stuff, I'm learning
@יהונתןאיזנשטיין 2 years ago ⁺²
Great tutorial. Thank you!
@dvcorg8370 1 year ago
Glad it was helpful!
@AleksandrBlekh 4 years ago ⁺²
Excellent tutorial. Keep it up!
@dvcorg8370 4 years ago ⁺¹
Thanks Aleksandr! Much appreciated :)
@AleksandrBlekh 4 years ago ⁺¹
@dvcorg8370 It's my pleasure! :-)
@Kommalapatin 3 years ago
Pretty good explanation of the MLOps topics. Keep it up, good work Elle.
@mayurlohana 3 years ago
You are defining things in rightful manner and things are understood easily. AMAZING 🤩
@dvcorg8370 3 years ago
Thanks so much, Mayur! The kind words are really appreciated :)
@shroukmansour7642 3 years ago ⁺¹
What is special about GitHub Actions and CML that I should use them instead of something like Jenkins, for example?
@johannesallgaier5722 3 years ago ⁺¹
Great video! Such precise and clear explanations! Thank you for sharing.
@bhagwatchate7511 3 years ago ⁺¹
Great explanation
@dvcorg8370 3 years ago
Glad you think so!
@regularSenseAppeal 4 years ago ⁺¹
Very good thank you. Superbly explained.
@hyattBaker 3 years ago ⁺¹
Thank you that was very helpful!
@DataScienceGarage 3 years ago ⁺¹
Hi! That's the tutorial I was searching for. Thanks a lot!
@iPondrio 6 months ago
Do you have any video showing how to configure the token? I’m having a hard time with that config.
@mehrdat 6 months ago
Thank you very much. But why do I get errors? I couldn't run it after the first commit, and I tried nearly everything. The error comes from the line with the importance plot. What could it be?
@Chevignay 1 year ago
Really great video thank you
@IrtizaKaleem 3 years ago ⁺¹
Hi Elle, can you shed some light if I can do the same, but with a different docker image, such as continuum/anaconda3, so I can do the same for a conda environment? Other than the docker image link, what else would I need to change?
@tanim980 1 year ago ⁺¹
you are just amusing!
@danielbaena4691 3 years ago
Thank you so much for this video and all your work, it is just amazing!
@dvcorg8370 2 years ago
You're very welcome!
@ris2043 1 year ago ⁺¹
Excellent
@dvcorg8370 1 year ago
Thank you! Cheers!
@toilinginobscurity3091 2 years ago
Let's say we have a couple of commits in the experiment branch and we want to merge the branch with the squash option. What would happen then? Would all the reports be combined?
@jackbauer322 4 years ago ⁺²
What's the main difference from DVC? How do the two fit together, or don't they? Thanks again!
@dmitrypetrov3542 4 years ago ⁺³
DVC and CML complement each other. CML was created by the DVC team - see cml.dev
A bit more tech details: DVC is usually used to transfer data to CI/CD (CML) runners.
@jackbauer322 4 years ago ⁺¹
@dmitrypetrov3542 OK! So from my understanding, DVC is for experiment tracking and CML is more for CI/CD MLOps?
@dmitrypetrov3542 4 years ago ⁺¹
@jackbauer322 exactly. DVC - data & ML experiments. CML - team collaboration & ML training.
@philiperiskallaleal6010 3 years ago ⁺¹
Dear Elle, what changes would be required to implement CML in GitLab? Does GitLab have some type of "GitHub Actions" functionality? If so, where can I check for it?
@dvcorg8370 3 years ago ⁺¹
Good q- GitLab has something called GitLab CI, which is extremely similar and gives you most of the same functionality! There are a few subtle differences in how you set up things like environment variables/secrets, but it's not too bad. We have some docs here: dvc.org/doc/cml/start-gitlab
@philiperiskallaleal6010 3 years ago
Awesome presentation. Thank you for your great work
@dvcorg8370 3 years ago
Thanks Phillipe!
@soumantadas8564 4 years ago
This is extremely helpful, Elle and DVCorg. I had a follow-up question: if I wanted to generate multiple metric files and residual plots from the train.py script (say, because I am running a loop varying max_depth over [5,10,15] or varying some other hyperparameters), what would be the best way to modify the workflow so that I can see all the data and viz in one commit?
A crude way could be to store the metrics and plots under different names in train.py and add them separately to report.md in the cml.yml file. However, as the number of loop iterations increases, this wouldn't be a scalable method.
@dvcorg8370 4 years ago ⁺²
So what if you were to write out your metrics in one file using long form? So for example....
max_depth | accuracy
5         | 87
10        | 90
15        | 92
And likewise, put all your plots on one axis- so like, many lines of different colors, using your favorite plotting library.
Then you'd be able to print your table and your summary plot in your CML report with only one line of code each, no matter how long your loop is.
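A minimal sketch of the long-form idea from the reply above, assuming the training loop collects one (max_depth, accuracy) pair per iteration. The function name `metrics_table` is hypothetical, and the numbers are the illustrative ones from the comment, not real results.

```python
from typing import Iterable, Tuple


def metrics_table(rows: Iterable[Tuple[int, float]]) -> str:
    """Render (max_depth, accuracy) pairs as a single Markdown table."""
    lines = ["| max_depth | accuracy |", "|---|---|"]
    for max_depth, accuracy in rows:
        lines.append(f"| {max_depth} | {accuracy} |")
    return "\n".join(lines) + "\n"


# Illustrative values copied from the comment above, not measured results.
results = [(5, 87), (10, 90), (15, 92)]
report = metrics_table(results)
print(report)
```

Writing `report` to a single file means the workflow only has to append that one file to report.md, however many values the loop covers.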
@soumantadas8564 4 years ago ⁺¹
@dvcorg8370 Ahh yes, a very nice workaround. Thanks.
@MohammedBakheet 4 years ago
Very nice explanation indeed, thank you so much, keep it up
@rostyslavbryiovskyi4591 3 years ago ⁺¹
Hi, thanks for the comprehensive explanation!)
But I have one more question. Can I use CML with Azure TFS?
@dvcorg8370 3 years ago
Yes you can! See these docs: cml.dev/doc/cml-with-dvc. And please join us in our Discord server if you have more questions! discord.gg/rpgRdvfyAf
@philiperiskallaleal6010 3 years ago
Dear Elle, would you be so kind as to show or describe how one can implement a dvc pull request that is meant to be run by a .github/workflows YAML file, so that it runs only on the Git remote repository? I'm after an approach that would make it possible to "gitignore" the DVC data while giving the Git remote temporary access to the data, so that the committed CML workflow can be tested properly. Perhaps the Git remote could use some kind of data cache and later delete the cached data automatically?
@dvcorg8370 3 years ago
One approach is using a local DVC config file, which lets you have a different data remote/different credentials for when you're working locally than what's in your CI/CD system. That means you can still have a DVC config file that gets pushed to your Git repo, but you'll have a local version that gets used when you're developing in your workspace. Docs here: dvc.org/doc/command-reference/remote#example-add-a-default-local-remote
Another thought that comes to mind is that you could make the credentials to pull from the DVC remote only available to the runner (via secrets). You might then write a control flow statement... if those environment variables are present, then run dvc pull. Otherwise, don't. If you want to discuss this in more detail, stop by the CML channel on our Discord: discordapp.com/invite/dvwXA2N
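The control-flow idea in the reply above could look roughly like this. It is a sketch under assumptions: the DVC remote is taken to be S3-backed, so the credentials are the two standard AWS variables, and the helper names `has_credentials` and `pull_if_possible` are hypothetical. Swap in whatever variables your remote actually needs.

```python
import os
import subprocess
from typing import Mapping, Sequence

# Assumption: an S3-backed DVC remote whose credentials reach the runner
# as these two environment variables (set from repository secrets).
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def has_credentials(env: Mapping[str, str],
                    required: Sequence[str] = REQUIRED_VARS) -> bool:
    """True only if every required variable is present and non-empty."""
    return all(env.get(name) for name in required)


def pull_if_possible(env: Mapping[str, str] = os.environ) -> bool:
    """Run `dvc pull` when credentials exist; otherwise skip it."""
    if not has_credentials(env):
        print("No DVC remote credentials found; skipping dvc pull.")
        return False
    # check=True raises CalledProcessError if dvc exits with an error.
    subprocess.run(["dvc", "pull"], check=True)
    return True
```

Calling `pull_if_possible()` at the start of a CI step pulls the data on the runner, while the same call on a machine without the secrets simply skips the pull.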
@muhammadfarjadaliraza4546 3 years ago
Awesome video. I'd like to know how to use a TPU and GPU with this.
@gdibble 2 years ago ⁺¹
🔥🔥🔥
@jjpp1993 3 years ago
this is great! thanks for sharing
@anikethdeshpande8336 4 years ago
Awesome tutorial!
@dvcorg8370 4 years ago
Thanks Aniketh!
@OmarHisham1 2 years ago ⁺²
15:08
- I made an amazing model
Cat in the background: Yaaa
@dvcorg8370 1 year ago
Congrats!
@SheeceGardazi 3 years ago
thanks for sharing the talk
@shaunirwin2016 4 years ago ⁺¹
Very nice tutorial! I really like this concept of integrating into the normal software stack.
How would one handle the situation of adding new metrics over time? E.g. If you begin a project only displaying F1 score, but as you train more models you realise you are also interested in seeing and comparing the precision. Could this be catered for using CML?
@dmitrypetrov3542 4 years ago ⁺¹
Yep, using the existing software stack for ML is one of the ideas behind CML.
That's a really good question. The flow relies on Git a lot. So, if the scores were stored/committed, then you can derive F1 as well as precision. However, if the scores were not stored/committed, you might need to go back and create another experiment just to get the right scores to compare. How do you do that with the other tools or approaches?
One relevant discussion - github.com/iterative/dvc/issues/4210
@shaunirwin2016 4 years ago
@dmitrypetrov3542 thanks very much for the reply! Yes, I thought the solution might be something along those lines. For database approaches such as MLFlow one can log metrics later on to previous experiments/runs. I suppose with a git-based system of storing metrics one could manually add an extra commit with the new scores? Or of course rerun the experiment in the normal way with the new scores included, as you suggest. Although for long training times that could be a problem, if you are actually just wanting to do scoring, not training.
@dmitrypetrov3542 4 years ago
@shaunirwin2016 yes, an additional commit is one of the solutions.
Re long-running experiments - you are right, but the same happens with logging tools like mlflow - you need to retrain to get the metrics. The only difference is that the commit is not needed.
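To make the point in this thread concrete: if the raw predictions were committed alongside the true labels, a metric that was not reported at first (precision, say) can be computed later without retraining. The sketch below assumes "scores" means per-example binary predictions encoded as 0/1. The function name and the labels are made up for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and F1 for labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up labels and predictions, purely for illustration.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

If only the final F1 number had been committed, none of this would be recoverable, which is the case where an extra experiment or commit is needed.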
@vishal-rana 4 years ago
Beautiful.
@carloslopez7204 3 years ago
How can I set a secret token in GitHub Actions? My program calls an API, so I need to provide the secret token, but I don't know if it's correct to write it in cml.yaml because it's going to be public.
@dvcorg8370 3 years ago ⁺¹
You can add the secret to your GitHub repository, which will give the runner access to it via an environmental variable. You can set it so the variable will be hidden even in logs- check out their docs! docs.github.com/en/actions/reference/encrypted-secrets
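On the program side, reading such a secret could look like the sketch below. The variable name `API_TOKEN` and the helper `get_api_token` are hypothetical; the name has to match whatever environment variable your workflow maps the repository secret to.

```python
import os
from typing import Mapping

# Hypothetical variable name; it must match the name the workflow's
# `env:` section uses for the repository secret.
TOKEN_VAR = "API_TOKEN"


def get_api_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the token from the environment, failing loudly if absent."""
    token = env.get(TOKEN_VAR, "")
    if not token:
        raise RuntimeError(
            f"{TOKEN_VAR} is not set; add it as a repository secret."
        )
    # Deliberately never printed or logged.
    return token
```

This keeps the token out of cml.yaml and out of the code, so nothing secret ends up in the public repository.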
@mirmohammadjaber2676 4 years ago ⁺¹
Have you deleted the experiment branch from the repository?
@dvcorg8370 4 years ago ⁺²
Yes, but you can see the closed PR and browse the branches at previous points in time github.com/andronovhopf/wine/pull/2
@sayakpaul3152 4 years ago
One thing I noticed is that the actions do not always trigger upon a new commit to a branch. Is there a way to prevent that?
@dmitrypetrov3542 4 years ago
They trigger on pushes. If you make several local commits and then a single push, the workflow runs only for the last commit. So you need to push after each commit.
@fabianpena2776 4 years ago
Thx. The tutorial is amazing. In comments, I am not able to see the PNG files, only the links. Do I need to configure something more?
@dvcorg8370 4 years ago ⁺¹
Hm, that sounds like you might be missing a flag in your cml-publish function. Do you have `cml-publish --show-md >> report.md`? If you don't have the `--show-md` flag, you'll get a link to your image instead of an embedded picture.
@fabianpena2776 4 years ago ⁺¹
@dvcorg8370 Thank you again! Now, it works for me :)
@jordieclive 4 years ago
What can this CML tool do that CircleCI continuous integration can't do?
@dvcorg8370 4 years ago
To be clear, CML isn't a competitor to Circle CI. Circle CI is more analogous to GitHub Actions or GitLab CI; it's a continuous integration system.
CML is a toolkit that works with a continuous integration system to 1) provide big data management (via DVC & cloud storage), 2) help you write model metrics and data viz to comments in GitHub/Lab, and 3) orchestrate cloud resources for model training and testing.
Currently, CML is only available for GitHub Actions and GitLab CI. But it could in the future integrate with Circle CI (i.e., as an Orb).
@jordieclive 4 years ago ⁺¹
@dvcorg8370 thanks for the detailed reply. I've got it clear in my head now 😃. I watched the other vids in the series and you explain very clearly. I look forward to videos on setting up cloud workflows with CML and versioning with S3, GCP, and the like. I'm not sure if you are planning to do DL content. As a suggestion, I would love to see PyTorch workflows on the cloud with, say, multiple GPUs, and basic training tests in a CML workflow, like sanity checks: fitting/evaluation on a single batch, etc.
Please keep up the tutorials!
@dvcorg8370 4 years ago
@jordieclive No problem! Let us know any other questions you have :)
@leilainigodelacruz3648 4 years ago ⁺¹
Hi, thanks for your very useful video. I have a question, because I was trying to replicate this example in my own repo and it failed in this part of the cml.yaml:
`steps:
  - uses: actions/checkout@v2
  - name: train_model
    env:
      repo_token: ${{ secrets.GITHUB_TOKEN }}`
By GITHUB_TOKEN, do you mean a secret key that I assign in the Settings/Secrets tab of the repo, i.e. a private key? If so, I don't know why it doesn't work when I put in my own private key name :(
@dvcorg8370 4 years ago
Hi Leila! You don't have to assign any value to GITHUB_TOKEN- it is assigned by default in a GitHub repository. Please delete any secrets you might have added and try again. If it doesn't work, stop by our Discord channel where we can do more hands-on troubleshooting :) discord.gg/bzA6uY7
@leilainigodelacruz3648 4 years ago ⁺¹
@dvcorg8370 Thanks! It did work!
@derekcorcoran5129 4 years ago
Hello Elle, this looks great, it seems that it works for Python only? I develop Machine Learning tools in R, and I would love to help integrate this if possible
@dvcorg8370 4 years ago ⁺²
The tools we're using here (GitHub Actions and CML) work with any language! Here's a blog about a project using R: mribeirodantas.xyz/blog/index.php/2020/08/10/continuous-machine-learning/
There's a GitHub Action for getting R on your runner, too: github.com/r-lib/actions
@derekcorcoran5129 4 years ago
DVCorg thanks, you are doing an amazing job
@jwc7663 4 years ago
Scenario: I need an NN model and want to test it using a GPU. Is that possible as well?
@dvcorg8370 4 years ago ⁺⁴
Yes! We'll be covering that use case in a video soon. For now we have an example project to browse: github.com/iterative/cml_cloud_case
@jwc7663 4 years ago
@dvcorg8370 That looks good. Will it support a local machine (not cloud) as well?
@dvcorg8370 4 years ago ⁺¹
@jwc7663 Yes- you can set GitHub Actions (& GitLab CI, too) to use self-hosted runners, which can be a local machine. Check out the docs here: docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners
@efels_com 4 years ago ⁺²
@dvcorg8370 I would love to see the self-hosted GPU flow with the ability to compare the results against the model that is in the master branch of the repo, using DVC to roll the dataset back to the one that was used to train the model in the master branch. Then we could compare both models on new and old data.
@dvcorg8370 4 years ago ⁺³
@efels_com We can do this! Adding this to the list of to-dos.
@davidbalakirev5963 3 years ago
Hands up if you also had an espresso while watching this.
@hamdikhaled6955 4 years ago
Thanks a lot
@jackbauer322 4 years ago
How would mlflow come in here?
@dvcorg8370 4 years ago ⁺⁶
Good question- you can integrate lots of tools with CML. For example, you can use it with Tensorboard to get a link to your Tensorboard in a PR whenever the model trains. Check out this use case: github.com/iterative/cml_tensorboard_case/pull/3
We haven't tried with MLFlow in particular yet, but expect there could be a similar approach.
@jackbauer322 4 years ago ⁺²
@dvcorg8370 Thanks! Can't wait for the next videos :)
@jalaj1 4 years ago ⁺¹
Hi can you make video on mlcertific.com It is providing free certification on MLOps
@drm8164 1 year ago
i love u
@dvcorg8370 1 year ago
🦉 We love you too!
@jeremykusnadi5148 3 months ago
how do you get around the " `GLIBC_2.28' not found " error?
@dvcorg8370 2 months ago
This error typically occurs when trying to run a program that was compiled with a newer version of the GNU C Library (GLIBC) than what's installed on your system. Check that version requirements match up and you should be all set!
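One way to check which GLIBC version a machine or runner image actually has is the standard-library call `platform.libc_ver()`. This is only a diagnostic sketch: the helper names are hypothetical, and the required version `2.28` is taken from the error message quoted in the question.

```python
import platform
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.28' into (2, 28)."""
    return tuple(int(part) for part in text.split("."))


def glibc_is_at_least(required: str, installed: str) -> bool:
    """Compare dotted versions numerically, so '2.9' counts as older than '2.28'."""
    return parse_version(installed) >= parse_version(required)


# platform.libc_ver() reports ('glibc', '<version>') on glibc-based Linux
# and empty strings when it cannot tell (for example on macOS or Windows).
lib, version = platform.libc_ver()
if lib == "glibc" and version:
    try:
        status = "ok" if glibc_is_at_least("2.28", version) else "too old"
        print(f"glibc {version}: {status}")
    except ValueError:
        print(f"Unrecognized glibc version string: {version!r}")
else:
    print("Could not detect glibc on this system.")
```

If the check reports a version older than 2.28, the program in question was built against a newer C library than the system provides.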
