Extremely helpful, thank you.
Outstanding, Mike - thank you!
Glad you enjoyed it!
Very good one, straight to the point. Thanks for making it
Thank you very much this is really awesome
Hi Mike, how would this endpoint be used by other applications outside AWS? A production mobile application, for example?
For that architecture you would want to place it behind an API. Typically you would use the API Gateway and a Lambda function.
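A minimal sketch of the Lambda side of that API Gateway + Lambda setup, assuming a proxy-style event with the CSV features in `body`. The `make_handler` helper and the `abalone-endpoint` name are hypothetical and only for illustration; `invoke_endpoint` is the boto3 `sagemaker-runtime` call, and the client is passed in so the function can be tried without AWS access.

```python
import json


def make_handler(runtime_client, endpoint_name):
    """Build a Lambda-style handler that forwards the request body to an endpoint.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    """

    def handler(event, context):
        # With an API Gateway proxy integration the raw request body arrives as a string.
        payload = event.get("body") or ""
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Body=payload,
        )
        # The response body is a stream; read and decode it before returning.
        prediction = response["Body"].read().decode("utf-8")
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"prediction": prediction}),
        }

    return handler


# In a real Lambda function this would look something like (assumed names):
#   import boto3, os
#   lambda_handler = make_handler(boto3.client("sagemaker-runtime"),
#                                 os.environ["ENDPOINT_NAME"])
```

The mobile app then only ever talks to the API Gateway URL, and the Lambda execution role is what holds permission to invoke the endpoint.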
@mikegchambers Excellent content! I was lost in the Abalone code and you saved me, thanks again. Can you please let me know how I can use this endpoint to create an inference pipeline (independent of this notebook)?
How do you use VS Code? Either 1) locally on your computer with the calculations running remotely on AWS SageMaker, in which case how do you connect to SageMaker, or 2) with VS Code running on AWS, in which case how do you set it up?
Thanks for your great video. I assume that the code in this Jupyter notebook cannot all run on a local host, and that we need to run it in a SageMaker notebook. Is that right? If so, where should the Python script be located?
Nice video! I'd be interested to see how to deploy an endpoint with a custom inference script for the input and output handlers, if you've got time on your hands!
Sounds like a plan! :)
@@mikegchambers Yes pleasee.
@@kscolina so I’ve been working on a demo project. Can I confirm what is wanted in terms of input and output handlers? Are we talking pipeline model with data pre-processing?
@@mikegchambers That I am not sure of yet. By the way, I raised a question in a separate reply. :)
In Canvas, does a Python script for training (including the algorithm used to train on the data) get created that I can download, other than the model notebook?
@mikegchambers great video - just wondering if there would be any additional adaptations to make if you were training a deep learning model with e.g. PyTorch/fastai?
Does this endpoint need an API Gateway, and maybe Lambda, so I can run inference from outside the AWS world?
Great video. Do SM_MODEL_DIR, SM_CHANNEL_TRAIN and SM_CHANNEL_TEST have default /opt/ml locations defined in the SageMaker Python SDK? And once we pass the S3 bucket location, is the data from the S3 bucket automatically pulled into the SageMaker training job containers?
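A small sketch of how a training script can resolve those locations. As far as I understand it, the variables are set inside the training container rather than on the notebook side, and in the default File input mode each channel passed to `fit()` is downloaded from S3 to `/opt/ml/input/data/<channel>` before the script starts. The `resolve_paths` helper is hypothetical; the fallback paths are the conventional container locations.

```python
import os

# Conventional container locations, used as fallbacks when the variables are
# not set (for example when running the script on your own machine).
DEFAULTS = {
    "SM_MODEL_DIR": "/opt/ml/model",
    "SM_CHANNEL_TRAIN": "/opt/ml/input/data/train",
    "SM_CHANNEL_TEST": "/opt/ml/input/data/test",
}


def resolve_paths(env=None):
    """Return the model/train/test directories, preferring environment values."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```

The channel part of the name follows whatever keys are passed to `fit()`, so a channel called `validation` would show up as SM_CHANNEL_VALIDATION.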
Thanks for the video, Mike! It was insightful and really helpful. It would be even better if you could take a model that exists locally, convert it into a SageMaker-compatible Python script, and record the process side by side.
Yeah you can do that. Hosting models for inference only. I’ll keep it in mind for other videos.
Great! As an MLOps engineer trying to persuade data scientists to use SageMaker, I found this useful. Basically, they can use the same notebook to pass different hyperparameters and data to generate different training jobs, am I right?
Absolutely. You can use notebooks as you would normally, and use the SageMaker SDK to train jobs at scale, and use many more tools like AutoML, Data Wrangler, etc etc. So much power, yet in a familiar interface.
@@mikegchambers Would you mind doing a video on AutoML, Data Wrangler, etc.? I swear you could explain ML to my grandmother lol
@@Kmysiak1 on the way!
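To make the "same notebook, different hyperparameters, different training jobs" idea concrete, here is a rough sketch. `expand_grid`, `launch_jobs` and `make_sklearn_estimator` are hypothetical helper names, and `train.py`, the role placeholder, the instance type and the framework version are assumptions rather than values from the video. The estimator is built lazily so the grid logic can be tried without the SageMaker SDK installed.

```python
from itertools import product


def expand_grid(grid):
    """Turn {"name": [values, ...]} into a list of hyperparameter dicts."""
    keys = sorted(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


def launch_jobs(param_sets, make_estimator, inputs):
    """Start one training job per hyperparameter set without blocking on each."""
    estimators = []
    for params in param_sets:
        estimator = make_estimator(params)
        # wait=False returns once the job is submitted instead of streaming logs.
        estimator.fit(inputs, wait=False)
        estimators.append(estimator)
    return estimators


def make_sklearn_estimator(params):
    """Assumed example factory; every argument value here is a placeholder."""
    from sagemaker.sklearn.estimator import SKLearn  # imported only when used

    return SKLearn(
        entry_point="train.py",
        role="<execution-role-arn>",
        instance_type="ml.m5.large",
        instance_count=1,
        framework_version="1.2-1",
        hyperparameters=params,
    )
```

In a notebook this might be called as `launch_jobs(expand_grid({"n_estimators": [50, 100]}), make_sklearn_estimator, {"train": "<s3-uri>"})`, where the S3 URI is again a placeholder.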
Hello Mike, I need additional Python modules, which I would like to list in a requirements.txt file. Will that be picked up by the SKLearn container to install the modules?
Great content!!!
How do you think we can perform distributed computing on GPU with PyTorch/TensorFlow in script mode?
This is awesome! I've been struggling with using the SageMaker SDK, and this allows me to use pure Python and its open source ML packages on top of the compute resources of AWS. I like to separate my ML pipeline into separate notebooks (i.e. processing, training, tuning, etc.). Can we use multiple scripts? You've earned my subscription.
It seems unnecessarily complicated that SageMaker demands we put the training code in a separate script. It would be easier if we could just put it into the notebook with everything else. It also makes it difficult to monitor and debug the actual training script when it's implemented separately and run as a monolith. Why is this required, and could you just as well put everything in the notebook?
Hey. I hear your frustration, and I have some thoughts here.
So this method is all about running the code at scale, and specifically not running it inside the notebook itself. In other words we are using the notebook, not for ML code, but to orchestrate other infrastructure to run our ML code.
So, when it comes to debugging, you would want to do that debugging earlier in the process, with the ML code (probably in a notebook somewhere), and once you’re happy with it, move to the method described here. As for debugging the ‘at-scale’ production deployment, there are ways to do this that I didn’t cover in this video, but I think I should in a future video.
I hope that helps put things into perspective. I appreciate you raising that point, and I’ll see how I can clarify for the future.
@@mikegchambers Thank you for the response :) When you say "in a notebook somewhere", are you talking about somewhere in AWS SageMaker? I would like to use SageMaker both for development (running on smaller datasets, checking that the model is correctly set up, monitoring convergence etc.) and then later maybe for large scale training. Where do I turn for the former?
@@hejden yes absolutely. So you can spin up a notebook in SageMaker Notebooks, SageMaker Studio, or even SageMaker Studio Lab (for free) and run the ML code in the notebook ‘locally’ (to the notebook server). When you’re happy you can ‘migrate’ the code into prod scale as shown here.
It’s basically the setup I run through here. I show how the ML code works in the notebook, then get it working in the container using SageMaker.
Maybe what I could clarify is that in this video I use the same notebook to explore the solution and then get it working in SageMaker managed containers. There is no need to have both these steps in the same notebook, and in many real world scenarios you probably wouldn’t.
Steps:
- Get your ML code working. This could be done on your own machine, or on a notebook server like SageMaker Notebooks, Studio, or Studio Lab. This code should include methods to load data, and to serialise and deserialise the model for storage.
- Transfer the code into a .py file with the necessary function hooks that SageMaker will be looking for, for the lifecycle of the ML tasks. (Load data, save model, etc).
- Create some SageMaker code to orchestrate getting your .py file into a managed SageMaker container. This code can also run anywhere you have access to AWS SDKs, so your own machine, an EC2 instance, a SageMaker Notebook or Studio (but probably not Studio Lab at this time).
- Run your orchestration code and SageMaker will handle the rest.
As a preference, I run all code in SageMaker when I can. I don’t like local development and dealing with dependencies etc. It sounds like this is your preference too.
Make sense? (I’m typing this on my phone, fingers crossed there are not too many typos!)
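For the second step above (the .py file with the hooks), a bare-bones sketch might look like the following. It is not the script from the video: the "model" is just the mean of the last CSV column so that the example needs nothing beyond the standard library, the CSV files are assumed to have no header row, and `model.pkl` is an assumed file name. `model_fn` is the hook name the scikit-learn serving container looks for when loading a model.

```python
import argparse
import csv
import os
import pickle


def train(train_dir, model_dir):
    """Load every CSV in train_dir, fit a trivial model, and save it to model_dir."""
    targets = []
    for filename in sorted(os.listdir(train_dir)):
        if not filename.endswith(".csv"):
            continue
        with open(os.path.join(train_dir, filename), newline="") as f:
            for row in csv.reader(f):
                targets.append(float(row[-1]))  # last column is the target
    if not targets:
        raise ValueError("no training rows found in " + train_dir)
    model = {"mean": sum(targets) / len(targets)}
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, "model.pkl"), "wb") as f:
        pickle.dump(model, f)
    return model


def model_fn(model_dir):
    """Deserialise the model saved by train(); called at hosting time."""
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)


def main(argv=None):
    parser = argparse.ArgumentParser()
    # Defaults come from the container environment, with local fallbacks.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    # Hyperparameters arrive as extra command-line flags; ignore any not declared here.
    args, _ = parser.parse_known_args(argv)
    return train(args.train, args.model_dir)


# In the real script file, finish with:
#   if __name__ == "__main__":
#       main()
```

Because the paths are ordinary arguments, the same file can be run on a laptop with `--train` and `--model-dir` pointing at local folders before it ever goes near a container.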
Hi Mike. Very informative presentation. I need to create a model trained only on Mainframe code artifacts (COBOL, JCL, DB2, etc.) I have a full set of GPT prompts, scripts and templates that generate all variations of full program code for my industry the first 4 months of this year. The biggest drawback that prevents companies from adopting the LLM approach is that public models don't give them secure protection of their code and data. If someone could guide me how to create a locally housed model that can be language, token or template interrogated, we can make a lot of money. The model doesn't need to be trained for email replies, excel formulas, document summaries, etc. It needs to absorb our entire code base and add it to any working model that has some level of intelligent COBOL / Mainframe code generating prowess. Is there some way to co-opt the ChatGPT 4 code base for COBOL, SQL, JCL and add it to our code base on a local machine? I think in 2 years this will be the standard method of project development. Some companies may soon be overrun by those that are willing to be the initial movers in this arena.