Really excited to finally get this working! I know many people had asked for it. What should I cover next?
I was able to test your Jupyter Lab notebook and it generates the `adapters.npz` file and everything works! But how do I create a new model that has the `adapters.npz` embedded inside of it? I am running an Ollama server; how would we load this newly fine-tuned model into it? We're using proprietary data, so everything has to remain local to my machine and can't be uploaded to the Internet.
Please, talk about the MLOps life cycle and how to implement it.
@@rdegraci Great question! The original mlx-examples repo shows how to do this: github.com/ml-explore/mlx-examples/tree/main/lora#fuse-and-upload
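In short, the fuse script there merges adapters.npz back into the base weights and writes out a standalone model folder, with something like `python fuse.py --model <base_model> --adapter-file adapters.npz --save-path <output_folder>` (flag names may differ by version, so check that README). Nothing gets uploaded unless you explicitly use the upload options, so the fused model stays on your machine; for serving it through Ollama you'd additionally need to convert it to GGUF first.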
@@AlissonSantos-qw6db Great suggestion. While I may not be the best source for MLOps, I can definitely include more details around implementation of specific use cases.
Wow, that was incredibly precise and helpful! Thank you, and keep up the fantastic work with your videos!
Thanks! I have been using Unsloth remotely for fine-tuning. Once the cloud bills start coming in, I am hoping to convince my boss that a MacBook Pro can be an option. MLX is still just a set of open tabs for me; glad to see someone actually doing it.
Didn't know you could do this on Mac! Amazing, thank you!
🔥🔥
An easy video w/ great explanation to watch 👍🏽
Glad you liked it!
I binge watched your videos - high quality great content. Thank you so much, please keep it up!
Thanks for watching :)
Great tutorial, thanks. One question: I didn't understand where the fine-tuned model is on my Mac, and is it possible to run the model in Ollama?
After training, a folder with the base model should be created. Additionally, an adapters.npz file should appear, which contains the adapters learned from LoRA.
For running MLX models with Ollama, this video seems helpful: ua-cam.com/video/3UQ7GY9hNwk/v-deo.html
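The short version, as I understand it: Ollama expects a GGUF file, so you'd fuse the adapters into the base model, convert the fused model to GGUF (llama.cpp ships a conversion script for this), then point a Modelfile at it with a `FROM ./your-model.gguf` line and run `ollama create <name> -f Modelfile` followed by `ollama run <name>`. Everything stays local to your machine.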
Thanks, great content! I really like the calm way you explain it all 👌
I've been playing around with this, trying to see how you'd respond if I made horrible comments about your content - managed to get one slightly angry response 😁. But on a serious note, I love the work and I'm a big fan of the channel now!
LOL I wonder what that entailed
I was waiting for this video. Thank you so much.
Nice Telecaster!
Thank you, you are awesome
Really cool and helpful. Thank you very much. Have you performed fine-tuning on Llama 3.1 models successfully with this method?
I have not, but it should be as easy as replacing "mlx-community/Mistral-7B-Instruct-v0.2-4bit" with "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit" in the example code.
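For anyone who wants to try it, here's a minimal sketch of sanity-checking the Llama 3.1 base model with mlx_lm before fine-tuning (the prompt is just illustrative, and the generate arguments can differ slightly between mlx-lm versions):

from mlx_lm import load, generate

# Downloads the 4-bit MLX build from the Hugging Face Hub on first run
model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")

# Quick generation test before moving on to LoRA training
print(generate(model, tokenizer, prompt="Explain LoRA in one sentence.", max_tokens=100))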
Amazing video. Thanks for sharing such valuable content.
Thanks Abid! I've been waiting 7 months for another video from you 😜
@@ShawhinTalebi coming soon 😄
Yes YES YES
Happy to help :)
Love the video, thank you for these concise tutorials!
On the initial inference step, before moving on to fine-tuning, I can't get the generation step to produce any tokens.
Glad you like them :)
Not sure what could be going wrong. Were you able to successfully install mlx_lm?
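If anyone else hits this, a quick way to confirm the install before rerunning the notebook (the package is installed as mlx-lm, e.g. pip install mlx-lm):

# Confirm mlx-lm is importable and check which version is installed
from importlib.metadata import version
import mlx_lm  # raises ImportError if the install failed

print("mlx-lm version:", version("mlx-lm"))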
@@ShawhinTalebi I appreciate you responding, I was able to figure it out! Thank you again for the video.
There are some rumors going around that 16GB should now be the standard memory configuration offered on the new Mac Mini. Any chance that when the M4 Mac Mini launches you can do a video on that as well?
Great suggestion! Haven't heard that rumor, but it makes sense. I might be switching to a single (beefy) MacBook Pro; I could do a breakdown of how I use it for ML projects if there's interest :)
Took me a moment to find this:
parser.add_argument(
    "--data",
    type=str,
    default="data/",
    help="Directory with {train, valid, test}.jsonl files",
)
Worth mentioning that data files are picked up from data/ by default.
Thanks for calling this out!
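For anyone else setting this up, the script looks for train.jsonl, valid.jsonl, and test.jsonl inside data/. Here's a minimal sketch of creating that layout, assuming the single "text" field per line that the LoRA example uses (adjust the key if your data follows a different schema):

import json
from pathlib import Path

# One JSON object per line; the example below is placeholder data
examples = [{"text": "Instruction and response formatted as a single training string."}]

Path("data").mkdir(exist_ok=True)
for split in ["train", "valid", "test"]:
    with open(f"data/{split}.jsonl", "w") as f:
        for row in examples:
            f.write(json.dumps(row) + "\n")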
Thanks for the great video. Based on your varied experience, can you make a separate video on data-preparation techniques/methods for fine-tuning-related tasks on open-source models? Hoping to get a response from Shaw-human rather than Shaw-GPT... (just kidding) 😅
Great suggestion! There can be a lot of art in data prep (especially in this context). Added it to the list.
What can I expect to achieve on an M3 Pro with 64GB?
You could likely run full fine-tuning on some smaller models.
Any advice or guidance on how I could deploy this model so that I can use it as a Telegram bot? I've been able to plug it into Telegram's API and I'm able to get the bot up and running (locally on my mac), and well, I don't wanna keep my Mac alive just to run the bot! Cheers, thanks for the video!
Good question! Two options come to mind: 1) buy a Mac to serve your app, or 2) rent an M-series Mac from a cloud provider, e.g. www.scaleway.com/en/hello-m1/
Can I do it on a Mac M1 with 8GB RAM?
Might be worth a shot. You can try reducing batch size to 1 or 2 if you run into memory issues.
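With the example's LoRA script that would look something like `python lora.py --model <base_model> --data data/ --train --batch-size 1 --lora-layers 4` (flag names follow the mlx-examples LoRA README and may differ in your version). Reducing --lora-layers also cuts memory use.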
Did it work?
What if you have an Apple M2 Max with 96GB of memory? Does that mean there is technically a 96GB-memory GPU?
Good question. With the M-series chips there's no separate CPU vs. GPU memory; it's all unified. The important thing here is that MLX lets you make full use of your 96GB when training models!
I'll give it a try.
How about fine tuning with an Intel processor on a Mac?
MLX won't help, but if you have a graphics card there may be tools out there that can. I just haven't done that before.
Can I capture video and audio all day, with a camera on my shoulder, and fine-tune a model with the data every night?
Sounds like an interesting use case! This is definitely possible. Potential challenges I see are: 1) handling that much video data and 2) figuring out how to pass that data into the model (e.g. you could use a multi-modal model or find an effective way to translate it into text).
@@ShawhinTalebi some steps in between to filter the input for usability could be handy. Maybe SAM?
@@daan3298 Without knowing any details, I can imagine that being helpful. Segment with SAM then object detection with another model.
Will this run on 8 gb memory?
It might be worth a try. You can also reduce the batch size if you run into memory issues.
@@ShawhinTalebi I am running it now. 20 epochs have run successfully so far
@@saanvibehele8185 Awesome!
Has anyone tried this on a 3.8GHz 8-core Intel Core i7 chip?
MLX is specifically made for M-series chips. This example won't work with an i7.
8GB RAM RIP :(
LOL this still might be worth a try! If you run into memory issues you can reduce the batch size to 1 or 2. Curious to hear how it goes :)
@@ShawhinTalebi Risking wrecking my only device is totally worth it.
@@clapclapapp lol