Fine Tune a model with MLX for Ollama

Matt Williams

Published 13 Jan 2025

COMMENTS • 152

@GeertTheys 4 months ago ⁺⁶¹
I have been doing tech for 20 years. You, Sir, are an excellent teacher: pointing to the documentation and providing us pointers to nice tools to stitch it all together in a comprehensive way.
@umutcelenli2219 4 months ago ⁺³⁶
I love the way Matt explains things in a way that is both detailed and yet really easy to understand. Thank you man.
@technovangelist 4 months ago
Thanks so much
@vaitesh 4 months ago
Totally. If there's something I don't understand in the moment, I just rewind a couple of minutes and then it makes more sense. He's one of the creators who isn't just walking through the code and saying what each command in the notebook does. Matt is really someone who knows how to empower the other person.
@tal7atal7a66 4 months ago
yes he is the best beast of explanation ❤ 💪 🥇 🔥
@tsomerville1970 4 months ago ⁺⁶
Matt, your energy is so calm. I did fine-tune with MLX, but I freaked myself out with all the steps and feel like it’s hard to do again.
When you explain it so nicely, my fear goes away and I’m ready again.
You’re spot on that data prep is the “dark arts”. So true!!
@8eck 4 months ago ⁺⁶
I'm glad that Ollama has come so far and is now creating standards for open-source LLMs, like Dockerfile-like specification files and so on.
@fabriai 4 months ago ⁺¹
Thanks a lot for this tutorial Matt. It is by far the most straightforward fine tuning tutorial I have ever seen.
@itlackey1920 3 months ago ⁺²
I have heard many folks talk about doing a fine tuning to make a model work better with the aider tool. It seems like everyone is struggling to figure out what the dataset should be. It would be fantastic to get your thoughts on it!
Thanks for all the great content!
@wilfredomartel7781 2 days ago
Thanks for the explanation Matt.
@cwvhogue 4 months ago ⁺¹¹
Thanks great breakdown of the process!
A note about JSONL not being an array. It can be processed by old school unix tools like awk, grep, sed - and used in streaming data with unix pipes where lines are the delimiters. These tools don't do well with json array syntax on large datasets.
@Joooooooooooosh 3 months ago ⁺¹
This is exactly correct. A JSON array, even if by convention has an object on each line, is not "valid" JSON unless the entire array, including the closing bracket, is pulled into memory. JSONL ensures that each line is its own mini valid JSON document.
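A minimal sketch of the point made in the two comments above, using only the Python standard library. The sample strings and the helper name `iter_jsonl` are made up for illustration: each JSONL line parses on its own, while a JSON array that is cut off before its closing bracket does not parse at all.

```python
import io
import json


def iter_jsonl(stream):
    """Yield one parsed record per non-empty line, without loading the whole file."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample data standing in for a real file on disk.
jsonl_text = '{"text": "first"}\n{"text": "second"}\n{"text": "third"}\n'
records = list(iter_jsonl(io.StringIO(jsonl_text)))

# A JSON array that is cut off before its closing bracket is rejected as a whole.
truncated_array = '[{"text": "first"},\n{"text": "second"},'
try:
    json.loads(truncated_array)
    array_parsed = True
except json.JSONDecodeError:
    array_parsed = False

print(len(records), array_parsed)
```

Because `iter_jsonl` is a generator, a consumer can start working on the first record before the rest of the stream has arrived.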
@counterfeit25 4 months ago ⁺¹
Love it, thanks for sharing. It's great to see LLM fine-tuning become increasingly accessible to more people.
@davidteren8717 4 months ago ⁺⁵
Nicely done! It's worth noting that what Matt demonstrated is "Fine-Tuning with LoRA" and not Actual Fine Tuning. Low rank adaptation (LoRA) makes customising a model more accessible than actual fine tuning of a model by "freezing" the original weights and training a small subset of parameters.
Actual Fine-Tuning: Adjusts all parameters, requires significant resources, but yields high-quality results.
Low rank adaptation (LoRA): Trains fewer parameters using low-rank matrices, reducing memory and compute needs while maintaining quality.
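To put rough numbers on the comparison in this comment, here is a small sketch that counts trainable parameters for a single weight matrix. The layer size and rank are hypothetical values chosen only for illustration, and the function name `lora_param_counts` is made up; it assumes the usual LoRA formulation in which the frozen weight W is used together with a trained low-rank product B @ A.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates every entry of W (d_out x d_in).
    LoRA freezes W and trains B (d_out x rank) and A (rank x d_in),
    using W + B @ A as the effective weight.
    """
    full = d_out * d_in
    lora = d_out * rank + rank * d_in
    return full, lora


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
fraction = lora / full
print(f"full: {full:,}  lora: {lora:,}  fraction: {fraction:.4%}")
```

With these example numbers the adapter trains well under one percent of the parameters of the matrix it adapts, which is where the memory and compute savings come from.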
@technovangelist 4 months ago ⁺³
Sure. But when most folks talk about fine tuning it’s lora.
@Joooooooooooosh 3 months ago
Thanks for pointing out that fine tuning is about teaching new behaviors, not injecting new data. So many misunderstand that.
@sureshchaudhari4465 12 days ago
Bro u r awesome i have been trying to do this for 6 months but did not know how to get started
@sakchais 19 days ago
Subscribed! This is exactly what I’ve been looking for.
@blackswann9555 3 months ago
I like your delivery stating it’s easy. I am going to try fine tuning and training with my new data
@hugogreg-hf8zl 4 months ago ⁺⁹
Sorry if unrelated, am I the only one who thinks that Matt’s voice has that soothing-gentle-teacher like voice? Like I can hear him narrate for a natgeo documentary
@Bitjet 4 months ago
Facts
@interspacer4277 4 months ago
This is a great vid! Especially if you're at least a hobbyist.
The best complete layperson onramp I've seen for fine-tuning? Is Cohere. And it is free. After that, dip into more of the dark arts. But to whet folks' whistle I usually point them there. It's their whole biz model. It quickly gets old, but you can whip up a trained, finetuned bot in a half day depending on dataset.
@ichigo_husky 3 months ago ⁺¹
Goat in teaching fine tuning
@JunYamog 4 months ago
Thanks I tried the mlx fine tune a few months ago. I think this mlx-lm might be more straightforward.
@1Ec-cb3cg 4 months ago
I totally agree with you, sir. This is the easiest way for me to learn about MLX; for the past 2 months I’ve kept searching YouTube for all the information. Thank you so much for the video.
@mbottambotta 4 months ago
Thanks Matt, your explanations are effective and entertaining.
If you could in a future video , would you dive into more detail about fine-tuning? E.g., why would you want to, how to choose your data, etc. Thank you!
@ts757arse 4 months ago
Matt, this is utterly awesome and I can't thank you enough. I'd seen the compute resources people were using and the code and gone "that's just too time and money intensive to investigate further".
Now, I just need the script from terminator, a code interpreter and, oooh, 5 minutes?
Don't worry, I'll keep control of it...
@talktotask-ub5fh 3 months ago
I love this kind of explanation, full of details and step by step.
Thanks, for sharing!
@y.m.o6171 4 months ago ⁺²
I so wish you could explain what LoRAs are and how to do one. Thank you for this amazing video, I already feel much better.
@pauledam2174 3 months ago ⁺²
He said that fine-tuning is only for dealing with how the model responds but it's also for increasing domain expertise as far as I know
@posiczko 3 months ago
Hi Matt!
Excellent series! Love your no-hype/nonsense approach to education!
JFYI, in order to run mlx-lm against Llama or Mistral models on HF, you must first agree to the terms published on HF in the repo of interest. Otherwise your mlx_lm.lora command will exit with
`mlx_lm.utils.ModelNotFoundError: Model not found for path or HF repo: mistralai/Mistral-7B-Instruct-v0.3.`
Cheers!
@technovangelist 3 months ago
Better still to download it first
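One way to follow the "download it first" suggestion is sketched here with the `huggingface_hub` package. This is an assumption rather than something shown in the thread: the wrapper name `download_model` is made up, and the sketch assumes you have already accepted the model's terms on its HF page and have an access token available.

```python
def download_model(repo_id, token=None, local_dir=None):
    """Fetch a model snapshot from the Hugging Face Hub and return its local path.

    For gated repos, accept the terms on the model page first and pass an
    access token (or log in beforehand); otherwise the download is refused.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, token=token, local_dir=local_dir)


# Example call (needs network access and, for gated repos, a valid token):
# path = download_model("mistralai/Mistral-7B-Instruct-v0.3", token="hf_...")
# print(path)
```

The returned local path can then be given to the fine-tuning command in place of the repo name, so a missing agreement or token surfaces at download time rather than at the start of training.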
@PenicheJose1 4 months ago
I need to say thank you I appreciate everything you're teaching us, you make things extremely easy to understand... Thank you.❤
@mojitoism 3 months ago ⁺²
Thanks for the great video! Could you use an llm to generate question answer pairs for the dataset out of basic text or documents? Would be interested in such a video!
@janwillemaltink2216 2 months ago
super clear and helpfull instruction, thanks so much! I think the jsonl format has to do with training with super large datasets, making it possible to handle them row by row?
@fernandogonzalezhenr 2 months ago
This is amazing content. Thank you!
@JohnnyOshika 1 month ago ⁺²
I used this technique to fine-tune Mistral 7B on my MacBook Pro M2 16" with 32GB of RAM. My training set was 96 examples, each ~3000 to 5000 tokens. The first time my whole laptop crashed after about 30 hours. The second time it completed after 60 hours. A training set of 96 is clearly not enough, as the fine-tuned model does a very poor job of following the training set examples when structuring unstructured data into JSON. I'm now fine-tuning qwen2.5 0.5B with a training set of > 500 and it's going much quicker. It should complete within an hour or so. I dropped the batch-size to 1 as it crashed with a batch-size of 2.
@mithun-ytcom 17 days ago
3:16 It's because JSONL makes it easier to handle JSON stream from the server. The client cannot take advantage of the streaming if the server streams array of objects as it will only become valid once the streaming is completed. With JSONL, the server streams valid JSON line by line, and the client starts handling the stream line by line without having to wait for the server to finish streaming.
@solyarisoftware 4 months ago
Thanks, Matt-super spot-on video as usual. You raised a doubt in my mind: You mentioned that fine-tuning is not suitable for adding new information to the original LLM (perhaps I misunderstood). This leaves me a bit perplexed, and I know it’s a debated issue within the community. I agree with you that the best use of fine-tuning is to personalize the style and tone, rather than being used in the "traditional" way to train older (pre-GPT) models like BERT. However, many people argue that fine-tuning could be an alternative to RAG for injecting specific domain knowledge into the LLM. Personally, I’ve never tried fine-tuning a model due to the costs, especially with cloud-based LLMs. In any case, I think it would be valuable to explore these topics further.
My hope is that fine-tuning could become a native feature in Ollama in the future.
Lastly, it would have been useful to see the fine-tuning JSONL data (at least an example). I have my own answer to your question: why JSONL? It might be because of its line-by-line simplicity in Unix pipe scripting.
@technovangelist 4 months ago
What I read is that you can add knowledge but apparently it makes it slower.
@solyarisoftware 4 months ago ⁺¹
@@technovangelist By "slower," do you mean that the fine-tuned model has increased latency during inference compared to the original model? That's interesting. I’ve never heard about that before.
@fotisj321 2 months ago
@@solyarisoftware I think Matt has been finetuning an instruct model. Afaik instruct finetuning is usually done after training the model on the next word prediction task which is the step where the general knowledge is injected into the weights. The next step, the instruct finetuning is supposed to make the model better at following instructions and producing responses aligned with user intent.
@technovangelist 2 months ago ⁺¹
It’s generally well understood that fine-tuning is not well suited for adding new knowledge
@solyarisoftware 2 months ago
@@technovangelist I agree :)
@JatinKashyap-Innovision 1 month ago
Video for Unsloth please. Thanks for the content.
@victorpalacios6752 1 month ago
Hi Matt. Great content, thank you! You mention having 64GB of RAM. Most consumer macs have 8GB only. Have you tried fine-tuning with smaller RAM macs? I wonder if the process is longer or simply impossible.
@technovangelist 1 month ago
I would say most are at least 16 to 32. You can't even buy one with 8 anymore.
@thetrueanimefreak6679 4 months ago ⁺⁹
amazing video matt thank you
@technovangelist 4 months ago
Glad you enjoyed it!
@AndysTV 2 months ago
Awesome video! Nice glasses! what's the camera setup you're currently using?
@technovangelist 2 months ago
Thanks. you can find out about my entire setup with this video. It's been collected over years of doing this. ua-cam.com/video/LQe3DFjMYrE/v-deo.html
@remysanchez6579 1 month ago
The point of JSONL is that if you decide to you can encode anything in JSON without ever having an actual line break. Meaning if you put all those JSON all after the other you can make a really easy parser to split all objects just based on line breaks. Which allows to iterate over the file without reading it whole in order to get each individual object. That's an easy way to save lots of RAM, basically. The other way being an iterative JSON parser but that's a lot more complex and a lot less performant.
@woolfel 2 months ago
the reason it isn't comma separated is to make it easy to distribute the training. This is common in Hadoop, Spark and other distributed frameworks. If it's comma separated and zipped, it ends up being harder to distribute the work across a large cluster. In hadoop and other distributed systems, it just splits the lines to the number of worker nodes.
@VictorCarvalhoTavernari 4 months ago
Amazing content, I will test it soon 🙏thanks!
@bigbena23 4 months ago
Thanks a lot for your fantastic videos. I'm actually using Unsloth to fine tune Llama3 for a text classification task. I'll be happy if you'll upload a video for such purposes
@golden--hand 4 months ago
I am interested in the idea of fine tuning, and I am starting to regularly come to your videos for stuff now that Ollama is my primary tool I am using to connect to other front end for serving my models. But jeez, i feel like an idiot sometimes with some of this stuff because this still feels complex to me. "Step 1" of curating the data honestly feels like the easy part to me.
I am curious about unsloth as its one I have looked at before but had decided to circle back to when I finally worked my way up to fine tuning. I am also curious about vision models, Llava or otherwise, I would be really curious to see how curating data for that would differ from an LLM.
Also, would be nice in future videos related to this so see a before and after test. I know we can assumed what you are suggesting is making an effect, but it would still be nice to see the results in action :)
@drhilm 4 months ago
love your explanations. thank you !
@stephenreaves3205 4 months ago
Would love to see you try out InstructLab
@gazzalifahim 2 months ago
Hey Matt, did you record any video on Unsloth? Would love to see it 😀
@8eck 4 months ago ⁺¹
jsonl is used to read line by line, it is easier for python, because it is reading line by line as far as i know. I.e. 1 iteration === 1 json from your dataset. Plus datasets are huge and reading whole json and parse it all in one go will take a decade and probably will crash your runtime.
@technovangelist 4 months ago ⁺¹
Ahhh, ok. So it’s accommodating for the weaknesses of Python.
@TheUserblade 4 months ago ⁺¹
@@technovangelist in fairness, it also allows you to do things without needing to parse the whole file - like cat something.jsonl | sort > sorted.jsonl or cat something.jsonl | head -n 10 > 10somethings.jsonl
In this case, I imagine it’s convenient for shuffling the entries, but the main generic advantage over a big json list is that you don’t need to read the entire file to begin parsing it (which is a really nice language-agnostic property for files that might become extremely large)
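The same line-level operations can be sketched in Python without parsing any JSON. The helper names `head` and `shuffled` and the sample lines are made up for illustration; the point is that the lines themselves are the records, so taking the first few or shuffling them never requires a JSON parser.

```python
import itertools
import random


def head(lines, n):
    """Return the first n lines, like `head -n`, reading no further."""
    return list(itertools.islice(lines, n))


def shuffled(lines, seed=0):
    """Return the lines in a reproducible random order (needs all lines in memory)."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)
    return lines


# Hypothetical JSONL content; in practice these would come from an open file.
sample = [f'{{"text": "example {i}"}}\n' for i in range(5)]
first_two = head(iter(sample), 2)
mixed = shuffled(sample, seed=42)
```

Note that `head` stops reading after `n` lines, while shuffling is the one operation here that does need every line at once.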
They’re definitely weird at first blush, but actually kinda clever and elegant for some use-cases IMO
@EhabMosilhy-m3j 4 months ago
Great video, thanks!
I only wonder what's the way to add new data to the model if finetuning is more about changing the format.
My use case is like this: I use a framework which changes syntax with each new version I want the LLM to be updated with the newest documentation for the last version.
How can I do that?
@nuttiplutt 4 months ago
If you could make a guide of installing Unsloth on Windows and train Llama 3.1 on Ollama to use my tone of voice to reply to emails AND have knowledge to answer the common questions I get, that would be a godsent! Thank you for the great videos!
@London-Outdoors 4 months ago
Great video! 👍 Thanks
@JustinBowen-p9p 20 days ago
It would be good to find out how to teach Ollama about new format and facts. Looking to create an open source version of the NetCfg GPT LLM for Ollama
@ISK_VAGR 4 months ago
Nice. I just did not get when to use the different test and validate files in the process.
@JuanOlCr 4 months ago
Thank you Matt for again a great helpful video. It would be great to see samples of the train.jsonl, valid.jsonl and test.jsonl files. Or a template of them. Thanks
@morningraaga1424 4 months ago
I like your presentation... Regarding fine-tuning, I have heard of Unsloth and seen many AI experts use it. What are your thoughts on it?
@technovangelist 4 months ago
I didn’t use it because of the limited hw support. But I will next time
@ScholasticusObscura 2 months ago
I don't necessarily want to train it on "how to respond", but I want to expand its knowledge base of Python code so it knows "what" to respond with. For instance, I'd like it to have better knowledge of some of the more advanced concepts of Python. So do I fine-tune to accomplish this, or what is your recommendation? I'm just starting out learning and all I've been able to create is a mock JPEG file corruption recovery tool that doesn't actually work because JPEG is a lossy format to begin with. But it looks like it works. Lol
@Zatchurz 4 months ago
Clarity and digestability 100%
@noame 2 months ago
Very amazing, I moved from LLMs are not for me to, LLMs are cool to configure. I want to help some clients automating locally email classification and response, I think it's within reach. Can you please help with more tutorials on n8n combined to fine tuned local LLM ?
@azoz158 2 months ago
Can you do one with unsloth or other free library? thanks
@technovangelist 2 months ago
Well mlx is a free library but I would like to do some others
@marcusk7855 3 months ago
Great tutorial. Can you do a non-mac version of this? I see things like qlora but I have no idea even where to start.
@technovangelist 3 months ago
I plan to. Unfortunately most of the non Mac tools kinda suck.
@ilanelhayani 4 months ago
Thank you Matt, you are amazing. As far as I know, MLX is for Apple silicon. What about fine-tuning on an NVIDIA RTX card? Which library should we use? Can you make a video for this please?
@technovangelist 4 months ago
Yup. I mentioned that I did this first for Apple silicon. And I intend to do the same thing for Unsloth and maybe Axolotl, which are Windows and Linux based
@myronkoch 4 months ago
dark arts, lol. Love your vids, man.
@gaborfeher741 25 days ago
Need a program in which the user can load the model he wants to fine tune, and then give the model the example words in a text window. I hope one day someone releases such a fine tuning software.
@mbarsot 4 months ago
Very useful, however
1) is there anything we can do with 16 gig on m1?
2) can you maybe show how to do it? Step-by-step: it is a little hard to understand the MLX part thanks
@s.patrickmarino7289 3 months ago
Can a model be fine tuned to improve the way a model uses tools? Can fine tuning be used for chain of thought? One example would be to take a number of prompts, then a good chain of thought to solve that type of problem?
@gazorbpazorbian 4 months ago
so if finetuning is how you make the model respond in a better style, how do you teach it more stuff? which are the best ways to make the AI learn more aside from RAG
@imai_official 19 days ago
What if I fine-tune it on other languages that LLaMA still does not generate? Will it work?
@ambroisemarche5128 4 months ago
hi, why do i need a validation dataset and a test dataset? can i create them but let them empty? because i don’t understand anyway how validation and test would work for a llm
@scaptor_com 4 months ago
Thanks you for this
@technovangelist 3 months ago
Thanks for the comment
@jackflash6377 2 months ago
What if you want to give the model more information, information specific to your project?
Say I took all the technical data sheets and all the forum posts I could find concerning an ATmel MCU. Could I fine tune a model using this data?
@technovangelist 2 months ago ⁺¹
Your best bet is some variation of rag
@jackflash6377 2 months ago
@@technovangelist How do you use RAG with the commercial LLMs ?
@technovangelist 2 months ago ⁺¹
I have a few videos about building rag systems
@i2c_jason 4 months ago
Could you do the same thing as fine tuning by creating a RAG database of examples, and just use the off-the-shelf LLMs? This might make your application LLM-agnostic and futureproof. Thoughts on pros/cons?
@technovangelist 4 months ago ⁺¹
Fine tuning and rag have different purposes. Rag adds new knowledge whereas fine tuning will mostly affect the way it outputs.
@TheLokiGT 4 months ago
@@technovangelist Mmhh yes and no. Full-parameters finetuning is OK for adding new knowledge in a more systematic way (after all, it's just continued pretraining..).
@hasanaqeelabd-alabbas3180 1 month ago
Is this applicable for windows ?
@technovangelist 1 month ago ⁺¹
This one uses MLX which is an Apple framework. There are others for other platforms.
@hasanaqeelabd-alabbas3180 1 month ago
@ thank you , so i m going to search for windows tutorials.
@user-fc9qy4wq6s 4 months ago
I forgot to break up my data into the three sections (validate, train, test) will the model work the same?
@ibrahimhalouane8130 4 months ago
Is Unsloth worth the hype?
@MrOsodog 2 months ago
I’m curious as to why the fine tuned can’t be for new knowledge?
@user-fc9qy4wq6s 4 months ago ⁺¹
ok so i created the three sections, but now get a data formatting error heres a sample of some data of mine: {"prompt": "info.", "response": "info"} what should be different here?
@technovangelist 4 months ago
Use the format I used in the video. Then I show it in the next one. Just a text key.
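A minimal sketch of turning the prompt/response records from the question into single-key "text" records. The template that joins the prompt and the response is a hypothetical placeholder, and `to_text_record` is a made-up helper name; the right template depends on the model being tuned.

```python
import json


def to_text_record(line, template="{prompt}\n{response}"):
    """Convert one {"prompt": ..., "response": ...} JSONL line to a {"text": ...} line."""
    record = json.loads(line)
    # The template is an assumption for illustration; adapt it to your model.
    text = template.format(prompt=record["prompt"], response=record["response"])
    return json.dumps({"text": text})


# Sample line taken from the question in this thread.
original = '{"prompt": "info.", "response": "info"}'
converted = to_text_record(original)
print(converted)
```

`json.dumps` escapes the newline inside the text, so each converted record still occupies exactly one line of the JSONL file.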
@slickheisenberg8208 1 month ago
It would’ve been useful if you would’ve explained what an adapter actually is and how it works.
@utvikler-no 4 months ago
Thanks for the awesome video! Would you know anything about using ubuntu/intel arc with ollama. If so would you consider a guide one beautiful day :)
@technovangelist 4 months ago ⁺¹
I haven’t played with any of the arc cards yet. I need to find a way to play with those
@bebetter7388 3 months ago
I would love to learn to fine tune Anthropic's Claude sonnet 3.5 on Jan ai... i'm finding it challenging
@technovangelist 3 months ago
that’s probably a topic that wouldn't happen here. Most of this channel focuses on local ai solutions due to the security and privacy risks of most online models.
@sleepybooksASMR 25 days ago
im on an m3 macbook air 16 gigs of ram and im getting 3-5 tokens per second with the same 7b mistral model... is this normal? it feels reallllly slow..
@QorQar 4 months ago
Example for datasets?
@IAMTHEMUSK 3 months ago
I tried to fine tune llama3.1 on windows since I need nvidia gpu. Such a nightmare. I still didn’t figure out why my llm is not able to speak anymore, it just reply’s data that was in my dataset
@MaxJM74 4 months ago
It even looks easy when you watch it!
Thanks
@jjolla6391 1 month ago
if fine-tuning is only to teach it "style" of questions and not for new data .. then how do we bake in new data to an llm inbuilt knowledge?
@technovangelist 1 month ago
That’s what rag is for
@QuizmasterLaw 14 days ago
line delimited because most ppl aren't coders and its about vacuuming up as much data as possible for training.
@peterdecrem5872 4 months ago
Still not sure what the data file looks like for the framework? Is it a dataset? the below does not seem to work:
"text": "This is the first piece of text."}
{"text": "Here is another piece of text."}
{"text": "More text data for fine-tuning."}
@technovangelist 4 months ago
Yup that’s what I showed in the video. Well, except you missed the first bracket
@peterdecrem5872 4 months ago
@@technovangelist Agreed. The thing i learned is that data is the directory where you put train.jsonl test.jsonl and valid.jsonl with the format you describe. Thank you!
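A sketch of producing that directory layout from a list of example strings, for anyone still unsure what the files look like. The split fractions, the example texts, and the helper name `write_splits` are all made up for illustration; only the three file names and the single "text" key come from this thread.

```python
import json
import tempfile
from pathlib import Path


def write_splits(examples, data_dir, valid_frac=0.1, test_frac=0.1):
    """Write train.jsonl, valid.jsonl and test.jsonl into data_dir."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    n_valid = max(1, int(len(examples) * valid_frac))
    n_test = max(1, int(len(examples) * test_frac))
    splits = {
        "valid": examples[:n_valid],
        "test": examples[n_valid:n_valid + n_test],
        "train": examples[n_valid + n_test:],
    }
    for name, rows in splits.items():
        with open(data_dir / f"{name}.jsonl", "w", encoding="utf-8") as f:
            for text in rows:
                # One complete JSON object per line, each with a single "text" key.
                f.write(json.dumps({"text": text}) + "\n")
    return {name: len(rows) for name, rows in splits.items()}


# Hypothetical examples written to a temporary "data" directory.
examples = [f"Example text number {i}." for i in range(20)]
data_dir = Path(tempfile.mkdtemp()) / "data"
counts = write_splits(examples, data_dir)
print(counts)
```

In practice you would shuffle the examples before splitting so that the three files are drawn from the same distribution.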
@Namhskal_Nivan_2062 4 months ago ⁺¹
sir i can't understand the way dataset jsonl file should be can u pls give me 1 block of the dataset jsonl file as example. i can't understand how to make 'em pls help me else someone out here pls help me too
@technovangelist 4 months ago ⁺¹
Take a look at the second video on this
@Namhskal_Nivan_2062 4 months ago ⁺¹
@@technovangelist which one sir "optimize your AI models" else which one sir can u pls say sir 🙇🛐
@technovangelist 4 months ago ⁺¹
The other one with fine tuning in the name
@Namhskal_Nivan_2062 4 months ago
@@technovangelist ok sir thanks a lot
@valeriomariani1704 2 months ago
Error: json: cannot unmarshal array into Go struct field .model.merges of type string
@technovangelist 2 months ago
you need to provide more info. Where did you get this? what version of ollama? what platform? how was it installed. Best to do all this on the discord.
@startingoverpodcast 4 months ago ⁺¹
I need to understand how json works
@8eck 4 months ago
Running fine-tuning is easy, but getting LLM to do what you are fine-tuning for may be not so easy and at times even very hard.
@60pluscrazy 1 month ago
🎉
@UnwalledGarden 4 months ago
Thanks! I can’t tell you how much I dislike Jupyter notebooks.
@8eck 4 months ago
mmmistral 😁
@MT-ny7ir 4 months ago
Finetune with crosswords so the llm know how many characters in his response
@flat-line 4 months ago
If you only can change the style of the answer why bother with fine tuning, I don’t need the answers to be like a pirate , why would you need this for creating an enterprise level application? Is rag the way to go for this ?
@technovangelist 4 months ago
Tweaking the style is a very important aspect for most enterprises. Some need it to respond as Sql every time or json or functions. Those don’t need new knowledge but rather told how the model should respond.
@8eck 4 months ago ⁺¹
I also hate jupyter notebooks... Agree that it is the worst for teaching... I always convert it to python file in the end and getting rid of all useless stuff...
@TheLokiGT 4 months ago
Matt, I had written down a long comment, but YouTube deletes anything that has links to platforms it doesn't like, probably. If you have time and will, please read my replies to your Twitter thread related to this video, thanks.
@codecaine 1 month ago
I agree. I hate python notebooks.
@joeeeee8738 4 months ago
Mmmmmmmmistral hahaha 😂👏
@helloansuman 3 months ago
It will be good if you code rather than showing snippets
@AlexCasimirF 4 months ago
Python Notebooks have to be the worst format for teaching - Amen to that!
@GeorgeGaddis-k9j 3 months ago
Von View
@mal-avcisi9783 3 months ago
hey, cool channel, but this is too complicated. i want a 1-click solution. i want to do 1 click, and the ai should exactly learn how my texting style is. i want to use it to prank whatsapp friends.
@technovangelist 3 months ago
There are options for that. The one I have seen costs about 200 to 300 USD per fine-tune run. Or you can spend 10 minutes doing it this way for free. Anyone can do this.
@mal-avcisi9783 3 months ago
@@technovangelist 10 minutes is too long, i will pay the 300 dollars
@theralfinator 3 months ago
@@mal-avcisi9783 😂
