Hey, thanks for the video! These Colabs work well with the regular accounts!
Oh yes, I totally forgot to mention these notebooks all work on the free Colab.
Man, thanks a lot for covering all of this!
Yes, nice approach indeed, doing the reverse. I am also curious when they will apply it to bigger models, or whether Vicuna or Dolly 2.0 will be trained with these instruction datasets. Thanks for sharing new publications about the progress of LLMs. 👍
I wonder how well one of these would perform if they were heavily specialized into a specific area of knowledge.
Like, if you had a general advanced LLM, could you chain it to these domain-specific LLMs to have them give specific facts that are then formatted appropriately for answers by the primary LLM?
Regardless, interesting paper, and interesting video!
Also thinking this is a really interesting direction if you want a super specialized model for a tight domain, or, as you say, in some kind of cascading system. They are so small you can fine-tune them really quickly too.
That is also the idea behind LoRA adapters - you can train those specifically as a "delta" against some general base model, and then tack the adapter onto the base model to specialize it. Nothing keeps you from swapping out these adapters at runtime based on some context information (e.g. some other model running alongside your chatbot and detecting topic changes in the conversation). I haven't seen it done yet, though.
I have looked at exactly this with LoRA, and in our tests it took too long to be practical in production for on-the-fly changes. It does save a lot on space though, in that you can have 5 LoRAs and just one main model. I would love to see a simple way to do it on the fly.
@samwitteveenai Cool, thanks for sharing. How long did it take exactly? I mean, the LoRA weights for Alpaca are just ~60 MB; how long can it take to load/apply?
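For anyone wanting to try it, here is a rough, untested sketch of what that runtime swap could look like with the PEFT library; the base checkpoint and adapter repos are placeholders, not something from the video:

```python
# Hypothetical sketch of runtime LoRA swapping with PEFT.
# The base model and adapter paths below are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("some-base-llm")  # e.g. a LLaMA checkpoint
model = PeftModel.from_pretrained(base, "adapters/medical", adapter_name="medical")
model.load_adapter("adapters/legal", adapter_name="legal")

# When a topic detector fires, switch the active adapter; only the small
# LoRA weight matrices change, the base model stays loaded in memory.
model.set_adapter("legal")
```

In principle the cost should be dominated by reading the adapter weights (tens of MB), so keeping all adapters pre-loaded and only calling set_adapter may be the fast path.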
Or perhaps, what would happen if we distilled only human reasoning capabilities, without any specific knowledge? With a large context window, this model could then operate efficiently on vast amounts of data.
Super excited to see your thoughts on the MPT-7B family of models. They have one with a context window of 65k tokens!
Sam, thanks for all these videos! Question: would any of these models do well as a substitute for ChatGPT in the LangChain sense of understanding which task to use? Have you done any comparisons yet? It would be great to have something small and free to use as a reasoning engine there. Thanks!
Not the LaMini models; the big Vicuna/Koala/Open Assistant models are starting to get close, but it's just a matter of time before we have a decent open-source alternative to ChatGPT. I have another video I am about to drop for that, so you can test them yourself.
Thank you for keeping us up to date with recent developments and providing the Colabs. I was just wondering if some theoretical principles of LLMs are not more relevant here. I thought: Max Entropy = log2(N),
where N is the number of possible states that the language model can be in. In the case of a language model, each state corresponds to a specific sequence of words. The number of possible states is determined by the number of parameters in the model. 🤔
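As a quick back-of-envelope on that, taking the comment's own definition (a state = a specific token sequence) of length L over a vocabulary of size V, where both values below are assumed placeholders:

```python
# Back-of-envelope: if a "state" is a token sequence of length L over a
# vocabulary of size V, then N = V**L and max entropy = log2(N) = L * log2(V).
import math

V = 32_000  # assumed vocabulary size
L = 512     # assumed sequence length

max_entropy_bits = L * math.log2(V)
print(f"Max entropy: {max_entropy_bits:,.0f} bits")  # roughly 7.7k bits here
```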
Amazing video, thank you so much!
Hey Sam, Great video as usual. How is the performance of these "smaller" models with zero-shot classification in downstream tasks?
I don't think it's great, but certainly more interesting than the base models for these, etc.
Thank you for the video, Sam. Do you think one could train one of these models from scratch in colab?
They are fine-tuning these models, which you could do on Colab, yes.
@samwitteveenai Great, thank you!
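For reference, a minimal sketch of what such a fine-tune could look like on Colab; the base checkpoint, dataset, and hyperparameters below are illustrative placeholders, not the exact setup from the video:

```python
# Minimal instruction fine-tuning sketch for a small Flan-T5-style model.
# Dataset and hyperparameters are placeholders; adjust for your domain.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Any instruction/response dataset works; a slice of Alpaca shown here.
data = load_dataset("tatsu-lab/alpaca", split="train[:1000]")

def preprocess(example):
    tokens = tokenizer(example["instruction"], truncation=True, max_length=256)
    tokens["labels"] = tokenizer(example["output"], truncation=True,
                                 max_length=256)["input_ids"]
    return tokens

data = data.map(preprocess, remove_columns=data.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="flan-t5-ft",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```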
Excellent video as always.
May I suggest more LangChain content? Specifically, using agents to implement complex chat flows.
Thank you for updating us daily on LLMs.
One simple problem … I want to fine-tune an open-source model on domain data for a QnA task. Can you advise which model can be used as a base model, and how I can fine-tune it?
Hey Sam, just a heads up, I think the threshold on your noise gate might be too aggressive. Something seems to be really clamping down the volume at the end of your sentences, which distracts from otherwise really great content :)
Thanks. Yes, I found out this was recorded with the wrong mic only after the fact, so I had to apply a lot of noise reduction to try to make it usable. I appreciate you reaching out though.
Hi Sam, amazing video, this one too! Thanks. Do you know how I can run the small LaMini on my MacBook (download models and what other files...)? When I try, I get an error (most likely 8-bit with no GPU?).
I don't think 8-bit like this will be compatible with macOS GPUs, etc.
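For anyone hitting the same error, one possible (untested) workaround is to skip 8-bit loading entirely and run a small checkpoint in full precision on CPU; the model id below is one of the published LaMini variants:

```python
# Untested CPU-only sketch: avoid load_in_8bit (which needs bitsandbytes and
# a CUDA GPU) and run a small LaMini checkpoint in full precision instead.
from transformers import pipeline

generator = pipeline(
    "text2text-generation",
    model="MBZUAI/LaMini-Flan-T5-248M",  # a small LaMini variant
    device=-1,  # -1 = CPU
)
print(generator("What is the capital of France?", max_length=64)[0]["generated_text"])
```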
Thank you for the videos, bro!
Hi Sam, these models are quite cool, especially because the smaller ones (when quantized) would run on a lot of hardware at decent speeds. Have you considered doing a tutorial on how to convert these Hugging Face models to run locally?
That's an interesting idea, but the models are a bit hit and miss. Let me try it in a notebook, and if it works I will turn it into a video. They are certainly small enough.
@samwitteveenai Done this myself today. I ran LaMini-Flan-T5-783M through the converter that comes with CTranslate2 and quantized to 8-bit integers, and got some good results, especially given the speed. This is similar to the technique used by Faster Whisper (also built on CTranslate2 for transformers).
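For anyone who wants to reproduce that, a rough sketch of the CTranslate2 route (the output directory and prompt are placeholders):

```python
# Sketch of running a converted LaMini model with CTranslate2, assuming the
# checkpoint was first converted and int8-quantized on the command line:
#   ct2-transformers-converter --model MBZUAI/LaMini-Flan-T5-783M \
#       --output_dir lamini-ct2 --quantization int8
import ctranslate2
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("MBZUAI/LaMini-Flan-T5-783M")
translator = ctranslate2.Translator("lamini-ct2", device="cpu")

# CTranslate2 works on token strings, so encode then convert ids to tokens.
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("Write a haiku about spring."))
results = translator.translate_batch([tokens])
output_ids = tokenizer.convert_tokens_to_ids(results[0].hypotheses[0])
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```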
Great video!!!
The only issue with these kinds of models is that their datasets are very basic.
Honestly, we need to harvest team communication data and their project files.
We should hire people to do tasks and record their process entirely.
Like, pay skilled and highly trained individuals to do research and other tasks with a data collection package, and then use an LLM to distill their data into a timeline and summarize it into discrete sets.
Then we will have a number of recorded workflows that achieve the desired results.
We could basically make humans obsolete one domain at a time by slowly analyzing the full thought process and workflow.
People keep asking AI to do “magic” and infer what you want from basically no input and no domain-specific knowledge. We are acting like the world's worst boss on day one.
We need more data: specifically, the algorithms our best and brightest use to accomplish tasks. What questions they ask of themselves, how they structure their ideas, how they test and iterate, and how they determine their task is complete.
With enough domain-specific task completion and teamwork data, AI agents would be like studios or corporations in a box, each with their own workflow that achieves different and unique solutions to problems and tasks.
Transcripts of "thinking out loud" about a problem should work great as a dataset.
The challenge is that teams like this end up being really expensive, and then there is debate as to whose answer is right, etc.
@samwitteveenai I think the price will only be paid when people are sure it can be done. Also, “correctness” will not be needed; it's not as if nobody in any corporation has completed a collaborative project before.
It would be an issue if you only brought on broke kids or young people, but the target group is corporate slaves looking to retire on royalties from their harvested workflow data.
Transparency and honesty are key. They must be aware that their actions are being used as examples for instruction. Even the person who “calls it” and “reins in the scope” is part of the data.
Is it something we can fine-tune ourselves as well, using Colab Premium?
Yeah, I have a number of videos about fine-tuning and PEFT; check those out.
Very nice
I think we should have lots of small models with specialized tasks, and select one of them based on the given prompt, and I am sure next week I will see a paper about exactly that too 🤦♂️
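For what it's worth, a toy sketch of that routing idea; the topic labels, classifier choice, and expert registry are all made up for illustration:

```python
# Toy prompt router: classify the topic, then dispatch to a specialized model.
# In practice each topic would map to its own fine-tuned checkpoint.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

experts = {
    "medicine": pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-248M"),
    "law": pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-248M"),
}

def route(prompt: str) -> str:
    topic = classifier(prompt, candidate_labels=list(experts))["labels"][0]
    return experts[topic](prompt, max_length=128)[0]["generated_text"]

print(route("What are the side effects of ibuprofen?"))
```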
Can I run it on Windows with CPU only?
The smaller ones should be able to.
First again, surfing on the AI wave
Very cool! I love this approach…. Thank you for showing this 🥳🦾