The BEST Open Source LLM? (Falcon 40B)
- Published Sep 27, 2024
- TII Call for Proposals with Falcon 40B: falconllm.tii....
Falcon Github samples: github.com/Sen...
TermGPT: • Letting GPT-4 Control ...
GPT-4 Overview: • Sparks of AGI? - Analy...
Neural Networks from Scratch book: nnfs.io
Channel membership: / @sentdex
Discord: / discord
Reddit: / sentdex
Support the content: pythonprogramm...
Twitter: / sentdex
Instagram: / sentdex
Facebook: / pythonprogramming.net
Twitch: / sentdex
Just realised for the first time that I'm watching your videos for work... I used to watch them for fun, and now I get paid to watch them!!! Feeling quite humble ☺
I would guess the reason why some of the more modern models at much lower parameter counts are performing better than GPT3/3.5 is because the latter were trained, pre the Chinchilla paper, on datasets that were too small in relation to their parameter counts. Prior to Chinchilla it was common to use a 2:1 ratio, compared to post-Chinchilla where 20:1 or 30:1 is now the norm.
thanks for this insight, man. it's a bit hard to navigate the field right now as everybody and their cat are publishing.
Hi,
Can you share any suitable resources for your statement so that I can explore it a little
@@Zaratch Search up ‘Chinchilla’s Wild Implications,’ I think it’s a good overview
wow nice info!
@@ZeroRelevance I'll look into it. Thank you. Apart from that, do you recommend anything which can help me make my way into a career in AI?
Stoked for the fine-tuning video, can’t wait
Holy shit, last time I watched you you were teaching me game development in your bedroom, now you're living in a data center
my comments would include - having the model run on much more data and much more recent data and also training the model on all your docs plus having more aggregated data and aggregated plugins - i think the main bottleneck for most open source LLMs is the amount of VRAM (GPU RAM) available - it would be nice to find ways around this via RAM disks or lower cost GPU cluster nodes - eventually we will see more GPUs with lots more RAM but it could take a while - lots of growth and interest in AI will help push things forward quickly - the hardware is catching up, it is just not quite there yet for the common man - in 10 years we will likely have quantum functions helping out and face a similar situation all over again but it is more than enough now to just enjoy what has been wrought and look at the constant daily improvement and be happy with that and not project too much
This is awesome! Not having to send all your data to openai is crucial for privacy reasons. Wonder how it performs in languages different than english...
Tell me if you're doing it pls, I read that it can't work for languages such as Indonesian, Malay and other Asian languages
@@fiqigading102 mb you could just translate to indo?
@@vasilylukichev-pp4sh hmmm, i think the auto translate output is not very good either. There is a good translator that uses AI but i think it's not free xD
🤔i think it would be neat to always have like 6 examples per prompt to get a good overview over a models capabilities
Thank you for your feedback
@@sentdex I agree with that guy
@youtubescroller886 lol 'that guy' 😅 1 post later and you already forgot the name 🤣
I too agree with that guy
@@jonathan-._.- I'm afraid you have to change your channel name to "that guy" now
Again as the comments suggest here.. can't wait for your Fine tuning video 😅
Can't wait for qlora fine tuning video!
The pace of development in LLMs is at light speed. My brain hurts trying to keep abreast of the myriad of applications and use cases. The opportunities are endless and all of this (general public use) happened, as you have stated, within the last 12 months or so. I have a feeling it's going to continue for the next 2-5 years. Hopefully, I'll still be employed as a DevOps engineer. LOL
I still remember how I started following you my dude … with tutorials on trading with python 😂 a long time ago
i was trying to use falcon 40b instruct on a 96 vCPU, 360 GB RAM and 4 NVIDIA T4 GPU machine, but it takes almost an hour to give a single output. can someone please tell me if there is something that I might be doing wrong for the inference time to be this high, or does it usually take this much time to run?
Great video! Do create a tut where we can finetune this model on our own custom dataset!
Part 10 of Neural Net from Scratch, about analytical derivatives??? Please bring the series back!
@21:30 is a spicy take , i'm in
UPS is delivering my $100 Nvidia P40 card tomorrow.. hoping I can make it run these models. Won't fit the 40b model tho.. maybe if I find one more card in my price range..
T40? I'm not familiar. Do you mean a T4? If you got a T4 for $100 that'd be awesome. With 16GB of memory, you could check out the Falcon 7B model.
@@sentdex no no, the older 24gb Nvidia Tesla p40, I'm just realizing that autocorrect changed it to T40.
@fuba44 oh wow. Are you running into any trouble with the pascal architecture?
@@sentdex so far, no issues. In fact I'm considering ordering a second card.
@@w花b maybe I should be more clear here, I had no issues with the architecture, but the card itself is not "consumer grade" so it is not just "plug and play". It comes fanless so I needed to buy a special squirrel cage fan and design/3d-print a custom adapter in order to cool the card. And it's not your standard PCIe power adapter, it's a special plug, so you need a special PCIe converter to power it. I found those hard to get where I live.
You ROCK!!! Love your Work!
I wonder how slow this would be on a Raspberry Pi 4.
You'd be wondering for quite some time I imagine :D
Hi thanks for sharing this content. Could you create a video in fine tuning this model or create chain of thoughts/ fine tuning for complex tasks on Falcon-40b??
Do you have any videos about fine-tune falcon model?
How do you fine tune? Also, if you wanted an API endpoint, how would you host it without breaking the bank? It seems like it would be more expensive than Open AI
Are these models capable enough to parse tabular data? Just like gpt turbo is after creating a csv agent?
Haven't tried, but I would expect it to. Do you have a simple example that I could test?
I don’t know, I shared the link with you, but it got deleted somehow.
@sentdex can you try building the csv agent using the open-source LLM models? That would be really a game changer since reading and analysing data via LLM would be something on another level.
Any chance you could cover possible techniques to running a model over multiple GPUs so that we could for example run 80 billion parameter models
The examples shown in the github include that. It's fairly trivial these days. Just let transformers lib automatically map to your devices pretty much.
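The automatic mapping mentioned here can be sketched roughly as follows. The model id and flags are assumptions on my part, and the heavy download only happens if you actually call the function:

```python
# Sketch of multi-GPU loading via transformers' automatic device mapping.
MODEL_ID = "tiiuae/falcon-40b-instruct"  # assumed Hugging Face repo id

def load_falcon(model_id: str = MODEL_ID):
    # Imports kept inside the function so nothing heavy runs at import time.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half-precision weights to save memory
        device_map="auto",           # shard layers across all visible GPUs
        trust_remote_code=True,      # early Falcon checkpoints shipped custom code
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer
```

With `device_map="auto"`, the layers get placed on GPU 0, GPU 1, and so on, spilling into CPU RAM only if the GPUs fill up.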
What do you think is the best free open source language model today? thx a bunch!
Do you think you could do a video explaining hardware requirements and cpu vs gpu?
Also how does the bit size affect ram and performance?
For example, I'm considering buying more ram for my pc, 2 x 32gb for a total of 96gb (already have 32gb). But I have no idea if that would be enough for a 13-15b model (I would be running on cpu)
Unfortunately reddit is closed so I can't really ask these questions. But maybe this is an easy video for you if you were looking for content ideas.
FYI you want all of your RAM to be composed of the same size sticks and have an even number of them. There are weird technical reasons why, but having an uneven number of RAM sticks or having RAM sticks of different sizes causes performance losses. Anyways, you're going to need VRAM more than RAM, which means you need a specialty GPU with a lot of VRAM, something like an A6000. Most standard gaming GPUs don't have anywhere near enough VRAM until you get to the upper end of modern consumer cards like the RTX 4090, but even then it still has less VRAM than something like an A6000 (a 4090 has 24GB VRAM and an A6000 has 48GB; there are also other workstation cards cheaper than an A6000 which also have 48GB VRAM). Usually these types of GPUs are classified as "workstation" GPUs and can be very expensive. Even then you might still actually need 2 of them to get the 40B model running. You could probably get the 7B model running on a GPU with around 16GB VRAM though.
@@pbjandahighfive thanks man
Audio is fine for me.
Thank you for confirming!
Really cool thing ❤
This LLM is the best. He wears this necklace that resembles an egg with disoriented eyes and nose and mouth, looks a bit creepy, but he is a very good person.
Does conversation history work with Falcon40B Instruct, anyone tried?
I have a question. I currently have GPT4All on my PC. I'm using a simple install package that uses the CPU and system RAM. My PC is a workstation (custom build) running an Intel Xeon E5-2680 v4 CPU (14 cores / 28 threads) and 32GB of system RAM. I also have an RTX 2060 graphics card with 12GB of GDDR5 memory, but that would not seem needed for this framework I have installed. The largest chatbots available are 13B and have a minimum system requirement of 16GB RAM. I have dedicated 12 threads, which seems to be, functionally, just the same as when I had 8 dedicated to the chatbot. It typically responds faster than the time it takes me (usually) to ask it questions, but I notice it hallucinates.
My question is, would I be able to run this 40B local LLM (Falcon) with my current system and, if so, where would I download an Instruct LLM like this (that would run on my CPU / system RAM)?
Google GGML
Thanks sentdex! Is there an open source implementation for function-calling like openai's that works with falcon or any locally runnable model?
Also can you cover easy workflows, for finetuning the instruct model to work with public/private self-curated big data such as pdfs or large text files
I think the closest to it would be either: Doing it yourself with prompting or possibly LangChain Agents. I may eventually cover Langchain agents for this reason, but I am still undecided if I want to use that or just do it myself.
@@sentdex I also have some nice ideas to try, but regarding function-calling, I've seen things like LMQL and glimpses of others that manage to solve LLMs' formatting issues and prepare data for API calls, but openai's way is by far the most simple and elegant. I just hope to hear soon that someone cracks it and generalizes it to accept and use any model. If you hear something please let us know!
Thanks for the reply 💛
what is this lambda platform you talk about?
okay so yeah... better get my act together putting in a proposal there. My guess is they will need to see an MVP.
Neat outro
Good stuff
I don't know about GPT4 not making mistakes as often as Falcon. I do not remember a single time when any code came out with no syntax errors, or ran on the first go
Anyone know what RBRMs are at 9:40? Haven’t been able to Google it successfully.
Edit: Rule-Based Reward Models!
Possible to use with functions? I want to extract json data from blocks of text. I have 8mm records, so open ai will be too expensive
if you could do a tutorial on fine tuning the new version of it
thank you for your videos
Just need a larger context window. 8k would be nice; ~6000 tokens is about the most those of us with 24GB cards can push now. Nvidia and AMD need to give us more vram.
can we fine tune this model to create custom embeddings ?
Yes
I just tried asking ChatGPT that practicing law question, and it got it right. I'm on the free plan, so that would still be 3.5, right?
But I can't implement it using code, I'm getting an error. Can you share any working code or method?
Can these models access the internet like bing gpt? If not, how will that be possible?
What is your opinion about mpt-30B
This guy was into neural nets even before it all blew up)
@sentdex - could you pls try a similar vide on MPT-30 by MosaicML #llm #ai #mpt30b #mosaicml
aaaa you are still alive :D
Can you do a video about save the models and training data locally and running them from your own GPUs? Lots of people have GPUs, few people want to pay hourly for cloud services..
The code shown here is mostly me running it locally. Used cloud too to speed it up but the code is identical. You may just not have enough gpu memory locally to run these models. Otherwise, might I ask what you're having trouble with locally?
@@sentdex I wonder if AMD would sponsor if you showed a video chaining 4 Radeon RX 7900 XTXs together and loading a 40-50bil parameter model =).
Will this run on an nvidia p40?
can i run the 40b falcon on my ryzen 9 5900x and rtx 3090
No. It would be too slow.
Can you do a comparison to mpt30b
Now I just need to put a couple A100s on layaway.
hahahahahaha yes man
i feel you so much
Have you tried anything censorship related with the model? There were some posts about it acting weird about Saudi politics and LGBTQ topics.
I've not seen those rumors, but I encourage you to try the model for yourself. I don't find that to be even remotely true. This model feels like one of the most un-manipulated general purpose models that I've used so far.
I've tested that quite a bit and it's not censoring. It has no preference for the muslim religion or any certain politics.
The Instruct models are "safety" flavored, that's because they contain OpenAI data. The foundation models show no artificial biases.
What would Jordi do?
This was wonderful news. Gpt4 is declining
How much did they pay for this promo!?
Making a mixture model with falcon might get it closer to gpt4
Is it true that the Falcon models won't usually say anything negative about the UAE or is that just a rumor? The only official word on censorship is that they removed adult content and machine generated text from its training data. (Also, how did they identify machine generated text? That's known to be an extremely hard problem.)
When I use and work with GPT-4, I think it's relatively clear there's what I would call "heavy" moderation from OpenAI being applied. When I work with Falcon 40B, I do not notice any such thing. Even on topics you might suspect there could be moderation applied, I do not find any examples, so I would say that's just false. Beyond that, it's a totally open model, so any real moderation/censorship effort would be kind of pointless since users have the power to modify weights w/ fine-tuning.
As for identifying machine generated text, I think we probably have to wait for their paper to be released to learn more about that. It's certainly going to be a problem going forward. I am very curious to hear more about how they curated data too.
Falcon was created by young Algerian geniuses, and since the Emirates regime is a criminal, conspiring and oppressive regime that is hostile to Algeria, the Emirati regime will be exposed soon, God willing.
As a researcher, I’ve found Star coder and star chat (beta) to be very effective instruction tuned models, even for NL.
In general, how have you found the comparison of LLaMA vs star coder/chat vs falcon?
Also on inference times given the flash attention in the huggingface models.
I've been staying away from the legal vagueness that is llama. I think Facebook should have gone about the release of those models much differently. I've only lightly played with a few llama models, but haven't put much effort into actually working deeply with them due to licensing questions. There are "work arounds" that it appears Facebook is choosing to allow, but tides can change at any time and I don't like that.
As for StarCoder, I used it a bit and didn't really find it to be good enough. Do you think StarCoder is better than the results here for coding? Have you some examples from StarCoder that you think show it's exceptional? At least at the moment, I am mostly thinking in terms of my TermGPT project, so I need mostly code, but also need a decent language understanding and also system administration in general. So far, I find Falcon 40B to be the best here, but everything is purely anecdotal. I don't need everything to be contained in one model either, but I am seriously considering putting my stakes down in Falcon 40B, probably fine-tuning it slightly and going full steam ahead.
I’m a research student from Melbourne and would just like to know if Falcon is a censored LLM like ChatGPT? I’m trying to study the bias of LLMs and whether they are impartial with their answers
Too bad the sequence length is just 2048
11:00
HA the scifi trope of making computers autistic is dead wrong.
Hey. Please increase the volume of future videos if possible :)
AGI is impossible to realize and the sooner we realize that the better
5th
ehehheeheheh
Absolutely massive!
Pop filter bro...
Funded by the UAE? LOL
all worthless without book-size context limits.
I tried it and wasn't impressed at all
Hi, model 40b-instruct works on my 3090 with torch.bfloat16. It takes about 23GB VRAM and 62GB RAM. Am I doing anything wrong? 😁
Yes, you're supposed to fit the entire model on your GPU (the VRAM). Otherwise it would take way too long. How long does it take to process a single token? You need something like 45-50 GB memory for the falcon 40b-instruct even at 8-bit precision. Try the 7b model instead, and test 8bit and bfloat16.
@@Woollzable Hi, meanwhile my ssd died so I no longer have this model and it takes too long to download. I would rather test llama-2, which should be better than falcon in every test. (see The Impact of chatGPT talks (2023) - Keynote address by Prof. Yann LeCun)
Which Llama-2 are you referring to? You don't have enough memory for some Llama-2 models, only the Llama-2 7b (at both 8-bits & 16-bits) model and Llama-2 13b (at 8 bits)
Let's break it down. The size of the model in RAM memory is very roughly:
Let's say the model has 7 billion parameters:
32-bit: 7 billion parameters * 4 = 28 GB GPU RAM minimum. (not used for inference)
16-bit: 7 billion parameters * 2 = 14 GB GPU RAM minimum.
8-bit: 7 billion parameters * 1 = 7 GB GPU RAM minimum. That's because 8 bits = 1 byte.
All of this is only for inference, not for training/fine-tuning. For training you need at minimum double the RAM of inference (just using the model without training/fine-tuning).
32-bit is not used for inference. Usually the highest is 16-bit, but for training/fine-tuning you might need higher precision, such as automatic mixed precision (mixing 16-bit and 32-bit).
Also, your CPU RAM is not that important, you can fit part of the model in the GPU RAM and part of the model in the CPU RAM and then do a memory swap, but that is extremely slow and not recommended, but it might work in some cases.
Now think about Llama-2 13b which has 13 billion parameters. Do the same calculations as above but replace the 7 billion with 13 billion. You can probably run the Llama-2 13b at 8-bit.
Finally, think about Llama-2 70b which has 70 billion parameters. You do the same calculations :D
I also forgot to tell you that the lower precision you go to, the worse the performance/accuracy. 16-bit is better than 8-bit which is better than 4-bit etc. Quantization (going to lower bits) is done to save memory but at the cost of accuracy.
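The back-of-the-envelope math in this thread can be captured in a tiny helper (decimal GB, weights only; this is a sketch, not exact accounting, since activations and the KV cache add more on top):

```python
def inference_gb(n_params: float, bits: int) -> float:
    """Rough GPU memory (decimal GB) to hold the weights for inference."""
    bytes_per_param = bits / 8
    return n_params * bytes_per_param / 1e9

def training_gb(n_params: float, bits: int) -> float:
    """Crude lower bound from the comment above: at least double inference."""
    return 2 * inference_gb(n_params, bits)

# 7B model: 28 GB at 32-bit, 14 GB at 16-bit, 7 GB at 8-bit
```

Plugging in 13B or 70B reproduces the scaling described above for the larger Llama-2 variants.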
@@hakoren4444 Thanks for the explanation. I do time series regression where there is no problem fitting the model in memory :D due to the small amount of data.
Very cool. I got the Falcon-7b-instruct model working on my home PC while I watched your video. Only took about 5-10 mins to get it all going. Inference works well on my RTX 4080 (16GB) GPU too. As soon as I load the model using torch.bfloat16, the transformers library allocated ~14GB of GPU memory, but it works really well!
I'm going to have to replace my LLM app development with this local endpoint to save cost on OpenAI API calls ;) I wonder if that's a thing, a local dev loop pointing at a smaller, locally hosted LLM, and then when pushed to production, a large model or hosted endpoint, a la GPT-4. Depending on how you use the LLM in your application, I can imagine this could possibly lead to a whole new class of heisen-like bugs. Interesting to think about.
Great vid btw. I like how you keep things simple, and high-level. This is the perfect level of depth/complexity for video.
Could you share some metrics on your inference speed?
@@Moonz97 Using the text-generation pipeline I'm getting about 10-12 tokens/second on my RTX 4080.
@@snarkyboojum that is actually a really good number, isn't it? When I did my project with GPT3.5 they allowed 3 prompts a minute even though I was paying for prompts.
how is it going now?
Audio is fine for me.
If you had a bigger microphone, it could cover your whole face.
The dream!
not gonna continue neural networks from scratch series?
We will.
would be awesome to see you finetune the model. do you know if something like LORA could work to reduce the cost of fine-tuning?
We'll almost certainly be using lora to fine tune falcon 40b, so stay tuned for that. I am very curious to see how well it works in practice. I think the hardest part right now though is gathering quality fine-tuning data. It's easy to think of 10 decent samples. It's quite hard to think of a thousand haha.
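For context, a LoRA setup mostly boils down to a handful of hyperparameters. Here is a sketch with illustrative values (my assumptions, not the actual config from the video) of the kind of dict you would hand to `peft.LoraConfig`:

```python
def falcon_lora_hparams() -> dict:
    # Illustrative values only; in practice you'd tune these per dataset.
    return {
        "r": 16,                    # rank of the low-rank adapter matrices
        "lora_alpha": 32,           # scaling applied to adapter updates
        "lora_dropout": 0.05,       # dropout on the adapter path
        "target_modules": ["query_key_value"],  # Falcon's fused attention projection
        "task_type": "CAUSAL_LM",
    }

# e.g. peft.LoraConfig(**falcon_lora_hparams()), then peft.get_peft_model(model, cfg)
```

Only the adapter matrices train, which is why LoRA cuts the fine-tuning memory bill so dramatically compared to full fine-tuning.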
@@sentdex I was able to successfully fine tune the model using qlora, I think it took >160gb of VRAM. But even just for 10 minutes of training (I only had that amount of credits) I was able to get much better results than fine tuning falcon 7b for 10+ hours.
@@macklinrw what service did you use?
@@sentdex Crowdsource it from your 1.2M followers on YouTube ;)
@@sentdex take a look at the LIMA dataset (Less Is More for Alignment). I suggest you individually send each example to gpt3.5 and rephrase it to explain step by step/verify/change style to create a perfect dataset; I had great success with llama 7b this way, fine tuning on curated LIMA
gpt often makes mistakes. Sometimes it becomes stupidity and not artificial intelligence. I haven't tried Falcon yet. Is it better?
Would you be so kind as to refer me to a video explaining, at a non - computer person level, how to set this up?
Do you think the 7B model can be fine tuned to auto-completing code and be used as a local and good substitute for co-pilot? (for those who have the required compute power, which I don't :D)
Possibly fine-tuned, but there might be better models at that size for just auto-completing, like replit/replit-code-v1-3b. If you had an awesome fine-tuning dataset though, I imagine Falcon 7B could be quite good at this task.
Try asking time differences. If it's 1:00am in Tokyo, what time is it in London. Amazing how few LLM's get this right.
Its all fun and games, but can it predict the future? Otherwise what is the point.
Not inventing the future by controlling data, but purely predicting it by the inputs of the world.
Amazing vid.
One question, tho, is base ChatGPT actually 175B? Was it confirmed by anyone? I mean, the "original default" version probably was somewhere around that number of params. However, since they introduced the "turbo" version, I feel like they just scaled it down. It feels to me that it actually got dumber in some instances, and additionally, how would they actually speed it up if the underlying architecture is still GPT-3.5?
I definitely do agree tho that the Falcon 40B and LLaMA-65B "feel" more knowledgeable than 3.5 from my experience, with LLaMA slightly outperforming Falcon. This is all subjective of course and it depends on what your use case is. This ties neatly into the final observation.
The coding part of these models is still FAR from what I could get even with 3.5. This might change, however, if we finetune the base models to act as a sort of agents for specific tasks since the models are ours to modify.
I tried playing with LoRA / QLoRA, but I couldn't achieve any good results for some reason (LLaMA models). I tried replicating early Alpaca training, and it all flopped. There are probably some errors in the code I can not seem to recognize...
As for Falcon, it just takes a huge amount of time, and unfortunately, I can not afford not to use my PC for more than a day or two, so I didn't have a chance to play with it.
I think I heard GPT 3.5 is actually smaller than GPT3, but I might be mistaken, and that'd be the "turbo" version you mentioned that does indeed feel a little scaled down, even since init release. It's hard to really know when so much is kept "secret." It becomes especially problematic as we learn that these models are used on their own outputs too in some, or maybe all, cases to further "improve" outputs. I sure wish OpenAI was a tiny bit more open :D
As for fine-tuning, stay tuned. We're going to almost certainly make use of LoRA/QLoRA.
@@sentdex Would love to see your take on LoRAs. Can't wait!
Isn't the model fully deterministic if you use the exact same seed and weights are exactly the same for each prompt?
Aren’t all current ai deterministic given identical inputs?
It depends how you frame things. If you have identical input w/ identical seed, yes, a frozen model is very much deterministic.
In practice, with LLMs, and natural language input, however, your input is going to nearly infinitely vary. As such, your outputs will too, so "natural language" input applications are by nature going to be treated as non-deterministic.
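The frozen-model determinism point can be shown with a toy stand-in for token sampling, using only the standard library (this is the principle, not the real decoder):

```python
import random

def sample_token(weights, seed):
    # Draw one "token id" from a categorical distribution with a frozen seed.
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# identical input + identical seed -> identical output, every run
```

Vary the prompt (the weights) or the seed and the output can change, which is exactly why natural-language applications are treated as non-deterministic in practice.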
@@iansharoo2 no
It is not 100% open source
60% of the model only.
RunPod is another solid alternative to Lambda to run these models :)
I've seen a few people using runpod. I think the ~concept~ is fantastic. I tried once to dive in but all the abstractions were kind of annoying. I plan to revisit it because I want it to work for my needs, but we'll see haha. Used right, it could be potentially even cheaper than Lambda and even simpler to use possibly.
@@sentdex curious what you mean by abstractions in the context of the platform?
@@merrell_io Essentially it seemed quite challenging to integrate runpod into a larger application that might be powered by an LLM. This is in particular to the "serverless GPU," as that's what I was looking at and think I would want. I cannot remotely speak on runpod intelligently though and forget the precise details where I formed that opinion, which is why I plan to dive in again. I happened across it while also looking into langchain, and I need to probably look into runpod on its own entirely first lol
you want to give GPT4 a terminal huh lol
Low volume
Please fix it from the next one!
Thanks!
@@deonex4993 Still not near as loud as the other videos.
Hmm. To be honest I am not noticing any audio level issues here. This has been a problem with some previous videos, but I now actually have decibel checks in my workflow for producing a video because of those issues in the past and I find the level here to be as expected.
I really hope an llm as powerful as gpt4 becomes available open source soonish.. having an llm running on an engineering business's own server would allow for safer use.. without sharing sensitive information with an outside server
GPT4 looks like it is 8 models running in parallel at the moment with only 220B parameters each. GPT4-level performance will be a while away, but it seems like performance scales up as better data is made, and models can be scaled down while maintaining performance in this way.
The future is bright
This is a serious disaster for this world
1st
What an amazing topic about the open-source model Falcon 40B. A very important point you make is that it is "YOURS" (the model).
how about wizardcoder? It seems like wizardcoder might turn out to be a better coding LLM than falcon 40B?
noob question: why is hugging face benchmarking important?
Great videos lately!
Thanks!
its actually because i am a magic fairy which makes it work
If you constrain the logit selection to the user input context, after subsequent autoregressive updates, the model performance shoots up
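One way to read that suggestion is masking the logits so only tokens that appear in the user's input can be sampled. A toy sketch of the idea (purely my interpretation of the comment, with made-up names):

```python
def mask_logits_to_context(logits, context_token_ids):
    # Drive every logit outside the user's input vocabulary to -inf,
    # so those tokens can never be sampled.
    allowed = set(context_token_ids)
    return [x if i in allowed else float("-inf")
            for i, x in enumerate(logits)]
```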
I was just playing around with Falcon chat a day ago, it's pretty good and awesome for being open source
how is Falcon performing in other languages?
how do i concretely do finetuning?
Please finish your neural network from scratch series, there's only one more episode needed to finish it and it would help so many people. It's the only good series I found on YouTube that explains it clearly.
I followed it all the way through and it took me absolutely ages to figure out back propagation, there were so many tiny questions I had that could have saved hours if it was just explained through an example. And once I got it working, I thought it was wrong because the network cost was decreasing but the accuracy stayed the same, and it took me forever to realize that it was a combination of the size, learning rate, and number of epochs that caused that, not my code.
And please finish it, it would help so many people who are trying to learn machine learning fundamentals. Anyone who has made it to the final episode is looking to learn, and will find it extremely useful
Any recommendations on SQL and ability to answer questions from multiple tables and plot graphs let's say from a CRM dataset?
Langchain