Go to piavpn.com/8AAFFF to get 83% off Private Internet Access with 4 months free (and support me :D)!
Thanks for watching!
It's nice, but I think your architecture has some flaws. Suppose a text "This is a ..." has several possible next-word predictions, like "dog", "cow", "mountain". "Dog" and "cow" are nearby in embedding space, but "mountain" might be far apart. If you train your model on such cases, it will average out the targets and might give some nonsense or hallucinate (basically it might give the midpoint vector of "cow", "dog", and "mountain").
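A toy illustration of that averaging failure (the 2-D vectors and words below are made up, not real embeddings): a model regressing straight onto embedding vectors gets pulled toward the mean of the plausible next words, and the nearest word to that mean can be something that was never a valid continuation.

```python
import numpy as np

# Hypothetical 2-D "embeddings" just for illustration.
emb = {
    "dog":      np.array([1.0, 0.1]),
    "cow":      np.array([0.9, 0.2]),
    "mountain": np.array([-1.0, 0.8]),
    "rock":     np.array([0.3, 0.4]),  # an unrelated word
}

# The MSE-optimal prediction for equally likely targets is their mean...
target = np.mean([emb["dog"], emb["cow"], emb["mountain"]], axis=0)

# ...whose nearest neighbor (by cosine similarity) can be a word that was
# never one of the valid continuations.
def nearest(vec):
    return max(emb, key=lambda w: vec @ emb[w] /
               (np.linalg.norm(vec) * np.linalg.norm(emb[w])))

print(nearest(target))  # prints "rock" with these made-up vectors
```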
The reason the 160k-batch REAN was worse with the graphics-card prompt is that the network is overfitting. I'd recommend using a test set with some prompts and choosing the model that performs best on that test set, instead of just running it for a high batch count.
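A minimal sketch of that kind of checkpoint selection (all names and numbers below are hypothetical, not the video's actual code): keep the checkpoint with the lowest loss on a small held-out prompt set, not the longest-trained one.

```python
def evaluate(checkpoint: str, prompts: list[str]) -> float:
    """Placeholder: load `checkpoint`, run the prompts, return mean loss."""
    fake_losses = {"rean_40k.pt": 2.1, "rean_80k.pt": 1.8,
                   "rean_120k.pt": 1.7, "rean_160k.pt": 1.9}  # 160k overfits
    return fake_losses[checkpoint]

checkpoints = ["rean_40k.pt", "rean_80k.pt", "rean_120k.pt", "rean_160k.pt"]
validation_prompts = ["tell me about graphics cards", "what is the sun?"]

best = min(checkpoints, key=lambda c: evaluate(c, validation_prompts))
print("selected checkpoint:", best)  # rean_120k.pt under these fake numbers
```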
You're right, it's most likely overfitted. The weird thing is that most of the other test prompts I was running were generally getting better with more batches, so I don't know.
@8AAFFF It sounds like a data problem, then: too little or not-general-enough data would lead to worse curve fitting. I suppose there wasn't much data about graphics cards, so it freaked out and kept spamming "graphics".
Maybe. It's also possible that the graphics-card knowledge just got overshadowed because it was at the beginning of the dataset. I did some more tests today, and basically it just seems to have some knowledge points that it tries to stick to no matter what the prompt is.
@8AAFFF Are you using any sort of speculative decoding or temperature scaling? That wasn't mentioned in the video and does make quite a difference.
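For reference, a minimal sketch of what temperature scaling usually looks like (a standard sampling technique; the video doesn't say whether the model uses it). Higher temperature flattens the next-word distribution, lower temperature sharpens it.

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float = 1.0) -> int:
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

logits = np.array([2.0, 1.0, 0.5])  # made-up logits for three tokens
print(sample_with_temperature(logits, temperature=0.7))
```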
You are so underrated it is actually insane, keep it up dude. Great stuff.
18:25 Bro there has gyat to be a better way! I'm crying 😭😭 wtf is that timeline 💀💀
bro did the tower of babel editing technique ahh
This should have millions of views, what the hell, this is epic. Very well edited too.
I was shocked to see that this video has so few views. I feel so lucky to have come across this gem.
This is so cool man! Please, keep going.
Insane time spent and a crazy W video. Don't worry about compression or pacing, this is gas and should blow up soon.
Great video, these longer videos are always nice to see. Thank you for open-sourcing the code.
Very cool video and project man!
I have been working on one as well, but I'm running into some issues at the moment! So exciting!
Your animations are awesome :o
sick bro, absolutely sick
Amazing... how did you animate this? 👌🎉🎉🎉
ua-cam.com/video/_B2RImihdUI/v-deo.html That's not correct. GPT models predict every "next word" in the sequence at the same time.
Yeah, 100% correct.
I just lied about it at the beginning to make the explanation easier, but I do correct myself later.
Well done for noticing :)
Hello! Nice video.
In the section "Final word2vec Results", i.e. at 11:14 and 11:28, you had a space inside the string value passed to similar_by_word in one case but not in the other... I wonder if the space changes the results.
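If the video uses gensim-style word2vec (an assumption; the file path and words below are hypothetical), the space would matter a lot, because " word" and "word" are different vocabulary keys:

```python
from gensim.models import KeyedVectors

wv = KeyedVectors.load("word2vec.kv")   # hypothetical saved model
print(wv.similar_by_word("graphics"))   # normal lookup
print(wv.similar_by_word(" graphics"))  # raises KeyError unless the training
                                        # text really contained " graphics"
```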
Very good video, the only flaw is the sound quality.
The editing of the video is just amazing!!
Even your animations are cool, how did you make them? Or do you have another neural net to do that for you? :)
Thanks :) Basically just with images/clips in DaVinci Resolve.
I put the almost-final timeline at the end, at 18:26.
Hahaha, funny guy... it's like reading a long GPT-4 hallucination.
🤓 Well AksUaLly each embedding vector takes up space on the device. So while you save space by vector-quantizing the output embeddings, the vocabulary size is still limited by GPU memory. Also, you lose the ability to do some calculations on the output, like temperature. Good video.
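To put rough numbers on the memory point (all figures below are made up for illustration, not the video's actual sizes):

```python
# Even a vector-quantized output embedding table still grows with vocabulary size.
vocab_size = 50_000    # hypothetical vocabulary
embedding_dim = 256    # hypothetical embedding width
bytes_per_weight = 2   # fp16

table_bytes = vocab_size * embedding_dim * bytes_per_weight
print(f"output embedding table: {table_bytes / 1e6:.1f} MB")  # ~25.6 MB on the GPU
```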
Your voice is quiet on my speakers
Fine for me, not quiet.
You seem to have gone a weird route with training. Normally, networks are first trained on plain text to learn normal language. Then they are finetuned with "human/assistant" data to actually answer questions instead of talking to themselves.
Yeah, that's true.
It's just that the higher-quality human/assistant dataset was so big that I didn't need to first train on raw text.
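For anyone curious, a pseudocode sketch of the two-phase recipe discussed above (every name here is hypothetical, not the video's actual code):

```python
def train(model, batches):
    for batch in batches:
        ...  # one next-token-prediction step

model = ...  # fresh transformer

# Phase 1: pretrain on plain text to learn general language.
train(model, plain_text_batches)

# Phase 2: finetune on human/assistant pairs so the model answers questions
# instead of talking to itself.
train(model, chat_formatted_batches)
```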
top!
Speak louder!!