8AAFFF
Joined 10 Jun 2023
pull up on me:
email: 8aafff@gmail.com
discord: marmust
I Coded My Own Language Model From Scratch
Go to piavpn.com/8AAFFF to get 83% off Private
Internet Access with 4 months free!
--------------------------------------------------------------------------------------------------------------------------------
in this video i made my own AI language model from scratch, and instead of using the standard GPT architecture that pretty much everyone is using, i invented a new one called REAN. this video goes over how language is encoded into numbers using word2vec, the general anatomy of language models, and the training + testing of REAN.
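the word2vec idea mentioned in the description (encoding language into numbers) can be sketched without a neural net at all: co-occurrence counts plus an SVD produce the same kind of dense word vectors. this is not the video's exact method, just an illustrative toy with a made-up corpus:

```python
import numpy as np

# Toy corpus; the video trains on a far larger dataset.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Count how often each pair of words appears within a +/-1 word window.
C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            C[idx[w], idx[corpus[j]]] += 1

# SVD compresses each word's co-occurrence row into a short dense vector.
U, S, _ = np.linalg.svd(C)
dim = 4
vectors = U[:, :dim] * S[:dim]  # one 4-d embedding per vocabulary word

def most_similar(word):
    # Nearest neighbour by cosine similarity, excluding the word itself.
    v = vectors[idx[word]]
    sims = vectors @ v / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(v) + 1e-9)
    sims[idx[word]] = -np.inf
    return vocab[int(np.argmax(sims))]
```

words used in similar contexts ("cat" and "dog" both sit between "the" and "sat") end up with similar vectors, which is the whole trick.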
--------------------------------------------------------------------------------------------------------------------------------
zaaawwwg why the pacing so fast😭🙏
if you have any questions, feel free to ask in the comments or dm me on discord (marmust @marmust) or smth
--------------------------------------------------------------------------------------------------------------------------------
links:
github: github.com/marmust/REAN-architecture
shilling: piavpn.com/8AAFFF
--------------------------------------------------------------------------------------------------------------------------------
music:
1: Blade Runner 2049 - Synthwave Goose - open.spotify.com/track/3Bj2mrlp3tALHO5U3mK8zM?si=a90756a6ef8c40a6
2: She's A Badass - Wave Saver - open.spotify.com/track/5GJ31smvAGYRCDrfpwEgY0?si=4276ca21e66a4f09
3: STREET SURGE - OLIIVER, HORS - open.spotify.com/track/26ENjxc5FCD3GG4DunG4hh?si=68f0e3a29d4b4311
4: Access Denied - Synthwave Goose - open.spotify.com/track/3mQ8mrNJg3mfZxZaRDYxRk?si=22bd014886cb47ab
5: LOOKIN FOR YA - xxanteria - open.spotify.com/track/2dcdT5V0ZdxMABFjlWRnuJ?si=56b578651afb4adc
6: Night Train - Code Elektro - open.spotify.com/track/7nwMY3U4zhXQLQsjX8gwzf?si=56e7ac086d524978
--------------------------------------------------------------------------------------------------------------------------------
0:00 - 0:38 - Intro
0:39 - 3:21 - Making a word2vec
3:22 - 5:16 - GPT vs My Idea
5:17 - 7:48 - What Is REAN?
7:49 - 8:33 - Testing word2vec
8:34 - 10:46 - More About REAN
10:47 - 11:53 - Final word2vec Results
11:54 - 13:10 - Assembling Everything Together
13:11 - 14:04 - Training
14:05 - 15:04 - PIAVPN ad
15:05 - 18:07 - Talking With It
18:08 - 19:32 - Outro
--------------------------------------------------------------------------------------------------------------------------------
homework for the viewer:
+ YouTube compression really messed with this one, 0-10 how bad was it?
+ I tried to break up the boring architecture explanation part by weaving in the word2vec storyline, rate the attempt 0-10.
+ was the explanation clear? how much 0-10 did u understand the idea behind REAN?
+ was the music / sound effects properly mixed with the voiceover? 0-10 pls.
+ if you're a returning viewer, would you rather watch smaller more digestible videos? or longer ones like this one?
thanks for any feedback! ill make sure to read every comment :)
Views: 1,787
Videos
i made an ULTRAKILL AI
23K views · 5 months ago
I just got sponsored by Boston Dynamics!!! they said that they are developing a secret technology which can transform Hemoglobin (a protein commonly found in red blood cells) into electricity!!! this has so much potential for robotics😁 github: github.com/marmust/ULTRANET music: 1: The Cyber Grind - Meganeko 2: Synth City - Synthwave Nation 3: Ascention - Sub Morphine 4: Supernova Run - Absolute...
coding a really fast De-Noising algorithm
45K views · 9 months ago
in this video, I coded a denoiser for raytracers. It is really fast because all it does is blur an image (with a few extra steps). GitHub repo (improvements are welcome :D) github.com/marmust/raytracing_denoiser music: 1 - Hotline Miami OST - Inner Animal - Scattle 2 - Hotline Miami OST - Blizzard - Light Club 3 - Throttle Up - Dynatron mentions: coding adventures guy: ua-cam.com/video/Qz0KTGYJ...
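the "all it does is blur an image" step from this description might look like the following in numpy. this is a generic box-blur sketch, not the repo's actual code:

```python
import numpy as np

def box_blur(img, passes=2):
    """Denoise by repeated 3x3 box blur: average each pixel with its
    8 neighbours. Edges are handled by padding with the border value."""
    out = img.astype(float)
    for _ in range(passes):
        p = np.pad(out, 1, mode="edge")
        # Sum the 9 shifted copies of the padded image, then divide.
        out = sum(p[i:i + out.shape[0], j:j + out.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    return out
```

the "few extra steps" a real raytracing denoiser needs (edge-aware weights, depth/normal buffers) are what keep detail from being blurred away too.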
Mapping The InterNet | colors and optimizations
9K views · 1 year ago
i made a thing that uses graph theory / force directed graphs to map out the connections between different webpages :) as i said, project is open source: github.com/marmust/internet-scanner music links: HOME - resonance: ua-cam.com/video/8GW6sLrK40k/v-deo.html Unfound - arrival: ua-cam.com/video/RZtZcBRt5Pw/v-deo.html pls ansr in comments :) 1: suggest name for this thing 2: discord Y/N 3: keep...
Mapping The Internet
173K views · 1 year ago
in this video i coded up a little project, which can scan a webpage for its links, and display them as a mathematical node graph. useful links: - github repo: github.com/marmust/internet-scanner - songs: 1: arcadia - white bat audio: ua-cam.com/video/oijfccjO-rY/v-deo.html&pp=ygUXYXJjYWRpYSB3aGl0ZSBiYXQgYXVkaW8= 2: glitch in reality - white bat audio: ua-cam.com/video/9M340LDomjU/v-deo.html&pp=...
Creating Any Dimensional Physics simulation (PYTORCH)
3K views · 1 year ago
I coded a particle physics simulation that can work in any dimension. so 4, 5, 6... dimensional simulations are now possible! It is implemented in PyTorch so it can run on the GPU. helpful links: open source code: github.com/marmust/n-dim-particle-sim music used: 1: BEAT SABER main menu music 2: Moonlit - VØj , Narvent 3: HOME - Head First 4: Fractals - Vincent Rubinetti (3Blue1Brown OST) thank...
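the dimension-agnostic trick this description mentions can be sketched in a few lines: if positions are an (n, d) array, nothing in the update rule cares what d is. this is an illustrative numpy sketch (the repo uses PyTorch), and it applies a plain inverse-square attraction regardless of dimension:

```python
import numpy as np

def step(pos, vel, dt=0.01, g=1.0, eps=1e-3):
    """One step of an n-body simulation in ANY dimension: pos is (n, d)."""
    diff = pos[None, :, :] - pos[:, None, :]        # (n, n, d) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1) + eps      # (n, n) pairwise distances
    np.fill_diagonal(dist, np.inf)                  # no self-force
    # Inverse-square attraction: direction diff/dist, magnitude g/dist^2.
    acc = (g * diff / dist[..., None] ** 3).sum(axis=1)
    vel = vel + dt * acc
    return pos + dt * vel, vel
```

the same code runs for d = 2, 5, or 100; only the shape of `pos` changes.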
How I Made A ChatBot (no GPT) Using a Primitive Neural Network
1.2K views · 1 year ago
in this video, i made a simple "ChatBot" with a text completion system that i made. links: music: 1: fractals - vincent rubinetti (3blue1brown music) 2: home - head first 3: Karl Casey - Night Crawler code: github.com/marmust/UNMASKER-text-gen previous video: ua-cam.com/video/9gA5Agejlr0/v-deo.html thanks for clicking on the video :)
BERT is generating text
643 views · 1 year ago
this is a demo of a new way to generate text I created. it uses BERT-like unmaskers to generate pretty convincing text! The main advantage of this method is that the network can go over the text multiple times, fixing its previous mistakes. all project files are located in this repo: github.com/marmust/UNMASKER-text-gen the music used: Tagavaka - Extrapolate ua-cam.com/video/RKaa9xxFLHw/v-deo.h...
What Neural Networks think, and how is this Useful?
998 views · 1 year ago
in this video, I used a pretty unconventional visualization technique to see which neural network architectures best suit different tasks. it should be noted that I'm not the one who invented this; as far as I know, it first appeared in a paper called "Picbreeder". Music: Fractals (from 3blue1brown) by Vincent Rubinetti. Github: 1: this exact project: github.com/marmust/creating-...
Training an AutoEncoder On Random Data
906 views · 1 year ago
in this video, i created a small autoencoder and tested if i can train it using only randomly generated data. project files: github.com/marmust/zero-data-autoencoder
Exploring the inside of Neural Networks
1.5K views · 1 year ago
in this video, I created a tiny classifier, and by visualizing its layers, studied it and arrived at these conclusions. project files can be found here: github.com/marmust/micro-nets-visualization music (from 3Blue1Brown) by Vincent Rubinetti: Download the music on Bandcamp: vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown Stream the music on Spotify: open.spotify.com/playlist/3zNK20q...
you should probably use a cosine similarity loss
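for context, the cosine similarity loss this comment suggests compares directions and ignores vector length, which fits targets that are word2vec embeddings (their norms carry little meaning). a minimal numpy sketch; PyTorch has this built in as `nn.CosineEmbeddingLoss`:

```python
import numpy as np

def cosine_loss(pred, target, eps=1e-8):
    """1 - cos(pred, target): ~0 when the vectors point the same way,
    ~2 when they point in opposite directions. Scale-invariant, unlike MSE."""
    num = (pred * target).sum(-1)
    den = np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1) + eps
    return (1.0 - num / den).mean()
```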
Out of curiosity, are you aware of the RWKV architecture? It's an LLM based on a type of RNN; its main advantage is removing the hard context limit, making longer contexts possible on weaker devices, since it uses a constant amount of memory. Your idea of using embeddings as the input and output is really cool, especially since it further reduces VRAM requirements.
Remember me when you make it big!
i will ty
bros cracked. thank you fellow.
3:11 I thought GPT-3 had a 12288 embedding size. You are saying as high as 4096.
tbh i asked chatgpt what its embedding dim is XD so idk if its correct. i looked it up again and ur right, the biggest version of gpt3 is 12k embedding dim, and openai is really secretive about gpt4 so im probably completely wrong on that. thanks for pointing it out :)
@@8AAFFF It's okay, I thought I might have been wrong. At 4:57 you say that you are going to compare the vector of the word it wants to predict against the vectors of all the words in your database (RAG with cosine similarity). But a word like "mole" can be an animal, something on your cheek or arm, or part of the number of molecules, 6.02 × 10^23. Does this mean that your word database has these words written down multiple times? And at some point you said you had 340,000 words in your database, instead of the 40,000 from OpenAI? I'm also interested to know what the most important thing you learned during this time was. I have only been learning about AI recently, so I'm all ears.
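the decoding step discussed in this thread (comparing the predicted vector against every word's vector) can be sketched like this; the names and sizes are made up for illustration:

```python
import numpy as np

def decode(pred, vocab_vectors, vocab_words):
    """Map a predicted embedding back to a word: normalize everything,
    then take the vocabulary entry with the highest cosine similarity."""
    v = pred / (np.linalg.norm(pred) + 1e-9)
    M = vocab_vectors / (np.linalg.norm(vocab_vectors, axis=1, keepdims=True) + 1e-9)
    return vocab_words[int(np.argmax(M @ v))]
```

note that a single static vector per surface token cannot separate the senses of "mole" the commenter asks about; the word appears once in the table, and its vector blends all its uses.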
damn you made davinci resolve go kaboom at the end btw cool video! i hope this architecture eventually gets a remake or succeeds, because this could be a way better alternative to GPT architecture.
Hahaha funny guy.. it's like reading a long gpt4 hallucination
Amazing..how did you animate 👌🎉🎉🎉
thanks :) all the animations are pretty much fully made up of davinci resolve images and clips and stuff i put the timeline at 18:26 if you want to see
I have been working on one as well but ran across issues currently! So exciting!
yooo gl man are you doing like a custom architecture?
Insane time spent and crazy W video. don't worry about compression or pacing this is gas and should blow up soon
This should have millions of views what the hell this is epic, very well edited too
Speak louder!!
Your animations are awesome :o
🤓 well AksUaLly each embedding vector takes up space on the device. So while you save space by vector quantizing the output embeddings the vocabulary is still limited by GPU space. Also you lose the ability to do some calculations on the output like temperature. Good video
You can probably just have it not be on the gpu, and just check the closest token on like the CPU or whatever. Also can't you just easily recreate temperature with this?
yeah thats exactly what im doing. the word2vec weight is stored in regular RAM and is only used to translate tokens back and forth, so the only stuff in GPU VRAM is the network and the already translated vectors. its true that i dont really have regular temperature like other GPT models but i can sort of recreate it by either adding noise to the input or selecting the 2nd / 3rd closest word to the network output instead of the 1st :)
You can absolutely recreate temperature if you just train the embedding model differently
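the "recreated temperature" the replies describe, picking among the nearby words stochastically instead of always taking the closest, could look like this (a hypothetical sketch, not the repo's code):

```python
import numpy as np

def sample_word(pred, vocab_vectors, vocab_words, temperature=1.0, rng=None):
    """Sample a word near the predicted embedding instead of always taking
    the single closest one. Low temperature -> greedy nearest word."""
    if rng is None:
        rng = np.random.default_rng()
    v = pred / (np.linalg.norm(pred) + 1e-9)
    M = vocab_vectors / (np.linalg.norm(vocab_vectors, axis=1, keepdims=True) + 1e-9)
    sims = M @ v
    logits = (sims - sims.max()) / temperature  # max-subtraction keeps exp stable
    p = np.exp(logits)
    p /= p.sum()
    return vocab_words[rng.choice(len(vocab_words), p=p)]
```

as temperature goes to zero this collapses to the 1st-closest word; raising it spreads probability onto the 2nd / 3rd closest words and beyond.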
Hello! Nice video. In the "Final word2vec Results" section, at 11:14 and 11:28, you had a space inside the string passed to similar_by_word in one and not in the other... I wonder if the space changes the results.
a space in the code would make the compiler or interpreter think that its something else, so it would make an error (which is a difference)
thanks :) also well done for noticing, the space does change the results because its a slightly different token in the word2vec (but they are really close to each other). i dont know why its there its probably by accident but if ur curious this is the output for "2021" with no space: [('2020', 0.7180283069610596), ('2022', 0.6416824460029602), ('2021 ', 0.6249533295631409), ('2019', 0.6035624742507935), ('October ', 0.5840676426887512), ('october ', 0.5773099660873413), ('January ', 0.5399696230888367), ('2020 ', 0.5389090776443481), ('2018', 0.5194795727729797), ('July ', 0.5182425379753113)]
I was shocked to see that this video has so little views. I feel so lucky to come across this gem.
sick bro, absolutely sick
You seem to have gone a weird route with training. Normally, networks are first trained on plain text to learn normal language, then finetuned with "human/assistant" data to actually answer questions instead of talking to themselves.
yeah thats true its just that the higher quality human/assistant dataset was so big that i didnt need to first train on raw text
very good video, the only flaw is the sound quality.
Even your animations are cool, how did you make them? Or do you have another neural net to do that for you? :)
thanks :), basically with just images / clips in davinci resolve. I put the almost final timeline at the end 18:26
The editing of the video is just amazing!!
You are so underrated it is actually insane, keep it up dude. Great stuff.
top!
This is so cool man! Please, keep going.
Great video, these longer videos are always nice to see. Thank you for open-sourcing the code.
ua-cam.com/video/_B2RImihdUI/v-deo.html that's not correct. gpt models predict every "next word" from a sequence at the same time
yeah, 100% correct. i just lied about it in the beginning to make the explanation easier, but i do correct myself later. well done for noticing :)
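the correction in this thread (a GPT model predicts every "next word" of a sequence in one pass) comes from the causal attention mask; a toy numpy illustration:

```python
import numpy as np

# A causal (lower-triangular) mask lets one forward pass score every
# "next word" at once: position i may only attend to positions 0..i.
T = 5
mask = np.tril(np.ones((T, T), dtype=bool))

scores = np.random.default_rng(0).normal(size=(T, T))  # stand-in attention scores
scores[~mask] = -np.inf                                # block future positions
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Row i now mixes only positions <= i, so the output at every position is
# a valid prediction for position i+1, all computed in the same pass.
```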
18:25 Bro there has gyat to be a better way! I'm crying 😭😭 wtf is that timeline 💀💀
bro did the tower of babel editing technique ahh
Your voice is quiet on my speakers
Fine for me, not quiet.
The reason the 160k-batch REAN was worse on the graphics card prompt is that the network is overfitting. I'd recommend using a test set with some prompts and choosing the model that performs best on that test set, instead of just picking the run with the highest batch count.
ur right its most likely overfitted, the weird thing is that most other test prompts i was running were generally getting better with more batches so idk
@8AAFFF It sounds like a data problem, then, too little or not general enough data would lead to worse curve fitting. I suppose that there wasn't much data about graphics cards, so it freaked tf out and kept spamming "graphics"
maybe, also possible that the graphics cards knowledge just got overshadowed because it was in the beginning of the dataset. i did some more tests today and basically it just seems to have some knowledge points that it tries sticking to no matter what the prompt is
@8AAFFF Are you using any sort of speculative decoding or temperature scaling? That wasn't mentioned in the video and does make quite a difference.
@@8AAFFF what if you used an existing super efficient model like the granite MoE with 400M active parameters to comb through a different dataset like fineweb EDU and produce a list of knowledge it could access during training via RAG or something? if you figure out a way to do that I feel like it'd get much better performance because it doesn't have to spend so much of its weights on memorizing stuff, instead it can learn actual patterns, intelligence even?
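the model-selection idea from this thread, picking the checkpoint by held-out loss rather than raw batch count, in miniature (the loss numbers here are invented):

```python
# Hypothetical validation losses recorded at each checkpoint.
val_losses = {40_000: 2.31, 80_000: 2.05, 120_000: 1.98, 160_000: 2.14}

# Keep the checkpoint with the lowest held-out loss, not the most-trained one.
best_batches = min(val_losses, key=val_losses.get)
assert best_batches == 120_000  # the 160k checkpoint has already overfit here
```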
Very cool video and project man!
Go to piavpn.com/8AAFFF to get 83% off Private Internet Access with 4 months free (and support me :D)! thanks for watching!
It's nice, but I think your architecture has some flaws. Suppose the text "This is a ..." has several possible next-word predictions, like "dog", "cow", and "mountain". "Dog" and "cow" are nearby in embedding space, but "mountain" might be far apart; if you train your model on such cases it will average out the result and might give nonsense or hallucinate (basically it might output the midpoint of the cow, dog, and mountain vectors).
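the averaging concern above can be shown with made-up 3-d "embeddings": when several valid next words exist, a regression-style target pulls the prediction toward their mean, which fits none of them as well as they fit each other:

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Invented vectors: "dog" and "cow" nearby, "mountain" far away.
dog      = np.array([1.0, 0.1, 0.0])
cow      = np.array([0.9, 0.2, 0.1])
mountain = np.array([-0.2, 0.1, 1.0])

# Training toward all three valid continuations drags the prediction
# toward their mean, a compromise point between the clusters.
mean = (dog + cow + mountain) / 3
```

a softmax-over-vocabulary head avoids this by predicting a distribution instead of a single point in embedding space.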
If you're interested in stuff like this you might want to check out Anthropic's papers on mechanistic interpretability. The most important takeaway is that you should generally not assume that individual neurons are generally meaningful.
Would be interesting to see the same data / analysis with convolutional layers.
Biblically accurate V1
I'm creating a Python raytracer that simply uses PIL to generate photorealistic images at AAA processing speed. However... it looks like I will soon need AI to create the precalculations, since I also have to work on a whole lot of other Python projects by contract. Basically, it generates images by double-layering them: the first buffer is RGB images of 128x128 tiles, PIL-blurred at 50, in 9 main colors, and the second buffer is RGBA, exactly like the first except resized to 32x32 to sharpen the first buffer image... each combination is pre-rendered and sorted in a Python dict() to be laid out by Wave Function Collapse.
what about navigating like a maze? I mean always walk on the left side of the wall?
Are you applying the blur on top of everything? You should only have it fill in the dark pixels in an additive manner.
Lore accurate v1:
by the grace of god you NEED to continue this onward, evolve the code more and more, this is hella fun to watch and we will support you
Ok who the hell is ultra, and how did they kill AI
Would love to see an updated video where you use a secondary GPU, so the depth and object detection networks run on separate GPUs to fix the reaction time issues. Or find a lighter depth network, or maybe even make your own as a video and then use it in an update. Either way, really enjoyed this video.
Just some ideas for you or anyone who wants to work on this: ULTRAKILL lets you turn on enemy outlines that can trigger based on distance, or you can have them always on. You can even make outlines go through walls, but that's probably just going to confuse the AI. Another thing you can do with outlines is make one cover an entire enemy, so the color is something really easy for the AI to detect, like a bright pink or purple. The problem with that is the AI can't tell which enemy is which, only that it's an enemy. If you want the AI to recognize each enemy, you can make a custom color palette and enable it in the settings somewhere; custom palettes let you make every color on every enemy different, so every enemy can be a single distinct color that the AI recognizes far more easily. The problem with this is that the AI will still get mad at corpses (outlines only show up on living enemies). I made this comment at 3AM so it probably doesn't read that well, but I hope it can still help someone who wants to improve the AI.
all i can think of is AI speed run races
lore accurate v1
Thats how V2 feels
This MIGHT be john ultrakill
This is the only way it could have ended.
WHO HIJACKED MY BODY?
aren't we technically the AI as we follow the beginnings rules but don't question the countless humans we kill on the upper layers?
i have a feeling you might have been able to tap into the render pipeline to get the depth map