I Coded My Own Language Model From Scratch

  • Published 13 Dec 2024

COMMENTS • 35

  • @8AAFFF
    @8AAFFF  2 days ago +2

    Go to piavpn.com/8AAFFF to get 83% off Private
    Internet Access with 4 months free (and support me :D)!
    thanks for watching!

    • @thefcraft8763
      @thefcraft8763 53 minutes ago

      It's nice, but I think your architecture has some flaws. Suppose the text is "This is a ..." and there are several possible next-word predictions, like "dog", "cow", and "mountain". "Dog" and "cow" are nearby in the vocab embedding space, but "mountain" might be far apart. If you train your model on such cases it will average out the result and might give nonsense or hallucinate (basically it might output the midpoint vector of "cow", "dog", and "mountain").
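
      A tiny numeric illustration of the failure mode described above (hypothetical 2-D embeddings, not the video's actual vectors): if the model regresses toward a single target embedding, the loss-minimizing output for several equally valid next words is their mean, which can land near none of them.

      ```python
      import numpy as np

      # Hypothetical 2-D word embeddings, purely for illustration.
      emb = {
          "dog":      np.array([1.0, 1.0]),
          "cow":      np.array([1.2, 0.9]),   # close to "dog"
          "mountain": np.array([-3.0, 2.5]),  # far from both
      }

      # If all three are valid continuations of "This is a ..." and the model is
      # trained with MSE toward the target embedding, the optimal prediction is
      # the mean of the targets, not any one of them.
      prediction = np.mean(list(emb.values()), axis=0)

      for word, vec in emb.items():
          print(f"{word:9s} distance to prediction: {np.linalg.norm(vec - prediction):.2f}")
      # The averaged vector sits between the clusters, so decoding it to the
      # nearest real word can give something unrelated; a softmax over the whole
      # vocabulary keeps the separate modes instead of blending them.
      ```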

  • @scoffpickle9655
    @scoffpickle9655 2 days ago +15

    The reason the 160k-batch REAN was worse on the graphics-card prompt is that the network is overfitting. I'd recommend evaluating on a test set of prompts and choosing the model that performs best on that set, instead of just running for higher batch counts.
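
    A minimal sketch of that model-selection idea (assumed PyTorch-style names; this is not the video's actual training loop): keep whichever checkpoint scores best on a held-out set instead of the one trained for the most batches.

    ```python
    import torch

    def evaluate(model, val_batches, loss_fn):
        """Average loss over a held-out set of prompt/target batches."""
        model.eval()
        total, n = 0.0, 0
        with torch.no_grad():
            for x, y in val_batches:
                total += loss_fn(model(x), y).item()
                n += 1
        model.train()
        return total / max(n, 1)

    def train_with_selection(model, optimizer, loss_fn, train_batches, val_batches,
                             eval_every=1000, ckpt_path="best_rean.pt"):
        best_val = float("inf")
        for step, (x, y) in enumerate(train_batches):
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            if step % eval_every == 0:
                val = evaluate(model, val_batches, loss_fn)
                if val < best_val:                             # keep the checkpoint that
                    best_val = val                             # generalizes best, not the
                    torch.save(model.state_dict(), ckpt_path)  # one trained the longest
        return best_val
    ```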

    • @8AAFFF
      @8AAFFF  2 days ago +4

      ur right, it's most likely overfitted. the weird thing is that most other test prompts i was running were generally getting better with more batches, so idk

    • @scoffpickle9655
      @scoffpickle9655 1 day ago +1

      @8AAFFF It sounds like a data problem then; too little data, or data that isn't general enough, would lead to worse curve fitting. I suppose there wasn't much data about graphics cards, so it freaked tf out and kept spamming "graphics"

    • @8AAFFF
      @8AAFFF  1 day ago +1

      maybe, also possible that the graphics card knowledge just got overshadowed because it was at the beginning of the dataset. i did some more tests today and basically it just seems to have some knowledge points that it tries sticking to no matter what the prompt is

    • @PaulanerStudios
      @PaulanerStudios 8 hours ago

      @8AAFFF Are you using any sort of speculative decoding or temperature scaling? That wasn't mentioned in the video and does make quite a difference.
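
      For reference, temperature scaling is just a division of the logits before the softmax; a minimal sketch (generic, not tied to the video's decoder):

      ```python
      import numpy as np

      def sample_with_temperature(logits, temperature=1.0, rng=None):
          """Sample a token id from logits scaled by a temperature.

          temperature < 1 sharpens the distribution (closer to greedy),
          temperature > 1 flattens it (more diverse, riskier).
          """
          if rng is None:
              rng = np.random.default_rng()
          scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
          scaled -= scaled.max()                        # numerical stability
          probs = np.exp(scaled) / np.exp(scaled).sum()
          return int(rng.choice(len(probs), p=probs))

      # The same logits sampled at different temperatures:
      logits = [2.0, 1.0, 0.2, -1.0]
      for t in (0.5, 1.0, 1.5):
          print(t, [sample_with_temperature(logits, t) for _ in range(10)])
      ```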

  • @salad_txt
    @salad_txt 2 days ago +14

    You are so underrated it is actually insane, keep it up dude. Great stuff.

  • @zaj007
    @zaj007 2 days ago +10

    18:25 Bro there has gyat to be a better way! I'm crying 😭😭 wtf is that timeline 💀💀

    • @8AAFFF
      @8AAFFF  2 days ago +3

      bro did the tower of babel editing technique ahh

  • @jaythecoderx4623
    @jaythecoderx4623 33 minutes ago +1

    This should have millions of views, what the hell, this is epic. Very well edited too.

  • @brams06
    @brams06 7 hours ago +1

    I was shocked to see that this video has so few views. I feel so lucky to have come across this gem.

  • @lionlight9514
    @lionlight9514 2 days ago +2

    This is so cool man! Please, keep going.

  • @aamindehkordi
    @aamindehkordi 33 minutes ago

    Insane time spent and crazy W video. Don't worry about compression or pacing, this is gas and should blow up soon.

  • @slowpoke101
    @slowpoke101 2 days ago +3

    Great video, these longer videos are always nice to see. Thank you for open-sourcing the code.

  • @kotway2
    @kotway2 2 days ago +1

    Very cool video and project man!

  • @alisyoung2741
    @alisyoung2741 29 minutes ago

    I have been working on one as well but have run into some issues! So exciting!

  • @TeamDman
    @TeamDman 3 hours ago

    Your animations are awesome :o

  • @Moshi74533
    @Moshi74533 9 hours ago

    sick bro, absolutely sick

  • @60pluscrazy
    @60pluscrazy 25 minutes ago

    Amazing.. how did you animate this? 👌🎉🎉🎉

  • @takeraparterer
    @takeraparterer 2 days ago +4

    ua-cam.com/video/_B2RImihdUI/v-deo.html That's not correct: GPT models predict every "next word" in a sequence at the same time.

    • @8AAFFF
      @8AAFFF  1 day ago +1

      yeah 100% correct
      i just lied about it in the beginning to make the explanation easier, but i do correct myself later
      well done for noticing :)
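
      A minimal sketch of what that means in practice: with causal masking, one forward pass over a training sequence yields a next-token prediction at every position, so the loss covers all positions at once (shapes below are made up):

      ```python
      import torch
      import torch.nn.functional as F

      batch, seq_len, vocab_size = 2, 8, 100
      tokens = torch.randint(0, vocab_size, (batch, seq_len))

      # Stand-in for model(tokens): a real GPT returns all of these in one forward pass.
      logits = torch.randn(batch, seq_len, vocab_size)

      # Position t predicts token t+1, so inputs and targets are shifted views of
      # the same sequence; every position's prediction is trained simultaneously.
      pred = logits[:, :-1, :]     # predictions for positions 0 .. seq_len-2
      target = tokens[:, 1:]       # the "next word" at each of those positions

      loss = F.cross_entropy(pred.reshape(-1, vocab_size), target.reshape(-1))
      print(loss.item())           # one scalar loss covering every position at once
      ```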

  • @MrNootka
    @MrNootka 4 hours ago

    Hello! Nice video.
    In the section "Final word2vec Results", i.e. at 11:14 and 11:28, you had a space inside the value passed to similar_by_word in one case and not in the other... I wonder if the space changes the results.
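
    For what it's worth, a stray space does change the lookup: assuming the word2vec model is a gensim KeyedVectors (which is where similar_by_word comes from), "card " and "card" are different vocabulary keys, so the spaced version usually just isn't found. A hypothetical check (the saved vectors file and the word are assumptions):

    ```python
    from gensim.models import KeyedVectors

    # Assumption: the trained word2vec vectors were saved as a KeyedVectors file.
    kv = KeyedVectors.load("word2vec.kv")

    for token in ("card", "card "):          # same word, with and without a space
        if token in kv.key_to_index:
            print(repr(token), kv.similar_by_word(token, topn=3))
        else:
            # A token with a stray space is a different key and normally isn't
            # in the vocabulary at all, so the lookup fails rather than shifting.
            print(repr(token), "not in vocabulary")
    ```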

  • @Leb369
    @Leb369 1 day ago

    Very good video, the only drawback is the sound quality.

  • @fortaber
    @fortaber 1 day ago +2

    The editing of the video is just amazing!!

  • @VioletGiraffe
    @VioletGiraffe 1 day ago

    Even your animations are cool, how did you make them? Or do you have another neural net to do that for you? :)

    • @8AAFFF
      @8AAFFF  1 day ago

      thanks :), basically with just images / clips in davinci resolve.
      I put the almost-final timeline at the end, 18:26

  • @TheTruthOfAI
    @TheTruthOfAI 12 minutes ago

    Hahaha, funny guy.. it's like reading a long GPT-4 hallucination

  • @lobiqpidol818
    @lobiqpidol818 4 hours ago

    🤓 well AksUaLly each embedding vector takes up space on the device. So while you save space by vector-quantizing the output embeddings, the vocabulary is still limited by GPU memory. Also you lose the ability to do some calculations on the output, like temperature. Good video
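
    Rough numbers behind that point (sizes are assumptions, just for scale): the output embedding table alone is vocab_size × dim values, so the vocabulary stays bounded by device memory even with a quantized output head.

    ```python
    # Back-of-the-envelope memory for an embedding table (illustrative sizes only).
    def table_gib(vocab_size, dim, bytes_per_value=4):   # fp32 = 4 bytes/value
        return vocab_size * dim * bytes_per_value / 2**30

    for vocab_size in (50_000, 500_000, 5_000_000):
        print(f"vocab {vocab_size:>9,} x dim 768 (fp32): {table_gib(vocab_size, 768):6.2f} GiB")
    # 50k tokens is small, but pushing the vocabulary toward "every word/phrase"
    # runs into GPU memory long before anything else, quantized outputs or not.
    ```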

  • @averesenso
    @averesenso 2 days ago +3

    Your voice is quiet on my speakers

  • @AllExistence
    @AllExistence 1 day ago +1

    You seem to have gone a weird route with training. Normally, networks are first trained on plain text to learn normal language. Then they are fine-tuned with "human/assistant" data to actually answer questions instead of talking to themselves.

    • @8AAFFF
      @8AAFFF  9 hours ago

      yeah that's true
      it's just that the higher-quality human/assistant dataset was so big that i didn't need to first train on raw text
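
      A minimal sketch of the two-stage recipe discussed in this thread (stage 1: plain-text pretraining, stage 2: human/assistant fine-tuning); the formatting, tokenizer and dataset names are assumptions, not the video's pipeline:

      ```python
      def lm_batches(texts, tokenizer, seq_len):
          """Stage 1: plain text -- the target is simply the text shifted by one token."""
          for text in texts:
              ids = tokenizer.encode(text)
              for i in range(0, len(ids) - seq_len, seq_len):
                  yield ids[i:i + seq_len], ids[i + 1:i + seq_len + 1]

      def chat_batches(pairs, tokenizer, seq_len):
          """Stage 2: human/assistant pairs formatted into one training sequence."""
          for user_msg, assistant_msg in pairs:
              ids = tokenizer.encode(f"Human: {user_msg}\nAssistant: {assistant_msg}")
              ids = ids[:seq_len + 1]
              yield ids[:-1], ids[1:]

      # train(model, lm_batches(raw_corpus, tok, 256))    # learn the language first
      # train(model, chat_batches(qa_pairs, tok, 256))    # then learn to answer
      ```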

  • @idf3da
    @idf3da 2 days ago

    top!

  • @Tenraiden
    @Tenraiden 3 hours ago +1

    Speak louder!!