Visualizing transformers and attention | Talk for TNG Big Tech Day '24

  • Published 21 Nov 2024

COMMENTS • 76

  • @rustyroche1921
    @rustyroche1921 1 day ago +243

    woah grant getting the gains

    • @hyperadapted
      @hyperadapted 1 day ago +26

      high dimensional vascularity

    • @kellymoses8566
      @kellymoses8566 1 day ago +7

      I know. Now he is as hot as he is smart!

    • @poke0003
      @poke0003 1 day ago +9

      -"Swol is the goal, size is the prize!" - 3B1B Loss Function, probably

    • @hyperadapted
      @hyperadapted 1 day ago +2

      @@poke0003 Ah, I see you are a man of culture as well. Glad to see other Robert Frank connoisseurs :)

    • @sho3bum
      @sho3bum 19 hours ago +4

      3Curls1Extension Grant Sanderson

  • @PatrickMetzdorf
    @PatrickMetzdorf 11 hours ago +4

    That was easily the best explanation I have ever seen. Way to decrypt some of the most magical-seeming mechanisms of the transformer architecture. Thanks a lot for this!

  • @krishdesai9776
    @krishdesai9776 1 day ago +71

    Someone's been working out!

    • @F30-Jet
      @F30-Jet 1 day ago

      AI generated😂

  • @magnetsec
    @magnetsec 1 day ago +69

    Grant should team up with Andrej Karpathy. They'd make the best Deep Learning education platform

    • @nbme-answers
      @nbme-answers 1 day ago +12

      They already do make the best deep learning education platform

    • @magnetsec
      @magnetsec 21 hours ago

      @@nbme-answers Yeah but separately

    • @tescOne
      @tescOne 11 hours ago

      Two of the most talented educators on yt. Their two series on neural nets are basically anything a curious person needs to start building their own models. Grant gives you the big picture with immense sensibility and insane visualization. Andrej gives you all the technical details in reasoning, implementation and advanced optimization, with an empathy for your ignorance comparable to Feynman's haha.

    • @aricoleman5802
      @aricoleman5802 7 hours ago

      @@nbme-answers what is it?

  • @mpperfidy
    @mpperfidy 7 hours ago +1

    Another in a long, long line of excellent educational presentations. If you didn't exist, we'd have to invent you, which would be quite hard. So I'm glad you already exist.

  • @omarnomad
    @omarnomad 1 day ago +33

    38:30 The only reason we use tokenization is limited computational resources, *not* meaning. BPE gives roughly a 4x (~400%) efficiency gain for the same budget, since 1 token ≈ 4 characters.

  • @onicarpeso
    @onicarpeso 1 day ago +20

    I finally see the human behind the great videos I watch!

  • @egoworks5611
    @egoworks5611 15 hours ago

    Such a great way to learn and understand the intuition behind this work. I sometimes think about the people who started these sorts of works and all the groups who thought about the possibility of encoding language and expressing it mathematically. It turns out that even once you understand these concepts, it is still an outstanding effort, and the ideas behind it are superb.
    Crazy to think that some people thought about this, had the ambition, and actually expected to build a tool. Once you understand it and it is well explained, yes, it might not look impossible, but you can still see how groundbreaking it was.
    Thanks Grant for taking the time to share this

  • @abhidon0
    @abhidon0 1 day ago +25

    I guess the main question here is "Is Grant Natty?"

  • @Kvil
    @Kvil 1 day ago +71

    he should be steve in minecraft movie

  • @souvikbhattacharyya2480
    @souvikbhattacharyya2480 2 hours ago

    I wouldn't mind "giving a talk" type videos like this from Grant every now and then. I think I would actually prefer this style over the regular one.

  • @learnbydoingwithsteven
    @learnbydoingwithsteven 1 day ago +6

    Grant is in great shape.

  • @murmeldin
    @murmeldin 1 day ago +8

    Just came here from the LLMs for beginners video. Loved the talk, very informative. Keep the great work up, man 👏🏼

  • @tomasg8604
    @tomasg8604 2 hours ago +1

    30 to 50% of the neurons in the brain's cortex are devoted to vision, compared to 8% for touch and just 3% for hearing.
    That suggests learning how to see and process visual information is at the center of human intelligence.

  • @pufthemajicdragon
    @pufthemajicdragon 1 day ago +1

    That question at the 54-minute mark about analog computing making LLMs more efficient - yes. There are a LOT of smart experts in the field working on exactly that. Maybe a next direction for your continued learning?

  • @rorolonglegs4594
    @rorolonglegs4594 1 day ago +4

    Great addition to your pre-existing series!

  • @undisclosedmusic4969
    @undisclosedmusic4969 1 day ago +2

    My left ear thanks you

  • @noorghamm3449
    @noorghamm3449 15 hours ago

    Thank you❤️

  • @__m__e__
    @__m__e__ 21 hours ago +1

    Great talk! Bad questions.

  • @AnnaSayenko-f6s
    @AnnaSayenko-f6s 20 hours ago

    Thanks for the breakdown! A bit off-topic, but I wanted to ask: I have a SafePal wallet with USDT, and I have the seed phrase. (alarm fetch churn bridge exercise tape speak race clerk couch crater letter). How should I go about transferring them to Binance?

  • @jordantylerflores
    @jordantylerflores 1 day ago +4

    As someone in the "wishes he took math more seriously" camp, I wish we were given more, ANY, cool examples of what was possible with applied math. Growing up in rural Ohio, the only things math was pushed for were business/finance and maybe some CS stuff; however, it was always abstract "here are some concepts, learn them for the test." Like how many cool things can be done inside 3D programs such as Blender with just an above-average understanding of geometry.
    I acknowledge my failings in this too, as I did not seek these things out while I was in school. I also might have some age-related FOMO lol, since the things I enjoy doing now, VFX/Blender/CGI, are all based on concepts I am having to teach myself or re-learn on my own, as a man who is almost 40.
    Thank you for this, and it is going to take a couple of watches for it to sink in haha.

    • @kellymoses8566
      @kellymoses8566 1 day ago +1

      I agree. Kids would put a lot more effort to learn math if they were shown how incredibly useful math is in real life. Being really good at math is like having a superpower compared to people who are not.

  • @cat_copilot
    @cat_copilot 17 hours ago

    Good job 😃

  • @Izumichan-nw1zo
    @Izumichan-nw1zo 9 hours ago

    Please collaborate with Andrej Karpathy and make a huge deep learning platform, or at least explain stuff in this format regularly. We don't need animations every time; PPT or chalk-and-talk is also fine, sir!

  • @zamplify
    @zamplify 1 day ago +41

    3 blew one blown

    • @m41437
      @m41437 1 day ago +7

      I really hope this has something to do with the video

    • @sblowes
      @sblowes 1 day ago +2

      That’s clever

    • @jaydeep-p
      @jaydeep-p 1 day ago +1

      Nice try Diddy

    • @fakegandhi5577
      @fakegandhi5577 1 day ago +3

      Oh my god. This is incredible. You're a genius!

    • @jhdsipoulrtv170
      @jhdsipoulrtv170 1 day ago +3

      This is truly one of the most clever things I have seen in a long time

  • @Loveforcricket99
    @Loveforcricket99 12 hours ago

    For a word like ‘bank’, which can have different meanings in different contexts, does the LLM store it as a single vector, or can it store multiple vectors for each known variation of the word?

    • @GrantSanderson
      @GrantSanderson 10 hours ago +1

      It’s initially embedded as one vector, but one big point of the attention layer is to allow for context-based updates to that vector
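
A minimal sketch of the mechanism Grant's reply describes, assuming a toy 4-dimensional embedding table and a single attention-style update (every name and number here is illustrative, not anything from a real model): the token "bank" starts from one shared embedding, and an attention-weighted value vector from a context word such as "river" or "money" nudges it toward the right sense.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    # One embedding per token (made-up 4-d vectors, not real model weights).
    emb = {
        "bank":  np.array([0.2, 0.1, 0.0, 0.3]),
        "river": np.array([0.9, 0.0, 0.1, 0.0]),
        "money": np.array([0.0, 0.8, 0.0, 0.1]),
    }

    rng = np.random.default_rng(0)
    W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))  # toy query/key/value maps

    def contextual_bank(context_word):
        x = emb["bank"]                              # "bank" is initially embedded as a single vector...
        ctx = emb[context_word]
        score = (W_q @ x) @ (W_k @ ctx)              # query-key dot product
        weights = softmax(np.array([score, 0.0]))    # attend to the context word vs. staying put
        return x + weights[0] * (W_v @ ctx)          # ...then the attention step updates it in context

    print(contextual_bank("river"))   # pulled toward the "river" sense
    print(contextual_bank("money"))   # pulled toward the "money" sense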

  • @JuliusUnique
    @JuliusUnique 1 day ago

    Which word/token is in the middle, at 0 0 0 0 0 ..., for example for ChatGPT-4?

  • @literailly
    @literailly 16 hours ago

    @39:00, Why not make tokens full words?
    (time to read up on byte-pair encoding!)
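
For anyone curious about that, here is a toy sketch of the core byte-pair-encoding idea: repeatedly merge the most frequent adjacent pair of symbols, so tokens grow into common chunks rather than staying single characters or becoming whole words. The corpus and helper below are invented for illustration; real tokenizers add many practical details.

    from collections import Counter

    def bpe_merge_step(words):
        """One BPE training step: fuse the most frequent adjacent symbol pair."""
        pair_counts = Counter()
        for symbols in words:
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += 1
        (a, b), _ = pair_counts.most_common(1)[0]
        merged = []
        for symbols in words:
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                    out.append(a + b)        # fuse the winning pair into one token
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged.append(out)
        return merged, a + b

    # Start from single characters; repeated merges grow multi-character chunks (here "we", then "lo", and so on).
    corpus = [list("lower"), list("lowest"), list("newer"), list("wider")]
    for _ in range(4):
        corpus, new_token = bpe_merge_step(corpus)
        print(new_token, corpus)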

  • @rifatmithun8948
    @rifatmithun8948 4 hours ago

    Your voice seems very familiar. It took me 10 seconds to realize you are the 3b1b.

  • @eugenedsky3264
    @eugenedsky3264 1 day ago +5

    Grant! We now know what LLMs are, but what about LMMs - Learning Mealy Machines (named so by me)?
    A learning Mealy machine is a finite automaton in which training data stream is remembered by constructing disjunctive normal forms of the output function of the automaton and the transition function between its states. Then those functions are optimized (compressed with losses by logic transformations like De Morgan's Laws, arithmetic rules, instruction loop rolling/unrolling, etc.) into some generalized forms. That introduces random hypotheses into the automaton's functions, so it can be used in inference. The optimizer for automaton's functions may be another AI agent (even Neural Nets), or any heuristic algorithm, which you like.
    Machine instructions would be used to calculate the output function and the transition function of the automaton. At first, as the automaton tries some action and receives a reaction, corresponding terms of those functions are constructed in plain "mov"s and "cmp"s with "jmp"s (suppose x86 ISA here). Then machine instructions of all actions-reactions are optimized by arithmetic rules, loop rolling and unrolling, etc, so the size of the program is reduced. That optimization may include some hypotheses about "Don't Care" values of the functions too, which will be corrected in future passes, if they turn out to be wrong...
    Imagine that code running on something like Thomas Sohmers' Neo processor, or Sunway SW26010, or Graphcore Colossus MK2 GC200.
    One kind of transformation they often seem to forget is "loop rolling" (not just unrolling), i.e. making an instruction loop (a "for x in range a..b" statement) out of a long repetitive sequence of instructions.
    ...Kudos for Bodybuilding!
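
For readers who haven't met the term: a Mealy machine is a finite automaton whose output depends on both the current state and the current input. A bare-bones sketch is below, showing only the classical automaton, not the learning/compression scheme proposed above; the states, inputs, and outputs are invented for illustration.

    class MealyMachine:
        """Finite automaton whose output is a function of (state, input)."""

        def __init__(self, transitions, start):
            # transitions maps (state, input_symbol) -> (next_state, output_symbol)
            self.transitions = transitions
            self.state = start

        def step(self, symbol):
            self.state, out = self.transitions[(self.state, symbol)]
            return out

    # Toy machine: emit 1 exactly when the input bit differs from the previous one
    # (the bit before the first input is taken to be 0).
    edges = {
        ("saw0", 0): ("saw0", 0), ("saw0", 1): ("saw1", 1),
        ("saw1", 0): ("saw0", 1), ("saw1", 1): ("saw1", 0),
    }
    m = MealyMachine(edges, start="saw0")
    print([m.step(b) for b in [0, 1, 1, 0, 0]])   # -> [0, 1, 0, 1, 0]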

  • @AzharAli-n5c
    @AzharAli-n5c 23 hours ago

    great

  • @rohan_gupta
    @rohan_gupta 1 day ago +3

    So good

  • @ashukun
    @ashukun 22 hours ago

    let's go

  • @debyton
    @debyton 21 hours ago +1

    Choosing the next word, by any name, is thinking.

    •  18 hours ago +1

      Agreed. Except here we are not talking about "choosing". We are talking about "calculating the probability that a specific word belongs there". And this is (mainly) math.

  • @vit3060
    @vit3060 1 day ago +1

    It would be nice to see more about the KAN approach, which is very promising.

  • @PaperTigerLive
    @PaperTigerLive 1 day ago +4

    nooo you were in munich and didn't tell us :((((

  • @no1science
    @no1science 1 day ago +2

    amazing

  • @oncedidactic
    @oncedidactic 6 hours ago

    Another roof video!? Oh…

  • @salchipapa5843
    @salchipapa5843 1 day ago +1

    I graduated with a degree in electrical engineering back in '07. I did not understand most of anything that was talked about in this video.

  • @BenjaHernandezMemm
    @BenjaHernandezMemm 1 day ago

    im really proud of being alive at the same time as you

  • @DakshPuniadpga
    @DakshPuniadpga 1 day ago +6

    Great Speech

    • @raideno56
      @raideno56 1 day ago +6

      Video came out like 10 minutes ago and it is 50 mins long

    • @volodyadykun6490
      @volodyadykun6490 1 day ago

      @@raideno56 That's what's so great about it, very big

    • @jordantylerflores
      @jordantylerflores 1 day ago

      @@raideno56 watched it on 5x speed lol

    • @erwinschulhoff4464
      @erwinschulhoff4464 1 day ago +1

      @@jordantylerflores did you have subway surfers on the side as well?

  • @jasonandrewismail2029
    @jasonandrewismail2029 1 day ago +1

    Grant, is it not basically DOE (design of experiments) in statistics? Kind regards, Jason

  • @Trtko-y2p
    @Trtko-y2p 22 hours ago

    you're smiling like you're microdosing LSD or something

  • @johnchessant3012
    @johnchessant3012 1 day ago +2

    hi

  • @geekyprogrammer4831
    @geekyprogrammer4831 1 day ago +2

    Second!

  • @seatyourself7082
    @seatyourself7082 1 day ago +1

    First! (of commenting after watching the whole thing)

  • @volodyadykun6490
    @volodyadykun6490 1 day ago +1

    Why would he explain cartoon to them?

  • @dadsonworldwide3238
    @dadsonworldwide3238 1 day ago

    Great stuff,
    Yet a generalized go/nogo theory or reference in space doesn't undoubtedly build an assimilated seed of deterministic responsibility for our mixed multitude to simulate strong identifiers and compute the modern world that would be a sir on the opposite side of the equivalence principle to Einstein lol
    Great thinker in renormalization overly extended and everyone is ready for the overly delayed era of optimization. We got nuked and detoured on this quest but it's great to be back on par with goals of multiple generations that were so rudely interrupted by the world

  • @MichealScott24
    @MichealScott24 1 day ago +1

    ❤🫡

  • @oraz.
    @oraz. 1 day ago

    Why do people make that mouth-smacking sound whenever they start a sentence?