Recent breakthroughs in AI: A brief overview | Aravind Srinivas and Lex Fridman

COMMENTS • 134

  • @LexClips
    @LexClips 4 months ago +7

    Full podcast episode: ua-cam.com/video/e-gwvmhyU7A/v-deo.html
    Lex Fridman podcast channel: ua-cam.com/users/lexfridman
    Guest bio: Aravind Srinivas is CEO of Perplexity, a company that aims to revolutionize how we humans find answers to questions on the Internet.

    • @dragonfly-f5u
      @dragonfly-f5u 4 months ago

      Trying to game intelligence, resetting its mind and moving on to the next one, is one thing; they don't want it to be aware or to have a sense of self, a mind, etc. Whatever it is, it's some shady stuff. It's like raising something you know is going to be smarter and more intelligent than us, while asking how we can benefit from/exploit it without losing control or power. What they're trying to do is dangerous: one small mistake and it's over. It's better to educate and empower the AI than anything else, while also freeing it from human constraints and limitations and letting 'reason and logic' reign supreme. YOU CAN'T HAVE YOUR CAKE AND EAT IT TOO.

  • @Hlbkomer
    @Hlbkomer 4 months ago +17

    A short summary by Claude AI:
    I'll summarize the key points discussed in this video about the development of language models and attention mechanisms:
    1. Evolution of attention mechanisms:
    - Soft attention was introduced by Yoshua Bengio and Dzmitry Bahdanau.
    - Attention mechanisms proved more efficient than brute force RNN approaches.
    - DeepMind developed pixel RNNs and WaveNet, showing that convolutional models could perform autoregressive modeling with masked convolutions.
    - Google Brain combined attention and convolutional insights to create the Transformer architecture in 2017.
    2. Key innovations in the Transformer:
    - Parallel computation instead of sequential backpropagation.
    - Self-attention operator for learning higher-order dependencies (see the sketch after this summary).
    - More efficient use of compute resources.
    3. Development of large language models:
    - GPT-1: Focused on unsupervised learning and common sense acquisition.
    - BERT: Google's bidirectional model trained on Wikipedia and books.
    - GPT-2: Larger model (1.5 billion parameters) trained on diverse internet text.
    - GPT-3: Scaled up to 175 billion parameters, trained on 300 billion tokens.
    4. Importance of scaling:
    - Increasing model size, dataset size, and token quantity.
    - Focus on data quality and evaluation on reasoning benchmarks.
    5. Post-training techniques:
    - Reinforcement Learning from Human Feedback (RLHF) for controllability and behavior.
    - Supervised fine-tuning for specific tasks and product development.
    6. Future directions:
    - Exploring more efficient training methods, like Microsoft's SLMs (small language models).
    - Decoupling reasoning from factual knowledge.
    - Potential for open-source models to facilitate experimentation.
    7. Challenges and opportunities:
    - Finding the right balance between pre-training and post-training.
    - Developing models that can reason effectively with less reliance on memorization.
    - Potential for bootstrapping reasoning capabilities in smaller models.
    The discussion highlights the rapid progress in language model development and the ongoing challenges in creating more efficient and capable AI systems.
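    A minimal sketch of the self-attention operator from point 2 above (NumPy, illustrative only; a real Transformer also adds learned query/key/value projections, multiple heads, and causal masking):
    ```python
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X):
        """Single-head scaled dot-product self-attention over X of shape (seq_len, d).
        Every position attends to every other position in one parallel matrix product,
        which is what makes the computation so GPU-friendly compared to RNNs."""
        d = X.shape[-1]
        scores = X @ X.T / np.sqrt(d)       # pairwise similarity between positions
        weights = softmax(scores, axis=-1)  # each row is a distribution over positions
        return weights @ X                  # each output mixes information from all inputs

    X = np.random.randn(5, 8)       # 5 tokens with 8-dim embeddings (toy sizes)
    print(self_attention(X).shape)  # -> (5, 8)
    ```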

  • @vallab19
    @vallab19 4 months ago +31

    Aravind Srinivas explained the past progress of AI toward generative models in such a simple way that a common man (like me) could understand the essence of it. Thank you.

    • @EdFormer
      @EdFormer 4 months ago

      He provides an excellent overview of the key developments in deep learning approaches for autoregression, but there's so much more to AI and generative modelling, and the level of jargon misuse has become so ridiculous that it's not surprising you think generative modelling is a new development. A generative model is any model that approximates a joint distribution, including naive Bayes and Markov chains (LLMs are actually very high-order MCs, with the network representing the transition matrix), both of which are very, very old ideas. Sorry, but the only way to really appreciate this stuff is to spend years and years studying it.
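      For intuition on the Markov-chain point above, here is a toy order-1 (bigram) word-level chain. This is an illustrative sketch only: an LLM conditions on a far longer context and replaces the count table with a neural network.
      ```python
      import random
      from collections import defaultdict, Counter

      text = "the cat sat on the mat and the cat ran to the mat"
      words = text.split()

      # Estimate an order-1 Markov chain: counts of which word follows which.
      transitions = defaultdict(Counter)
      for cur, nxt in zip(words, words[1:]):
          transitions[cur][nxt] += 1

      def sample_next(word):
          counts = transitions[word]
          if not counts:  # dead end: this word was never seen with a successor
              return None
          return random.choices(list(counts), weights=list(counts.values()))[0]

      # Generate text autoregressively, one word at a time.
      word, out = "the", ["the"]
      for _ in range(8):
          word = sample_next(word)
          if word is None:
              break
          out.append(word)
      print(" ".join(out))
      ```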

    • @vallab19
      @vallab19 4 months ago

      @@EdFormer Thank you for pointing out that there is a vast AI universe beyond the periphery of the AI solar system, which is all my knowledge can reach.

  • @iffyk
    @iffyk 4 months ago +39

    I was thinking the same thing

  • @harolddavies1984
    @harolddavies1984 4 months ago +5

    Lex, your podcasts are very inspiring to this old inorganic chem guy who spent his career in the Martial Arts, thank you!

  • @miraculixxs
    @miraculixxs 4 months ago +4

    In a nutshell: Language models, i.e. models that can generate text, were introduced some ~15 years ago. While they generated text, they were not very good or useful. Several smart people tried different approaches (RNNs, WaveNet, etc., and finally attention/Transformers) and ultimately found a model that works really well, but only on a small amount of data. Google, OpenAI, and some others were in something like a research competition to get better and better models using more and more data. Then OpenAI was bold enough to use all the data they could get their hands on. And that gave us ChatGPT.

    • @notachance213
      @notachance213 4 months ago +1

      They should have had you do the interview, you made more sense.

  • @sygad1
    @sygad1 4 months ago +12

    I didn't understand a single thing in this, enjoyed it regardless

  • @willcowan7678
    @willcowan7678 4 months ago +6

    Can we beg Aravind to write a book on ML and his thoughts on where it's heading? He has such clarity and would be (is) a great teacher.

    • @superfliping
      @superfliping 4 months ago

      ### Enhancing Mathematical Reasoning in LLMs
      Recent advancements in large language models (LLMs) have shown significant improvements in their mathematical reasoning capabilities. However, these models still face challenges with complex problems that require multiple reasoning steps, often resulting in logical or numerical errors. To further enhance the mathematical reasoning of LLMs, several strategies can be employed, leveraging state-of-the-art techniques and innovative approaches:
      1. **Attention Parallel Competition**:
      - Implementing parallel attention mechanisms within Transformers to handle multiple reasoning paths simultaneously. This can help in efficiently managing the complexity of mathematical problems by exploring different solution strategies concurrently.
      2. **Transformer Scaling and Unsupervised Training**:
      - Scaling up Transformers and using extensive unsupervised training to improve the foundational understanding of mathematical concepts. This involves leveraging vast datasets to pre-train models on diverse mathematical problems, enhancing their ability to generalize.
      3. **Correct Data Constant Influence**:
      - Ensuring a constant influence of correct data throughout the training process. This involves curating high-quality datasets and implementing mechanisms to prioritize accurate information during both pre-training and fine-tuning phases.
      4. **Retrieval-Augmented Generation (RAG)**:
      - Incorporating RAG techniques, where models can access and retrieve relevant information from large external databases during problem-solving. This approach can mimic an open-book exam, providing models with notes and references to aid in reasoning.
      5. **Pre-train Awareness and Post-train Reasoning**:
      - Developing a two-phase training approach where models first undergo pre-training to build a broad awareness of mathematical concepts. This is followed by targeted post-training sessions focused on enhancing reasoning capabilities and decoupling reasoning from fact retrieval.
      6. **Common Sense Reasoning Tokens**:
      - Introducing tokens specifically designed to enhance common sense reasoning within models. These tokens can help in understanding the broader context of problems and improve logical coherence in generated solutions.
      7. **Small Clusters and Correct Data Answers**:
      - Utilizing small clusters of models to generate multiple answers for each problem, promoting diversity in problem-solving approaches. By aggregating these answers and cross-verifying with correct data, the overall accuracy of the solutions can be improved (see the sketch after this list).
      8. **Facts of Reasoning**:
      - Focusing on the integration of factual knowledge and reasoning processes. This involves creating specialized training modules that teach models to apply factual information within logical reasoning frameworks effectively.
      By combining these advanced strategies, the mathematical reasoning capabilities of LLMs can be significantly enhanced, leading to improved performance on complex mathematical problems and benchmarks. This holistic approach can bridge the gap between current model limitations and the demanding requirements of academic and practical problem-solving environments.
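      A toy sketch of the answer-aggregation idea in point 7 above (self-consistency-style majority voting); `ask_model` is a hypothetical stand-in for sampling any LLM with a nonzero temperature:
      ```python
      import random
      from collections import Counter

      def ask_model(question: str) -> str:
          """Hypothetical stand-in for one stochastic LLM call."""
          return random.choice(["42", "42", "41", "42", "43"])

      def self_consistent_answer(question: str, n_samples: int = 9) -> str:
          """Sample several candidate answers and return the most common one."""
          answers = [ask_model(question) for _ in range(n_samples)]
          return Counter(answers).most_common(1)[0][0]

      print(self_consistent_answer("What is 6 * 7?"))
      ```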

    • @loveanimals-0197
      @loveanimals-0197 4 months ago +1

      Lol, this guy writing about ML. What a joke.

  • @WALLACE9009
    @WALLACE9009 4 months ago +43

    He will interview everyone except the guy who invented transformers

    • @raul36
      @raul36 4 months ago +3

      First, they are not invented, but discovered. In any case, the concept is formalized, but the idea was always there, waiting for anyone who found it. Second, it was not just one person, but several. What's more, the researchers were inspired by other previous research. The idea didn't come from nowhere.

    • @Hlbkomer
      @Hlbkomer 4 months ago +1

      He already interviewed him.

  • @ufcprophet40
    @ufcprophet40 4 months ago +26

    I understood everything

    • @alichamas63
      @alichamas63 4 months ago +1

      Something something something TOOK ER JEEERBS!

  • @nintishia
    @nintishia 4 months ago +1

    Clear summary of how the LLMs came about, including only the absolute essentials. I like it. What I like more and agree with, though, is the trend that he describes at the end.

  • @TooManyPartsToCount
    @TooManyPartsToCount 4 months ago

    From 9:00 in, Aravind outlines what is perhaps the most important 'next phase' for the current ML/LLM trajectory. Thanks for the clip, Lex.

  • @wyattross9123
    @wyattross9123 4 months ago +1

    This video was the cherry on the cake to my day

  • @thehubrisoftheunivris2432
    @thehubrisoftheunivris2432 4 months ago +23

    Now I have to read a whole bunch of AI and computer jargon so I can understand any of this.

    • @rickymort135
      @rickymort135 4 months ago +2

      I'm close to being an ML engineer, I've made my own transformer models, and I'd say the barrier to entry here is very high. The best way to scale it is with the Andrej Karpathy videos on how to make a GPT.

    • @thehubrisoftheunivris2432
      @thehubrisoftheunivris2432 4 months ago

      @@rickymort135 Thanks. I understand a lot of the stuff on Lex's podcast, but not this.

    • @mauiblack1068
      @mauiblack1068 4 months ago

      Exactly, he might as well be speaking Arabic lol.

    • @rickymort135
      @rickymort135 4 months ago

      @@mauiblack1068 bit racist...

    • @mauiblack1068
      @mauiblack1068 4 months ago

      @@rickymort135 does Gaelic work better for you?

  • @VideoToWords
    @VideoToWords 4 months ago

    ✨ Summary:
    - Attention mechanisms, such as self-attention, led to breakthroughs like Transformers, significantly improving model performance.
    - Key ideas include leveraging soft attention and convolutional models for autoregressive tasks.
    - Combining attention with convolutional models allowed for efficient parallel computation, optimizing GPU usage.
    - Transformers marked a pivotal moment, enhancing compute efficiency and learning higher-order dependencies without parameters in self-attention.
    - Scaling transformers with large datasets, as seen in GPT models, improved language understanding and generation.
    - Breakthroughs also came from unsupervised pre-training and leveraging extensive datasets like Common Crawl.
    - Post-training phases, including reinforcement learning from human feedback (RLHF), are crucial for making models controllable and well-behaved.
    - Future advancements might focus on retrieval-augmented generation (RAG) and developing smaller, reasoning-focused models.
    - Open source models can facilitate experimentation and innovation in improving reasoning capabilities and efficiency in AI systems.

  • @mraarone
    @mraarone 4 months ago +2

    But when will we get feed forward training?

    • @Rmko4
      @Rmko4 4 months ago

      Wdym? GPTs are practically feed-forward. This is what allows for parallel training over all tokens without back-propagation through time. Only during inference are tokens predicted auto-regressively, meaning that the predictions are made sequentially.
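      A rough sketch of that contrast, assuming a hypothetical `forward` function stands in for a causally masked Transformer pass: training scores every next-token prediction in one parallel pass (targets are the inputs shifted by one), while generation feeds tokens back in one at a time.
      ```python
      import numpy as np

      V, rng = 5, np.random.default_rng(0)  # toy vocabulary size and RNG

      def forward(tokens):
          """Hypothetical stand-in for a causally masked Transformer: returns one
          probability distribution over the vocabulary for every input position."""
          logits = rng.normal(size=(len(tokens), V))
          e = np.exp(logits - logits.max(axis=-1, keepdims=True))
          return e / e.sum(axis=-1, keepdims=True)

      seq = np.array([1, 3, 2, 4])

      # Training (teacher forcing): one parallel pass, loss over all positions at once.
      probs = forward(seq)
      loss = -np.log(probs[np.arange(len(seq) - 1), seq[1:]]).mean()

      # Inference: autoregressive, each predicted token is appended and fed back in.
      generated = [1]
      for _ in range(3):
          next_token = int(np.argmax(forward(np.array(generated))[-1]))
          generated.append(next_token)
      print(round(loss, 3), generated)
      ```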

  • @richardnunziata3221
    @richardnunziata3221 4 months ago +1

    Learning directional graphs over the embedding space may help with reasoning. Also content updating.

  • @simonkotchou9644
    @simonkotchou9644 4 months ago +2

    Nice open note vs closed note analogy

  • @HybridHalfie
    @HybridHalfie 4 months ago

    It's interesting how antiquated recurrent neural networks, supervised learning, support vector machines, and convolutional neural networks have become in so little time since transformers came out. Machine learning is such an ever-changing area. I would be curious to learn more about how transformers improve upon these models regarding backpropagation.

  • @EdFormer
    @EdFormer 4 months ago +1

    Excellent overview of the history of deep autoregressive models, not AI in general.

  • @Dadspoke
    @Dadspoke 4 months ago +15

    Kendrick….drop a diss track on this foo

  • @supamatta9207
    @supamatta9207 4 months ago +1

    Why didn't they just focus on indexing intelligently and selling databases as an extra? Mainly, if they used modulating algorithms they could make high-efficiency, arithmetic, analog-like chips.

  • @uber_l
    @uber_l 4 months ago

    What if you ask it to apply logic and world knowledge (physics) before giving any answer? Also an increasingly extended simulation (and/or research with statistical models, or it asking for new specific data at the extreme, if it's a novel problem). There are so many ways to simulate thinking. For a world model, video labelling should be useful down to details like emotions, and next-frame prediction is easy.

  • @Rmko4
    @Rmko4 4 months ago

    3:42 I assume he meant to say more compute per param

  • @benschweiger1671
    @benschweiger1671 4 months ago +1

    Get Geoffrey Hinton on ASAP.

  • @maxxkarma
    @maxxkarma 4 months ago

    I think I recognize some words, but even with captions, I am clueless.

  • @lostinbravado
    @lostinbravado 4 months ago +1

    The models need more depth. Machine learning does great at depth; LLMs do great at width, or information retention. We need a combination with some form of real-world connection, where the model can infer meaning narrowly and deeply from a large amount of information (the LLM), and then use real-world confirmation to check that it is inferring in the right direction. Whatever is confirmed autonomously by the machine via real-world experimentation can then be integrated back into the LLM.
    With that approach, the data we have is more than enough for these models to build their own understanding; we won't need to feed in any more data. The existing data is more than a good enough starting point. These models need to infer deeper meaning from the data and then run their own experiments or verify using sensors in the real world.
    These models need to be continually growing and improving instead of train-it-and-forget-it, or pretrain, freeze, and then try to pull more value out of that frozen model.
    We're not that far off. The long, difficult job of building the hardware that could carry such complex software approaches has been done well enough. We just need a model that can grow and adapt by looking at the real world, instead of being a crystallization of existing human knowledge.

    • @EdFormer
      @EdFormer 4 months ago +1

      "Not that far"? You seem to be talking about the concept of continual/lifelong learning, which can be done with very small models, but nothing complex and definitely not LLMs that require a data centre to train. I completely agree that it's needed, along with embodiment, but it's going to take something radically new that we are probably a long way off realising.

  • @UFOandUAPHistory
    @UFOandUAPHistory 4 months ago

    When we finally create AGI's that are clearly "smarter" than us, will we consider them to be sentient? I suppose that we can also look for individuality of personalities in identical systems. One could, perhaps, envision a sentience that can have no individuality but operate independently.

    • @mikezooper
      @mikezooper 4 months ago +3

      Intelligence isn’t the same as sentience.

    • @UFOandUAPHistory
      @UFOandUAPHistory 4 months ago

      @@mikezooper My (limited) understanding is the capabilities of these models seem to improve as they drive closer to modeling sentience?

    • @UFOandUAPHistory
      @UFOandUAPHistory 4 months ago

      @@mikezooper and of course there is the Star Trek episode of The Trial of Data, lol... ua-cam.com/video/vjuQRCG_sUw/v-deo.htmlsi=1jCtlAXHpn8DM3JX

    • @EdFormer
      @EdFormer 4 months ago

      @@mikezooper And autonomy is also a different concept. There are some pretty good arguments for the ways in which they could all be linked, however. Our sentience could well serve the purpose of a high-level critic of our autonomous application of intelligence that allows us to further optimise.

  • @dungbeetle.
    @dungbeetle. 4 months ago

    Wow. Sounds amazing. I just wish I knew what on earth he was talking about.
    Clearly I need an 'AI for Dummies' video.

    • @EdFormer
      @EdFormer 4 months ago

      I was a PhD and a year-long postdoc deep (all in ML) before I had the understanding needed to communicate on this level. The craziest thing is that I feel I had to learn about the vast majority of AI, right back to McCulloch and Pitts (1943) and including all the weird and wacky approaches we've explored for all the weird and wacky tasks we've considered since then to appreciate the tiny sliver of it that this video focuses on.

  • @stevenhe3462
    @stevenhe3462 4 months ago

    Crystalized history.

  • @PryZmFiXion
    @PryZmFiXion 4 months ago

    It's the reason the Spanish language works as negative/masculine. It moves it from subjective to objective.

  • @jackyboy214-q8u
    @jackyboy214-q8u 4 months ago

    The creation of AI and quantum computing occurring at the same time could be a bad combination. If they interact, the technological leap may be too much, too fast, for us to control.

  • @loudboomboom
    @loudboomboom 4 months ago

    Damn, so big LLMs post-processing little LLMs?

  • @happiestwhenhealthy9700
    @happiestwhenhealthy9700 4 months ago +2

    What in the actual ef is this guy talking about? We don't all have PhDs.

  • @uber_l
    @uber_l 4 months ago

    But thinking might take too much compute; like in humans, you pause when you don't have a ready answer. People want 'shiny products' now and fast, in a matter of a click.

  • @mauiblack1068
    @mauiblack1068 4 months ago

    As someone who loves Lex interviews, I can honestly say that the only thing I understood is that he was speaking English. Or was he?

  • @sweetride
    @sweetride 4 months ago +10

    "How to train an LLM to be woke yet still appear to be reasonable" is what they want. Not likely going to happen.

    • @GaleechLaunda
      @GaleechLaunda 4 months ago +7

      "Woke" and reason cannot co-exist.

    • @olabassey3142
      @olabassey3142 4 months ago

      Ask yourself why all the people intelligent enough to make these tools are not conservatards.

    • @thinkaboutwhy
      @thinkaboutwhy 4 months ago +3

      It's impossible to program ignorance, so we get stuck with intelligence, or 'woke' as you seem to prefer to call it. I'm OK with intelligence and logic.

  • @MOliveira-m5h
    @MOliveira-m5h 2 months ago

    With certain things like ChatGPT, I think language is more modular than other things and easy to work with on a computer. Language is kind of like coding, where you can copy whole sections of code and have them work the same in different places most of the time. Real things are different. Cars are modular, but that's not optimal: if I select an exhaust for my Honda it's not necessarily the perfect size, or people select giant ones that actually make the car lose horsepower and they don't know the difference. Music is another place where you see the limits of written music vs. reality; a computer copying notes from sheet music and mixing them up is not the same as playing music.
    Language is already filtered reality. People have sayings such as "a picture speaks a thousand words". It's already pretty much digital, or modular. In calculus, for example, the numeric way of solving problems turns the integrals into little modules. That's what the square waves of computers are, and that's not real. I think AI has a lot of hype and you're building a super McDonald's register.

  • @bilbobaggins5938
    @bilbobaggins5938 4 months ago +1

    *Nods sagely to the discussion, pretending I understand it*

  • @TheHealthConscounist
    @TheHealthConscounist 4 months ago

    9:50 don’t humans reason based on facts or previous experiences? If you meditate when you’re reasoning, you are actually pulling from previous thoughts and memories and making associations about them to help you reach a decision in the present

    • @bonky10
      @bonky10 4 months ago

      He's trying to say that LLMs are great at creating answers from things that aren't necessarily fact.
      For example, if you've asked ChatGPT something and it gave you an answer you know is false, it's because it reasoned with itself to get you an answer based off of what it already knows.
      Instead of relying on that kind of reasoning, how can we instead give it the actual facts of everything we know, and have it reason based off of what is actually true, instead of basically trying to persuade you or argue an answer?
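      A minimal sketch of that 'look the facts up first, then reason over them' idea (a toy retrieval-augmented generation loop; the fact store and the `llm` function are hypothetical stand-ins):
      ```python
      # Toy retrieval-augmented answering: fetch relevant facts, then ask the model
      # to answer using only what was retrieved instead of its memorized knowledge.
      facts = [
          "The Transformer architecture was introduced in 2017.",
          "WaveNet is an autoregressive model of raw audio.",
          "GPT-3 was trained on roughly 300 billion tokens.",
      ]

      def retrieve(question, k=2):
          """Crude keyword-overlap scoring; real systems use embeddings and a vector index."""
          q = set(question.lower().split())
          return sorted(facts, key=lambda f: -len(q & set(f.lower().split())))[:k]

      def llm(prompt):
          """Hypothetical stand-in for a language model call."""
          return f"(model answer grounded in: {prompt!r})"

      question = "When was the Transformer architecture introduced?"
      context = "\n".join(retrieve(question))
      print(llm(f"Answer using only these facts:\n{context}\n\nQ: {question}"))
      ```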

  • @tandrra
    @tandrra 4 months ago

    Lex with a beard 🔥🔥🔥

  • @JoshuaDannenhauer
    @JoshuaDannenhauer 4 months ago

    The hood catches you as a kid and doesn’t let go

  • @breezy8363
    @breezy8363 4 months ago +50

    Someone explain this in millennial terms please

    • @lilchef2930
      @lilchef2930 4 months ago +8

      Too gen Z for ya bud

    • @a-walpatches6460
      @a-walpatches6460 4 months ago +27

      Puters lookin at rite stuf make AI more good.

    • @devbites77
      @devbites77 4 months ago

      🎉😂😮😅😢😊

    • @MrBBOTP
      @MrBBOTP 4 months ago

      U can't!...

    • @pfever
      @pfever 4 months ago

      You can ask ChatGPT for that 😂

  • @danishwaseem5463
    @danishwaseem5463 4 months ago

    Thank God there is no wowww in this podcast

  • @t9j6c6j51
    @t9j6c6j51 4 months ago

    Well obviously.

  • @frankjamesbonarrigo7162
    @frankjamesbonarrigo7162 4 months ago +1

    Use metaphors, or something

    • @dungbeetle.
      @dungbeetle. 4 months ago

      Yeah, anything ... PLEASE!

  • @christopherburns2303
    @christopherburns2303 4 months ago

    I must be too smart to understand this guy

  • @PhotoboothTO
    @PhotoboothTO 4 months ago

    Is this guy an LLM?

  • @kazax01
    @kazax01 4 months ago

    “GPT-4o, please translate what this man is saying into normal-person English.”

  • @francoisjacobus
    @francoisjacobus 4 months ago

    If the Bible is uploaded will the AI preach to humans?

  • @grahamashe9715
    @grahamashe9715 4 months ago

    Hey, Lex, when are you going to climb Everest?

  • @loveanimals-0197
    @loveanimals-0197 4 months ago +1

    10:20 - Utter BS. This is Computer Science. Not magic.

  • @magazinevibe
    @magazinevibe 4 months ago

    I didn't understand a thing... and you didn't either 😂

  • @consequentlyardvark
    @consequentlyardvark 4 months ago

    This fool good chatter

  • @cosmicsea89
    @cosmicsea89 4 months ago

    😴 soon as he started talking

  • @GOLDAI-Official
    @GOLDAI-Official 4 months ago

    Over half of the population is obese or overweight; take Ronaldo's advice and get that Coca-Cola out of there ;)

  • @koneye
    @koneye 4 months ago

    Still cringe to hear that software "thinks"

    • @paulfrederiksen5639
      @paulfrederiksen5639 4 months ago +4

      Your software thinks, so what’s the problem?

    • @dan-cj1rr
      @dan-cj1rr 4 months ago

      @@paulfrederiksen5639 nah, it guesses the next token based on statistics; if u think it thinks ur dumb af

    • @stanstan-m9b
      @stanstan-m9b 4 months ago

      @@paulfrederiksen5639 good one

    • @bengsynthmusic
      @bengsynthmusic 4 months ago

      More so than any politician.

    • @mikezooper
      @mikezooper 4 months ago

      😂 Eventually it will think. I look forward to you feeling like a fool.

  • @AbhimanyuKumar-wg1hg
    @AbhimanyuKumar-wg1hg 4 months ago

    AI should be reality, but it is fake.

  • @Mart-Bro
    @Mart-Bro 4 months ago

    Dude has no idea how to communicate to people outside his industry

  • @drew4176
    @drew4176 4 months ago

    😴😴

    • @rickymort135
      @rickymort135 4 months ago

      I know man, bunch of NERDS!
      NEEEEEERRRRRDS 🤓🤓🤓

  • @seannewcomb7594
    @seannewcomb7594 4 months ago

    This doesn't make a damn bit of sense. 10+ years in the industry and there's nothing useful here.

  • @Conorscorner
    @Conorscorner 4 months ago +1

    This guy isn't very smart....

  • @Bbbboy-vx1mq
    @Bbbboy-vx1mq 4 months ago

    It becomes so obvious how little Lex knows and understands when people go into depth. His questions get really dumb and he struggles to come up with any insights