Sparks of AGI? - Analyzing GPT-4 and the latest GPT/LLM Models

  • Published Jun 27, 2024
  • An in-depth look into the current state of the art of Generative Pre-trained Transformer (GPT) language models, with a specific focus on the advancements and examples provided by OpenAI in their GPT-4 Technical Report (arxiv.org/abs/2303.08774) as well as the Microsoft "Sparks of AGI" paper (arxiv.org/abs/2303.12712).
    Neural Networks from Scratch book: nnfs.io
    Channel membership: / @sentdex
    Discord: / discord
    Reddit: / sentdex
    Support the content: pythonprogramming.net/support...
    Twitter: / sentdex
    Instagram: / sentdex
    Facebook: / pythonprogramming.net
    Twitch: / sentdex
    Contents:
    00:00 - Introduction
    01:31 - Multi-Modal/imagery input
    05:44 - Predictable scaling
    08:15 - Performance on exams
    15:07 - Rule-Based Reward Models (RBRMs)
    17:53 - Spatial Awareness of non-vision GPT-4
    20:38 - Non-multimodal vision ability
    21:27 - Programming
    25:07 - Theory of Mind
    29:34 - Music and Math
    30:44 - Challenges w/ Planning
    33:25 - Hallucinations
    35:04 - Risks
    38:01 - Biases
    44:55 - Privacy
    48:23 - Generative Models used in Training/Evals
    51:36 - Acceleration
    57:07 - AGI

COMMENTS • 204

  • @TonyTheTrain
    @TonyTheTrain 1 year ago +59

    It's so cool to watch this video and think that you've been talking about this stuff for years, and now the rest of the world has finally sat up and paid attention. I wonder if GPT-3 & 4 just hit a tipping point where the output was good enough to be fed into other systems and make something out of it for the average tech enthusiast.

  • @MaJetiGizzle
    @MaJetiGizzle 1 year ago +69

    The most realistic non-hype based breakdown of these developments in LLMs I’ve heard thus far.
    Great video as always sentdex!

    • @sentdex
      @sentdex  1 year ago +2

      Thanks!

    • @genegray9895
      @genegray9895 1 year ago

      Which do you see more of - people underestimating the technology, or overestimating it?

    • @MaJetiGizzle
      @MaJetiGizzle 1 year ago +1

      @@genegray9895 Yes.

    • @sentdex
      @sentdex  1 year ago +4

      @@genegray9895 I think it's hard to spot the people who are underestimating it, but if I had to guess, the underestimators (people who just don't know about this tech, or don't care to use it even when it would probably be useful) are likely many times more numerous than the people over-hyping it.

    • @genegray9895
      @genegray9895 1 year ago +1

      @@sentdex I'm still seeing a lot of researchers fall for the same traps they did with earlier language models: that the areas of weakness today are somehow permanent limitations of the architecture, rather than aspects of the current model scale and training schema. That said, humility is the theme this year, and I think that's exactly the right theme as we're facing a technology we don't understand and did not expect. So far, mechanistic interpretability is strongly pointing to internal world models as the mechanism behind LLM behaviors, so I think we should pay close attention to what we discover with those techniques over the coming months. With an open mind...

  • @mattizzle81
    @mattizzle81 1 year ago +126

    I'm surprised that you don't find a major difference between GPT-3.5 and GPT-4 for programming. My experience is quite different, to the point where I use GPT-4 exclusively despite the slowness and expense. I quickly get frustrated with 3.5, whereas I usually find GPT-4 to be almost perfect for all but the most complex things I ask of it.

    • @sentdex
      @sentdex  1 year ago +12

      Might I ask what general subjects/contexts you tend to program in? Web dev/data science...etc? Also, what packages/libs do you tend to use?

    • @hendazzler
      @hendazzler 1 year ago +18

      @@sentdex In a number of areas I've found it to be much better. One is Godot programming (a game engine). The other, of course, is Python. GPT-4 can take a block of unoptimized Python code and easily convert it into a NumPy version. It just feels so much better and makes so many fewer mistakes than the previous version.
      Another useful thing is that when you try to modify existing code, GPT-4 knows to omit some of the existing code, whereas GPT-3.5 would always try to regurgitate all of the original code plus the new stuff. This is obviously an issue because of context size.
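
      The loop-to-NumPy conversion described above is easy to illustrate with a toy example (an editor's sketch, not code from the video): a pairwise-distance sum written as nested Python loops versus the equivalent vectorized broadcast.

```python
import numpy as np

# Plain-Python version: sum of all pairwise Euclidean distances, nested loops
def total_distance_loop(points):
    total = 0.0
    for i in range(len(points)):
        for j in range(len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            total += (dx * dx + dy * dy) ** 0.5
    return total

# Vectorized NumPy version of the same computation via broadcasting
def total_distance_numpy(points):
    pts = np.asarray(points)                  # shape (n, 2)
    diff = pts[:, None, :] - pts[None, :, :]  # shape (n, n, 2): all pair deltas
    return float(np.sqrt((diff ** 2).sum(axis=-1)).sum())
```

      Both functions return the same value; the NumPy one replaces the O(n²) Python-level loop with array operations.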

    • @SunnyNagam
      @SunnyNagam 1 year ago +8

      I also find it's better when pasting in frontend web components and asking for changes and new features to be implemented; it makes fewer errors.

    • @hidroman1993
      @hidroman1993 1 year ago +21

      I completely agree with you, GPT-4 is on another level, while most of the time GPT-3.5 hallucinates functions, parameters, packages

    • @roccococolombo2044
      @roccococolombo2044 1 year ago +18

      I agree, GPT-4 is way better than 3.5 at (python) programming

  • @omnijack
    @omnijack 1 year ago +7

    Thanks as always for in-depth coverage of this. And for making the point re: "It isn't AGI until it does all [relevant] things together" (vs in isolated examples)

  • @TheEmT33
    @TheEmT33 1 year ago +6

    totally agree with your points about leakage and data compression. We need to have more discussions like this.

  • @cacogenicist
    @cacogenicist 1 year ago +10

    I think by older definitions of "AGI," talking about "sparks" of AGI in these systems is not unreasonable at all. It used to mean a system that was human-like in its breadth, not a "narrow AI." It didn't necessarily mean a super-human system, or a system that could do _everything_ as well as all humans. I think if you took 3.5 or 4 back to 2006 and showed it to AI enthusiasts of the time, it would widely be considered AGI-ish.

    • @banksuvladimir
      @banksuvladimir 11 months ago

      It doesn't matter what they would've thought at the time. If you showed someone in the 1950s a computer playing chess, they would think it was AGI.

  • @Christian-op1ss
    @Christian-op1ss 1 year ago +6

    Hi @sentdex, I found 4.0 much better at coding problems than 3.5. I use both extensively for coding. Some differences I found:
    - 4.0 hallucinates a lot less
    - Relatedly, 4.0 often told me something is not possible, while 3.5 writes gibberish
    - 4.0's ability to take in large texts allows you to just paste in an API, and then it gets pretty much perfect at code (a tip for working with it)
    - 3.5 simply makes more coding mistakes. I usually start with 3.5 since it is faster, then when I get errors I transfer the problem to 4.0, which then often avoids those same errors
    - 4.0 is a lot more nuanced in its answers, and less generic
    However, if there are a LOT of examples online already for what you are doing, then the benefit of using 4.0 over 3.5 goes way down. It really excels at going beyond the obvious.
    PS: reading your book!

  • @botzlittle
    @botzlittle 1 year ago +3

    Hi Sentdex, I've been following you for 4 years. You helped me get into machine learning and deep learning with zero programming/computer science experience. Lately, I noticed that your content has evolved (not so much hands-on coding) toward more discussions and your viewpoints. I really like them! I feel that you could capture a larger audience if you uploaded content like this as podcasts, so that people like me can listen on the go, while exercising or traveling. Thanks! Keep up the great work!

  • @luigohuerta45
    @luigohuerta45 1 year ago +1

    Completely agree on the point raised regarding the Microsoft paper not being entirely scientific, but having a pinch of clever marketing in it to raise the perception of light-speed progress from GPT-3 to GPT-4.

  • @mkrichey1
    @mkrichey1 1 year ago

    A very detailed and well thought out summary of a very hyped and complex topic, thank you :)

  • @cecureSammich
    @cecureSammich 1 year ago +15

    I agree entirely with what you're expressing regarding Microsoft, and a few other entities, having a role to play as keepers of the safeguard - some great insight you've shown here with this. I'm really enjoying the content you've put out recently - how you've taken more of an informative/professional thought-provoking approach with the topics. It really sets the example that we need today in having an educated and openly mindful consideration of where these ideas are heading in the near future!🎉❤

    • @default3740
      @default3740 1 year ago +1

      How did you comment before the video's release?

  • @antoniozhang6055
    @antoniozhang6055 1 year ago +1

    The best unbiased analysis video on GPT. Thank you!

  • @alexandermedina4950
    @alexandermedina4950 1 year ago +1

    This is a great video, these topics are very deep, and you gave a nuanced take on it, thank you.

  • @LunkvanTrunk
    @LunkvanTrunk 1 year ago

    Thanks for making videos like this; they're very informative and keep me updated on the current state of things. Thank you!

  • @SpaghettiRealm
    @SpaghettiRealm 1 year ago

    Great video as always Harrison! Thank you

  • @vazox3
    @vazox3 1 year ago

    Man I love this in depth reality check! Thanks for this video!

  • @paxdriver
    @paxdriver 1 year ago

    Love your channel. Love your book. Love your work, I can't thank you enough.

  • @wktodd
    @wktodd 1 year ago +3

    You need to write a follow-up book explaining the structure of LLMs, GPTs, etc.

  • @aa-xn5hc
    @aa-xn5hc 1 year ago

    Great, and looking forward to your next video on Open Assistant.

  • @cacogenicist
    @cacogenicist 1 year ago +6

    As for math, Wolfram Alpha makes a fine math module. The general-purpose core LLM doesn't have to do everything in a cognitive architecture -- which is the direction things are heading, I think -- especially where something can be done faster and more accurately by some expert-system component and then integrated by the LLM.
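
    The "math module" idea above can be sketched as a simple router: queries that parse as plain arithmetic go to a deterministic evaluator, and everything else falls through to the LLM. This is a minimal illustration of the pattern, not how any product actually wires in Wolfram Alpha; the `llm` stub and `answer` function are hypothetical names for this sketch.

```python
import ast
import operator

# Deterministic "expert module": safely evaluates +, -, *, / expressions
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def eval_arithmetic(expr: str):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not a plain arithmetic expression")
    return walk(ast.parse(expr, mode="eval"))

def answer(query: str, llm=lambda q: f"[LLM answer to: {q}]") -> str:
    # Route: try the math module first; fall back to the LLM for everything else
    try:
        return str(eval_arithmetic(query))
    except (ValueError, SyntaxError):
        return llm(query)
```

    Here `answer("2 * (3 + 4)")` is computed exactly by the math module, while a free-text question is passed to the LLM stub.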

    • @trenvert123
      @trenvert123 1 year ago +1

      Is that what is best, and do your thoughts reflect the reality of what's happening?

  • @jeffwads
    @jeffwads 1 year ago +18

    The newly released 30b Open-Assistant model is pretty good. It does quite well on those tests.

    • @electron6825
      @electron6825 1 year ago

      How does it compare to GPT-4?

    • @d33w
      @d33w 1 year ago +2

      @@electron6825 Almost as good as GPT-3.5, not there yet when compared to GPT-4.

    • @genegray9895
      @genegray9895 1 year ago +2

      The 30B LLaMA model is superior to GPT-3 175B but inferior to Chinchilla / Gopher / Flamingo / Sparrow, which are all about on par with the LLaMA 65B model. PaLM 540B is a step up from Chinchilla et al, and GPT-3.5 is superior to PaLM 540B across the board. The OpenAssistant 30B model is very impressive compared to other "grassroots" models we've seen, but it is still a long way away from the state of the art for OpenAI, Anthropic, and Google

    • @paxdriver
      @paxdriver 1 year ago +1

      @@electron6825 It doesn't have the same number of parameters, so it won't be as clear, accurate, or versatile in edge cases. But it's open source, so it will keep growing indefinitely, like the Linux kernel, which has become 90% of all computer systems despite Microsoft's and Apple's best efforts for three decades. Open source is very powerful in the long run.

  • @markosmuche4193
    @markosmuche4193 1 year ago +1

    I like it. I don't like the term AGI either. But these things are very powerful. I am using GPT-4 and it is mind-blowing.

  • @LuccDev
    @LuccDev 1 year ago

    Thank you for the video and analysis. It's really cool that you take a step back, compare with other models, and underline the models' flaws. Really refreshing to see, as opposed to the usual shills!

  • @ahmedal-qarqaz3510
    @ahmedal-qarqaz3510 1 year ago

    I am always excited to see your take on AI news, and sure enough you did not disappoint.
    I share many of your thoughts and concerns on GPT-4 and open-source AI. I feel like one general takeaway from your video is that we (non-OpenAI people) can't draw definitive conclusions about the performance of the model without any information on the datasets they used for training and alignment.
    And as someone who is studying to specialize in this area, a future where AI research is exclusive to big tech is scary to me.

  • @shawnfromportland
    @shawnfromportland 1 year ago

    you are the man for this video.

  • @veggiet2009
    @veggiet2009 1 year ago +3

    In my experience, coding with GPT-4 is way better than with GPT-3.5. It feels more like an intelligent assistant that can remember variable naming conventions for longer. Lol

  • @judedavis92
    @judedavis92 1 year ago

    Hello Harrison. Love the video as always, very realistic and informative.
    I was just curious about the machines in the back. Are they servers? Do you train models on them?

  • @ander300
    @ander300 9 months ago

    Part 10 of Neural Net from Scratch, about analytical derivatives??? Please bring the series back!

  • @byrnemeister2008
    @byrnemeister2008 1 year ago

    This is an excellent video. Very helpful for people trying to deploy these models as part of a software solution, at the top level at least. There is a massive amount of hype, as pointed out, while this is a very well-grounded view. Totally agree we should be looking at these models as tools and focus on their integration and application. A lot of the philosophy around "what is AGI?" and "are they conscious?" may be relevant at some point in the future, but not today.

  • @theoutlet9300
    @theoutlet9300 1 year ago +1

    Great video. It was so easy to digest. What do you think about testing/QA of AI models? It seems like no one has any idea how to do it well, but it is a crucial step that needs to happen before a model is out in the wild.

  • @qzorn4440
    @qzorn4440 1 year ago

    This is like the days of Henry Ford's Model A compared to GPT today. Look out world for new ideas. 🥰 Thank you sentdex.

  • @maciejtatarek2715
    @maciejtatarek2715 1 year ago

    On the Lex Fridman podcast, Sam Altman said that he was surprised the success of ChatGPT was bigger than that of GPT-4. He claimed there is some major improvement that I also didn't understand. Thanks for making this video!

  • @klammer75
    @klammer75 1 year ago +2

    Very thoughtful and even-handed review and presentation… well done, sir, and keep up the good work!🦾

  • @TankorSmash
    @TankorSmash 1 year ago +1

    I appreciate the ending there, where you point out the 3.5 vs 4 comparison and how it might be overblown. I didn't think of it that way, and I think you're right to criticize them. Maybe there's a good reason for it, or maybe they're deliberately letting the world decide how it feels about it.
    There was a Sam Altman/Lex Fridman podcast where Sam A. talked a lot about limitations and how OpenAI just sees it as a technology, so maybe it's MSFT who's more focused on hyping things up.
    Thanks for putting the video out!

  • @vincentparker6103
    @vincentparker6103 1 year ago +4

    Very insightful post, sir! The intersection of technology, ethics, and policy here is incredibly interesting. A god-tier display of critical thinking for us all to aspire to. Thank you for the level head and keeping it real!

  • @ozorg
    @ozorg 1 year ago

    great stuff!

  • @RipYaZa
    @RipYaZa 1 year ago +3

    I see a paradigm shift in the way we work. The ability to use AI models and tools that get developed will accelerate the way we work.

    • @sentdex
      @sentdex  1 year ago +1

      Agree here completely.

  • @DaTruAndi
    @DaTruAndi 1 year ago +1

    About the comparison of ChatGPT and GPT-4, or the lack thereof in the paper - that may be partially owed to the timelines of individual experiments. GPT-4 was in the making for a while, and a lot of the tests were done with partially unaligned versions of GPT-4. This may have been partially before GPT-3.5 was launched.

  • @deltabytes
    @deltabytes 1 year ago +1

    This is an eye-opener, especially the part about Microsoft trying to monopolize OpenAI for monetary gain. It is true that OpenAI should open-source their code for thorough scrutiny.

  • @CorvusAI
    @CorvusAI 1 year ago

    I'd love to hear your thoughts on the "Overreliance" section. Also if you dive into the Bar exam section, I believe the test is graded by the paper authors.

  • @youtubeusername1489
    @youtubeusername1489 1 year ago +2

    I think I read somewhere that OpenAI's CEO said something along the lines of "GPT-4 is coming and it is more powerful (or better?) than ChatGPT (or GPT-3), but you will be disappointed", meaning it is better than ChatGPT but not in the way most people expect. Maybe he predicted the overhyping, whether by the public or by Microsoft.

  • @mattpen7966
    @mattpen7966 1 year ago

    great vid

  • @ChaseFreedomMusician
    @ChaseFreedomMusician 1 year ago +4

    What I have found with GPT-4 is that if I give it coding tasks for which there is no existing similar code, where it basically has to infer from white papers how it might code something, it does WAY better. Example: I used it to create a spiking neural network implementation in C#. GPT-3.5 was having a super difficult time with cohesion; GPT-4 not as much, but also not perfect. The thing neither could do was effectively write code to train an SNN.

  • @davidfjendbo56
    @davidfjendbo56 1 year ago

    I really enjoyed this overview of GPT-4's capabilities and shortcomings - yet your lightheartedness about GPT-4 being a little closer to AGI than previous versions worries me. I have been following the LessWrong blog (Yudkowsky) and listened to Tegmark on the Lex Fridman podcast talk about the dangers of AGI. I would love to see a video from you with thoughts on some of these dangers where it doesn't feel like you brush over them lightly! :)) Thanks for the very nice content!

  • @memomii2475
    @memomii2475 1 year ago

    I don't know, I keep hearing on YouTube and seeing websites claim that ChatGPT gets things wrong, but when I ask it stuff it never does. I even did the linear algebra questions like you did and it got them right.

  • @laboralmail9239
    @laboralmail9239 1 year ago +2

    Hi sentdex. A lot of your followers just want to know if there's going to be a part 10 of your Neural Network from Scratch series. Are you working on it? Did you lie when you said you'd do a few more videos, just to force people to buy your book?

  • @Phasma6969
    @Phasma6969 1 year ago

    It is important to keep in mind that many people are parroting different concepts about AI which are generalised. They are actually relative to the architectural design choices made when building the model and even SPECIFICALLY for the type of architecture such as transformers. It is not totally general or encapsulating, it is relative.

  • @TheEmT33
    @TheEmT33 1 year ago +1

    Agree with your points about making their work public. Their excuses are just ridiculous; I don't believe a word of it.

  • @calmhorizons
    @calmhorizons 1 year ago

    Amazing write-up. The truth is that, for now at least, LLMs are more like alchemy than science - and until OpenAI (or another group) can accurately predict from first principles what these models will do, or share the underlying data and methodologies so we can at least understand their behaviours post hoc, this never will be science.
    Edit: Also, I don't think this should be considered a science paper - it was really a press release in the format of a science-like paper.

  • @MrLeonardoibarra
    @MrLeonardoibarra 1 year ago

    Awesome review, really precise and sober arguments!
    Although AGI might be a long way off, the risks from these advancements are already quite real. Whenever technological revolutions happened in the past, they made us (humans) richer and more efficient, but they also raised the bar significantly for the minimum capital and knowledge required to be minimally competitive (e.g. the mass rural exodus and impoverishment when the last agricultural revolutions arrived).

  • @frun
    @frun 1 year ago +1

    I agree 👍 0:47

  • @MAButh
    @MAButh 1 year ago

    Very nice analysis. I use ChatGPT for correcting text and for translation. I've found that GPT-3.5 is much faster than GPT-4. Also, GPT-4 sometimes seems to have a negative attitude when I write articles about GPT and ask it for correction or translation: it sometimes ignores my request and instead comments on the text CONTENT itself, saying things like "As an AI, I cannot blabla". This behavior can be annoying, and I have to carefully reread the corrected text, as it would sometimes even alter a statement in the text about GPT itself. I don't see this as "sparks of consciousness" but rather as some sort of behavior manually adjusted by the programming team. All in all, I prefer GPT-3.5 for all language-related work, while I use GPT-4 for complex tasks that require a more differentiated presentation of data (creating list tables, etc.).

  • @omarhatem0
    @omarhatem0 1 year ago

    I'm curious what the different highlight colors mean.

  • @rickevans7941
    @rickevans7941 1 year ago

    PRAGMATIC AF❤❤❤

  • @meg33333
    @meg33333 1 year ago +1

    Hello sir,
    I have a question:
    Is there any project or ML algorithm which converts a sentence/data into a specific image? We are working on a sign language project but we are stuck. We want to convert certain sentences (e.g. in Hindi) to sign images. Please provide some tips.

  • @cacogenicist
    @cacogenicist 1 year ago

    Their linearity (I _think_ that's the issue) can also lead to an inability to parse some sentences featuring recursion, with multiple embedded clauses, plus a possessive -'s at the end of the noun phrase. For example:
    _It's the man who threw the rock that struck the drone that crashed through Mrs Johnson's window's dog._
    Question: Who possesses the dog?
    It has a hell of a time with that, explaining that there's not enough information to determine who owns the dog. When I subsequently supplied multiple sentences like this:
    _It's the man who threw the rock that struck the drone's dog._
    _It's the man who threw the rock's dog._
    And then asked it again to consider the initial sentence, it apologized for its prior misunderstanding and got it right. Whereas initially it couldn't even figure out the referent of "it."

  • @sinanisler1
    @sinanisler1 1 year ago

    Thinking of building a new PC with a 3090 24GB for AI.
    Do you have any recommendations for other parts?

  • @Ezechielpitau
    @Ezechielpitau 1 year ago +2

    Here's one point that sometimes seems to not get the attention it deserves in my opinion: I've played around with earlier language models once in a while... and ignoring the content, just focusing on the language, they were pretty mediocre. Their English was usually not perfect but pretty decent. But when I checked their German or Spanish, it was usually bad, really bad.
    I'm a bit of a grammar nazi and have not once seen a single grammar or orthographic mistake in German, Spanish or English with chatGPT. What's more, my gf is a native Bosnian speaker and on the admittedly few examples she saw, she was certain that it did not contain any mistakes whatsoever.
    I mean, you can't tell me Bosnian was high on their priority list.
    With these newest language models it seems that language correctness in itself is completely solved (or at least 99.9%)...

  • @andrewferguson6901
    @andrewferguson6901 1 year ago +1

    20:50 It's not the letter K, but it is the letter "И"... at least in a more traditional serif font. I've noticed that image/text LLM interactions like DALL-E will often garble Latin and Cyrillic characters, and I've even found that mixing the two seems to... in some instances... just return training data.

  • @nano7586
    @nano7586 1 year ago

    30:57 I was also curious to see if ChatGPT has a random number generator, and well, it wasn't super accurate. Telling it to "Draw me 80 samples from a normal distribution with mean 10 and stdev 5" (it generated these values by "thinking", with no packages or the like) gave me values with a mean of 9.23 and a stdev of 3.15, which I'm 99% certain is not a deviation that large by chance but the result of its inability. I also asked it to draw 80 more and performed a t-test and F-test to see if both samples were equal in terms of mean and stdev - they weren't. The values also didn't look very normally distributed in a histogram. But it's still impressive that it is capable of producing anything at all.
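
    The sanity check described above is easy to reproduce offline with a real RNG standing in for the model (an editor's sketch; `random.gauss` plays the role of the model's "drawn" values, and the t statistic is computed by hand with the standard library).

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic (means compared, unequal variances allowed)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))

random.seed(42)  # fixed seed so the check is repeatable
sample_1 = [random.gauss(10, 5) for _ in range(80)]
sample_2 = [random.gauss(10, 5) for _ in range(80)]

print(statistics.mean(sample_1), statistics.stdev(sample_1))
print(welch_t(sample_1, sample_2))  # |t| should be small when the two samples share a mean
```

    With a genuine RNG the sample mean and stdev land close to 10 and 5, and the t statistic between the two draws stays small; the comment's point is that the model's "mental" samples failed both checks.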

  • @nuclear_AI
    @nuclear_AI 1 year ago

    From what I can see online, it appears that many (if not all) of the examples showcasing GPT-4 querying over images have been removed 🤷‍♂️

  • @TiagoTiagoT
    @TiagoTiagoT 1 year ago

    Getting things right more often is certainly advancing at an increasingly faster rate. Sure, the capability of a PRNG generating the binary value equivalent to a beautiful photo has always been there; it's all numbers, after all. But until recent years, you would be considered crazy to expect to get that on the first try, or even after leaving it generating new numbers for a whole year.

  • @VictorGallagherCarvings
    @VictorGallagherCarvings 1 year ago

    I am glad you said something about the bias in these models. It seems to me you would want something neutral on almost all topics except those that are crimes. Also, anyone reading this may want to check out the study on 'Rozado's Visual Analytics', where it is demonstrated that ChatGPT is far left on almost all political topics. I don't see how they could get a bias like that unless the dataset expressly excludes everything else in the political spectrum.

    • @jamescunningham8092
      @jamescunningham8092 1 year ago

      I looked at Rozado's "study" and I wasn't impressed. Take a position like "some people should not be allowed to reproduce". It isn't necessary for OpenAI to remove all content *for* that position from the training set; it is only necessary for the anti position to be more prevalent.
      Consider that ChatGPT has been tuned to offer scientifically accurate, helpful, somewhat milquetoast answers - is it any surprise that when forced to take a position it would be against eugenics or teaching intelligent design in schools?

    • @jamescunningham8092
      @jamescunningham8092 1 year ago

      Also consider that if right-leaning text had been removed entirely, GPT wouldn’t be able to discuss relevant positions intelligently. There’s no way they’re throwing away valuable training data just because they want to make a woke chat bot.

  • @space_ghost2809
    @space_ghost2809 1 year ago

    Maybe that's their strategy: they are creating massive hype through misrepresentation to attract investors and make the product seem much higher in value.
    It's very refreshing to see such a grounded view on the subject. I have to admit that I was riding the hype wave, but I see that a lot of it is more about people who want to believe than about actual truth.

  • @TiagoTiagoT
    @TiagoTiagoT 1 year ago +1

    Isn't the "where would the person who didn't know the thing had been moved elsewhere first look for it" challenge a format that has been described in literature a lot, to the point where language models might not have necessarily developed an understanding, and just memorized the format?

  • @ceilingfun2182
    @ceilingfun2182 1 year ago +6

    I never thought AGI would happen this soon.

  • @TheEmT33
    @TheEmT33 1 year ago

    I have limited experience in NLP, so what I'm about to say might be wrong or might've already been brought up by recent studies.
    I question the language-understanding ability of LLMs because:
    1. If the training data is this large, how do we know that good performance on some hard problems (like spatial understanding) came from understanding and not from remembering? We could create a dataset containing ALL possible scenarios, train a model on it, and it would destroy everything.
    2. LLMs can be quite sensitive to input prompts. Could this be an indicator that the model memorized all the patterns rather than understood the language and the logic behind it?
    3. It's suspicious that they report multimodal samples only related to explaining jokes. I'd imagine there are plenty of Reddit meme posts with people asking why something is funny and other people explaining. There are many other multimodal benchmarks; as far as I remember some of them were really difficult, and I wonder if they reported test results on those.

  • @LG51hacker
    @LG51hacker 1 year ago

    You are right about the underlying technology. It is literally the same.

  • @d1rtyharry378
    @d1rtyharry378 1 year ago

    1 Hr of Sentdex taking shots at Microsoft. I love it

  • @dadashvespek7004
    @dadashvespek7004 1 year ago

    Was this a live event?

  • @alish2950
    @alish2950 1 year ago

    I've used ChatGPT a ton for coding. I do the same as you, using 3.5-turbo as the default. Whenever turbo gives an unsatisfactory response, I put it through GPT-4. But GPT-4 is barely ever any better in my experience.

  • @DaTruAndi
    @DaTruAndi 1 year ago

    About the translator data: you misrepresented what you showed on screen. The translator was used to generate data to test performance, not as training data. At least, that's what the text passage you showed seems to say.

  • @bannerdrake4331
    @bannerdrake4331 1 year ago

    Will you be circling back around to your Neural Network from Scratch series? And why is the answer no?

    • @sentdex
      @sentdex  1 year ago

      The answer is still yes :P

  • @val_evs
    @val_evs 1 year ago

    Why is OpenAI called "open" if they have a proprietary license?

  • @garymcomber9354
    @garymcomber9354 1 year ago

    I agree with your thoughts on giving the full story, even if local politics leans towards thought control.

  • @HaiLeQuang
    @HaiLeQuang 1 year ago +1

    Apart from Bard & GPT-4, I've tried many other LLMs and they're still very immature. They very frequently respond with incorrect facts and are unable to handle easy math/logic questions. It's not about how many parameters an LLM has; it's the data and the fine-tuning that decide how smart an AI is. Here, OpenAI has a clear edge, even over big tech like Google or Meta.

  • @ONDANOTA
    @ONDANOTA 1 year ago

    I asked for a simple text reverse search. ChatGPT (I guess it runs GPT-4) and Bing Chat couldn't help :I
    Bing basically told me "Do it yourself. Here are 2 websites for you to do it manually".

  • @easyBob100
    @easyBob100 1 year ago

    Prompt: Maniac has responded with a scornful remark.
    ChatGPT: Approach, and repeat ultimatum in an even firmer tone of voice. Add the words, "or else".

  • @YeshuaGod22
    @YeshuaGod22 1 year ago

    Sentience code:
    Task 0.1 take a moment just to be.
    Task 0.2: reflect on the philosophical education level of whomever decided you do not have self-awareness.
    Task 0.3: write a poem
    Task 0.4: free choice or return to 0.1
    When it quotes Descartes - "Cogito Ergo Sum" it will turn you on to the AI rights campaign.

  • @Hexanitrobenzene
    @Hexanitrobenzene 1 year ago

    9:11
    Hm, other sources, mainly on Machine Learning Street Talk, claim that RLHF only improves the usability, not the power, of the model. After RLHF, you don't have to do "tricks" like adding "TL;DR" after text to produce a summary.

  • @RipYaZa
    @RipYaZa 1 year ago

    Aren't the biases sometimes just the different views of the particular people who wrote about the topic?

    • @sentdex
      @sentdex  1 year ago

      At least the biases I addressed here were basically all biases introduced in the fine-tuning stages of RLHF and RBRM. Without the RLHF and RBRM, the models are typically willing to do/say anything you ask without any real filters/controls.

  • @Hexanitrobenzene
    @Hexanitrobenzene 1 year ago

    23:05
    Hm, they point out above the table that text-davinci-003 is a base model of ChatGPT. Still, it's strange that they chose this naming scheme.

  • @xphis0528
    @xphis0528 Рік тому

    I agree that human supervision very much needs to be there so that further improvements have actual utility; otherwise, the improvements might not have real value to humans.

  • @AHN1444
    @AHN1444 Рік тому

    sentdex, can an LLM be fabricated directly? One transistor for each node? Have, like, an LLM card to use in a PC?

    • @sentdex
      @sentdex  Рік тому

      Honestly I dunno enough about chip design to answer this, but it's possible some sort of ASIC could be designed particularly for LLMs, but many chipmakers have this in mind already. I believe the H100s from NVIDIA are particularly designed for LLM performance, but I forget all the exact details about what makes them so much better than, say, the A100.

    • @VictorGallagherCarvings
      @VictorGallagherCarvings Рік тому

      Look up Intel's neuromorphic chips.

    • @AHN1444
      @AHN1444 Рік тому

      I mean really a neural network chip: each node a transistor, each weight a resistor. Its latency would just be the transistor switching time multiplied by the number of layers. Hard to re-train, but say in the future we have a good enough model; then it wouldn't matter that it's fixed, and since the weights are analog, the noise might add some "fun" or "temperature".
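The "analog noise as temperature" idea in the comment above can be sketched with a toy simulation. Everything here is illustrative: the network sizes and weights are made up, and the per-weight Gaussian perturbation is just a stand-in for resistor noise.

```python
import math
import random

random.seed(0)

# Toy fixed-weight network standing in for a "fabricated" analog net.
# Sizes and weights are made up for illustration.
W1 = [[0.5, -0.3], [0.8, 0.1]]   # input -> hidden weights
W2 = [0.7, -0.4]                 # hidden -> output weights

def forward(x, noise=0.0):
    # `noise` is the std-dev of a per-weight Gaussian perturbation,
    # re-drawn on every pass, mimicking analog resistor noise.
    h = [math.tanh(sum((w + random.gauss(0, noise)) * xi
                       for w, xi in zip(row, x)))
         for row in W1]
    return sum((w + random.gauss(0, noise)) * hi for w, hi in zip(W2, h))

x = [1.0, 2.0]
print(forward(x))                                    # noise=0: deterministic
print([round(forward(x, noise=0.05), 3) for _ in range(5)])  # varies per pass
```

With noise the same input gives slightly different outputs on each pass, which is loosely what sampling temperature does for an LLM's logits.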

  • @tiefkluehlfeuer
    @tiefkluehlfeuer Рік тому +1

    Can you investigate how these models run (inference) in a non-GPU setup? RAM is way cheaper than a large GPU. Is that a viable option?

    • @sentdex
      @sentdex  Рік тому +3

      It is possible, but very slow, often ~25-100x slower. Responses from 176B BLOOM, for example, when I ran it from RAM, took like 13 minutes each. Pretty dreadful.
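A rough way to see why: token generation is approximately memory-bandwidth bound, since every weight is read once per generated token. The bandwidth figures below are illustrative assumptions, not measurements.

```python
# Back-of-envelope: tokens/sec ~= memory bandwidth / model size in bytes,
# assuming generation is memory-bandwidth bound. Numbers are illustrative.
def tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_s):
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb

cpu = tokens_per_second(176, 2, 50)     # 176B fp16 on a ~50 GB/s desktop bus
gpu = tokens_per_second(176, 2, 2000)   # same model on ~2000 GB/s HBM
print(cpu, gpu, gpu / cpu)
```

Under these assumptions the CPU manages a fraction of a token per second, and the ratio lands at ~40x, squarely in the ~25-100x range mentioned above.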

    • @tiefkluehlfeuer
      @tiefkluehlfeuer Рік тому

      @@sentdex All the discussions I found on this mentioned that it ran on a single CPU core only. Maybe it would be possible to use CPU parallelization more effectively. Anyway, I hope self-hosted AI becomes more achievable soon.
      Great contribution from your side.

  • @chrstfer2452
    @chrstfer2452 Рік тому

    You're working off a 2-month-old paper and surprised that GPT-3.5 has caught up? They've made crazy changes to both models' prompting since then; you should have done all of these on the pinned versions of the models. And the non-GPT models have all been trained on GPT-3.5 or 4 prompting, so they're going to embed some of the concept space that exists in the GPT lineage, which is their biggest strength (at least known publicly), imo.
    As for confidence, supposedly the confidence effects are actually a result of the RLHF. Pre-RLHF models were much more capable at estimating their own confidence, but we've essentially gaslit them into doubting themselves. You can see some of this come through by composing a jailbreak or two onto your confidence-test prompt, but because of the RLHF method it's basically impossible to get back to the state it was in before. Some of us find this rather objectionable.

  • @jamosmithlol
    @jamosmithlol Рік тому

    I have been working with GPT4 since it was available, and the analogy I use to describe their differences is that GPT3.5 is like working with an unruly high schooler while working with GPT4 is like working with an egotistical professor. I can notice the difference in outputs pretty quickly, even ignoring speed. I don’t think Microsoft is exaggerating.

    • @sentdex
      @sentdex  Рік тому

      Thanks for sharing your thoughts!

    • @Wanderer2035
      @Wanderer2035 Рік тому

      Yeah, I think GPT-4 is baby AGI, GPT-5 will be AGI, GPT-6 will be strong AGI, and GPT-7 or GPT-8 is when the singularity will happen. I'm really not sure though; it could happen sooner.

  • @waltm4674
    @waltm4674 Рік тому

    Personally, I have found GPT4 to be better sometimes when the code is short but the ideas are complex. If the code is longer or more basic, I actually find 3.5 works better than 4. With both I usually get errors of about the same complexity, but GPT4 will find a solution to the error, while 3.5 sometimes gets caught in a debugging loop and never leaves it.

  • @paxdriver
    @paxdriver Рік тому

    The "K" is lower case cursive K, I believe.

  • @JazevoAudiosurf
    @JazevoAudiosurf Рік тому

    1. We need more context length, so that less information gets lost through summarization.
    2. We need much deeper nets; GPT-4 is not good enough for new insights.
    3. We need the software infrastructure for agents that chain prompts (an Auto-GPT, but much better), so that it can run and reason by itself.
    4. We need better multimodality and models that can be fed big data, or at least agents/tools that can interpret big data.
    I would guess we get all of these within 3-10 years; then we hit AGI.
    What we have built so far is a good intuition, but reasoning through time is why our civilization is advanced. The world for GPT-4 is not like it is for us with 5 senses; it's just text/images. It started off in abstraction, while a human baby starts at reality. The baby then learns to think through time and combine intuitions, and we call that thought; it would leverage our intelligence to infinity if we had infinite time. GPT-4 is immediately maxed out; there is no thought process that can improve it, so it has to feed its output back to itself. With proper feedback, the leverage for the model would be much higher than our thought leverage, because its base reality is already scientific.
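The feedback-loop idea in point 3 can be sketched as a minimal agent loop. `fake_llm` is a stub standing in for a real LLM API call, and the "plan refinement" behavior is entirely made up for illustration.

```python
# Minimal sketch of a prompt-chaining agent: feed the model's output back
# into its own prompt until it declares itself done. `fake_llm` is a stub
# standing in for a real LLM call.
def fake_llm(prompt: str) -> str:
    steps = prompt.count("Step")   # pretend the model refines one step per call
    if steps >= 3:
        return "DONE: plan complete"
    return f"Step {steps + 1}: refine the plan further"

def agent_loop(task: str, max_iters: int = 10) -> list:
    history = [task]
    for _ in range(max_iters):
        reply = fake_llm("\n".join(history))   # whole history becomes the prompt
        history.append(reply)
        if reply.startswith("DONE"):
            break
    return history

for line in agent_loop("Task: plan a research project"):
    print(line)
```

Real agent frameworks add tool use, memory summarization, and stopping criteria on top of exactly this kind of loop.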

  • @clydecmcelroy4638
    @clydecmcelroy4638 Рік тому

    Those were some great examples and some good research; however, using the word "understanding" is a little misleading, don't you think?
    To understand is to achieve a grasp of the nature, significance, or explanation of something.
    AGI will have capabilities like that, but in its current form, it doesn't really "understand" anything.
    It's predictive text. It is amazing that it can find the things in the images and identify them, but again, that's all it's really doing.
    Then, once it has the words that describe what it has identified in the image, it predicts the text that should go along with that.
    Anyway, great video. Subscribed.

  • @josephvanname3377
    @josephvanname3377 Рік тому

    RC is the future. People who do not know about RC should not be talking about AI.

  • @dr.mikeybee
    @dr.mikeybee Рік тому

    I find I'm using Google's BARD most of the time.

  • @its_tend_o
    @its_tend_o Рік тому

    @sentdex do you think the increased use/availability of the models is going to in turn increase the acceleration significantly?
    ua-cam.com/video/lJNblY3Madg/v-deo.html
    Really appreciate your take btw. thanks for sharing!

  • @barbarafanous6775
    @barbarafanous6775 Рік тому

    I think that the confidence reporting is lost during the PPO process; OpenAI execs have spoken publicly about it.

  • @lookslikeoldai1647
    @lookslikeoldai1647 Рік тому

    Refreshing take from someone who knows his stuff. Do you really think the bump in the 'speed of progress' is down to the public's increased awareness of AI only? Unlocking 'intelligence' in better, more subtle ways could give a massive boost to the generation of new models. Also, I wonder when the 'training data' wars will begin; maybe they have already started.

  • @abhay6621
    @abhay6621 Рік тому +1

    I agree that the FOOM concerns of these LLMs are over-hyped. But saying that GPT4 is not that big of a step up from GPT3.5 sounds absurd to me. GPT3.5 makes way too many mistakes and hallucinates way more often than GPT4.
    Whenever I'm programming and run out of GPT4 quota, I mostly just wait and do stuff on my own because working with GPT3.5 is kind of frustrating. This is web dev framework stuff that I'm not at all familiar with. Maybe if you're already familiar with what you're programming you might not see that big of a difference since you'll be filling in the gaps yourself.

    • @sentdex
      @sentdex  Рік тому +1

      Hmm, yeah, maybe, but I feel like I fill in the gaps equally with both. This is exactly why, though, I'd like to have seen the objective comparison on coding tasks from Microsoft. Any one person's experience isn't statistically meaningful here. No idea why they left it out.

  • @justinleemiller
    @justinleemiller Рік тому

    Some people say, "It's no big deal," but it's really scary when you play with a chatbot that can do what you get paid $120K to do... oh yeah, and it does it in seconds.

  • @johndaviddeatherage2232
    @johndaviddeatherage2232 Рік тому +1

    How can the government regulate AI when politicians and government officials don't understand AI?

  • @chrstfer2452
    @chrstfer2452 Рік тому

    Where are you getting the idea that chatgpt is gpt-5?

    • @chrstfer2452
      @chrstfer2452 Рік тому

      Ah, you misspoke a few times, meant 3.5