Phi-2, Imagen-2, Optimus-Gen-2: Small New Models to Change the World?

  • Published 12 Dec 2023
  • Phi-2 is a tiny model that could fit on a phone, but it outperforms huge language models like Llama 2. I explain more about how it was made and what it means. Then we see Imagen-2, the most stunning text-to-image model yet, at least according to Google's images. We then glimpse Optimus 2, smaller in the sense that it's 10kg lighter! With more degrees of freedom, its movements look a lot more humanoid. And then the full launch of AI Insiders, plus a recap of why we shouldn't use the MMLU to 2 decimal places!
    / aiexplained
    phi2 now on HuggingFace: huggingface.co/microsoft/phi-2
    Bubeck Video, min 19: • Textbooks Are All You ...
    Phi 2: www.microsoft.com/en-us/resea...
    Shital Shah: / 1734882570603753814
    Shoggoth: / 1702488701782352097
    Mamba 3B: www.together.ai/blog/mamba-3b...
    Phi 1.5B: arxiv.org/abs/2309.05463
    Phi 1: arxiv.org/abs/2306.11644
    Microsoft Prompting: www.microsoft.com/en-us/resea...
    SmartGPT Video: • SmartGPT: Major Benchm...
    The Information: www.theinformation.com/articl...
    Imagen 2: / 1734954295655534780
    deepmind.google/technologies/...
    / 1734763060244386074
    Greg Technology: / 1734544659953623509
    Swyx: www.latent.space/
    AI Engineer: youtube.com/@aiDotEngineer?si...
    Shawn Wang: x.com/swyx?s=09
    / aiexplained Non-Hype, Free Newsletter: signaltonoise.beehiiv.com/
  • Science & Technology

COMMENTS • 432

  • @SebastienBubeck
    @SebastienBubeck 4 months ago +24

    Yet another amazing video! I really enjoyed your critical take on benchmarks like MMLU; this is much needed.

    • @aiexplained-official
      @aiexplained-official  4 months ago +7

      Thanks so much Sebastien, Phi-2 is an incredible model - have been testing it for many hours - congratulations to you and the team! And yes, am looking forward to new benchmarking standards for 2024. Thank you again for speaking yesterday.

  • @Diabloto96
    @Diabloto96 4 months ago +218

    Philip doing public work by fact-checking the MMLU WHILE creating all this content?? Impressive work, you're one of a kind in the AI popularization field, congrats!

    • @aiexplained-official
      @aiexplained-official  4 months ago +24

      Thanks Diabloto, I am very LLM-curious

    • @gabrote42
      @gabrote42 4 months ago +4

      @@aiexplained-official major credits!!! Hope you sent a link to this to all those companies!

    • @sumanthbalaji1768
      @sumanthbalaji1768 4 months ago +1

      @@aiexplained-official hey, these MMLU flaws are crazy. Could you share the doc of inaccuracies for others to go through?

    • @skierpage
      @skierpage 4 months ago +1

      @@sumanthbalaji1768 I found a Medium post from August, "Errors in the MMLU: The Deep Learning Benchmark is Wrong Surprisingly Often," but that seems to be independent work by one Daniel Erenrich.

    • @sumanthbalaji1768
      @sumanthbalaji1768 4 months ago +1

      @@skierpage yes, I went through that blog too; it doesn't have this document of errors

  • @Megneous
    @Megneous 4 months ago +110

    You honestly need to publish a paper on the errors in the MMLU. This needs to be seen by academia.

    • @KP-sg9fm
      @KP-sg9fm 4 months ago +8

      100%

    • @maxm1555
      @maxm1555 4 months ago +1

      No paper needed, they should watch this video and immediately build a new test from the ground up!

    • @StevenAkinyemi
      @StevenAkinyemi 4 months ago +1

      They know lol

    • @onoff5604
      @onoff5604 4 months ago +1

      please please publish, but please prepare to be attacked for your honesty

  • @raphaelsoltero8805
    @raphaelsoltero8805 4 months ago +98

    I feel as though it is slightly ironic that the AI's intelligence was held back not by its own way of learning, but by our inaccurate datasets.

    • @KibberShuriq
      @KibberShuriq 4 months ago +11

      It makes a lot of sense though. We tried to make it equally good at predicting experts AS WELL as predicting average Joes AND raging lunatics. Of course that task is much harder than just predicting experts.

  • @rantmarket
    @rantmarket 4 months ago +23

    I still can't believe the MMLU isn't being called out by people, at least. It's been so long since you found those problems that I can't accept people don't know about the issue enough to have it thrown out of every benchmark set using it.
    Thank you again for your great work. Cheers.

    • @aiexplained-official
      @aiexplained-official  4 months ago +6

      Thanks rant. I thought so too, and then up it pops with Gemini, front and centre

    • @skierpage
      @skierpage 4 months ago +1

      ​@@aiexplained-official what did the authors of "Measuring Massive Multitask Language Understanding," Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt, say when you contacted them?

  • @DaveShap
    @DaveShap 4 months ago +16

    Increase efficiency!

  • @L1AM
    @L1AM 4 months ago +94

    Well, at this rate this time next year we'll have a locally runnable AGI.

    • @Feel_theagi
      @Feel_theagi 4 months ago +2

      I'm more excited about how much better the largest cloud ones will be

    • @Boufonamong
      @Boufonamong 4 months ago +8

      Imagine that 😂, I'm calling mine hal

    • @Karearearea
      @Karearearea 4 months ago +2

      5 years from now we probably will

    • @aN0nyMas
      @aN0nyMas 4 months ago +3

      @@Boufonamong I'm calling mine Meg. Short for Megatron.

    • @aiexplained-official
      @aiexplained-official  4 months ago +31

      AG-phi?

  • @H1kari_1
    @H1kari_1 4 months ago +68

    The big, big issue most people are currently overlooking is that all those benchmarks are in English. The data is in English. The models are heavily optimized for English. GPT-3.5 and GPT-4? They speak just about any language they have gotten some data for, and also provide excellent results for tasks in those languages.

    • @aiexplained-official
      @aiexplained-official  4 months ago +25

      Great point

    • @twosaibackbot
      @twosaibackbot 4 months ago

      Yeah, I am Swedish and will be truly scared of an automated workforce when these LLMs speak and understand smaller, more local languages fluently. GPT-4 is decent at it but not yet good enough for professional use

    • @jokmenen_
      @jokmenen_ 4 months ago

      Very true. I haven't seen a model with less than 70b params yet that really impressed me with its performance in my language

    • @ryzikx
      @ryzikx 4 months ago +2

      though that is a very big problem, I'd argue the larger problem is 'poisoned' models, basically trained to tackle the benchmarks rather than being actual general-purpose models

    • @KyriosHeptagrammaton
      @KyriosHeptagrammaton 4 months ago +1

      Given that multi-modality seemed to boost performance I wonder if multilingual models would also be boosted.

  • @ClayFarrisNaff
    @ClayFarrisNaff 4 months ago +5

    I love that you're an informed AI enthusiast, yet you're not afraid to criticize -- and to do so forcefully -- where you see the need. It's a mark of integrity.

  • @kylemorris5338
    @kylemorris5338 4 months ago +15

    Having seen your previous work on the MMLU, that graph declaring a .06 PERCENT breakthrough made me burst out laughing.
    We need an MMLU 2 or something to that effect yesterday, and I'm starting to suspect the only reason we don't have it yet is that nobody wants their fancy benchmark numbers to go down, even if they would be more accurate.
    Re: Phi-2, I am happy to see that synthetic data is getting more love, as opposed to the models that just use mass scrapes of any data that isn't tied down properly.

  • @randfur
    @randfur 4 months ago +35

    Thanks for looking into the benchmark data; it was too opaque up until now. Whenever a model scores impressively on one, we should dig into it to know whether that really means it's good at X subject or if it's just good at making the same mistakes.

  • @consultantnigel-projectman7274
    @consultantnigel-projectman7274 4 months ago +6

    As a new Patreon member, I'm here to tell you how amazing AI Insiders is. Philip's research is impeccable. The Insider info is priceless. Those of you who make your living with AI: do yourself a favor and budget the $30 each month to support Philip. Everyone will eventually be making their living with AI; if not today, very soon. You will need quality, authoritative information on which you can base important decisions. AI Insiders will provide you with AI news that is second to none. If you have not already, join. Completely worth the money.

    • @aiexplained-official
      @aiexplained-official  4 months ago +1

      Such high praise, thank you so much. If you like what's there, in 2024 you will be even more impressed!

  • @skippy6086
    @skippy6086 4 months ago +10

    The pace of the race toward the edge of the cliff quickens

    • @MachineLearningZuu
      @MachineLearningZuu 4 months ago +1

      Gemini Nano hit the punch line 🥊

    • @GrindThisGame
      @GrindThisGame 4 months ago +1

      Time to fly.

    • @aiexplained-official
      @aiexplained-official  4 months ago +3

      Hmm, but synthetic data is good for safety, no?

    • @Igor_lvanov
      @Igor_lvanov 4 months ago

      @@aiexplained-official Maybe this model won't be a Shoggoth, but there are a lot of ways things may go wrong, e.g. because we will get extremely powerful systems without proper defence mechanisms against misuse, or things like instrumental convergence.

  • @CalConrad
    @CalConrad 4 months ago +3

    For the record, you have the best thumbnails in the game.

    • @aiexplained-official
      @aiexplained-official  4 months ago +2

      Aw thanks Cal, I often get criticised for them and get hundreds of offers to pay for thumbnail services, but I love them too. Minimalist.

  • @swyxTV
    @swyxTV 4 months ago +2

    thanks for having me as your first Insiders speaker Philip!

    • @aiexplained-official
      @aiexplained-official  4 months ago +1

      And thank you so much swyx. It was a great talk, and I laughed at the intro!

  • @skippersthepenguin3591
    @skippersthepenguin3591 4 months ago +48

    They should make Phi-3 a 7B model. If Phi-2 is a quarter of that size, then doubling it should make it even better, and 7B models are runnable on 90% of computer hardware.

    • @berkertaskiran
      @berkertaskiran 4 months ago +14

      Their priority is probably phones.

    • @boris6237
      @boris6237 4 months ago +3

      yeah, I think it's especially important for decent models to be able to run on low-end phones so that LLM access isn't restricted to the first world @@berkertaskiran

    • @noone-ld7pt
      @noone-ld7pt 4 months ago +6

      @@boris6237 Oh wow, that's an incredibly important argument, had not thought about it like that and I really appreciate you sharing that perspective!

    • @QH96
      @QH96 4 months ago +1

      Don't quote me, but a 7-billion-parameter model would probably use about 6 GB of RAM
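
      For reference, the rough arithmetic behind that estimate, as a sketch: weight memory is roughly parameter count times bytes per parameter, and the dtype choices below are assumptions (real usage adds KV cache, activations and framework overhead).

      ```python
      # Back-of-envelope RAM estimate for LLM weights: params x bytes per param.
      def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
          return n_params * bytes_per_param / 1024**3

      for label, bytes_pp in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
          print(f"7B @ {label}: ~{weight_memory_gb(7e9, bytes_pp):.1f} GB")
      # fp16: ~13.0 GB, 8-bit: ~6.5 GB, 4-bit: ~3.3 GB
      # So "about 6 GB" lines up with 8-bit quantized weights.
      ```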

    • @carkawalakhatulistiwa
      @carkawalakhatulistiwa 4 months ago +1

      @@QH96 and the iPhone 15 Pro Max only has 8GB of RAM, and the iOS system already uses 2GB of it

  • @_ptoni_
    @_ptoni_ 4 months ago +8

    I was impressed by the phi-2 code perf

  • @heinkle1
    @heinkle1 4 months ago +1

    I’ll be honest, I stopped watching your videos for a while because they caused me too much anxiety - but when I then look at some of the other things going on in the world, it is actually comforting to hear about AI. Congrats on your meteoric growth in 2023.

  • @harnageaa
    @harnageaa 4 months ago +14

    TL;DR If the datasets these 'GPTs' are trained on were actually accurate, we'd have even more impressive models overall. So without even changing the training method, just by fixing the data, you can get way better models

    • @skierpage
      @skierpage 4 months ago

      I wonder if a large model with a big context window would be able to spot inconsistencies and mistakes in training data. I saw a documentary where an AI presented with logical inconsistencies went into a garbled "Does not compute" loop and eventually caught fire, so maybe it's too dangerous!

    • @harnageaa
      @harnageaa 4 months ago

      Idk, how can you determine if something is right or wrong if you learned the wrong thing
      from the dataset? I think the best would be "smaller models" used by a bigger model, where the smaller models are used to detect inconsistencies within the dataset.
      You train the small models on 100% accurate data and you teach them to spot right/wrong answers, and that's their sole purpose, and they will find every mistake in any dataset. So a model for math, one for chemistry, one for biology, etc. Then the bigger model could access these mini models through an API, get the results from them, and recreate PDFs with a "correct dataset".
      I think it's safer that way; when you have a big model it's harder to "control" and know what it actually knows. And a model that has perfect data for code, math, physics, etc. is basically the final product we want, but to obtain that we need to curate the data we have, and the fastest way to do that is a smaller model. Then once all the data is curated, we use it for a bigger model. I spammed q_q oops, you get the point.
      @@skierpage

    • @skierpage
      @skierpage 4 months ago

      @@harnageaa symbolic AI tried to develop AI by teaching systems only the right answers, and it utterly failed to keep up with transformers. One of the great things about LLMs is they can handle inconsistency and exceptions: "Water is wet" (ice), "Palestine is a state" (disputed), "An asteroid killed the dinosaurs" (generally accepted), etc. Learning everything includes ingesting megabytes of the "wrong" things; again, I want to know if an LLM can be aware of discrepancies while or after it trains.

  • @alphahurricane7957
    @alphahurricane7957 4 months ago +14

    I think that smaller models giving 100% accurate information to a general, bigger AI capable of understanding and finding anomalies in the process, or being critical of the result, is the real AGI.
    I saw "teslabot 2" today; I'm very much interested in seeing AI and robotics in everyday life.
    A lot of insights as always, thanks!

    • @MCA0090
      @MCA0090 4 months ago +1

      Maybe the way to go is finding ways to make models smaller and more efficient, to the point that they could run on local devices instead of big datacenters relying on the cloud, an internet connection, and higher latencies (the cloud would never work for making robots operate properly)... Yesterday I was reading about liquid neural networks and how they can do the work with just a few neurons. It seems promising for shrinking really large NNs into much smaller and faster ones, especially for vision, video, images, and audio/speech recognition. For robotics, LNNs can handle vision better than current neural networks and run fast even on small devices such as a Raspberry Pi, because they need just a few neurons to do the same task a really big NN based on other architectures does. LNNs are very small and have the plasticity to adapt to new situations without needing a new training process.

  • @jawadur_
    @jawadur_ 4 months ago +1

    The most value delivered per minute on YouTube

  • @baychaoz
    @baychaoz 4 months ago +1

    7:06 such a legend

  • @educated_guesst
    @educated_guesst 4 months ago +1

    Hi Philip
    just wanted to say thank you for still pumping out so many videos despite your Patreon content probably also being a ton of work
    Thank you so much for keeping us informed!

    • @aiexplained-official
      @aiexplained-official  4 months ago

      Haha, no, thank you for supporting on Insiders. It's what keeps the main channel going!

  • @3dVisualist
    @3dVisualist 4 months ago +41

    With AI Insiders, you really are creating a lot of content. I do hope it turns out you were AI all along!

    • @aiexplained-official
      @aiexplained-official  4 months ago +24

      Haha not quite, a hardworking human!

    • @iamcoreymcclain
      @iamcoreymcclain 4 months ago +1

      @@aiexplained-official the way you pronounced "Imagen" made me question if this was an AI voice as well lol, but I think you've left enough small clues to prove your humanity 😂

    • @SBImNotWritingMyNameHere
      @SBImNotWritingMyNameHere 4 months ago +1

      @@aiexplained-official that's what you think

    • @3dVisualist
      @3dVisualist 4 months ago

      @@aiexplained-official certainly hardworking! Thanks for all your explainers, they really help me stay on top of the fast-moving world of AI.

  • @Dylan-zg2jl
    @Dylan-zg2jl 4 months ago +1

    As usual, a fascinating video with revealing insights that are seldom if ever found anywhere else. Great job mate, and I look forward to more

  • @pablolucianogomezdemayorag4060
    @pablolucianogomezdemayorag4060 4 months ago +22

    Amazing as always! Wish regular media were half as good at explaining complicated topics; this channel is gold

  • @BTFranklin
    @BTFranklin 4 months ago +10

    Is there any effort to actually correct the MMLU? If not, why not? What would be required to get it corrected? I feel that this is a serious problem, and it's disturbing that the MMLU continues to be used without correction.

  • @Olack87
    @Olack87 4 months ago +24

    Amazing video, as always. Have you contacted any of the people in the field about the erroneous benchmarks? Do we know if anyone is working on creating new ones or fixing them? I can't believe they don't know or care about it, but the problem is still there, it seems.

    • @aiexplained-official
      @aiexplained-official  4 months ago +23

      Yes, and people are. There are better benchmarks coming out all the time, hence my surprise at this MMLU d-measuring

  • @schemage2210
    @schemage2210 4 months ago +9

    We have all seen the Boston Dynamics robots doing incredible things, but the scripting and trial and error involved in making those incredible videos is insane. And let's not forget that the Atlas robot is huge. Are we actually meant to believe that Musk's Optimus robot is "as described"? AI-powered, and physically capable of all the actions it's shown doing?

  • @stephenrodwell
    @stephenrodwell 4 months ago +1

    Thanks! Fantastic content, as always. 🙏🏼

  • @HonestyLies
    @HonestyLies 4 months ago +1

    great vid as always, strapping in for next year's craziness

  • @doctormaddix2143
    @doctormaddix2143 4 months ago

    Can’t appreciate your work enough! Thank you.❤

  • @ryzikx
    @ryzikx 4 months ago +1

    always a good day when phillip ai uploads

  • @onoff5604
    @onoff5604 4 months ago +1

    Thank you so much for investigating problems with testing.

  • @DreamOfFlying
    @DreamOfFlying 4 months ago +2

    I absolutely love your videos! They deserve each and every view and like!

  • @Shaunmcdonogh-shaunsurfing
    @Shaunmcdonogh-shaunsurfing 4 months ago +1

    Sounds great for general chat conversation

  • @CrueMusic
    @CrueMusic 4 months ago +1

    Thank you! I hope you don't reduce the amount of great content here on your channel. It's invaluable.

  • @williamjmccartan8879
    @williamjmccartan8879 4 months ago +1

    Thank you for sharing your time and work, Phillip. I responded to one of your tweets by asking if you knew what is going on over at Liquid AI; the new year is fine, and by the looks of it you're going to be really busy, but if you get a chance I'm curious, as that's where Joscha Bach is working right now. Merry Christmas to you and your family and all the other family helping you with this great work, and a Happy New Year.

    • @aiexplained-official
      @aiexplained-official  4 months ago +1

      Merry Christmas Bill, will check it out, cool name at the very least

  • @MrSchweppes
    @MrSchweppes 4 months ago +1

    As always great video! Very informative! Many thanks to you!

  • @JohnLeMayDragon
    @JohnLeMayDragon 4 months ago +1

    Thanks for another informative video!

  • @MindFieldMusic
    @MindFieldMusic 4 months ago +2

    Billy Madison to the MMLU, "I choose: Business Ethics." 😉

  • @miker99
    @miker99 4 months ago +1

    When will they learn? Rubbish in, rubbish out. Thanks for all your efforts to bring awareness to this issue of testing quality.

  • @sharkeys
    @sharkeys 4 months ago +2

    You know they are flexing their ability when they show hands :D

  • @Q1w1N
    @Q1w1N 4 months ago +1

    I don't know what's more concerning: the fact that those models did so well on a flawed test, or that they might be much more capable than we think.

  • @ok373737
    @ok373737 4 months ago +1

    Brilliant!

  • @eburgwedel
    @eburgwedel 4 months ago +1

    Can’t wait to see Mixtral in the mix!

  • @yw1971
    @yw1971 4 months ago +1

    I think if we can find a formula, no matter how long and complex, that can be the 'engine' for such training, it will change the field.

  • @covle9180
    @covle9180 4 months ago +2

    Small models ftw! If I can't run it on my phone or self-host it (without really expensive GPUs), then 90% of use cases just don't work.
    Models are flaky enough as they are. Add to that the unreliability of some companies' APIs, and we need self-hosted solutions we can fine-tune. (Not to mention privacy issues.)
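
    For anyone wanting to try exactly that, here is a minimal self-hosting sketch with Hugging Face transformers, using the microsoft/phi-2 checkpoint linked in the description (the Instruct/Output prompt style follows the model card; generation settings are illustrative):

    ```python
    # Minimal local inference with Phi-2; fp16 weights of a 2.7B model
    # are roughly 5 GB, so this fits on a modest consumer GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/phi-2",
        torch_dtype=torch.float16,   # use torch.float32 on CPU-only machines
        device_map="auto",
        trust_remote_code=True,
    )

    prompt = "Instruct: Explain why clean training data matters.\nOutput:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```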

  • @user-hk8jt6so3l
    @user-hk8jt6so3l 4 months ago +1

    I cannot thank you enough! I will definitely support you on Patreon when my finances allow it! THANK YOU FOR GUIDING US THROUGH ALL OF THIS, YOU ARE THE BEST!❤

    • @aiexplained-official
      @aiexplained-official  4 months ago

      Thanks so much, no worries on Patreon, your kindness here is enough!

  • @KP-sg9fm
    @KP-sg9fm 4 months ago +1

    TOP FRICKEN NOTCH MY FRIEND, THANK YOU!!!

  • @aaronnewman2
    @aaronnewman2 4 months ago +3

    You are beautiful sir. Thanks as always.

  • @mugwortofz3lb1za
    @mugwortofz3lb1za 4 months ago +1

    Always the best videos!! Have you considered making a Patreon tier where some of the funds go towards a Google Colab backend for patrons to use, depending on their subscription amount and time? Given how few resources were used training Phi-2, it could be a good idea to let people experiment with the concepts shown in your videos, as well as more exotic variations in model architecture such as cyclic attention heads, sub-networks, etc.

  • @tomaszkarwik6357
    @tomaszkarwik6357 4 months ago +4

    If this were SDXL, I'd give the image a 9/10. The problems are:
    - the eyes (they are not pointing in the same direction)
    - the ear (it is just weird)
    - the lighting is wrong (the leaves are lit from behind the subject and the subject is lit from the front)
    - her whole right side is a bit wonky
    - the one strand of hair in the back is weird. 7:33
    PS: if you want to see the best SDXL models, use the ones over at civitai and not Stability AI's (the 1.0 model is still the best you can get from there). Just pick "JuggernautXL" or "DreamshaperXL", as they are SotA for XL.
    PSPS: Other than the part about Imagen-2, this was a very good video. Love your dedication to the craft of making AI news without the hype.

    • @aiexplained-official
      @aiexplained-official  4 months ago

      Thanks tomas, your professional eye caught much more than I did, apologies

    • @tomaszkarwik6357
      @tomaszkarwik6357 4 months ago

      @@aiexplained-official I ain't a professional, but I use SD. These things are just what you train your eye for

    • @maciejbala477
      @maciejbala477 4 months ago

      really? Dreamshaper is SotA? I knew about Juggernaut, but I remembered Dreamshaper's earlier non-SDXL versions as kinda worse than some alternatives. Will have to try it out.
      WyvernMix was another that impressed me

    • @tomaszkarwik6357
      @tomaszkarwik6357 4 months ago

      @@maciejbala477 the XL version is at least close to the SotA for turbo.
      Or at least it was late last week

  • @AICodingAdventures
    @AICodingAdventures 4 months ago +2

    Awesome video! You did a great job exposing the MMLU and how shady it is. I agree that people should stop trusting it as a measure of capabilities. What about MoE and Mistral?

  • @Just4Games2011
    @Just4Games2011 4 months ago +5

    Great video, but why not mention Mixtral? Are you still experimenting with it?

    • @aiexplained-official
      @aiexplained-official  4 months ago +4

      First, I think Phi-2 is more significant, but also, covering it properly would be a lot more work; there's only so much time in the day!

    • @Just4Games2011
      @Just4Games2011 4 months ago

      @@aiexplained-official Fair point, can't wait to see your video on it.

  • @jamesatotago
    @jamesatotago 4 months ago +8

    Great video again! Please do a video on synthetic data. I get that this will likely decrease toxicity, but what else will it do? If, for example, Microsoft is building the synthetic data, does that mean we are training the AI on Microsoft's view of the world? One can imagine how this could be influenced by all sorts of commercial imperatives. Will synthetic data make models more and more similar to one another, and perhaps less interesting?

    • @aiexplained-official
      @aiexplained-official  4 months ago +1

      I don't think less interesting if you ensure diversity - see the original Phi-1 vid

  • @atom1496
    @atom1496 4 months ago +1

    For benchmarks, it is common to include wrong or ambiguous questions to catch training leakage. It should not be possible to get 100%.

  • @GrindThisGame
    @GrindThisGame 4 months ago +4

    Better data, better models...makes sense.

  • @nacho7872
    @nacho7872 4 months ago +3

    Amazing video as usual, thanks for the fast update

  • @UncleJoeLITE
    @UncleJoeLITE 4 months ago +1

    I'll speak only to what I know. That project sounds amazing; I wish I were into VC, I'd buy in! Tbh, most Gen X weren't taught ANY entrepreneurship if we went the corporate/govt career route. I'm sure you have even bigger plans, depending on what sticks. _Putting decimal places on data with ~? confidence intervals is how we manipulate, ofc._

  • @kevinli3767
    @kevinli3767 4 months ago +1

    I'll ask the question that everyone's curious about: how are you able to 1) access, 2) digest, 3) synthesize, and 4) produce so productively???

    • @aiexplained-official
      @aiexplained-official  4 months ago

      Will do a video on that someday! And don't forget the hours of content (researched, narrated and edited by me) for AI Insiders at the same time, plus sourcing and conducting interviews! And comment replying!

    • @kevinli3767
      @kevinli3767 4 months ago

      AGI must be helping you with the details :D @@aiexplained-official

  • @muhammedkoroglu6544
    @muhammedkoroglu6544 4 months ago +1

    Amazing content! Don’t get how you don’t have a million subs

  • @beaumac
    @beaumac 4 months ago +8

    AGI coming to a mobile device near you in 2024 thanks to synthetic data. Is there any safety checking done on this data?

    • @aiexplained-official
      @aiexplained-official  4 months ago +3

      Well, it's synthetic, so it shouldn't be as bad, but I was still surprised that there was any toxicity at all; maybe I shouldn't be

  • @YoussefMohamed-er6zy
    @YoussefMohamed-er6zy 4 months ago +1

    Finally a new video!!!🎉🎉🎉

  • @matusstiller4219
    @matusstiller4219 4 months ago +1

    Great video, like always.

  • @k225
    @k225 3 months ago +1

    AIs are experiencing the real world of academic exams. I remember several times in college where textbooks were wrong, exam questions were ambiguous, or we were told to give outdated or blatantly wrong answers to pass tests and get good grades.

  • @onoff5604
    @onoff5604 4 months ago +1

    Great video!! Many thanks. On the topic of generated images of human faces: look at the shirt collar (and ears and earrings, if you can see them), an instant giveaway. The face is phenomenal... but textile manufacturing is apparently a harder problem.

  • @jessedbrown1980
    @jessedbrown1980 4 months ago +1

    Jesus Christ. So many implications from this, and it will result in massive improvements. Thank you so much for pointing this out, as it will really slap AI into hyperdrive.

  • @onil2301
    @onil2301 4 months ago

    Is there a way to access the document you've compiled of the errors you found in the MMLU benchmark? I would like to cite it in my bachelor's thesis, if that's possible.

  • @dcgamer1027
    @dcgamer1027 4 months ago +2

    Appreciate the updates as always. I wanted to look more into the MMLU since you mentioned people still using it, and thought I'd go back and watch your video on it, but it's not in the description; it might be a good idea to put it there since you played a bunch of it at the end here. I assume I'm not the only one who might want to look more at that part.
    Anyways, ty and have a good day :)
    edit: also just a thought, has anyone compiled an exhaustive list of the issues in the MMLU test? And if so, does anyone have a link to that list?

    • @aiexplained-official
      @aiexplained-official  4 months ago +1

      Hey dc, I thought I put it in there somewhere! You can search "mmlu broken benchmark" too. And no, to the best of my knowledge this channel has shown the biggest repository of mistakes in that benchmark

  • @haileycollet4147
    @haileycollet4147 4 months ago +3

    Please make a cleaned (removed or fixed questions) MMLU bench as a PR to EleutherAI's evaluation harness :)

    • @haileycollet4147
      @haileycollet4147 4 months ago

      Some fixes are better than none...

    • @Houshalter
      @Houshalter 4 months ago

      You can't just change a benchmark that is already widely used. It would create confusion when different models are tested at different times, and produce results that aren't comparable to each other. It needs to be a new benchmark, like "MMLU 2"

    • @haileycollet4147
      @haileycollet4147 4 months ago

      @@Houshalter I mean, arguably it's pretty worthless in its current state... I suppose it could be its own bench, or v2 or 1.5 or whatever, but it seems better to fix it somewhere than to just say it's bad, since it's going to get used anyway...
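
      Mechanically, publishing such a cleaned subset is simple once a list of flawed items exists; here is a sketch with the Hugging Face datasets library (the flagged indices are hypothetical placeholders, since the channel's error document isn't public):

      ```python
      # Sketch: derive a cleaned MMLU split by dropping known-bad questions.
      from datasets import load_dataset

      # Hypothetical row indices (into the combined test split) flagged
      # as broken or ambiguous after manual review.
      FLAWED = {1234, 5678}

      ds = load_dataset("cais/mmlu", "all", split="test")
      cleaned = ds.filter(lambda row, idx: idx not in FLAWED, with_indices=True)
      print(f"kept {len(cleaned)} of {len(ds)} questions")
      # cleaned.push_to_hub("your-org/mmlu-cleaned")  # publish as an "MMLU 1.5"
      ```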

  • @lorenzoblz799
    @lorenzoblz799 4 months ago +1

    It would be interesting to take a few LLMs and ask them to evaluate the questions: are they clear, are they ambiguous, do they make sense?
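
    That is easy to script; here is a sketch against the OpenAI chat completions API (the model choice and rubric wording are illustrative, not a recommendation):

    ```python
    # Sketch: ask an LLM to audit benchmark questions for clarity and ambiguity.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    RUBRIC = (
        "You are auditing a multiple-choice benchmark question. "
        "Reply with one word - CLEAR, AMBIGUOUS, or BROKEN (no correct option) - "
        "followed by a one-sentence justification."
    )

    def audit(question: str, choices: list[str]) -> str:
        # Format the question with lettered options, as in the MMLU.
        formatted = question + "\n" + "\n".join(
            f"{chr(65 + i)}. {c}" for i, c in enumerate(choices)
        )
        resp = client.chat.completions.create(
            model="gpt-4",  # illustrative; compare verdicts across a few models
            messages=[
                {"role": "system", "content": RUBRIC},
                {"role": "user", "content": formatted},
            ],
        )
        return resp.choices[0].message.content
    ```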

  • @clearpupil
    @clearpupil 4 months ago +2

    This explains why I did so badly in my medical exams. The college has all the wrong answers :-)

  • @user-pf9jv1fl2n
    @user-pf9jv1fl2n 4 months ago +12

    Great video, just one question:
    Do you feel the AGI?

    • @ekstrajohn
      @ekstrajohn 4 months ago +2

      The others think you should no longer be on the board. It's not my decision, really.

  • @youssefanajar4061
    @youssefanajar4061 4 months ago +1

    Best yt channel

  • @KolTregaskes
    @KolTregaskes 4 months ago +1

    4:30 Not many people are talking about the flaws in these benchmarks, e.g. MMLU. Perhaps we need another video on this?

    • @KolTregaskes
      @KolTregaskes 4 months ago

      I've read, heard and watched a lot of content on Gemini, and very few mentioned any issues with the MMLU.
      For once I think a more clickbaity title is needed.

    • @aiexplained-official
      @aiexplained-official  4 months ago

      Haha, more so than the original 'Broken Benchmark SmartGPT' one!

    • @KolTregaskes
      @KolTregaskes 4 months ago

      @@aiexplained-official Hehe, indeed. Perhaps it needs spelling out more, including words like "MMLU" and not "SmartGPT". BTW, how is SmartGPT going?

    • @aiexplained-official
      @aiexplained-official  4 months ago +1

      @@KolTregaskes more to come on that front in 2024...:)

  • @Y3llowMustang
    @Y3llowMustang 4 months ago +1

    Wow, that was a surprisingly sudden end to the video

  • @lhomme_bepis
    @lhomme_bepis 4 months ago +1

    Could you add timeline sections to your videos? I'd like to see an outline of what topics exactly are being covered at a quick glance

    • @aiexplained-official
      @aiexplained-official  4 months ago

      When I add timestamps YT doesn't automatically segment the video; I'm wondering what I am missing

  • @freek633
    @freek633 4 months ago +2

    Phi-2 is a tiny model that could fit on a phone, but it outputs huge language models like Llama 2. (from the caption) outputs should be outperforms!

  • @supertetleman
    @supertetleman 4 months ago +1

    Just wait until Jan 8th. No meetings in the last week of December or the first week of Jan, so all the AI researchers have extra time to work on their pet projects and keep the compute farms running over the holiday; I expect to see some interesting results. It's always the most productive time of year.

  • @carterellsworth7844
    @carterellsworth7844 4 months ago +7

    Is it rational to say that if Google and OpenAI are using the MMLU benchmarks in this way, without acknowledging the benchmark's problems, then they are behaving too naively to deserve the public's trust to try and solve the alignment problem?
    It's so blatant once you point it out that I find it very disturbing no one else talks about it

    • @skierpage
      @skierpage 4 months ago

      The two issues seem unrelated. The numbers game to two decimal digits is stupid when the benchmarks are 1% flawed, and training to the test when the test is bad may degrade models' real-world abilities, but what does that have to do with alignment?

  • @patronspatron7681
    @patronspatron7681 4 months ago +3

    Methinks the Phi models were named after you. :-)

  • @davidbutler9323
    @davidbutler9323 4 months ago +2

    By this time next year, I expect to see a continuous stream of AI Explained content generated by Phillip-2 or I'll be really disappointed.

  • @mukulishwar2737
    @mukulishwar2737 4 months ago +1

    Can you also talk about the newly released Mixtral 8x7b?

  • @anywallsocket
    @anywallsocket 4 months ago +1

    Soon we'll have to get the LLMs not only to generate the next update's training data, but to prove to us the labels are correct, because otherwise we are limited by what we think we know.

  • @DavidsKanal
    @DavidsKanal 4 months ago +2

    Hey Philip, dunno if it's just me watching this at 6am, but this video felt a little fast and stressful. Do you think you could add a 1- to 2-second pause before switching to a new topic, to give us time to digest the information?

    • @aiexplained-official
      @aiexplained-official  4 months ago

      Thanks for the feedback David, will bear it in mind!

    • @be2eo502
      @be2eo502 4 months ago +1

      Agreed. We poor biological intelligences need a longer pause between concepts to integrate the information.

    • @aiexplained-official
      @aiexplained-official  4 months ago +1

      It's like we need a smidgen of noise between the signal

  • @Veileihi
    @Veileihi 4 months ago +1

    Feels like we're a part of those vaguely touched upon histories in AI movies 😅

  • @Jeremy-Ai
    @Jeremy-Ai 4 months ago +1

    Yeah, you are correct.
    This MMLU benchmark issue is significant.
    It appears to me to be either gross negligence, or a test to challenge the very best of testers… without ever asking, encouraging, or paying them to fix the most fundamental flaws in these models for free.
    It's pretty sad to hope for gross negligence.
    But ignorance has no bounds.
    If not, we should start your lawsuit right now.
    As you have proven your claim.
    Take good care my friend,
    Jeremy

  • @micbab-vg2mu
    @micbab-vg2mu 4 months ago +1

    Thank you for the video. Next year I will test those open-source models; at the moment I am using only GPT-4 and Claude 2.1. I am looking for a place in my workflow for Gemini (maybe translations?). I hope Ultra will beat GPT-4 in real life.

  • @BradleyZS
    @BradleyZS 4 months ago +3

    The errors in the MMLU make me think a good test for AI should have trick questions (questions without actual answers, or lacking the appropriate option) to test the AI's ability to recognise when it doesn't know or can't find the answer.

    • @skierpage
      @skierpage 4 months ago

      I think the video showed that GPT-4 would give a better answer than any of the garbled multiple-choice answers. I think you could engineer a different test-taking prompt where you ask the AI to pick the best multiple-choice answer but also point out when there's a problem with the Q&A.
      One problem is these technical reports are drowning in a sea of benchmark numbers, so I'm sure the person cranking out all the scores to two decimal digits has no time for nuance or evaluation.

    • @BradleyZS
      @BradleyZS 4 months ago

      @@skierpage
      While it is useful to let it answer freely, in terms of serving people AI should be able to work within constraints. Otherwise it will likely become just an advertising tool, always telling you to buy the industry tool to get the job done.
      In an example specific to me, I do a lot of Python programming on my phone, and ChatGPT often gives coding examples for libraries that don't work on my phone. So it's handy if we can give it a constraint (asking it to solve the coding problem with a specific library), since we may want the best solution we can get right now rather than the theoretical perfect solution.

    • @skierpage
      @skierpage 4 months ago

      @@BradleyZS Make up your mind. Do you want to constrain the AI to answer a multiple choice question, or point out that it's flawed? What should the AI do in response to a sleazy lawyer: "Have you stopped beating your wife? Answer yes or no!"

    • @BradleyZS
      @BradleyZS 4 months ago +1

      @@skierpage
      The ideal would be if the AI could recognise the intent of such questions: that it could understand that a leading question is intended to ascribe undue guilt to it, or that a trick test question exists to test the AI's ability to react to an impossible task.
      Such an ability, I believe, is crucial for the progression of AI beyond the simple LLM. An AI should be able to understand the desire of the user, and in context whether it should give the best answer or admit the inability to answer.

  • @Buidlre_69455
    @Buidlre_69455 4 months ago +1

    My litmus test is taking it out for a spin. Some models that score well on benchmarks do basically nothing for me. My experience thus far is that nothing beats GPT-4. Not yet, anyway.

  • @Dron008
    @Dron008 4 months ago +1

    I wonder what you think about the Mixtral 8x7B model? And what about the new MMMU (multimodal) benchmark? Is it good enough?

    • @aiexplained-official
      @aiexplained-official  4 months ago +1

      MMMU is so much better yes. And Mixtral I am still playing about with, and investigating with the help of experts. The future looks bright.

    • @Dron008
      @Dron008 4 months ago

      @@aiexplained-official I tried it on the DeepInfra site; it looks good for such a small model.

  • @FRC_CR
    @FRC_CR 4 months ago +2

    Thanks for the in-depth news as always. I was personally shocked at all the errors from the MMLU and don't understand why it keeps getting used...

  • @andybrice2711
    @andybrice2711 4 months ago +3

    I wonder if it's a good idea to remove all "toxicity" from datasets, though. I'd imagine it's necessary to see plenty of bad ideas in order to understand them. It might result in a model which is naïve.

  • @zockermarlon5183
    @zockermarlon5183 4 months ago +1

    comment for the algo. keep up the great videos :D

  • @tomski2671
    @tomski2671 4 months ago +1

    By my estimate it cost about $70k to train. However, the real cost is preparing the data.
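
    That figure checks out as a back-of-envelope calculation: Microsoft's Phi-2 announcement reports training on 96 A100 GPUs for 14 days, so the only assumption below is the cloud price per A100-hour.

    ```python
    # Rough Phi-2 training cost: GPU count and days from Microsoft's Phi-2
    # announcement; the $/GPU-hour rate is an assumed cloud price.
    gpus, days, usd_per_gpu_hour = 96, 14, 2.0
    gpu_hours = gpus * days * 24                     # 32,256 A100-hours
    print(f"~${gpu_hours * usd_per_gpu_hour:,.0f}")  # ~$64,512, i.e. about $70k
    ```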

  • @humunu
    @humunu 4 months ago +1

    MMLU...WTF? (Merry Christmas/Happy Holidays)

  • @SuperMunQ
    @SuperMunQ 4 months ago +1

    Fantastic video again! What do you think about benchmarks being run by a neutral third party? I'm thinking of making a benchmark system that only uses custom datasets that change at some interval, so the results aren't affected by being in the training data of the models.

    • @aiexplained-official
      @aiexplained-official  4 months ago

      That would be great

    • @SuperMunQ
      @SuperMunQ 4 months ago

      @@aiexplained-official Thank you for the reply :) Happy to hear that!

  • @weakmindedidiot
    @weakmindedidiot 4 months ago +1

    You're doing god's work out here, Philip. You have by far the best AI news and analysis YT channel. I'm an AI dev; I'll add your Discord and drop by. Thanks for the hard work.

  • @Rkcuddles
    @Rkcuddles 4 months ago

    16:46 "type of question that depends which source you ask": I didn't catch this point. Can anyone elaborate?