[ML News] Grok-1 open-sourced | Nvidia GTC | OpenAI leaks model names | AI Act
- Published May 17, 2024
- OUTLINE:
0:00 - Intro
0:15 - XAI releases Grok-1
2:00 - Nvidia GTC
4:45 - Comment of the Week
5:35 - Brute-forcing OpenAI model names
7:30 - Inflection AI gets eaten by Microsoft
9:25 - EU AI Act moving forward
11:45 - Advances in Robotics
14:00 - India retracts controversial advisory
14:30 - OpenSora
15:20 - Improved Gemma fine-tuning
16:20 - Decoding encrypted LLM traffic
17:45 - Varia
References:
x.ai/blog/grok-os
github.com/xai-org/grok-1
finance.yahoo.com/news/nvidia...
spectrum.ieee.org/nvidia-gr00...
anshelsag/status/1769770983924142475
arthurmensch/stat...
arithmoquine/stat...
files.catbox.moe/od9pyb.txt
techcrunch.com/2024/03/19/aft...
archive.ph/p4W1N#selection-24...
reelC4df3D...
techcrunch.com/2024/03/15/mer...
www.axios.com/2024/03/14/huma...
techcrunch.com/2024/03/15/ind...
github.com/hpcaitech/Open-Sora
/ gemma_finetuning_shoul...
felix_red_panda/status/1768386949201408103
ollama/status/176...
arxiv.org/pdf/2403.09611.pdf
github.com/lavague-ai/LaVague
blog.research.google/2024/03/...
www.cnbc.com/2024/03/18/apple...
blog.google/products/search/g...
stability.ai/news/introducing...
Nils_Reimers/stat...
Links:
Homepage: ykilcher.com
Merch: ykilcher.com/merch
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: ykilcher.com/discord
LinkedIn: / ykilcher
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
I'm so glad to see ML news back in action with more regularity. You've got actual knowledge and credibility that matters for presenting info in a rapidly crudifying space, the scene is filling up with empty influencer know-nothings, and I want the straight dope and technicals. Thank you.
Yannic's channel was my first experience in ML and AI news when GPT4 exploded. Yannic is the real deal, one of the best and most reliable sources available.
The sunglasses are the most important, followed by the humor; but yeah, okay, I appreciate the information as long as it doesn't interfere with my fandom.
Yeah same here. I know you are busy with open assistant and stuff like that (keep those things up) but we need your AI news and paper reviews! How else am I gonna move on with my own AI project. Half the technology I'll need is still buried in some obscure paper somewhere in the paper pile!
For the Grok model it is worth noting this has no fine tuning, and its performance is bad. I want to give Elon credit but this seems like more of a performative release than a real contribution.
i love a good Monday ML news on Tuesday that I watch on Wednesday, cheers Yannic!
I'm always happy to see your videos. You give informative breakdowns without getting lost in the sauce about how whatever minor new improvement shocks the industry and marks full on AGI.
Thank you
Inflection put out a pretty good product with Pi. The Pi chatbot is tuned to being a companion and even as a kind of therapist. Giving an empathetic digital face to Microsoft will not end well.
Great To see you again God Bless You 🫵🏼❤️
The robots are humanoid because our tools have been developed around the human form, so if you want a robot that can interface with those tools in the future, humanoid is the optimal shape.
I generally agree with you but there’s also the marketing aspect of it.
WHEN AND WHERE SHALL WE GET THE CAT ROBOT GF!!!!
This is AI world, YOU are the cat.
appreciate the information/evaluation density.
Great work! Keep it up!
Make sure you think about the downsides to open sourcing frontier models also. Open source might be the best way but it's not clear to me the benefits are worth the risks.
I don't know if I've missed your coverage, but 1.58bit model training / inference source code has been released, which is interesting because of the scaling law suggested by the associated paper.
Humanoid bots: It's about training. If you want to show a bot how to do something, it needs to be able to follow your example. If it's not a humanoid, it has to solve every task in existence from scratch. If it's humanoid, it can learn from human actions.
Great video. Thanks for the detailed and broad content! ❤
Very very cool AF😎
You joke...but I think mandatory disclosure of AI is the easiest regulatory hurdle that will make a huge impact imo.
Ideally people would disclose such without regulatory compulsion...but imo it is important, to me at least, that such disclosures exist moving forward.
Just like how the cookie disclosure has had a huge impact, and California's cancer warnings?
@@johnflux1 GDPR had a huge impact; even American companies had to adapt.
@@johnflux1 Yes it did make an impact, since now a typical Norman knows "cookie" is some scary legalistic jargon and not just a magical internet feature which "just works".
I'm excited for the future.
I'm not sure if the future needs our support. But that's probably just me. I have a hard enough time maintaining interest in the present, being excited about a future is way beyond my abilities.
Very cool ❤
@yannic there is a large FP4 literature now (NF4, QLoRA etc.) with hundreds of QLoRA models on HuggingFace
As a fellow AI engineer/enthusiast of course I'm happy when we can just release cool stuff, but Europe is going in the right direction by starting to regulate the field. While AI is amazing there are still countless possibilities of misuse, overapplication and even rights infringement (I'm thinking privacy), so regulating big corps early on is necessary.
Or we can just take the US route and have sugar in our bread lmao
Amazing to hear they finally open sourced Grok-1. No doubt given the channel history you will build it from scratch and validate it matches the distributed weights and doesn’t have any sleeper agents, etc, as you can do with any good open source project. Right? We don’t just have to take the word of the guy that has repeatedly lied and misled many? That’s the power of open source, right? Trust, but verify.
Wow
Love your joke in EU AI Act, watched twice 🤣🤣🤣
I experimented with rolling my own 4-bit float encodings, and the lack of precision made them challenging to use. Maybe it will be useful with the first several passes of quickprop.
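For context on why 4-bit floats are so coarse, here is a minimal sketch assuming the common e2m1 layout (1 sign, 2 exponent, 1 mantissa bit); the value table and the nearest-value rounding are illustrative, not the commenter's actual encoding:

```python
# Sketch of a 4-bit float (e2m1: 1 sign bit, 2 exponent bits, 1 mantissa bit).
# The 8 non-negative magnitudes representable in this layout:
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable e2m1 value, keeping the sign."""
    mag = min(FP4_E2M1, key=lambda v: abs(v - abs(x)))
    return -mag if x < 0 else mag

# Only 15 distinct values exist (+0 and -0 coincide), so precision is coarse:
print(quantize_fp4(2.4))   # -> 2.0
print(quantize_fp4(-5.2))  # -> -6.0
```

In practice (NF4, QLoRA) each block of weights also carries a per-block scale factor, which is what makes such a tiny value set usable at all.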
In India, the government has considered a non-regulatory approach, emphasizing the need to innovate, promote and adapt to the rapid advancement of AI technologies.
The EU should do the same. Innovation is more important than regulation; otherwise we will lag behind the US and China in development. The politicians have no understanding of technology at all, which has major consequences for knowledge technology and our future generation. Many thanks for your great update.
The EU is fully aware that its innovating capacity is miles behind American or Chinese massive corporations, there's just no way Europe will win the research battle in an open field. I think regulating and sort of "protecting" the market from these corporations is a good move, especially since the applications of AI will keep covering more and more ground in the next decades
@@matteofrattini9133 Not so sure about that, the US is the biggest business center, which is different from being the most innovative.
European regulation is likely to influence guidelines across the globe. It's not just about protecting the market, but ensuring some level of safety with a technology that is dangerous.
@@technolus5742 I fully support Europe's effort to regulate a potentially dangerous field. I'm just saying it's also a strategic move, since Europe could never achieve technological dominance in an unregulated field against powerhouses like the US or China
Nice
Humanoid robots also make sense if you believe that we can get video pre-training to work from human videos
The models are the private deployment capacity for those companies.
Is the groq inference code as fast as the one they use for hosting?
How does the 1-bit embedding work? Can someone explain? That would mean things are encoded into binary, right? So does that mean precision doesn't really matter throughout the machine learning model? That at some point it becomes like a pigeonhole thing within the model?
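One common meaning of "1-bit embeddings" is binary quantization of a float embedding: keep only the sign of each dimension and compare vectors with Hamming distance. A toy sketch under that assumption (the vectors and function names are made up for illustration):

```python
def binarize(vec):
    """Keep only the sign bit of each dimension: 1 if >= 0, else 0."""
    return [1 if v >= 0 else 0 for v in vec]

def hamming(a, b):
    """Count differing bits; a cheap stand-in for cosine distance."""
    return sum(x != y for x, y in zip(a, b))

e1 = [0.12, -0.80, 0.33, -0.05]
e2 = [0.40, -0.10, 0.25,  0.90]
# The two vectors disagree in sign only on the last dimension:
print(hamming(binarize(e1), binarize(e2)))  # -> 1
```

Precision still matters during training; the observation is that for retrieval, much of the ranking signal survives even this extreme compression.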
Wow, does that list expose companies that fine tuned models with OpenAI?
Can you please make a video explaining this paper on anomaly detection "Asymmetric Student-Teacher Networks for Industrial Anomaly Detection"? It will be a great help😊
Khan Academy was one of the first reported partners using OpenAI, so I assume the gpt-khan model name is specific to a Khan application rather than just a "trained on Khan data" nuance.
The Google post was to help the SEO shamans start hallucinating in the right direction. Without it, they are prone to all sorts of fun thinking. Blog posts like that keep them mostly sane.
SEO Shamans :)
🎉 You make people like Mondays
Terrifying robot movement at 4:37
"These people had wey too much precision" 🤣
Grok-1 256GB for the weights. Good luck.
RIP, Josh.
Josh sweating like crazy right now
ai banner would be great.
Flat repo for grok is hard asf
23:00 It used to be "one guy" they all called, but now they just get LLMs to hallucinate the answers. 😆
Rest of the world: develops AI and makes it open source
The EU: we don’t do that here
Those look like OpenAI API adapters, i.e. the APIs that OpenAI can access.
3:27 Half a bit on, half a bit off, you're half way to quantum computing!
22:40 - I hope Google bargained an Android release of iMessage into the deal.
For one I think the AI act is a good thing for people !
there's actually a paper that shows that ~1.5 bits per weight is enough
And that paper is probably good for wiping butts.
@@Sven_Dongle why?
@@ClaudioMartella If you think there is no difference between 4- and 8-bit quantization you haven't worked with these models at all. And what even is ~1.5 bits per weight? 3 bits per 2 weights? It stretches credulity.
@@ClaudioMartella that paper came from the authors of "RetNet: the successor to transformers" (~half of the authors are the same, to be more exact).
This time they didn't want to be just a mere successor, so they called it a new era. They are high on their own farts.
Oh, and RetNet was such a successor that the only model released was a pile of garbage and got unreleased. The 1.58-bit paper didn't even compare itself to the "successor" of transformers that they built themselves.
And the authors never released the weights.
I expect their next paper to be called "The bestest revolutionest architecture in the multiverse", with loud claims and no impact.
@@Sven_Dongle you don't seem to understand the difference between the possibility that some model can perform well with 1.5 bpw and the ability to cast any model (presumably pretrained with 16 bpw precision) into 1.5 bpw without losing quality
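For what it's worth, the "~1.58 bits" in the thread above comes from log2(3): each weight takes one of three values {-1, 0, +1}. A hedged sketch of the absmean ternary scheme the 1.58-bit paper describes (this is an illustration, not the authors' code):

```python
import math

def ternarize(weights, eps=1e-8):
    """Quantize weights to {-1, 0, +1}, with a per-group scale equal to the
    mean absolute value (the 'absmean' scheme from the 1.58-bit paper)."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale  # dequantized weight ~= q_i * scale

# A ternary symbol carries log2(3) ~= 1.585 bits of information:
print(math.log2(3))

q, s = ternarize([0.9, -0.05, 0.4, -1.2])
print(q)  # -> [1, 0, 1, -1]
```

The open question in the thread stands: the paper is about training at this precision from scratch, not about casting an existing fp16 model down to ternary.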
I like how you don’t jump on the hype train as soon as it passes by. I see you were here before it was cool.
Samay's English has gotten good xd
Grok=314B, Pi=3.14..., I assume this is deliberate?
You sure did some deducing there detective
Given that we are talking about Musk, I would guess his next model will have exactly 3141B parameters (and the next next one, 31415B). And that will be a hard requirement given to his engineering team...
naah its going to be 420B first
I guess for every GDPR we get a cookie-esque law.
Question is how many groqs it takes to run grok.
Hi Yannic, Can you please confirm that you're coming to WebExpo? I've tried a few times to get in touch. Hope to get a reply, and I'm sorry for chasing. Please let us know.
12:45 - All the technology and buildings and tools that mankind has created are made for the humanoid form. Making humanoid robots would mean that they could use everything we use as well. E.g. you want your floor cleaned? Get a robotic vacuum cleaner? No, better give your ordinary vacuum to a humanoid robot so that it cleans the floor with it.
That list of model names appears to have been thoroughly scrubbed... Catbox 404.
The latest most groan-inducing eye-rolling marketing buzzword is describing an AI as "terrifying".
We should use genetic algorithms to evolve the ideal robot form factor based on parts, cost, and human built environments and tasks. Maybe they won't look like humans!
"employ, exploit, extinguish."
Microsoft never stopped doing EEE, did it?
People should confirm agreement on cookies just once, when they connect to the internet. It should be written into the agreement with the provider.
The person who had not used internet before will never know what "agreement on cookies" even is, especially since the agreement can be constructed in misleading but technically correct ways aka "do you want to create accounts on the internet?". Which would ofc implicitly include an account on provider's site, bonus points if said provider holds a de facto monopoly in the area.
That's akin to implicitly agreeing to be stabbed by a knife just because you bought one for your kitchen.
I realize nobody probably cares, but GR00T is spelled with zeros, not the letter O: GR00T vs GROOT. It jumps out once you see it printed on a page that shows a capital O, like at 4:00
Thanks for telling me.
...and me specifically, given how you yourself think about not many caring.
@@Brahvim Just trying to head off the "who cares?" default response from YT comments. Personally I think it is neat to discover things by paying close attention (in this case visually).
@@erikjohnson9112 At least it isn't a "secret" anymore, thanks to you! People WILL know it's a `0` and not the letter 'O' now!
...
People like _me,_ at least!
It really is great. Keep it up, dear internet stranger!
very competent …. a single source file, 1400 lines of code 🎉😂
I'm all for EU cookie-nags. Just say no, and I always do. Companies maximize their data hogging anyway, so I don't see why there would be less data by default in an alternative universe, where no EU cookie law exist.
6:09 Welp, guess josh is getting fired
I love you and ML news so much
Can we get a 250k subscriber special on ML meme reviews please?? 😃
"First major AI law passed by European LLaMakers"
Well 1-bit is closer to a 'biological' activation function 🤷
Open sourcing Grok was cheap; it was a basically useless model in terms of commercial usefulness.
and useless in terms of research since it's jax
On open-source Sora-like models: imo there will be no parity because of compute.
The results I have seen from Sora indicate to me that it is not trained on raw video data, but rather on a hybrid with synthetic data that uses either a NeRF or Gaussian-splat supplement to create that level of fine control and temporal fidelity.
As impressive as Sora is, it still seems obvious to me that 3D rendering is part of their training pipeline, and imo the most reasonable explanation of how to get large amounts of synthetic data there is with NeRFs / Gaussian splats.
FP4 maybe we just need 0/1
11:10
I'm slightly confused here, because it sounds like you think the lawmakers can shift their focus onto developing technology instead.
But I know you don't think that, I'm just not really sure what your point is.
lol... I am not surprised you are impressed with Grok considering your 4chan LLM.
But as far as I am concerned they are about on par in terms of impressiveness relative to the state of the art.
Also... not sure how reliable "popularity graphs" are concerning online tools... Whang did a cool vid about how such voting metrics are easily manipulated... Boaty McBoatface examples come to mind.
The performance is quite poor, it's useless to them (and to the vaaast majority of people). If they had a SOTA model, I wonder if they would open-source it.
Also: it's an MoE model, not all parameters are active at the same time. Expect it to require about the same compute as an 80b model.
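The "~80b" figure above is a back-of-envelope estimate: assuming the reported 314B total, 8 experts, and 2 active per token, and assuming (roughly) that most parameters sit in the experts, the active fraction is 2/8:

```python
# Rough estimate of active parameters per token for an MoE model.
# Ignores shared attention/embedding parameters, so the true figure
# for Grok-1 is somewhat higher than this.
total_params_b = 314   # reported Grok-1 total, in billions
n_experts = 8
active_experts = 2

active_b = total_params_b * active_experts / n_experts
print(f"~{active_b:.0f}B parameters active per token")  # ~78B, hence "about an 80b model"
```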
23:30 c'mon Yannic, use the original for "On the Internet, nobody knows you're a dog", which is Peter Steiner's 1993 cartoon in The New Yorker.
As I said last week, your curved movie-screen topic thumbnails should snap into a full screen view, not whiz off the side.
The transformer architecture consists of multiple decoder layers, each containing:
- Multi-head attention (MHA) with query, key, and value projections
- Mixture of experts (MoE) layer
- Feedforward layers (linear transformations with activation functions)
- Layer normalization and RMS normalization
The MoE layer uses the `Router` to compute routing probabilities, which determine the experts to route the input to. The selected experts process the input independently, and their outputs are combined based on the routing probabilities.
The multi-head attention mechanism allows the model to attend to different positions in the input sequence, capturing dependencies and relationships between tokens. The rotary position embeddings (RoPE) enhance the model's ability to capture relative position information.
The transformer model takes an input sequence and applies the decoder layers sequentially. At each layer, the input goes through the MHA, MoE, and feedforward layers, with layer normalization and residual connections. The final output of the transformer is the embedded representation of the input sequence.
The code also includes sharding and partitioning utilities to distribute the model across multiple devices for efficient training and inference.
Overall, this transformer architecture incorporates mixture of experts layers and rotary position embeddings to enhance the model's capacity and ability to capture complex dependencies in the input sequence.
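The routing step described above can be sketched in miniature as follows. This is a toy illustration of top-k expert routing, not the actual Grok-1 JAX code; the expert functions, router weights, and shapes are all made up:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def moe_layer(x, experts, router_weights, top_k=2):
    """Toy mixture-of-experts forward pass for one token vector x.
    The router scores each expert, the top-k experts process x,
    and their outputs are mixed by the renormalized routing probs."""
    # Router: one score per expert (here a simple dot product with x).
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)  # each expert is its own feedforward network
        out = [o + probs[i] / norm * y_j for o, y_j in zip(out, y)]
    return out

# Toy usage: 4 "experts" that just scale the input by different factors.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
random.seed(0)
router = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
print(moe_layer([0.1, 0.2, 0.3], experts, router, top_k=2))
```

Only the chosen experts run per token, which is why total parameter count and per-token compute diverge so sharply for MoE models like Grok-1.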
Bro what are you doing commenting on YT, go build Westworld with that big brain, or please be the next president
@@WiseCheese587 It's just a regurgitation of the Grok-1 specs, genius.
why not 420 billion parameters ?
I kinda feel bad that we got to a point where every time someone mentions Elon, he has to first say "I am not endorsing him / whatever you think about him / yeah he is bad, but this is good". He is really reaching Trump-level rep, isn't he?
He's done it to himself. No one told him to try as hard as possible to ruin his own reputation.
I love how he said "what ever you think of him" 2x.
Yet you added versions he didn't say that imply or directly say Elon is more on the bad side.
Which is not what he said in the video.
Very weird to read your comment.
It's like subtly changing what happened... but in a way where it doesn't overtly look like lying unless you rewatch the video clip.
Please try to be more accurate in the future.
I mean it's his own fault, he's a lying scumbag
'Whatever you think of him' is a pretty unsubtle code. It's a polite way of distancing yourself from a person.
Added with endorsement for his actions, and he's taking a pretty neutral stance. Which is a wise public stance around polarizing figures.
Regardless of which pole he's actually on, it's foolish to spend his social capital on the subject. That's the message he's actually sending.
@@Dogo.R you can't be that naive. Elon Musk has pretty much forced people to pick sides when forming an opinion on him.
Lol Grok-1 “will probably require 69 GPUs to run”, haha at least that many. Probably more like 420 😂
And it's 256 GB for the weights, so you probably need a terabyte of RAM, plus a combined terabyte of memory across the GPUs
Most of the Open Source are rather just Open Sores
If a robot costs more than a humans year salary, it's not worth it.
Yet...
Depends on the upkeep costs and how long it remains useful, right? Especially if there’s a “rent-to-own” option!
@@drdca8263 I was assuming an average lifetime of 1 year. Sabotage by human coworkers is definitely going to be a thing.