[ML News] Chips, Robots, and Models

  • Published 14 May 2024
  • OUTLINE:
    0:00 - Intro
    0:19 - Our next-generation Meta Training and Inference Accelerator
    01:39 - ALOHA Unleashed
    03:10 - Apple Inks $50M Deal with Shutterstock for AI Training Data
    04:28 - OpenAI Researchers, Including Ally of Sutskever, Fired for Alleged Leaking
    05:01 - Adobe's Ethical Firefly AI was Trained on Midjourney Images
    05:52 - Trudeau announces $2.4 billion for AI-related investments
    06:48 - RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
    07:15 - CodeGemma - an official Google release for code LLMs
    07:24 - Mistral AI: Cheaper, Better, Faster, Stronger
    08:08 - Vezora/Mistral-22B-v0.1
    09:00 - WizardLM-2, next generation state-of-the-art-LLM
    09:31 - Idefics2, the strongest Vision-Language-Model (VLM) below 10B!
    10:14 - BlinkDL/rwkv-6-world
    10:50 - Pile-T5: Trained T5 on the Pile
    11:35 - Model Card for Zephyr 141B-A39B
    12:42 - Parler TTS
    13:11 - RHO-1: Not all tokens are what you need
    14:59 - Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
    References:
    / 1780263768968273923
    ai.meta.com/blog/next-generat...
    soumithchintala/s...
    deepnewz.com/tech/apple-inks-...
    TolgaBilge_/statu...
    javilopen/status/...
    paulg/status/1781...
    / 1776706907295846628
    www.cbc.ca/news/politics/fede...
    arxiv.org/pdf/2404.07839
    huggingface.co/blog/codegemma
    mistral.ai/news/mixtral-8x22b/
    MistralAILabs/sta...
    huggingface.co/Vezora/Mistral...
    huggingface.co/Vezora/Mistral...
    WizardLM_AI/statu...
    _philschmid/statu...
    huggingface.co/BlinkDL/rwkv-6...
    blog.eleuther.ai/pile-t5/
    huggingface.co/HuggingFaceH4/...
    huggingface.co/MaziyarPanahi/...
    reach_vb/status/1...
    arxiv.org/pdf/2404.07965
    arxiv.org/pdf/2404.05719
    sambanova.ai/blog/samba-coe-t...
    www.microsoft.com/en-us/resea...
    twelve_labs/statu...
    drive.google.com/file/d/1Av5j...
    arxiv.org/pdf/2404.12387
    arxiv.org/abs/2404.12241
    arxiv.org/pdf/2404.12241
    / 1780650122382049596
    audiodialogues.github.io/
    os-world.github.io/
    ai.meta.com/blog/openeqa-embo...
    arxiv.org/pdf/2404.07503
    arxiv.org/pdf/2404.06654
    amanrsanger/statu...
    huggingface.co/datasets/xai-o...
    github.com/PygmalionAI/aphrod...
    github.com/jina-ai/reader/?ta...
    r.jina.ai/x.com/elonmusk
    r.jina.ai/github.com/...
    github.com/rogeriochaves/lang...
    mvpatel2000/statu...
    github.com/databricks/megablocks
    github.com/nus-apr/auto-code-...
    github.com/nus-apr/auto-code-...
    karpathy/status/1...
    karpathy/status/1...
    github.com/BasedHardware/Friend
    argmaxinc/status/...
    awnihannun/status...
    / 1776399292036501898
    pytorch.org/blog/torchtune-fi...
    If you want to support me, the best thing to do is to share out the content :)
  • Science & Technology

COMMENTS • 72

  • @lone0017
    @lone0017 15 days ago +80

    Wow, you are really productive on YouTube lately, good for us

    • @EdFormer
      @EdFormer 15 days ago +7

      Couldn't agree more. I find that my appetite for machine learning research is a strictly monotonically increasing function of Kilcher's productivity on YouTube. As a postdoc, that's quite significant lol

    • @lone0017
      @lone0017 15 days ago +5

      @@EdFormer I find Kilcher's videos to be the path of least resistance for me to keep up with ML research. As a working person trying to get into a PhD programme, that's extremely valuable :D

    • @EdFormer
      @EdFormer 15 days ago

      @@lone0017 you're getting the right skeptical and evidence-based perspective for your goals while being brought up to date too, which most other channels in this day and age really don't provide. Best of luck with your applications!

    • @makhalid1999
      @makhalid1999 14 days ago +1

      Don't jinx it

  • @the_primal_instinct
    @the_primal_instinct 15 days ago +45

    1:58 No, Yannic, the difference between high cost robots and low cost robots is their price.

  • @andreamarisio
    @andreamarisio 14 days ago +18

    Here are the missing chapter timings!
    15:42 Samba
    17:06 Vasa-1
    17:55 Pegasus-1
    18:17 Reka
    19:58 AI safety introduction v0.5
    20:20 Longer context benchmark x post
    21:17 New Audio Dialogue Dataset
    22:33 OSWorld: Agents Environment
    24:39 OPENEQA AR QA Dataset
    26:41 Best Practices and Lessons Learned on Synthetic Data for Language Models
    27:07 Ruler: Real Context size paper
    29:57 Aman's post: SWE-bench contaminated supposition
    31:28 Realwor1dQA Dataset
    32:34 Repositories
    36:13 X Posts
    38:09 torchtune: fine tuning library from Pytorch

  • @TheTruthOfAI
    @TheTruthOfAI 10 days ago +2

    A scientific channel, looking at papers. Yessss sir

  • @OperationDarkside
    @OperationDarkside 15 days ago +11

    "YouTube, Audio, No"
    The content I'm here for.

  • @thenoblerot
    @thenoblerot 14 days ago +5

    Holy cow, you're turning out the videos. Thank you so much for sharing your expertise

  • @lexer_
    @lexer_ 15 days ago +3

    I would be really interested in a video about the different cutting-edge fine tuning methods and how they differ.

  • @meselfobviouslyme6292
    @meselfobviouslyme6292 14 days ago +2

    Thank you, Mr. Yannic, for giving us the latest updates in this edition of Machine Learning news.

  • @RollingcoleW
    @RollingcoleW 14 days ago

    Keep it up! Forgot about this series but like it!

  • @wwkk4964
    @wwkk4964 14 days ago +2

    Best AI YouTube channel!

  • @jayl9053
    @jayl9053 15 days ago +1

    Another certified Tuesday banger 😎

  • @kaikapioka9711
    @kaikapioka9711 14 days ago +1

    Thx yan!

  • @pladselsker8340
    @pladselsker8340 14 days ago +1

    I would love to have your point of view on what Extropic is building.

  • @wiktorm9858
    @wiktorm9858 14 days ago +1

    Very informative video

  • @luke2642
    @luke2642 15 days ago +2

    An excellent video. I wonder if the RHO concept is more profound than you give it credit for; it seems inherently better if you could filter out bad tokens and identify the most meaningful ones, without necessarily needing synthetic data. Token quality is more important than token quantity for humans, so why shouldn't the same hold for LLMs?

  • @calvingrondahl1011
    @calvingrondahl1011 14 days ago +1

    You are fun Yannic 😎

  • @mahdipourmirzaei1048
    @mahdipourmirzaei1048 15 days ago +1

    I am looking for a paper review video about Rho-1!

  • @axelmarora6743
    @axelmarora6743 14 days ago +2

    25:20 😂😂said so casually!

  • @herp_derpingson
    @herp_derpingson 14 days ago +3

    39:00 Well you see... the difference between high cost robots and low cost robots is the cost.

  • @donmiguel4848
    @donmiguel4848 15 days ago +1

    @0:33 Int operations are ops, not flops.

  • @mohamedhatem325
    @mohamedhatem325 10 days ago +1

    very good video

  • @EsotericAI
    @EsotericAI 14 days ago

    Gpt2-chatbot could be OpenAI's GPT-5. I guess it automatically pre-processes input text as a low-color-depth image (containing the text), so any text prompt is fed into the model as text+image. The ASCII-art performance is a tell of something like that.

  • @pedrogorilla483
    @pedrogorilla483 14 days ago +1

    Yes!

  • @Timotheeee1
    @Timotheeee1 14 days ago +3

    8:10 The 22B model is nonsense; it performs worse than a 3B model. That makes sense, since averaging the experts just gets you a randomly initialized FFN with some extreme outlier weights.
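Why naive expert averaging breaks a network can be seen in a tiny numpy sketch (hypothetical toy FFNs, not the actual Mistral weights): two one-hidden-layer ReLU experts that compute exactly the same function, just with permuted hidden units, stop computing that function once their weights are averaged element-wise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "experts" that compute the SAME function: expert B is expert A
# with its hidden units cyclically permuted.
W1_a = rng.normal(size=(4, 8))
W2_a = rng.normal(size=(8, 4))
perm = np.roll(np.arange(8), 1)
W1_b = W1_a[:, perm]
W2_b = W2_a[perm, :]

def ffn(x, W1, W2):
    # One-hidden-layer ReLU feed-forward network.
    return np.maximum(x @ W1, 0) @ W2

x = rng.normal(size=(16, 4))
assert np.allclose(ffn(x, W1_a, W2_a), ffn(x, W1_b, W2_b))  # identical maps

# Naive element-wise weight averaging destroys the shared function:
W1_avg = (W1_a + W1_b) / 2
W2_avg = (W2_a + W2_b) / 2
err = np.abs(ffn(x, W1_avg, W2_avg) - ffn(x, W1_a, W2_a)).mean()
print(f"mean abs error after averaging: {err:.3f}")
```

The permutation symmetry is the crux: experts trained separately (or diverged during MoE training) occupy different weight-space "orderings", so their average is close to a random network even when their functions are close.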

  • @tomCatzy
    @tomCatzy 15 days ago +1

    Since I began following here, my AI comprehension has skyrocketed.

  • @damienteney
    @damienteney 15 days ago +2

    Any chance you can release these as podcasts again? I have to rip the YouTube videos and convert them to mp3 every time to get them onto my music player.

    • @therainman7777
      @therainman7777 14 days ago

      Who owns a music player in 2024? Not trying to be a dick. But can’t you use your phone?

  • @neq825
    @neq825 14 days ago

    The robot arm used by Aloha is the TrossenRobotics ViperX 300. Costs $6129.95

  • @schilling3003
    @schilling3003 13 days ago

    I just wish there were better options for DIY local/edge inference. Basically nothing since the Google Coral Edge TPU 5+ years ago.

  • @leeme179
    @leeme179 15 days ago +1

    OpenAI signed a deal with Microsoft, I think in 2018; it took them a really long time to create their own chip.

  • @ACLozMusik
    @ACLozMusik 14 days ago

    So, the typical UI is optimized for non-technical human users for accessibility, since they are the expected end users, and we have UIs for technical people that mix performance and usability.
    What would a UI designed for models/machines and optimized for performance look like? How performant could a model trained on it become? It's a bit of a loop, thinking about a UI optimized for a model and a model optimized for that UI. In the end, the model could become a more powerful UI, getting closer to the user through natural language and conversation, and also closer to the machine doing the actual work.
    Basically another layer of high-/low-level language.

  • @SteffenProbst-qt5wq
    @SteffenProbst-qt5wq 14 days ago

    16:04
    gpt2-chatbot is a top-performing, not (yet) public new model. A recent (30.04) X post from Sam Altman hints at exactly this. It is currently being tested in the Chatbot Arena.
    It appears to be a model trained using OpenAI's industry-leading training methods.

  • @joe_limon
    @joe_limon 15 days ago +1

    RHO-1 seems like a perfect application for mobile AIs.

  • @PCPTMCROSBY
    @PCPTMCROSBY 14 days ago

    good show today.

  • @JerryFederspiel
    @JerryFederspiel 13 days ago

    The last fabric-manipulation demo I saw, maybe last year, looked pretty similar... but the video was sped up 64x. The Google demo was at actual speed, tremendously better.

  • @chispun2
    @chispun2 15 days ago +1

    In most cases we don't know what the training data for LLMs is. How can we be sure those benchmarks were not part of their training data?

    • @therainman7777
      @therainman7777 14 days ago

      You can’t, that’s why you need to come up with novel evals of your own creation, which you never publish anywhere.

    • @john_blues
      @john_blues 14 days ago

      "Teaching to the test," we used to call it in K-12 education.

  • @john_blues
    @john_blues 14 days ago +1

    Mmmm, chips. Oh, those kind of chips. Can't eat those.😞

    • @KEKW-lc4xi
      @KEKW-lc4xi 14 days ago

      TLC will find someone to challenge that notion.

  • @biesman5
    @biesman5 14 days ago

    Did you hear that George Hotz hacked the RTX 4090 driver to enable P2P? I'm pretty interested in the implications this has for running local LLMs; really fascinating stuff. It was on Hacker News' front page a few weeks ago!

  • @jeffwads
    @jeffwads 15 days ago +1

    GPT-2 solves the Aunt Agatha riddle correctly. Even Claude Opus fails to do that.

    • @therainman7777
      @therainman7777 14 days ago +6

      Using well-known riddles and puzzles is a really terrible test of intelligence. The model could easily just be recalling the solution from its training data.

  • @dogme666
    @dogme666 15 days ago +1

    vroom vroom vroom gpu vroom vroom vroom

  • @ariaden
    @ariaden 15 days ago +2

    1. Low-cost robots: Wake me up when they become good enough at sorting garbage to make plastic recycling profitable.

  • @TiagoTiagoT
    @TiagoTiagoT 12 days ago

    Could someone with money to burn on datacenter waste heat please test whether it would work to have a model that is trained to re-write a certain fraction of its own weights at inference time (half, a third, 10 percent, etc.)? Sort of like, instead of predicting tokens directly, it first predicts some of its own weights based on the inputs. It would also be worth comparing whether it's better to keep just certain layers unfrozen, just certain neurons unfrozen, let it decide what to freeze at training time, or not freeze anything in particular and just let it "predict" not only the weight values but also which neurons to change. Or some variation of this idea that makes a little more sense, if what I described has details that don't really match how the pieces fit together.
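The first half of this idea is roughly a hypernetwork: a head that predicts (part of) a layer's weights from the input before the layer is applied. A minimal untrained numpy sketch, with all names, shapes, and the "unfrozen columns" choice made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

D, REWRITE = 8, 2                        # layer width; columns to rewrite
W_base = rng.normal(size=(D, D))         # frozen base weights
H = rng.normal(size=(D, D * REWRITE))    # "hyper" head: input -> new weights
unfrozen = [0, 1]                        # which columns may be rewritten

def forward(x):
    # Step 1: predict a replacement for the unfrozen slice of the weights
    # from the input itself (the self-rewriting step).
    delta = np.tanh(x @ H).reshape(D, REWRITE)
    # Step 2: run the layer with the partially rewritten weight matrix.
    W = W_base.copy()
    W[:, unfrozen] = delta
    return x @ W

x = rng.normal(size=(D,))
y = forward(x)
```

Training would backpropagate through both `H` and the frozen/unfrozen split, which is exactly where the commenter's freeze-selection variants would plug in.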

  • @YRTB2830
    @YRTB2830 15 days ago +1

    Just tried Idefics; it kinda sucks on zero-shot tasks, to be honest...

  • @TiagoTiagoT
    @TiagoTiagoT 12 days ago

    Oh, and speaking of ideas I don't have the money to test (see my other comment here for another crazy idea): could someone please try to train a model with training-time random ablation of both parameter count and quantization level, all in the same training run, so that the final model is, from the get-go, already optimized to work at whatever ablation level of parameter count and quantization the derived versions end up having? Essentially training not only for the outputs but also for sorting its own parameters and bits by how important they are.

  • @LouisChiaki
    @LouisChiaki 13 days ago

    "Has anyone heard of TensorFlow lately?" Google engineers who are still using TF: crying behind the scenes.

  • @nahuelpiguillem2949
    @nahuelpiguillem2949 13 days ago

    5:59 $2.4 billion for some AI program, $100 million for housing problems; I guess those are the priorities of the government.

  • @codewithtj
    @codewithtj 14 days ago

    Can you please make a video on the AI Report 2024?

  • @TheRev0
    @TheRev0 14 days ago +1

    and so on and what not

  • @FredPauling
    @FredPauling 14 days ago

    Canada investing in EhI

  • @NeuroScientician
    @NeuroScientician 15 days ago +1

    Ilya used to make cool tutorials; hope he hasn't been Jack Ma-ed :D

  • @eladwarshawsky7587
    @eladwarshawsky7587 13 days ago

    Just casually calling out TensorFlow lol

  • @calvingrondahl1011
    @calvingrondahl1011 14 days ago

    I like therefore I am…

  • @ZooDinghy
    @ZooDinghy 14 days ago

    I don't get Shutterstock. Aren't they digging their own grave? Selling a photo for €1 so that nobody ever has to buy it from them again? That doesn't make sense to me.

    • @AvastarBin
      @AvastarBin 14 days ago

      I think their logic is: if someone is going to dig my grave, I might as well get paid for it.

    • @ZooDinghy
      @ZooDinghy 14 days ago

      @@AvastarBin With $875 million in revenue, one would assume they would focus on building that thing themselves. Their financial target for 2027 is $1.2 billion, according to the Q4/2023 earnings call. The first thing I would do is find my biggest competitor, suggest a merger, and bet everything on the card that they will be the ones with the largest high-quality corpus, and then sue anyone out there who uses their pictures for training.

  • @leeme179
    @leeme179 15 days ago +1

    How many Google robots does it take to hang a shirt...

    • @KEKW-lc4xi
      @KEKW-lc4xi 14 days ago +2

      The ALOHA project is pretty neat. If I had an extra $20k lying around, I'd build my own and tinker too. It's all open source.

  • @hansdietrich1496
    @hansdietrich1496 14 days ago

    Dear Yannic, with all due respect: This green screen causes eye cancer. Please just put _ANYTHING_ there :) Thanks!

  • @kalilinux8682
    @kalilinux8682 14 days ago +1

    Some of the news seems old

  • @jeffspaulding43
    @jeffspaulding43 15 days ago +1

    I think the whole "Ally of Sutskever" is thrown into the title because our brains autocorrect it to "Ilya Sutskever". Stupid clickbaiters.

  • @MoFields
    @MoFields 14 days ago +4

    I think bro misunderstands the concept of a green screen 😂😂😂😎🟩