LaMini-LM - Mini Models Maxi Data!

  • Published 15 Nov 2024

COMMENTS • 52

  • @thedarsideofit • 1 year ago +7

    Hey, thanks for the video! These colabs work well with the regular accounts!

    • @samwitteveenai • 1 year ago +1

      Oh yes, I totally forgot to mention these notebooks all work on the free Colab.

  • @РыгорБородулин-ц1е

    Man, thanks a lot for covering all of this!

  • @henkhbit5748 • 1 year ago

    Yes, nice approach indeed, doing the reverse. I am also curious when they will apply it to the bigger models, or whether Vicuna or Dolly 2.0 will be trained with these instruction datasets. Thanks for sharing new publications about the progress of LLMs.👍

  • @novantha1 • 1 year ago +5

    I wonder how well one of these would perform if they were heavily specialized into a specific area of knowledge.
    Like, if you had a general advanced LLM, could you chain it to these domain-specific LLMs to have them give specific facts that are then formatted appropriately for answers by the primary LLM?
    Regardless, interesting paper, and interesting video!

    • @samwitteveenai • 1 year ago +3

      I'm also thinking this is a really interesting direction if you want a super specialized model for a tight domain, or, as you say, in some kind of cascading system. They are so small you can fine-tune them really quickly too.

    • @clray123 • 1 year ago

      That is also the idea behind LoRA adapters - you can train those specifically as a "delta" against some general base model, and then tack the adapter onto the base model to specialize it. Nothing keeps you from swapping out these adapters at runtime based on some context information (e.g. some other model running alongside your chatbot and detecting topic changes in the conversation). I haven't seen it done yet, though.
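
      For illustration, runtime adapter swapping along those lines could look like the sketch below, using Hugging Face's peft library. The adapter repo names are hypothetical placeholders, and GPT-2 is just a stand-in base model:

      ```python
      # Minimal sketch of swapping LoRA adapters at runtime with peft.
      # Adapter repo names are hypothetical; the base model is a stand-in.
      from transformers import AutoModelForCausalLM
      from peft import PeftModel

      base = AutoModelForCausalLM.from_pretrained("gpt2")
      model = PeftModel.from_pretrained(base, "my-org/medical-lora", adapter_name="medical")
      model.load_adapter("my-org/legal-lora", adapter_name="legal")  # a second "delta"

      model.set_adapter("medical")   # specialize the base model for medical chat
      # ... a side model detects a topic change in the conversation ...
      model.set_adapter("legal")     # swap adapters without reloading the base weights
      ```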

    • @samwitteveenai • 1 year ago

      I have looked at exactly this with LoRA, and in our tests it took too long to be practical in production for on-the-fly changes. It does save a lot of space though, in that you can have 5 LoRAs and just one main model. I would love to see a simple way to do it on the fly.

    • @clray123 • 1 year ago

      @@samwitteveenai Cool, thanks for sharing. How long did it take exactly? I mean, the LoRA weights for Alpaca are just ~60 MB; how long can it take to load/apply?

    • @ukaszluk8474 • 1 year ago

      Or perhaps, what would happen if we distilled only human reasoning capabilities, without any specific knowledge? With a large context window, this model could then operate efficiently on vast amounts of data.

  • @narutocole • 1 year ago

    Super excited to see your thoughts on the MPT-7B family of models. They have one with a context window of 65k tokens!

  • @acortis • 1 year ago

    Sam, thanks for all these videos! Question: would any of these models do well as a substitute for ChatGPT in the LangChain sense of understanding which task to use? Have you done any comparisons yet? It would be great to have something small and free to use as a reasoning engine there. Thanks!

    • @samwitteveenai • 1 year ago +2

      Not the LaMini models; the big Vicuna/Koala/Open Assistant models are starting to get close, but it's just a matter of time before we have a decent open-source alternative to ChatGPT. I have another video I am about to drop for that, so you can test them yourself.

  • @toddnedd2138 • 1 year ago

    Thank you for keeping us up to date with recent developments and providing the colabs. I am just wondering whether some theoretical principles of LLMs are not more relevant here. I thought: max entropy = log2(N),
    where N is the number of possible states that the language model can be in. In the case of a language model, each state corresponds to a specific sequence of words. The number of possible states is determined by the number of parameters in the model. 🤔
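
    (To make the formula concrete with toy numbers that are not from the video: a uniform choice among N outcomes carries log2(N) bits, and for token sequences N = V^L, so the bound factors as L * log2(V).)

    ```python
    import math

    # Illustrative, hypothetical numbers only: vocabulary size V, sequence length L.
    V, L = 32000, 512
    max_entropy_bits = L * math.log2(V)   # log2(V**L) = L * log2(V)
    print(round(max_entropy_bits))        # 7662
    ```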

  • @lucasalvarezlacasa2098 • 1 year ago

    Amazing video, thank you so much!

  • @sparshgupta2881 • 1 year ago

    Hey Sam, great video as usual. How is the performance of these "smaller" models with zero-shot classification in downstream tasks?

    • @samwitteveenai • 1 year ago

      I don't think it's great, but it's certainly more interesting than the base models for these, etc.

  • @heshumi • 1 year ago

    Thank you for the video, Sam. Do you think one could train one of these models from scratch in Colab?

    • @samwitteveenai • 1 year ago

      They are fine-tuning these models, which you could do on Colab, yes.
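
      As a rough illustration, a minimal fine-tune of one of the small checkpoints might look like this on Colab; the model id is an assumed LaMini checkpoint on Hugging Face, and the two-example dataset is obviously a placeholder:

      ```python
      from datasets import Dataset
      from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                                Seq2SeqTrainer, Seq2SeqTrainingArguments)

      name = "MBZUAI/LaMini-Flan-T5-248M"  # assumed small LaMini checkpoint
      tok = AutoTokenizer.from_pretrained(name)
      model = AutoModelForSeq2SeqLM.from_pretrained(name)

      # Placeholder instruction data; swap in a real dataset.
      data = Dataset.from_list([
          {"instruction": "Name a primary color.", "response": "Red."},
          {"instruction": "What is 2 + 2?", "response": "4"},
      ])

      def preprocess(ex):
          enc = tok(ex["instruction"], truncation=True, max_length=256)
          enc["labels"] = tok(text_target=ex["response"], truncation=True,
                              max_length=256)["input_ids"]
          return enc

      trainer = Seq2SeqTrainer(
          model=model,
          args=Seq2SeqTrainingArguments(output_dir="lamini-ft", num_train_epochs=1,
                                        per_device_train_batch_size=2, learning_rate=3e-4),
          train_dataset=data.map(preprocess, remove_columns=data.column_names),
          data_collator=DataCollatorForSeq2Seq(tok, model=model),
      )
      trainer.train()
      ```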

    • @heshumi • 1 year ago

      @@samwitteveenai great, thank you

  • @hiawoood • 1 year ago

    Excellent video as always.
    May I suggest more LangChain content? Specifically, using agents to implement complex chat flows.

  • @mohitsarpal9380 • 1 year ago

    Thank you for updating us daily on LLMs.
    One simple problem … I want to fine-tune an open-source model on domain data for a QnA task. Can you advise which model can be used as a base model, and how I can fine-tune it?

  • @elspuddo • 1 year ago +1

    Hey Sam, just a heads up, I think you might have too aggressive a threshold on your noise gate. Something seems to be really clamping down the volume at the end of your sentences, which distracts from otherwise really great content :)

    • @samwitteveenai • 1 year ago

      Thanks. Yes, I only found out this was recorded with the wrong mic after the fact, so I had to apply a lot of noise reduction to try to make it usable. I appreciate you reaching out though.

  • @simplegeektips1490 • 1 year ago

    Hi Sam, amazing video, this one too! Thanks. Do you know how I can run the small LaMini on my MacBook (download models and what other files...)? If I try, I get an error (most likely 8-bit with no GPU?)

    • @samwitteveenai • 1 year ago

      I don't think 8-bit like this will be compatible with macOS GPUs, etc.
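
      A fallback that should work on a MacBook is to skip 8-bit loading entirely (bitsandbytes expects a CUDA GPU) and run one of the small checkpoints on CPU in full precision. A sketch, assuming the MBZUAI model ids on Hugging Face:

      ```python
      from transformers import pipeline

      # device=-1 forces CPU; no 8-bit/bitsandbytes involved.
      pipe = pipeline("text2text-generation",
                      model="MBZUAI/LaMini-Flan-T5-248M", device=-1)
      print(pipe("What is the capital of France?", max_length=64)[0]["generated_text"])
      ```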

  • @hilmiterzi3847 • 1 year ago

    Thank you for the videos, bro!

  • @tkon99 • 1 year ago

    Hi Sam, these models are quite cool, especially because the smaller ones (when quantized) would run on a lot of hardware at decent speeds. Have you considered doing a tutorial on how to convert these hugging face models to run locally?

    • @samwitteveenai • 1 year ago +1

      That's an interesting idea, but the models are a bit hit and miss. Let me try it in a notebook, and if it works I will turn it into a video. They are certainly small enough.

    • @tkon99 • 1 year ago +1

      @@samwitteveenai Did this myself today. Ran LaMini-Flan-T5-783M through the converter that comes with CTranslate2 and quantized to 8-bit integers; got some good results, especially given the speed. This is similar to the technique used by Faster Whisper (also built on CTranslate2 for transformers).
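
      The workflow described above might look roughly like this, assuming the MBZUAI/LaMini-Flan-T5-783M model id on Hugging Face; the one-time conversion is done with CTranslate2's converter script, shown here in a comment:

      ```python
      # One-time conversion with int8 quantization (run in a shell):
      #   ct2-transformers-converter --model MBZUAI/LaMini-Flan-T5-783M \
      #       --output_dir lamini-ct2 --quantization int8
      import ctranslate2
      import transformers

      tokenizer = transformers.AutoTokenizer.from_pretrained("MBZUAI/LaMini-Flan-T5-783M")
      translator = ctranslate2.Translator("lamini-ct2", device="cpu")  # int8 runs on CPU

      tokens = tokenizer.convert_ids_to_tokens(
          tokenizer.encode("What is the capital of France?"))
      result = translator.translate_batch([tokens])
      print(tokenizer.decode(tokenizer.convert_tokens_to_ids(result[0].hypotheses[0])))
      ```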

  • @坤王-c8x • 1 year ago

    Great video!!!

  • @spoonikle • 1 year ago +4

    The only issue with these kinds of models is that their datasets are very basic.
    Honestly, we need to harvest team communication data and their project files.
    We should hire people to do tasks and record their process entirely.
    Like pay skilled and highly trained individuals to do research and other tasks with a data collection package, and then use an LLM to distill their data into a timeline and then summarize it into discrete sets.
    Then we will have a number of recorded workflows that achieve the desired results.
    We could basically make humans obsolete one domain at a time by slowly analyzing the full thought process and workflow.
    People keep asking AI to do “magic” and infer what you want from basically no input and no domain-specific knowledge. We are acting like the world's worst boss on day one.
    We need more data; specifically, the algorithms our best and brightest use to accomplish tasks. What questions they ask of themselves, how they structure their ideas, how they test and iterate, and how they determine their task is complete.
    With enough domain-specific task completion and teamwork data, AI agents would be like studios or corporations in a box, each with their own workflow that achieves different and unique solutions to problems and tasks.

    • @BHBalast • 1 year ago +1

      Transcripts of "thinking out loud" about a problem should do great as a dataset.

    • @samwitteveenai • 1 year ago +1

      The challenge is that teams like this end up being really expensive, and then there is debate as to whose answer is right, etc.

    • @spoonikle • 1 year ago +1

      @@samwitteveenai I think the price will only be paid when people are sure it can be done. Also, “correctness” will not be needed. As if nobody in any corporation has completed a collaborative project before.
      It would be an issue if you only brought on broke kids or young people - but the target group is corporate slaves looking to retire on royalties for harvesting their workflow data.
      Transparency and honesty are key. They must be aware that their actions are being used as an example for instruction. Even the person who “calls it” and “reins in the scope” is part of the data.

  • @user-wr4yl7tx3w • 1 year ago

    Is it something we can fine-tune ourselves as well, using Colab Premium?

    • @samwitteveenai • 1 year ago +1

      Yeah, I have a number of videos about fine-tuning and PEFT; check those out.
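
      For reference, a minimal LoRA setup of the kind covered in those videos might look like this; target_modules ["q", "v"] is the usual choice for T5-style models, and the model id is an assumed small LaMini checkpoint:

      ```python
      from transformers import AutoModelForSeq2SeqLM
      from peft import LoraConfig, TaskType, get_peft_model

      base = AutoModelForSeq2SeqLM.from_pretrained("MBZUAI/LaMini-Flan-T5-248M")
      config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=16,
                          lora_dropout=0.05, target_modules=["q", "v"])
      model = get_peft_model(base, config)
      model.print_trainable_parameters()  # typically well under 1% of base weights
      ```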

  • @just..someone • 1 year ago

    Very nice

  • @saurav1096 • 1 year ago

    I think we should have lots of small models for specialized tasks and select one of them based on the given prompt, and I am sure next week I will see a paper about it too 🤦‍♂️
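
    (A toy sketch of that router idea, with hypothetical keyword rules and assumed LaMini model ids, might look like this:)

    ```python
    from transformers import pipeline

    # Hypothetical pool of small specialists keyed by domain.
    specialists = {
        "code": pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-248M"),
        "general": pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-783M"),
    }

    def route(prompt: str) -> str:
        # Naive keyword routing; a real system could use a classifier model here.
        key = ("code" if any(w in prompt.lower() for w in ("python", "function", "bug"))
               else "general")
        return specialists[key](prompt, max_length=128)[0]["generated_text"]

    print(route("Write a Python function that reverses a string."))
    ```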

  • @jamesjonnes • 1 year ago

    Can I run it on Windows with CPU only?

  • @mrpropre9206 • 1 year ago

    First again, surfing on the AI wave

  • @klammer75 • 1 year ago

    Very cool! I love this approach… Thank you for showing this 🥳🦾