MAMBA AI (S6): Better than Transformers?

  • Published 25 Nov 2024

COMMENTS • 39

  • @mjp152
    @mjp152 11 months ago +23

    Another interesting architecture is the Tolman-Eichenbaum Machine, which is inspired by the hippocampus and gives it some interesting abilities to infer latent relationships in the data.

  • @StephenRayner
    @StephenRayner 11 months ago +31

    Just as they start etching the transformer architecture onto silicon ha!

    • @jumpstar9000
      @jumpstar9000 11 months ago

      that also made me chuckle

    • @vinvin8971
      @vinvin8971 11 months ago

      Just bullshit...

  • @sadface7457
    @sadface7457 11 months ago +23

    The way you say hello community is a ray of sunshine 🌞 😊

    • @code4AI
      @code4AI 11 months ago +5

      Big smile.

    • @planorama.design
      @planorama.design 11 months ago

      That's the truth! Always love the enthusiastic hellos!

    • @davidreagan1287
      @davidreagan1287 10 months ago +1

      Best way to learn

  • @mike-q2f4f
    @mike-q2f4f 11 months ago +8

    It's clear transformers can be improved. Excited to see this proposal play out. Thanks for the update!

  • @_tnk_
    @_tnk_ 10 months ago +2

    First video I've watched from you, and I'm very impressed! Looking forward to watching more.

  • @lizardy2867
    @lizardy2867 11 months ago +4

    One of the problems I face when trying to implement simple models that use a latent space is the volatility of their input and output sizes. A model should never require truncation, nor should it tolerate inaccuracies. How, for example, do you model a compression algorithm (encode-decode) for any and all data that can exist? You are required to fix the latent space before building the model, so it effectively becomes part of the preprocessing step.
    This is, of course, expected and within reason.
    I am one to think the solution to this problem would upend most of the field.
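
    A minimal toy sketch of the constraint described above (the sizes, weights, and helper names here are made up for illustration, not taken from the video): a linear autoencoder with a fixed input and latent dimension forces every input to be padded or truncated before it ever reaches the model, so the latent space is effectively fixed in preprocessing.

      import numpy as np

      # Toy linear autoencoder: input length and latent size are fixed before training.
      input_dim, latent_dim = 64, 8
      rng = np.random.default_rng(0)
      W_enc = 0.1 * rng.normal(size=(input_dim, latent_dim))
      W_dec = 0.1 * rng.normal(size=(latent_dim, input_dim))

      def preprocess(x):
          # pad or truncate a 1-D signal to the fixed input_dim the model expects
          x = np.asarray(x, dtype=float)[:input_dim]
          return np.pad(x, (0, input_dim - len(x)))

      def encode(x):
          return preprocess(x) @ W_enc    # every input is forced into an 8-dim latent code

      def decode(z):
          return z @ W_dec                # reconstructions live back in the fixed 64-dim space

      z = encode(rng.normal(size=100))    # a length-100 signal is silently truncated to 64 samples
      print(z.shape, decode(z).shape)     # (8,) (64,)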

  • @javiergimenezmoya86
    @javiergimenezmoya86 11 months ago +1

    My intuition: transformers to capture tightly linked concepts and words within each chapter and its summarization, and Mamba for the union and interconnection of all the summarized ideas (not linked words, but linking groups of ideas that are widely dispersed and distributed across chapters).

    • @kevon217
      @kevon217 11 months ago

      That sounds like a cool combo

  • @laurinwagner8127
    @laurinwagner8127 11 months ago +5

    The GPT family of models is a decoder-only architecture, which is not covered by the patent.

    • @code4AI
      @code4AI 11 months ago

      GPT (Generative Pretrained Transformer)

    • @therainman7777
      @therainman7777 11 months ago +4

      @code4AI Yes, GPT models are transformers. But they are not the type of transformer architecture covered by Google’s patent. Google’s patent is for the original encoder-decoder architecture only. GPT models are decoder-only, which is a different type of architecture.

  • @kevon217
    @kevon217 11 months ago +2

    Excellent high level overview.

  • @EkShunya
    @EkShunya 11 months ago +2

    Can you make more content on state space models?

  • @shekhinah5985
    @shekhinah5985 8 months ago

    What's stored in the real space, if not the position? Isn't the phase-space example storing an even bigger vector, since it now stores not only the position of the center of mass but also the velocity?
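
    A worked example may help here, assuming the standard mass-spring illustration (the symbols m, k, x, v below are the usual mass, spring constant, position, and velocity, not notation taken from the video). In real (configuration) space the dynamics are a single second-order equation in the position x(t); the phase/state-space form stacks position and velocity into one larger vector, but in exchange the next state depends only on the current state:

      % real (configuration) space: one second-order ODE in the position x(t)
      m\,\ddot{x}(t) + k\,x(t) = 0

      % phase (state) space: a first-order equation in the larger state vector [x, v]^T
      \frac{d}{dt}\begin{bmatrix} x \\ v \end{bmatrix}
        = \begin{bmatrix} 0 & 1 \\ -k/m & 0 \end{bmatrix}
          \begin{bmatrix} x \\ v \end{bmatrix}

    So yes, the state vector is bigger, but that is the price for a first-order, Markovian description that a state space model can iterate step by step.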

  • @planorama.design
    @planorama.design 11 months ago +2

    Great coverage, and thanks once again. One issue I am grappling with is attention, which transformers manage at "run-time" (i.e. inference) on the prompt, whereas Mamba seems to capture this concept entirely during training. No need for an attention matrix, as with transformers. Very long context windows, improved access to early information from the stream, and faster performance. Love all this.
    My concern / reasoning: removing "run-time" attention at inference means we rely on statistical understandings of language from training. For prompts that differ substantially from the training data, can Mamba LLMs excel at activities that aim for creativity and brainstorming? (See the sketch at the end of this thread.)
    It also seems to me that training Mamba LLMs on multiple languages may degrade predictability in any one language, since the "attention" (conceptually) is calculated at training time. But I am still pondering this; I may well be wrong as I wrap my head around it!

    • @code4AI
      @code4AI 11 months ago +3

      Like your question. I am struggling to find benchmarks on the in-context learning (ICL) performance of Mamba-like systems. Actual performance data in direct comparison with current-generation LLMs are also missing. And some authors hint that the few-shot ability might be associated with the self-attention mechanism itself. That would pose serious limitations for state space systems, linear RNNs and the like, if I lose the ability to inject new data and information into my prompt and have the system understand the new semantic configuration and its semantic correlations (e.g. for reasoning).
      But I trust the open source community to come up with advanced solutions ...
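
    The sketch referenced above: a rough, heavily simplified selective state-space (S6-style) recurrence. The shapes, weight names, and the simplified discretization are illustrative assumptions, not the actual Mamba implementation. Note that the step size delta and the B and C maps are still computed from the incoming tokens at inference time, so some input dependence survives even though the explicit attention matrix is gone.

      import numpy as np

      # Simplified selective state-space recurrence (illustrative only, not real Mamba code).
      d_model, d_state = 8, 16
      rng = np.random.default_rng(0)

      A = -np.exp(rng.normal(size=(d_model, d_state)))      # learned state matrix, fixed after training
      W_delta = 0.1 * rng.normal(size=(d_model, d_model))   # projections that make the SSM "selective":
      W_B = 0.1 * rng.normal(size=(d_model, d_state))       # delta, B and C are computed from each
      W_C = 0.1 * rng.normal(size=(d_model, d_state))       # incoming token, i.e. also at inference time

      def selective_scan(x):
          # x: (seq_len, d_model) token embeddings -> (seq_len, d_model) outputs
          h = np.zeros((d_model, d_state))                  # hidden state: fixed size, independent of context length
          ys = []
          for x_t in x:
              delta = np.log1p(np.exp(x_t @ W_delta))       # input-dependent step size (softplus)
              B_t, C_t = x_t @ W_B, x_t @ W_C               # input-dependent input/output maps
              A_bar = np.exp(delta[:, None] * A)            # discretized state transition
              h = A_bar * h + (delta[:, None] * B_t) * x_t[:, None]
              ys.append(h @ C_t)                            # each step reads only h; no attention matrix over the past
          return np.stack(ys)

      y = selective_scan(rng.normal(size=(32, d_model)))
      print(y.shape)                                        # (32, 8)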

  • @davidreagan1287
    @davidreagan1287 10 months ago

    Conceptually, this is brilliant: savoir-faire for accuracy and precision.
    However, a deeper treatment of the non-matrix mathematics and of the challenges of serial hardware engineering would be greatly appreciated.

  • @juancarlospizarromendez3954
    @juancarlospizarromendez3954 11 months ago

    I do think that an artificial brain should plug into many engines step by step, such as an arithmetic calculator, a logical reasoner, a theorem prover, etc., making it something cyborg-like.

  • @Lhtokbgkmvfknv
    @Lhtokbgkmvfknv 10 months ago

    Thank you!

  • @renanmonteirobarbosa8129
    @renanmonteirobarbosa8129 11 months ago +2

    I appreciate your attempt at simplifying and introducing how state spaces are used in a very particular application to dynamical systems. However, I am afraid you are missing quite a lot and are, perhaps, confused about the mathematics.

    • @TiaguinhouGFX
      @TiaguinhouGFX 11 months ago +1

      How so?

    • @dhnguyen68
      @dhnguyen68 11 months ago +4

      Please further elaborate your claim.

    • @code4AI
      @code4AI 11 months ago +2

      Your comment made it into my next video on BEYOND MAMBA (ua-cam.com/video/C2fFL8pVX2M/v-deo.html) and provided a beautiful transition from the origins of state space mathematics to overcoming the limitations of the current S4 and S6 state space models. I hope the new video explains your mistake and that you learned that interdisciplinarity (from physics to statistics and time series) is something beautiful. Thanks for your comment.

  • @oryxchannel
    @oryxchannel 10 months ago

    I’m a new fan.

  • @oguzhanylmaz4586
    @oguzhanylmaz4586 11 months ago +1

    Hi, I am developing an offline chatbot with RAG. Should I use Llama 7B as the LLM, or should I choose the Zephyr 7B model? It needs to work locally, without internet.

    • @qwertydump4720
      @qwertydump4720 11 months ago +7

      I don't know the details of your project, but I had the best experience with dolphin-mistral 7b 2.2.1.

    • @oguzhanylmaz4586
      @oguzhanylmaz4586 11 months ago

      @qwertydump4720 I am building an offline chatbot as a graduation project, so I may need a lot of information about the model I'm going to use.

  • @davidjohnston4240
    @davidjohnston4240 10 months ago

    I came to see a new better means of AC voltage conversion. I was disappointed.

  • @qwertasd7
    @qwertasd7 11 months ago

    Interesting (and so are all the replies here; there doesn't seem to be a place anymore where thinkers can exchange ideas).
    Do you know of a model using this concept (to try out in LM Studio or in a Jupyter notebook)?
    Personally, I think the way LLMs work / are trained is not the way to go.
    There are too many useless facts inside them; for facts they should just use a callout to Wikipedia or other sites.
    An LLM's 'world domain' should be language: no politics, no famous people, but theoretical skills, translations, medicine, law, math, physics, coding, etc. Not who Trump, JFK, or Madonna was. Those gigs should be removed.

  • @remsee1608
    @remsee1608 11 months ago +5

    This is not good for that startup that is building transformer chips.

    • @thechatgpttutor
      @thechatgpttutor 11 months ago

      exactly what I thought

    • @redone823
      @redone823 11 months ago

      Rip startup. Died before birth.

    • @planorama.design
      @planorama.design 11 months ago

      I think the jury is still out. There isn't enough real-world usage [yet] to say how well the Mamba arch really performs against Transformers. Over the past couple of years, we (at large) have been able to evaluate a variety of business use cases for Transformer-based LLMs. We have no idea how the Mamba arch will compare in those same use cases.