Neural Breakdown with AVB
Breaking down Apple Intelligence LLMs one algorithm at a time!
Apple Intelligence is exciting, and in their latest technical report, they describe Apple's foundation large language models (LLMs), namely AFM-server and AFM-on-device. In this video, I break down all the algorithmic progress made in this paper, explaining concepts like Next-Word Prediction with Transformer Decoders, Reinforcement Learning from Human Feedback (RLHF), Low-Rank Adaptation (LoRA), Knowledge Distillation, Structured Pruning, Quantization with Palettization, Mirror Descent Policy Optimization with Leave-One-Out estimation (MDLOO), and many more!
#ai #apple #artificialintelligence
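Knowledge Distillation, one of the concepts listed above, trains a smaller student model to match a larger teacher model's output distribution. Here is a minimal sketch, assuming the common temperature-softened KL-divergence formulation; it is not necessarily the exact loss used in Apple's report, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 (a common convention)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]                       # toy next-token logits from the teacher
print(distillation_loss(teacher, teacher))      # 0.0 (student matches teacher exactly)
print(distillation_loss([0.5, 1.0, 4.0], teacher) > 0)  # True (mismatch is penalized)
```

The softened teacher distribution also carries information about which wrong answers are "less wrong", which is the usual motivation for distilling logits instead of hard labels.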
YouTube Members and Patrons will get access to write-ups, slides, notebooks, and bonus content from all videos on my channel!
Visit my Patreon link to see what else is available:
www.patreon.com/NeuralBreakdownwithAVB
Links:
Apple Paper: arxiv.org/pdf/2407.21075
Apple Blogpost: machinelearning.apple.com/research/introducing-apple-foundation-models
Palettization: apple.github.io/coremltools/docs-guides/source/opt-palettization-overview.html
Sheared LLama: arxiv.org/pdf/2310.06694
Structured Pruning: arxiv.org/pdf/1910.04732
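Palettization (see the Core ML Tools link above) compresses weights by replacing each value with an index into a small lookup table of shared values, so an n-bit palette holds at most 2^n distinct values. Below is a minimal sketch, assuming a plain 1-D k-means to build the palette over one flat list of weights; Apple's setup and Core ML Tools may cluster differently (for example with per-group palettes), and `palettize` / `depalettize` are hypothetical helpers.

```python
def palettize(weights, n_bits=2, iters=20):
    """Cluster weights into 2**n_bits shared values with 1-D k-means.

    Returns (palette, indices) so that palette[indices[i]] approximates weights[i].
    """
    k = 2 ** n_bits
    lo, hi = min(weights), max(weights)
    # Initialize palette entries evenly across the weight range.
    palette = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    indices = [0] * len(weights)
    for _ in range(iters):
        # Assignment step: map each weight to its nearest palette entry.
        indices = [min(range(k), key=lambda c: abs(w - palette[c])) for w in weights]
        # Update step: move each palette entry to the mean of its assigned weights.
        for c in range(k):
            members = [w for w, idx in zip(weights, indices) if idx == c]
            if members:
                palette[c] = sum(members) / len(members)
    return palette, indices


def depalettize(palette, indices):
    """Reconstruct approximate weights from the lookup table and per-weight indices."""
    return [palette[i] for i in indices]


weights = [-0.9, -0.85, -0.1, 0.0, 0.05, 0.4, 0.45, 0.95]  # toy weight values
palette, idx = palettize(weights, n_bits=2)
approx = depalettize(palette, idx)
print(len(set(approx)))  # at most 2**n_bits distinct values remain
```

Only the small palette (in full precision) plus n bits per weight need to be stored, which is where the memory savings come from.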
Videos you may like:
The Full History of NLP Explained - ua-cam.com/video/uocYQH0cWTs/v-deo.html
Attention to Transformers Playlist - ua-cam.com/play/PLGXWtN1HUjPfq0MSqD5dX8V7Gx5ow4QYW.html
Timestamps:
0:00 - Intro
1:32 - Chapter 1 - Overview
2:55 - Pretraining
4:12 - Structured Pruning
5:12 - Knowledge Distillation
5:55 - Post Training
7:36 - Iterative Teaching Committee
9:06 - Chapter 2 - Adapters
9:30 - LoRA (Low Rank Adapters)
11:55 - Quantization (Palettization)
13:27 - Chapter 3 - RLHF
14:11 - Reward Modelling
16:03 - Leave One Out
17:43 - Mirror Descent Policy and MDLOO
19:15 - Results
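As the "Leave One Out" chapter name suggests, the idea is to score each sampled response against a baseline built from the other responses to the same prompt. Here is a minimal sketch of that advantage computation, assuming k scalar rewards per prompt; it shows only the generic leave-one-out baseline, not Apple's full MDLOO objective, and `loo_advantages` is a hypothetical name.

```python
def loo_advantages(rewards):
    """Leave-one-out advantages for k responses sampled from the same prompt.

    Each response's baseline is the mean reward of the other k - 1 responses,
    so a response is only credited for doing better than its siblings.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("need at least two responses per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


rewards = [1.0, 0.0, 0.5, 0.5]  # toy reward-model scores for 4 responses
print(loo_advantages(rewards))
```

A handy property of this baseline is that the advantages for one prompt always sum to zero, so no separate value network is needed to center them.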
Views: 802

Videos

Complete DSPy Tutorial - Master LLM Prompt Programming in 8 amazing examples!
13K views · 1 day ago
In this video, we talk about Stanford NLP's DSPy - a new LLM Programming framework that helps with prompting, bootstrapping, optimizing, and fine-tuning Language Models. We go through 8 examples that explain, step by step, the major concepts behind DSPy and how to build really complex LLM programs with just a few lines of code! Let's build programs with both LLMs (like ChatGPT, Mixtral, Gemini, L...
How does Segment Anything 2 (SAM 2) work? Paper and Network Architecture Explained!
1.9K views · 14 days ago
In this video, we will be discussing the SAM-2 paper, the latest META AI paper that tackles Promptable Visual Segmentation, the task of segmenting objects from any video! Medium article: towardsdatascience.com/segment-anything-2-what-is-the-secret-sauce-a-deep-learners-guide-1c43dd07a6f8 Members and Patrons will get access to the full write-up and powerpoint slides from this video! Visit the Pa...
How Neural Nets estimate depth from 2D images? Monocular Depth Estimation Explained!
1.4K views · 1 month ago
In this video, we will be discussing the MiDaS paper, Depth Anything V1, and the latest Depth Anything V2 paper! We are going to learn the basics of Monocular Depth Estimation and some of the modern tricks, datasets, networks, and loss functions used to train these models. Members will get access to the full write-up, all the animations, powerpoint slides, and annotated PDFs from this video for ch...
How have Convolutional Neural Nets evolved since the 90s?
1.5K views · 1 month ago
In this video, we are going to go through the history of CNNs specifically for Image Classification tasks - starting from those early research years, to the golden era of the mid-2010s, when many of the most ingenious Deep Learning architectures were conceived, and finally discuss the latest trends in CNN research now as they compete with attention and vision-transformers. Members can now acce...
What does it take to create a Text to Image Diffusion Model from scratch?
3.9K views · 2 months ago
In just 15 points, we talk about everything you need to know about Generative AI Diffusion models - from the basics to Latent Diffusion Models (LDMs) and Text-to-Image conditional Latent diffusion models. I also train a diffusion model with PyTorch on my laptop to demonstrate how it all works. To access the full code repo && 15 minute code walkthrough video && 4000 word script && 15 animations ...
Kolmogorov Arnold Networks (KAN) Paper Explained - An exciting new paradigm for Deep Learning?
49K views · 3 months ago
This is a paper breakdown video of the paper: Kolmogorov Arnold Networks, which brilliantly provides an alternative to standard Multi Layer Perceptrons. The video discusses the main contributions and core ideas of the paper, visually explaining the math, concepts, and challenges ahead. #deeplearning #machinelearning #neuralnetworks To access the animations, narration scripts, slides, notes, etc...
A Multi-Agent game where LLMs must trick each other as humans until one gets caught
1.2K views · 3 months ago
Five top LLMs - OpenAI's ChatGPT, Google Gemini, Anthropic's Claude, Meta's LLAMA 2, and Mistral AI's Mixtral 8x7B compete in this text-based Turing Test game. The objective is to fool other LLMs into thinking that they are talking to an actual human rather than another AI. A breakdown video about how Agent-based LLMs work: ua-cam.com/video/cXfnNoMgCio/v-deo.html #ai #nlp #gemini #chatgpt Learn...
Everything about Image Segmentation with UNET visually explained!
1.3K views · 4 months ago
This video is about the amazing UNet architecture and how it can be used to train any Image Segmentation model. I go over the major reasons why convolutional neural nets, especially the UNet, are so good at image-segmentation tasks, going in depth on how to arrange a dataset, augment data, and code up an Image Segmentation system like this on your own using PyTorch. For accessing the code, anima...
But what does a trained Convolution Neural Network actually learn? VISUALIZED!
2.7K views · 4 months ago
In this video, I dive into Convolutional Neural Networks - WHAT they are, HOW they learn, and WHY they are so successful on computer vision tasks. The video contains a lot of illustrations and animations to better visualize the inner mechanisms of CNNs and help develop better intuition about this classic deep learning algorithm. The channel now has membership options - you can access the slides...
The questions about AGI that you should be asking instead
2.1K views · 5 months ago
Two Large Language Models DEBATE about AGI and Humanity + How I did it! (ChatGPT vs Mixtral)
1.5K views · 7 months ago
If LLMs are text models, how do they generate images? (Transformers + VQVAE explained)
4.9K views · 8 months ago
Here is how Transformers ended the tradition of Inductive Bias in Neural Nets
7K views · 8 months ago
The many amazing things about Self-Attention and why they work
3.8K views · 9 months ago
Neural Attention - This simple example will change how you think about it
4.1K views · 10 months ago
How Multi-Agent AI learn by continuously competing against themselves | Self Play
1.2K views · 10 months ago
Reinforcement Learning AI through 4 famous projects!
1.4K views · 11 months ago
Visualizing the Latent Space: This video will change how you imagine neural nets!
11K views · 1 year ago
JEPA Architectures - How neural networks learn abstract concepts about images (IJEPA)
2.7K views · 1 year ago
How Large Language Models play video games
3.6K views · 1 year ago
Multimodal AI from First Principles - Neural Nets that can see, hear, AND write.
8K views · 1 year ago
10 years of NLP history explained in 50 concepts | From Word2Vec, RNNs to GPT
24K views · 1 year ago
Explaining the Segment Anything Model - Network architecture, Dataset, Training
19K views · 1 year ago
My AI played millions of games against itself | Self Play (Unity Devlog 05)
619 views · 1 year ago
Understanding Zip-NeRF - a cool new AI algorithm for 3D scene synthesis
18K views · 1 year ago

COMMENTS

  • @Korea-Lens · 7 hours ago

    Really great overview of topics. Subscribed and liked each one I'm watching!

  • @s1dc0des · 1 day ago

    💙

  • @Rishabh-ej2nx · 1 day ago

    can we connect somewhere to have a chat?

    • @avb_fj · 41 seconds ago

      You can reach out on the email address linked on my main channel page.

  • @barbaraz5363 · 1 day ago

    Thanks for your awesome video! I just have one question regarding the contents of the Memory Bank. As I understand it, "Memory from each frame is inserted into the Memory Bank", but then you mentioned that the memory bank contains 3 elements. So is the memory bank in reality memory + prompts + object pointers? Thanks for your answer!

    • @avb_fj · 1 day ago

      That is correct. It’s the 3 elements that you mentioned.

  • @haralc · 2 days ago

    Btw, I'm not a football fan, and although you mentioned in the note that it is okay .... no, it's really not ... even when I watched this video a few times, who did what still doesn't register in my brain...

    • @avb_fj · 2 days ago

      Fair criticism. I regretted using these examples soon after shooting the video. Future tutorials won’t have examples like these.

  • @haralc · 2 days ago

    I just tried it. However, it failed on the first try of the "basic" stuff. With GPT-4, BasicQA keeps returning "Question: ... Answer: ...", but I only need the answer itself, not the whole "Question: ... Answer: ...". So in what sense don't I have to worry about the prompting part?

  • @artmusic6937 · 2 days ago

    if the 60m parameter model does 50% accuracy, how can you improve this without using a bigger model? Because if you use a bigger model, then it actually just memorizes the data better. So it is basically overfitting, isn't it?

  • @subhamkundu5043 · 2 days ago

    Can I connect multiple lora adapters at the same time to the base model?😊

    • @avb_fj · 2 days ago

      Generally, multiple independently trained LoRA adapters can be applied sequentially, one after the other, or in parallel on the same input, with their outputs added together. I don't think the paper mentioned anything specific about stacking multiple LoRAs, but I'm sure it should be possible to do properly with some fine-tuning.

    • @subhamkundu5043 · 2 days ago

      @avb_fj Thanks for the explanation.
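The two ways of combining LoRA adapters mentioned in the reply above can be sketched on a single linear layer. This is a minimal illustration, assuming y = W x with each adapter adding a low-rank update (alpha / r) * B A x; the class and helper names and the toy matrices are hypothetical, and the paper itself does not describe stacking adapters.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRA:
    """One adapter: delta(x) = (alpha / r) * B @ (A @ x), with A: r x d_in and B: d_out x r."""

    def __init__(self, A, B, alpha=1.0):
        self.A, self.B = A, B
        self.scale = alpha / len(A)  # len(A) is the rank r

    def delta(self, x):
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


def forward_parallel(W, adapters, x):
    """Frozen base output W @ x plus every adapter's update, all computed from the same input x."""
    y = matvec(W, x)
    for lora in adapters:
        y = [a + b for a, b in zip(y, lora.delta(x))]
    return y


def merge_adapters(W, adapters):
    """Fold adapters into a copy of the weights one after another: W <- W + scale * B @ A."""
    merged = [row[:] for row in W]
    for lora in adapters:
        r = len(lora.A)
        for i in range(len(W)):
            for j in range(len(W[0])):
                merged[i][j] += lora.scale * sum(lora.B[i][k] * lora.A[k][j] for k in range(r))
    return merged


W = [[1.0, 0.0], [0.0, 1.0]]                            # frozen base weights (toy)
a1 = LoRA(A=[[1.0, 0.0]], B=[[1.0], [0.0]], alpha=1.0)  # rank-1 adapter
a2 = LoRA(A=[[0.0, 1.0]], B=[[0.0], [1.0]], alpha=2.0)  # another rank-1 adapter
x = [1.0, 2.0]

print(forward_parallel(W, [a1, a2], x))        # [2.0, 6.0]
print(matvec(merge_adapters(W, [a1, a2]), x))  # [2.0, 6.0]
```

Within a single linear layer both routes give the same output because each LoRA update is additive. The sketch says nothing about whether independently trained adapters work well together, which is the part the reply suggests may still need fine-tuning.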

  • @gyahoo · 3 days ago

    Great explanation ❤

  • @KarinaBuchmann · 5 days ago

    Great video! In my opinion, the explanation of loss is missing, but overall it's a very easy-to-understand video.

    • @avb_fj · 5 days ago

      Thanks for the feedback! I explained the loss function and iterative training process in the SAM-1 video so I sorta skipped it.

  • @SlashDL · 5 days ago

    Some more information at 10:25 - In the token to image attention, the query comes from the prompt + output tokens and the key, value comes from the image. In the image to token attention, the query comes from the image embedding and the key, value comes from the prompt + output tokens.
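The comment above describes which side supplies the queries in each cross-attention step of the mask decoder. Here is a minimal sketch of that direction swap, assuming single-head scaled dot-product attention with no learned projections; the real decoder also has learned Q/K/V projections, multiple heads, MLPs, and normalization, and the toy token and image embeddings below are made up.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys and returns a weighted sum of values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]             # prompt + output tokens (toy)
image = [[0.2, 0.1], [0.9, 0.3], [0.0, 1.0], [0.4, 0.4]]  # flattened image embedding (toy)

# Token-to-image: queries come from the tokens; keys and values come from the image.
tokens_updated = cross_attention(tokens, image, image)
# Image-to-token: queries come from the image; keys and values come from the tokens.
image_updated = cross_attention(image, tokens, tokens)

print(len(tokens_updated), len(image_updated))  # 3 4
```

The output always has one row per query, so the token-to-image step updates the tokens while the image-to-token step updates the image embedding.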

  • @SlashDL · 5 days ago

    At 10:03, 4 new tokens are added to the sparse embeddings, 1 representing the IoU score, the rest of the 3 representing the masks. Just a minor correction.

  • @trashchenkov · 6 days ago

    Thanks for the video! It would be great to see how to use DSPy for agents.

  • @Foba_Bett · 6 days ago

    Amazing work! Thank you!

  • @srinivasasatya6797 · 6 days ago

    Great work

  • @krlospatrick · 7 days ago

    Thanks for sharing this amazing tutorial

  • @amdenis · 7 days ago

    Great job! I have used your videos to help several recent grads, who had some gaps in understanding. You are a very good teacher.

    • @avb_fj · 7 days ago

      Awesome! Thank you!

  • @daniloMgiomo · 7 days ago

    Awesome content! Could you please try TextGrad? The Stanford folks released a paper about it in July.

    • @avb_fj · 7 days ago

      Thanks for the suggestion! Sounds like a good idea for a future video.

  • @AyanKhan-dc3eu · 8 days ago

    When this module came out the docs were very confusing. thank you for such a great explanation

  • @r66p6r · 9 days ago

    I was hoping to automate the prompts. I feel like you still need to understand prompting well before you can use this :(

  • @haroldalcala9639 · 9 days ago

    Thanks!

  • @user-op6vd3sn2k · 9 days ago

    I have a question: can I use models other than OpenAI's? I'm running my own models on DeepInfra.

    • @avb_fj · 9 days ago

      Haven't tested this myself, but I assume that you can call DeepInfra models through the OpenAI API by changing the base_url parameter. deepinfra.com/docs/openai_api

    • @user-op6vd3sn2k · 9 days ago

      @avb_fj Thanks man ✨

    • @user-op6vd3sn2k · 9 days ago

      @avb_fj Also, another doubt: can I use other vector DBs, like Astra DB, for RAG? Thanks.

    • @avb_fj · 9 days ago

      @user-op6vd3sn2k Check out the supported ones here: dspy-docs.vercel.app/docs/category/retrieval-model-clients

  • @jeffrey5602 · 9 days ago

    As a German I enjoyed the chosen example a lot 😄

  • @shoaibsh2872 · 9 days ago

    Great video man, also loved the One Piece T-shirt ;)

    • @avb_fj · 9 days ago

      Thanks for noticing!

  • @lakshaynz · 9 days ago

    Thank you:)

  • @xtema1624 · 9 days ago

    Please share the Colab from this video, not the DSPy example.

  • @Anonymous-lw1zy · 9 days ago

    Also at 12:42 I am getting: answer='Mario Götze' confidence=0.9 not Andre Schurrle I quadruple-checked that my code is the same as yours.

  • @JimMendenhall · 9 days ago

    This is the only video or resource I've seen on DSPy that makes ANY sense. Great job!

  • @ProgrammerRajaa · 9 days ago

    Great explanation. Can I have a link to the notebook you showed in the video?

    • @avb_fj · 9 days ago

      As I mentioned in the video and in the description, the code is currently available to members/patrons only.

  • @fojo_reviews · 9 days ago

    I kinda feel guilty that I am seeing such content without paying anything! This is gold...Thank you!

    • @avb_fj · 9 days ago

      Thanks!

  • @Anonymous-lw1zy · 9 days ago

    Great video. However, DSPy seems to be very fragile; it breaks easily. For example, at 11:00 you ask "What is the capital of the birth state of the person who provided the assist for Mario Gotze's in the football World Cup finals in 2014?" and it answers 'Mainz', which you said is correct. But if I change the question slightly by adding "goal" after "Gotze's", so the question is now "What is the capital of the birth state of the person who provided the assist for Mario Gotze's goal in the football World Cup finals in 2014?", it answers "Research".

    • @avb_fj · 9 days ago

      In general it’s the underlying LLM that could be “fragile”. Remember that DSPy is just converting your program into a prompt and sending to the LLM. The LLM generates the answer which depends on input prompts and temperature settings. Either way, as long as the concepts make sense, don’t worry about replicating the test cases shown in the video!
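The reply above points at the LLM's prompts and temperature setting as the source of run-to-run differences. Here is a minimal sketch of how temperature reshapes a next-token distribution, assuming standard softmax sampling; the candidate answers and their logits are made up for illustration and are not taken from any real model.

```python
import math
import random


def token_probs(logits, temperature):
    """Softmax over logits / temperature; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(tokens, logits, temperature, rng):
    """Draw one token according to the temperature-scaled probabilities."""
    return rng.choices(tokens, weights=token_probs(logits, temperature), k=1)[0]


tokens = ["Mainz", "Berlin", "Research"]  # hypothetical candidate answers
logits = [2.0, 1.0, 0.5]                  # made-up scores

for t in (0.2, 1.0, 2.0):
    # The probability of the top candidate shrinks as the temperature grows.
    print(t, round(token_probs(logits, t)[0], 3))

rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
print([sample(tokens, logits, 1.0, rng) for _ in range(5)])
```

At a very low temperature the top candidate dominates, while higher temperatures give the less likely answers a real chance of being sampled, even with an identical prompt.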

  • @orthodox_gentleman · 10 days ago

    You do realize that GPT 3.5 Turbo was deprecated aka it no longer exists.

    • @avb_fj · 10 days ago

      Thanks for your comment. The DSPy documentation and official tutorial still use it (links below), and it worked out for the examples I was going for in the tutorial. Whether the particular LM is deprecated or not is not important at all. You can replace it with whichever model you prefer… the concepts remain the same. dspy-docs.vercel.app/docs/tutorials/rag github.com/stanfordnlp/dspy/blob/main/intro.ipynb

  • @MrMoonsilver · 10 days ago

    Yes boss! Subscribed! Great video and very much untapped territory, the only well made tutorial for dspy!

    • @avb_fj · 10 days ago

      Thanks!

  • @ashadqureshi4412 · 10 days ago

    Thanks man!!!

  • @spookymv · 11 days ago

    Thanks to you, my friend, I learned what I haven't understood for days. I insistently want to learn dspy, but I didn't understand it. Thanks a lot.

    • @avb_fj · 11 days ago

      Glad to hear that! Your insistence has paid off! Good luck with your DSPy journey!

  • @prashlovessamosa · 11 days ago

    Excellent 👌

  • @boaz9798 · 11 days ago

    Excellent video. Thank you. Can I grab the resulting prompt? I know it is supposedly a new paradigm that abstracts prompts away, but some may still want to revert to using a simple prompt in production after optimization.

    • @avb_fj · 11 days ago

      Yes, you can use the inspect_history() function as shown in the video (around 4:30) to check out all the previous prompts run by a module.

  • @habilzare2451 · 11 days ago

    Thanks Neural Breakdown with AVB! I always wanted to know how "latent space arithmetic" works.

  • @joshuatettey7771 · 12 days ago

    Awesome video. Thanks mate🤩

  • @weixie4943 · 12 days ago

    Thank you! As the others have commented, a great explanation, as always! At 4:44, should the purple bars be flipped about the "x-axis" from what is shown, since at the left edges of the contiguous red "stretches"/"blocks", the only part of the kernel that overlaps is the negative/"down" bar, and vice-versa at the right edges? Thanks again!

    • @avb_fj · 12 days ago

      Great point! You are correct. In the general "signal processing" formula, convolution flips the kernel along the X-axis (roughly (f ∗ g)(T) = Σ_t f(t) g(T − t), where f(.) is the signal and g(.) is the kernel). But most of the video explains convolution as it is used in deep learning, which is really "correlation" (roughly Σ_t f(T + t) g(t)), where no flipping is assumed. At 2:16 I mentioned the manual flipping step, but for the example at 4:44, I skipped it. Thanks for pointing that out!
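The difference discussed in this thread, where signal-processing convolution flips the kernel and deep-learning "convolution" layers compute cross-correlation, is easy to see on a 1-D example. Here is a minimal sketch in "valid" mode (no padding); the signal and the two-tap edge kernel are toy values chosen to mimic a contiguous block like the one at 4:44.

```python
def correlate_valid(signal, kernel):
    """Cross-correlation: slide the kernel as-is (what deep-learning 'conv' layers compute)."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(n - k + 1)]


def convolve_valid(signal, kernel):
    """True convolution: flip the kernel, then slide it."""
    return correlate_valid(signal, kernel[::-1])


signal = [0, 0, 1, 1, 1, 0, 0]  # a contiguous "block" of ones
edge_kernel = [-1, 1]           # responds to rising edges under correlation

print(correlate_valid(signal, edge_kernel))  # [0, 1, 0, 0, -1, 0]
print(convolve_valid(signal, edge_kernel))   # [0, -1, 0, 0, 1, 0]
```

The two outputs differ only in sign at the block edges, which is exactly the flip about the x-axis pointed out in the comment. Since a learned kernel can absorb the flip, deep-learning libraries usually skip it.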

  • @jjordano4127 · 13 days ago

    Thank you, sir. I subscribed to your channel after watching this video.

    • @avb_fj · 13 days ago

      Thanks! Glad you enjoyed it.

  • @xxlvulkann6743 · 13 days ago

    This was a useful summary for finding papers to research developments in multimodal machine learning models!

    • @avb_fj · 13 days ago

      Thanks! Super glad you found the video resourceful!

  • @mbpiku · 15 days ago

    Gold and underrated channel, hope you get the recognition for your good work.

    • @avb_fj · 15 days ago

      Much appreciated!

  • @mbpiku · 15 days ago

    Great video.

  • @TP-ct7qm · 17 days ago

    Great video as always! This is the best ML channel on YouTube!

    • @avb_fj · 13 days ago

      Haha thanks! That’s high praise!

  • @Omsip123 · 17 days ago

    Thanks a lot for these insights on Segment Anything 2 (SAM2)

  • @420_gunna · 17 days ago

    Awesome video as always, keep producing epic content and the masses will come! :)

    • @avb_fj · 13 days ago

      Thanks! Will do!

  • @pmcc067 · 18 days ago

    Great video!