Machine Learning Studio
Enhancing LLMs (an overview)
Discover 3 powerful techniques for enhancing Large Language Models (LLMs) in this video! We cover prompt engineering, Retrieval-Augmented Generation (RAG), and fine-tuning, explaining how each method improves the performance and adaptability of LLMs for various tasks. Perfect for AI enthusiasts and developers looking to optimize their models!
#LLM #Promptengineering #RAG #finetuning
383 views

Videos

FlashAttention: Accelerate LLM training
820 views · 2 months ago
In this video, we cover FlashAttention, an IO-aware attention algorithm that significantly accelerates the training of LLMs.
An Overview of Object Recognition Tasks
251 views · 3 months ago
This video gives an overview of object recognition tasks, starting with image classification (binary, multi-class, and multi-label), then localization and object detection, and finally image segmentation, including semantic, instance, and panoptic segmentation. #computervision
Dataset Management with FiftyOne
138 views · 3 months ago
Link to the toy dataset for this tutorial on GitHub: github.com/PyML-studio/mlstudio/tree/main/Notebooks/fiftyone-dataset-management/data Link to the Jupyter notebook: github.com/PyML-studio/mlstudio/blob/main/Notebooks/fiftyone-dataset-management/fiftyone-tutorials.ipynb
OpenAI CLIP model explained
3.5K views · 4 months ago
CLIP: Contrastive Language-Image Pre-training. In this video, I describe the CLIP model published by OpenAI. CLIP is based on natural language supervision for pre-training. Natural language supervision is not new; in fact, there are two approaches to it: one tries to predict the exact caption for each image, whereas the other is based on a contrastive loss, where instead of p...
DINO -- Self-supervised ViT
504 views · 5 months ago
In this video, we cover a very exciting paper called "Emerging Properties in Self-Supervised Vision Transformers". The proposed method, DINO (self-distillation with no labels), is a simplified approach to self-supervised learning in the vision domain. Similar to self-supervised transformers in NLP, pre-training ViT with DINO also leads to emerging properties beyond what the model was trained for.
Swin Transformer
2.6K views · 6 months ago
In this video, we continue the vision transformer series, covering Swin Transformer, a general-purpose transformer backbone for computer vision. Swin Transformer is based on two key ideas: (1) a multi-scale hierarchical backbone suitable for computer vision, and (2) a carefully designed Swin block composed of two window-based attention modules for efficient self-attention computation, while s...
Variants of ViT: DeiT and T2T-ViT
1K views · 7 months ago
As you may recall from our previous video on ViT, the original ViT needs lots of training data, such as JFT-300M. But if we use a mid-size dataset like ImageNet-1k, the performance of ViT is lower than that of CNNs. In this video, we cover two ViT variants: DeiT (Data-Efficient Image Transformers) and Tokens-to-Token ViT (T2T-ViT). Both models have been able to design vision transformer...
Vision Transformer (ViT)
1.5K views · 8 months ago
ViT is a pivotal paper in computer vision, bringing the power of Transformers to the vision domain and becoming a fundamental building block of many current vision models. In this video, we delve into the intricate mechanisms of ViT, exploring how this influential model operates. Reference: "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", available at arxiv.org/pd...
Evolution of Self-Attention in Vision
1.2K views · 9 months ago
In the second installment of this series, the video delves into the advanced world of self-attention in the visual domain. It focuses on three pivotal papers that leverage self-attention mechanisms for image processing. The video builds on the foundation laid by the non-local module discussed in the first video. The spotlight is on attention-augmented convolution and two seminal papers on stand...
Relative Self-Attention Explained
1.3K views · 9 months ago
In this video, we dive into a very interesting topic: relative self-attention. First, we will see the differences between relative and absolute position embeddings, and then we will cover two algorithms for incorporating relative embeddings into self-attention. #transformers #deeplearning
Self-Attention in Image Domain: Non-Local Module
1.2K views · 10 months ago
Hi everyone, this is the first video in the vision transformers series, where we delve into the evolution of self-attention in images, starting with the paper titled "Non-local Neural Networks".
Introducing a new series on Vision Transformers
800 views · 10 months ago
Hello everyone! Welcome to our new video series focused on Vision Transformers. In our previous series, we extensively covered Transformers for sequences in the domain of natural language processing. In this exciting new series, we will explore vision transformers and approaches for leveraging self-attention in images to tackle computer vision problems.
Linear Complexity in Attention Mechanism: A step-by-step implementation in PyTorch
997 views · 10 months ago
In our last video, we explored eight distinct algorithms aimed at improving the efficiency of the attention mechanism by minimizing its memory and arithmetic complexity. In this video, we present a PyTorch implementation of four of those algorithms that have linear complexity with respect to sequence length. This demonstration is intended to help you get familiar with the underlying concepts. Link...
Efficient Self-Attention for Transformers
3.5K views · 11 months ago
The memory and computational demands of the original attention mechanism increase quadratically as sequence length grows, rendering it impractical for longer sequences. However, various methods have been developed to streamline the attention mechanism's complexity. In this video, we'll explore some of the most prominent models that address this challenge. #transformers Link to the activation fu...
Variants of Multi-head attention: Multi-query (MQA) and Grouped-query attention (GQA)
7K views · 11 months ago
PostLN, PreLN and ResiDual Transformers
1.9K views · a year ago
Transformer Architecture
7K views · a year ago
Top Optimizers for Neural Networks
9K views · a year ago
A Dive Into Multihead Attention, Self-Attention and Cross-Attention
29K views · a year ago
Self-Attention Using Scaled Dot-Product Approach
16K views · a year ago
GPT-4 release: a 5-minute overview
384 views · a year ago
Matrix Multiplication Concept Explained
4.4K views · a year ago
A Review of 10 Most Popular Activation Functions in Neural Networks
12K views · a year ago

COMMENTS

  • @klaverhenrik · 3 days ago

    Your videos are amazing! Clear and well-structured. Are the slides available anywhere?

  • @franciscosaavedra6105 · 3 days ago

    Please tune your audio.

  • @Sathishreddy-fe3cp · 3 days ago

    Can you suggest a good book on the mathematics behind neural networks?

  • @Professor_The_Trader · 4 days ago

    Thanks, amazing!

  • @malekricci8525 · 5 days ago

    Is the Swin Transformer applied to classification tasks?

  • @artianna727 · 6 days ago

    Thanks for the great content!

  • @douglasholman6300 · 10 days ago

    This is wonderful, thank you. How did you prepare the visuals and diagrams? What tool did you use? This video is of exceptional quality.

  • @planetjam3529 · 12 days ago

    Great presentation.

  • @Sathishreddy-fe3cp · 15 days ago

    I want to practice all the optimizers with different activation functions, with some math problems and in Python. Could you please suggest a good book?

  • @bathalamallikarjuna2316 · 27 days ago

    Great content ❤🎉

  • @siddharthvj1 · 28 days ago

    Fine-tuning, please! More videos on fine-tuning.

    • @PyMLstudio · 28 days ago

      Absolutely! Right after the RAG video, I'll make a series of fine-tuning videos. Thanks for the suggestion. 👍🏻

  • @praveenchandra-i8f · a month ago

    Very nice presentation, sir. Can you share the PPT? It would be useful for me.

  • @yabezD · a month ago

    Could you post a video on DeiT?

    • @PyMLstudio · a month ago

      I have already covered DeiT in this video: Variants of ViT: DeiT and T2T-ViT ua-cam.com/video/h_VwFYDucP8/v-deo.html

    • @yabezD · a month ago

      @@PyMLstudio An in-depth explanation including the flow, formulas, and so on would be helpful, sir.

  • @AI_For_Scientists · a month ago

    Great video series on ViT and its derivatives; watched all of it. Thank you very much for sharing.

  • @karthikn789 · 2 months ago

    AdamW and the Yogi optimizer could be considered universal first-order optimizers. I have experimented on many multi-modal tasks, and these two have consistently performed well.
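
    For anyone who wants to try this: AdamW ships with PyTorch, while Yogi does not (it's available in third-party packages such as pytorch-optimizer). A minimal, assumed training step with AdamW:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        model = nn.Linear(10, 1)  # stand-in model for illustration
        opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)

        x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy batch
        loss = F.mse_loss(model(x), y)
        loss.backward()
        opt.step()       # decoupled weight decay is applied here
        opt.zero_grad()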

  • @SaptarshiBasu2004 · 2 months ago

    Very nice video. Thank you!

  • @mehdimohsenimahani4150 · 2 months ago

    great great 👍💯

  • @scottunique · 2 months ago

    Very well explained, and while simple, the illustrations are exceptionally clear. Excellent work.

  • @randomstuff39280 · 2 months ago

    Thank you for explaining! Very clear! But I'm wondering: how do you know the WiT dataset is based on 50,000 queries and 20,000 pairs for each query? I can't find it in the paper.

    • @PyMLstudio · 2 months ago

      Thanks for the comment! Please see page 3, Section 2.2 ("Creating a sufficiently large dataset"). But it's 500,000 queries, balanced with up to 20,000 (image, text) pairs per query.

  • @parsaforoozmand8936 · 2 months ago

    Great video

  • @anastasia_wang17 · 2 months ago

    Hi, thanks for the amazing video. One question: I get d, q, k, v, but I didn't get the notation W.

    • @Lesoleil370 · 2 months ago

      Thanks. W is a learnable matrix used to get q, k, and v. To get q, we use q = W_q x, and similarly for k and v: k = W_k x, v = W_v x.

    • @anastasia_wang17 · 2 months ago

      @@Lesoleil370 thank you!
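
    A minimal PyTorch sketch of those projections (the sequence length and d_model below are illustrative):

        import torch
        import torch.nn as nn

        d_model = 256
        x = torch.randn(10, d_model)  # 10 tokens, each a d_model-dim embedding

        # W_q, W_k, W_v as learnable linear maps (biases omitted for clarity)
        W_q = nn.Linear(d_model, d_model, bias=False)
        W_k = nn.Linear(d_model, d_model, bias=False)
        W_v = nn.Linear(d_model, d_model, bias=False)

        q, k, v = W_q(x), W_k(x), W_v(x)  # q = W_q x, k = W_k x, v = W_v x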

  • @alirezakamyab851 · 3 months ago

    I had been looking for that \bar{Q} for hours; I couldn't figure out what "tensor reshaping can be used ..." meant in the original paper. Thank you!

    • @PyMLstudio · 3 months ago

      Thank you for the comment; glad the video was useful for clarifying the concepts.

  • @fayezalhussein7115 · 3 months ago

    Great

  • @mohamedkassar7441 · 3 months ago

    Thanks

  • @milanvasilic4510 · 3 months ago

    Great explanation, thx :). So the head number just tells me how many weight matrices I have for K, Q, and V?

  • @jiahao2709 · 3 months ago

    May I know how you make these videos?

  • @diasposangare1154 · 3 months ago

    Hi sir, could I please have access to the PowerPoint?

  • @empanadeuwu · 3 months ago

    You saved me. Thanks, and greetings from Chile!

    • @PyMLstudio · 3 months ago

      Sure, I'm happy that the video was useful.

  • @brahmgaur4731 · 3 months ago

    I suggest you consider showing your face. Your content is awesome and really helpful, but for your channel to grow, you should give the audience a personal touch, which in this case is your visible presence. Look, I don't read channel names each time I'm looking for content to study, but I have seen that each channel follows a unique way of presenting its content, which adds credibility to it. If I know you teach well and I see your face in the thumbnail of a video related to the content I'm looking for, I would jump to it in no time. That's my opinion, but the choice is yours. - A student and well-wisher.

    • @PyMLstudio · 3 months ago

      You are right 👌🏻, that would make a lot of difference. I also wanted to work on my channel page but didn't get a chance to. Thank you for the suggestion 🙏🏻

  • @temanangka3820 · 3 months ago

    6:00 Does each attention head only process part of the embedded tokens? Example: say there are 100 tokens and 2 attention heads. Does each head only process 50 tokens? If yes, how can we make sure each head understands the whole context of the sentence while it only consumes half of it?

    • @PyMLstudio · 3 months ago

      That's a great question! Multihead attention splits the feature dimension, not the sequence dimension. That way, each head is able to see the entire sequence but works on a smaller feature size. Example: the input is 100 tokens and each embedding vector is 256-dimensional. Then, with 8 heads, each head will process tensors of size 100x16.

    • @temanangka3820 · 3 months ago

      @@PyMLstudio Understood... Great explanation. Thank you, bro!

    • @Kevoshea · 5 days ago

      100x16 or 100x32?
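
    To settle the arithmetic: 256 / 8 = 32, so each head processes tensors of size 100x32. A minimal sketch of the feature-dimension split, using the shapes from the example above:

        import torch

        seq_len, d_model, n_heads = 100, 256, 8
        d_head = d_model // n_heads  # 256 // 8 = 32 features per head

        x = torch.randn(seq_len, d_model)

        # split the feature dimension, not the sequence dimension:
        # every head still sees all 100 tokens
        heads = x.view(seq_len, n_heads, d_head).transpose(0, 1)
        print(heads.shape)  # torch.Size([8, 100, 32])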

  • @zeroone1217 · 3 months ago

    Thank you for making it so easy to grasp the mathematical concepts!

    • @PyMLstudio · 3 months ago

      Glad you found the videos useful.

  • @raghuvallikkat3384 · 3 months ago

    Thanks. Can you please explain what the dimensionality "d" means?

    • @PyMLstudio · 3 months ago

      Sure, d (or d_model) refers to the size of the hidden units in each layer. It's the size of each embedding vector, as well as of the input and output of each layer. The size of the queries, keys, and values is d/h, because multihead attention splits the input of size d across the h heads.

  • @fouziaanjums6475 · 4 months ago

    Please cover the FasterViT model too...

    • @PyMLstudio · 3 months ago

      Absolutely, I'll cover that. I have a few other topics lined up, then I'll get to FasterViT. Thanks for the suggestion!

  • @temanangka3820 · 4 months ago

    How do we get the matrices Q, K, and V?

    • @PyMLstudio · 3 months ago

      Starting from the very first step, we tokenize the input sequence and pass the sequence of tokens to an embedding layer. Fast-forwarding, these embeddings reach the attention block as its input; let's call them tensor X. Inside the attention block, we have 3 learnable matrices, Wq, Wk, and Wv; we multiply each matrix with X and get Q, K, and V, respectively.
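
    A hedged end-to-end sketch of that pipeline, with scaled dot-product attention on top (the vocabulary size, dimensions, and token ids are made up for illustration):

        import math
        import torch
        import torch.nn as nn

        vocab_size, d_model = 1000, 64
        embed = nn.Embedding(vocab_size, d_model)

        token_ids = torch.tensor([5, 42, 7])  # a tokenized 3-token sequence
        X = embed(token_ids)                  # (3, d_model): input to the attention block

        # the 3 learnable matrices inside the attention block
        W_q = nn.Linear(d_model, d_model, bias=False)
        W_k = nn.Linear(d_model, d_model, bias=False)
        W_v = nn.Linear(d_model, d_model, bias=False)
        Q, K, V = W_q(X), W_k(X), W_v(X)      # each (3, d_model)

        scores = Q @ K.T / math.sqrt(d_model)    # (3, 3) pairwise scores
        out = torch.softmax(scores, dim=-1) @ V  # (3, d_model) attention output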

  • @KumR · 4 months ago

    Wow.. So nice.

  • @paktv858 · 4 months ago

    What is the difference between self-attention and multi-head self-attention? Are they the same, except that multi-head attention uses multiple heads instead of a single one?

  • @me-ou8rf · 4 months ago

    Can you suggest some materials on how Transformers can be applied to time-series data like EEG?

  • @hosseindahaee2886 · 4 months ago

    Thanks for your concise and insightful description.🙏

  • @SebastianRaschka · 4 months ago

    Very nice video! I can also imagine that predicting the caption text exactly isn't only more difficult, but would also be more likely to result in (more) overfitting if it is learned this way. At 5:43, the pair-wise similarities: they are basically like cross-attention scores?

    • @PyMLstudio · 4 months ago

      Yes, in a way it's analogous to cross-attention, taking the dot product between the features from the text encoder and the image encoder. This dot-product similarity is used as the final output of the model to determine whether an image and a text caption are related. Good question, thanks for the comment!
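
    A minimal sketch of that dot-product similarity (the batch size and feature dimension are assumed, and the two encoders are stubbed out with random features):

        import torch
        import torch.nn.functional as F

        n, d = 4, 512  # 4 (image, text) pairs with d-dim features
        img_feats = F.normalize(torch.randn(n, d), dim=-1)  # from the image encoder
        txt_feats = F.normalize(torch.randn(n, d), dim=-1)  # from the text encoder

        # entry (i, j) scores image i against caption j;
        # matching pairs lie on the diagonal
        logits = img_feats @ txt_feats.T  # (4, 4) cosine similarities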

  • @benji6296 · 4 months ago

    What would be the advantage of these methods vs. FlashAttention? FlashAttention speeds up the computation, and it is an exact computation, while most of these methods are approximations. If possible, I would like to see a video explaining other attention types such as PagedAttention and FlashAttention. Great content :)

    • @PyMLstudio · 4 months ago

      Thank you for the suggestion! You're absolutely right. In this video, I focused on purely algorithmic approaches, not hardware-aware solutions like FlashAttention. FlashAttention is an IO-aware exact attention algorithm that uses tiling to reduce memory reads/writes between GPU memory levels, which results in a significant speedup without sacrificing model quality. I appreciate your input and will definitely consider making a video explaining FlashAttention!

    • @PyMLstudio · 2 months ago

      Thanks for the suggestion! I made a new video on FlashAttention: "FlashAttention: Accelerate LLM training" ua-cam.com/video/LKwyHWYEIMQ/v-deo.html. I would love to hear your comments and any other suggestions you have.
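
    A toy sketch of the tiling idea behind FlashAttention: an online, numerically stable softmax computed block by block over the keys and values, so the full attention matrix is never materialized. This illustrates the math only, not the fused CUDA kernel, and the 1/sqrt(d) scaling is omitted:

        import torch

        def tiled_attention_row(q, K, V, block=4):
            """Exact attention output for one query, streaming over K/V blocks."""
            m = torch.tensor(float("-inf"))  # running max of the scores
            denom = torch.tensor(0.0)        # running softmax denominator
            acc = torch.zeros(V.shape[1])    # running weighted sum of values
            for i in range(0, K.shape[0], block):
                s = K[i:i + block] @ q                 # scores for this block
                m_new = torch.maximum(m, s.max())
                scale = torch.exp(m - m_new)           # rescale earlier partial sums
                p = torch.exp(s - m_new)
                denom = denom * scale + p.sum()
                acc = acc * scale + p @ V[i:i + block]
                m = m_new
            return acc / denom

        # agrees with the one-shot softmax(K @ q) @ V
        q, K, V = torch.randn(8), torch.randn(16, 8), torch.randn(16, 8)
        print(torch.allclose(tiled_attention_row(q, K, V),
                             torch.softmax(K @ q, dim=0) @ V, atol=1e-5))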

  • @agenticmark · 4 months ago

    You left out step and sine :D