Niels Rogge
Fine-tune PaliGemma for image to JSON use cases
In this tutorial, I'll showcase how to fine-tune PaliGemma, a new open vision-language model by Google, on a receipt image-to-JSON use case. The goal is for the model to learn to output a JSON containing all the key fields from a receipt, such as the product items, their prices, and quantities.
Do note that PaliGemma is just one of many vision-language models released recently.
The notebook can be found here: github.com/NielsRogge/Transformers-Tutorials/tree/master/PaliGemma
Views: 7,366
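For context, a minimal sketch of what a fine-tuning step with the Transformers library can look like, assuming the google/paligemma-3b-pt-224 base checkpoint; receipt_image and target_json are placeholder variables for one training example, and the linked notebook is the authoritative version:

    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    model_id = "google/paligemma-3b-pt-224"  # assumed base checkpoint
    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

    # One example: a receipt image (PIL, placeholder) plus its ground-truth JSON string.
    inputs = processor(
        text="extract JSON.",   # task prompt
        images=receipt_image,   # placeholder PIL image
        suffix=target_json,     # the processor turns the suffix into labels
        return_tensors="pt",
    )
    loss = model(**inputs).loss  # plug into any standard PyTorch training loop
    loss.backward()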

Videos

Transformers demystified: how do ChatGPT, GPT-4, LLaMa work?
13K views · 8 months ago
In this video, I explain in detail how large language models (LLMs) like GPT-2, ChatGPT, LLaMa, GPT-4, Mistral, etc. work, by going over the code as they are implemented in the Transformers library by Hugging Face. We start by converting text into so-called input_ids, which are integer indices in the vocabulary of a Transformer model. Internally, those get converted into so-called "hidden state...
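To make that first step concrete, a minimal sketch using GPT-2 (the printed values are illustrative):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Text -> input_ids: integer indices into the model's vocabulary.
    inputs = tokenizer("hello world", return_tensors="pt")
    print(inputs.input_ids)  # e.g. tensor([[31373, 995]])

    # Internally, the input_ids are embedded and transformed into hidden states.
    outputs = model(**inputs, output_hidden_states=True)
    print(outputs.hidden_states[-1].shape)  # (batch_size, sequence_length, hidden_size)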
Creating your own ChatGPT: Supervised fine-tuning (SFT)
14K views · 9 months ago
This video showcases how one can perform supervised fine-tuning (SFT for short) on an open-source large language model. Supervised fine-tuning, also called instruction tuning, takes a base model that has been pre-trained on billions of tokens from the web, and turns it into a useful chatbot by training on human-written instructions and corresponding completions. The video uses the following not...
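A minimal sketch of such a run using the trl library; the exact arguments vary between trl versions, and the model and dataset names here are only examples:

    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # An instruction dataset with chat-formatted "messages" (example name).
    dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

    trainer = SFTTrainer(
        model="mistralai/Mistral-7B-v0.1",  # base model pre-trained on web-scale data
        train_dataset=dataset,
        args=SFTConfig(output_dir="mistral-sft"),
    )
    trainer.train()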
Training and deploying open-source large language models
18K views · 10 months ago
2023 was an exciting year for open-source AI, with many releases like Meta's LLaMa-2 and Mistral.ai's Mixtral-8x7B. This talk is a high-level overview of what happened and how you can go ahead and train and deploy one of these models. The slides were originally made for the NLP meetup in December 2023, but as no recording was made, I decided to record a video going over the slides. Re...
Contributing a model to HF series: part 6
371 views · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. In this video, we continue by improving the conversion script, and converting the giant checkpoint. DINOv2 m...
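Roughly speaking, a conversion script renames the original state-dict keys to the names the Hugging Face implementation expects and then verifies the outputs still match. A hypothetical skeleton (the key names, hf_model, original_model and pixel_values are illustrative):

    import torch

    # The original (giant) checkpoint, as published in the DINOv2 repository.
    state_dict = torch.load("dinov2_vitg14_pretrain.pth", map_location="cpu")

    # Map original key prefixes to their Hugging Face counterparts.
    rename_map = {
        "patch_embed.proj": "embeddings.patch_embeddings.projection",  # illustrative
        # ... one entry per differing prefix
    }
    new_state_dict = {}
    for key, value in state_dict.items():
        for old, new in rename_map.items():
            key = key.replace(old, new)
        new_state_dict[key] = value

    hf_model.load_state_dict(new_state_dict)

    # Sanity check: the two implementations must agree on the same input.
    with torch.no_grad():
        assert torch.allclose(original_model(pixel_values),
                              hf_model(pixel_values).pooler_output, atol=1e-4)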
Contributing a model to HF series: part 5
264 views · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. In this video, we continue by implementing the image processor, which can be used to prepare images for the ...
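The image processor's job is to resize, rescale and normalize raw images into pixel_values tensors. With the port now merged, usage looks roughly like this (the image URL is the usual docs example):

    import requests
    from PIL import Image
    from transformers import AutoImageProcessor

    processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")

    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    inputs = processor(images=image, return_tensors="pt")
    print(inputs.pixel_values.shape)  # e.g. (1, 3, 224, 224): resized, rescaled, normalized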
Contributing a model to HF series: part 4
341 views · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. In this video, we continue by implementing the model tests, leveraging pytest as test framework. We also imp...
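The real tests subclass the library's shared mixins (such as ModelTesterMixin), but the idea can be sketched as a standalone pytest test with a deliberately tiny random-weight config:

    import pytest
    import torch
    from transformers import Dinov2Config, Dinov2Model

    @pytest.fixture
    def model():
        # A tiny configuration so the test runs fast.
        config = Dinov2Config(hidden_size=32, num_hidden_layers=2, num_attention_heads=4,
                              image_size=224, patch_size=16)
        return Dinov2Model(config)

    def test_forward_pass(model):
        pixel_values = torch.randn(1, 3, 224, 224)
        outputs = model(pixel_values)
        # (224 / 16)^2 = 196 patches, plus 1 CLS token = 197 positions
        assert outputs.last_hidden_state.shape == (1, 197, 32)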
Contributing a model to HF series: part 3
538 views · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. In this video, we continue implementing the Hugging Face model, by converting all attention layers of the Tr...
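For example, the original implementation fuses the query/key/value projections into a single qkv layer, while the Hugging Face port keeps them separate, so the conversion has to split the weights. A sketch, where the key names are illustrative and state_dict and the block index i are assumed context:

    # The fused qkv weight has shape (3 * hidden_size, hidden_size); split it in three.
    qkv_weight = state_dict.pop(f"blocks.{i}.attn.qkv.weight")
    query, key, value = qkv_weight.chunk(3, dim=0)
    state_dict[f"encoder.layer.{i}.attention.attention.query.weight"] = query
    state_dict[f"encoder.layer.{i}.attention.attention.key.weight"] = key
    state_dict[f"encoder.layer.{i}.attention.attention.value.weight"] = value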
Contributing a model to HF series: part 2
1.2K views · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. In this video, we use the "transformers-cli add-new-model-like" command provided by Hugging Face, which spee...
Contributing a model to HF series: part 1
3.2K views · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. As a first step, it's important to get familiar with the original code base, and run a forward pass with the...
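DINOv2 is conveniently exposed via torch.hub, so obtaining a reference forward pass can look like this:

    import torch

    # Load the original model straight from the research repository.
    original_model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
    original_model.eval()

    # The image size must be a multiple of the patch size (14).
    pixel_values = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        reference_output = original_model(pixel_values)
    print(reference_output.shape)  # e.g. (1, 384) for ViT-S/14; the values the port must reproduce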
Git-version, host and share any custom PyTorch model using HuggingFace
877 views · 1 year ago
I made a video to showcase the underrated Mixin classes (like PyTorchModelHubMixin) that let you host, document and version any custom ML model on the Hugging Face Hub. This might be a lot more convenient than having your models cluttered on Google Drive/Dropbox etc.! Docs: huggingface.co/docs/huggingface_hub/package_reference/mixins
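In short (the repo id is a placeholder):

    import torch.nn as nn
    from huggingface_hub import PyTorchModelHubMixin

    # Adding the mixin gives any nn.Module save, push and load capabilities on the Hub.
    class MyModel(nn.Module, PyTorchModelHubMixin):
        def __init__(self, hidden_size: int = 64):
            super().__init__()
            self.linear = nn.Linear(hidden_size, hidden_size)

        def forward(self, x):
            return self.linear(x)

    model = MyModel()
    model.push_to_hub("your-username/my-custom-model")  # git-versioned upload
    reloaded = MyModel.from_pretrained("your-username/my-custom-model")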
How a Transformer works at inference vs training time
57K views · 1 year ago
I made this video to illustrate the difference between how a Transformer is used at inference time (i.e. when generating text) vs. how a Transformer is trained. Disclaimer: this video assumes that you are familiar with the basics of deep learning, and that you've used HuggingFace Transformers at least once. If that's not the case, I highly recommend this course: cs231n.stanford.edu/ which will ...
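The contrast in one sketch, using a small encoder-decoder model as an example:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    inputs = tokenizer("translate English to German: I love cats.", return_tensors="pt")

    # Inference: tokens are generated one at a time, each conditioned on the previous ones.
    generated_ids = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

    # Training: the full target is provided at once (teacher forcing), so the loss over
    # all positions is computed in a single parallel forward pass.
    labels = tokenizer("Ich liebe Katzen.", return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss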
Fine-tune ConvNeXT for image classification [end-to-end tutorial]
4.7K views · 2 years ago
Hi guys! In this video, I show how to fine-tune ConvNeXT, a state-of-the-art model by Facebook AI, for image classification on a custom dataset. I leverage the new 🤗 ImageFolder feature and train the model in native PyTorch. Timestamps: 0:55 Intro to ConvNeXT · 1:35 Loading a custom dataset using ImageFolder · 6:40 Push dataset to the 🤗 hub · 8:57 Process dataset · 18:10 Define model · 21:20 Prepare trai...
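The data-loading part, sketched with today's API names (the video predates AutoImageProcessor; the paths and repo ids are placeholders):

    from datasets import load_dataset
    from transformers import AutoImageProcessor, ConvNextForImageClassification

    # ImageFolder infers the labels from the directory layout (one subfolder per class).
    dataset = load_dataset("imagefolder", data_dir="path/to/your/images")
    dataset.push_to_hub("your-username/your-dataset")  # optional, as done in the video

    labels = dataset["train"].features["label"].names
    model = ConvNextForImageClassification.from_pretrained(
        "facebook/convnext-tiny-224",
        num_labels=len(labels),
        ignore_mismatched_sizes=True,  # replace the 1000-class ImageNet head
    )
    processor = AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224")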
LayoutLMv2 Gradio demo
1.5K views · 3 years ago

COMMENTS

  • @19AKS58 · 11 days ago

Excellent video. Why do we want a different post-embedding vector for the same token in the decoder versus the encoder? (See 12:34.)

  • @zyxie-e4m · 12 days ago

    Awesome! Thank you Niels!

  • @vincenrow7190 · 20 days ago

The best, thank you so much!

  • @YL-ln4ls · 29 days ago

What's the tool you used for plotting these figures?

  • @kamal9294 · 1 month ago

I really wanted this exact content and I found you, thank you.

  • @Dim-zt5ei · 1 month ago

Great video! Quick question: does AutoTokenizer work the same as the tiktoken lib? Or is it the same, and it loads tiktoken.get...("gpt2") as the tokenizer? Also, do we need torch.softmax before the argmax? Weird, because everybody says to do it, but nobody really normalizes the output.

  • @johnnyg6325 · 1 month ago

    awesome!

  • @algorithmo134 · 1 month ago

    Hi, do we also apply masking during inference?

  • @WaiPanTam · 1 month ago

    thanks for sharing, NR God

  • @nirmesh44 · 1 month ago

Never seen such a video; best explanation, line by line.

  • @RicardoMlu-tw2ig · 2 months ago

Is there any way to get the whole graph you've drawn? 😀

  • @ravindra1607 · 2 months ago

The best explanations, your channel is a gem ❤

  • @ARATHI2000 · 2 months ago

Awesome job explaining how inference works. This cleared up my confusion from most videos, which largely discuss only pre-training. 🙏

  • @richardhipsh7594 · 2 months ago

    Do you have a patreon or some other way to support this work?

  • @pablovitale6058 · 2 months ago

Thank you for this tutorial 😀! I still have a question about the benefit of the "json2token" function: can't we simply transform the ground-truth dictionaries into a string directly and feed them to the tokenizer? Why do we need the special tokens <s_{k}>?

  • @ajaykumargogineni3391 · 2 months ago

    Great Explanation!!

  • @Socials-d6w · 3 months ago

    Hey this is a cool video. Do you have any resources to get started on HF contribution?

  • @aminekidane5757 · 3 months ago

Great video! Waiting for the one on the benefits of using past_key_values, and on fine-tuning with the Transformers tooling.

  • @sophiacas · 3 months ago

    Are your notes from this video available anywhere online? Really liked the video and would love to add your notes to my personal study notes as well

  • @praneethkrishna6782 · 4 months ago

I am new to this; I am just trying to understand whether this is during inference or training. I guess it is during inference. Please correct me.

  • @rajdeepbanerjee6641 · 4 months ago

Thanks for the awesome tutorial. Can this be run on Colab T4 GPUs (16 GB)?

  • @yuzhen-o3h · 4 months ago

    Thanks for your video.

  • @sakshikumar7679 · 4 months ago

Hi sir, I have trained the model in the exact way you showed. While prompting, it takes me around 15 minutes to get one response. Is the inference speed the same for you? If so, how can I optimize this?

  • @ashwanidangwal6446 · 4 months ago

Very informative video, thanks for it. I was trying to do something like this myself; can you help me understand how I can fine-tune a pretrained model into an instruct version while also adding a new language to it? I have looked at the tokenization part, and thankfully the tokens are not so separated that it would cause much trouble. However, I want to know whether the new layers can learn the language as well as the skill of following instructions. Thanks again.

  • @henrik-ts · 4 months ago

    Great video, very comprehensible explanation of a complex subject.

  • @deepaks1356 · 4 months ago

Can you train my dataset?

  • @abhishekg4147 · 4 months ago

Superb content. However, if I want to add specific key elements to be extracted from custom data, how will I do it? Please reply.

  • @henrik-ts · 4 months ago

Best video I've seen so far about fine-tuning with Hugging Face. I appreciate that you also take time to explain some of the concepts behind it. Other YouTubers are just reading their notebooks out loud...

  • @vsucc3176 · 4 months ago

I didn't find a lot of resources that include both drawings of the process and code examples/snippets that demonstrate the drawings practically. Thank you, this helps me a lot :)

  • @abhishekg4147 · 4 months ago

@Niels Rogge, it is amazing content, and I have been following you since the Donut tutorial; keep going, buddy. I have one question: is it possible to also get the snippet extractions from the invoices? Please let me know and reply.

  • @thepierre2009 · 4 months ago

    You're a beast mate. Thanks!

  • @251_satyamrai4 · 4 months ago

Amazing video, great explanation.

  • @richardyim8914 · 5 months ago

No more RunPod?

  • @SergeBenYamin · 5 months ago

Hi, why is the attention mask added to the attention weights instead of multiplied (1:00:11)? If you add zero to the attention weights, they will not be ignored.

  • @lucasbeyer2985 · 5 months ago

    Thanks for making this super in-depth tutorial Niels! And very nice dataset/task you chose, I like it.

  • @jeffrey5602 · 5 months ago

    banger video as always 🔥

  • @miguelalba2106 · 5 months ago

Is this model robust against incomplete labels/ground truth? Let's say that for some images you have all the details, and for others not so many.

  • @forecenterforcustomermanag7715 · 5 months ago

Excellent overview of how the encoder and decoder work together. Thanks.

  • @lucasbandeira5392 · 5 months ago

Thank you very much for the explanation, Niels. It was excellent. I have just one question regarding 'conditioning the decoder' during inference: how exactly does it work? Does it operate in the same way it does during training, i.e., the encoder hidden states are projected into queries, keys, and values, and then the dot products between the decoder and encoder hidden states are computed to generate the new hidden states? It seems like a lot of calculations to me, and in this way the text generation process would be very slow, wouldn't it?

  • @junma7763 · 5 months ago

    Thank you so much for the amazing tutorial!

  • @OliNorwell · 5 months ago

    This is awesome, thanks for sharing, I'm only 5 minutes in right now but will definitely follow it all later. These types of tutorials are gold-dust, calm, concise and not just for entertainment.

  • @aamir122a · 5 months ago

Thank you for this excellent presentation. In the future, can you do a video on how to take two different HF models and merge them, like this one?

  • @salesgurupro · 5 months ago

Amazing. Such a comprehensive and detailed video. Loved it.

  • @taesiri · 5 months ago

    Wow, Look who's back 🔥

  • @richardyim8914 · 5 months ago

Superb video. I will recommend this video in my next lab presentation.

  • @RezaZaheri-ow3qm · 6 months ago

    Fantastic breakdown, thank you Niels

  • @徐迟-i2t · 6 months ago

Thank you, very well explained. Before, I only had a rough understanding; now the details are much clearer to me. Many thanks, love from China.

  • @omerali3320 · 6 months ago

I learned a lot, thank you.

  • @SanKum7 · 6 months ago

Transformers are "COMPLICATED"? Not really, after this video. Thanks.

  • @gstiwari · 6 months ago

Just wonderful. My search ends.