Niels Rogge
Joined 23 Jan 2011
Fine-tune PaliGemma for image to JSON use cases
In this tutorial, I'll showcase how to fine-tune PaliGemma, a new open vision-language model by Google, on a receipt-image-to-JSON use case. The goal is for the model to learn to output a JSON containing all key fields from a receipt, such as the product items, their prices and quantities.
Do note that PaliGemma is just one of many vision-language models released recently.
The notebook can be found here: github.com/NielsRogge/Transformers-Tutorials/tree/master/PaliGemma
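For readers who want a quick starting point, here is a minimal sketch of running a (fine-tuned) PaliGemma checkpoint on a receipt image with the Transformers library; the repo id, prompt and image path are placeholders, not taken from the tutorial notebook:

```python
# Minimal sketch (not the tutorial notebook): run a PaliGemma checkpoint on a receipt image.
# The repo id, prompt and image path below are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"  # or your fine-tuned receipt-to-JSON checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("receipt.jpg").convert("RGB")
prompt = "extract JSON"  # task prefix used during fine-tuning (placeholder)

inputs = processor(text=prompt, images=image, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=256)

# Drop the prompt tokens and decode only the newly generated ones.
output = processor.batch_decode(
    generated_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(output)  # after fine-tuning, this should be a JSON-like string with the key fields
```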
Views: 7,366
Videos
Transformers demystified: how do ChatGPT, GPT-4, LLaMa work?
Views: 13K · 8 months ago
In this video, I explain in detail how large language models (LLMs) like GPT-2, ChatGPT, LLaMa, GPT-4, Mistral, etc. work, by going over the code as they are implemented in the Transformers library by Hugging Face. We start by converting text into so-called input_ids, which are integer indices in the vocabulary of a Transformer model. Internally, those get converted into so-called "hidden state...
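As a rough illustration of that pipeline (not code from the video), here is a minimal sketch with GPT-2 showing text → input_ids → hidden states → next-token prediction:

```python
# Minimal sketch of the pipeline described above, using GPT-2 from Hugging Face Transformers.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# 1. Text is turned into input_ids: integer indices into the model's vocabulary.
inputs = tokenizer("Hello, my name is", return_tensors="pt")
print(inputs["input_ids"])  # e.g. tensor([[15496, 11, 616, 1438, 318]])

# 2. Internally, each input_id is embedded and refined into a hidden state per token.
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
print(outputs.hidden_states[-1].shape)  # (batch, sequence_length, hidden_size)

# 3. The final hidden states are projected to logits over the vocabulary;
#    the argmax at the last position is the greedy next-token prediction.
next_token_id = outputs.logits[:, -1, :].argmax(dim=-1)
print(tokenizer.decode(next_token_id))
```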
Creating your own ChatGPT: Supervised fine-tuning (SFT)
Views: 14K · 9 months ago
This video showcases how one can perform supervised fine-tuning (SFT for short) on an open-source large language model. Supervised fine-tuning, also called instruction tuning, takes a base model that has been pre-trained on billions of tokens from the web, and turns it into a useful chatbot by training on human-written instructions and corresponding completions. The video uses the following not...
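The core mechanic is easy to sketch: train a causal language model on instruction/completion pairs while masking the instruction tokens out of the loss. The snippet below is a simplified illustration of that idea, not the notebook used in the video:

```python
# Sketch of the core idea behind supervised fine-tuning: train a causal LM on
# (instruction, completion) pairs, computing the loss only on the completion tokens.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "### Instruction:\nTranslate 'hello' to French.\n\n### Response:\n"
completion = "Bonjour" + tokenizer.eos_token

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
completion_ids = tokenizer(completion, return_tensors="pt").input_ids

input_ids = torch.cat([prompt_ids, completion_ids], dim=-1)
# Labels of -100 mask the prompt tokens so they don't contribute to the loss.
labels = torch.cat([torch.full_like(prompt_ids, -100), completion_ids], dim=-1)

outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss)  # this is what an SFT training loop would minimize
```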
Training and deploying open-source large language models
Views: 18K · 10 months ago
2023 was an exciting year for open-source AI, with many releases like Meta's LLaMa-2 and Mistral.ai's Mixtral-8x7B. This talk is a high-level overview of what happened and how you can go ahead and train and deploy one of these models. The slides were originally made for the NLP meetup in December 2023, but as no recording was made, I decided to record a video going over the slides. Re...
Contributing a model to HF series: part 6
Views: 371 · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. In this video, we continue by improving the conversion script, and converting the giant checkpoint. DINOv2 m...
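For illustration, a conversion script of this kind largely boils down to renaming the keys of the original state dict to the names the Hugging Face implementation expects. The rename rules and file path below are hypothetical examples, not the actual DINOv2 mapping:

```python
# Illustrative sketch of a checkpoint conversion script: rename the original state dict keys
# to the names the HF implementation expects, then load them into the HF model.
import torch

def rename_key(name: str) -> str:
    # Hypothetical examples of original -> HF key renames.
    name = name.replace("patch_embed.proj", "embeddings.patch_embeddings.projection")
    name = name.replace("blocks.", "encoder.layer.")
    return name

# Placeholder path to an original checkpoint downloaded from the DINOv2 repo.
original_state_dict = torch.load("dinov2_checkpoint.pth", map_location="cpu")
new_state_dict = {rename_key(k): v for k, v in original_state_dict.items()}

# hf_model.load_state_dict(new_state_dict, strict=True)  # strict=True surfaces any mismatch
```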
Contributing a model to HF series: part 5
Views: 264 · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. In this video, we continue by implementing the image processor, which can be used to prepare images for the ...
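Once the image processor exists, using it looks roughly like the sketch below (the image path is a placeholder):

```python
# Minimal sketch of using an image processor to prepare an image for DINOv2.
from PIL import Image
from transformers import AutoImageProcessor, Dinov2Model

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = Dinov2Model.from_pretrained("facebook/dinov2-base")

image = Image.open("cat.jpg").convert("RGB")  # placeholder image path
# The image processor resizes, rescales and normalizes the image into pixel_values.
inputs = processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # (batch, channels, height, width)

outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, num_patches + 1, hidden_size)
```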
Contributing a model to HF series: part 4
Views: 341 · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. In this video, we continue by implementing the model tests, leveraging pytest as test framework. We also imp...
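A typical test of this kind instantiates a tiny, randomly initialized model from a small config and checks output shapes. Below is a minimal pytest-style sketch, assuming the standard Dinov2Config arguments:

```python
# Sketch of a pytest-style shape test for a ported model.
import torch
from transformers import Dinov2Config, Dinov2Model

def test_dinov2_forward_shape():
    # Tiny config so the test runs fast on CPU.
    config = Dinov2Config(
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=4,
        image_size=56,
        patch_size=14,
    )
    model = Dinov2Model(config)
    pixel_values = torch.randn(1, 3, config.image_size, config.image_size)

    outputs = model(pixel_values=pixel_values)

    num_patches = (config.image_size // config.patch_size) ** 2
    # Sequence length = number of patches + 1 for the [CLS] token.
    assert outputs.last_hidden_state.shape == (1, num_patches + 1, config.hidden_size)
```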
Contributing a model to HF series: part 3
Views: 538 · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. In this video, we continue implementing the Hugging Face model, by converting all attention layers of the Tr...
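The gist of that conversion: the original DINOv2 attention uses one fused qkv projection, whereas the Hugging Face implementation has separate query/key/value layers, so the fused weights have to be split. A hypothetical sketch (tensor and layer names are illustrative):

```python
# Illustrative sketch of splitting a fused qkv projection into separate q/k/v weights.
import torch

hidden_size = 768
fused_qkv_weight = torch.randn(3 * hidden_size, hidden_size)  # stand-in for blocks.0.attn.qkv.weight

query_weight = fused_qkv_weight[:hidden_size, :]
key_weight = fused_qkv_weight[hidden_size : 2 * hidden_size, :]
value_weight = fused_qkv_weight[2 * hidden_size :, :]

# These slices would then be assigned to the separate HF attention layers,
# e.g. encoder.layer.0.attention.attention.{query,key,value}.weight (names illustrative).
```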
Contributing a model to HF series: part 2
Views: 1.2K · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. In this video, we use the "transformers-cli add-new-model-like" command provided by Hugging Face, which spee...
Contributing a model to HF series: part 1
Views: 3.2K · 1 year ago
In this tutorial series, I go over all the steps to contribute a model to Hugging Face Transformers, one of the most popular machine learning libraries in the world. In the tutorial, the goal is to port the DINOv2 model from Meta AI (a state-of-the-art vision Transformer) to the library. As a first step, it's important to get familiar with the original code base, and run a forward pass with the...
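A minimal sketch of that first step: load the original model via torch.hub and run a dummy forward pass, keeping the output so it can later be compared against the ported implementation.

```python
# Sketch: load the original DINOv2 implementation via torch.hub and run a forward pass.
import torch

original_model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
original_model.eval()

pixel_values = torch.randn(1, 3, 224, 224)  # dummy image; 224 is divisible by the patch size 14
with torch.no_grad():
    original_output = original_model(pixel_values)

print(original_output.shape)  # keep this tensor to verify the converted HF model matches
```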
Git-version, host and share any custom PyTorch model using HuggingFace
Views: 877 · 1 year ago
Made a video to showcase the underrated Mixin classes (like PyTorchModelHubMixin) that let you host, document and version any custom ML model on the Hugging Face Hub. This can be a lot more convenient than having your models cluttered on Google Drive/Dropbox etc.! Docs: huggingface.co/docs/huggingface_hub/package_reference/mixins
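A minimal sketch of the pattern (the model definition and repo id are placeholders):

```python
# Sketch: make any custom PyTorch model pushable to / loadable from the Hugging Face Hub.
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class MyModel(nn.Module, PyTorchModelHubMixin):
    def __init__(self, hidden_size: int = 128, num_classes: int = 10):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.linear(x)

model = MyModel(hidden_size=128, num_classes=10)

# Push the weights (with git-based versioning) to the Hub; repo id is a placeholder.
model.push_to_hub("your-username/my-custom-model")

# Anyone can then reload the model directly from the Hub.
reloaded = MyModel.from_pretrained("your-username/my-custom-model")
```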
How a Transformer works at inference vs training time
Views: 57K · 1 year ago
I made this video to illustrate the difference between how a Transformer is used at inference time (i.e. when generating text) vs. how a Transformer is trained. Disclaimer: this video assumes that you are familiar with the basics of deep learning, and that you've used HuggingFace Transformers at least once. If that's not the case, I highly recommend this course: cs231n.stanford.edu/ which will ...
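To make the contrast concrete, here is a small sketch with T5 (an encoder-decoder model): at inference time the decoder generates autoregressively, while training uses a single teacher-forced forward pass over the full target.

```python
# Sketch of inference vs. training with an encoder-decoder Transformer (T5).
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to French: I love cats.", return_tensors="pt")

# Inference: the decoder generates one token per step, each time conditioning
# on the encoder outputs and the tokens generated so far.
generated_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

# Training: the ground-truth target is fed to the decoder in a single forward pass
# (teacher forcing), and the loss is computed over all target positions at once.
labels = tokenizer("J'aime les chats.", return_tensors="pt").input_ids
outputs = model(**inputs, labels=labels)
print(outputs.loss)
```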
Fine-tune ConvNeXT for image classification [end-to-end tutorial]
Views: 4.7K · 2 years ago
Hi guys! In this video, I show how to fine-tune ConvNeXT, a state-of-the-art model by Facebook AI for image classification on a custom dataset. I leverage the new 🤗 ImageFolder feature, and train the model in native PyTorch. Timestamps: 0:55 Intro to ConvNeXT 1:35 Loading a custom dataset using ImageFolder 6:40 Push dataset to the 🤗 hub 8:57 Process dataset 18:10 Define model 21:20 Prepare trai...
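A condensed sketch of those steps (the dataset path is a placeholder; the video then trains in native PyTorch):

```python
# Condensed sketch of the main steps: load an ImageFolder dataset, prepare images,
# and load ConvNeXT with a new classification head.
from datasets import load_dataset
from transformers import AutoImageProcessor, ConvNextForImageClassification

dataset = load_dataset("imagefolder", data_dir="path/to/images")  # placeholder path
labels = dataset["train"].features["label"].names

processor = AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224")
model = ConvNextForImageClassification.from_pretrained(
    "facebook/convnext-tiny-224",
    num_labels=len(labels),
    id2label={i: l for i, l in enumerate(labels)},
    label2id={l: i for i, l in enumerate(labels)},
    ignore_mismatched_sizes=True,  # replace the 1000-class ImageNet head with a new one
)

def transform(batch):
    # Resize/normalize the PIL images into the pixel_values the model expects.
    batch["pixel_values"] = processor(images=batch["image"], return_tensors="pt")["pixel_values"]
    return batch

dataset = dataset.with_transform(transform)
# From here, train in native PyTorch (or with the Trainer) as shown in the video.
```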
Excellent video. Why do we want a different post-embedding vector for the same token in the decoder versus the encoder? reference 12:34
Awesome! Thank you Niels!
the best, thank u so much
what's the tool you used for plotting these figures
i really wanted this exact content and i found you, thank you.
Great video! Quick question: does AutoTokenizer work the same as the tiktoken lib? Or maybe it is the same and it loads the tiktoken.get...("gpt2") as the tokenizer? Also, do we need torch.softmax before the argmax? Weird, because everybody says to do it but nobody really normalizes the output.
awesome!
Hi, do we also apply masking during inference?
thanks for sharing, NR God
Never seen such a video, best explanation line by line
is there any way to get the whole graph you've drawn?😀
The best explanations, your channel is a gem ❤
Awesome job explaining how Inference works. This clarified my confusion about most videos which largely discuss only pre-training. 🙏
Do you have a patreon or some other way to support this work?
Thank you for this tutorial 😀! I still have a question about the benefit of the "json2token" function: can't we simply transform the ground truth dictionaries into a string directly and feed them to the tokenizer? Why do we need the special tokens <s_{k}>?
Great Explanation!!
Hey this is a cool video. Do you have any resources to get started on HF contribution?
Great video! waiting for the benefits of using past_key_values and transformer tools on fine tuning
Are your notes from this video available anywhere online? Really liked the video and would love to add your notes to my personal study notes as well
I am new to this, I am just trying to understand if this is during inference or training. I guess it is during inference; please correct me.
Thanks for the awesome tutorial, can this be run in colab T4 GPUs (16GB)?
Thanks for your video.
Hi sir, I have trained the model in the exact way you showed. While prompting, it takes me around 15 mins to get one response. Is the inference speed the same for you? If yes, how to optimize this?
Very informative video, thanks for it. I was trying to do some stuff like this myself; can you help me understand how I can fine-tune a pretrained model into an instruct version while also adding a new language to it? I have looked at the tokenization part and thankfully the tokens are not so separated that it would cause much trouble. However, I want to know whether the new layers can learn the language as well as the skill of following instructions. Thanks again.
Great video, very comprehensible explanation of a complex subject.
can you train my dataset?
Superb content. However, if I want to add specific key elements to be extracted from custom data, how will I do it? Please reply.
Best video I've seen so far about fine-tuning with Hugging Face. I appreciate that you also take time to explain some of the concepts behind it. Other youtubers are just reading their notebooks out loud...
I didn't find a lot of resources that include both drawings of the process, as well as code examples / snippets that demonstrate the drawings practically. Thank you, this helps me a lot :)
@Niels Rogge It is amazing content and I have been following you since the time of the Donut tutorial, keep going buddy. I have one question: is it possible to get the snippet extractions also from the invoices? Please let me know and reply.
You're a beast mate. Thanks!
amazing video, great explanation
no more runpod?
Hi, why is the attention mask added to the attention weights instead of multiplied (1h00:11)? If you add zero to an attention weight, that weight will not be ignored.
Thanks for making this super in-depth tutorial Niels! And very nice dataset/task you chose, I like it.
banger video as always 🔥
Is this model robust against incomplete labels/ground truth? Let's say for some images you have all the details and for others not so many.
Excellent overview of how the encoder-decoder work together. Thanks.
Thank you very much for the explanation, Niels. It was excellent. I have just one question regarding 'conditioning the decoder' during inference: how exactly does it work? Does it operate in the same way as during training, i.e., the encoder hidden states are projected into queries, keys, and values, and then the dot products between the decoder and encoder hidden states are computed to generate the new hidden states? It seems like a lot of calculations to me, and in this way the text generation process would be very slow, wouldn't it?
Thank you so much for the amazing tutorial!
This is awesome, thanks for sharing, I'm only 5 minutes in right now but will definitely follow it all later. These types of tutorials are gold-dust, calm, concise and not just for entertainment.
Thank you for this excellent presentation. In the future, can you do a video on how to take two different HF models and merge them, like this one.
Amazingg. Such a comprehensive and detailed video. Loved it
Wow, Look who's back 🔥
superb video. I will recommend this video in my next lab presentation.
Fantastic breakdown, thank you Niels
Thank you, very well explained. Before I only had a rough understanding, but now I'm much clearer on the details. Many thanks, love from China.
I learned a lot thank you.
Transformers are "COMPLICATED"? Not really after this video. Thanks.
Just wonderful. My search ends.