Prince Canuma
Poland
Joined Jun 7, 2012
Founder of Kulissiwa.com | Ex-ML Engineer at Neptune.ai.
In this channel, you'll learn about:
- MLOps (Machine Learning Operations),
- LLMs (Large Language Models),
- RAG (Retrieval-Augmented Generation) applications
Want AI project support? My in-house team offers customized solutions. Book a call below.
Coding Llama 3 from scratch in PyTorch - Part 2
In this video series, you will learn how to train and fine-tune a Llama 3 model from scratch.
The goal is to code LLaMA 3 from scratch in PyTorch to create models with sizes 3B, 6B, 22B, 45B, 35B and 45BM params. In this second video, you'll learn about continuous pretraining and LLM benchmarks, and you'll also get to see the results.
🤖 Models:
- Llama-3-6B-v0.1: huggingface.co/prince-canuma/Llama-3-6B-v0.1
- Llama-3-6B-v0.1 adapters: huggingface.co/prince-canuma/Llama-3-6B-v0.1-adapters
- Llama-3-6B-v0 (Untrained): huggingface.co/prince-canuma/Llama-3-6B-v0
📚Papers:
- LoRA: Low-Rank Adaptation of Large Language Models: arxiv.org/abs/2106.09685
- QLoRA: Efficient Finetuning of Quantized LLMs: arxiv.org/abs/2305.14314
💻 To follow along you can use this colab notebook:
- github.com/Blaizzy/Coding-LLMs-from-scratch/tree/main/Llama-3
🎥 Coding Llama 3 from scratch video series
Part 1: ua-cam.com/video/6nYfl_iOKFM/v-deo.html
Views: 2,497
Videos
Coding Llama 3 from scratch in PyTorch - Part 1
2.7K views · 1 month ago
In this video series, you will learn how to train and fine-tune a Llama 3 model from scratch. The goal is to code LLaMA 3 from scratch in PyTorch to create models with sizes 3B, 6B, 35B and 45BM params. In this first video, you'll learn about upcycling, downcycling and infini-attention. 📚Papers: - Sparse Upcycling Training Mixture-of-Experts from Dense Checkpoints : arxiv.org/abs/2212.05055 - Pre...
LLMOps: Deploying LLMs and Scaling using Modal, LangChain and Huggingface
399 views · 2 months ago
In this video, you'll learn about LLMOps, the practice of deploying and scaling LLMs using Modal, Langchain and Huggingface. In the rapidly evolving domain of Large Language Models (LLMs), businesses and researchers grapple with the challenges of efficiently deploying, monitoring and scaling these models. The operational complexities, from infrastructure management to ensuring context-aware res...
Coding Llama 2 from scratch in PyTorch - Part 3
1.1K views · 3 months ago
In this video series, you will learn how to train and fine-tune a Llama 2 model from scratch. The goal is to code LLaMA 2 from scratch in PyTorch to create models with sizes 100M, 250M and 500M params. In this third video, you'll learn about KV cache, RoPE, and Hugging Face Trainer in detail. 📋 KV cache: - ua-cam.com/video/80bIUggRJf4/v-deo.html 🪢 RoPE: - ua-cam.com/video/o29P0Kpobz0/v-deo.html - nn...
Get started with Command-R Cohere's new LLM: RAG and Tool Calling on Consumer GPUs
456 views · 3 months ago
In this video, you will learn how to do tool calling and RAG with ⌘-R while running it on a consumer GPU (e.g., 4090, A5000, T4) with just 24GB VRAM. Model weights 🧠 Transformers: huggingface.co/prince-canuma/c4ai-command-r-v01-4bit MLX-LM: huggingface.co/mlx-community/c4ai-command-r-v01-4bit
Claude 3: The GPT-4 Killer That Will Shock You!
427 views · 3 months ago
In this video, you'll learn how to use Anthropic's Claude 3 models to extract information from large documents, including the vision variants. ⚙️ Essential Tools We'll Be Using: - Anthropic LLM API: Gain access to Anthropic's cutting-edge language models through their powerful API. - Langchain: a framework for developing applications powered by language models. 🌟 What is Claude 3? Claude 3 is a...
Get started with Gemma Google's NEW open-source LLM model
3.1K views · 3 months ago
In this video, I'll show you how to summarize large PDF documents locally on your laptop using Gemma 2B and 7B instruct models. ⚙️ Essential Tools We'll Be Using: - MLX-LM: A library based on MLX, which is a framework for Machine learning research on your laptop or in a data center - by Apple. - Huggingface: The Hugging Face Hub is a platform with over 120k models, 20k datasets, and 50k demos i...
RAGOps: Advanced Retrieval Strategies with LangChain, Langsmith and Supabase.
2K views · 4 months ago
Learn to build advanced RAG applications in this video. I'll guide you through setting up each pipeline step, along with monitoring, evaluation, and enhancing your prompts and document processing. Also, get an inside look at how we optimize our RAG pipelines at Kulissiwa.com using LangChain, Langsmith, and Supabase. Follow me on: - LinkedIn: www.linkedin.com/in/prince-canuma/ - X: P...
This channel is pure gold. Keep it up!
So in this series, you don't use any pre-trained weights? You build and train the model from scratch on a custom dataset?
Removing every other layer or something along those lines would be much more effective. If you think about it, this just means that one layer needs to do the work of two layers (one layer + one missing layer). Whereas if you just lop off half the network you suddenly need to learn 16 layers worth of processing in one fell swoop. And not only that, but your old layers need to be retrained since it is no longer sufficient for them to just do their one layer of work they were doing before. Basically, removing every other layer is a finetune, lopping off half the network is a cataclysmic change that (almost) requires training a brand new model from scratch.
The only thing that saves this technique is using the learned embeddings / the learned output layer, but you get that with strided layer removal too. Wish I had seen this video earlier, I'd have saved you $500 lol.
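The contrast drawn in the comment above can be sketched as a choice of which layer indices to keep. This is only an illustration, not the method used in the video: the depth of 32 layers is an assumption inferred from the "16 layers" remark, the assumption that the second half is the one cut off is mine, and the helper names `truncate_layers`, `strided_layers` and `longest_dropped_run` are hypothetical.

```python
from typing import List


def truncate_layers(num_layers: int, keep: int) -> List[int]:
    """Keep the first `keep` layers and drop everything after them."""
    return list(range(keep))


def strided_layers(num_layers: int, stride: int = 2) -> List[int]:
    """Keep every `stride`-th layer, starting from layer 0."""
    return list(range(0, num_layers, stride))


def longest_dropped_run(kept: List[int], num_layers: int) -> int:
    """Length of the longest run of consecutive layers that were removed."""
    kept_set = set(kept)
    longest = current = 0
    for index in range(num_layers):
        if index in kept_set:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


num_layers = 32  # assumed depth, so that half the network is 16 layers

truncated = truncate_layers(num_layers, keep=num_layers // 2)
strided = strided_layers(num_layers, stride=2)

print("truncated keeps:", truncated)
print("strided keeps:  ", strided)
print("longest gap (truncated):", longest_dropped_run(truncated, num_layers))
print("longest gap (strided):  ", longest_dropped_run(strided, num_layers))
```

Both selections keep the same number of layers. The difference is the size of the gap the remaining layers have to bridge: one missing layer at a time with striding, versus one contiguous block of missing layers with truncation.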
😊
😊🎉
Amazing video! Could you please upload the training scripts as well?
such a high quality content piece
Can you please make sure that your future videos have higher resolution? Maybe 1440p or above? Other than that, great job! 💯
Well made Prince! Learned a lot
CS programmers are vampires. My eeeeyyyes. great content though
Why are there only 3 likes, I put 4 on HF.)
Why do you use a 32-bit paged optimizer when the model is being fine-tuned with QLoRA? Surely QLoRA stores the weights in 8bit double quantized form, so using a 32 bit optimizer makes no difference, and the weight updates need to be converted back to 8 bit anyway? Please help me understand this
Additionally, 8bit states are dequantized to 32bit for the update anyways. huggingface.co/docs/bitsandbytes/main/en/explanations/optimizers
@princecanuma Thank you for the quick response. With 8-bit optimizers, large models can be finetuned with 75% less GPU memory without losing any accuracy compared to training with standard 32-bit optimizers. The reduced memory requirements means 8-bit optimizers are 4x faster than a standard optimizer, and no hyperparameter tuning is required. Surely this means that using 32 bit just wastes compute power? Please correct me if I'm wrong, I'm really trying to understand the benefits. Is it because training with 32 bit means that despite converting to 8 bit for the weight update, the conversion leads to small accuracy gains?
There are no accuracy gains only reduced GPU usage and potentially some extra speed. In terms of speed, I personally didn’t notice any changes. I tested it yesterday and besides reduced GPU usage I noticed that it would take just as long as the 32bit to complete training.
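The 75% figure quoted in this thread can be checked with back-of-the-envelope arithmetic on the optimizer states alone. The sketch below assumes an Adam-style optimizer with two states per trainable parameter and a hypothetical count of 100 million trainable parameters. Neither number comes from the video, and the small overhead of the quantization constants used by 8-bit optimizers is ignored.

```python
def adam_state_bytes(trainable_params: int, bits_per_state: int) -> int:
    """Bytes needed for Adam's two per-parameter states (first and second moment)."""
    states_per_param = 2
    return trainable_params * states_per_param * bits_per_state // 8


trainable_params = 100_000_000  # hypothetical number of trainable parameters

mem_32 = adam_state_bytes(trainable_params, bits_per_state=32)
mem_8 = adam_state_bytes(trainable_params, bits_per_state=8)
saving = 1 - mem_8 / mem_32

print(f"32-bit optimizer states: {mem_32 / 1e6:.0f} MB")
print(f"8-bit optimizer states:  {mem_8 / 1e6:.0f} MB")
print(f"saving on optimizer states: {saving:.0%}")
```

The saving applies to optimizer states, which are only allocated for trainable parameters. With LoRA-style fine-tuning only the adapter weights are trainable, so this is the part of GPU memory that the choice between 8-bit and 32-bit optimizers affects.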
Your English is nice
Thank you very much!
cool
Awesome, I’m happy you liked it :)
Thanks for committing to the open source and educating people on cutting edge knowledge.
Most welcome, it’s my pleasure!
Very good, can’t wait to see updates to it.
You and me both!
Bro how did you train llama 3 without paper?
Could you elaborate?
@princecanuma As far as I know there hasn't been an official llama 3 paper released and no data Info as well. But I could be wrong... 😅
@vivekpadman5248 True, they only released a blog post detailing the data, model architecture and performance. Here is how I did it: Llama-3 has exactly the same architecture as Llama-2, which we already covered on this channel. ua-cam.com/play/PLDn_JsyofyfQp4td_ub6LfIg5vxyu6YJK.html&si=0Gyt9mdaA-ydiWOA Finally, if you understand how these models work you don't need the paper, the code implementation is more than enough.
@princecanuma oh understood, thanks I'll check it out and also your video 💙
Most welcome :)
this is very impressive and great content. thank you
You're very welcome!
Best video i ever seen. thanks~~!~!~!~!
Most welcome!
It’s my pleasure
Command-R is one of the best models out there for non-English / non-European languages. I tried it in Arabic and it's almost perfect, not as good as Claude (which is also perfect for Arabic), but as far as I understand Command-R from Cohere (the community version, I guess) is free! Is that true, it's free (I know Command-R-plus is not free)?
Super impressive. Great value. One question: how do I further train the model on my custom content instead of using LoRA? Can we continue with full training and add new memory?
Most welcome! You can do that, but that can be very expensive.
This is very thoughtful and great initiative! researchers with enough gray matter but limited means can be still in the game . Thank you PC🙏!
Most welcome! It’s my pleasure:) I lived through this so others don’t have to.
🥳🤩👏💐
Thank you for the really nice entry into using Gemma locally! Could you share how to utilize GPUs on a Mac? I just got a Mac Studio and saw you had referenced some code earlier for NVIDIA. Thanks in advance :)
Most welcome! You can use MLX: github.com/ml-explore/mlx-examples/tree/main/llms
Great work 🎉. Would be great if you can introduce tutorial on coding GPT and BERT from scratch as well using only Pytorch. And then show how to do their pre training on custom data.
Thank you very much! Llama is pretty close to GPT so I think BERT is more differentiated. What kind of data would you suggest?
Can we have the presentation please?
Sure, here you go! www.canva.com/design/DAF7MlJ2Zoc/f75ryYIZnLc80NlIFZhS5A/edit?DAF7MlJ2Zoc&
@princecanuma Appreciate it my friend
Great video! Learnt a lot.
Thank you very much! I’m happy you liked it :) There is so much more on the way.
@princecanuma Could you go over how to implement Parent Document retriever?
@user-vd7im8gc2w Why do you need position ids? You use it to map the input ids to their respective position in the sequence. Example: input_ids = [100, 20, 4, 50] position_ids = torch.arange(input_ids.shape…) print(position_ids) >> [0, 1, 2, 3]
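The example in the reply above can be reproduced in plain Python. This is a minimal stand-in rather than the code from the video: the argument to `torch.arange` is elided in the comment, so the sketch uses the sequence length directly, which is the usual choice when building position ids for a single sequence.

```python
input_ids = [100, 20, 4, 50]

# One position id per token: 0 for the first token, 1 for the second, and so on.
position_ids = list(range(len(input_ids)))
print(position_ids)

# Pairing the two lists shows which token sits at which position.
for position, token_id in zip(position_ids, input_ids):
    print(f"position {position}: token id {token_id}")
```

With a PyTorch tensor the same idea is typically written with `torch.arange` over the sequence length.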
Keep up the good work
Thank you!
Brilliant 🎉
Thanks!
First time watching your video. Keep going bro 💪, it's your friend Afzal
Thank you very much brother! It's been long my friend :)
Really great job!
Thank you very much, Remek! I’m happy you liked it :)
Hey please continue with the coding llama 2 from scratch
Hey, thanks for watching and pinging me for part 3. Don’t worry, Coding Llama 2 from scratch part 3 should be up soon. Potentially tomorrow :) The video has been recorded; however, it was delayed due to my first ever graduation, which took place today, a very important moment for me. 👨🏾🎓
waiting for the training part
Working on it 👌🏽 The video should be out this week.
Great work! Wait for your next videos
Thank you very much! New videos dropping soon.
Amazing video 🖖🏽
Thank you very much! I’m happy you enjoy it :)
Thank you very much this was very useful
Most welcome :)
Great job! Thank you!
Hi, thank you very much!
Is there a way I could go about doing the same thing in Windows and Gemma?
Hi, thanks for watching! Yes, there is and I will cover it in a future video soon. 👌🏽
Congratulations, Prince! It's a source of pride to see what you have become in the field of technology. Onward!
Thank you very much brother! It means a lot coming from you :) Long time no see, let’s catch up.
Wow, amazing, Prince! Thanks for sharing this very useful content
Most welcome :) Thank you for watching, Stelio!