Trelis Research
Ireland
Joined 22 Jun 2023
Learn more at Trelis.com/About:
- Fine-tuning scripts
- Inference guides
- Vision fine-tuning scripts
- Voice/transcription fine-tuning scripts
- One click LLM API templates
- Function-calling models
Get the Trelis Updates Newsletter: blog.Trelis.com
Follow Trelis Research on X: x.com/TrelisResearch
How to Build an Inference Service
➡️ Lifetime access to ADVANCED-inference Repo (incl. future additions): trelis.com/ADVANCED-inference/
➡️ Runpod Affiliate Link: runpod.io?ref=jmfkcdio
➡️ One-click GPU templates: github.com/TrelisResearch/one-click-llms
➡️ Thumbnail made with this tutorial: ua-cam.com/video/ThKYjTdkyP8/v-deo.html
OTHER TRELIS LINKS:
➡️ Trelis Newsletter: blog.Trelis.com
➡️ Trelis Resources and Support: Trelis.com/About
VIDEO LINKS:
- Slides: docs.google.com/presentation/d/16pzJu0yKlVHA5TEdls62HseCA05YLLTnVfhkp24ScLQ/edit?usp=sharing
TIMESTAMPS:
00:00 - Introduction to AI Inference Scaling
00:38 - Video Agenda Overview
02:00 - Different Inference Approaches
05:13 - Understanding GPU Utilization
08:53 - Setting Up One-Click Templates
14:14 - Docker Image Configuration
24:19 - Building Auto-Scaling Service
29:19 - Model Configuration Settings
35:35 - Load Testing and Metrics
41:35 - Scaling Manager Implementation
56:15 - Setting Up API Endpoint
59:51 - Conclusion and Future Topics
Views: 419
Videos
Object and Bounding Box Detection Vision Fine tuning
574 views · 16 hours ago
➡️ Florence 2 Colab Notebook: colab.research.google.com/drive/1t0C7pYtcrS_BOR-0jHVL8QsZsfMNtTih?usp=sharing ➡️ Get Life-time Access to the Complete Scripts (and future improvements): Trelis.com/ADVANCED-vision/ ➡️ One-click fine-tuning and LLM templates: github.com/TrelisResearch/one-click-llms ➡️ Trelis Newsletter: blog.Trelis.com ➡️ Resources/Support/Discord: Trelis.com/About ➡️ Thumbnail mad...
Output Predictions - Faster Inference with OpenAI or vLLM
1.3K views · 1 day ago
➡️ Lifetime access to ADVANCED-inference Repo (incl. future additions): trelis.com/ADVANCED-inference/ ➡️ Runpod Affiliate Link: runpod.io?ref=jmfkcdio ➡️ One-click GPU templates: github.com/TrelisResearch/one-click-llms ➡️ Thumbnail made with this tutorial: ua-cam.com/video/ThKYjTdkyP8/v-deo.html OTHER TRELIS LINKS: ➡️ Trelis Newsletter: blog.Trelis.com ➡️ Trelis Resources and Support: Trelis....
Coding Assistant for Jupyter Lab
425 views · 21 days ago
➡️ FAQ and Support: github.com/TrelisResearch/trelis-assistant TIMESTAMPS: 0:00 Coding Assistant for Jupyter Lab 0:05 Installation with `pip install trelis` 1:03 Getting an API key 1:37 Fixing your code with Trelis Assistant 3:05 Running on a remote server (e.g. Runpod) 3:47 Resources
Predicting Events with Large Language Models
3.3K views · 28 days ago
➡️ Colab Notebook: colab.research.google.com/drive/1FDxLic7DxO9boh-UXi84Cej9P9lQyTLC?usp=sharing ➡️ Newsletter: blog.Trelis.com ➡️ Resources/Support/Discord: Trelis.com/About ➡️ Thumbnail made with this tutorial: ua-cam.com/video/ThKYjTdkyP8/v-deo.html OTHER RESOURCES: - Slides: docs.google.com/presentation/d/12z3XWj6I46ZaBjWV_kTQdB_-AG3U5WI74D4DPhkGlbY/edit?usp=sharing - Metaculus Q4 Tournamen...
Fine tune and Serve Faster Whisper Turbo
2.3K views · 1 month ago
➡️ Colab Notebook: colab.research.google.com/drive/1OkT0CLE219qbwQoXV94wNk_4Un7Du2sH?usp=sharing ➡️ Get Life-time Access to the ADVANCED Transcription Scripts (and future improvements): Trelis.com/ADVANCED-transcription ➡️ One-click-templates: github.com/TrelisResearch/one-click-llms ➡️ Newsletter: blog.Trelis.com ➡️ Resources/Support/Discord: Trelis.com/About ➡️ Thumbnail made with this tutori...
OpenAI Fine-tuning vs Distillation - Free Colab Notebook
1.5K views · 1 month ago
➡️ Colab Notebook: colab.research.google.com/drive/1aXoGCvRV6pWus63oy5kYDtkbyhXRf95a?usp=sharing ➡️ Get Life-time Access to the ADVANCED Scripts (and future improvements): Trelis.com/ADVANCED-fine-tuning ➡️ Newsletter: blog.Trelis.com ➡️ Resources/Support/Discord: Trelis.com/About ➡️ Thumbnail made with this tutorial: ua-cam.com/video/ThKYjTdkyP8/v-deo.html OTHER RESOURCES: - Synthetic Data Pre...
Synthetic Data Generation and Fine tuning (OpenAI GPT4o or Llama 3)
2.7K views · 1 month ago
➡️ Get Life-time Access to the Complete Scripts (and future improvements): Trelis.com/ADVANCED-fine-tuning ➡️ One-click fine-tuning and LLM templates: github.com/TrelisResearch/one-click-llms ➡️ Newsletter: blog.Trelis.com ➡️ Resources/Support/Discord: Trelis.com/About ➡️ Thumbnail made with this tutorial: ua-cam.com/video/ThKYjTdkyP8/v-deo.html VIDEO RESOURCES: - Slides: docs.google.com/presen...
Test Time Compute, Part 2: Verifiers
982 views · 1 month ago
➡️ Lifetime access to ADVANCED-inference Repo (incl. future additions): trelis.com/ADVANCED-inference/ ➡️ Runpod Affiliate Link: runpod.io?ref=jmfkcdio ➡️ One-click GPU templates: github.com/TrelisResearch/one-click-llms ➡️ Thumbnail made with this tutorial: ua-cam.com/video/ThKYjTdkyP8/v-deo.html ➡️ Test Time Compute, Part 1: Sampling and Chain of Thought - ua-cam.com/video/qJoy8U27NPo/v-deo.h...
Test Time Compute, Part 1: Sampling and Chain of Thought
2K views · 1 month ago
➡️ Lifetime access to ADVANCED-inference Repo (incl. future additions): trelis.com/ADVANCED-inference/ ➡️ Runpod Affiliate Link: runpod.io?ref=jmfkcdio ➡️ One-click GPU templates: github.com/TrelisResearch/one-click-llms ➡️ Thumbnail made with this tutorial: ua-cam.com/video/ThKYjTdkyP8/v-deo.html ➡️ Test Time Compute, Part 2: Verifiers - ua-cam.com/video/MvaUcc0mNOU/v-deo.html OTHER TRELIS LIN...
Distillation of Transformer Models
1.9K views · 1 month ago
➡️ Get Life-time Access to the Complete Scripts (and future improvements): Trelis.com/ADVANCED-fine-tuning ➡️ One-click fine-tuning and LLM templates: github.com/TrelisResearch/one-click-llms ➡️ Newsletter: blog.Trelis.com ➡️ Resources/Support/Discord: Trelis.com/About ➡️ Thumbnail made with this tutorial: ua-cam.com/video/ThKYjTdkyP8/v-deo.html With credit to Rohan Sharma for work on these scr...
Fine tuning Pixtral - Multi-modal Vision and Text Model
3.4K views · 1 month ago
➡️ Get Life-time Access to the Complete Scripts (and future improvements): Trelis.com/ADVANCED-vision/ ➡️ One-click fine-tuning and LLM templates: github.com/TrelisResearch/one-click-llms ➡️ Newsletter: blog.Trelis.com ➡️ Resources/Support/Discord: Trelis.com/About ➡️ Thumbnail made with this tutorial: ua-cam.com/video/ThKYjTdkyP8/v-deo.html VIDEO RESOURCES: Slides: docs.google.com/presentation...
Powering Gigawatt Data Centers
289 views · 2 months ago
➡️ Trelis Newsletter: blog.Trelis.com ➡️ Trelis Resources/Support/Discord: Trelis.com/About VIDEO RESOURCES: - Tool: GigawattDatacenter.com - GitHub Repo: github.com/TrelisResearch/gigawatt-datacenter - Austin Vernon's Blog: austinvernon.site/blog/datacenterpv.html - Slides: docs.google.com/presentation/d/1acAQDQROO_Q8iQT83vt4zH75jx4PPQXO511R7olnCFw/edit?usp=sharing TIMESTAMPS: 0:00 Options for...
Full Fine tuning with Fewer GPUs - Galore, Optimizer Tricks, Adafactor
1.5K views · 2 months ago
➡️ Get Life-time Access to the Complete Scripts (and future improvements): Trelis.com/ADVANCED-fine-tuning ➡️ One-click fine-tuning and LLM templates: github.com/TrelisResearch/one-click-llms ➡️ Newsletter: blog.Trelis.com ➡️ Resources/Support/Discord: Trelis.com/About VIDEO RESOURCES: - Slides: docs.google.com/presentation/d/18zooX23ltF5P5CRN8-zuMOJc6htu8fdjJfRRC1wcymQ/edit?usp=sharing - Origi...
Make Cursor Understand Folder Structure - Coding with LLMs
1.1K views · 2 months ago
➡️ Trelis Flatten: github.com/TrelisResearch/flatten OTHER TRELIS LINKS: ➡️ Trelis Repos & Discord: Trelis.com/About ➡️ Trelis Newsletter: blog.Trelis.com A simple tool to flatten your repo into, a) a yaml file with the file/folder structure b) a text file with all of your repo content (except for stuff in .gitignore and .flattenignore) Yes, it's true you can already query the entire file struc...
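(As an illustration only, here is a rough sketch of the flattening idea described above: write the folder structure to a YAML file and concatenate file contents into a text file. The ignore handling is simplified and the output file names are made up; see the actual Trelis Flatten repo for the real behaviour. Requires PyYAML.)

```python
# Rough sketch of repo flattening: structure to YAML, contents to one text file.
# Ignore handling is simplified; the real tool respects .gitignore and .flattenignore.
import os
import yaml

IGNORED_DIRS = {".git", "node_modules", "__pycache__"}  # stand-in for ignore files

def build_tree(root: str) -> dict:
    """Nested dict of the folder structure; files map to None."""
    tree = {}
    for entry in sorted(os.listdir(root)):
        if entry in IGNORED_DIRS:
            continue
        path = os.path.join(root, entry)
        tree[entry] = build_tree(path) if os.path.isdir(path) else None
    return tree

def flatten(root: str = ".") -> None:
    with open("structure.yaml", "w") as f:
        yaml.safe_dump(build_tree(root), f, sort_keys=False)
    with open("contents.txt", "w") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                try:
                    text = open(path, encoding="utf-8").read()
                except (UnicodeDecodeError, OSError):
                    continue  # skip binary or unreadable files
                out.write(f"\n===== {path} =====\n{text}")

# flatten(".")
```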
Automated Prompt Engineering with DSPy
3K views · 2 months ago
Fine Tune Flux Diffusion Models with Your Photos
2.8K views · 2 months ago
CONTEXT CACHING for Faster and Cheaper Inference
1.7K views · 2 months ago
Run Speech-to-Speech Models on Mac or GPU
2K views · 2 months ago
LLM Security 101: Jailbreaks, Prompt Injection Attacks, and Building Guards
904 views · 3 months ago
Create an AI Assistant or Endpoint for your Documents
715 views · 3 months ago
RAG - but with Verified Citations!
2.2K views · 3 months ago
How to pick a GPU and Inference Engine?
4.2K views · 3 months ago
LLM Tool Use - GPT4o-mini, Groq & Llama.cpp
2.7K views · 3 months ago
Text to Speech Fine-tuning Tutorial
6K views · 4 months ago
Mastering Retrieval for LLMs - BM25, Fine-tuned Embeddings, and Re-Rankers
7K views · 4 months ago
Improving LLM accuracy with Monte Carlo Tree Search
13K views · 4 months ago
Preparing Fineweb - A Finely Cleaned Common Crawl Dataset
2.1K views · 5 months ago
Hi Ronan, can you point me to the best resources for algorithmic trading using ML, DL and AI? Also, are you planning to offer a Black Friday sale or Christmas discount on your Trelis Advanced Repo?
Wonderful work.
Oh my, more cool content!
I am just going through this...

    from pprint import pprint  # needed for pprint below

    print("Model config:")
    pprint(model.config.to_dict())

    # Specifically get hidden size
    if hasattr(model.config, 'hidden_size'):
        print(f"Hidden dimension: {model.config.hidden_size}")
    elif hasattr(model.config, 'd_model'):
        print(f"Hidden dimension: {model.config.d_model}")
    else:
        # Try to find it in the model architecture
        for name, param in model.named_parameters():
            if 'embed' in name:
                print(f"Embedding dimension (likely hidden dim): {param.shape[-1]}")
                break

This prints: Embedding dimension (likely hidden dim): 1024. So the square root of that is about 32, not 8?
Yeah you’re right. I have alpha too low. I realized that too afterwards
Note that adjusting alpha is equivalent to adjusting LR (if you’re not training embeddings).
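To make that equivalence concrete, here is a minimal sketch (values are illustrative) of the standard PEFT-style LoRA parameterisation, where the adapter update is scaled by alpha / r, so raising alpha scales the adapter's contribution, and its gradients, the same way a higher learning rate on the adapter would:

```python
# Minimal sketch of why LoRA alpha behaves like a learning-rate multiplier.
# Assumes the standard PEFT-style parameterisation: W_eff = W + (alpha / r) * B @ A.
import torch

d, r, alpha = 1024, 8, 16          # hidden dim, LoRA rank, LoRA alpha (illustrative values)
W = torch.randn(d, d)              # frozen base weight
A = torch.randn(r, d) * 0.01       # trainable LoRA "down" matrix
B = torch.zeros(d, r)              # trainable LoRA "up" matrix (initialised to zero)

scaling = alpha / r                # the only place alpha enters the forward pass
W_eff = W + scaling * (B @ A)

# Doubling alpha doubles `scaling`, which scales the adapter's contribution (and its
# gradients) by the same factor, which to first order is equivalent to raising the LR
# on A and B while leaving the frozen weights untouched.
```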
Awesome video Ronan as always. This video is very timely as we are optimizing our costs by moving away from Runpod serverless. I have a couple of questions.
- Can the service you have written scale to 0? It seems that with the minimum TPS being a positive number, this wouldn't work, right? Scaling to 0 is very important for us as we have bursty traffic with long idle times, and this is the primary motivation for serverless.
- Is there any alternative to configuring the TPS scaling limits manually for each GPU/model combination? This seems a bit cumbersome. Would it be possible to scale directly based on GPU utilization? I am thinking something like ssh'ing into the instance with paramiko and automatically running nvidia-smi (you can output results to a csv with the --format=csv and --query-gpu parameters). You can then use the output to determine if the GPUs are at full utilization. Maybe take a sample over your time window, as this number can fluctuate a lot. Then you can use this to determine whether you need to add or subtract instances, and you could use current TPS to determine if the instance is being used at all (scale to 0). Do you think this approach would work?
- Do you only support Runpod, or can other clouds like vast.ai or Shadeform be added as well? Both have APIs that allow you to create, delete, and configure specific instances. Runpod has had many GPU shortage issues lately, specifically for 48GB GPUs (A40, L4, L40, 6000 Ada etc.)
- Is there any configuration here for Secure Cloud vs. Community Cloud? I think by default, if you don't specify in the Runpod API, it defaults to "ALL", which means you will get whatever is available. Community Cloud can be less stable and less secure, so many users may want to opt only for Secure Cloud.
Again, I really appreciate the content you produce. For anyone who hasn't purchased access to the Trelis git repos yet, they are quite the value. Ronan consistently keeps them up-to-date with the latest models and new approaches. It is a great return on your investment and the gift that keeps on giving!
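(A rough sketch of the utilisation-based scaling idea from the comment above: the hostname, thresholds and the scale_up/scale_down helpers are hypothetical placeholders, not part of the repo.)

```python
# Sketch: sample nvidia-smi over a window via SSH and scale on the average utilisation.
import time
import paramiko

def sample_gpu_util(host: str, user: str, key_path: str,
                    samples: int = 6, interval: float = 5.0) -> float:
    """Average GPU utilisation (%) over a short window, read via nvidia-smi on the remote box."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, key_filename=key_path)
    readings = []
    try:
        for _ in range(samples):
            _, stdout, _ = client.exec_command(
                "nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits"
            )
            # One line per GPU, e.g. "87"; average across the GPUs on the node.
            vals = [float(line) for line in stdout.read().decode().splitlines() if line.strip()]
            if vals:
                readings.append(sum(vals) / len(vals))
            time.sleep(interval)
    finally:
        client.close()
    return sum(readings) / len(readings) if readings else 0.0

# util = sample_gpu_util("1.2.3.4", "root", "~/.ssh/id_ed25519")
# if util > 85:   scale_up()      # hypothetical helpers that call the Runpod API
# elif util < 20: scale_down()    # could scale to 0 if current TPS is also ~0
```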
Trelis at it again..
Thanks for the video! I just spoke to RunPod support and they said that you pay for an active worker whether it is running (i.e. processing API endpoint requests) or not. But in your video at 18:07 you mention that you only pay when the worker is actually running (processing requests). I'm a bit confused now, as this means for a single worker GPU costing $0.00019 p/s this would be approx. $345 p/m! For a new web app without much traffic or revenue this would be too expensive to run. Would you mind clarifying the serverless costs again please? And in particular, if you have run serverless endpoints with a single active worker, were you charged only when the active worker is processing requests, or 24/7 even when it is idle? This is a direct quote from RunPod support: "To clarify, the cost is incurred for each active worker, and it applies while the worker is active, even if it's not actively processing requests. The worker is dedicated to your endpoint and cannot be shared with others, so you incur a cost as long as the worker is running, not just when it is processing requests."
Howdy. I just released a video on a similar topic that you may find useful. But for serverless, yes, you pay any time a worker is active. You can set the minimum workers to zero, in which case there will be none active when there are no requests. In that case you won't be paying when it's not running. For production applications this is less recommended, because not having any workers running by default means there will be a slow start when the first request comes in. But for small models, this won't be too bad.
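(For a rough sanity check of the always-on numbers discussed above, assuming a flat $0.00019 per second and no discounts; actual billing depends on the month length and your worker settings.)

```python
# Back-of-the-envelope cost of one always-on serverless worker at $0.00019 per second.
price_per_second = 0.00019
cost_30_days = price_per_second * 60 * 60 * 24 * 30   # ~$492 for a full 30-day month
cost_21_days = price_per_second * 60 * 60 * 24 * 21   # ~$345, i.e. roughly three weeks
print(f"30 days: ${cost_30_days:,.0f}, 21 days: ${cost_21_days:,.0f}")
```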
What is the main challenge you see with Indian languages? Is it the style vector or the phonemes?
yeah some different phonemes and styles, and the letters/alphabet is different too and more out of distribution than english for most models
@@TrelisResearch Hmm, I think so; actually the phonemes are different. Because many languages split off from Devanagari, maybe the styles are what's causing the problem, I think.
very helpful and informative video. thank you
you're welcome
Thanks for the awesome video.
You’re welcome
So this is the master class in practical LLM fine-tuning with the latest cutting-edge technology. I have been searching for such a comprehensive tutorial for so long. Thanks!
Thanks! You’re welcome
whisper faster or whisper turbo?
I combine both! Whisper turbo is the model and faster whisper is the inference engine
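(A minimal sketch of that combination; the model identifier and audio path are illustrative, so check which turbo checkpoint your faster-whisper version accepts.)

```python
# Minimal sketch: Whisper turbo weights served through the faster-whisper engine.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3-turbo", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.wav", beam_size=5)
print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```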
Amazing
How do we deal with hallucination resulting from our background info?
Take a look at my video on synthetic data generation. I cover it there. Unless I’m misreading your Q and it relates to caching?
Isn't a referring expression basically like "dog to the right"? In an image of, let's say, two dogs, it will segment the dog on the right.
Yes! Although sometimes the concepts of left and right can be hard for models to grasp. But let’s say “grey dog”
wow this is really cool, what's next? um... how about OmniParser by Microsoft?
Haven’t tried it. Would you highly recommend?
@TrelisResearch Well, I'm not really sure, because these days what I thought was useful can feel old when the next model comes out, and they come out pretty fast... yep, AI isn't slowing down at all...
Can you please make a tutorial on DDP for computer vision or LLMs using Docker and Kubernetes for industry-scale training? Ray is also slowly picking up pace. Your videos are great, but can you please help scale these approaches up to large-scale training?
What kind of application do you have in mind? It would help me build a video if you can give something specific.
@TrelisResearch continued pre-training to adapt an open source SLM to a new domain. Or instruction fine-tuning on a large dataset that doesn't fit on a single VM's memory in one go. Please make a video to setup an iterative data loader process + DDP to train an SLM across multi node multi gpu setup using docker + Kubernetes or Ray. Currently, I am trying to figure out how to handle very large datasets and complete training in a reasonable amount of time. Single gpu training takes too long if the data size is large
@@savanthtadepalli3968 you checked this? ua-cam.com/video/gXDsVcY8TXQ/v-deo.htmlsi=sGRQIuJs22Sq-LSI but yeah I hear you on multi-node. I think you could just increase the number of nodes in the above tutorial. Data management does become important and it's true I don't dig enough on that in the vid
@@TrelisResearch yes I did follow that video to the letter and spirit for understanding DDP and FSDP. But like you pointed out, data management is a big issue that needs to be handled for large datasets. Also, my senior manager keeps bugging me about using docker + Kubernetes or even a more robust and currently trending orchestration framework like Ray to handle multi-node training that is platform agnostic. I'm new to the field and I failed to convince him that Huggingface accelerate is also platform agnostic. Can you please make a video, using a large dummy dataset to train an LLM at scale on a multi-node cluster using some orchestration tool like Ray. Recently, even Amazon ditched Apache spark in favour of Ray for their data workloads and openai uses Ray for large scale training. There are no good resources that address large scale data handling+ multi-node training using orchestration frameworks.
All this feels very exciting; I feel like someone with great insight into the application of such amazing AI. Watching these tutorials feels like watching the under-the-hood tech of a potential billion- or even trillion-dollar empire.
You are amazing!!!
Please censor your hair it's so good it's distracting xD
Sorry can’t, that’s how I get my subs
Great tutorial! There's something missing tho': how to correctly define the examples for any DSPy methods that accept a trainset. You need to define a list of dspy.Example(). Putting it out here, maybe it will help someone.
Yeah true, setting up examples is key. Perhaps in a bit I'll do another deeper DSPy vid
Best tutorial on DSPy I've watched so far.
thanks, appreciate it
nice
I see the code ```max_tokens_per_img = 4096``` in the advanced demo of the vLLM offline inference example. Does this basically mean the maximum number of 16x16-pixel patches it would support?
To first order yes, but I need to check if there’s any perceiver resampling to compress. I just don’t remember off hand.
or put it in another way, is one patch mapped to only one token?
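(For a first-order sense of the numbers, assuming one token per 16x16 patch and no perceiver-style resampling, which is the caveat in the reply above.)

```python
# First-order estimate of image tokens, assuming one token per 16x16 patch.
def image_tokens(width: int, height: int, patch: int = 16) -> int:
    return (width // patch) * (height // patch)

# A 1024x1024 image -> 64 * 64 = 4096 patches, i.e. exactly the max_tokens_per_img
# budget quoted above; larger images would need to be downscaled or tiled.
print(image_tokens(1024, 1024))  # 4096
```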
Pros vs cons: faster but uses more tokens? Seems worse than caching, which is faster and cheaper. It doesn't seem efficient to regenerate everything for just a small change.
Yeah probably true for expensive models. But for cheaper models you probably pay for the speed up. As I understand, Cursor use a 70B model for fast apply. And they wouldn’t do fast apply if users didn’t like the speed up. So yeah I think you’re right but still there’s a case for doing the speed up - maybe more so for cheaper models
Great tutorial with no nonsense, thank you so much for creating & sharing.
You’re welcome! Thanks
very clear and thorough. subscribed!
Thanks
Thanks a lot for this incredibly useful video. I am trying to fine-tune Mixtral 8x7B using QLoRA and PEFT. I am using an A100 machine. I was very confused about the model and adapter combination. Your video has helped a lot.
Cheers
Great video! What I don't get is how this actually saves inference compute. I mean, to be sure that my "predicted output" is correct, do I not need to calculate the output token as I traditionally would? e.g. the cat is on the -> table. I already have "table" in my predicted outputs; however, to be sure that that's correct, don't I need to run inference on the LLM anyway? Thanks a lot!
Yup, it doesn't save compute. In fact, it not only costs compute to check your prediction, it actually costs even more, because first it will guess your full prediction and then it will use more tokens for everything that needs fixing. But because you give the prediction, it can do all of the compute in one call, so it gives a speed-up. More speed but more compute.
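(For anyone who wants to try it, a minimal sketch of a predicted-outputs call via the OpenAI Python client; the model name and file content are illustrative, and an API key is assumed to be set in the environment.)

```python
# Minimal sketch of an OpenAI predicted-outputs call: the old file is passed both as
# context and as the prediction, so unchanged spans can be accepted quickly.
from openai import OpenAI

client = OpenAI()
old_code = open("app.py").read()  # the file you expect to be mostly unchanged

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user",
         "content": "Rename the function `run` to `main` in this file:\n" + old_code},
    ],
    prediction={"type": "content", "content": old_code},
)
print(response.choices[0].message.content)
```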
Nice, is there a similar feature also in transformers?
Transformers isn’t so much an inference library. Huggingface have TGI for inference and yes you can add speculation. Check my earlier video on speculative decoding
hi ronan. do you have an email id to connect?
Best to leave a comment here or else on a relevant page of trelis.com
LoRAX has something very similar. The approach is also very interesting: we can take Llama 8B and train a speculative-decoding LoRA adapter and speed up the generation in a lot of cases.
Nice. That’s interesting, does the adapter end up being like the SVD of the matrices? I guess you’re trying to get the adapter to simulate the full model?
Please make something on MLX and using a not-very-high-end MacBook for fine-tuning small LMs locally, and more.
howdy, what specific kind of dataset and application are you looking for guidance on?
@@TrelisResearch Something more on creating computer-use functionality of the Anthropic kind.
Very clear and concise video, great work!
At 11:53, after connecting to the Jupyter Notebook, I don't see the Whisper fine-tuning notebook. Where should I get it and how do I put it there? I've purchased Lifetime Access, but haven't received any access details yet.
Howdy, you should have received an email from Github AND on your github activity page, you should have access to the advanced-transcription repo - that's where the script is. I'll email you now to ensure you get in
Very interesting... will try... tks 🎉
Brilliant Thank you
Your video is totally what the world needed; I shared this with my fellow AI engineer newbies. Honestly, with low IQ and dyslexia it is really hard to gain skill in this field... I'll never give up, because helpful videos like this are really helping me a lot.
Nice!
Please check datacenters Nvidia supercomputing ,AI Factories even NVIDIA DGX SuperPOD Please and beyond because there will be more just help
What do you mean?
I want a python code that loads a model trained by unsloth. The code runs inference with a simple question without training.
run only without train from path or id
Code with accelerate device_map=auto to run a big model on Colab T4.
Can you clarify your question?
I want Python code to run a saved or loaded unsloth model in 4-bit, without doing the training step, with a prompt like "how are you", and run it on a Colab T4 with a 16 GB GPU.
I tried to run Nemotron70b in Colab T4 and it didn't work and I want to use a compressed version to work well on Vega 16GB only
Is there a way to finetune LLM to say I don't know? This is very important for domain-specific LLMs. Thanks.
It will also be subject to hallucination, but you have two options: 1. Train and prompt the llm to only answer based on info provided by the context/RAG 2. add some rows of data with "wrong/difficult/unknown" questions and have the answer be "I don't know". Probably less robust
@@TrelisResearch I'm finetuning the model to memorize certain data, so I should probably go with the second approach. Thanks. Your videos are immensely helpful!
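(A rough sketch of option 2 from the reply above; the questions and the chat format are made up, so match whatever template your fine-tuning setup actually expects.)

```python
# Rough sketch of option 2: add training rows where the correct answer is a refusal.
import json

refusal_rows = [
    {"messages": [
        {"role": "user", "content": "What was our Q3 2025 revenue in Brazil?"},
        {"role": "assistant", "content": "I don't know; that isn't covered in the information I was given."},
    ]},
    {"messages": [
        {"role": "user", "content": "Who is the site manager at the Cork facility?"},
        {"role": "assistant", "content": "I don't know the answer to that."},
    ]},
]

# Write out JSONL rows to mix in with the regular training data.
with open("idk_rows.jsonl", "w") as f:
    for row in refusal_rows:
        f.write(json.dumps(row) + "\n")
```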
Have you successfully fine-tuned an LLM to incorporate company-specific knowledge? From my experience, fine-tuning helps models learn the preferred output format for information. I'm particularly interested in approaches for fine-tuning on proprietary data. Additionally, is full fine-tuning feasible for this purpose?
Yeah maybe check my video on memorization and then the one on synthetic data. Both cover this
Awesome video and very good learning material both for beginners and more advanced! Thank you!
It's great material! Thanks! I followed this tutorial, used a quantized version of Pixtral and then fine-tuned it using LoRA with an A100 40GB GPU in Colab. However, I still got an out-of-memory error. Here is my Colab; would you mind taking a look and tweaking it a bit to make it all work? colab.research.google.com/drive/1Q6KnuhgJuGvL8AxhalhrVLInkhOyKCqT?usp=sharing I also encountered some minor errors, like getting stuck installing flash-attention, so I had to use the quantized version, but even then it still ran out of GPU VRAM... Also, when I ran the evaluation, images did not show up, but that was a minor issue.
This is great and thank you. Also looking forward to Colab integration.
Thanks! Yeah I need to see what's possible with Colab. It doesn't allow as easy access and customization as JupyterLab or even Kaggle.
Great ideas here. I used this model (from preprint not your example) to estimate US election outcome. I removed betting markets and polling from the search as they are not representative of voters. Surprising results. Look for "2024’s Electoral Math: The Numbers Beyond the Polls"
Hi Ronan, love your work. Can you please:
- Make this work inside Kaggle notebooks. Can't think of typing or switching windows when we are in 2024.
- Cover the full fine-tune process (base plus instruction) for different libraries like unsloth, axolotl, torchtune etc.
Thanks.
Howdy, yeah Kaggle will maybe be possible and I'm investigating. For fine-tuning, unsloth is fairly well covered in the fine-tuning playlist. Axolotl and torchtune I haven't dug into just yet. What are the key benefits there over unsloth/transformers?
@@TrelisResearch So unsloth does not allow a full fine-tune. Axolotl allows a YAML-based fine-tune. Torchtune is native PyTorch, but I don't know its advantages over TRL.
18:55 How will the scaling work when temp is 0? Will it not give a divide-by-zero error?
it would except the numerator and denominator cancel in such a way that you're just left with the single most likely logit as temperature goes to zero.
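(A small numerical sketch of that limit, with made-up logits: dividing by a smaller and smaller temperature before the softmax pushes all the probability mass onto the largest logit, and implementations typically special-case T = 0 as plain argmax rather than dividing by zero.)

```python
# As temperature -> 0, softmax(logits / T) collapses onto the argmax, so T = 0 is
# handled as greedy decoding instead of an actual division by zero.
import numpy as np

def sample_probs(logits: np.ndarray, temperature: float) -> np.ndarray:
    if temperature == 0.0:
        probs = np.zeros_like(logits)
        probs[np.argmax(logits)] = 1.0   # greedy: all mass on the most likely token
        return probs
    z = logits / temperature
    z = z - z.max()                      # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
for t in (1.0, 0.1, 0.01, 0.0):
    print(t, np.round(sample_probs(logits, t), 4))
```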
First!
Please check your e-mails. :-)
Hi, do you think it's possible to fine-tune the LLaVA OneVision model with traffic data for traffic scene understanding in the case of autonomous vehicles? I am working on a similar project and would love your take on this. Also, if you don't mind sharing, is there any cloud-based service that provides low-cost GPUs for fine-tuning these models?
Yes, it should be - although these models are quite big compared to what is used for autonomous driving (I think), because you need very fast inference. Vast.ai is probably the lowest cost, but Runpod has a better UI. You can see a few tips (see fine-tuning) in the github.com/TrelisResearch/one-click-llms repo.