Tim Dettmers | QLoRA: Efficient Finetuning of Quantized Large Language Models

  • Published 15 Jun 2024
  • Sponsored by Evolution AI: www.evolution.ai
    Abstract: Recent open-source large language models (LLMs) like LLaMA and Falcon are both high-quality and provide strong performance for their memory footprint. However, finetuning these LLMs is still challenging on consumer and mobile devices, with a 32B LLaMA model requiring 384 GB of GPU memory for finetuning. In this talk, I introduce QLoRA, a technique that reduces the memory required for finetuning LLMs by roughly 17 times, making a 32B LLM finetunable on 24 GB consumer GPUs and 7B language models finetunable on mobile devices. The talk provides a self-contained introduction to quantization and discusses the critical factors that allow QLoRA to use 4-bit precision for LLM finetuning while still replicating full 16-bit finetuning performance. I also discuss the evaluation of LLMs and how we used insights from our LLM evaluation study to build one of the most powerful open-source chatbots, Guanaco. (A rough code sketch of the 4-bit-plus-adapters setup appears after the description below.)
    Speaker bio: Tim is a PhD student at the University of Washington advised by Luke Zettlemoyer, working on efficient deep learning to make training, finetuning, and inference of deep learning models more accessible, in particular to those with the fewest resources. Tim is the maintainer of bitsandbytes, a widely used machine learning library for 4-bit and 8-bit quantization with 200k pip installations per month. He has a background in applied math and industrial automation.
  • Science & Technology
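
    To make the abstract's recipe concrete, here is a minimal sketch of 4-bit finetuning with LoRA adapters using the Hugging Face transformers, peft, and bitsandbytes libraries. This is not code from the talk: the model checkpoint, LoRA hyperparameters, and target modules are illustrative placeholders, not the exact Guanaco setup.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_id = "huggyllama/llama-7b"  # placeholder checkpoint, not necessarily the one used in the talk

    # 4-bit NF4 quantization with double quantization; compute runs in bfloat16.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # Load the base model with its weights stored in 4-bit.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )

    # Freeze the quantized base weights and attach small trainable LoRA adapters;
    # only the adapters (and their optimizer state) are updated during finetuning,
    # which is what keeps the memory footprint small.
    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # illustrative; QLoRA applies adapters to all linear layers
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # the adapters are a small fraction of total parameters

    The resulting model can then be trained with a standard training loop; memory is dominated by the 4-bit base weights plus the 16-bit adapters and their optimizer state.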

COMMENTS • 6

  • @AlfredPros • 7 months ago

    Amazing talk! I'm looking forward to more breakthrough research on LLMs and the like!

  • @Mawubo • 6 months ago

    Incredible work!

  • @mirach5072 • 10 months ago

    Great preso. Are the slides posted anywhere?

  • @billykotsos4642 • 10 months ago

    BASED

  • @MrEmbrance • 10 months ago

    6:00 Why does int4 start from -7, not -8?

    • @MartinAndrews-mdda • 5 months ago

      Because there are 16 possible 4-bit values and you would like to have 1..8 on the positive side. Zero takes one of the slots, so -1..-7 is all that fits on the negative side.
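
      A tiny numeric sketch of the counting argument above, assuming absmax (symmetric) quantization, which the question at 6:00 seems to refer to; the tensor values are made up. With 16 codes and zero taking one of them, you cannot have 8 steps on both sides: this sketch uses the symmetric grid -7..7 and leaves the spare code unused, while the reply above keeps it as an extra positive step (1..8).

      import torch

      # Absmax 4-bit quantization: scale so the largest magnitude maps to 7, then round.
      # Zero needs its own code, so the symmetric grid is -7..7: 15 of the 16 possible
      # 4-bit codes, with one code left over.
      x = torch.tensor([0.0, 0.31, -0.95, 0.42, -0.17])  # made-up weights
      scale = x.abs().max() / 7
      q = torch.clamp(torch.round(x / scale), -7, 7).to(torch.int8)
      print(q)          # tensor([ 0,  2, -7,  3, -1], dtype=torch.int8)
      print(q * scale)  # dequantized approximation of x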