8-bit Methods for Efficient Deep Learning with Tim Dettmers

  • Published 18 Jun 2024
  • Tim Dettmers (PhD candidate, University of Washington) presents "8-bit Methods for Efficient Deep Learning" in this Cohere For AI Technical Talk.
    Abstract: Large language models are effective tools for many tasks but are difficult to train and run inference on because of their size. Moving from 32-bit models to 16-bit models resulted in considerable efficiency gains that made training and inference of large models easier. Can we train and run inference in 8-bit to make further gains? In this talk, Tim will show that 8-bit inference and training can be used without degrading performance while improving efficiency. To make 8-bit methods work, it is essential to understand how quantization precision affects model performance and training stability as we scale the model size. He will talk about how these factors change with scale and how we need to adjust 8-bit methods to make them work. In particular, he will speak about 8-bit optimizers for training and Int8 inference for large language models with up to 175B parameters. These methods make training and inference more efficient and make large models more accessible to researchers. (A brief usage sketch of these two methods follows this description.)
    Learn more about Tim and his work at timdettmers.com/
    Learn more about Cohere For AI at cohere.for.ai.
  • Science & Technology
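
A minimal usage sketch of the two methods the abstract mentions, via the bitsandbytes and Hugging Face transformers libraries. The checkpoint name and hyperparameters below are placeholders, not from the talk:

```python
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Int8 inference (LLM.int8()): weights are quantized to int8 at load time;
# outlier feature dimensions are handled in higher precision by the library.
model_int8 = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",                                    # placeholder checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# 8-bit optimizer: drop-in Adam replacement that keeps the optimizer state
# (first and second moments) in 8 bits with block-wise quantization.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b").cuda()
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)
```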

COMMENTS • 8

  • @yacinegaci2831 • 22 days ago

    Very informative video, thanks.
    In the slide where you explain the use of INT4 quantization + LoRA, you said that you pass the inputs through the frozen 4-bit quantized pre-trained model and fine-tune only the adapters. My question is: do you dequantize the int4 weights of the pre-trained model to fp16, or are the computations carried out in int4 (so that the input would need to be quantized to int4 as well)?
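
As I understand the QLoRA setup described in that slide, the frozen 4-bit weights are dequantized to a 16-bit compute type for each matrix multiply (the inputs stay in 16-bit; nothing is computed in pure int4), and only the LoRA adapters receive gradients. A rough sketch of that forward pass follows; the uint8 storage, linspace codebook, and block size are simplifications of mine, not the actual NF4 kernel:

```python
import torch

class QLoRALinear(torch.nn.Module):
    """Sketch of a QLoRA-style linear layer: the frozen base weight is stored
    as 4-bit codes and dequantized to fp16 just for the matmul; only the LoRA
    adapters A and B are trained."""

    def __init__(self, in_features, out_features, rank=16, blocksize=64):
        super().__init__()
        n = out_features * in_features  # assumed divisible by blocksize
        # Frozen 4-bit weight: codes (kept in uint8 here for simplicity),
        # per-block absmax scales, and a 16-entry codebook (stand-in for NF4).
        self.register_buffer("qweight", torch.randint(0, 16, (n,), dtype=torch.uint8))
        self.register_buffer("absmax", torch.rand(n // blocksize))
        self.register_buffer("codebook", torch.linspace(-1.0, 1.0, 16))
        self.shape, self.blocksize = (out_features, in_features), blocksize
        # Trainable LoRA adapters in fp16 (B starts at zero, as in LoRA).
        self.lora_A = torch.nn.Parameter(
            torch.randn(rank, in_features, dtype=torch.float16) * 0.02)
        self.lora_B = torch.nn.Parameter(
            torch.zeros(out_features, rank, dtype=torch.float16))

    def dequantize(self):
        # Look up each 4-bit code, rescale per block, and cast to fp16.
        w = self.codebook[self.qweight.long()].view(-1, self.blocksize)
        w = w * self.absmax.unsqueeze(1)
        return w.view(self.shape).to(torch.float16)

    def forward(self, x):                        # x: fp16 activations
        w16 = self.dequantize()                  # int4 -> fp16; no gradient flows into w16
        base = torch.nn.functional.linear(x, w16)
        lora = torch.nn.functional.linear(
            torch.nn.functional.linear(x, self.lora_A), self.lora_B)
        return base + lora
```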

  • @raynardzhang4986 • 1 year ago

    Why doesn't this video have any comments? The explanation of how to experiment with this problem is beautiful. Please publish more videos like this.

  • @wayne5676 • 1 year ago

    Amazing talk! Thanks!

  • @shahrohit1990 • 11 months ago

    I think one of the important findings here is that as we go up in model size we see a lot of outliers even though we have normalization layers. So if we improve the training process, could we actually do better at quantization?
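
For context, the Int8 method from the talk deals with exactly those outliers through a mixed-precision decomposition: the few feature dimensions whose activations exceed a magnitude threshold are multiplied in higher precision, and everything else goes through an int8 matmul. A simplified sketch of that idea (the threshold value and the plain absmax quantization here are illustrative; the real kernel uses vector-wise scaling and an actual int8 GEMM):

```python
import torch

def int8_matmul_with_outliers(x, w, threshold=6.0):
    """Sketch of the mixed-precision decomposition used for Int8 inference:
    feature dimensions of x containing any |value| >= threshold are multiplied
    in higher precision; the rest are quantized to the int8 range, multiplied,
    and rescaled. x: [tokens, d_in], w: [d_in, d_out]."""
    outlier_cols = x.abs().amax(dim=0) >= threshold              # [d_in] bool

    # High-precision path for the few outlier feature dimensions.
    out = x[:, outlier_cols].float() @ w[outlier_cols, :].float()

    # Int8 path for everything else: absmax scaling, round, matmul, rescale.
    # (A real kernel would run an int8 GEMM with int32 accumulation.)
    x_sub, w_sub = x[:, ~outlier_cols].float(), w[~outlier_cols, :].float()
    sx = (x_sub.abs().amax(dim=1, keepdim=True) / 127.0).clamp_min(1e-8)  # per row of x
    sw = (w_sub.abs().amax(dim=0, keepdim=True) / 127.0).clamp_min(1e-8)  # per column of w
    xi = (x_sub / sx).round().clamp(-127, 127)
    wi = (w_sub / sw).round().clamp(-127, 127)
    return (out + (xi @ wi) * sx * sw).to(x.dtype)
```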

  • @heejuneAhn • 11 months ago

    Please explain the implementation more; the theory is quite straightforward, in fact.

  • @wayne5676 • 1 year ago

    @8:09 Should it be the opposite, in the sense that more bits for the exponent + fewer bits for the fraction => good for big numbers, bad for small numbers? Since the range that can be covered is bigger, it should be good for big numbers.
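
For context on the exponent/fraction trade-off being asked about: bfloat16 spends 8 bits on the exponent and 7 on the fraction, while float16 spends 5 and 10, so bf16 covers a much larger range but at coarser precision around any given value. This can be checked directly:

```python
import torch

# float16: 5 exponent bits, 10 fraction bits; bfloat16: 8 exponent bits, 7 fraction bits.
for dtype in (torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    # max / tiny are set by the exponent bits (range); eps, the gap between 1.0
    # and the next representable value, is set by the fraction bits (precision).
    print(f"{dtype}: max={info.max:.3e}  smallest normal={info.tiny:.3e}  eps={info.eps:.2e}")
```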

  • @wayne5676 • 1 year ago

    Can someone illustrate why 10011001 is -6.06e-3? In particular, why does 00 correspond to 1e-2, and 1001 to 0.1 + 0.9*9/16?
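
Here is a small sketch that reproduces the decoding the comment's own numbers imply: the first bit is the sign, the run of zeros before the first 1 sets a 10^-k exponent, and the bits after that indicator 1 form a linear fraction offset into [0.1, 1.0). This is my reading of the dynamic-exponent scheme, not necessarily the exact implementation from the talk:

```python
def decode_dynamic(byte_str):
    """Decode an 8-bit dynamic-exponent code: sign bit, then zeros counted
    before the first 1 give a 10^-k exponent, then the remaining bits are a
    linear fraction mapped to 0.1 + 0.9 * value / 2**num_bits (sketch only)."""
    sign = -1.0 if byte_str[0] == "1" else 1.0
    rest = byte_str[1:]
    k = 0
    while k < len(rest) and rest[k] == "0":   # count leading zeros -> exponent 10^-k
        k += 1
    linear_bits = rest[k + 1:]                # skip the indicator '1'
    levels = 2 ** len(linear_bits)
    frac = 0.1 + 0.9 * int(linear_bits, 2) / levels if linear_bits else 1.0
    return sign * frac * 10.0 ** (-k)

print(decode_dynamic("10011001"))   # -> -0.0060625, i.e. about -6.06e-3
```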

  • @heejuneAhn • 11 months ago

    GPTQ is far faster than bitsandbytes, in fact.