Tim Dettmers | QLoRA: Efficient Finetuning of Quantized Large Language Models

  • Published 15 Jun 2024
  • Sponsored by Evolution AI: www.evolution.ai
    Abstract: Recent open-source large language models (LLMs) like LLaMA and Falcon are both high-quality and provide strong performance for their memory footprint. However, finetuning these LLMs is still challenging on consumer and mobile devices, with a 32B LLaMA model requiring 384 GB of GPU memory for finetuning. In this talk, I introduce QLoRA, a technique that reduces the memory required for finetuning LLMs by roughly 17 times, making a 32B LLM finetunable on 24 GB consumer GPUs and 7B language models finetunable on mobile devices. The talk provides a self-contained introduction to quantization and discusses the critical factors that allow QLoRA to use 4-bit precision for LLM finetuning while still replicating full 16-bit finetuning performance. I also discuss the evaluation of LLMs and how we used insights from our LLM evaluation study to build one of the most powerful open-source chatbots, Guanaco. (A rough code sketch of the 4-bit-plus-adapters setup appears after the description below.)
    Speaker bio: Tim is a PhD student at the University of Washington advised by Luke Zettlemoyer, working on efficient deep learning to make training, finetuning, and inference of deep learning models more accessible, in particular to those with the fewest resources. Tim is the maintainer of bitsandbytes, a widely used machine learning library for 4-bit and 8-bit quantization with 200k pip installations per month. He has a background in applied math and industrial automation.
  • Science & Technology
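
    To make the abstract's recipe concrete, here is a minimal sketch of 4-bit finetuning with LoRA adapters using the Hugging Face transformers, peft, and bitsandbytes libraries. This is not code from the talk: the model checkpoint, LoRA hyperparameters, and target modules are illustrative placeholders, not the exact Guanaco setup.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_id = "huggyllama/llama-7b"  # placeholder checkpoint, not necessarily the one used in the talk

    # 4-bit NF4 quantization with double quantization; compute runs in bfloat16.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # Load the base model with its weights stored in 4-bit.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )

    # Freeze the quantized base weights and attach small trainable LoRA adapters;
    # only the adapters (and their optimizer state) are updated during finetuning,
    # which is what keeps the memory footprint small.
    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # illustrative; QLoRA applies adapters to all linear layers
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # the adapters are a small fraction of total parameters

    The resulting model can then be trained with a standard training loop; memory is dominated by the 4-bit base weights plus the 16-bit adapters and their optimizer state.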

COMMENTS • 6

  • @AlfredPros • 7 months ago

    Amazing talk! I'm looking forward to more breakthrough research on LLMs and the like!

  • @Mawubo • 6 months ago

    Incredible work!

  • @mirach5072 • 10 months ago

    Great preso. Are the slides posted anywhere?

  • @billykotsos4642 • 10 months ago

    BASED

  • @MrEmbrance • 10 months ago

    6:00 Why does int4 start from -7, not -8?

    • @MartinAndrews-mdda • 5 months ago

      Because there are 16 possible 4-bit values and you would like to have 1..8 on the positive side. Zero takes one of the slots, so -1..-7 is all that fits on the negative side.
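
      A tiny numeric sketch of the counting argument above, assuming absmax (symmetric) quantization, which the question at 6:00 seems to refer to; the tensor values are made up. With 16 codes and zero taking one of them, you cannot have 8 steps on both sides: this sketch uses the symmetric grid -7..7 and leaves the spare code unused, while the reply above keeps it as an extra positive step (1..8).

      import torch

      # Absmax 4-bit quantization: scale so the largest magnitude maps to 7, then round.
      # Zero needs its own code, so the symmetric grid is -7..7: 15 of the 16 possible
      # 4-bit codes, with one code left over.
      x = torch.tensor([0.0, 0.31, -0.95, 0.42, -0.17])  # made-up weights
      scale = x.abs().max() / 7
      q = torch.clamp(torch.round(x / scale), -7, 7).to(torch.int8)
      print(q)          # tensor([ 0,  2, -7,  3, -1], dtype=torch.int8)
      print(q * scale)  # dequantized approximation of x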