The NLP Lab
QLoRA: Efficient Finetuning of Quantized LLMs | Paper summary
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/.
===
Link: arxiv.org/abs/2305.14314
Blog post on how to use QLoRA: huggingface.co/blog/4bit-transformers-bitsandbytes
Abstract: We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save memory without sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is information theoretically optimal for normally distributed weights, (b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and (c) paged optimizers to manage memory spikes. We use QLoRA to finetune more than 1,000 models, providing a detailed analysis of instruction following and chatbot performance across 8 instruction datasets, multiple model types (LLaMA, T5), and model scales that would be infeasible to run with regular finetuning (e.g. 33B and 65B parameter models). Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA. We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT. We release all of our models and code, including CUDA kernels for 4-bit training.
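As a concrete illustration, here is a minimal sketch of how the paper's three ingredients map onto a typical Hugging Face setup with the transformers, peft, and bitsandbytes libraries (covered in the blog post linked above). The model name and LoRA hyperparameters below are illustrative, not the paper's exact configuration.

```python
# Minimal QLoRA-style setup: a frozen 4-bit NF4 base model with trainable
# LoRA adapters. Model name and hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base model to 4 bits
    bnb_4bit_quant_type="nf4",              # (a) 4-bit NormalFloat (NF4)
    bnb_4bit_use_double_quant=True,         # (b) double quantization of the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in 16-bit while weights stay 4-bit
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # which projections get adapters (illustrative)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # gradients flow only into the LoRA adapters
model.print_trainable_parameters()

# (c) paged optimizers are exposed through the Trainer, e.g.:
training_args = TrainingArguments(output_dir="out", optim="paged_adamw_32bit")
```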
===
Follow us on social media to get regular updates on NLP developments:
LinkedIn: www.linkedin.com/company/86900096/admin/
Weekly NLP newsletter: theglobalnlplab.substack.com/
Twitter: TheGlobalNLPLab
#artificialintelligence #nlp #chatgpt #ml #ai #nlproc #machinelearning
Views: 931

Videos

Language Models Don’t Always Say What They Think | Paper summary
379 views · 11 months ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Link: arxiv.org/abs/2305.04388 Abstract: Large Language Models (LLMs) can achieve strong performance on many tasks by producing step-by-step reasoning before giving a final output, often referred to as chain-of-thought reasoning (CoT). ...
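For context, chain-of-thought prompting simply means showing the model a worked, step-by-step solution before asking a new question. A minimal sketch follows; the prompt text is a classic illustrative example, not taken from this paper.

```python
# A few-shot chain-of-thought (CoT) prompt: the worked example demonstrates
# step-by-step reasoning before the final answer, and the model is expected
# to continue in the same style for the new question.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more.
How many apples do they have?
A:"""

print(cot_prompt)  # send this to any LLM; it should reason step by step before answering
```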
Automatic Prompt Optimization with "Gradient Descent" and Beam Search | Paper summary
1.2K views · 11 months ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Link: arxiv.org/pdf/2305.03495.pdf Abstract: Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts which are hand written with onerous trial-and-...
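A rough sketch of the paper's core loop is below, assuming a generic llm(prompt) -> str completion function, which is a hypothetical placeholder rather than a real API; the actual method additionally uses beam search over candidate prompts and bandit-style candidate selection.

```python
# "Textual gradient descent" over a prompt: collect errors, ask an LLM to
# critique the prompt (the "gradient"), then ask it to edit the prompt in the
# direction of that critique (the "descent step"). `llm` is a placeholder.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API of choice here")

def optimize_prompt(task_prompt: str, examples: list[tuple[str, str]], steps: int = 3) -> str:
    for _ in range(steps):
        # 1. Find minibatch examples the current prompt gets wrong.
        errors = [(x, y) for x, y in examples if llm(f"{task_prompt}\n{x}") != y]
        if not errors:
            break
        # 2. "Gradient": a natural-language critique of the prompt's failures.
        critique = llm(f"The prompt '{task_prompt}' failed on {errors[:3]}. Why?")
        # 3. "Descent step": edit the prompt to address the critique.
        task_prompt = llm(f"Rewrite the prompt '{task_prompt}' to fix this: {critique}")
    return task_prompt
```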
Falcon LLM: the Best Open-source LLM Available at the Moment
2.1K views · 11 months ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Link: falconllm.tii.ae/ Models link: huggingface.co/tiiuae Abstract: Falcon LLM is a foundational large language model (LLM) with 40 billion parameters trained on one trillion tokens. TII has now released Falcon LLM - a 40B model. The m...
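The released checkpoints can be loaded directly from the Hugging Face Hub. A minimal sketch: the 7B instruct variant is shown because it fits on a single GPU in bf16, and the generation settings are illustrative.

```python
# Load a Falcon checkpoint from the Hugging Face Hub and generate text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,   # Falcon shipped custom modeling code at release time
    device_map="auto",
)
generate = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generate("Explain LoRA in one sentence:", max_new_tokens=50)[0]["generated_text"])
```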
Tree of Thoughts: Deliberate Problem Solving with Large Language Models | Paper summary
923 views · 1 year ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Link: arxiv.org/abs/2305.10601 Abstract: Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inf...
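A condensed sketch of the Tree-of-Thoughts search loop: instead of decoding left-to-right, the model proposes several candidate "thoughts" per step and a value function prunes the tree. The propose and evaluate functions below are hypothetical placeholders for LLM calls, not part of any library, and the search parameters are illustrative.

```python
# Breadth-first search over partial solutions ("thoughts") with pruning.
def propose(state: str, k: int) -> list[str]:
    raise NotImplementedError  # ask the LLM for k candidate next-step thoughts

def evaluate(state: str) -> float:
    raise NotImplementedError  # ask the LLM to score how promising a state is

def tree_of_thoughts(problem: str, steps: int = 3, breadth: int = 5, keep: int = 2) -> str:
    frontier = [problem]
    for _ in range(steps):
        # Expand every kept state with several candidate thoughts.
        candidates = [s + "\n" + t for s in frontier for t in propose(s, breadth)]
        # Keep only the most promising partial solutions.
        frontier = sorted(candidates, key=evaluate, reverse=True)[:keep]
    return frontier[0]
```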
LIMA: Less Is More for Alignment | Paper summary
969 views · 1 year ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Link: arxiv.org/abs/2305.11206 Abstract: Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement lea...
Learning to Reason and Memorize with Self-Notes | Paper summary
278 views · 1 year ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Paper title: Learning to Reason and Memorize with Self-Notes Link: arxiv.org/abs/2305.00833 Paper abstract: Large language models have been shown to struggle with limited context memory and multi-step reasoning. We propose a simple meth...
What Will the NLP Industry Look Like in 6-12 Months? with @Slator
275 views · 1 year ago
This is a clip from the Slator podcast on which I was a guest. You can watch the full podcast here: ua-cam.com/video/YpjI5F24bbU/v-deo.html Summary: Which types of companies will benefit the most from NLP technology? What will the field look like in 6-12 months? If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Glob...
Who will Win the Large Language Model App Race? with @Slator
231 views · 1 year ago
This is a clip from the Slator podcast on which I was a guest. You can watch the full podcast here: ua-cam.com/video/YpjI5F24bbU/v-deo.html Summary: We discuss who will win from the current explosion of LLM use cases and applications. Will it be mainly the big companies, or are we going to see an explosion of startups? If you are looking to add advanced expertise in Natural Languag...
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models | Paper and Demo
623 views · 1 year ago
This video was created with Synthesia: www.synthesia.io/?via=nlplab If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Paper link: arxiv.org/abs/2303.04671 Code: github.com/microsoft/visual-chatgpt Abstract: ChatGPT is attracting a cross-field interest as it provides a language interfac...
The rise of API-powered NLP apps: hype cycle, or a new disruptive industry?
234 views · 1 year ago
This video was created with Synthesia: www.synthesia.io/?via=nlplab If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. What is the disruptive potential of API-powered NLP apps? Are they poised to deliver transformative results to all industries? Or, will their impact be limited to certa...
LLaMA | New open foundation Large Language Model by Meta AI | Paper summary
3.9K views · 1 year ago
This video was created with Synthesia: www.synthesia.io/?via=nlplab If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Title: LLaMA: Open and Efficient Foundation Language Models Link: research. publications/llama-open-and-efficient-foundation-language-models/ Follow us on s...
ChatGPT: One model for any NLP task? | Paper explained
988 views · 1 year ago
This video was created with Synthesia: www.synthesia.io/?via=nlplab If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Paper title: Is ChatGPT a General-Purpose Natural Language Processing Task Solver? Link: arxiv.org/abs/2302.06476 Abstract: Spurred by advancements in scale, large lang...
The Flan Collection: Open Source Instruction Tuning | Paper explained
1.9K views · 1 year ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. This video was created with Synthesia: www.synthesia.io/?via=nlplab Papers to read: - Scaling Instruction-Finetuned Language Models (arxiv.org/abs/2210.11416) - The Flan Collection: Designing Data and Methods for Effective Instruction T...
Adaptive Machine Translation with Large Language Models | Paper explained
527 views · 1 year ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. This video was created with Synthesia: www.synthesia.io/?via=nlplab Paper Title: Adaptive Machine Translation with Large Language Models Link: arxiv.org/abs/2301.13294 Abstract: Consistency is a key requirement of high-quality translati...
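A sketch of the paper's core idea: prepend similar past translations ("fuzzy matches" retrieved from a translation memory) as in-context examples, so the LLM adapts to the required terminology and style. The sentence pairs below are illustrative.

```python
# Build an adaptive MT prompt from fuzzy matches plus the new source sentence.
fuzzy_matches = [
    ("The switch turns the device off.", "L'interrupteur éteint l'appareil."),
    ("Press the switch to restart.", "Appuyez sur l'interrupteur pour redémarrer."),
]
source = "The switch turns the device on."

prompt = "".join(f"English: {en}\nFrench: {fr}\n\n" for en, fr in fuzzy_matches)
prompt += f"English: {source}\nFrench:"
print(prompt)  # send to an LLM; it should reuse the terminology of the examples
```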
Easy Way to Link a Large Language Model to an External Database | REPLUG Paper explained
879 views · 1 year ago
How Close is ChatGPT to Human Experts?
302 views · 1 year ago
Is GPT 3 a Good Data Annotator?
319 views · 1 year ago
When to use a large language model? 4 points to consider in 2023.
715 views · 1 year ago
State-of-the-art Zero-shot Speech Synthesis with Vall-E
1.9K views · 1 year ago
Advanced Reasoning with Large Language Models with Chain of Thought Prompting | Paper explained!
10K views · 1 year ago
In-context Learning - A New Paradigm in NLP?
6K views · 1 year ago
Four Natural Language Processing Research Trends to Watch in 2023
3.7K views · 1 year ago
Automatic Prompt Tuning for Large Language Models | RLPROMPT paper explained!
4.9K views · 1 year ago
New Embedding Model by OpenAI - Intro and Explanation
5K views · 1 year ago
A Neural Corpus Indexer for Document Retrieval | Paper explained
522 views · 1 year ago
ChatGPT Explained | Optimizing Language Models for Dialogue
2.5K views · 1 year ago
Efficient Training of Language Models to Fill in the Middle | Paper summary
837 views · 1 year ago
BLOOM Large-scale Open source Language Model
418 views · 1 year ago
*Paper summary* ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models
698 views · 1 year ago

COMMENTS

  • @jasonwong8934 · 13 days ago

    The problem with CoT is that it requires the human user to know how to solve that sort of problem, and if they knew that, it'd be faster for them to solve the problem themselves. Also, it just requires more work from the user.

  • @flightsim-nl · 1 month ago

    The claimed advantage that discrete prompt can be understood by humans is questionable if the prompt looks like "ouslyicals downright certainly consistently" (sic) as in the given example.

  • @MBison-oc5tc · 3 months ago

    Are you a robot? Is it AI generated?

  • @rl1111rl · 3 months ago

    Background music is annoying and distracting

  • @lemark221 · 8 months ago

    Nothing new. She didn't even bother to run a test herself, lol. Waste of time.

  • @VijayBhaskarSingh · 8 months ago

    Is that a robot speaking?

  • @GauravKumar-gh5fz · 8 months ago

    Hi, can I get your mail ID? I want to run the GitHub repository of it.

  • @adityaramesh551 · 9 months ago

    I would just read a blog instead of a shitty video like this.

  • @morris5648 · 9 months ago

    What no human?

  • @wangdongsheng2076 · 9 months ago

    In the last slides, the first two takeaways contradict each other. That is, you take an LLM and you get FIM for free, but you also mentioned that fine-tuning an existing LM on FIM is less effective than pretraining an LM. It is confusing.

    • @keyvannarimani · 6 months ago

      It means that in order to fine-tune an existing LM on FIM, you would need to add three new token embeddings to the model and fine-tune it to learn these embeddings. This is less effective than training the model using FIM tokens from scratch.
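To make the FIM transformation in this exchange concrete, here is a toy sketch; the three sentinel strings stand in for the three new token embeddings mentioned above, and the exact token names vary between implementations.

```python
# Fill-in-the-middle (FIM) data transform: split a document into
# (prefix, middle, suffix) and move the middle to the end, so the model
# learns to generate the middle conditioned on prefix and suffix.
import random

def fim_transform(doc: str) -> str:
    i, j = sorted(random.sample(range(len(doc)), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"<PRE>{prefix}<SUF>{suffix}<MID>{middle}"

print(fim_transform("def add(a, b):\n    return a + b\n"))
```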

  • @WillGilpin · 9 months ago

    UA-cam will add subtitles if we need them. No need to add them to the videos - they are distracting

  • @best_songs_ever3889 · 10 months ago

    is the host also computer generated using AI...!!🤔

  • @alexisbracav · 11 months ago

    Hello, how can we use it today? Do we need to buy it or download it? To add our own voices?

  • @computerconcepts3352 · 1 year ago

    Noice 👍

  • @GaoyuanFanboy123 · 1 year ago

    Thanks for the summary. Interesting how much high-quality data matters. But they did classic fine-tuning without LoRA? Obviously they have enough compute power to train the whole model, but it would be interesting to know benchmarks for a LoRA-trained model. To my knowledge, LoRA can even improve fine-tuning performance (?)

    • @TheGlobalNLPLab · 1 year ago

      Hello, thanks! Yes, they did classic fine tuning. Good point about LoRA, it would be interesting to see results with that.

  • @user-wr4yl7tx3w · 1 year ago

    Your explanations of papers are always clear and easy to follow.

  • @morris5648 · 1 year ago

    If you want to be taken seriously, get rid of the fake person. It is the most commonly used one.

  • @hailking5588 · 1 year ago

    Great videos! I have a question regarding the data: did they actually add the chains of thought to all of the training data, or only some of it?

  • @JorgetePanete · 1 year ago

    Why would you force subtitles?

  • @user-wr4yl7tx3w · 1 year ago

    great insights. thanks

  • @OmkarRahane-sh5vu · 1 year ago

    Did anyone find the person narrating the content to be AI generated, or was that just my perception?

  • @cmehta994 · 1 year ago

    I want to prompt-tune my Azure OpenAI GPT-3.5 Turbo model with my data. Can you please guide me?

    • @cmehta994 · 1 year ago

      Need it ASAP. Can you tell me if it's available?

    • @hashhash461 · 1 year ago

      @cmehta994 How is it going now?

  • @user-wr4yl7tx3w · 1 year ago

    Why hasn't ChatGPT used this?

  • @AaronWacker · 1 year ago

    That is a good thing imho, meaning wow well done! Share details of how you did the lip sync, etc. Nice agent too. Love the style and the coverage of this most important prompt based sidecar student model to a teacher model and presentation of context learning. I've been really using it and RLHF lately and can't believe how performant it is finding solutions to a well described problem. Kudos and keep up great coverage and excellent analysis on making it easier for others. --Much appreciated!! 🍮 yum

  • @ChaoticNeutralMatt · 1 year ago

    That uncanny valley consistency. So weird.

  • @yb3134 · 1 year ago

    Great content, indeed will be interesting to see which new players will succeed

  • @user-wr4yl7tx3w · 1 year ago

    How can we learn about scaling challenges with NLP?

  • @user-wr4yl7tx3w · 1 year ago

    Truly amazing. But it seems to be the same as OpenAI ChatGPT 4.0. Released yesterday.

  • @rickrejeleene8298 · 1 year ago

    Great job Nikola!

  • @incameet · 1 year ago

    What kinds of tasks can encoder-decoder LLMs like Flan-T5 not do well that decoder-only LLMs like PaLM or GPT-3 can do well?

    • @TheGlobalNLPLab · 1 year ago

      Check out this paper, which explores this: arxiv.org/pdf/2204.05832.pdf

  • @incameet · 1 year ago

    This introduction is too basic, and its content can be easily found online. Looking forward to more in-depth coverage of NLP topics.

    • @TheGlobalNLPLab · 1 year ago

      Thanks for your feedback! Hopefully I'll have time to make some more in-depth videos soon!

  • @user-wr4yl7tx3w · 1 year ago

    The quality of the content is quite high. It should be highly recommended by UA-cam.

  • @enes-the-cat-father · 1 year ago

    Quality content! Please keep up this great work!

  • @temanapotaka-dewes9097 · 1 year ago

    CoT prompting seems to be a logical solution to getting the LLM to do what you want it to do. What are the limitations of CoT prompting?

    • @TheGlobalNLPLab · 1 year ago

      Thanks for the comment! The limitations might be that for some tasks CoT might potentially be overcomplicating / diluting the target problem. Also, CoT increases the input/output sequence lengths, leading to slower inference / greater cost. But it's definitely a great technique to test for your problem!

  • @incameet · 1 year ago

    Siri is very poor at recognizing non-native speakers. Are there any recent models which can recognize accents and understand speakers correctly?

    • @TheGlobalNLPLab · 1 year ago

      I think there are several companies/startups working on this. I don't have any specific links though.

  • @incameet · 1 year ago

    VALL-E still needs a 3-second sample. Why is it considered zero-shot?

    • @TheGlobalNLPLab · 1 year ago

      Thanks for the comment! It's zero shot, because you need to use some data to generate speech like a specific speaker. Otherwise, the problem becomes a general text-to-speech problem.

  • @incameet · 1 year ago

    "B" should be read as "billion", not as "B".

    • @TheGlobalNLPLab · 1 year ago

      Thanks for the comment! Yeah, that's because the video is actually created with AI! Check out: www.synthesia.io/?via=nlplab

    • @firelordzaki1600 · 1 year ago

      Bra go read that actual paper it's only 43pgs. Whoops I meant pages! 😬

  • @ruocaled · 1 year ago

    oh no, you finally became an AI yourself. 🙂 I tested a few English-Chinese translations based on this method; it works pretty well. The hard part is having a large database with related translation examples. But I guess if I am translating a novel, I can do only 1/10 and let AI mimic my style and finish the work.

    • @TheGlobalNLPLab · 1 year ago

      Thanks for the comment! Haha yes! 😃 Indeed, probably a big benefit will be personalisation over time.

  • @rohaanmanzoor3268 · 1 year ago

    this video looks AI generated

  • @nft8888 · 1 year ago

    Good content. Can I know what app is used for the video avatar?

    • @TheGlobalNLPLab · 1 year ago

      Thanks! This is the video avatar www.synthesia.io/

  • @juanadelossantos5671 · 1 year ago

    Thanks so much. Really helped me!

  • @andybaker4861 · 1 year ago

    She's a robot, right? She's good. It was her occasionally unnatural inflection that tipped me off.

    • @TheGlobalNLPLab · 1 year ago

      Yes indeed! I'm using www.synthesia.io/

    • @tNotimportant · 1 year ago

      It wasn't the inflections that got me it was the unnatural movements.

  • @jueliang · 1 year ago

    Is this person AI generated?

  • @ruocaled · 1 year ago

    Ha! Looks like AI already took the "prompt engineer"'s job! 😂

  • @prathams8685 · 1 year ago

    Underrated stuff

  • @motherofgod8265 · 1 year ago

    Where do I access this?

  • @brandomiranda6703 · 1 year ago

    What is the largest size you've seen with current models, e.g. on an A100?

  • @gareebmanus2387 · 1 year ago

    Thanks for sharing--very well explained!

  • @user-fz7db4ls3i · 1 year ago

    Omg, that's what I needed. I got a degree in applied linguistics and would like to study NLP, but I don't know where to start. It would be great if you could recommend resources for this, especially for algebra and the math stuff. As I gathered from job vacancies, I will also need to learn Machine Learning, right? I know it's going to be a long way, but I really want to take it.

  • @aamir122a · 1 year ago

    When you say fusion of previous hidden states with the current vector, what do you mean by that: add them, multiply them, etc.?

    • @TheGlobalNLPLab · 1 year ago

      Hey, they integrate them in the attention, such that the model attends to both the current hidden states and the previous hidden states.
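A toy illustration of the kind of fusion described in this reply: keys and values from a previous segment are concatenated with the current segment's, so queries attend over both. Shapes and variable names are illustrative only, not the paper's implementation.

```python
# Attention over concatenated past + current hidden states.
import torch
import torch.nn.functional as F

d = 16
prev_h = torch.randn(1, 8, d)   # cached hidden states from an earlier segment
cur_h = torch.randn(1, 4, d)    # hidden states of the current segment

q = cur_h                                # queries come from the current segment only
kv = torch.cat([prev_h, cur_h], dim=1)   # keys/values span past + present

attn = F.softmax(q @ kv.transpose(1, 2) / d**0.5, dim=-1)
fused = attn @ kv                # each current position mixes in past context
print(fused.shape)               # torch.Size([1, 4, 16])
```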