The NLP Lab
Switzerland
Joined 4 Jan 2020
Welcome! In this channel, we'll cover the latest in Natural Language Processing (NLP) and related areas, and go over interesting papers recently published on arXiv.
Topics include but are not limited to: text generation/rewriting, summarization, language models, embeddings, question answering, translation.
Join the Discord of the channel to connect with other NLP enthusiasts: discord.gg/2hErCe3ZZg
If you would like to support the channel, you can consider buying me a coffee: www.buymeacoffee.com/ninikolov
QLORA: Efficient Finetuning of Quantized LLMs | Paper summary
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/.
===
Link: arxiv.org/abs/2305.14314
Blog post on how to use QLoRA: huggingface.co/blog/4bit-transformers-bitsandbytes
Abstract: We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save memory without sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is information-theoretically optimal for normally distributed weights, (b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and (c) paged optimizers to manage memory spikes. We use QLoRA to finetune more than 1,000 models, providing a detailed analysis of instruction following and chatbot performance across 8 instruction datasets, multiple model types (LLaMA, T5), and model scales that would be infeasible to run with regular finetuning (e.g. 33B and 65B parameter models). Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA. We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT. We release all of our models and code, including CUDA kernels for 4-bit training.
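The block-wise 4-bit quantization described in the abstract can be illustrated with a toy sketch. Note the assumptions: real NF4 places its 16 levels at quantiles of a normal distribution (the exact table lives in the bitsandbytes source), whereas this sketch uses evenly spaced levels purely for illustration; block size and inputs are invented.

```python
# Toy sketch of 4-bit block-wise quantization in the spirit of QLoRA's NF4.
# ASSUMPTION: real NF4 uses 16 levels at normal-distribution quantiles;
# here we use evenly spaced levels in [-1, 1] purely for illustration.

def quantize_block(weights, levels, block_size=64):
    """Quantize block-wise: store a per-block absmax scale plus a
    4-bit codebook index per weight."""
    scales, indices = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) or 1.0  # absmax scale
        scales.append(scale)
        for w in block:
            norm = w / scale  # now in [-1, 1]
            # nearest codebook level
            idx = min(range(len(levels)), key=lambda i: abs(levels[i] - norm))
            indices.append(idx)
    return scales, indices

def dequantize(scales, indices, levels, block_size=64):
    """Reconstruct approximate weights from scales and 4-bit indices."""
    return [levels[idx] * scales[pos // block_size]
            for pos, idx in enumerate(indices)]

# 16 evenly spaced levels (a stand-in for the NF4 quantile table)
LEVELS = [-1 + 2 * i / 15 for i in range(16)]
```

The per-block scales are themselves float constants; the paper's "double quantization" quantizes those constants too, which is where the extra memory savings come from.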
===
Follow us on social media to get regular updates on NLP developments:
LinkedIn: www.linkedin.com/company/86900096/admin/
Weekly NLP newsletter: theglobalnlplab.substack.com/
Twitter: TheGlobalNLPLab
#artificialintelligence #nlp #chatgpt #ml #ai #nlproc #machinelearning
Views: 931
Videos
Language Models Don’t Always Say What They Think | Paper summary
Views: 379 · 11 months ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Link: arxiv.org/abs/2305.04388 Abstract: Large Language Models (LLMs) can achieve strong performance on many tasks by producing step-by-step reasoning before giving a final output, often referred to as chain-of-thought reasoning (CoT). ...
Automatic Prompt Optimization with “Gradient Descent”and Beam Search | Paper summary
Views: 1.2K · 11 months ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Link: arxiv.org/pdf/2305.03495.pdf Abstract: Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts which are hand written with onerous trial-and-...
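The optimization loop in this abstract can be sketched as a beam search over candidate prompts. This is only scaffolding: in the paper both the edit proposals (driven by natural-language "gradients") and the scoring come from an LLM evaluated on a dev set, so `propose_edits` and `score` below are mock stand-ins invented for illustration.

```python
# Skeleton of prompt optimization via beam search, in the spirit of APO.
# ASSUMPTION: the paper uses an LLM for both the "textual gradient" edit
# step and the scoring step; both are mocked here so the search
# scaffolding itself is runnable.

def propose_edits(prompt):
    """Mock 'textual gradient' step: return candidate rewrites."""
    return [prompt + " Be concise.",
            prompt + " Think step by step.",
            prompt.replace("Answer", "Carefully answer")]

def score(prompt):
    """Mock dev-set evaluation: reward longer, more specific prompts
    (a placeholder for task accuracy)."""
    return len(prompt.split())

def beam_search_prompts(seed, beam_width=2, rounds=3):
    """Expand each prompt in the beam, keep the top-scoring candidates."""
    beam = [seed]
    for _ in range(rounds):
        candidates = set(beam)
        for p in beam:
            candidates.update(propose_edits(p))
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beam[0]
```

Since the beam always contains its previous members as candidates, the best score never decreases across rounds.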
Falcon LLM: the Best Open-source LLM Available at the Moment
Views: 2.1K · 11 months ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Link: falconllm.tii.ae/ Models link: huggingface.co/tiiuae Abstract: Falcon LLM is a foundational large language model (LLM) with 40 billion parameters trained on one trillion tokens. TII has now released Falcon LLM - a 40B model. The m...
Tree of Thoughts: Deliberate Problem Solving with Large Language Models | Paper summary
Views: 923 · 1 year ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Link: arxiv.org/abs/2305.10601 Abstract: Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inf...
LIMA: Less Is More for Alignment | Paper summary
Views: 969 · 1 year ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Link: arxiv.org/abs/2305.11206 Abstract: Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement lea...
Learning to Reason and Memorize with Self-Notes | Paper summary
Views: 278 · 1 year ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Paper title: Learning to Reason and Memorize with Self-Notes Link: arxiv.org/abs/2305.00833 Paper abstract: Large language models have been shown to struggle with limited context memory and multi-step reasoning. We propose a simple meth...
What Will the NLP Industry Look Like in 6-12 Months? with @Slator
Views: 275 · 1 year ago
This is a clip from the Slator podcast on which I was a guest. You can watch the full podcast here: ua-cam.com/video/YpjI5F24bbU/v-deo.html Summary: Which types of companies will benefit the most from NLP technology? What will the field look like in 6-12 months? If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Glob...
Who will Win the Large Language Model App Race? with @Slator
Views: 231 · 1 year ago
This is a clip from the Slator podcast on which I was a guest. You can watch the full podcast here: ua-cam.com/video/YpjI5F24bbU/v-deo.html Summary: We discuss who will win from the current explosion of LLM use cases and applications. Will it be mainly the big companies, or are we going to see an explosion of startups? If you are looking to add advanced expertise in Natural Languag...
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models | Paper and Demo
Views: 623 · 1 year ago
This video was created with Synthesia: www.synthesia.io/?via=nlplab If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Paper link: arxiv.org/abs/2303.04671 Code: github.com/microsoft/visual-chatgpt Abstract: ChatGPT is attracting a cross-field interest as it provides a language interfac...
The rise of API-powered NLP apps: hype cycle, or a new disruptive industry?
Views: 234 · 1 year ago
This video was created with Synthesia: www.synthesia.io/?via=nlplab If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. What is the disruptive potential of API-powered NLP apps? Are they poised to deliver transformative results to all industries? Or, will their impact be limited to certa...
LLaMA | New open foundation Large Language Model by Meta AI | Paper summary
Views: 3.9K · 1 year ago
This video was created with Synthesia: www.synthesia.io/?via=nlplab If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Title: LLaMA: Open and Efficient Foundation Language Models Link: research. publications/llama-open-and-efficient-foundation-language-models/ Follow us on s...
ChatGPT: One model for any NLP task? | Paper explained
Views: 988 · 1 year ago
This video was created with Synthesia: www.synthesia.io/?via=nlplab If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. Paper title: Is ChatGPT a General-Purpose Natural Language Processing Task Solver? Link: arxiv.org/abs/2302.06476 Abstract: Spurred by advancements in scale, large lang...
The Flan Collection: Open Source Instruction Tuning | Paper explained
Views: 1.9K · 1 year ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. This video was created with Synthesia: www.synthesia.io/?via=nlplab Papers to read: - Scaling Instruction-Finetuned Language Models (arxiv.org/abs/2210.11416) - The Flan Collection: Designing Data and Methods for Effective Instruction T...
Adaptive Machine Translation with Large Language Models | Paper explained
Views: 527 · 1 year ago
If you are looking to add advanced expertise in Natural Language Processing to your team, you should check out our services at The Global NLP Lab: nlplab.tech/. This video was created with Synthesia: www.synthesia.io/?via=nlplab Paper Title: Adaptive Machine Translation with Large Language Models Link: arxiv.org/abs/2301.13294 Abstract: Consistency is a key requirement of high-quality translati...
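The adaptive MT setup described here prompts an LLM with similar past translations retrieved from a translation memory. A minimal sketch of that retrieval and prompt assembly follows; the fuzzy matching via `difflib.SequenceMatcher` and the sentence pairs are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch of fuzzy-match retrieval for adaptive MT prompting:
# find the most similar source sentences in a translation memory (TM)
# and prepend them as in-context examples.
# ASSUMPTION: difflib similarity is a stand-in for the paper's fuzzy
# matching; TM entries in usage are invented.
import difflib

def retrieve_fuzzy_matches(source, memory, k=2):
    """Rank (source, target) TM pairs by similarity to the new source."""
    ranked = sorted(
        memory,
        key=lambda pair: difflib.SequenceMatcher(None, source, pair[0]).ratio(),
        reverse=True)
    return ranked[:k]

def build_prompt(source, memory):
    """Assemble a few-shot translation prompt from the best TM matches."""
    lines = ["Translate from English to French, following the examples."]
    for src, tgt in retrieve_fuzzy_matches(source, memory):
        lines.append(f"English: {src}\nFrench: {tgt}")
    lines.append(f"English: {source}\nFrench:")
    return "\n".join(lines)
```

The retrieved examples steer the model toward the memory's terminology and style, which is where the "adaptive" consistency gains come from.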
Easy Way to Link a Large Language Model to an External Database | REPLUG Paper explained
Views: 879 · 1 year ago
Easy Way to Link a Large Language Model to an External Database | REPLUG Paper explained
When to use a large language model? 4 points to consider in 2023.
Views: 715 · 1 year ago
When to use a large language model? 4 points to consider in 2023.
State-of-the-art Zero-shot Speech Synthesis with Vall-E
Views: 1.9K · 1 year ago
State-of-the-art Zero-shot Speech Synthesis with Vall-E
Advanced Reasoning with Large Language Models with Chain of Thought Prompting | Paper explained!
Views: 10K · 1 year ago
Advanced Reasoning with Large Language Models with Chain of Thought Prompting | Paper explained!
In-context Learning - A New Paradigm in NLP?
Views: 6K · 1 year ago
In-context Learning - A New Paradigm in NLP?
Four Natural Language Processing Research Trends to Watch in 2023
Views: 3.7K · 1 year ago
Four Natural Language Processing Research Trends to Watch in 2023
Automatic Prompt Tuning for Large Language Models | RLPROMPT paper explained!
Views: 4.9K · 1 year ago
Automatic Prompt Tuning for Large Language Models | RLPROMPT paper explained!
New Embedding Model by OpenAI - Intro and Explanation
Views: 5K · 1 year ago
New Embedding Model by OpenAI - Intro and Explanation
A Neural Corpus Indexer for Document Retrieval | Paper explained
Views: 522 · 1 year ago
A Neural Corpus Indexer for Document Retrieval | Paper explained
ChatGPT Explained | Optimizing Language Models for Dialogue
Views: 2.5K · 1 year ago
ChatGPT Explained | Optimizing Language Models for Dialogue
Efficient Training of Language Models to Fill in the Middle | Paper summary
Views: 837 · 1 year ago
Efficient Training of Language Models to Fill in the Middle | Paper summary
BLOOM Large-scale Open source Language Model
Views: 418 · 1 year ago
BLOOM Large-scale Open source Language Model
*Paper summary* ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models
Views: 698 · 1 year ago
*Paper summary* ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models
The problem with CoT is that it requires the human user to know how to solve that sort of problem, and if they knew that, it'd be faster for them to solve the problem themselves. It also requires more work from the user.
The claimed advantage that a discrete prompt can be understood by humans is questionable if the prompt looks like "ouslyicals downright certainly consistently" (sic), as in the given example.
are you a robot? is it ai generated?
Background music is annoying and distracting
nothing new, she didn't even bother to run a test herself lol, waste of time.
is that a robot speaking?
hi, can I get your mail ID? I want to run the GitHub repository of it.
I would just read a blog instead of a shitty video like this.
What no human?
In the last slides, the first two takeaways contradict each other. That is, if you take an LLM, you get FIM for free, but you also mentioned fine-tuning an existing LM on FIM is less effective than pretraining an LM. It is confusing.
It means that in order to fine-tune an existing LM on FIM you would need to add three new token embeddings to the model and fine-tune it to learn these embeddings. This is less effective than training the model with FIM tokens from scratch.
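The three-sentinel FIM setup discussed in this thread is, at its core, a data transform: split a document into prefix, middle, and suffix, and emit it in prefix-suffix-middle (PSM) order. A sketch follows; the sentinel strings are placeholders following a common convention, since real models each define their own special tokens.

```python
import random

# Sketch of the fill-in-the-middle (FIM) data transform: split a document
# into (prefix, middle, suffix) and emit it in prefix-suffix-middle (PSM)
# order using three sentinel tokens.
# ASSUMPTION: the sentinel strings below are placeholders; real models
# define their own special tokens for these three roles.
PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def fim_transform(text, rng=None):
    """Rearrange text so the model sees prefix and suffix, then learns
    to generate the middle."""
    rng = rng or random.Random(0)
    # choose two cut points to define the middle span
    i, j = sorted(rng.sample(range(len(text) + 1), 2))
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"
```

Training on such transformed documents is what lets the model infill at inference time; the point raised above is that the sentinel embeddings are best learned during pretraining rather than bolted on afterwards.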
YouTube will add subtitles if we need them. No need to add them to the videos - they are distracting
is the host also computer generated using AI...!!🤔
Hello, how can we use it today? Do we have to buy it or download it? Can we add our own voices?
Noice 👍
Thanks for the summary. Interesting how much high-quality data matters. But they did classic fine-tuning without LoRA? Obviously they have enough compute power to train the whole model, but it would be interesting to know benchmarks for a LoRA-trained model. To my knowledge, LoRA can even improve fine-tuning performance (?)
Hello, thanks! Yes, they did classic fine tuning. Good point about LoRA, it would be interesting to see results with that.
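For reference, the LoRA update discussed in this thread trains only a low-rank pair (A, B) and leaves the base weights W frozen, merging as W' = W + (α/r)·B·A. A tiny pure-Python sketch, with invented matrices for illustration:

```python
# Sketch of merging a LoRA adapter: instead of fine-tuning the full
# weight matrix W, train a low-rank pair (A: r x d_in, B: d_out x r)
# and merge as W' = W + (alpha / r) * B @ A.
# ASSUMPTION: matrices below are tiny invented examples.

def matmul(X, Y):
    """Plain-Python matrix multiplication."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_merge(W, A, B, alpha=1.0):
    """Fold the scaled low-rank update into the frozen base weights."""
    r = len(A)  # rank = number of rows of A
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]
```

Since A and B together have far fewer parameters than W, the trainable footprint (and optimizer state) shrinks drastically, which is what QLoRA exploits on top of 4-bit base weights.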
Your explanation of paper is always clear and easy to follow.
If you want to be taken seriously, get rid of the fake person. It is the most commonly used one.
Great videos! I have a question regarding the data: did they actually add the chain of thought to all of the training data, or only some of it?
Why would you force subtitles?
great insights. thanks
Did anyone else find the person narrating the content to be AI generated, or was that just my perception?
I want to prompt tune my azure openai gpt35 turbo model with my data. Can you please guide?
Needed ASAP. Can you tell me if it's available?
@@cmehta994 how is it going now?
Why hasn’t ChatGPT used this?
That is a good thing imho, meaning wow well done! Share details of how you did the lip sync, etc. Nice agent too. Love the style and the coverage of this most important prompt based sidecar student model to a teacher model and presentation of context learning. I've been really using it and RLHF lately and can't believe how performant it is finding solutions to a well described problem. Kudos and keep up great coverage and excellent analysis on making it easier for others. --Much appreciated!! 🍮 yum
That uncanny valley consistency. So weird.
Great content, indeed will be interesting to see which new players will succeed
How can we learn about scaling challenges with NLP?
Truly amazing. But it seems to be the same as OpenAI ChatGPT 4.0. Released yesterday.
Great job Nikola!
Thanks, Rick!
What kinds of tasks can encoder-decoder LLMs like Flan-T5 not do well that decoder-only LLMs like PaLM or GPT-3 can do well?
Check out this paper that explores this arxiv.org/pdf/2204.05832.pdf
This introduction is too basic and its content can be easily found online. Look forward to more in-depth coverage on NLP topics.
Thanks for your feedback! Hopefully I'll have time to make some more in-depth videos soon!
The quality of the content is quite high. It should be highly recommended by YouTube.
Quality content! Please keep up this great work!
Thanks, Enes!
CoT prompting seems to be a logical solution to getting the LLM to do what you want it to do. What are the limitations of CoT prompting?
Thanks for the comment! The limitations might be that for some tasks CoT might potentially be overcomplicating / diluting the target problem. Also, CoT increases the input/output sequence lengths, leading to slower inference / greater cost. But it's definitely a great technique to test for your problem!
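The few-shot CoT prompting discussed in this thread amounts to prepending worked examples whose answers spell out their intermediate reasoning before the final result. A minimal sketch of that prompt assembly (the demonstration problem is invented for illustration):

```python
# Minimal sketch of few-shot chain-of-thought prompt assembly: each
# demonstration includes its intermediate reasoning before the final
# answer, nudging the model to reason step by step before answering.
# ASSUMPTION: the demo problem below is invented for illustration.

def build_cot_prompt(demos, question):
    """Join (question, reasoning, answer) demos with the new question."""
    parts = []
    for q, reasoning, answer in demos:
        parts.append(f"Q: {q}\nA: {reasoning} So the answer is {answer}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

DEMOS = [
    ("Roger has 5 balls and buys 2 cans of 3 balls each. How many balls?",
     "He starts with 5. 2 cans of 3 is 6. 5 + 6 = 11.",
     "11"),
]
```

The cost trade-off mentioned in the reply is visible directly here: every demonstration, plus the generated reasoning itself, lengthens the sequence the model must process.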
Siri is very poor at recognizing non-native speakers. Are there any recent models which can recognize accents and understand speakers correctly?
I think there are several companies/startups working on this. I don't have any specific links though.
VALL-E still needs a 3-second sample. Why is it considered zero-shot?
Thanks for the comment! It's zero shot, because you need to use some data to generate speech like a specific speaker. Otherwise, the problem becomes a general text-to-speech problem.
B should be read as Billion not as B.
Thanks for the comment! Yeah, that's because the video is actually created with AI! Check out: www.synthesia.io/?via=nlplab
Bra go read that actual paper it's only 43pgs. Whoops I meant pages! 😬
oh no, you finally became an AI yourself. 🙂 I tested a few English-Chinese translations based on this method, and it works pretty well. The hard part is having a large database with related translation examples. But I guess if I am translating a novel, I can do just 1/10 and let the AI mimic my style and finish the work.
Thanks for the comment! Haha yes! 😃 Indeed, probably a big benefit will be personalisation over time.
this video looks AI generated
it is
the ai generated person was so distracting. couldn't watch the video.
I came here to say the exact same thing
Good content, can I know what app is for video avatar?
Thanks! This is the video avatar www.synthesia.io/
Thanks so much. Really helped me!
She's a robot, right? She's good. It was her occasionally unnatural inflection that tipped me off.
Yes indeed! I'm using www.synthesia.io/
It wasn't the inflections that got me it was the unnatural movements.
Is this person AI generated?
Yup
Yes!
@@TheGlobalNLPLab what tools?
@@asiddiqi123 www.synthesia.io/?via=nlplab
I was asking myself the same question!
Ha ! looks like Ai already took the “prompt engineer” s job!😂
Underrated stuff
where do I access this
what is the largest size you've seen with current models e.g. A100?
2048
Thanks for sharing--very well explained!
Omg, that's what I needed. I got a degree in applied linguistics and would like to study NLP, but I don't know where to start. It would be great if you could recommend resources for this, especially for algebra and math stuff. As I gather from job vacancies, I will also need to learn Machine Learning, right? I know it's going to be a long way, but I really want to take it.
When you say fusion of previous hidden states with the current vector, what do you mean by that: add them, multiply them, etc.?
Hey, they integrate them in the attention, such that the model attends to both current hidden states and previous hidden states
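The fusion described in this reply can be sketched as attention over a concatenation: cached keys/values from the previous segment are stacked with the current ones before ordinary softmax attention is computed. A tiny single-query sketch with invented vectors (a simplification of how memory-augmented attention works, not the paper's exact implementation):

```python
import math

# Sketch of attending over both cached (previous-segment) and current
# hidden states: concatenate memory keys/values with current ones and
# run ordinary softmax attention over the combined set.
# ASSUMPTION: vectors in usage are tiny invented examples.

def attend(query, keys, values):
    """Single-query scaled dot-product attention."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
              for key in keys]
    m = max(scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(dim)]

def attend_with_memory(query, mem_kv, cur_kv):
    """Fuse by concatenation: the model attends to both segments at once."""
    keys = [k for k, _ in mem_kv] + [k for k, _ in cur_kv]
    values = [v for _, v in mem_kv] + [v for _, v in cur_kv]
    return attend(query, keys, values)
```

So the fusion is neither an add nor a multiply of the states themselves: the softmax weights decide, per query, how much of the output comes from the cached segment versus the current one.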