Should you switch from BERT to ALBERT?

  • Published Aug 19, 2024

COMMENTS • 46

  • @pjoshi_15
    @pjoshi_15 4 years ago +3

    Most people just go along with the hype. You, on the other hand, have done a critical analysis of this paper. That is commendable. Thanks a lot 👍

  • @maikeguderlei2146
    @maikeguderlei2146 4 years ago +20

    Thank you so much for this video! I've been fine-tuning BERT, RoBERTa, DistilBERT, ALBERT, and XLNet on a stance detection task and used the base configuration for all models. ALBERT is by far the worst model, which makes perfect sense given the arguments you already pointed out. I'm glad I used the v1 version from the beginning though!

    • @ChrisMcCormickAI
      @ChrisMcCormickAI  4 years ago

      Haha, thanks for that confirmation!
      Have you tried using anything larger than the base models? Or does it not seem worth it for what you’re doing? Thanks again!

    • @maikeguderlei2146
      @maikeguderlei2146 4 years ago +1

      No, unfortunately not! I have focused on experimenting with a grid of hyperparameters for fine-tuning and finding out how much changing hyperparameters affects performance on downstream tasks :) Also, my VM is quite limited and I'm already having difficulties with effectively running and saving the base versions.

    • @sumanthbalaji1768
      @sumanthbalaji1768 4 years ago

      @@maikeguderlei2146 Hey, I'm planning on working on a stance detection task too. Can you direct me to your paper or tell me your findings on how well all these different models fare on the stance detection task? What was your dataset? SemEval16? Or some online debate platform dataset?

  • @tooshallpass
    @tooshallpass 1 year ago

    Chris, I love how informative your videos are! Please continue to add more such videos where you compare newer variants of these models and analyze them in a similar way. Learning a lot!

  • @majtales
    @majtales 4 years ago +2

    Nice piece of work. It's nice to have someone with a deep understanding zoom out a bit from the not-so-useful gory details and debunk the magic or hype around some of these models.

  • @jorgeih
    @jorgeih 4 years ago +7

    I am pleased to be watching your videos, thank you and congratulations on this work.

  • @davidma1194
    @davidma1194 4 years ago +1

    Great video series, always hoping for the next one!

  • @chronicfantastic
    @chronicfantastic 4 years ago +2

    Very interesting. Thanks for the video!

  • @AbhayShuklaSilpara
    @AbhayShuklaSilpara 4 years ago +1

    These videos are quite informative! Please keep making them :)

  • @mukuljoshi1475
    @mukuljoshi1475 4 years ago

    Sir, I appreciate your work and always get motivated by the ease with which you simplify things.

  • @RajibDas-kq2uz
    @RajibDas-kq2uz 4 years ago +2

    Great work.

  • @cahyawirawan
    @cahyawirawan 4 years ago +1

    Thanks for the video, it’s really good. Btw, do you plan to review the Longformer? That would be great.

  • @nikhilz38
    @nikhilz38 4 years ago +2

    Keep making these great videos

  • @2107mann
    @2107mann 4 years ago

    We encourage you to produce more videos. Would love to contribute to your productions.

  • @ProfessionalTycoons
    @ProfessionalTycoons 3 years ago

    Good explanation, thank you

  • @leonardopikachu343
    @leonardopikachu343 4 years ago

    Great work!

  • @AswinCandra
    @AswinCandra 4 years ago

    How about DistilBERT, Chris? Could we get your analysis of that in your next video? 😁
    And where can we get your ALBERT notebook, please?

  • @marziehzargari4940
    @marziehzargari4940 3 years ago

    Respect.

  • @sak8485
    @sak8485 4 years ago +1

    Great job. Can you please provide me with the links to the notebook?

  • @chelmartin
    @chelmartin 4 years ago

    Good evening Chris, great videos, thanks. On ALBERT, curious: are you going to post a blog on training ALBERT from scratch on domain-specific data? Basically also generating my own vocab... Thanks again.

  • @ayushyadav5744
    @ayushyadav5744 2 years ago

    Hi Chris, I have been exploring BERT/ALBERT models to classify app reviews into user-defined categories. I was looking for a tutorial but couldn't find any. Is there any code/notebook I can refer to that gives a walkthrough of how to implement it? Thank you

    • @ChrisMcCormickAI
      @ChrisMcCormickAI  2 years ago

      Hi Ayush,
      Yeah, that sounds like multiclass classification!
      I've got a free tutorial here on fine-tuning BERT for classification tasks: mccormickml.com/2019/07/22/BERT-fine-tuning/. It's binary classification, but you can just change the 'num_labels' parameter.
      My membership site also has another example project specific to multiclass classification, and goes deeper on using BERT for classification in general. www.chrismccormick.ai/membership?
      Thanks,
      Chris
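
A note on the num_labels change Chris mentions: as a minimal sketch (assuming the Hugging Face transformers API; the model name and the five-category count below are illustrative placeholders, not taken from the tutorial), the switch from binary to multiclass classification might look like this:

```python
# Minimal sketch: turning the tutorial's binary classifier into a multiclass
# one by changing num_labels. Model name and category count are placeholders.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

NUM_CATEGORIES = 5  # e.g. number of user-defined review categories (assumption)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=NUM_CATEGORIES,  # 2 in the binary tutorial; >2 for multiclass
)

inputs = tokenizer("The app keeps crashing on startup.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, NUM_CATEGORIES)
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)
```

With more than two labels, the classification head is trained with a standard cross-entropy loss over the categories; the rest of the fine-tuning loop stays the same.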

  • @dardodel
    @dardodel 4 years ago

    Hi Chris, great materials. Keep it up, please. I have a question: one of the key elements of BERT is the "bidirectional" capability. But if words are analyzed independently of each other and are not fed into the model one after the other (like an RNN), and since we have the word positions, what is the point of dual directionality and how does it add value to BERT? Thanks.

    • @donnychan1999
      @donnychan1999 2 years ago

      I think it's called bidirectional simply because in the MLM task, the masked token can attend to both directions.
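
To make that concrete, a small sketch (the model choice is illustrative): in the fill-mask setup, the prediction for [MASK] is conditioned on the words both to its left and to its right, which is the sense in which BERT is "bidirectional".

```python
# Sketch: the prediction for [MASK] uses context on BOTH sides of the gap.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Both the left context ("capital of France") and the right context
# ("a major European city") constrain the masked word.
for result in fill_mask("The capital of France is [MASK], a major European city."):
    print(result["token_str"], round(result["score"], 3))
```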

  • @ugaray96
    @ugaray96 4 years ago

    And why not make the Encoder bigger instead of running it 12 times?

  • @lmbk8957
    @lmbk8957 3 years ago

    I'm surprised that there aren't deeper studies on the number of encoder repetitions with parameter sharing. Why not run the experiment of using the encoder 1 to 12 times and see whether it makes sense to use fewer than 12?

    • @ChrisMcCormickAI
      @ChrisMcCormickAI  3 years ago +1

      Yeah, you can see the different numbers of layers and embedding sizes that they tried in table 1 of the paper: arxiv.org/pdf/1909.11942.pdf
      They report on versions with 12 layers and 24 layers, but nothing less.
      I think there's an interesting trend in the BERT research where groups choose model sizes that match the original BERT sizes just for the sake of making an equal comparison. Not as helpful for you and me who just want to use the models!
      I've heard that T5 is a project where they tried a lot of different parameter choices...
      Also, more recently, in the GitHub BERT repo you can see that they've published a wider variety of model sizes for the original BERT.
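
The cross-layer parameter sharing that makes this "repeat the encoder" question interesting is easy to check empirically. A small sketch (untrained configs with sizes roughly matching the base setup, purely illustrative): with ALBERT, changing num_hidden_layers barely moves the parameter count, because the same encoder block is reused at every layer.

```python
# Sketch: ALBERT reuses one set of encoder weights at every layer, so the
# parameter count stays nearly flat as num_hidden_layers grows.
from transformers import AlbertConfig, AlbertModel

def n_params(num_layers):
    config = AlbertConfig(
        hidden_size=768,
        num_attention_heads=12,
        intermediate_size=3072,
        num_hidden_layers=num_layers,  # how many times the shared block is applied
    )
    return sum(p.numel() for p in AlbertModel(config).parameters())

for layers in (1, 6, 12, 24):
    print(f"{layers:>2} layers -> {n_params(layers):,} parameters")
```

So running the encoder fewer (or more) times changes compute and accuracy, but not model size, which is why the layer count is a separate knob from "make the encoder bigger".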

  • @pedro_kangxiao2598
    @pedro_kangxiao2598 4 years ago +2

    Would you put your ALBERT tutorial on Udemy? I am more than happy to pay $14 to learn it but slightly prefer not to buy a book.

    • @ChrisMcCormickAI
      @ChrisMcCormickAI  4 years ago +4

      Hi Pedro, thanks for the feedback! Just to clarify, do you mean that you’d prefer a video course over a book? I went the book route on this one because it’s kind of a subtopic, and I wanted the content to be easier to update. I can definitely make a course version, though, if there’s interest! Thanks again!

    • @pedro_kangxiao2598
      @pedro_kangxiao2598 4 years ago

      @@ChrisMcCormickAI, yeah, I mean a video course version, and I'd be more than happy to pay $14.99 or more.
      Thank you for the awesome materials!

  • @nsuryapa1
    @nsuryapa1 4 years ago

    I have created a quantized model based on BERT. Could you suggest how to deserialize it?
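
The question is terse, but if the model was shrunk with PyTorch dynamic quantization, one common pattern (a hedged sketch only; the paths and base checkpoint are placeholders, and the details depend on how the model was actually quantized) is to rebuild and re-quantize the architecture before loading the saved weights:

```python
# Sketch: saving and reloading a dynamically quantized BERT classifier.
# Key point: re-apply quantize_dynamic to a fresh model BEFORE loading the
# quantized state_dict, so the module types and keys match.
import torch
from transformers import BertForSequenceClassification

# --- save side ---
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
torch.save(quantized.state_dict(), "bert_quantized.pt")  # placeholder path

# --- load side ---
skeleton = BertForSequenceClassification.from_pretrained("bert-base-uncased")
skeleton = torch.quantization.quantize_dynamic(
    skeleton, {torch.nn.Linear}, dtype=torch.qint8
)
skeleton.load_state_dict(torch.load("bert_quantized.pt"))
skeleton.eval()
```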

  • @trevorpfiz
    @trevorpfiz 4 years ago

    Hey Chris, tremendous work. Had the same findings with 'albert-base-v2' when ALBERT was not learning for me. Two questions:

    1. Due to a 500 MB model size restriction currently set by the Google Cloud AI Platform, I explored quantization of base BERT and xxlarge ALBERT to cut the size in half. However, it seems that I can't quantize the huggingface implementation of ALBERT in the first place due to an error, and for the quantized BERT model everything works great until I try to load in the model (it reverts to the original size and accuracy is slashed). Have you had any luck with quantization?

    2. I have a pretty small dataset that I am training on, and the model seems to take on a pretty strong bias towards whether a question mark exists or not when classifying a sentence. This bias is helpful in some cases, but other times, if only the question mark is missing, the classification is way off. How should I go about balancing a small dataset to mitigate this bias? Should I get more data, train without question marks, better mix some with and without, or duplicate data where one copy includes a question mark and the other does not? I will continue to play around, but was wondering if you had any intuitions about this.

    Thanks again for these videos, and thank you for the ALBERT eBook + notebooks, they are awesome resources. Trevor

    • @ChrisMcCormickAI
      @ChrisMcCormickAI  4 years ago

      Hi Trevor! Thanks for your support!
      I haven’t played with quantization yet, but I’d be interested to, so I added it to my list of Notebook ideas :).
      The question mark problem is interesting. I wonder if an ensemble approach would work--one classifier that doesn’t see the question mark, and another that does, and then let the ensemble algorithm decide how much weight to give to each model?
      I’ll ask Nick if he has any better suggestions :)

  • @thaGkillah
    @thaGkillah 4 years ago

    Hey Chris, could you also offer PayPal as a payment for your eBook? Feel free to add the PayPal fees to the price.

  • @FREELEARNING
    @FREELEARNING 4 years ago

    I kind of like your tutorials about BERT and how you demystify it. But one thing I have to mention is the trick you presented here (here is something, and if you want more, go premium and buy the rest for ..$). I think that if you keep everything interesting and free, your channel will get even more views, and everyone will enjoy the interesting information you share.
    I know you will say you have spent more time preparing that book..., but even so, I'm not on board with the idea of making things paid.

    • @ChrisMcCormickAI
      @ChrisMcCormickAI  4 years ago +3

      Thanks for the feedback! Working out the right business model is definitely tricky... My current sense of it is that what I teach is too specialized to draw a large enough audience for ads alone to make it profitable.

  • @mahsa.me.
    @mahsa.me. 3 years ago

    Great work.