Efficient Few-Shot Learning with Sentence Transformers

  • Published Sep 5, 2024
  • Join researchers from Hugging Face, Intel Labs, and UKP for a presentation about their recent work on SetFit, a new framework for few-shot learning with language models.
    For more details about the research, check out the blog post: huggingface.co...

COMMENTS • 15

  • @ivanalvarenga7194 • 5 months ago

    Great Presentation!

  • @gigabytechanz9646 • 1 year ago

    Very interesting and useful information! Thanks!

  • @EkShunya • 1 year ago

    great work

  • @sanjaychawla2008 • 1 year ago

    This is great. Would you have any performance comparisons between SetFit and DeBERTa (say, v3) on NLI tasks? Also, how many examples are needed to fine-tune these models? Thanks

  • @manabchetia8382 • 1 year ago

    GREAT VIDEO!
    Where can I try T-Few?

  • @andrea-mj9ce • 1 year ago +2

    Can SetFit be used for topic modeling (find the topics that a text deals with)?

    • @tweak3871 • 1 year ago +4

      In an unsupervised setting like LDA, LSA, etc., not really, but if you have topic classes that you have already identified and want to classify texts into them, then you could do that pretty easily.
      For example, you could identify some 30 examples that are talking about the Pennsylvania senate race, train SetFit on that small dataset, then run it on a larger stack of news articles. I would expect that to work reasonably well with some iteration on the dataset.

    • @Hellas11 • 1 year ago +1

      Hi, may I ask what the minimum number of texts is that should be labelled to perform best, let's say in classifying 1 million short texts?
      Also, may I ask how this method would compare to current topic algos, and more specifically BERTopic (using the "all-MiniLM-L6-v2" model)? Would SetFit perform better than BERTopic?
      Thanks in advance for your time 🙏

    • @tweak3871 • 1 year ago +1

      @@Hellas11 I was unfamiliar with BERTopic before this; I just looked it up and it seems simple enough, so I think I have an intuition for how well it would work.
      Comparing SetFit vs. BERTopic honestly depends on your use case. SetFit at its core is a few-shot classification system, whereas BERTopic is an unsupervised method that clusters on top of reduced document embeddings produced by BERT.
      In topic modeling, the general goal is to find "topics" that describe the dataset well, most commonly either for EDA or for use as features in another model. Perhaps a more modern use would be document tagging for an app of some kind, but in general I don't see topic modeling done all that much nowadays, as there are better methods in NLP for what you usually want.
      With SetFit, the idea is to already know what you're looking for in a set of texts, build a small dataset (say, 16 labeled examples per class), then train a classifier on that dataset.
      So "how would they compare" really depends on your use case, but generally speaking, a supervised method like SetFit gives you more control to do whatever it is you're trying to do, especially in NLP.
      As for "how much data?": the more the merrier, always. What I would do is label in increments of 8 or 16 per class; so if you have 2 classes, label 32, run it, see how it's doing, and if it's not good enough, label some more. Rinse and repeat.
      In few-shot, the quality of your data and labels matters a lot more, so when evaluating the model, try to build an intuition for why it's getting wrong what it's getting wrong, then find examples that teach the model whatever it's struggling with. Give the model "hard" examples, i.e. stuff it clearly does not already know, but mix in some easy ones too. Also try to find examples that are just barely one class or the other, challenging borderline cases that are still definitely one class; this really helps in few-shot. You can also just write your own examples if you're struggling to find good ones.
      Also remember to always label a dataset for eval as well.
      Finally, vary model size. Start small if you want, but I'd personally jump into Colab with a standard GPU (you will get a 16 GB V100 or a T4 on a free account).
      You should be able to train MPNet on any GPU you get on Colab, and pretty quickly at that.
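
      The "label in increments, evaluate, repeat" loop above can be sketched as follows. To keep the sketch self-contained and fast, a TF-IDF + logistic-regression classifier stands in for SetFit here; the workflow is the same either way, and all texts and labels are made up:

```python
# Hedged sketch of incremental labeling: label a batch, train, evaluate,
# and stop once the held-out accuracy is good enough. A TF-IDF +
# logistic-regression classifier substitutes for SetFit purely for speed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Pool of texts with oracle labels (1 = positive, 0 = negative).
pool = [
    ("loved this movie", 1), ("a complete waste of time", 0),
    ("great acting and a great story", 1), ("boring from start to finish", 0),
    ("an amazing experience", 1), ("terrible pacing and awful dialogue", 0),
    ("loved every minute", 1), ("boring and terrible", 0),
    ("great fun, amazing cast", 1), ("awful, just awful", 0),
    ("a great and amazing film", 1), ("terrible, boring mess", 0),
    ("loved the amazing visuals", 1), ("a terrible waste", 0),
    ("great story, loved it", 1), ("boring, awful film", 0),
]
# Always keep a separately labeled eval set.
eval_texts = ["loved it, great film", "awful and boring"]
eval_y = [1, 0]

train_texts, train_y = [], []
for start in range(0, len(pool), 8):      # label 8 more examples per round
    batch = pool[start:start + 8]
    train_texts += [text for text, _ in batch]
    train_y += [label for _, label in batch]

    vec = TfidfVectorizer().fit(train_texts)
    clf = LogisticRegression().fit(vec.transform(train_texts), train_y)
    acc = accuracy_score(eval_y, clf.predict(vec.transform(eval_texts)))
    print(f"{len(train_texts)} labeled examples -> eval accuracy {acc:.2f}")
    if acc >= 0.99:                       # good enough: stop labeling
        break
```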

  • @rajibahsan6292 • 1 year ago

    Hi, if I want to train the model with my own dataset, how do I prepare it? I am passing the train and eval data as a dictionary, but it's not able to read the column names. How do I prepare my own data to train this model?

  • @jacehua7334 • 1 year ago

    For num examples = 640, how is it calculated?

  • @mikael_aldo • 1 year ago

    Can this be used for a regression task? E.g., comparing answers and calculating a score based on their similarity.

    • @_luca_marinelli • 9 months ago

      SetFit is based on categorical labels, but you can just quantize the regression target into classes.
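
      The quantization step can be sketched with NumPy as below; the bin edges are arbitrary assumptions for illustration, and the resulting integer labels are what you would feed to the classifier:

```python
# Hedged sketch: turning a continuous similarity score into class labels.
# The [0, 1] score range and the bin edges are illustrative assumptions.
import numpy as np

scores = np.array([0.05, 0.30, 0.55, 0.80, 0.95])  # regression targets
bins = [0.25, 0.50, 0.75]                          # 4 classes: 0..3
labels = np.digitize(scores, bins)
print(labels)  # [0 1 2 3 3]
```

      At prediction time you can map a predicted class back to its bin's midpoint to recover an approximate score.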

  • @X1011 • 1 year ago

    pineapple and pizza are too distant in flavor space 😋

  • @CppExpedition • 1 year ago

    The Hugging Face 'emoticon' is so annoying during this excellent presentation. It adds noise to the communication.