The EASIEST! way to do Text Classification with spaCy and Classy Classification

Python Tutorials for Digital Humanities

Published 27 Aug 2024

COMMENTS • 44

@python-programming 2 years ago
Repo: github.com/wjbmattingly/fewshot-text
@kosemekars 2 years ago ⁺³
Best text-related ML channel on YouTube
@python-programming 2 years ago
Thank you so much!
@user-sn5nm5rm3v 2 months ago
Oh, really? Did you manage to see something else?
@wdonno 2 years ago ⁺¹
You are reading my mind! Looking forward to this!
@python-programming 2 years ago
Awesome! I hope you like it!
@giantdutchviking 11 months ago
Thanks for making this vid, been learning Python for a bit and this stuff makes Python shine!
@VitthalGusinge 2 years ago ⁺²
I have been searching for the best NER algorithms for my use case for the last two days. Can't wait to see what you have here.
@python-programming 2 years ago ⁺¹
This won't focus on NER, but there is a few-shot NER from the same company called concise_concepts. I have tested it and found it good for some labels and bad for others.
@shahidmahmood7252 2 years ago ⁺¹
Good knowledge, shared wonderfully. Looks like a great module. Now thinking of all the applications in works of English literature. Thanks!
@python-programming 2 years ago
Thanks!
@nguyenngochai6245 2 years ago ⁺²
Thank you very much for sharing! Love it.
May I ask whether it would be possible to add more classes to the data? It would be even more awesome if it could be done for other non-English language models.
@python-programming 2 years ago ⁺¹
Yes, it will be possible to add other classes, and you can use any language model on Hugging Face.
@nguyenngochai6245 2 years ago
@python-programming Thank you for your instant reply!
I have successfully tried it with the "ja_core_news_lg" model, but I could not get a satisfactory result out of the Japanese sentence-transformers model. Do you have any tips for choosing the appropriate models?
@python-programming 2 years ago ⁺¹
@nguyenngochai6245 No problem! I will test it out today.
@luiztauffer8513 1 year ago ⁺¹
This is gold material, thanks so much for putting this out in such a comprehensive way!
@Python Tutorials for Digital Humanities In one of your videos you mentioned you do research in History, is that right? I’m curious to know how people are using text classification methods such as this in History research. Do you have any material you could point me to?
@python-programming 1 year ago ⁺¹
Thanks!! Yes, my background is a PhD in medieval history but I mostly work with archival material at Smithsonian and USHMM. A lot of the publications you can find in history with text classification deal with sentiment analysis. You can find articles on Digital Humanities Quarterly and the Oxford Digital Humanities journal.
@rf1890 2 years ago ⁺¹
I was trying to identify "local indicators of climate change impacts" (what changes people observe in their environment... not city people... :D ) in a database of scientific articles. Results are OK. It's hard, but it might be useful as a pre-scan.
@python-programming 2 years ago
That is really interesting!
@Hypothermia1337 2 years ago ⁺¹
Hello Dr. Mattingly, do you know if it's possible to fine-tune a pre-trained model? I'm really not familiar with that, but I need to tweak a model with a few exceptions.
Yours Sincerely
@python-programming 2 years ago
It is! If you want to fine-tune a language model, that can be done via Gensim or the Transformers library from Hugging Face. If you want to fine-tune NER, you will have some problems, namely catastrophic forgetting.
@ezrakassa3472 2 years ago ⁺²
Can't wait. Is it multi-class or binary classification though? I am hoping there will be multi-class classification, since you already did a detailed video on binary classification.
@python-programming 2 years ago
This will be binary, but it works for multi-class just as well. Remember when you use few-shot classification, you are not doing traditional supervised learning. Instead, you are using the vectors of a support set (not training set) to then auto-identify similar vector sentences. The similarities are then scored so that you know how much something belongs to a certain category.
The more classes that you have, the more support samples you need. I recommend using it to get a quick sense of your data and generate a starting data set quickly to then train a new model via supervised learning.
This video is meant to serve as my transition into multi-class classification on this channel =), so those videos should be coming out shortly. We will use spaCy (simpler) and Keras (more advanced). Multi-class text classification will also receive a whole chapter in my forthcoming book on spaCy ML.
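The few-shot idea described in this reply, scoring new text against a small support set instead of training on a large labeled corpus, can be sketched in plain Python. This is only an illustration: `embed` is a toy bag-of-words stand-in for a real sentence embedding, the `support` dictionary is made-up data, and the nearest-sample cosine scoring is an assumption chosen for clarity, not Classy Classification's actual implementation.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(text, support):
    """Score text against each class by its most similar support sentence."""
    vec = embed(text)
    return {
        label: max(cosine(vec, embed(sample)) for sample in samples)
        for label, samples in support.items()
    }


# Hypothetical support set: a few example sentences per class.
support = {
    "furniture": ["the sofa and the chair are new", "we bought a wooden table"],
    "kitchen": ["the oven and the stove are hot", "we bought a sharp knife"],
}

scores = score("a new chair for the living room", support)
best = max(scores, key=scores.get)
print(best, scores)
```

With this toy data, `best` is `furniture`, because the query shares "chair", "new", and "the" with one of the furniture samples. A real setup would swap `embed` for vectors from a spaCy or sentence-transformers model.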
@DK-rl1sf 1 year ago
Thank you for this tutorial. I tried saving the trained model using nlp.to_disk('D:/ABC'). But when I load it back using spacy.load('D:/ABC') in a fresh Jupyter Notebook, I get the error "[E002] Can't find factory for 'text_categorizer' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. ...". I am still in the same conda environment so I can't be missing dependencies. What is causing this problem?
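One likely cause of an E002 error like this is that the fresh notebook never imported the package that registers the custom component factory. spaCy registers a factory when the module defining it is imported, so having the package installed in the environment is not enough. A minimal sketch, assuming that importing `classy_classification` is what registers the factory and that the saved pipeline can otherwise be rebuilt from its config (the helper name `load_saved_pipeline` is hypothetical):

```python
def load_saved_pipeline(path):
    """Load a saved spaCy pipeline that contains the Classy Classification component."""
    import spacy

    # Importing the package runs its factory registration, so spaCy can
    # resolve the component name stored in the saved config. The import is
    # needed only for this side effect.
    import classy_classification  # noqa: F401

    return spacy.load(path)
```

In the new notebook this would be called as `nlp = load_saved_pipeline("D:/ABC")`.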
@victordeleon9988 2 years ago ⁺¹
Great video, thanks a lot. Do you recommend any models in Spanish besides those already available in spaCy?
@python-programming 2 years ago
No problem! It depends on what you are trying to do; there are some great BERT models for Spanish. You can find them on Hugging Face's website.
@victordeleon9988 2 years ago ⁺¹
@python-programming Great, thanks a lot, your channel is awesome.
@python-programming 2 years ago
@victordeleon9988 Thanks!!
@maxwellmandela 2 years ago ⁺¹
great stuff!
@python-programming 2 years ago
Thanks!
@gangs0846 8 months ago ⁺¹
Is this still relevant compared to using GPT for classification?
@python-programming 8 months ago ⁺¹
That is a great question. Yes, though GPT-4 is better at few-shot than this approach. I still think this is useful for getting a quick classifier up and running locally to help with annotating.
@gangs0846 8 months ago ⁺¹
@python-programming thank you sir
@Filipkasic 2 years ago
Is there a way to utilize this model without having to define what the keywords are but simply to provide a list of them without any definition?
@youTanod 2 years ago ⁺¹
Thank you very much for this useful video. This is exactly what I need.
I tried it with real data, but I get this warning message, what should I do?
UserWarning: The least populated class in y has only 1 members, which is less than n_splits=2.
@python-programming 2 years ago
Can you paste what your support data dictionary looks like?
@youTanod 2 years ago
@python-programming drive.google.com/file/d/1WcXuI2a7x_EvTreG5GWOE3lyj3Y9CAPc/view?usp=sharing
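This warning comes from scikit-learn's stratified cross-validation and appears when some class has fewer samples than the number of folds (`n_splits=2` here), which suggests that a label in the support data has only one example. A quick check, assuming the support data is a dictionary mapping each label to a list of example sentences (the sample data and the helper name `underpopulated_classes` are hypothetical):

```python
def underpopulated_classes(support, min_samples=2):
    """Return each label that has fewer than min_samples support sentences."""
    return {
        label: len(samples)
        for label, samples in support.items()
        if len(samples) < min_samples
    }


# Hypothetical support data: "kitchen" has a single example.
support = {
    "furniture": ["the sofa is new", "we bought a wooden table"],
    "kitchen": ["the oven is hot"],
}

too_small = underpopulated_classes(support)
print(too_small)
```

Adding more example sentences to every label reported here should make the warning go away.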
@szachynakubie4955 2 years ago
thank you
@CoreyMalcom 1 year ago
This is a really good tutorial. Thank you!
I have not been able to get it running so far. When I attempt to "nlp.add_pipe( ) " on the text_categorizer, the kernel crashes and restarts. Any clue as to why this would be happening? I have a fresh environment with spacy and the classy_classification newly installed.
@python-programming 1 year ago
Thanks! Hmmm that is odd. What is your OS? Mind DMing me on Twitter with some pics?
@CoreyMalcom 1 year ago ⁺¹
@python-programming Sent. Thanks for looking at this. Will be really helpful.
@python-programming 1 year ago
@CoreyMalcom no problem! I am in the middle of traveling. Will try and respond tomorrow
@lisagilyarovskaya5593 2 years ago
Thank you very much for this video, was looking for something exactly like this!! I was wondering if there is any way to save the model config on the disk once the pipe with support samples was added, do you have any ideas on that?
@trashyAIguy 1 year ago
Cool! I'll use it in my trashy ai to make it less trashy 🤣 to make it understand intentions

Up next

How to Easily Add a Coreference Resolution Model into a spaCy Pipeline with Crosslingual Coreference