Repo: github.com/wjbmattingly/fewshot-text
Best text-related ML channel on youtube
Thank you so much!
Oh, really? Did you manage to see something else?
You are reading my mind! Looking forward to this!
Awesome! I hope you like it!
Thanks for making this vid, been learning Python for a bit and this stuff makes Python shine!
I have been searching for the best NER algorithms for my use case for the last two days. Can't wait to see what you have here.
This won't focus on NER, but there is a few-shot NER from the same company called concise_concepts. I have tested it and found it good for some labels and bad for others.
Good knowledge, shared wonderfully. Looks like a great module. Now thinking of all the applications in works of English literature. thanks!
Thanks!
Thank you very much for sharing! Love it.
May I ask, would it be possible to add more classes to the data? It would be even more awesome if it could be done for other non-English language models.
Yes, it will be possible to add other classes, and you can use any language model on Hugging Face.
@@python-programming Thank you for your instant reply!
I have successfully tried it with the "ja_core_news_lg" model, but I could not get a satisfactory result out of the Japanese sentence-transformers model. Do you have any tips for choosing the appropriate models?
@@nguyenngochai6245 no problem! I will test it out today
This is gold material, thanks so much for putting this out in such a comprehensive way!
@Python Tutorials for Digital Humanities In one of your videos you mentioned you do research in History, is that right? I’m curious to know how people are using text classification methods such as this in History research. Do you have any material you could point me to?
Thanks!! Yes, my background is a PhD in medieval history but I mostly work with archival material at Smithsonian and USHMM. A lot of the publications you can find in history with text classification deal with sentiment analysis. You can find articles on Digital Humanities Quarterly and the Oxford Digital Humanities journal.
I was trying to identify "local indicators of climate change impacts" (the changes people observe in their environment, not city people :D) in a database of scientific articles. The results are OK. It's hard, but it might be useful as a pre-scan.
That is really interesting!
Hello Dr. Mattingly, do you know if it's possible to fine-tune a pre-trained model? I'm really not familiar with that, but I need to tweak a model with a few exceptions.
Yours sincerely
It is! If you want to fine-tune a language model, that can be done via Gensim or the Transformers library from HuggingFace. If you want to fine-tune NER, you will have some problems, namely catastrophic forgetting.
Can't wait. Is it multi-class or binary classification, though? I am hoping it will be multi-class, as you already did an elaborate video on binary classification.
This will be binary, but it works for multi-class just as well. Remember, when you use few-shot classification, you are not doing traditional supervised learning. Instead, you are using the vectors of a support set (not a training set) to automatically identify sentences with similar vectors. The similarities are then scored so that you know how strongly something belongs to a certain category.
The more classes that you have, the more support samples you need. I recommend using it to get a quick sense of your data and generate a starting data set quickly to then train a new model via supervised learning.
This video is meant to serve as my transition into multi-class classification on this channel =), so those videos should be coming out shortly. We will use spaCy (simpler) and Keras (more advanced). Multi-class text classification will also receive a whole chapter in my forthcoming book on spaCy ML.
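To make the support-set idea above concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the bag-of-words `embed` function is a toy stand-in for real sentence vectors, and `embed`, `cosine`, `score`, and the sample data are hypothetical names, not part of classy_classification or spaCy, whose internals may work differently.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence vector: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score(text, support):
    """Score `text` against each class by its best-matching support sample."""
    vec = embed(text)
    raw = {
        label: max(cosine(vec, embed(sample)) for sample in samples)
        for label, samples in support.items()
    }
    total = sum(raw.values())
    if total == 0:
        # Nothing matched any class: fall back to a uniform score.
        return {label: 1.0 / len(raw) for label in raw}
    # Normalise so the scores across classes sum to 1.
    return {label: value / total for label, value in raw.items()}


# Hypothetical support set: a handful of examples per class, no training step.
support = {
    "weather": ["the rain fell all day", "snow and wind are expected tonight"],
    "food": ["the soup was too salty", "we baked fresh bread this morning"],
}

scores = score("heavy rain and wind tonight", support)
best = max(scores, key=scores.get)
print(best, scores)
```

With real sentence vectors the comparison is the same in spirit, but the scores are far less brittle than word overlap, which is why more classes call for more support samples.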
Thank you for this tutorial. I tried saving the trained model using nlp.to_disk('D:/ABC'). But when I load it back using spacy.load('D:/ABC') in a fresh Jupyter Notebook, I get the error "[E002] Can't find factory for 'text_categorizer' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. ...". I am still in the same conda environment so I can't be missing dependencies. What is causing this problem?
Great video, thanks a lot. Do you recommend any models in Spanish besides those already available in spaCy?
No problem! It depends on what you are trying to do, there are some great BERT models for Spanish. You can find them on HuggingFace's website.
@@python-programming Great, thanks a lot, your channel is awesome.
@@victordeleon9988 Thanks!!
great stuff!
Thanks!
Is this still relevant compared to using GPT for classification?
That is a great question. Yes, though GPT-4 is better at few-shot than this approach. I still think this is useful for getting a quick classifier up and running locally to help with annotation.
@@python-programming thank you sir
Is there a way to utilize this model without having to define what the keywords are but simply to provide a list of them without any definition?
Thank you very much for this useful video. This is exactly what I need.
I tried it with real data, but I get this warning message, what should I do?
UserWarning: The least populated class in y has only 1 members, which is less than n_splits=2.
Can you paste what your support data dictionary looks like?
@@python-programming drive.google.com/file/d/1WcXuI2a7x_EvTreG5GWOE3lyj3Y9CAPc/view?usp=sharing
thank you
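For anyone hitting the same warning: the message comes from scikit-learn's stratified cross-validation and usually means at least one class has fewer samples than the number of splits. A minimal sketch for checking this, assuming the support data is a dictionary mapping each label to a list of example strings (the function name and sample data below are hypothetical):

```python
def underfilled_classes(support, minimum=2):
    """Return labels whose sample count is below `minimum`, with their counts."""
    return {
        label: len(samples)
        for label, samples in support.items()
        if len(samples) < minimum
    }


# Hypothetical support dictionary with one class that has a single example.
support = {
    "positive": ["great service", "loved the staff"],
    "negative": ["terrible wait times"],
}

too_small = underfilled_classes(support)
print(too_small)  # {'negative': 1}
```

Adding at least one more example to every class listed should make the warning go away, assuming nothing else in the pipeline changes the number of splits.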
This is a really good tutorial Thank you!
I have not been able to get it running so far. When I attempt to "nlp.add_pipe( ) " on the text_categorizer, the kernel crashes and restarts. Any clue as to why this would be happening? I have a fresh environment with spacy and the classy_classification newly installed.
Thanks! Hmmm that is odd. What is your OS? Mind DMing me on Twitter with some pics?
@@python-programming Sent. Thanks for looking at this. Will be really helpful.
@@CoreyMalcom no problem! I am in the middle of traveling. Will try and respond tomorrow
Thank you very much for this video, was looking for something exactly like this !! I was wondering if there is any way to save the model config on the disk once the pipe with support samples was added, do you have any ideas on that?
Cool! I'll use it in my trashy ai to make it less trashy 🤣 to make it understand intentions