Yes! Please, definitely make a second part. I teach in the Humanities (college literature and creative writing classes), and I'm actively searching for tools I can use for creative experiments with texts.
Check out Microsoft Power Automate AI Builder
@@NickWindham hey is the second part out yet?
Hey, I'm not getting the expected output at 1:57:26. It's showing KeyError: 0. Can you help me with that?
Where do you teach?
Bangers one after another. This channel is a treasure.
46:00
# Print tokens and their part-of-speech tags
print("Tokens and their POS tags:")
for token in doc:
    print(f"{token.text}: {token.pos_}")

# Print sentences
print('\nSentences:')
for sent in doc.sents:
    print(sent)

# Print named entities
print("\nNamed Entities:")
for ent in doc.ents:
    print(f"{ent.text} ({ent.label_})")
45:04
*Named Entity Recognition:*
A named entity is a real-world object that you can refer to by a proper name: a person, organization, location, or similar. Named entities matter in NLP because they identify the people, places, and organizations a text is talking about.

doc = nlp(u'I have flown to Islamabad. Now I am flying to Lahore.')
for token in doc:
    if token.ent_type != 0:  # a token whose ent_type attribute is non-zero
        print(token.text, token.ent_type_)  # is part of a named entity
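A shorter way to get the same information, for anyone following along, is the Doc's built-in ents property (standard spaCy API, nothing assumed beyond a loaded pipeline with NER):

```python
# doc.ents yields full entity spans rather than individual tokens,
# so multi-word names come back as a single span with one label.
for ent in doc.ents:
    print(ent.text, ent.label_)
```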
Clicked this video by accident but got hypnotised by the shirt and now I'm learning Python.
doc = nlp(u'A severe storm hit the beach. It started to rain.')

# Three ways to list the tokens: per sentence (two variants) and for the whole doc
for sent in doc.sents:
    print([sent[i] for i in range(len(sent))])
for sent in doc.sents:
    print([word for word in sent])
print([doc[i] for i in range(len(doc))])

# Check if the first word of the second sentence is a pronoun
for i, sent in enumerate(doc.sents):
    if i == 1 and sent[0].pos_ == 'PRON':
        print('The second sentence begins with a pronoun.')

# Count the sentences that end with a verb
# (sent[-2] skips the final punctuation token)
counter = 0
for sent in doc.sents:
    if sent[-2].pos_ == 'VERB':
        counter += 1
print(f'{counter} sentence(s) in the document end with a verb.')
50 minutes in and it is already the best practical explanation of how spaCy works.
Best helpline for those who really want to learn NLP with ease and for free, can't wait for part 2
this is absurd, opened yt for NLP videos and it was uploaded 1 sec ago.
Happened to me for a deep learning course.
@@avnishpanwar9502 this channel is a boon
Destiny
I started building a personal NLP agent and this was immediately recommended
Wild
I’ve been watching Dr. Mattingly’s other videos and they’re great.
I’ve come back to this video several times. The ONLY tutorial I’ve seen which walks through the whole process. The Python Tutorials for the Digital Humanities videos are also great. I am focused on biomedical text, but text is text when you are trying to get started.
I was searching for spaCy tutorials yesterday, and FCC uploaded it - thank you 💝. Interested in part 2.
Hi, can you tell me where I can find the repository for the data?
I can't believe such good content is for free, thank you.
Thank you so much. The best course on spaCy I have found. Please make part two! We are waiting for it!
so much value! thanks for making this material available for free. Incredible value
This video lesson was great. Looking forward to seeing the second part.
Thank you, Dr. William, for taking me through such a wonderful journey in NLP - it was my first exposure to this area of Python, and I found it quite useful and am excited to do some more. Looking forward to having your part 2 soon!
Where can I find the textbook?
Thanks for this incredible class and textbook, it was very helpful. Greetings from Brazil
Definitely interested in part2 of this course
I'm definitely interested in the ML aspects of spaCy) Thank you very much for the video!
Definitely important to dig into the .similarity() output before using it in one's own work. One of its flaws is that it cares too much about the number of words in the spans being compared. For example:
print(nlp2("fries").similarity(nlp2("burgers"))) = .65
print(nlp2("fries").similarity(nlp2("hamburgers"))) = .58
print(nlp2("fries").similarity(nlp2("ham burgers"))) = .70
print(nlp2("french fries").similarity(nlp2("hamburgers"))) = .46
print(nlp2("french fries").similarity(nlp2("ham burgers"))) = .64
Also, I find that the small model correctly identifies West Chestertenfieldville as a GPE without modification, and I find that nlp.add_pipe("entity_ruler") does not add to the end of the pipeline description we see via nlp.analyze_pipes(). Rather, the elements of that description seem to be in alphabetical order, and every nested sub-element is also alphabetized. I suspect this does not say anything about the true order of the pipeline.
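One reason for the word-count sensitivity: a Doc's vector is just the average of its token vectors, so adding or splitting tokens shifts the mean. A minimal sketch to reproduce scores like those above (the model name is an assumption; the sm models ship no static word vectors and warn that their .similarity() results are unreliable):

```python
import spacy

# Medium and large models include static word vectors;
# the sm models do not, so use md or lg for similarity work.
nlp2 = spacy.load("en_core_web_md")

pairs = [("fries", "burgers"), ("fries", "hamburgers"),
         ("french fries", "ham burgers")]
for a, b in pairs:
    # Doc vectors average the token vectors, which is why the
    # token count changes the score for multi-word spans.
    print(f"{a} ~ {b}: {nlp2(a).similarity(nlp2(b)):.2f}")
```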
Just finished 1/3 and I have to say, very good introduction. Thanks a lot for sharing.
Also a historian looking for ways to extract info from old documents. Very much looking forward to the second part.
I am trying to have it out in early January.
@@python-programming really looking forward to the next video. One topic I have not seen you address is the question of tools for annotation. When working in specialized language domains, extra training of models is a key step. As a newcomer, I have not yet found a process compatible with spaCy which is reasonably efficient. Prodigy?
Where can I access the textbook? Can someone let me know! Would really appreciate it!
Excellent!!! The best of the best!!!! Please do a second part showing how to train the model.
Super interesting to go deeper into the material. In other words, a bit of a history lesson. 💪💪👍
Great video. Please make the second part ASAP. Keep up the good work.
We can extract noun chunks by iterating over the nouns in the sentence and finding the syntactic children of each noun to form a chunk.

doc = nlp(u'The quick brown fox jumps over the lazy dog.')

# Regular method:
# for chunk in doc.noun_chunks:
#     print(chunk)

# Manual method: prints 'The quick brown fox' and 'the lazy dog'
for token in doc:
    if token.pos_ == 'NOUN':
        chunk = ''
        for w in token.children:
            if w.pos_ == 'DET' or w.pos_ == 'ADJ':
                chunk += w.text + ' '
        chunk += token.text
        print(chunk)
Thank you for the great video!
When I run the most_similar method, copying the code from your notebook, I end up receiving a completely different set of words, some unrelated to the word and some in other languages. Example: country gave me ['country-0,467', 'nationâ\x80\x99s', 'countries-', 'continente', 'Carnations', 'pastille', 'бесплатно', 'Argents', 'Tywysogion', 'Teeters']
Can somebody help me understand why this is happening?
Same here. Curious but...maybe the transformers/models (not sure which) are retrained thus giving us a different set of words? Hopefully someone can answer this!
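In case it helps debugging, here is a minimal version of the usual most_similar recipe (the model name is an assumption). Note that the vectors table in the md/lg models holds keys for an enormous number of rare, misspelled, and non-English strings scraped from web text, so nearest-neighbour queries over the whole table can legitimately surface oddities like these; it doesn't necessarily mean anything is broken.

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # assumption: any model with vectors

# Look up the stored vector for "country" and ask for its 10 nearest
# neighbours across the entire vectors table.
query = np.asarray([nlp.vocab.vectors[nlp.vocab.strings["country"]]])
keys, _, scores = nlp.vocab.vectors.most_similar(query, n=10)
print([nlp.vocab.strings[int(k)] for k in keys[0]])
```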
You're a wizard, W.J.B. Mattingly! Sincerely yours, a stan
Why can't I run the following?
I cannot find where the repository is.

with open("data/wiki_us.txt", "r") as f:
    text = f.read()
Did you find a solution?
@@furkanfiratli7908 Create your own wiki_us.txt: open Notepad, copy the text from the website, and paste it in.
@@bbppchan did you encounter the error:" 'gbk' codec can't decode byte 0x93 in position 1186: illegal multibyte sequence " ?
@@bernardmontgomery3859 no
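For the 'gbk' codec error above: on Windows, open() defaults to the system locale encoding, which cannot decode UTF-8 text. Passing the encoding explicitly is the usual fix (a minimal sketch, assuming the file is saved as UTF-8):

```python
# An explicit encoding avoids locale-dependent decode errors on Windows.
with open("data/wiki_us.txt", "r", encoding="utf-8") as f:
    text = f.read()
```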
You must clone the repository first:
Open the website in the description and click the GitHub button at the top right (pretty hard to find, tbh); it links to the repository.
Clone the repository; then you can run the existing notebook file or create a new one.
Has the Textbook been uploaded somewhere else? The link isn't working
Hello, did you find it?
@@ziya5811 No luck so far
Thanks! Great course, and I love how easy and smooth the explanation is. Explaining each step before diving into it really makes it easier for us to follow. Thanks a lot. BTW, I've spent some time looking for the GitHub account and repo related to this video; here it is if anyone needs it to begin following along. ENJOY...
Where?
Thank you 🙏 , Interested in part 2
Best Helpline for those who really want to learn
Thanks
Thank you! interested in part 2.
The Doc object’s **doc.sents** property lets us separate a text into its individual sentences:
doc = nlp(u'A severe sand storm hit the Gobi desert. It started to rain.')
for sent in doc.sents:
    print([sent[i] for i in range(len(sent))])
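One gotcha worth flagging here (a small sketch of standard spaCy behavior): doc.sents is a generator, so it cannot be indexed directly; wrap it in list() first if you need the nth sentence.

```python
sentences = list(doc.sents)  # doc.sents is a generator, not a list
print(sentences[1])          # the second sentence, now indexable
```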
Please create Part 2!!!!! Part one was 🔥🔥🔥🔥🔥🔥🔥
Thanks!
Superb, Waiting for part 2 with thanks🙏👍
Awesome Video! Can't wait for part 2
crossing my fingers 🤞🤞🤞
Everyone seemed to be asking for part 2, but this coverage is good enough - so good that I don't think it needs a part 2; otherwise a large part of it would just be repetition. I will keep exploring deeper based on this video itself.
There are also a lot of other resources available (and free), if you have time to go through them: course.spacy.io/en/
@@tthtlc thank you so much! do you have other resources like that?
Great tutorial. Learnt a lot about spaCy fundamentals.
(1:35:44) Matcher
03/22/2023 2:21:22
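For anyone else bookmarking the Matcher section (1:35:44), a minimal sketch of the pattern API covered there (the example sentence and pattern name are my own):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# A pattern is a list of per-token attribute dicts;
# this one matches a proper noun followed by a verb.
pattern = [{"POS": "PROPN"}, {"POS": "VERB"}]
matcher.add("PROPN_VERB", [pattern])

doc = nlp("Bob ran to the store while Alice slept.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)
```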
This video is fantastic! I would really appreciate part 2
Thank you for the explanations. They are very clear and relevant. Excellent video.
Love the work you are doing. Many thanks from India
Please make the second video about machine learning! this was so helpful
Can anyone give me a link to the repo that is being used?
Hi, quick question: where is the repo mentioned at 22:38 located?
Very much interested in the machine learning aspect of SpaCy. Thank you, this course was informative and handy.
Where’s part 2!!! If there’s time in part 2, I would definitely be interested to know how to train ML to help with research and literature reviews as an example
Very simple and easy to understand, thank you for this.
Let's do the second part of it 🙂
Thanks so much Dr. Mattingly. Where can we find the machine learning related video?
Outstanding overview of Spacy, can't wait for part 2! Thank you so much.
is part 2 out?
Thank you, Dr. William. Looking forward to part two.
Thank you very much for making this video. I want to create my own corpus to analyze data. But as a newbie to Python, I found it really hard to start without a clear direction. Looking forward to Part 2!
Very very helpful stuff! 31 minutes in the video and I'm already using spacy for my own analyses! Thank you so much!
I'm getting an error early on, at 22:40, when opening the first text file:
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-…> in <module>
----> 1 with open("data/wiki_us.txt", "r") as f:
      2     text = f.read()
FileNotFoundError: [Errno 2] No such file or directory: 'data/wiki_us.txt'
Anyone know why? I am running Jupyter notebooks via Anaconda on a 2021 MacBook Pro M1.
This is a great NLP tutorial. I have checked out a few others but this one here takes the cake. Thanks for the excellent resource!
Really fascinating and accessible. Thank you.
Lol. spaCy's most recent build as of Feb 2, 2022 does properly identify West Chestertenfieldville as a GPE. ua-cam.com/video/dIUTsFT2MeQ/v-deo.html
EDIT: Just finished the course - a phenomenal piece of work. Thanks so much for doing this. I can tell you put an incredible amount of time and effort into it, and you provide it so graciously for free. It is so much clearer than spaCy's documentation. They have a new spaCy 101, so I'll give that a go now to cement this all in the noggin. And yes, eagerly awaiting part 2.
Awesome content there, Dr. William. I was really hyped during the series, and you've described every aspect of spaCy perfectly. Now I'm interested in the ML side of spaCy, and it'd be great if you covered that next.
Waiting for the second part ! This tutorial is perfect , thank you so much !
This tutorial is so freaking inspiring to me. NLP is so exciting and I'd love to integrate it with machine learning!!!!
I'd be 100% down to watch a tutorial with part 2!!!!!
Thanks! Good to know. I think I will start planning it this week.
@@python-programming Hi hi, any updates on part 2? I hope everything's ok :)
@@andrijor indeed! I am still working on it. Between the textbook and the video it takes a while to make. I am hoping to have it ready in early January.
@@python-programming Looking forward to it! 😊
This video is very useful for me. Thanks for always bringing great videos. Mad respect from me.
Great work! Really a good video to learn using spaCy.
Thanks for your awesome introduction :). Would love to have your next course on using spaCy for ML.
This is a really awesome teaching video. I'm highly interested in a part two.
Where can I get the datasets?
Excellent tutorial. Straight into the subject. Hats off to you !!
Such a nice video, 2nd part please!!
Where can we find these datasets?
Enjoyed this video, waiting for part 2.
I did not get the INFO: confirmations at 16:23 when running import spacy. Any hints?
Eagerly waiting for the second part.
This is super awesome tutorial. Just what I need. Thanks!
Thanks for the depth with this library sir
Thank you so much! Such a wonderful video.
Problems:
1) At minute 1:42:23: it returns an empty list even though I wrote the very same code.
2) At minute 2:14:32: I don't understand why nlp doesn't recognize and return 'Mary' as an entity. I'm using the same "en_core_web_sm".
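On problem 2: small statistical models do miss entities like this. One way to patch a specific miss is the EntityRuler the video demonstrates for West Chestertenfieldville (a sketch; the sentence is my own, and placing the ruler before "ner" lets the rule take priority):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# A rule-based pass before the statistical NER guarantees the label.
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([{"label": "PERSON", "pattern": "Mary"}])

doc = nlp("Mary went to the market.")
print([(ent.text, ent.label_) for ent in doc.ents])
```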
Hey, I'm not getting the expected output at 1:57:26. It's showing KeyError: 0. Can anybody help me with that?
This is very helpful, thank you!
Hi, and thank you very much for your tutorial. I really enjoyed it and am looking forward to the second part of the tutorial.
Hi guys, could someone tell me which is harder to learn/master: computer vision or NLP?
Yes please for a part 2 on Machine Learning with Spacy!
Can't wait for Part 2
i found this to be an excellent tutorial - very clear, great examples and thorough. thank you for sharing this and i look forward to seeing you continue with another covering machine learning in spacy.
Has anyone been able to find the source material for this? He's referencing the text and says it's in the description but it is not...
This is very helpful. I am very new to machine learning and NLP. I am in a situation where I have thousands of documents which don't always have correct spellings. I have to analyze these documents to look for trends related to parts failure, especially where the failure has resulted in death or injury. Ideally I'd like to learn from the data in a way that can flag future failures before there is a death or injury. Can spaCy help with this?
I would like the ML version too. So looking forward to seeing that
Thank you so much. It is very helpful.
Hi all,
for some reason I get a different output from the script at 57:35:
['POVERTY', 'inner-city', 'Poverty', 'INTERSECT', 'INEQUALITY', 'Inequality', 'ILLITERACY', 'illiteracy', 'handicaps', 'poorest']
Did something go wrong?
I get the same thing: ['POVERTY', 'inner-city', 'Poverty', 'INTERSECT', 'INEQUALITY', 'Inequality', 'ILLITERACY', 'illiteracy', 'handicaps', 'poorest']
How do I reach the data folder that you work with?
Thanks!
Did you ever find out? I get an error because, obviously, we don't have that data file at the same path.
I don't think we can🤗
Looking forward to the machine learning aspects of spaCy.
Waiting for the 2nd part, sir 👌🙏
That was a very nice explanation and an awesome tutorial. Waiting for the machine learning part.
Where can I find the text book? The link in the description is dead.
Super interesting, already subscribed to both of the channels (y)
Your tutorials and your YouTube channel are great. Thanks so much for sharing your knowledge online. So helpful and well made.