What's the advantage of using spaCy as opposed to having a CSV of ticker names and comparing it to the data scraped from the Reddit API? Is spaCy faster and/or more efficient?
Good question! It really depends. spaCy will be slower for sure, as there is a lot more complexity under the hood, but it will also pick up different versions of the same organization name (TSLA, $TSLA, Tesla, Tesla Motors) and tell apart similar words/names (Nikola Tesla), whereas a rule-based approach (the CSV) would struggle with that. After that, however, we need to build a rule-based/intelligent process for merging all of the different versions of an organization's name into one. That's something I want to explore. I imagine a simple 'similarity' match would be pretty effective, although I'm sure there are methods built specifically for this too :)
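A minimal sketch of that 'similarity' merge step, assuming the entity mentions have already been extracted. It uses Python's standard-library `difflib.SequenceMatcher` as a stand-in for whatever matching method you end up choosing. The `COMPANIES` table, the `SUFFIXES` list, the `resolve` helper and the 0.8 cutoff are all hypothetical illustrations, not part of spaCy:

```python
from difflib import SequenceMatcher

# Hypothetical lookup table: ticker -> company name (would come from a ticker CSV).
COMPANIES = {"TSLA": "Tesla", "AAPL": "Apple", "GME": "GameStop"}

# Hypothetical corporate suffixes ignored when comparing names.
SUFFIXES = ("motors", "inc", "corp", "corporation", "ltd")


def normalize(name: str) -> str:
    """Lowercase, drop a leading '$' and periods, and strip trailing corporate suffixes."""
    words = name.lower().lstrip("$").replace(".", "").split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def resolve(mention: str, cutoff: float = 0.8):
    """Return the ticker whose symbol or name best matches `mention`, or None if nothing clears the cutoff."""
    target = normalize(mention)
    best_ticker, best_score = None, 0.0
    for ticker, company in COMPANIES.items():
        # Compare against both the symbol and the company name.
        for candidate in (ticker, company):
            score = SequenceMatcher(None, target, normalize(candidate)).ratio()
            if score > best_score:
                best_ticker, best_score = ticker, score
    return best_ticker if best_score >= cutoff else None


for mention in ["TSLA", "$TSLA", "Tesla", "Tesla Motors", "Apple Inc.", "Nikola Tesla"]:
    print(mention, "->", resolve(mention))
```

Under these assumptions "TSLA", "$TSLA", "Tesla" and "Tesla Motors" all collapse to TSLA, while "Nikola Tesla" falls below the cutoff. In practice NER would ideally not tag "Nikola Tesla" as an organization in the first place; the cutoff is just a second line of defence and would need tuning on real comments.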
@@jamesbriggs Ahh okay, that makes a lot of sense. I was thinking about testing each. Thank you for the explanation, and thank you for making these kinds of videos :D
@@egomalego definitely try testing each if you have the time! Before NER I'd been relying on rule-based stuff/regex, and it still works well :) Thank you for watching!
This proved really difficult for me. I have CSV files of ticker names for NASDAQ, AMEX and NYSE, and I compared them against each comment pulled with PRAW (the Python Reddit API Wrapper). I had a set of rules for identifying a word as a stock symbol/ticker name. I did manage to get a reasonable list of the most-mentioned stocks, but there were still a lot of incorrectly identified stocks. In the end I had to add a list of exceptions manually, which is not a very nice solution. For example, there are ticker names such as ONE, GO and OPEN, which are really common words in comments on the WSB subreddit. I started to realize I needed machine learning/AI and found spaCy. Thank you for making this video :)
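For comparison, the CSV-plus-rules approach described above might look roughly like this sketch. The inline CSV (standing in for the NASDAQ/AMEX/NYSE files), its `Symbol` column name, the regex and the rule that a `$` cashtag overrides the exceptions list are all assumptions for illustration, not the commenter's actual code:

```python
import csv
import io
import re
from collections import Counter

# Stand-in for the exchange ticker CSVs; the "Symbol" column name is an assumption.
TICKER_CSV = io.StringIO("Symbol\nGME\nTSLA\nAMC\nONE\nGO\nOPEN\n")
TICKERS = {row["Symbol"].strip().upper() for row in csv.DictReader(TICKER_CSV)}

# Manually maintained exceptions: real tickers that are also everyday words.
EXCEPTIONS = {"ONE", "GO", "OPEN"}

# An optional '$' followed by 1-5 capital letters, as a whole word.
CANDIDATE = re.compile(r"(?<!\w)(\$?)([A-Z]{1,5})\b")


def extract_tickers(comment: str) -> list[str]:
    """Return every known ticker mentioned in a comment, applying the exceptions list."""
    found = []
    for dollar, symbol in CANDIDATE.findall(comment):
        if symbol not in TICKERS:
            continue
        # Hypothetical rule: an explicit '$' cashtag overrides the exceptions list.
        if symbol in EXCEPTIONS and not dollar:
            continue
        found.append(symbol)
    return found


def most_mentioned(comments, n=3):
    """Count ticker mentions across all comments and return the top n."""
    counts = Counter()
    for comment in comments:
        counts.update(extract_tickers(comment))
    return counts.most_common(n)


comments = [
    "GME to the moon! $GME",
    "Holding TSLA and AMC",
    "LET'S GO, time to OPEN a position in $OPEN",
]
print(most_mentioned(comments))
```

This shows why the approach is brittle: every ambiguous ticker needs its own hand-written exception, and the rules know nothing about context, which is exactly the gap NER is meant to fill.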
@@Alex-costanza I had the exact same problem. In the end I decided to go with spaCy and train my own model on data that made sense for my project. The default model is pretty accurate, but I'm only using it for organizations, and I'm looking to grab ticker symbols only.
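If you go the custom-model route, spaCy's NER training needs entity spans given as character offsets. Below is a minimal sketch for bootstrapping that data from a known ticker set, using only the standard library. The `TICKER` label and the `annotate` helper are hypothetical, and auto-generated labels like these would still need a manual review pass before training:

```python
import re

# Hypothetical custom entity label for ticker symbols.
LABEL = "TICKER"

# Optional '$' followed by a 1-5 letter uppercase word; group 1 is the bare symbol.
TOKEN = re.compile(r"\$?\b([A-Z]{1,5})\b")


def annotate(text, tickers):
    """Return (text, {"entities": [(start, end, LABEL), ...]}) for known tickers in text."""
    entities = []
    for match in TOKEN.finditer(text):
        if match.group(1) in tickers:
            # Span covers the symbol only, leaving any leading '$' outside the entity.
            entities.append((match.start(1), match.end(1), LABEL))
    return text, {"entities": entities}


print(annotate("Bought $GME and TSLA today", {"GME", "TSLA"}))
```

In spaCy v3, each pair could then be turned into a training example with `Example.from_dict(nlp.make_doc(text), annotations)`. The spans must line up with token boundaries, which is why the `$` is kept outside the span here (assuming the tokenizer splits it off as a prefix).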
One of the best YouTube videos. Thanks James!
Super happy you think so, thanks Sajid!
Many thanks. Super helpful!