Identify Stocks on Reddit with SpaCy (NER in Python)

  • Published 11 Oct 2024

COMMENTS • 10

  • @AlgoTradingX 3 years ago +2

    One of the best YouTube videos. Thanks James!

    • @jamesbriggs 3 years ago

      Super happy you think so, thanks Sajid!

  • @lemuffinity 3 years ago

    Many thanks. Super helpful!

  • @egomalego 3 years ago

    What's the advantage of using spaCy as opposed to having a CSV of ticker names and comparing it to the data scraped from the Reddit API? Is spaCy faster and/or more efficient?

    • @jamesbriggs 3 years ago +1

      Good question! It depends, really. spaCy will be slower for sure, as there is a lot more complexity under the hood, but it will also pick up on different versions of the same organization name (TSLA, $TSLA, Tesla, Tesla Motors) and differentiate similar words/names (Nikola Tesla), whereas a rule-based approach (the CSV) would struggle with that.
      After that, however, we need to build out a rule-based/intelligent process for compiling all of the different versions of the organization names into one - which is something I want to explore, but I would imagine a simple 'similarity' match would be pretty effective - although I'm sure there are methods built specifically for this too :)
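
A rough sketch of the 'similarity' match idea mentioned above, using only Python's standard-library `difflib`. This is not the method from the video: the `CANONICAL` alias table, the `canonicalize` helper, and the 0.8 threshold are all assumptions made for illustration, and a real project would need a much larger alias list and a tuned threshold.

```python
from difflib import SequenceMatcher

# Hypothetical alias table (an assumption, not data from the video):
# each canonical organization name maps to lowercase variants we expect to see.
CANONICAL = {
    "Tesla": ["tesla", "tsla", "tesla motors"],
    "Apple": ["apple", "aapl", "apple inc"],
}


def normalize(mention: str) -> str:
    """Lowercase the mention and drop a leading '$' and surrounding punctuation."""
    return mention.strip().lstrip("$").strip(" .,!?").lower()


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def canonicalize(mention: str, threshold: float = 0.8):
    """Map an extracted mention to the best-matching canonical name.

    Returns None when no alias is at least `threshold` similar.
    The threshold is an illustrative guess, not a recommended value.
    """
    cleaned = normalize(mention)
    best_name, best_score = None, 0.0
    for name, aliases in CANONICAL.items():
        for alias in aliases:
            score = similarity(cleaned, alias)
            if score > best_score:
                best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    for mention in ["$TSLA", "Tesla Motors", "Teslas", "Microsoft"]:
        print(mention, "->", canonicalize(mention))
```

The entities that spaCy returns would be passed through something like `canonicalize` before counting, so that TSLA, $TSLA and Tesla Motors all land in the same bucket. Character-level similarity is a blunt tool, so mentions that fall below the threshold are returned as `None` rather than guessed at.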

    • @egomalego 3 years ago

      @jamesbriggs Ahh okay, that makes a lot of sense. I was thinking about testing each. Thank you for the explanation, and thank you for making these kinds of videos :D

    • @jamesbriggs 3 years ago +1

      @egomalego definitely try testing each if you have the time - before NER I'd been relying on rule-based stuff / regex, and it still works well :)
      Thank you for watching!

    • @Alex-costanza 3 years ago +1

      This proved really difficult for me. I have the CSV files of ticker names for NASDAQ, AMEX and NYSE, and I compared each comment against them with the help of the PRAW Reddit scraping module. I had a set of rules for identifying a word as a stock symbol/ticker name. I did manage to obtain a reasonable list of the most mentioned stocks, but there were still a lot of incorrectly identified stocks. In the end I had to add a list of exceptions manually, which is not a really nice solution. For example, there are ticker names such as ONE, GO and OPEN, which are really common words in the comment section on the WSB subreddit. I started to understand that I needed machine learning/AI, and found spaCy. Thank you for making this video :)
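
For comparison, a minimal sketch of the rule-based approach described above, including the manual exceptions list. It is an illustration rather than the commenter's actual code: the `TICKERS` and `AMBIGUOUS` sets are small hypothetical stand-ins for the exchange CSV files, and the rule "only count an ambiguous ticker when it has a `$` prefix" is an assumption chosen to show the idea.

```python
import re
from collections import Counter

# Hypothetical ticker set; in practice this would be loaded from the
# NASDAQ/AMEX/NYSE CSV files rather than hard-coded.
TICKERS = {"TSLA", "GME", "AMC", "ONE", "GO", "OPEN"}

# Manually maintained exceptions: tickers that are also everyday words.
# Assumed rule: these only count when written with a '$' prefix.
AMBIGUOUS = {"ONE", "GO", "OPEN"}

# An optional '$' followed by 1-5 uppercase letters, not embedded in a
# longer word or number.
TOKEN_RE = re.compile(r"(?<![A-Za-z0-9$])(\$?)([A-Z]{1,5})(?![A-Za-z0-9])")


def find_tickers(comment: str) -> list:
    """Return ticker symbols mentioned in a comment, in order of appearance."""
    found = []
    for match in TOKEN_RE.finditer(comment):
        has_dollar = bool(match.group(1))
        symbol = match.group(2)
        if symbol not in TICKERS:
            continue  # uppercase word, but not a known ticker
        if symbol in AMBIGUOUS and not has_dollar:
            continue  # probably just the ordinary word
        found.append(symbol)
    return found


def count_mentions(comments) -> Counter:
    """Count how many comments mention each ticker (once per comment)."""
    counts = Counter()
    for comment in comments:
        counts.update(set(find_tickers(comment)))
    return counts


if __name__ == "__main__":
    sample = [
        "GO buy TSLA and $GME, I am ALL in on $OPEN",
        "ONE does not simply GO long",
    ]
    print(count_mentions(sample))
```

The weakness the comment describes is visible here: the exceptions set has to be curated by hand, and a ticker such as OPEN written without the `$` is silently dropped. That trade-off is what motivates trying NER instead.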

    • @egomalego 3 years ago

      @Alex-costanza I had the exact same problem. I decided, in the end, to go with spaCy and train my own model on data that would make sense for my project. The default model is pretty accurate, but I am only using it for organizations, and I'm looking to only grab ticker symbols.