LLM-powered Topic Modeling

  • Published Jan 20, 2025

COMMENTS • 11

  • @54LZ
    @54LZ 9 months ago +3

    An interesting and great presentation. Thanks for sharing.

  • @saminakhalid3840
    @saminakhalid3840 18 days ago

    I think tokenization should be performed before embedding, but in your video the BERTopic diagram shows embedding as the first step. It's confusing for me.

    • @windowviews150
      @windowviews150 6 days ago

      BERTopic starts with embeddings, which create numerical representations of the documents (by default using a sentence-embedding model). It then reduces the dimensions of these representations and clusters the result to identify topics. After clustering, it tokenizes the text to create a document-term matrix. Finally, it uses a class-based TF-IDF to identify the most representative words for each topic.
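The class-based TF-IDF step described above can be sketched in pure Python. This is a minimal illustration of the weighting idea (term frequency per cluster times log(1 + A / f(t)), with A the average word count per cluster), not BERTopic's actual implementation; the toy documents and the function name `c_tf_idf` are made up for the example.

```python
import math
from collections import Counter

def c_tf_idf(classes):
    """Toy class-based TF-IDF: weight terms per cluster of documents.

    classes: dict mapping cluster id -> list of documents (strings).
    Returns: dict mapping cluster id -> [(term, weight), ...] sorted
    by weight, highest first.
    """
    # Term frequency per class: concatenate all docs in a cluster.
    tf = {c: Counter(" ".join(docs).split()) for c, docs in classes.items()}
    # f(t): total frequency of each term across all classes.
    f = Counter()
    for counts in tf.values():
        f.update(counts)
    # A: average number of words per class.
    A = sum(f.values()) / len(classes)
    weights = {}
    for c, counts in tf.items():
        scored = [(t, n * math.log(1 + A / f[t])) for t, n in counts.items()]
        weights[c] = sorted(scored, key=lambda x: -x[1])
    return weights

# Two toy clusters, as if they came out of the clustering step.
clusters = {
    0: ["cat meows", "cat purrs"],
    1: ["dog barks", "dog runs"],
}
topics = c_tf_idf(clusters)
print(topics[0][0][0])  # most representative word for cluster 0
```

Terms frequent inside one cluster but rare overall get the highest weights, which is why "cat" and "dog" surface as the representative words here.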

    • @saminakhalid3840
      @saminakhalid3840 6 days ago

      @@windowviews150 thanks for the explanation

  • @ColabCorgi
    @ColabCorgi 9 months ago +1

    Excellent content. Just what I was looking for! Any tips on how to optimize the topic modeling process using GPT models from OpenAI?

    • @giacomocassano1439
      @giacomocassano1439 8 months ago

      Hello! I'm a researcher at Politecnico di Milano and the University of South Australia. I'm trying to do the same thing; maybe we can have a chat!

    • @ColabCorgi
      @ColabCorgi 8 months ago

      @giacomocassano1439 sure, how can I reach you?

  • @joshed790
    @joshed790 9 months ago

    Could you give an example of how to merge these topic modeling results with our original dataset for further analysis and report creation?

    • @rolandabi2848
      @rolandabi2848 5 months ago

      Hey Josh, did you find a way to do this?
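One common way to do what this thread asks is to merge by position: BERTopic's `fit_transform` returns one topic id per input document, in the same order as the documents you passed in. A hedged pure-Python sketch (the documents, topic ids, and label names below are all hypothetical placeholders for your own data):

```python
from collections import Counter

# Hypothetical inputs: documents in their original dataset order, and
# the topic id assigned to each one, e.g.
#   topics, _ = topic_model.fit_transform(documents)
documents = ["refund not received", "love the new UI", "app crashes on login"]
topics = [0, 1, 0]
topic_labels = {0: "support issues", 1: "product feedback", -1: "outliers"}

# Merge by position: row i of the original dataset gets topic i.
merged = [
    {"doc": doc, "topic": t, "label": topic_labels.get(t, "unknown")}
    for doc, t in zip(documents, topics)
]

# Simple report: document count per topic label.
report = Counter(row["label"] for row in merged)
print(report)
```

With pandas the same join is one line (`df["topic"] = topics`); the key point is keeping the document order unchanged between the original dataset and the model input.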

  • @romanonugia8180
    @romanonugia8180 5 months ago +5

    Topic -1 is an outlier and should be ignored
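As this comment notes, BERTopic assigns topic -1 to outlier documents that don't fit any cluster. A minimal sketch of dropping them before downstream analysis (the variable names and toy data are illustrative):

```python
# Hypothetical outputs from a fitted model, in document order.
documents = ["doc a", "doc b", "doc c"]
topics = [0, -1, 1]

# Keep only documents assigned to a real topic; -1 marks outliers.
kept = [(doc, t) for doc, t in zip(documents, topics) if t != -1]
print(kept)
```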