SPLADE: the first search model to beat BM25

  • Published 7 Nov 2024

COMMENTS • 59

  • @jamesbriggs
    @jamesbriggs  1 year ago +13

    To install the Naver Labs SPLADE library you need `pip install git+https://github.com/naver/splade.git`

  • @JulianHarris
    @JulianHarris 1 year ago +16

    Came here curious about SPLADE, discovered a super understandable introduction to transformers and attention networks. Thank you!

    • @jamesbriggs
      @jamesbriggs  1 year ago +2

      I really wanted to get the point across about SPLADE, but there was a lot of foundational stuff to cover, from sparse vs. dense vectors to transformers - so I'm glad the extra info helped :)

    • @zazouille2264
      @zazouille2264 1 year ago

      Agreed. Great video. Nicely layered.
      Thank you OP

    • @magicofjafo
      @magicofjafo 1 year ago

      I agree!

  • @shamaldesilva9533
    @shamaldesilva9533 1 year ago +3

    dude you are a gold mine when it comes to these topics 😍😍 .

  • @alivecoding4995
    @alivecoding4995 1 year ago +1

    Which graphics library do you use for these Transformer illustrations? Are these pre-built assets?

  • @ברקמנחם-ק3ק
    @ברקמנחם-ק3ק 1 year ago +1

    Thank you! When using embeddings and asking the model GPT-3.5 a question like "write me some code that uses this and that", does the model automatically search the embeddings too in order to give the answer?

    • @jamesbriggs
      @jamesbriggs  1 year ago

      GPT-3 doesn't; you need to add a knowledge base to do this, like I do here: ua-cam.com/video/rrAChpbwygE/v-deo.html

  • @ArnavJaitly
    @ArnavJaitly 1 year ago +1

    James, this is awesome and very relevant to my current project! Thank you for your efforts in putting this together and sharing it, much appreciated!

  • @danrosher6658
    @danrosher6658 1 year ago

    Great talk, thanks James. Would an alternative to cosine similarity for comparing query/doc be to index the tokens and weights for docs (from the SPLADE model outputs), also convert a query to tokens and weights, then return docs containing the query tokens where the doc weight > query token weight for each token? Would this work?
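    The scoring idea in the comment above can be sketched in plain Python (hypothetical names; SPLADE itself ranks by a sparse dot product, this just illustrates the proposed threshold rule alongside it):

    ```python
    def matches(query_weights, doc_weights):
        """The rule proposed above: a doc matches only if it contains
        every query token with doc weight > query token weight."""
        return all(
            doc_weights.get(tok, 0.0) > w
            for tok, w in query_weights.items()
        )

    def sparse_dot(query_weights, doc_weights):
        """SPLADE-style relevance: dot product over shared tokens."""
        return sum(
            w * doc_weights[tok]
            for tok, w in query_weights.items()
            if tok in doc_weights
        )

    query = {"splade": 1.2, "search": 0.4}
    doc_a = {"splade": 1.5, "search": 0.9, "model": 0.3}
    doc_b = {"splade": 0.8, "bm25": 1.1}

    print(matches(query, doc_a))               # True
    print(matches(query, doc_b))               # False
    print(round(sparse_dot(query, doc_a), 2))  # 2.16
    ```

    It would work as a hard filter, but it could drop relevant documents that cover only some of the query tokens, which is why ranking by the dot product is the usual choice.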

  • @MaheshJha-y3j
    @MaheshJha-y3j 1 year ago

    Hello James, the pinned method above for pip installing SPLADE is not working and gives an error like "error: subprocess-exited-with-error". Can you please let me know what the issue is, or what alternative we can use?

  • @williamxion2806
    @williamxion2806 14 days ago

    Hi James. I know this video is already a year old and there has been a lot of new development since, but didn't Contriever already outperform BM25 on most benchmarks at the time? I believe Contriever fine-tuned on MS MARCO outperformed BM25 on basically everything.

  • @IamalwaysOK
    @IamalwaysOK 1 year ago +1

    Hey James, as usual, thanks a ton for your awesome videos! I've got a quick question for you. Have you ever thought about using a knowledge graph alongside SPLADE to expand terms? And is there any way we can embed that knowledge into sparse vectors using transformers? Curious to hear your thoughts on this!

  • @ttharita
    @ttharita 3 months ago

    Super informative. Thank you so much!!!

  • @kevon217
    @kevon217 1 year ago

    Great tutorial as always. Your slide animations are next level!

  • @gorgolyt
    @gorgolyt 11 months ago

    Great video. But you should link to the SPLADE paper(s). Are you just talking about the original paper here?

  • @avidlearner8117
    @avidlearner8117 1 year ago

    Fantastic content! Especially since I'm building an app and need to find a proper solution for data retrieval....

  • @lutune
    @lutune 1 year ago +2

    Have you built any of these apps? Your content is so great - as you get into more media, developing some of those apps could really help put this into a visual space.

    • @jamesbriggs
      @jamesbriggs  1 year ago +2

      started building some demos and testing splade a couple months ago, will be sharing more soon - it's really cool though and I intend on making it a big part of my "go to toolkit" in the future

    • @lutune
      @lutune 1 year ago

      @@jamesbriggs Your DC seems to be getting a lot of new people! I'll get some things updated on there today for ya

  • @SinanAkkoyun
    @SinanAkkoyun 1 year ago

    How does this compare to the new OpenAI embeddings?

  • @aurkom
    @aurkom 1 year ago

    Really enjoyed this one.

  • @abhayr
    @abhayr 1 year ago

    Amazing explanation. Thx for sharing

  • @thedude9270
    @thedude9270 1 year ago

    Thanks for the tutorial! Is it possible that you could also share a colab or video explaining what would then be upserted as a Pinecone vector?

  • @Sky-ec9eu
    @Sky-ec9eu 1 year ago

    This is incredible. Thanks James!

  • @salesgurupro
    @salesgurupro 1 year ago

    Amazing. Thanks for such a great explanation 😊

  • @johannamenges3095
    @johannamenges3095 1 year ago

    But is Faiss still a solid solution for a semantic search engine? I'm asking because I'm currently working on a search engine built on Faiss.

  • @biaoliu9297
    @biaoliu9297 1 year ago

    Is there a multilingual version of the model?

  • @snack711
    @snack711 1 year ago +1

    I am surprised how "orangutans" got split into tokens. I thought "orangutan" surely had to be a token itself.

  • @AnonymousIguana
    @AnonymousIguana 11 months ago

    So, SPLADE vector generation is just as computationally intensive as dense vector generation? My understanding is that SPLADE requires real-time inference from a sophisticated model like BERT at query time. Isn't that very problematic?

    • @RatafakRatafak
      @RatafakRatafak 10 months ago

      It looks like it. Sentence-BERT is just as computationally intensive as SPLADE.

  • @ylazerson
    @ylazerson 1 year ago

    very fascinating - thanks!

  • @EkShunya
    @EkShunya 1 year ago

    what tool do you use to make the diagrams ?

  • @nhatpham4053
    @nhatpham4053 1 year ago

    awesome works

  • @leventk.1611
    @leventk.1611 1 year ago

    13:02: low proximity = high semantic similarity. Not high proximity. :D

  • @kayalvizhi8174
    @kayalvizhi8174 4 months ago

    How have the results of SPLADE been? Has it been proven to be effective?

  • @abhinavkulkarni6390
    @abhinavkulkarni6390 1 year ago

    Hey James,
    Can you please compare SPLADE with ColBERTv2 - both of which are designed to alleviate the problems of dense passage retrievers?

    • @jamesbriggs
      @jamesbriggs  1 year ago

      I haven't read into the ColBERT models - I understood them to not be hugely scalable? I can look into them if they're of interest.

  • @jeffsteyn7174
    @jeffsteyn7174 1 year ago

    That's interesting. What does pinecone use, sparse or dense?

    • @jamesbriggs
      @jamesbriggs  1 year ago +2

      Now it can use both - I'll talk about it in the coming days, or you can refer to github.com/pinecone-io/examples/blob/master/search/hybrid-search/medical-qa/pubmed-splade.ipynb for an example.

    • @sndrstpnv8419
      @sndrstpnv8419 11 months ago

      @@jamesbriggs The pubmed-splade.ipynb code has been deleted.

    • @RatafakRatafak
      @RatafakRatafak 10 months ago

      Is it important? If you use cosine similarity for both dense and sparse embeddings, it should work in any case.
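      For illustration (plain Python, hypothetical names): cosine similarity is defined identically for a sparse token-to-weight map and a dense vector, so the same scoring rule can indeed be applied to either representation.

      ```python
      import math

      def cosine_sparse(a, b):
          """Cosine similarity between sparse vectors stored as
          token -> weight dicts; absent tokens count as weight 0."""
          dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
          norm_a = math.sqrt(sum(w * w for w in a.values()))
          norm_b = math.sqrt(sum(w * w for w in b.values()))
          if norm_a == 0.0 or norm_b == 0.0:
              return 0.0
          return dot / (norm_a * norm_b)

      q = {"splade": 1.0, "search": 1.0}
      d = {"splade": 1.0, "search": 1.0, "model": 0.0}
      print(round(cosine_sparse(q, d), 3))  # 1.0
      ```

      That said, SPLADE weights are non-negative, so in practice the un-normalised dot product is the common scoring choice for sparse vectors.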

  • @BuFu1O1
    @BuFu1O1 11 months ago

    vocabulary mismatch can be fixed with sub-embeddings

  • @ramsescoraspe
    @ramsescoraspe 1 year ago

    Multilingual??

  • @klammer75
    @klammer75 1 year ago +1

    Keywords and page rank are dead! The information landscape is undergoing a seismic shift and everyone better put a helmet on!!!🤔🤪😉🤖

    • @jamesbriggs
      @jamesbriggs  1 year ago +1

      things are moving so fast rn

    • @klammer75
      @klammer75 1 year ago

      @@jamesbriggs seems we’re getting closer and closer to the inflection point of the exponential….next stop, ludicrous speed!🤯🚀

  • @hoangphanhuy1992
    @hoangphanhuy1992 11 months ago

    I thought CLIP doesn't need fine-tuning, so why is needing fine-tuning listed as a con of dense models, sir? @jamesbriggs