Sentence Transformers: Sentence Embedding, Sentence Similarity, Semantic Search and Clustering | Code

  • Published 8 Sep 2024
  • Learn How to use Sentence Transformers to perform Sentence Embedding, Sentence Similarity, Semantic search, and Clustering.
    Code: github.com/Pra...
    Previously Uploaded Transformers Videos:
    1. Learn How to Fine Tune BERT on Custom Dataset.
    2. Learn How to Deploy Fine-tuned BERT Model. (Hugging Face Hub and Streamlit Cloud)
    3. Learn How to Deploy Fine-tuned Transformers Model on AWS Fargate (Docker Image)
    #nlp #bert #transformers #machinelearning #artificialintelligence #datascience
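
    As a quick orientation, here is a minimal sketch of the first two operations the video covers, sentence embedding and sentence similarity. The model name "all-MiniLM-L6-v2" is an illustrative choice, not necessarily the checkpoint used in the video:

```python
from sentence_transformers import SentenceTransformer, util

# Load a pretrained sentence-embedding model (illustrative choice).
model = SentenceTransformer("all-MiniLM-L6-v2")

# Sentence embedding: each sentence becomes one fixed-size vector.
sentences = ["A man is eating food.", "A man is eating a piece of bread."]
embeddings = model.encode(sentences)

# Sentence similarity: cosine similarity between the two vectors.
print(util.cos_sim(embeddings[0], embeddings[1]))
```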

COMMENTS • 75

  • @FutureSmartAI
    @FutureSmartAI  1 year ago +2

    📌 Hey everyone! Enjoying these NLP tutorials? Check out my other project, AI Demos, for quick 1-2 min AI tool demos! 🤖🚀
    🔗 YouTube: www.youtube.com/@aidemos.futuresmart
    We aim to educate and inform you about AI's incredible possibilities. Don't miss our AI Demos YouTube channel and website for amazing demos!
    🌐 AI Demos Website: www.aidemos.com/
    Subscribe to AI Demos and explore the future of AI with us!

  • @user-zl1pf2sy5s
    @user-zl1pf2sy5s 1 month ago +1

    Easy Interpretation!! Kudos

  • @kyoungd
    @kyoungd 1 year ago +2

    This is an amazing video. I love how you walk me through, step by step. I love how this video gets into the meat of the problem and solution rather than talking endlessly about this and that. Straight to the point, and tons of useful and practical information that I can apply right away.

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      Thank you very much 🙏. Hope you find the other videos useful too.

  • @martinmolina-sx4be
    @martinmolina-sx4be 1 year ago +1

    All the concepts were clearly explained, thanks for the video! 🙌

  • @Munk-tt6tz
    @Munk-tt6tz 4 months ago

    That's exactly what I needed. Huge thanks Pradip!

  • @prajithkumar432
    @prajithkumar432 2 years ago +1

    Very helpful video on embeddings, Pradip. Keep it going 👏👏👏

  • @phongd5929
    @phongd5929 1 year ago +2

    I'm a beginner in this wide field of ML and I'm very impressed by your presentation. If you have a chance, can you make a video about predicting a full name from an abbreviation? For example, searching FBI would return Federal Bureau of Investigation, etc.

  • @balag3611
    @balag3611 1 year ago +1

    Thanks for explaining this concept. This video is really helpful for my project.

  • @HuggingFace
    @HuggingFace 2 years ago +3

    Cool video! 🤗

  • @ragibshahriar7959
    @ragibshahriar7959 2 months ago +1

    Great!!!! Super!!!!

  • @bhusanchettri8594
    @bhusanchettri8594 1 year ago +1

    Nicely compiled. Great work!

  • @ShaikhMushfikurRahman
    @ShaikhMushfikurRahman 1 year ago +1

    Just amazing! Salute man!

  • @Ashesoftheliving
    @Ashesoftheliving 2 years ago +1

    Wonderful lesson!

  • @IbrahimKhan-lf9cq
    @IbrahimKhan-lf9cq 11 months ago +1

    Amazing, great work! Can you please make a video on a semantic similarity detection model using the BERT transformer? Pleaseeee 🙏🙏🙏🙏🙏

  • @AndresVeraF
    @AndresVeraF 1 year ago +1

    Thank you! Very good explanation.

  • @wilfredomartel7781
    @wilfredomartel7781 1 year ago +1

    Amazing! But how about a video on how to fine-tune a sentence transformer for non-English text?

  • @LearningWorldChatGPT
    @LearningWorldChatGPT 1 year ago +1

    Fantastic video!
    Thanks a lot for the explanation

  • @birolyildiz
    @birolyildiz 1 year ago +1

    Thank you very much ❤

  • @kittu999c
    @kittu999c 4 months ago

    Great content!!

  • @Raaj_ML
    @Raaj_ML 4 months ago

    Pradip, can you please explain how you narrow down to a particular model from all the others? Like how or why did you pick this particular mfaq model for semantic search of queries?

  • @ilducedimas
    @ilducedimas 1 year ago +1

    You rock!

  • @Amazingarjun
    @Amazingarjun 5 months ago

    Thank YOU.

  • @veronicanatividade
    @veronicanatividade 11 months ago

    OMG, man! Thank you!!

  • @flreview212
    @flreview212 1 year ago +1

    Hello sir, thanks for sharing, this is so insightful. I want to build a text summarizer, but I find this embedding method interesting. I want to ask: how do we train our own dataset on this model? Do you have any tutorials? Thank you in advance!

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      You can fine-tune sentence transformers, but I don't have any tutorials on it. You can read more about it here: www.sbert.net/docs/training/overview.html

    • @flreview212
      @flreview212 1 year ago

      @@FutureSmartAI Sorry to bother you again sir, I'm still new to this. So I just give the sentences (all the news text in each document, without labels) to the InputExample function and then train with SentenceTransformer?
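
    For context on the training API discussed above: the sbert.net overview uses InputExample pairs that carry a similarity label, so purely unlabeled text needs an unsupervised objective instead. A minimal sketch of the labeled case, with made-up sentences and scores:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative base model

# Each InputExample holds a text pair and a similarity label in [0, 1].
train_examples = [
    InputExample(texts=["The cat sits outside", "The cat plays outside"], label=0.9),
    InputExample(texts=["The new movie is great", "Stocks fell sharply"], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# One fine-tuning pass over the (tiny) example dataset.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```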

  • @MyAscetic
    @MyAscetic 1 year ago +1

    Hi Pradip. Great demo. Can we further label the cluster numbers with text? For example, is there a model that will generate the word “baby” for cluster 0, “drums or monkey” for cluster 2, “animal” for cluster 3, and “food” for cluster 4?

  • @panditamey1
    @panditamey1 1 year ago +1

    Fantastic video Pradip!! Can you please suggest any reading material for sentence embeddings?

    • @FutureSmartAI
      @FutureSmartAI  1 year ago +1

      Thanks, Amey. I think you should check the official website. They have details of what pre-trained models are available and how to fine-tune them.
      www.sbert.net/docs/training/overview.html
      There is also a new technique called SetFit:
      huggingface.co/blog/setfit

    • @panditamey1
      @panditamey1 1 year ago

      @@FutureSmartAI Thanks a lot!!

  • @kaka_rbp1998
    @kaka_rbp1998 1 year ago

    Thank you

  • @saritbahuguna9603
    @saritbahuguna9603 9 months ago

    I am getting an error with pip install -U sentence-transformers.
    To fix this you could try to:
    1. loosen the range of package versions you've specified
    2. remove package versions to allow pip attempt to solve the dependency conflict

  • @rajsethi26
    @rajsethi26 1 year ago

    Excellent, man!! Short and crisp. Would you mind creating a semantic search model on a custom dataset using a pre-trained Hugging Face model?

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      You mean you want to fine-tune sentence transformers?

  • @pouriaforouzesh5349
    @pouriaforouzesh5349 1 year ago

    🙏

  • @sumankumari-gl3ze
    @sumankumari-gl3ze 1 year ago

    amazing

  • @venkatesanr9455
    @venkatesanr9455 2 years ago +1

    Thanks for the valuable inputs and the clear explanation. Can you do NER-related videos / fine-tuning using Hugging Face? Another request: currently I am also doing semantic-search-related tasks and have already followed all the links in the notebooks, excluding clustering. I would like to do semantic search between text input and image output (which is possible only by vectorizing both the query and the image description). Can you share any links related to Hugging Face or others that would be helpful?

    • @FutureSmartAI
      @FutureSmartAI  2 years ago +1

      Hi Venkatesan, I have already done videos related to custom NER and Fine tuning Hugging face transformers.
      ua-cam.com/video/9he4XKqqzvE/v-deo.html
      ua-cam.com/video/YLQvVpCXpbU/v-deo.html
      For semantic search between text input and images output:
      Check CLIP (Contrastive Language-Image Pre-training)
      openai.com/blog/clip/
      It is a neural network model which efficiently learns visual concepts from natural language supervision.
      CLIP is trained on a dataset composed of pairs of images and their textual descriptions, abundantly available across the internet.

    • @FutureSmartAI
      @FutureSmartAI  2 years ago +1

      SentenceTransformers provides models that allow embedding images and text into the same vector space. This makes it possible to find similar images as well as to implement image search.
      www.sbert.net/examples/applications/image-search/README.html
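
    To make the two replies above concrete, a minimal sketch of text-to-image search with the CLIP checkpoint that sentence-transformers distributes; "photo.jpg" is a placeholder path:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP model that embeds images and text into the same vector space.
model = SentenceTransformer("clip-ViT-B-32")

# Embed an image (placeholder path) and a text query.
img_emb = model.encode(Image.open("photo.jpg"))
text_emb = model.encode(["a dog playing in a park"])

# Higher cosine similarity means a better text/image match.
print(util.cos_sim(img_emb, text_emb))
```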

    • @venkatesanr9455
      @venkatesanr9455 2 years ago +1

      @@FutureSmartAI Thanks for your valuable links and I will check/try.

  • @duetplay4551
    @duetplay4551 1 year ago +1

    A question about the clustering case you gave last: is there a default similarity-score criterion for grouping the sentences? Which factor(s) sort the sentences together behind the scenes? I mean, some groups have only 2 sentences and some have 4 or 5. Thx

    • @FutureSmartAI
      @FutureSmartAI  1 year ago +1

      K-means clusters points based on their proximity to the centroid of each cluster. The distance measure used in K-means clustering is typically the Euclidean distance, which is the straight-line distance between two points in n-dimensional space.
      Depending on these distances, the points are grouped; here we are calculating the distances between embeddings.

    • @duetplay4551
      @duetplay4551 1 year ago

      @@FutureSmartAI Thx for your quick reply. Let me ask this way: is there a specific distance value behind this clustering? This might need a read through the documentation on my own. Thanks again!

    • @FutureSmartAI
      @FutureSmartAI  1 year ago +1

      @@duetplay4551 Yes, in K-means you should be able to get the distance between the cluster centroids and the points.
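
    A minimal sketch of what the reply describes, using scikit-learn's KMeans on sentence embeddings; the corpus and cluster count are made up for illustration:

```python
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

corpus = ["A man is eating food.", "A man is eating bread.",
          "The baby is laughing.", "A monkey is playing drums."]
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(corpus)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
print(kmeans.labels_)  # cluster assignment for each sentence

# transform() gives each point's Euclidean distance to every centroid;
# the row minimum is the distance to the point's own cluster centroid.
distances = kmeans.transform(embeddings)
print(distances.min(axis=1))
```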

  • @samarthsarin
    @samarthsarin 2 years ago +1

    How can I train custom sentence embeddings for my domain-specific task, so that I can find the similarity between my custom domain words?

    • @FutureSmartAI
      @FutureSmartAI  2 years ago

      You can train your own; here are the steps:
      www.sbert.net/docs/training/overview.html

    • @samarthsarin
      @samarthsarin 2 years ago

      @@FutureSmartAI Thank you for replying, but this is valid for a supervised problem. I have a huge amount of data consisting of pure text documents. I want to train in an unsupervised way, where the model can learn similar words/sentences.

    • @FutureSmartAI
      @FutureSmartAI  2 years ago

      @@samarthsarin One way to train unsupervised is by using dummy tasks like next-word prediction or next-sentence prediction.
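
    Beyond such pretext tasks, the sentence-transformers documentation also describes a ready-made unsupervised objective, TSDAE (a denoising auto-encoder over unlabeled sentences). A rough sketch, with the base model and sentences as placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses, datasets

# Build a sentence encoder from a plain pretrained transformer.
word_emb = models.Transformer("bert-base-uncased")
pooling = models.Pooling(word_emb.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_emb, pooling])

# Unlabeled sentences only; the dataset wraps them into denoising pairs.
train_sentences = ["Your raw domain sentences go here.", "No labels are needed."]
train_data = datasets.DenoisingAutoEncoderDataset(train_sentences)
loader = DataLoader(train_data, batch_size=2, shuffle=True)
loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path="bert-base-uncased", tie_encoder_decoder=True
)

model.fit(train_objectives=[(loader, loss)], epochs=1)
```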

  • @duetplay4551
    @duetplay4551 1 year ago +1

    Can embedding/similarity only be applied between sentences? What about paragraph to paragraph, or essay to essay? Thx

    • @FutureSmartAI
      @FutureSmartAI  1 year ago +1

      Embeddings can be calculated for paragraphs and also for big documents.
      Sometimes a model has an input token limit; in that case you need to break the document into smaller paragraphs and then calculate the embeddings.

    • @duetplay4551
      @duetplay4551 1 year ago

      @@FutureSmartAI Thank you, sir! I will try it out. Is there any particular model you suggest to start with?
      Thanks again.
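
    A minimal sketch of the chunking approach described above; splitting on blank lines and mean-pooling the chunk embeddings is one simple choice, not the only one:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

def embed_document(text: str) -> np.ndarray:
    # Split the document into paragraphs on blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Embed each paragraph, then mean-pool into one document vector.
    return model.encode(paragraphs).mean(axis=0)

doc = "First paragraph about cats.\n\nSecond paragraph about dogs."
print(embed_document(doc).shape)  # one fixed-size vector for the whole document
```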

  • @shubhamguptachannel3853
    @shubhamguptachannel3853 1 year ago

    Thank you so much sir 😊❤❤❤

  • @shobhitrajgautam
    @shobhitrajgautam 1 year ago

    Great video. My use case is slightly different:
    I have a corpus of articles and a corpus of summaries.
    I want to find, for a particular summary, how many articles are semantically related or similar.
    Which model should I use? Should I embed and cluster, or not?
    Can you help?

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      You can use embeddings and a semantic score (see the sketch below).
      1. Calculate the embedding for each article.
      2. Calculate the embedding for the particular summary.
      Now iterate through each article embedding and calculate the cosine similarity between the article embedding and the summary embedding.
      Sort the results to get the articles most semantically similar to that summary.
      Check this; it has utility functions: www.sbert.net/examples/applications/semantic-search/README.html
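
    A minimal sketch of those steps, with made-up articles and summary; util.semantic_search from the linked page does the cosine-similarity ranking in one call:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

articles = ["Central bank raises interest rates.",
            "New vaccine shows strong trial results.",
            "Rates hiked again amid inflation fears."]
summary = "The central bank increased rates to fight inflation."

# 1. Embed every article.  2. Embed the summary.
article_emb = model.encode(articles, convert_to_tensor=True)
summary_emb = model.encode(summary, convert_to_tensor=True)

# Rank articles by cosine similarity to the summary.
hits = util.semantic_search(summary_emb, article_emb, top_k=3)[0]
for hit in hits:
    print(articles[hit["corpus_id"]], round(hit["score"], 3))
```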

  • @SMCGPRA
    @SMCGPRA 5 months ago

    How do we know how many clusters are needed?

    • @Mr_ScrufMan
      @Mr_ScrufMan 4 months ago

      It's beneficial if you can somehow infer it based on domain knowledge, but have a look at the "elbow method" or "silhouette method"
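
    A minimal sketch of the silhouette method mentioned above, scanning candidate values of k; random vectors stand in for real sentence embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 384))  # stand-in for sentence embeddings

# A higher silhouette score means better-separated clusters; pick the best k.
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
    print(k, round(silhouette_score(embeddings, labels), 3))
```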

  • @nitinchavan3395
    @nitinchavan3395 1 year ago

    Hi Pradip, thanks for the video.
    Can you please help me with this:
    The embeddings (numerical values) change every time I use a new kernel.
    How can I ensure that the embeddings are exactly the same?
    I have tried the following, but it does not seem to work:
    1. use model.eval() to put the model into evaluation mode and deactivate dropout.
    2. set "requires_grad" to False for each layer in the model so that the weights do not change.
    3. set the same seeds.
    Could you please guide me on this; any suggestion is appreciated.
    Thanks,
    Nitin

    • @tintumarygeorge9309
      @tintumarygeorge9309 11 months ago

      Hi, did you get a solution for this problem? I am also facing the same problem.

    • @nitinchavan3395
      @nitinchavan3395 11 months ago

      @@tintumarygeorge9309 Yes, the embeddings come out the same (provided you use exactly the same texts each time). The bug in my case was that the order of the texts I fed to the transformer was not the same every time.
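
    For anyone hitting the same issue: encoding is deterministic for the same texts in the same order within one environment, which a quick check like this sketch can confirm:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["A man is eating food.", "The baby is laughing."]

# Same texts, same order -> identical embeddings across calls.
first = model.encode(sentences)
second = model.encode(sentences)
print(np.allclose(first, second))  # True
```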

  • @AzertAzert-nw4ze
    @AzertAzert-nw4ze 6 months ago

    😙🤯🤯🤯🤯😃😲😁😅🤣😲😁🤣

  • @highstakestrading
    @highstakestrading 1 year ago

    Where can we find the dataset?
    It's throwing an error that it can't find the dataset: No such file or directory: '/content/drive/MyDrive/Content Creation/YouTube Tutorials/datasets/toxic_commnets_500.csv'

    • @FutureSmartAI
      @FutureSmartAI  1 year ago

      It is shared in a previous video of the playlist.