Cosine Similarity, Clearly Explained!!!

Share
Embed
  • Published Jun 6, 2024
  • The Cosine Similarity is a useful metric for determining, among other things, how similar or different two text phrases are. I'll be honest, the first time I saw the equation for The Cosine Similarity, I was scared. However, it turns out to be really quite simple, and this StatQuest walks you through it, one-step-at-a-time. BAM!!!
    English
    This video has been dubbed using an artificial voice via aloud.area120.google.com to increase accessibility. You can change the audio track language in the Settings menu.
    Spanish
    This video has been dubbed into Spanish using an artificial voice via aloud.area120.google.com to increase accessibility. You can change the audio track language in the Settings menu.
    Portuguese
    This video has been dubbed into Portuguese using an artificial voice via aloud.area120.google.com to improve accessibility. You can change the audio language in the Settings menu.
    If you'd like to support StatQuest, please consider...
    Patreon: / statquest
    ...or...
    YouTube Membership: / @statquest
    ...buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
    statquest.org/statquest-store/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
    / joshuastarmer
    0:00 Awesome song and introduction
    1:46 Visualizing the Cosine Similarity for two phrases
    6:19 The equation for the Cosine Similarity
    #StatQuest #DubbedWithAloud
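
A minimal Python sketch of the word-count approach the video describes, for anyone who wants to follow along in code (the helper name and the example phrases are illustrative choices, not from the video):

```python
from collections import Counter
from math import sqrt

def cosine_similarity(phrase_a, phrase_b):
    """Cosine Similarity of two phrases, using raw word counts as the vectors."""
    a = Counter(phrase_a.lower().split())
    b = Counter(phrase_b.lower().split())
    # Numerator: the dot product, summed over the words the phrases share.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    # Denominator: the product of the two vector magnitudes (lengths).
    mag_a = sqrt(sum(count * count for count in a.values()))
    mag_b = sqrt(sum(count * count for count in b.values()))
    return dot / (mag_a * mag_b)

print(cosine_similarity("hello world", "hello hello world"))  # ~0.949
```

With word counts, identical phrases score 1 and phrases with no words in common score 0.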

COMMENTS • 228

  • @statquest
    @statquest  1 year ago +6

    To learn more about Lightning: lightning.ai/
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @jwilliams8210
    @jwilliams8210 1 year ago +40

    You are EXCEPTIONALLY good at CLEARLY describing complex topics!!! Thank you!

  • @usamsersultanov689
    @usamsersultanov689 1 year ago +47

    I think and hope that this video is a preamble for more complex NLP topics such as Word Embeddings, etc. Many thanks for all of your efforts!

    • @statquest
      @statquest  1 year ago +21

      Yes it is! :)

    • @xanderortega4359
      @xanderortega4359 1 month ago

      Cosine Similarity is used as an evaluation tool for word2vec.

  • @nossonweissman
    @nossonweissman 1 year ago +7

    You literally make it so easy!!
    I can't help but smile 😊😊😊❤️❤️❤️
    By far one of my favorite YouTube channels!

  • @mattgenaro
    @mattgenaro 6 months ago +5

    Such a simple, yet beautiful and powerful concept of similarity.
    Thanks, StatQuest!

  • @jasonlough6640
    @jasonlough6640 2 months ago +1

    Dude, these are so good. I have to watch them several times, and then I try to write some code to reinforce the concept. Your videos are absolutely amazing.

  • @tysontakayushi8394
    @tysontakayushi8394 16 days ago +1

    I usually hate when people say that a video explains something well, because usually that's not the case. But, haha, amazing job! Really nicely explained; it feels like gamification, the way I understand it!

  • @kforay42
    @kforay42 1 year ago +5

    Your videos are such a lifesaver! Could you do one on the difference between PCA and ICA?

  • @abdulrafay2420
    @abdulrafay2420 4 months ago +1

    What a great way of explaining!! Love it ❤

  • @AU-hs6zw
    @AU-hs6zw 1 year ago +1

    You deliver the moment I need it. Thanks

  • @virenpai9395
    @virenpai9395 4 months ago +1

    My love for learning Data Science and Statistics has increased many-fold because of you. Thank you, Josh!!🙂

  • @torley
    @torley 1 year ago +3

    QUADRUPLE BAM!!! Thanks for such fun yet pragmatic explainers.

  • @anuj5576
    @anuj5576 1 year ago +1

    Super simple explanation! Thanks for your effort.

  • @olucasharp
    @olucasharp 1 year ago +4

    It all seems so easy when you speak about such complicated things! Huge talent! And so funny ⚡⚡⚡

  • @bjornnorenjobb
    @bjornnorenjobb 1 year ago +2

    Awesome video! I had no idea what Cosine Similarity was, but you explained it super clearly.

  • @magicfox94
    @magicfox94 1 year ago +2

    Excellent explanation! I hope it is the first of an NLP series of videos!

    • @statquest
      @statquest  1 year ago +5

      I hope to do word embeddings soon.

  • @dukeduke1910
    @dukeduke1910 2 months ago +1

    This guy is seriously funny. I thought I was the only person who ever watched Gymkata (like 50 times, especially the part in the town where everyone was crazy). This video definitely explains cosine similarity clearly. Thank you!

  • @azzahamed2063
    @azzahamed2063 4 months ago +1

    This is an AMAZING explanation !!

  • @user-kp8lw1nz7m
    @user-kp8lw1nz7m 3 months ago +1

    You are the king, Josh 👏👏👏👏 Wonderful job!!!

  • @bladongarland8635
    @bladongarland8635 21 days ago +1

    Hilarious, easy to understand, and entertaining. Bravo!

  • @ymperformance
    @ymperformance 1 year ago +1

    Great video and great explanation! Thanks.

  • @ibrahimogunbiyi4296
    @ibrahimogunbiyi4296 1 month ago +1

    I came here because I needed to learn something in NLP. Thank you, I understood it clearly.

  • @exoticcoder5365
    @exoticcoder5365 10 months ago +1

    I must watch Gymkata! Thanks for the recommendation! And excellent explanation of the topic!

  • @bachdx2812
    @bachdx2812 1 year ago +1

    Thanks a lot. These kinds of videos are super helpful for me!!!

  • @RichardGreco
    @RichardGreco 1 year ago +2

    Great video. Very interesting. I hope to see you apply this to more examples.

    • @statquest
      @statquest  1 year ago +1

      We'll see it used in CatBoost for sure.

  • @muhammadazeemmohsin5666
    @muhammadazeemmohsin5666 6 months ago +1

    What an amazing explanation. Thanks for the video.

  • @KarthikNaga329
    @KarthikNaga329 1 year ago +4

    This is another great video, Josh!
    Question: @3:51 you talk about having 3 Hellos, and that still results in a 45 degree angle with Hello World.
    However, comparing Hello to Hello World seems to give a different angle from comparing Hello to Hello World World.
    Is there an intuition as to why this is the case? That is, adding as many Hellos to Hello keeps the angle the same, but adding more Worlds to Hello World seems to change the Cosine Similarity.

    • @statquest
      @statquest  1 year ago

      Two answers:
      1) Just plot the points on a 2-dimensional graph for the two pairs of phrases and you'll see that the angles are different.
      2) The key difference is that "hello hello hello" only contains the word "hello". If we had included "world", then the angles would be different. Again, you can plot the points to see the differences.
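
A quick numeric check of that reply, reusing the word-count sketch from above (recovering the angle with acos is an illustrative addition):

```python
from collections import Counter
from math import sqrt, acos, degrees

def cos_sim(phrase_a, phrase_b):
    a, b = Counter(phrase_a.split()), Counter(phrase_b.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    mag = lambda counts: sqrt(sum(c * c for c in counts.values()))
    return dot / (mag(a) * mag(b))

for pair in [("hello", "hello world"),              # 45 degrees
             ("hello hello hello", "hello world"),  # still 45 degrees
             ("hello", "hello world world")]:       # a wider angle
    s = cos_sim(*pair)
    print(pair, round(s, 3), round(degrees(acos(s)), 1))
```

Extra copies of "hello" stretch the first vector without changing its direction, while extra copies of "world" rotate the second vector, which is why only the second change affects the angle.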

  • @luizcarlosazevedo9558
    @luizcarlosazevedo9558 1 year ago +1

    Hey, great video as always!! Is the cosine similarity good for regression problems in which the targets are pretty close to zero? I'm trying to implement some accuracy metrics for a transformer model.

    • @statquest
      @statquest  1 year ago

      Hmm... I bet it would work (if you had a row of predictions and a row of known values).

  • @Francescoct
    @Francescoct 1 year ago +1

    Great video! Have you made one on Word Embeddings?

  • @edmiltonpeixeira3221
    @edmiltonpeixeira3221 1 year ago

    Congratulations on the content. An excellent explanation, like none I've found in any other video.

  • @dataanalyticswithmichael8931
    @dataanalyticswithmichael8931 1 year ago +1

    Superb! Thank you for the explanation.

  • @kavita8925
    @kavita8925 9 months ago +1

    Your explanation is great.

  • @murilopalomosebilla2999
    @murilopalomosebilla2999 1 year ago +1

    Hello!! Nice video!

  • @theedspage
    @theedspage 10 months ago +1

    Hello! Hello! Hello! Thank you for introducing me to this topic! Subscribed.

    • @statquest
      @statquest  10 months ago

      Awesome! Thank you!

  • @jonathanramos6690
    @jonathanramos6690 2 months ago +1

    Amazing!!

  • @AmineBELALIA
    @AmineBELALIA 1 year ago +1

    This video needs more views; it is awesome.

  • @RaynerGS
    @RaynerGS 7 months ago +1

    I love you!!!! Salute from Brazil.

    • @statquest
      @statquest  7 months ago

      Muito obrigado! :)

  • @spambaconeggspamspam
    @spambaconeggspamspam 1 year ago +2

    Perfect! I'm trying to figure out how to best present my Single Cell Data in a UMAP and saw that cosine is the default distance metric in Seurat!

  • @davidmurphy563
    @davidmurphy563 1 year ago +9

    Could you cover discrete cosine/Fourier transforms, pretty please? I'd love to know how to break signals up into their component frequencies.
    If you haven't already!

    • @statquest
      @statquest  1 year ago +1

      I'll keep that in mind.

    • @alexanderlevakin9001
      @alexanderlevakin9001 1 year ago +1

      Have you seen the 3blue1brown video on this topic? Not sure if it's about the discrete FT.

  • @lifeisbeautifu1
    @lifeisbeautifu1 3 months ago +1

    Thank you!

  • @chrisguiney4568
    @chrisguiney4568 1 year ago +1

    This video also does a good job highlighting how cosine similarity and the dot product are related. Unless I'm mistaken, the equation can be written dot(a, b) / (magnitude(a) * magnitude(b)), where magnitude(x) = sqrt(dot(x, x)).
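
That identity is easy to verify numerically; a sketch with NumPy, using the word counts for "hello hello hello" and "hello world" as example vectors:

```python
import numpy as np

def magnitude(x):
    return np.sqrt(np.dot(x, x))  # sqrt(dot(x, x)), as noted above

a = np.array([3.0, 0.0])  # word counts for "hello hello hello"
b = np.array([1.0, 1.0])  # word counts for "hello world"

cosine = np.dot(a, b) / (magnitude(a) * magnitude(b))
print(round(cosine, 3))  # 0.707, the cosine of a 45 degree angle
```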

  • @abrahammahanaim3859
    @abrahammahanaim3859 11 months ago +2

    Hey Josh, thanks for the video; nice explanation.

  • @millennialm1money500
    @millennialm1money500 1 month ago +1

    Great video 🎉

  • @yuan8947
    @yuan8947 1 year ago

    Always thank you for the great and easy-to-understand videos!
    And I have a question about totally different words.
    If there are 2 sentences like very good/super nice, since very, good, super, and nice are totally different words, the cosine similarity will be 0.
    However, they actually have the same meaning!
    I want to ask what other preprocessing we should do in such a situation?
    Thank you so much!

    • @statquest
      @statquest  1 year ago +1

      I think you might need more context (longer phrases) to get a better cosine similarity. I just used 2 words because I could draw them, but in practice, you use more.

  • @jainanshu2000
    @jainanshu2000 1 year ago

    Great video! One question - how is this different from the regular string comparison we use in various programming languages?

    • @statquest
      @statquest  1 year ago

      I'm not sure I understand your question. My understanding of string comparison in programming languages is that it just compares the bits to make sure they are equal and the result is a boolean True/False type thing.

  • @limebro8833
    @limebro8833 1 year ago +1

    This video saved me, I cannot thank you enough.

  • @smegala3815
    @smegala3815 1 year ago +2

    Very useful 👍

  • @user-yd8sr9ot9u
    @user-yd8sr9ot9u 7 months ago +1

    Wow, thank you!!! I didn't know how to calculate it, but after watching this, I've become a mathematician!!

  • @suaridebbarma1255
    @suaridebbarma1255 21 days ago +1

    this video was absolutely a BAM!!

  • @debatradas1597
    @debatradas1597 9 months ago +1

    Thank you so much

    • @statquest
      @statquest  9 months ago

      You're most welcome!

  • @MrJ17J
    @MrJ17J 1 year ago

    Super interesting! Do you have examples of how these are implemented in practice?

    • @statquest
      @statquest  1 year ago

      I talk about that at the start of the video, but it's also used by CatBoost to compare the predicted values for a bunch of samples to their actual values.

  • @samrasoli
    @samrasoli 1 year ago +1

    useful, thanks

  • @CristianoGarcia10
    @CristianoGarcia10 1 year ago +1

    Excellent and clear video! I wonder why NLP applications use cosine distance more often than other metrics, such as Euclidean distance. Is there a clear reason for that? Thanks in advance.

    • @statquest
      @statquest  1 year ago +1

      I'm not certain, but one factor might be how easy it is to compute (people often omit the denominator making the calculation even easier) and it might be nice that the cosine similarity is always between 0 and 1 and doesn't need to be normalized.

  • @Ghulinzer
    @Ghulinzer 1 year ago +1

    Great video! I've seen in many articles, though, that people consider cosine similarity the same as Pearson's correlation, since the two produce the same outcome when the means of X and Y are 0 (E(X) = E(Y) = 0).
    This is not true in general, since they measure different things. Cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space and returns a similarity score, as explained in the video, while Pearson's correlation measures the linear relationship between 2 variables.
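
The relationship the comment describes can be checked numerically: centering both variables (so their means are 0) makes the cosine similarity coincide with Pearson's correlation. A sketch with NumPy; the random data is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

# Center both variables so their means are 0...
xc, yc = x - x.mean(), y - y.mean()

# ...then the cosine similarity of the centered vectors...
cos_centered = np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

# ...equals Pearson's correlation of the original variables.
pearson = np.corrcoef(x, y)[0, 1]
print(np.isclose(cos_centered, pearson))  # True
```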

  • @sciab3674
    @sciab3674 2 months ago +1

    Thanks a lot. Easy to understand.

  • @Shehab-Codes
    @Shehab-Codes 6 months ago

    Thank you so much.
    I had no idea what cosine similarity was and you illustrated it easily, I appreciate it.
    Btw, how can the cosine similarity result in a negative number?

    • @statquest
      @statquest  6 months ago

      The cosine similarity can be calculated for any 2 sets of numbers, and that can result in a negative value.
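
A tiny illustration: word counts are never negative, so text vectors give similarities between 0 and 1, but vectors that contain negative numbers can point in opposite directions and give a similarity as low as -1 (a NumPy sketch):

```python
import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos_sim(np.array([1, 1]), np.array([2, 2])))    #  1.0  same direction
print(cos_sim(np.array([1, 0]), np.array([0, 1])))    #  0.0  perpendicular
print(cos_sim(np.array([1, 1]), np.array([-1, -1])))  # -1.0  opposite directions
```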

  • @miltonborges7356
    @miltonborges7356 5 months ago +1

    Amazing

  • @raphaelbonillo2192
    @raphaelbonillo2192 8 days ago +1

    You democratize mathematics! They should do it like this in schools.

  • @Levy957
    @Levy957 1 year ago +1

    you are amazing

  • @pouryajafarzadeh5610
    @pouryajafarzadeh5610 1 year ago +1

    Cosine similarity is a good method for comparing the embedding vectors, especially for face recognition.

  • @mystmuffin3600
    @mystmuffin3600 1 year ago +1

    Cool! (in StatQuest voice)

  • @cartulinito
    @cartulinito 1 year ago +1

    Great video, as we've come to expect.

  • @gsp_admirador
    @gsp_admirador 1 year ago +2

    nice easy explanation

  • @fazelamirvahedi9911
    @fazelamirvahedi9911 6 months ago

    Thank you for making all of these informative, simple and precise videos. I wondered what happens if two phrases deliver the same meaning but order their words differently, for instance: A) I like Gymkata. B) I really like Gymkata. In this case, doesn't the extra adverb "really" in the second sentence disturb the phrase matrix? And one more question: if three phrases have the same length, and two of them have the same meaning but use different words, like: A) I like Gymkata. B) I love Gymkata. C) I like volleyball. In this case, would the cosine similarity between A and B be greater than between A and C?

    • @statquest
      @statquest  6 months ago +1

      In this video, we're simply counting the number of words that are the same in different phrases, however, you can use other metrics to calculate the cosine similarity, and that is often the case. For example, we could calculate "word embeddings" for each word in each phrase and calculate the cosine similarity using the word embedding values and that would allow phrases with similar meanings to have larger similarities. To learn more about word embeddings, see: ua-cam.com/video/viZrOnJclY0/v-deo.html

  • @MOROCCANFREEMIND
    @MOROCCANFREEMIND 4 months ago +1

    The quality of your explanation is more than triple bam!!😂

  • @willw4096
    @willw4096 9 months ago +1

    Great video! My notes: 3:52 4:23

  • @SalahMusicOfficial
    @SalahMusicOfficial 11 months ago

    Hi Josh, I'm trying to understand why cosine similarity may be the best metric to find semantically similar texts (using pretrained embeddings). It sounds like two vectors only have to be directionally similar for the cosine similarity to be high. What about using something like Euclidean or Manhattan distance? Would a distance metric be better for seeing whether two texts are semantically similar?

    • @statquest
      @statquest  11 months ago +1

      That's a good question and, to be honest, I don't know the answer. I do know, however, that most neural networks - when they use "attention" (like in transformers, which are used for ChatGPT) - just use the numerator of the cosine similarity as the "similarity metric". In other words, they just compute the dot-product. Maybe they do this because it's super fast, and the speed outweighs the benefits of using another, more sophisticated method.
      Also, it's worth noting that this is a similarity metric and not a distance. In other words, as the value goes up, things are "more similar" (the angle is smaller). In contrast, the Euclidean and Manhattan distances are...distances. That is, as the value goes up, the things are further away and considered "less similar"
      Lastly, cool music on your channel! You've got a dynamite voice.
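
A minimal sketch of what that reply describes: attention scores are just dot products between a query vector and each key vector, i.e. the numerator of the cosine similarity without the normalizing denominator. The embeddings here are made-up numbers, not from any real model:

```python
import numpy as np

query = np.array([1.0, 2.0])   # hypothetical query embedding
keys = np.array([[1.0, 2.0],   # hypothetical key embeddings
                 [2.0, 1.0],
                 [-1.0, -2.0]])

scores = keys @ query          # one dot product per key; no normalization
print(scores)                  # [ 5.  4. -5.] -- larger score = "more similar"
```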

    • @SalahMusicOfficial
      @SalahMusicOfficial 11 months ago +1

      @@statquest thank you! let me know if you need another voice in any of your intro jingles 😁

    • @statquest
      @statquest  11 months ago

      @@SalahMusicOfficial bam!

  • @ZOBAER496
    @ZOBAER496 1 year ago

    Can you please talk about some applications of cosine similarity, like where it is used and in which types of problems?

    • @statquest
      @statquest  1 year ago

      I talk about that at the start of the video, but you can also use it whenever you want to compare two rows of data. For example, CatBoost uses it to compare the predicted values for a bunch of data to their actual values.

  • @sushi666
    @sushi666 11 months ago

    Can you please do Spherical K-Means with Cosine Similarity as the distance metric?

    • @statquest
      @statquest  11 months ago +1

      I'll keep that in mind.

  • @madhubabukencha5037
    @madhubabukencha5037 1 year ago +2

    Man you are not human, you are my god 😀

  • @itSinger
    @itSinger 4 months ago +1

    tysm

  • @banibratamanna5446
    @banibratamanna5446 1 month ago

    The generalized equation for the cosine similarity comes from the dot product of 2 vectors in multiple dimensions... by the way, big fan of yours ❤

    • @statquest
      @statquest  1 month ago

      scaled to be between -1 and 1. :)

  • @Mrnafuturo
    @Mrnafuturo 1 year ago

    Does the cosine similarity equation end up being a normalized projection of one vector onto the other?

    • @statquest
      @statquest  1 year ago

      I believe that is correct.

  • @chris-graham
    @chris-graham 1 year ago +1

    "In contrast, this last sentence is from someone who does not like Troll 2" - I was expecting a BOOOO after that lol

    • @statquest
      @statquest  1 year ago

      Ha! That would have been great.

  • @lonok84
    @lonok84 1 month ago +1

    Wow, I used this to make a WhatsApp bot that puts a client on a flow/menu based on the client's first message.

  • @raven-888
    @raven-888 1 year ago +2

    Love you

  • @nidhi_singh9494
    @nidhi_singh9494 2 months ago

    Hey... so the cosine only depends on the angle, not on the lengths... When the case with three Hellos was shown, how can the phrases be distinguished, since the similarity is the same for both sentences?

    • @statquest
      @statquest  2 months ago

      What time point, minutes and seconds, are you asking about?

  • @eddiesec
    @eddiesec 1 year ago

    I still don't understand how that works for embeddings though. Each embedding dimension should loosely represent a grammatical property of the words, so how can one word that is farther than another along a single dimension (as in your Hello Hello Hello example) be considered identical?

    • @statquest
      @statquest  1 year ago

      I'll do a video on embeddings soon.

  • @001kebede
    @001kebede 9 months ago

    How can we relate this to the correlation between two continuous random variables?

    • @statquest
      @statquest  9 months ago

      See: stats.stackexchange.com/questions/235673/is-there-any-relationship-among-cosine-similarity-pearson-correlation-and-z-sc#:~:text=TL%3BDR%20Cosine%20similarity%20is,a%20norm%20of%20%E2%88%9An.&text=To%20convert%20a%20z%2Dscore,function%20for%20a%20Gaussian%20distribution.

  • @PromitiDasgupta-mz7uc
    @PromitiDasgupta-mz7uc 11 months ago

    Can I use cosine similarity to build a similarity matrix between two different brain regions?

  • @shintaardani6332
    @shintaardani6332 2 months ago

    I am conducting sentiment analysis research and found that some data has a Cosine Similarity of 0. Are there any methods to make the Cosine Similarity not equal to 0?

    • @statquest
      @statquest  2 months ago

      You could pad each phrase with something, so all phrases have at least one thing in common.

    • @shintaardani6332
      @shintaardani6332 2 months ago +1

      @@statquest Thank you so much😁
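
A sketch of that padding suggestion, using the "very good"/"super nice" example from an earlier comment (the "[pad]" token is an illustrative choice, not from the video):

```python
from collections import Counter
from math import sqrt

def cos_sim(phrase_a, phrase_b):
    a, b = Counter(phrase_a.split()), Counter(phrase_b.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    mag = lambda counts: sqrt(sum(c * c for c in counts.values()))
    return dot / (mag(a) * mag(b))

print(cos_sim("very good", "super nice"))              # 0.0 -- no shared words
# Padding every phrase with a shared dummy token gives all
# phrases at least one dimension in common:
print(cos_sim("very good [pad]", "super nice [pad]"))  # ~0.333
```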

  • @XEQUTE
    @XEQUTE 2 months ago +1

    You're kinda like Phil from Modern Family, but for Data Science/Statistics.

  • @skiraf
    @skiraf 1 year ago

    "Troll 2" should be considered 1 word. It refers to only one idea: the Troll sequel, which is a different movie from the first Troll movie.

  • @AxDhan
    @AxDhan 1 year ago +1

    I'm a native Spanish speaker, and it surprised me when it started speaking Spanish. It will reach more people, but they will miss your motivating silly songs xD

    • @statquest
      @statquest  1 year ago

      Thanks! Yeah - I'm not sure what to do about the silly songs. :)

  • @aquagardening5803
    @aquagardening5803 4 months ago +1

    BAM!!!

  • @s0meus3r
    @s0meus3r 2 months ago +1

    I got it, BAM!! 🎉

  • @rajashreechakraborty747
    @rajashreechakraborty747 4 months ago

    Can you please help me with this?
    This is my data:
    A: cosine: 0.58, z-score: 372
    B: cosine: 0.63, z-score: 370
    How can I find the p-value/significance of the 0.05 change in the cosine similarities?

    • @statquest
      @statquest  4 months ago

      We didn't cover p-values in the video.

  • @hansu7474
    @hansu7474 1 year ago

    What are the applications of cosine similarity?

    • @statquest
      @statquest  1 year ago

      Umm... Did you watch the video? It's the first thing I talk about.

  • @kushiiiy1582
    @kushiiiy1582 1 year ago

    Why is it specifically cosine, and not tangent? Since you have both the opposite and adjacent lengths??

    • @statquest
      @statquest  1 year ago

      The cosine is easy to calculate and, unlike the tangent function, is defined for all possible angles.

  • @Olddays100s
    @Olddays100s 7 months ago

    But if the phrases are Hello World and World Hello, the cosine would still be 1. How do we differentiate between them using cosine similarity? Do algorithms introduce another dimension?

    • @statquest
      @statquest  7 months ago

      Algorithms use other methods to keep track of word order. For example, transformers use positional encoding. To learn more, see: ua-cam.com/video/zxQyTK8quyY/v-deo.html
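
A quick check that plain word counts really do ignore word order, which is why order has to be tracked some other way (positional encoding, as the reply notes); a sketch:

```python
from collections import Counter
from math import sqrt

def cos_sim(phrase_a, phrase_b):
    a, b = Counter(phrase_a.lower().split()), Counter(phrase_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    mag = lambda counts: sqrt(sum(c * c for c in counts.values()))
    return dot / (mag(a) * mag(b))

print(cos_sim("Hello World", "World Hello"))  # 1.0 -- the counts ignore order
```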

  • @c.nbhaskar4718
    @c.nbhaskar4718 1 year ago

    What is the formula if we have more than 2 sentences?

    • @statquest
      @statquest  1 year ago

      I believe you just calculate it for all combinations of the sentences.

  • @aizazkhan5439
    @aizazkhan5439 8 months ago

    Can the cosine similarity be greater than the distance between words?

    • @statquest
      @statquest  8 months ago

      I guess it depends on how you measure the distance. However, in general, the cosine similarity will always be between -1 and 1 (and is usually just between 0 and 1).

    • @aizazkhan5439
      @aizazkhan5439 8 months ago

      @@statquest In what cases can the cosine similarity be -1? Isn't it a similarity measure, meaning 0 would imply nothing in common and 1 perfect similarity? What would -1 imply?

    • @statquest
      @statquest  8 months ago

      @@aizazkhan5439 A similarity of -1 is sort of like an inverse correlation - when one goes up, the other goes down, etc.

  • @offBeatRock777
    @offBeatRock777 1 year ago

    Is this the basis of LLM models?

    • @statquest
      @statquest  1 year ago +1

      Unfortunately not. However, I'll be doing some videos that cover LLM topics, like word embeddings and attention, soon.

  • @Liteship
    @Liteship 3 months ago +1

    Baaaaam!

  • @lorryzou9367
    @lorryzou9367 1 year ago

    This equation is actually the dot product of two vectors, divided by the product of their magnitudes.

  • @zorojuro5106
    @zorojuro5106 10 months ago

    Exciting video!

  • @_thehunter_
    @_thehunter_ 1 year ago

    How do people come up with these formulas... it's crazy.

  • @hippolyte223
    @hippolyte223 1 year ago

    Please can we have videos about transformers? 🙏🙏🙏

  • @ayoubrayanemesbah8845
    @ayoubrayanemesbah8845 5 months ago +1

    hello, hello, helloo 🤣🤣

  • @sethjchandler
    @sethjchandler 1 year ago +2

    Good discussion of cosine similarity. But Gymkata is an AWFUL movie.