Detecting pitch automatically - The intuition behind the YIN pitch detection algorithm

  • Published 6 Aug 2024
  • Sound is messy and difficult to work with, yet with some simple techniques we are able to write a short program that handles pitch detection well in many real cases of monophonic audio. This video covers the main steps of the YIN pitch detection algorithm [1] and visualizes the underlying concepts with various animations.
    [1]: De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917-1930.
    Corrections:
    - The 2 samples played at 0:40 are not actually the same pitch, but 1 octave apart, specifically 110 Hz and 220 Hz respectively.
    _____
    Music by:
    Cursedsnake
    SoundCloud: / cursedsnake
    UA-cam: / cursedsnake
    Nicolaj Valsted:
    SoundCloud: / nicolajvalsted
    UA-cam: / detslav
    Manim Community, a community-maintained fork of Manim by 3Blue1Brown (github.com/3b1b/manim), was used to make most of the visualizations: www.manim.community/

COMMENTS • 40

  • @3blue1brown
    @3blue1brown 2 years ago +110

    Excellent job! It's wonderful how you lay out all the code for the tactics you describe. I also love the overall narrative structure of developing partial solutions that don't generally work but slowly refining the main idea until there's a final impressive (and at this point quite understandable) tactic.

  • @yusufcan1304
    @yusufcan1304 3 months ago +4

    why is this the only video you have bro, just keep it up!!!

  • @404errorpagenotfound.6
    @404errorpagenotfound.6 3 months ago +1

    Hope all is well and that you return. Great clip.

  • @ananthakrishnank3208
    @ananthakrishnank3208 7 months ago +2

    Just Amazing!!
    11:10 It takes a long time to run in Google Colab for an 8-second input .wav file: 30+ minutes and still running. I am dropping this for now. Anyway, many thanks.

  • @joelmeuleman247
    @joelmeuleman247 1 year ago

    Excellent video. Great work

  • @MATHsegnale
    @MATHsegnale 2 years ago +1

    Very nice video! I liked the graphics and the explanations! The final part with the real voice was amazing!! Thank you for the video!

  • @sinewavey
    @sinewavey 2 years ago +4

    Thanks for covering this topic. I've always enjoyed the overlap of music theory and mathematics, and this kind of thing immediately (though much less formally defined) came to mind when I first learned about Fourier transforms. But now there's a well-made video that really gets into a practical way this is actually done, with or without that knowledge, and for much more complicated, closer-to-reality cases.
    I really liked how you included multiple approaches, showed how they could fail or have issues, and how you went about getting a better process.

  • @jaimelabac
    @jaimelabac 1 year ago

    Thanks for the explanations!

  • @caleb1951
    @caleb1951 1 year ago

    awesome video! works great!

  • @letstrythistv
    @letstrythistv 2 years ago +3

    Amazing. I finally understand it.

  • @jeyko666
    @jeyko666 1 year ago

    great one, thanks!

  • @RossPlunkett
    @RossPlunkett 3 months ago

    Superb explanation, best on YT

  • @sachinreddy2836
    @sachinreddy2836 1 year ago

    Great video! :)

  • @bot5am
    @bot5am 2 years ago +1

    Can’t believe I’m gonna be the third subscriber of this channel.

  • @mikehedges7079
    @mikehedges7079 2 years ago +7

    This is a wonderful video and explains things very clearly. I was wondering, is there any possibility of you sharing your final code? Also, what changes could be made when taking a continuous signal, such as real-time input from a microphone (instead of a wav file)? I hope to see more from you!

    • @vforscience3687
      @vforscience3687  2 years ago +5

      Thanks! I've created a git repo containing relevant code snippets from this video, which is available at github.com/NValsted/VForScienceProjects. The snippets displayed in the video are not optimized for speed, especially not the CMNDF. Memoizing this function (and thus avoiding multiple identical calls to the DF) will greatly speed up the computation, yet it is still too slow for real-time input (a crude implementation of this is included in the code snippets in the git repo). Implementing it in a faster language than Python, along with some micro-optimizations, will probably be sufficient, but it might be better to take a different approach, perhaps using spectral-domain methods like the fast Fourier transform.

    • @TheBunyk
      @TheBunyk 1 year ago +2

      @@vforscience3687 I took a YIN implementation written in Go, and it's almost real-time. I created a Synthesia-like game based on it: ua-cam.com/video/-9oLTsaAoIM/v-deo.html I use buffers of 5512 samples and a sample rate of 188200 (chosen randomly; now I wonder whether my microphone can actually handle that or whether the sample rate is increased by interpolation). This gives ~34 pitch measurements per second, which seems to be enough for the gameplay. Still, it will probably benefit from me learning more about DSP and tuning it better.

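A minimal sketch of the memoization idea described in the reply above, assuming NumPy; this is not the code from the video or the repo, and the function names are illustrative. The difference function is evaluated once per frame and stored, and the CMNDF is then derived from that single array, so no lag ever triggers a repeated DF computation.

```python
# Sketch only: compute the DF once, reuse it for the CMNDF (names are illustrative).
import numpy as np

def difference_function(frame: np.ndarray, max_lag: int) -> np.ndarray:
    """YIN difference function d(tau) = sum_j (x[j] - x[j + tau])^2, computed once."""
    window = len(frame) - max_lag          # samples compared at every lag
    d = np.empty(max_lag)
    for tau in range(max_lag):
        diff = frame[:window] - frame[tau:tau + window]
        d[tau] = np.dot(diff, diff)
    return d

def cmndf(d: np.ndarray) -> np.ndarray:
    """Cumulative mean normalized difference, built from the precomputed DF array."""
    out = np.ones_like(d)                  # d'(0) is defined as 1
    running_sum = np.cumsum(d[1:])
    taus = np.arange(1, len(d))
    out[1:] = d[1:] * taus / np.where(running_sum == 0, 1, running_sum)
    return out

def detect_pitch(frame, sample_rate, f_min=50, f_max=500, threshold=0.1):
    max_lag = sample_rate // f_min         # longest period considered
    min_lag = sample_rate // f_max         # shortest period considered
    d_prime = cmndf(difference_function(frame, max_lag))
    below = np.where(d_prime[min_lag:] < threshold)[0]
    if len(below):
        tau = below[0] + min_lag
        while tau + 1 < max_lag and d_prime[tau + 1] < d_prime[tau]:
            tau += 1                       # slide down to the bottom of the dip
    else:
        tau = int(np.argmin(d_prime[min_lag:])) + min_lag
    return sample_rate / tau

# Quick check on a synthetic 220 Hz sine:
sr = 44100
t = np.arange(2048) / sr
print(detect_pitch(np.sin(2 * np.pi * 220 * t), sr))  # ~220 Hz
```
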
  • @ericadruin26
    @ericadruin26 1 year ago

    Hi... thank you for this. I've tested it, and you explained this beautifully. I'm just going to occupy myself with this for a week. THANK YOU!!!!!
    Reading the original paper with no math background is a pain (sorry..). The math symbols are abstract to me (third-world-country education system). WE NEED MORE PEOPLE LIKE YOU to populate the internet.

  • @pablosml
    @pablosml 1 year ago +3

    Hey! I don't know if you're still active here, but I optimized the code you shared on GitHub so that it's probably good enough for real time, and added the option of parabolic interpolation (step 5 in the original YIN paper). If you're interested in it, let me know how to contact you (I don't want to share anything based on your work here without your approval).
    Thanks for the awesome video! Your modular implementation of the steps described in the paper was great for understanding it :)

    • @vforscience3687
      @vforscience3687  1 year ago +2

      Hey! Thanks for reaching out, that sounds awesome! If you're familiar with GitHub, I'd recommend opening a pull request against this repository with the improved code github.com/NValsted/VForScienceProjects/tree/master/YIN_pitch_detection - Otherwise feel free to modify and share your code in any way you like :)

    • @pablosml
      @pablosml 1 year ago +1

      @@vforscience3687 Thanks for the reply! I've created the PR with the implemented changes and some comments about them. I hope you find them useful and interesting. Feel free to reach out with any comments!

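Since parabolic interpolation (step 5 of the YIN paper) comes up in this thread, here is a minimal, hypothetical sketch of what that refinement usually looks like; it is not the code from the PR above. A parabola is fitted through the CMNDF values at tau - 1, tau and tau + 1, and its vertex is used as a sub-sample estimate of the period.

```python
# Hypothetical sketch of YIN's parabolic-interpolation step; not the PR code.
def refine_tau(d_prime, tau: int) -> float:
    """Refine an integer lag to a fractional one using the neighbouring CMNDF values."""
    if tau <= 0 or tau >= len(d_prime) - 1:
        return float(tau)                      # no neighbours to fit against
    left, mid, right = d_prime[tau - 1], d_prime[tau], d_prime[tau + 1]
    denom = left - 2 * mid + right
    if denom == 0:                             # flat neighbourhood: keep the integer lag
        return float(tau)
    return tau + 0.5 * (left - right) / denom  # vertex of the fitted parabola

# e.g. pitch = sample_rate / refine_tau(d_prime, tau)
```

The refinement matters most for high pitches, where the true period spans only a handful of samples and rounding the lag to an integer causes a noticeable frequency error.
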
  • @computergeek2552
    @computergeek2552 9 months ago +1

    I cannot find what I'm doing wrong in the first step. I keep getting an output of 0.49 rather than the expected 0.98. Any thoughts?

  • @reportdabug
    @reportdabug 2 years ago +6

    How are you making these animations? They're beautiful!

    • @vforscience3687
      @vforscience3687  2 years ago +5

      Thanks! The animations are made with the community maintained version of a tool called Manim, which can be found at www.manim.community/

  • @jacobpenn2266
    @jacobpenn2266 2 years ago +1

    More videos!

  • @porkypooky4412
    @porkypooky4412 2 years ago +2

    Thank you very much for this useful video!
    By the way, is detecting polyphony much different from this?

    • @vforscience3687
      @vforscience3687  2 years ago +9

      Glad you enjoyed it! I'm not particularly familiar with polyphonic pitch detection (PD) methods yet, but as far as I know, the fact that we do not necessarily know beforehand how many pitches are present in a signal, and that pitch is not well-defined (in comparison to a frequency component), tends to be what makes polyphonic PD very difficult. In contrast, for monophonic PD we generally assume that one and only one pitch is present, so the problem is essentially reduced to guessing which single frequency is the most probable.
      From what I've seen, various polyphonic PD methods try to overcome this with some variation of detecting the most probable frequency, then either eagerly or lazily removing it along with its harmonics from the search space, so that the next-most-probable frequency is detected, and so on until some cutoff.
      I believe polyphonic PD is widely regarded as unsolved, but I would like to delve deeper into some of the current methods, possibly resulting in an explainer video on the topic.

    • @porkypooky4412
      @porkypooky4412 2 years ago +4

      @@vforscience3687 Why, thank you!
      This is the longest and kindest reply I've ever received from a UA-camr.
      You are the best!

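The "find the most probable frequency, then remove it and its harmonics from the search space" idea described in the reply above can be pictured with a deliberately naive spectral toy. This is not the YIN method from the video and not a serious polyphonic detector; it simply treats the strongest remaining spectral peak as a fundamental (itself a simplification) and zeroes out that peak and its harmonics before the next pass.

```python
# Toy illustration only: naive iterative peak picking with eager harmonic removal.
import numpy as np

def naive_polyphonic_pitches(frame, sample_rate, max_voices=3, n_harmonics=8):
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    bin_width = freqs[1]                       # Hz per FFT bin
    pitches = []
    for _ in range(max_voices):
        peak = int(np.argmax(spectrum))
        if spectrum[peak] < 1e-6:              # nothing meaningful left
            break
        f0 = freqs[peak]                       # simplification: strongest peak = fundamental
        pitches.append(f0)
        # Eagerly remove the detected frequency and its harmonics so the
        # next-most-probable frequency can be found on the next iteration.
        for k in range(1, n_harmonics + 1):
            centre = int(round(k * f0 / bin_width))
            lo, hi = max(centre - 2, 0), min(centre + 3, len(spectrum))
            spectrum[lo:hi] = 0.0
    return pitches
```

Real polyphonic methods have to be far more careful about deciding which peak is actually a fundamental rather than a harmonic of another voice, which is a large part of why the problem is still considered open.
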
  • @ankitanand2448
    @ankitanand2448 2 months ago

    I am slightly confused by the absolute threshold step. The paper says to look for the first minimum in the sequence that is below the threshold, but the way you implemented it does not check whether the value is a minimum: if the value is less than the threshold, you take that lag to be the period. Could you please explain this?

  • @marshallchadbourne6229
    @marshallchadbourne6229 2 years ago

    Does anyone know of a way to do this in C?? Thank you!

  • @technofeeliak
    @technofeeliak 1 year ago +1

    0:55, I don't hear the same pitch. Maybe the piano, if it is a real piano, is out of tune. Or the guitar. But I hear two distinct notes.

    • @vforscience3687
      @vforscience3687  1 year ago

      This is a very silly mistake on my part. It is correct that you don't hear the same pitch, as the two notes are actually an octave apart.

  • @kikodasneves1
    @kikodasneves1 1 year ago

    Great video. I just didn't understand why limiting the lag values through the bound parameter limits the frequencies we can detect.

    • @vforscience3687
      @vforscience3687  1 year ago +1

      I'll try to explain by example. Recall that the value of the frequency we guess is (sample_rate / number_of_samples_shifted_at_best_guess). If we set the sample rate to 100, and we check lag values between 4 and 20, then the highest frequency we can guess is 100/4=25, and the lowest is 100/20=5. As long as we are limiting ourselves to lag values between 4 and 20, we will never guess frequencies outside the interval [5;25]. And in general, the interval of frequencies we consider is [sample_rate / upper_bound ; sample_rate / lower_bound]

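The arithmetic in the reply above, written out as a tiny snippet (variable names are illustrative):

```python
# Lag bounds -> detectable frequency range, using the numbers from the reply above.
sample_rate = 100
lower_bound, upper_bound = 4, 20     # lag (period) search range, in samples

f_max = sample_rate / lower_bound    # shortest period -> highest frequency: 25.0 Hz
f_min = sample_rate / upper_bound    # longest period  -> lowest frequency:   5.0 Hz
print(f"detectable range: [{f_min} Hz, {f_max} Hz]")   # [5.0 Hz, 25.0 Hz]
```
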
  • @atharvasaney3897
    @atharvasaney3897 1 year ago +1

    can you send me a link to your code please?

    • @vforscience3687
      @vforscience3687  1 year ago

      It is available here: github.com/NValsted/VForScienceProjects/tree/master/YIN_pitch_detection

  • @recluse-audio3245
    @recluse-audio3245 2 years ago +1

    How do I give you money for this masterpiece?

    • @vforscience3687
      @vforscience3687  2 years ago +1

      Thanks, that's very generous of you! However, at the time of writing, I do not have any means of accepting donations. For the time being, I would recommend sponsoring some of the great free, open-source tools that many projects like this one depend on. I am sure a contribution to NumPy and the Python science ecosystem, or to Manim and its community, will be well received and will also benefit those using such tools.

  • @user-ys3tm6uw2i
    @user-ys3tm6uw2i 1 month ago

    Anybody understand?

  • @thomashansen1309
    @thomashansen1309 16 days ago

    He sounds Danish :)