Mel Frequency Cepstral Coefficients (MFCC) Explained

  • Published Oct 2, 2024

COMMENTS • 44

  • @datamlistic
    @datamlistic  1 year ago +3

    If you want to find out more about the Fourier Transform and the maths behind it, make sure to check out this video: ua-cam.com/video/7Tk6BAJ3mm8/v-deo.html

  • @quinxx12
    @quinxx12 5 days ago

    Talking about windowing: it sounds like you windowed your own voice recording in this video. The chopping is quite irritating.

  • @florin-andreirusu6424
    @florin-andreirusu6424 1 year ago +6

    Oh wow, this is very well explained. Thank you!

  • @anne.nijenhuis
    @anne.nijenhuis 1 day ago

    A great explanation! Thanks!

    • @datamlistic
      @datamlistic  8 hours ago

      Thanks! Glad it was helpful! :)

  • @iankeck3419
    @iankeck3419 1 year ago +1

    Thanks! Nice explanation. They were used in the early days of speech recognition, when most recognizers were HMM-based.

    • @datamlistic
      @datamlistic  1 year ago

      Thank you! MFCCs are still quite widely used in neural-network-based speech recognition models that do not take raw audio directly as input and need some kind of pre-processing step (DeepSpeech, LAS, etc.). However, the community is shifting towards wav2vec2-like models that take raw audio as input, so yes, MFCCs are becoming less relevant, but they are an interesting case study nevertheless.

  • @ethancooper4154
    @ethancooper4154 11 months ago

    So when using the triangular filterbank, you store just one scalar for each bank rather than a whole array of data?

    • @datamlistic
      @datamlistic  11 months ago +1

      Exactly, that scalar is the weighted sum of energies as given by the filterbank. Please let me know if you have any more questions. :)
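      A minimal sketch of that answer in Python: each triangular filter k collapses a frame's whole power spectrum into one number, the weighted sum of the powers x_h with the filter's weights W_{h,k}. The spectrum and triangle values below are made up purely for illustration, and `filterbank_energy` is a hypothetical helper name, not part of any library.

```python
def filterbank_energy(power_spectrum, weights):
    """Collapse one frame's power spectrum into a single scalar for one filter.

    power_spectrum: powers x_h, one per frequency bin h
    weights: triangular weights W_{h,k} of filter k, one per bin h
    """
    if len(power_spectrum) != len(weights):
        raise ValueError("spectrum and filter must have the same number of bins")
    # Weighted sum: bins outside the triangle have weight 0 and contribute nothing.
    return sum(w * x for w, x in zip(weights, power_spectrum))


# Toy example: 7 frequency bins and one triangle peaking at bin 3.
spectrum = [1.0, 4.0, 2.0, 8.0, 2.0, 0.0, 5.0]
triangle = [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
energy = filterbank_energy(spectrum, triangle)
print(energy)  # 10.0  (0.5*2 + 1*8 + 0.5*2)
```

      Only this one scalar per filter (per frame) is kept, so K filters turn a frame of hundreds of bins into K numbers.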

  • @dexnug
    @dexnug 1 year ago

    Hi, nice presentation. Tbh, applying the filterbank is the hardest part to understand.
    1. After we compute the Fourier transform of each frame/segment of the signal, do we convert the signal to the mel scale and apply the triangular filterbank? And what is the output of the mel filterbank?
    2. By the way, could I have the references you used?
    3. Could I get the code that generates all the plots at 1:59?

    • @datamlistic
      @datamlistic  1 year ago

      Thank you for your feedback! Yeah, the filterbanks part also didn't come easy to me. Maybe I should have provided more details there. Regarding your questions:
      1. Almost correct. We don't convert the spectrum to mel-scale directly, we apply the triangular filterbank to obtain the filterbank energies. We could have simply selected the frequencies that correspond to equal points on the mel scale, but then we would have lost quite a bit of spectral information. You can think of filterbank triangles as taking the average of frequencies in that triangle.
      2. practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/ and wiki.aalto.fi/display/ITSP/Cepstrum+and+MFCC
      3. I don't have that code anymore.
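      A sketch of how such a triangular filterbank can be built, under stated assumptions: it uses the common HTK-style mel formula mel = 2595 * log10(1 + f / 700) (librosa, for example, defaults to a slightly different Slaney variant), places the triangle edges on n_filters + 2 points equally spaced on the mel scale, and maps those points to FFT bins with floor((n_fft + 1) * f / sample_rate), as in the practicalcryptography.com guide cited above. The function names are illustrative only.

```python
import math


def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Return n_filters triangular filters, each a list of n_fft // 2 + 1 weights."""
    if f_max is None:
        f_max = sample_rate / 2.0  # Nyquist frequency
    n_bins = n_fft // 2 + 1
    # n_filters + 2 points equally spaced on the mel scale; filter k uses
    # point k - 1 (left edge), point k (peak) and point k + 1 (right edge).
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    mel_points = [mel_lo + i * step for i in range(n_filters + 2)]
    # Map each mel point back to Hz and then down to an FFT bin index.
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]

    filters = []
    for k in range(1, n_filters + 1):
        left, center, right = bins[k - 1], bins[k], bins[k + 1]
        weights = [0.0] * n_bins
        for h in range(left, center):   # rising edge
            weights[h] = (h - left) / (center - left)
        for h in range(center, right):  # peak (weight 1) and falling edge
            weights[h] = (right - h) / (right - center)
        filters.append(weights)
    return filters


fbank = mel_filterbank(n_filters=10, n_fft=256, sample_rate=8000)
# Number of non-zero bins per filter: the triangles widen at higher frequencies.
print([sum(1 for w in f if w > 0) for f in fbank])
```

      Because the points are equally spaced in mel rather than in Hz, the high-frequency triangles span more bins, so each filter still averages over several frequencies instead of picking a single one.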

  • @nayanvats3424
    @nayanvats3424 1 year ago +1

    Very crisp and precise explanation of MFCC :)

    • @datamlistic
      @datamlistic  1 year ago

      Many thanks! Glad it helped! :)

    • @nayanvats3424
      @nayanvats3424 1 year ago

      @@datamlistic Would you consider exploring features like Single Frequency Filtering and other filterbank approaches?

    • @datamlistic
      @datamlistic  1 year ago

      @@nayanvats3424 Most likely not in the very near future, but they are on my list. Probably at some point I will make a whole series about speech processing as I did for object detection. :)

  • @devdexvils7602
    @devdexvils7602 1 year ago

    Tbh, I don't quite understand the triangular filterbank (03:23). Why don't we keep the highest frequency component? What do W_{h,k}, X_h, and the red dots on the frequency axis mean?

    • @datamlistic
      @datamlistic  1 year ago

      Thank you for your question! Basically, you can see at 02:35 all the possible filterbanks we can apply to the spectrum. Some of them capture the highest frequency component.
      W_{h,k} - the weight given by filterbank k at frequency h in the spectrogram
      x_h - the power of the frequency h
      red dots - each discrete frequency in the signal
      I hope that this helps you in better understanding the video material. Please let me know if you have any other questions.
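      Written out with the notation from that answer, the energy of filter k is the weighted sum below. The triangle shape is a common textbook form; the symbols b_0 < b_1 < ... < b_{K+1} (the FFT-bin indices of the mel-spaced edge points) and K (the number of filters) are assumptions introduced here, not symbols from the video.

```latex
E_k = \sum_{h} W_{h,k}\, x_h, \qquad k = 1, \dots, K

W_{h,k} =
\begin{cases}
\dfrac{h - b_{k-1}}{b_k - b_{k-1}}, & b_{k-1} \le h \le b_k \\[6pt]
\dfrac{b_{k+1} - h}{b_{k+1} - b_k}, & b_k \le h \le b_{k+1} \\[6pt]
0, & \text{otherwise}
\end{cases}
```

      Each red dot is one bin h; the weight rises linearly to 1 at the filter's center bin b_k and falls back to 0 at the neighbouring edge points.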

    • @devdexvils7602
      @devdexvils7602 1 year ago

      @@datamlistic I have learned from many tutorials and made a summary; please correct me if something is missing. By the way, my project is about the mel spectrogram, i.e. the process before the DCT step in MFCC.
      1. Pre-process the audio signal: Depending on the application, the audio signal may need to be pre-processed before computing the Mel spectrogram. This can include steps such as removing silence or background noise, downsampling the signal, or normalizing the signal level.
      2. Compute the STFT of the audio signal: To compute the STFT of the audio signal, the signal is divided into overlapping frames using a window function. The Fourier Transform is then applied to each frame to compute the frequency spectrum of the signal at that point in time.
      3. Convert the frequency spectrum to the Mel scale: The frequency spectrum is typically expressed in Hz, but the Mel scale is based on the way that the human auditory system perceives pitch. To convert the frequency spectrum to the Mel scale, the spectrum is multiplied by a scale factor that maps the frequencies to the Mel scale.
      4. Apply the Mel-weighted filter bank: The Mel-weighted filter bank consists of a series of filters that are spaced equally on the Mel scale. Each filter is designed to pass a specific range of frequencies and the output of each filter is a measure of the energy in the signal at those frequencies. The filters are typically designed using a triangular shape, with the center frequency of each filter being the peak of the triangle and the width of the triangle being determined by the bandwidth of the filter.
      5. Compute the Mel spectrogram: The output of the Mel-weighted filter bank is a set of filterbank energies, which represent the energy of the signal at each frequency band on the Mel scale. These filterbank energies can be plotted over time to create a Mel spectrogram, which is a visual representation of the frequency spectrum of the signal over time.
      6. Post-process the Mel spectrogram: Depending on the application, the Mel spectrogram may need to be post-processed before it can be used. This can include steps such as smoothing the spectrogram, applying a logarithmic scale, or normalizing the spectrogram.
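      Steps 2 and 4 to 6 of that summary (skipping the optional pre-processing) can be sketched in plain Python, applying the filterbank to every frame separately. All parameters are illustrative assumptions: a Hamming window, 200-sample frames with an 80-sample hop (25 ms / 10 ms at 8 kHz), a 256-point transform, 20 filters, the HTK-style mel formula, and a naive O(N^2) DFT instead of an FFT so the sketch needs only the standard library. In this sketch step 3 has no separate code: the mel scale enters only through where the triangular filters are placed.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # HTK-style formula (assumed)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale, 0 Hz to Nyquist."""
    mel_hi = hz_to_mel(sample_rate / 2.0)
    points = [i * mel_hi / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in points]
    filters = []
    for k in range(1, n_filters + 1):
        left, center, right = bins[k - 1], bins[k], bins[k + 1]
        w = [0.0] * (n_fft // 2 + 1)
        for h in range(left, center):
            w[h] = (h - left) / (center - left)
        for h in range(center, right):
            w[h] = (right - h) / (right - center)
        filters.append(w)
    return filters


def frames(signal, frame_len, hop):
    """Step 2a: split the signal into overlapping, Hamming-windowed frames."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield [s * w for s, w in zip(signal[start:start + frame_len], window)]


def power_spectrum(frame, n_fft):
    """Step 2b: |DFT|^2 / n_fft for the non-negative frequency bins (naive DFT)."""
    x = frame + [0.0] * (n_fft - len(frame))  # zero-pad to n_fft samples
    spec = []
    for h in range(n_fft // 2 + 1):
        X = sum(x[n] * cmath.exp(-2j * math.pi * h * n / n_fft) for n in range(n_fft))
        spec.append(abs(X) ** 2 / n_fft)
    return spec


def log_mel_spectrogram(signal, sample_rate, frame_len=200, hop=80, n_fft=256, n_filters=20):
    """Steps 4-6: filterbank energies for every frame, then a log (one row per frame)."""
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    result = []
    for frame in frames(signal, frame_len, hop):
        p = power_spectrum(frame, n_fft)
        energies = [sum(w * x for w, x in zip(filt, p)) for filt in fbank]
        result.append([math.log(e + 1e-10) for e in energies])  # floor avoids log(0)
    return result


# Usage: 0.1 s of a 1 kHz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(800)]
S = log_mel_spectrogram(tone, 8000)
print(len(S), len(S[0]))  # 8 20
```

      In practice you would replace the naive DFT with an FFT (for example numpy.fft.rfft) or use a library routine such as librosa.feature.melspectrogram; the loop structure stays the same.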

    • @datamlistic
      @datamlistic  1 year ago

      @@devdexvils7602 That's pretty much the algorithm used to compute the Mel spectrogram. My only remark is that you should make it a bit clearer that, in step 5, you apply step 4 to each frame produced by step 2.

  • @profdrmea
    @profdrmea 1 year ago +1

    Thank you for your effort👍

    • @datamlistic
      @datamlistic  1 year ago +1

      No problem. I hope you enjoyed the explanation.

  • @cgyh68748
    @cgyh68748 1 month ago

    quick and simple!

  • @saranshduharia6156
    @saranshduharia6156 10 months ago +1

    Nh iutt

  • @billylee7758
    @billylee7758 1 year ago

    What is the difference between MFCC and MFE?

    • @datamlistic
      @datamlistic  1 year ago

      I've never personally used MFE features, but as far as I am aware, you just extract the frequencies and convert them to the Mel scale using the triangular filter weights (the first two steps in MFCC). You don't apply the logarithm and the (inverse) Fourier transform afterwards (the last two steps in MFCC).
      Please let me know if this info was useful.
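      Following that answer, the extra work that turns filterbank (MFE-style) energies into MFCCs can be sketched as below. One assumption to flag: common MFCC implementations, including the practicalcryptography.com guide linked in an earlier reply, use a type-II discrete cosine transform (DCT), a real-valued relative of the inverse Fourier transform, for this last step. The orthonormal scaling, the 13 kept coefficients, and the name `mfcc_from_filterbank` are illustrative choices, not requirements.

```python
import math


def mfcc_from_filterbank(energies, n_coeffs=13):
    """Turn one frame's filterbank energies (MFE-style features) into MFCCs.

    Step 1: take the log of each energy (the floor avoids log(0)).
    Step 2: apply an orthonormal type-II DCT and keep the first n_coeffs.
    """
    log_e = [math.log(max(e, 1e-10)) for e in energies]
    K = len(log_e)
    coeffs = []
    for n in range(n_coeffs):
        scale = math.sqrt(1.0 / K) if n == 0 else math.sqrt(2.0 / K)
        coeffs.append(scale * sum(le * math.cos(math.pi * n * (k + 0.5) / K)
                                  for k, le in enumerate(log_e)))
    return coeffs


# Usage: a flat filterbank output (all energies equal) only affects c_0.
coeffs = mfcc_from_filterbank([math.e] * 26)
print(round(coeffs[0], 4))  # 5.099 (= sqrt(26))
```

      Dropping the log and DCT, i.e. stopping at the energies themselves, gives the MFE-style features described in the answer.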

    • @billylee7758
      @billylee7758 1 year ago

      @@datamlistic Great answer, many thanks!

    • @datamlistic
      @datamlistic  1 year ago

      @@billylee7758 My pleasure! :)

  • @anikaroy8311
    @anikaroy8311 6 months ago

    Amazing!

  • @shanybarhom4395
    @shanybarhom4395 1 year ago +1

    The best explanation I've heard 👏

    • @datamlistic
      @datamlistic  1 year ago +1

      Sweet, thanks! I am happy you found it helpful! :)

  • @fernandovldrs
    @fernandovldrs 1 year ago

    Great video

  • @FreehuntX93
    @FreehuntX93 1 year ago

    too complicated 😞

    • @datamlistic
      @datamlistic  1 year ago

      Which part do you think is too complicated?

    • @FreehuntX93
      @FreehuntX93 1 year ago +1

      @@datamlistic The math behind it. It would be cool to make it understandable for people with little math background. But nvm 😅

    • @datamlistic
      @datamlistic  1 year ago +2

      @@FreehuntX93 Well, to understand MFCCs you need to be familiar with how the Fourier Transform works. You can take a look here to get a better intuition about it: ua-cam.com/video/spUNpyF58BY/v-deo.html. The maths should come easier afterwards.

    • @root55
      @root55 1 year ago

      @@datamlistic This link helped a lot. Thanks!

    • @datamlistic
      @datamlistic  1 year ago

      @@root55 Glad it was helpful!