How to Extract Spectrograms from Audio with Python

  • Published 10 Sep 2024
  • Learn how to extract spectrograms from an audio file with Python and Librosa using the Short-Time Fourier Transform. Learn different types of spectrograms and compare the spectrograms of music in different genres.
    Code:
    github.com/mus...
    Join The Sound Of AI Slack community:
    valeriovelardo...
    Interested in hiring me as a consultant/freelancer?
    valeriovelardo...
    Follow Valerio on Facebook:
    / thesoundofai
    Connect with Valerio on Linkedin:
    / valeriovelardo
    Follow Valerio on Twitter:
    / musikalkemist

COMMENTS • 86

  • @professorbalthazar82 3 years ago +9

    All the videos in this series are very helpful, well made and well explained, thank you so much!

  • @ang3dang2 1 year ago

    You are going to put so many lecturers, top universities included, out of a job!

  • @nmirza2013 1 year ago

    Such a comprehensive yet easy-to-understand series. Hats off!

  • @faramirchevlonski6152 3 years ago +1

    The best explanation on the whole internet! Thanks, man.

  • @rekreator9481 3 years ago +5

    Is there any way to recreate the audio once we have the log-amplitude spectrogram? Of course, we would first convert dB back to power, for which librosa has a function, but what then? How do we invert the "np.abs(S_scale) ** 2" part back to audio?

    • @gvcallen 1 year ago +1

      Once you've taken the magnitude, you unfortunately lose the phase information, so the original signal can no longer be recovered exactly. You would have to have stored the phase somewhere.
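
      To illustrate the reply above, here is a minimal NumPy sketch on a single synthetic frame. The signal and all variable names are made up for illustration and are not from the video. Keeping the phase alongside the magnitude lets the inverse FFT recover the frame, while the magnitude alone does not. Phase can also be estimated iteratively (for example with Griffin-Lim), but that yields an approximation rather than the original signal.

```python
import numpy as np

# Synthetic frame: one sine wave with a phase offset (illustrative only).
sr = 22050
n = 2048
t = np.arange(n) / sr
frame = np.sin(2 * np.pi * 440 * t + 1.0)

spectrum = np.fft.rfft(frame)
magnitude = np.abs(spectrum)
phase = np.angle(spectrum)

# With the stored phase, the complex spectrum can be rebuilt and inverted.
rebuilt = np.fft.irfft(magnitude * np.exp(1j * phase), n=n)

# Without it (phase assumed to be zero), the inverse transform is a different signal.
magnitude_only = np.fft.irfft(magnitude, n=n)

error_with_phase = np.max(np.abs(frame - rebuilt))
error_without_phase = np.max(np.abs(frame - magnitude_only))
print(f"max error with phase:    {error_with_phase:.2e}")
print(f"max error without phase: {error_without_phase:.2e}")
```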

  • @zahraroozbehi734 3 years ago +5

    Thank you so much for your fascinating course.
    At about 7:33, when you explain how to get #frames, which is 342 here, I cannot reproduce it myself with the formula from the last video:
    #frames = ((#samples(of scale array)-FRAME_SIZE)/HOP_SIZE)+1 = ((174943-2048)/512)+1 = 338.68 and not 342.
    Could you please clarify?
    Thank you
    Zahra

  • @ourissueanniversary 1 year ago

    Extremely clear explanation!
    Thanks a lot!

  • @Sumit_Sharma_Music 1 year ago +2

    Thank you for making these amazing videos and putting in so much effort. This series has cleared up my doubts better than any other signal processing or audio processing videos. I have one question. In the plot at 12:48, you compute the spectrogram from the squared amplitude. If I plot the amplitude without squaring it, I am able to see the frequencies. In the previous videos in the series, we computed the Fourier transform from the amplitude (when computing the full Fourier transform).

  • @ryandaputra2056 3 years ago +1

    THANKS, BRO, THIS HELPED WITH MY ASSIGNMENT

  • @saucyyy4508 1 year ago

    This helps so much with my final project idea, thank you!

  • @juanhernandez-up4pg 2 years ago

    Thank you for this video, you're a real hero.

  • @sharonmcgavin1666 1 month ago

    I notice that in the librosa docs they don't square the magnitude but then use amplitude_to_db. Does anyone know if this makes any difference to the final results? Guess I'll have to try to understand everything properly later! Great video!

  • @giavinhvoquang4456 2 years ago

    Thank you so so much for your dedication.

  • @Underscore_1234 3 months ago

    Hi, great video again. Could anyone explain why we square after np.abs(Y)? Leaving it out doesn't change the result much since we use logarithmic scales, but is it actually correct to use it?

  • @riobale 3 years ago +2

    Hi Valerio, in the above spectrograms there is always a strong and constant low frequency component. What does it depend on? Is it relevant or is it just an artefact? Thank you

  • @ronespu 3 years ago

    Why do I get this error when I try to run "Ipd.Audio(scale_file)": ValueError: rate must be specified when data is a numpy array or list of audio samples

  • @thuancollege5594 1 year ago

    very helpful. Thank you very much!

  • @user-co6pu8zv3v 1 year ago

    Thank you :)

  • @habibrekik4009 2 years ago

    Hey Valerio, I tried this on an audio file of my own but I get an error: it says that "figsize" is not defined... Can you please help? :)

  • @enkaibi2756 4 years ago +1

    Why does the spectrum last only a few seconds when the raw audio is a few hours long?

  • @user-nw7rv3oq3p 6 months ago

    Why doesn't the Debussy sound file look like a copy from the center?

  • @lahcenekabour7542 3 years ago

    Thanks for your video, very helpful and well explained!

  • @cloudhuang700 3 years ago

    Quick questions: what is the purpose of doing Y_scale = S_scale ** 2? Why 2 and not a different number? What effect does this power parameter have on the generated spectrogram?

  • @nikkatalnikov 3 years ago +1

    Great video!
    What is the advantage of the power spectrum over the magnitude, i.e. why use np.abs(S_scale) ** 2 rather than np.abs(S_scale)?

  • @Pianistprogrammer 3 years ago +1

    Can you please explain the similarity matrix, with Python if possible?

  • @leonardofreua3084 4 years ago

    Excellent, thanks for this video.

  • @shubhamkapoor5152 1 year ago

    Hi, how do we do this if we have 150 audio files?

  • @sandipandhar1668 4 years ago +2

    Great content

  • @dr.nandkishordhawale3 1 year ago

    Hello! The series is extremely interesting. Thank you for creating the channel and sharing your knowledge through both theory and hands-on work. By the way, I want to report a tiny error. In this video, at around 3:15, the audio played via IPython was not audible. The same problem happened when you played the same audio in one of your previous videos. Thank you.

  • @sohampatil6539 3 years ago

    Thank you thank you thank you!!!!!!!!

  • @girishrane9926 4 years ago

    Thank you sir🥳🙌🙌

  • @Trivimania 2 years ago

    Sir, is it also possible to save such a spectrogram to an image file?
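
    On the question above: one possible approach is Matplotlib's savefig. The sketch below is an assumption-laden illustration, not the code from the video. It uses random data as a stand-in for a log spectrogram and plain imshow instead of librosa.display.specshow, and the file name is made up. The savefig call should work the same way on a figure drawn with specshow.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; enough for writing image files
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for a log spectrogram: rows are frequency bins, columns are frames.
rng = np.random.default_rng(0)
log_spectrogram = rng.normal(size=(64, 100))

fig, ax = plt.subplots(figsize=(10, 4))
image = ax.imshow(log_spectrogram, origin="lower", aspect="auto")
ax.set_xlabel("Frame")
ax.set_ylabel("Frequency bin")
fig.colorbar(image, ax=ax, format="%+2.0f")

# Write the figure to a PNG file (hypothetical file name in a temporary folder).
output_path = os.path.join(tempfile.mkdtemp(), "spectrogram.png")
fig.savefig(output_path, dpi=150, bbox_inches="tight")
plt.close(fig)
```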

  • @MsBalajiv 3 years ago +1

    Hi Valerio,
    Thanks for this video. Very useful.
    At 7:30 of this video, displaying the shape of stft output matrix, I feel that the #frames is calculated as (samples/hopsize)+1 in my example code. I understand the equation in your previous video but the librosa output is slightly different.
    In my example, samples=220100, Frame_size = 512, Hop_size = 160
    Output STFT matrix has second dimension: 1379
    Can you please clarify.

    • @sac6496 3 years ago +3

      The librosa source code shows that, if left unspecified, center is enabled and pad_mode is 'reflect':
      if center:
          y = np.pad(y, int(n_fft // 2), mode=pad_mode)
      So the n_frames formula Valerio gave and librosa's result agree once the padding is counted.
      Your samples + padding = 220100 + (256*2) = 220612, so n_frames = (samples - frame_size)/hop_size + 1 = (220612-512)/160 + 1 = 1376 (with the division rounded down). I think your **second dimension: 1379** may be a typo. My computation gives (257, 1376).
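
      The arithmetic in this thread can be checked with a few lines of plain Python. This is only a sketch: the function names are hypothetical, and it assumes the centered padding described in the reply above (frame_size // 2 samples added on each side) and integer division for incomplete frames.

```python
def frames_without_padding(n_samples, frame_size, hop_size):
    """Frame count when only complete frames inside the signal are kept."""
    return (n_samples - frame_size) // hop_size + 1


def frames_with_center_padding(n_samples, frame_size, hop_size):
    """Frame count when frame_size // 2 samples of padding are added on each side."""
    padded = n_samples + 2 * (frame_size // 2)
    return (padded - frame_size) // hop_size + 1


# Numbers quoted in the comments above.
print(frames_without_padding(174943, 2048, 512))      # 338
print(frames_with_center_padding(174943, 2048, 512))  # 342
print(frames_with_center_padding(220100, 512, 160))   # 1376
```

      With padding counted, the first example gives the 342 frames seen in the video, and the unpadded formula gives 338 (the 338.68 quoted earlier before rounding down).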

  • @seathru1232 2 years ago

    Dear Valerio, how can I convert from magnitude spectrum (therefore, with ^2 coefficient) back to audio? The istft requires complex numbers, but we lose track of them with the magnitude.

    • @codeparity 2 years ago

      Just save the variable before taking the magnitude.

  • @roaaeb1894 3 years ago

    Hi Valerio, I want to extract features from a set of videos stored in a folder; there are 200 audio files. How can I load the audio files and apply the feature extraction to all of them at once? How should the code that handles all the audio files be structured?

    • @debabratagogoi9038 2 years ago

      I am also looking for the same thing: extracting spectrograms from a large number of audio files. Did you find a solution that I can follow?
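
      One way to approach the batch question in this thread is to collect the file paths first and then loop over them. The sketch below is hypothetical: the function names, the folder layout, the extension list, and the frame and hop sizes are assumptions, and the librosa calls (load, stft, power_to_db) are used in the same way as the power spectrogram discussed under this video.

```python
from pathlib import Path


def list_audio_files(folder, extensions=(".wav", ".mp3")):
    """Return all audio files under `folder` (including subfolders), sorted by path."""
    folder = Path(folder)
    return sorted(
        path
        for path in folder.rglob("*")
        if path.is_file() and path.suffix.lower() in extensions
    )


def extract_log_spectrograms(folder, n_fft=2048, hop_length=512):
    """Compute a log power spectrogram for every audio file in `folder`."""
    # Imported here so that listing files works even without librosa installed.
    import librosa
    import numpy as np

    spectrograms = {}
    for path in list_audio_files(folder):
        signal, sr = librosa.load(str(path))
        stft = librosa.stft(signal, n_fft=n_fft, hop_length=hop_length)
        spectrograms[path.name] = librosa.power_to_db(np.abs(stft) ** 2)
    return spectrograms
```

      The files are processed one after another rather than literally at the same time; for a few hundred files that is usually sufficient.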

  • @ont7126 2 years ago

    Hi, I got a bit confused. In the DL series we compute the spectrogram with spectrogram = np.abs(stft) and log_spectrogram = librosa.amplitude_to_db(spectrogram). Is there any difference from the way we compute it in this video (spectrogram=np.abs(stft)**2 and Y_log_scale = librosa.power_to_db(Y_scale) )?

    • @ValerioVelardoTheSoundofAI 2 years ago

      In the first case, we have an amplitude spectrogram. In the second, a power spectrogram.

    • @ont7126 2 years ago

      @ValerioVelardoTheSoundofAI Thank you! But I still can't understand when to use the first and when to use the second, since you call both of them spectrograms.

    • @ValerioVelardoTheSoundofAI 2 years ago

      @@ont7126 it depends on the task. Sometimes, amplitude spectrograms work better than power spectrograms. Sometimes, it's the reverse. Unfortunately, there isn't a "rule". You'll have to try both of them and see which representation works best for your problem.

    • @ont7126 2 years ago

      @ValerioVelardoTheSoundofAI OK, got it! Since I am new to speech recognition and still practicing on datasets, would it be better to use scipy.signal.spectrogram() instead?
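
      A note on the amplitude vs. power question raised in this thread and in several comments above: on a linear scale the two spectrograms differ, but once converted to decibels with matching references they coincide, because 10·log10(A²) = 20·log10(A). The sketch below shows this in plain Python. The helper names are hypothetical and are not librosa functions; as far as I know, librosa documents amplitude_to_db as equivalent to power_to_db applied to the squared input with a squared reference, which is the same identity.

```python
import math


def amplitude_to_decibels(amplitude, ref=1.0):
    """dB value of an amplitude relative to a reference amplitude."""
    return 20 * math.log10(amplitude / ref)


def power_to_decibels(power, ref=1.0):
    """dB value of a power relative to a reference power."""
    return 10 * math.log10(power / ref)


# Example magnitudes (made up, not taken from the video).
for amplitude in [0.001, 0.05, 0.5, 1.0, 3.0]:
    from_amplitude = amplitude_to_decibels(amplitude)
    from_power = power_to_decibels(amplitude ** 2)
    print(f"{amplitude:>6}: {from_amplitude:8.3f} dB vs {from_power:8.3f} dB")
```

      Note that the identity only holds when the power reference is the square of the amplitude reference; a different reference or clipping threshold shifts or truncates the values.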

  • @desryan1603 3 years ago

    Was there a reason why you changed from using a continuous colour scale in the first (non-log) plot to a diverging scale for the log plot?

    • @desryan1603 3 years ago

      I suspect it demonstrates harmonics better than the continuous scale does.

  • @seeking9145 1 year ago

    Which Debussy piece was it?

  • @0Freguenedy0 3 years ago

    Are we only able to work with .wav files in librosa? I've been using only .wav so far.
    If needed, I'll convert MP3s to .wav using pydub. I'll try that next week.

  • @nezardasan5015 4 years ago

    Very, very useful, thank you.

  • @blaze-pn6fk 4 years ago +1

    Can you make videos on voice cloning?

    • @ValerioVelardoTheSoundofAI 4 years ago +2

      Thank you for the suggestion! I haven't planned to cover this topic soon, but I'll put it in my backlog as it's quite interesting.

  • @subhashachutha7413 4 years ago +1

    Please do a video on MFCCs too. So far there is no good resource on them. 🙏

    • @ValerioVelardoTheSoundofAI 4 years ago +1

      I've planned to cover MFCCs (theory + code) over the next few weeks. Stay tuned!

    • @subhashachutha7413 4 years ago +1

      @ValerioVelardoTheSoundofAI Thank you, waiting for the video.

  • @basmam8542 2 years ago

    nice hair style :)

  • @achalcharantimath5603 4 years ago

    Hi Valerio, I have a few questions. If we have 100 genres with 100 training examples each, how should we compute the spectrograms and store them? (Or is there a way to generate the image data and feed it in at run time?) Each spectrogram will have different dimensions, so how do we get a uniform input for training the network? Would using rectangular windows of the spectrogram be better for training? Can you also suggest some links for reading more about audio augmentation?
    Is Y_scale, which is a 2D array, enough for training, or do we need an RGB version of it? Which is more efficient?
    Please have a look at this Kaggle competition, which might interest you. For bird audio recognition, should the input be a spectrogram or a mel spectrogram?
    www.kaggle.com/c/birdsong-recognition

    • @ValerioVelardoTheSoundofAI 4 years ago

      Achal, you've asked a lot of good questions! I have a couple of videos in "DL for Audio with Python" that cover a use case similar to yours, i.e., 100 samples in 10 musical genres. You can refer to those videos.
      It's important that you always have the same input shape. To get that, you can segment the songs so that every segment has the same number of samples. Usually, for music genre classification, 15-30 seconds of audio should be good.
      If by "rectangular windows of the spectrogram" you mean applying a Mel filter bank, you're on the right path. Mel spectrograms, or even better CQT, are valuable approaches when dealing with music data.
      For training, you'll need the equivalent of a greyscale image if you decide to go with CNN architectures -- which I suggest you do. In other words, you'll have to add a 3rd dimension. Once again, you can refer back to my videos in DL for Audio to see how to do that.

    • @achalcharantimath5603 4 years ago

      @ValerioVelardoTheSoundofAI
      Hi, I have been watching the series (DL for Audio with Python) and referring to it a lot. Thank you for this channel.
      So for a CNN we have to set the third dimension to 1, right? Is that what you meant, e.g. (120,120,1), adding the extra dimension? By windowing I meant: if there is a 1-minute bird call, take the first 5 seconds as one input, the next 5 seconds as another, and so on. Is there any way to do that with the spectrogram?

    • @ValerioVelardoTheSoundofAI 4 years ago

      @achalcharantimath5603 (120, 120, 1) works. You can use the whole 1 minute of audio, or segment it into, say, 15-second chunks.
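
      The two ideas in this thread, equal-length segments and an added third dimension, can be sketched with NumPy. This is an illustration under assumptions, not the code from the "DL for Audio" videos: the helper name, the 22050 Hz sample rate, and the zero-filled stand-in arrays are all made up.

```python
import numpy as np


def split_into_segments(signal, segment_length):
    """Cut a 1-D signal into equal-length segments, dropping the incomplete tail."""
    n_segments = len(signal) // segment_length
    trimmed = signal[: n_segments * segment_length]
    return trimmed.reshape(n_segments, segment_length)


sr = 22050                                       # assumed sample rate
signal = np.zeros(60 * sr)                       # stand-in for one minute of audio
segments = split_into_segments(signal, 15 * sr)  # four 15-second chunks
print(segments.shape)                            # (4, 330750)

# A 2-D spectrogram gets a trailing channel axis, like a greyscale image.
spectrogram = np.zeros((120, 120))               # stand-in for a spectrogram
cnn_input = spectrogram[..., np.newaxis]
print(cnn_input.shape)                           # (120, 120, 1)
```

      Because every segment has the same number of samples, spectrograms computed from them with the same frame and hop sizes also share one shape.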

  • @nowrozimohammad 3 years ago

    How can I get your code?

  • @schibob1 3 years ago

    Thank you so much for the great content!
    I have followed every video up to now and learned so much.
    This is the first time I have a problem that I can't solve:
    In Sublime Text, when I use the plot_spectogram function, no spectrogram window pops up as it usually does. If I put a print() in front of it (I don't know if that is the right way to do it), the output shows "None". Apart from that, no errors occur. Does anyone know how to visualize the spectrogram in Sublime Text?
    I hope someone knows the solution. Thanks in advance :)

    • @idontevenknow3707 2 years ago

      Getting the same issue. Did you end up solving it? Thanks

    • @idontevenknow3707 2 years ago

      Ah, found it. Add plt.show() at the end of the function!

  • @iioiggtrt9085 4 years ago

    How do I save this to a CSV file after processing?

    • @ValerioVelardoTheSoundofAI 4 years ago

      You could use Pandas for that. It has a super convenient to_csv function.

    • @iioiggtrt9085 4 years ago

      @ValerioVelardoTheSoundofAI Could you make a video on all of this? As you know, there are no resources on the internet, so it would be a reference. Thanks a lot.

    • @iioiggtrt9085 4 years ago +1

      Building a dataset from audio from scratch and saving it as a CSV file would be great. I appreciate the effort.

    • @ValerioVelardoTheSoundofAI 4 years ago

      @@iioiggtrt9085 I'll put this in my backlog! Thank you for the suggestion :)
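
      As a rough illustration of the Pandas suggestion in this thread: a 2-D spectrogram array can be wrapped in a DataFrame and written with to_csv. The array below is random stand-in data and the file name is made up. Storing one row per frequency bin is an assumption, and other layouts (for example one row per frame) are equally possible.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for a log spectrogram: rows are frequency bins, columns are frames.
rng = np.random.default_rng(0)
log_spectrogram = rng.normal(size=(5, 8))

# Write the array to CSV (hypothetical file name in a temporary folder).
csv_path = os.path.join(tempfile.mkdtemp(), "spectrogram.csv")
pd.DataFrame(log_spectrogram).to_csv(csv_path, index=False)

# Read it back to check that the values survive the round trip.
restored = pd.read_csv(csv_path).to_numpy()
print(restored.shape)
```

      CSV is convenient for inspection, but for large spectrograms a binary format such as NumPy's .npy is usually more compact.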

  • @drjfilix3192 3 years ago

    Hi, Valerio, do you speak Italian? ;-)

    • @ValerioVelardoTheSoundofAI 3 years ago

      Yes, I'm Italian

    • @drjfilix3192 3 years ago

      @ValerioVelardoTheSoundofAI Perfect! I'd like to ask you a million things! :-D In the meantime I'll start watching your videos, which interest me a great deal!

  • @foxyonfire 8 months ago

    0:16 Debussy

  • @noahdrisort2005 2 years ago

    You looked so young with the new hair :))