I now have a full series called "Audio Signal Processing for Machine Learning", which develops the concepts introduced here in greater detail. You can check it out at
ua-cam.com/video/iCwMQJnKk2c/v-deo.html
Sir, how do we interpret the MFCCs, i.e., which coefficients should we keep and which should we leave out?
For anyone taking this course in 2022: the function "waveplot" was renamed to "waveshow".
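A minimal sketch of the renamed call, assuming librosa 0.10+ (where waveplot was removed) and using the example file from the video as a placeholder path:

import librosa
import librosa.display
import matplotlib.pyplot as plt

signal, sr = librosa.load("blues.00000.wav", sr=22050)  # placeholder path, loaded at 22.05 kHz

plt.figure(figsize=(10, 4))
librosa.display.waveshow(signal, sr=sr)  # replaces the old librosa.display.waveplot
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title("Waveform")
plt.show()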
1:47 that song hits HARD
hahaha :D
Is there any sound in it, or is it just a wav file with no audio?
I'm so glad I found this series. Great quality content (Y)
Thank you!
this is so well-explained, helps me entirely for the project i'm working on! i can never thank you enough for making all these videos, you deserve the best!
Thanks!
Great video! I like how you stepped through everything and the code in the video works
Thanks Erik!
The series is awesome. And 😭at 1:50 . Love you bro!
Finally found exactly what I was looking for. Great explanations! ❤
Thanks!
This channel is a Gem!! Thank you for putting out these tutorials. Keep going!
The best instructional video I've ever seen, even better than college ❤❤❤❤
awesome series, deserves more recognition !!!
I'm writing my thesis thanks to you, I owe you a dinner! Amazing job
Good luck ;)
Thank you for the wonderful work. If you can make a series of audio signal processing, that would be great. Have a nice day!
Thank you for the feedback!
i second this!!
Yeah, it would be very helpful if you made a video series explaining these different DSP methods.
You are amazing! A real music and deep learning wizard!
Thank you Fabio :)
Hello, what Python version do you use in this tutorial?
Extremely helpful! You are the best
Thanks!
If anybody has an issue with the librosa.feature.mfcc() line (mfcc() takes 0 positional arguments but 1 positional argument was given...),
make sure you add "y=" before signal, that is:
MFCCs = librosa.feature.mfcc(y=signal, n_fft=n_fft, hop_length=hop_length, n_mfcc=13)
Hope this helps
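For context, a minimal end-to-end sketch of the keyword-argument call, assuming the frame parameters used in the video (n_fft=2048, hop_length=512) and a placeholder file path:

import librosa

signal, sr = librosa.load("blues.00000.wav", sr=22050)  # placeholder path

# Newer librosa versions require keyword arguments such as y= and sr=.
MFCCs = librosa.feature.mfcc(y=signal, sr=sr, n_fft=2048, hop_length=512, n_mfcc=13)
print(MFCCs.shape)  # (13, number_of_frames)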
Thanks for the help! Was a little confused when code wasn't working!
thanks a lot bud
thanks dude!
This is helping me in my capstone masters project. Thank you so much.
Same here, I was searching for these concepts when this channel showed up.
So glad I found this! You do an amazing job!
Thank you Aend!
Incredible channel! Please keep going!!!
Love you so much sir....No words...
Thank you so much... I'm so happy you made this video. You make my work easier.
Glad I've been useful! Have you seen my new series Audio Signal Processing for ML? It goes very deep into these and more topics in audio processing.
20:16 With this graph, how would you display the number of seconds on the x-axis and the range of frequencies on the y-axis?
The series so far was very well explained and paced, but personally I would've wanted a little more detailed explanation of MFCCs, as they're the most important thing we are going to use in the NN, right? If there are some resources you can recommend, it'd be really appreciated!
Thank you for the feedback! I get your point. But I made the choice not to get into the algorithmic/mathematical details of MFCCs because it's a quite complicated topic that would probably derail too much from the focus on deep learning. As I mentioned in the videos, if I see enough interest I may create a whole series on audio DSP. There, I'll definitely go into the nitty gritty of MFCCs and the Fourier transform. On this point, would you be interested in a series on audio digital signal processing?
As for this course, I don't think using MFCCs as a black box is going to be detrimental for DL applications.
As for extra resources on MFCCs, I suggest you take a look at this article: practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/ It's a friendly intro into the concept. Hope this helps :)
@@ValerioVelardoTheSoundofAI I would definitely be interested in a series on audio DSP; although, as a DL enthusiast, I would love it if the topics covered in that series circled back to their significance in DL somehow.
Also thank you so much for the article!
@@mohammadareebsiddiqui5739 that could be interesting...
This is so good. What do you suggest for implementing these things?
I'm very excited to read that book and learn about sound in deep learning.
Thank you so much.
Please make a tutorial on Anomaly detection on raw sound
Amazing clear content. Thanks a lot !
Thank you!
Just amazing content, you are a life saver.
Your wonderful videos are helping me in my PhD on Indian Vocal Music. But alas, no videos on Indian Classical Vocals.
That may come at some point in the future. Stay tuned!
This is what i've been looking for.
Thanks Roner!
@@ValerioVelardoTheSoundofAI Is it really hard to manage music data to make animations or reactions with it? I'm really, really new to this whole music spectrum and ML thing.
@@9b177-becc5 not really. As long as you have audio parameters (e.g., loudness, chroma, beat), you can map them to different elements of an animation.
Having an issue with librosa missing the _soundfile_data module when I try to load the song.
We couldn't hear the song, but it is a super cool video.
Amazing tutorial, thank you
Can you update the code for current versions of matplotlib and librosa?
I think I am finishing my master's degree with you 😂, thank you for your amazing job!
Bachelor's degree here, but same. I guess I'm low on time now that I see the complexity though :DD When is your due date and how far along are you? :DDD
I'm happy the videos can help :)
There's no hurry. I'm trying to build a music recommender system, and music classification is the first step. I tried several models and tried to optimize them.
@@evicluk Do you use tensorflow or pytorch?
@@rekreator9481 tensorflow
Hello, thank you very much for this tutorial. What if I have problems with numpy? My IDE tells me there is a mistake.
Hello, do you know how to classify audio features extracted with MFCCs using an SVM?
Great series! I have learned much more from this than from other courses that have cost me a lot of money. I have one question, if you could help: what is the number of features you are extracting to use in the NNs in this series? It wasn't very clear in the videos.
What version of Python are you using?
While plotting the power spectrum, we take only half of the data (the left half) after doing the FFT. Then how come we don't do the same while plotting data based on the STFT? 😮
Valerio, could you please give some code for removing silence from the whole audio file? Please guide.
Excellent course
Hi Valerio, I can't display the waveform image. Can you help me?
8:30 The lengths of 'signal', 'magnitude' and 'frequency' are the same.
Why is the frequency increasing from the beginning towards the end? It should increase and decrease at different times, not only increase over time. What am I missing?
9:29 By lower frequency I understand the left-most area of the graph. But we see the same height of energy/magnitude towards the end (right side) of the graph, yet you say that "the higher we go with frequency the less contribution they will give us". I don't understand the graph, it seems.
Thanks for these tutorials! I want to ask a question: why do you not use a notebook? I'm using a notebook in VS Code (".ipynb" file extension), it's just practical, no? Good luck.
Shouldn't we multiply the magnitude by 2 when narrowing the power spectrum plot to the Nyquist frequency?
How do I use that blues.00000.wav file to run the code? Many errors are coming up.
WOW! This lesson is really good. Are we done with the audio preprocessing with that? I want to build a speaker recognition system and need to learn how to build a model, and one of the steps is preprocessing the audio, so this video is very helpful if the preprocessing part is done. What do we do after this? And what if we want to represent the result numerically, not visually? Thank you.
Thanks man! Wonderful videos
Thank you!
Thank you for spreading the knowledge. I have a question though: if I want to make a source separation kind of application, should I use mel-scale spectrograms, or should I opt for other time-series representations like Gramian matrices and Markov transitions?
Thank you for this wonderful video lecture. I am working on lung sound analysis. Would you also show us how to implement wavelet analysis, particularly the discrete wavelet transform, like you did for the FFT, STFT, and MFCCs?
Glad you liked it Biruk! I'm planning to start a whole series on audio/music processing over the next few weeks. Stay tuned :)
Hi Valerio, thank you so much for your amazing videos. I am doing emergency vehicle siren detection with deep learning. I divided my data into emergency and non-emergency and used a band-pass filter to remove the noise. Now I have a doubt: should I apply this filter to just the emergency audio files or to all the data (emergency and non-emergency)? I would be grateful if you could guide me on this.
You should apply the same preprocessing on all the data you train on.
@@ValerioVelardoTheSoundofAI thank you so much
Nice work broo! I have a question tho. Is it alright to have negative MFCCs? Btw I am using RAVDESS dataset.
It's totally fine to get negative MFCCs. Stay tuned for my coming videos in the "Audio Processing for ML" series on MFCCs to learn more ;)
What hop_length value should be used for voice recognition?
Nice work! Keep it up.
Thank you!
Valerio, first of all, congratulations on your excellent job! I am learning so much from you!
Secondly, can you explain how to load mp3 files with librosa? From what I read in the documentation, installing ffmpeg should solve it, but it did not.
Thank you!
Thank you! Please refer to this thread: github.com/librosa/librosa/issues/945
Great video, thanks
really great series !!
Again! This is just awesome!
And thank you again :)
Hi sir, thanks for this video.
I just want to know how we can play the audio in Python and listen to it from this form: (signal, sr = l.load(file)).
Great series man, thank you. Can we differentiate human voices by using mel spectrograms? If yes, can you please tell me how? Your reply would be helpful.
Yes, you can use MFCCs for speaker identification. The process is similar to the one I've used for genre recognition in the following videos. Check those out!
Thank you so much for your videos. I have a question regarding the processing of audio. If I want to classify a bell that rings for less than a second and then stops for some time, do I have to collect the audio of the individual rings and cut out the silences, or can I use a longer audio clip of the bell ringing and stopping?
You can use the long sample. Hopefully the algorithm will figure that out!
Hello Valerio,
Your videos are very helpful for learning about audio signal processing in AI. I am learning about AI and the theory you have explained is easy to grasp. Thank you for such great lessons.
I have a doubt: as input you have been using a .wav file, which is uncompressed, so the file size will be large. Can you tell me what method can be used to process the audio file with the best quality and without losing information?
Thank you! There isn't an ideal solution to compress audio files and not lose information. WAV (lossless) is the best. Many AI music applications won't be affected negatively if you use MP3s instead.
Can anyone send the link for the music dataset of popular / hit songs please
Great content. By the way, how can I download the wav file?
It seems it can't be downloaded from the GitHub link you published. Is there another place to download it from?
Thanks! I think I used a piece classified as blues from the GTZAN dataset. You can search for the file with the same name in the dataset. I provided the link to download GTZAN in a previous video in the series.
OK bro, but what do I need to pass as input to train my neural network?
Thank you for posting this wonderful video.
I'm working on a toy project where I search for music by humming. Is it right to use a Mel spectrogram? I don't know if CQT would be more appropriate. I would appreciate your reply.
You can definitely give Mel spectrograms a try. Try to focus on intervals instead of absolute pitches, since people without absolute pitch (i.e., the overwhelming majority) can hum the intervals which make up a melody, but not necessarily in the right key. Focus only on monophonic music (i.e., a vocal melody). Generalising beyond that is a much harder problem. Hope this helps!
Amazing series.
A question: frequency and magnitude are numpy arrays of size > 661,000 each.
But while plotting, the x-axis (denoting frequency) scales itself to the sample rate, which is 22050. Why so? I'm talking about the spectrum plot here.
You help me as if God sent you to help me... I have a submission in 3 days covering everything up to preprocessing 😅 Thank you so much... Please make a video on how to build an accurate model for audio signals ❤️
This was a brilliant video. I have a query which I would like to shoot; I don't know if it's answered in the next set of videos.
Does it matter if the time span of each clip is different in the dataset?
Do the same principles applied here apply to any audio, e.g., animal sounds, scream detection?
How do you deal with noise?
1- If you're using a CNN architecture, you need all data samples to have the same duration. To obtain this, you should cut clips of different durations into equal-length segments (e.g., 10-second clips); see the sketch below.
2- Yes, you can transfer the same approach used here to other audio domains.
3- If you're using a DL approach, the network should be able to learn to deal with noise automatically.
If you'd like to learn more about these topics, I suggest you check out my series "Audio Signal Processing for ML".
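A rough sketch of that segmentation step (item 1), assuming a placeholder file path and the 10-second duration mentioned above:

import librosa

signal, sr = librosa.load("long_recording.wav", sr=22050)  # placeholder path

segment_duration = 10  # seconds, the example value from the reply above
samples_per_segment = segment_duration * sr

# Keep only full segments; a real pipeline might zero-pad the last chunk instead.
num_segments = len(signal) // samples_per_segment
segments = [signal[i * samples_per_segment:(i + 1) * samples_per_segment] for i in range(num_segments)]
print(f"{num_segments} segments of {samples_per_segment} samples each")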
@@ValerioVelardoTheSoundofAI Thanks Valerio. I'm watching the signal processing series too..! Another query I have.. There is another library called kapre. That one seems like it's built upon Keras. How do you think it compares with librosa? Kapre seemed very easy with just additions of layers to the model. I'm not sure if it can do everything that librosa can.
@@midhunsatheesan5717 Kapre is great if you want to extract spectrograms computing FT on GPU. However, it can't do many things that librosa can. So if you plan to use basic audio features used in DL go with Kapre. Otherwise, go with librosa :)
Can we use a full 10 minute wav file as an example or do we need to cut the file into pieces in preprocessing?
It depends on the application. 10' is probably too long. I would suggest segmenting the files.
@@ValerioVelardoTheSoundofAI thanks
Thanks 👍👍👍
Sir, I'm making a project on an attendance system using voice. Which Python modules should I use? Which algorithms should I use?
Hello Valerio, have you ever extracted ivector from audio clips? I am trying to find documentation on it but am struggling. Your advice would be greatly appreciated
Hi Valerio, if we have training data in mp3 format, is it important to convert the mp3 files to wav files for training? Will it improve performance?
Don't worry about mp3 files. With Librosa you can directly load them, without the need to convert them to wav files first.
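A minimal sketch of loading an mp3 directly (decoding relies on soundfile/audioread or ffmpeg, depending on your installation); "song.mp3" is a placeholder path:

import librosa

signal, sr = librosa.load("song.mp3", sr=22050)  # mp3 is decoded on load, no wav conversion needed
print(signal.shape, sr)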
How can I fix this error: "UserWarning: PySoundFile failed. Trying audioread instead.
warnings.warn("PySoundFile failed. Trying audioread instead.")"?
Hi Valerio, I've been looking for resources on how to deal with deep learning and audio for some time without too many results, so I'm really grateful to you for sharing these videos! I would like to ask you if it is possible and how to recover the original signal from the spectrogram. I tried to use the inverse functions like librosa.db_to_amplitude and librosa.core.istft, but the output signal seems very bad. I think this happens because we truncate complex numbers for the construction of the spectrogram. Can you suggest me the right way?
You're absolutely right! The issue with istft is that we ignore the phase. The audio result is somewhat problematic. Reconstructing audio from a power spectrogram is a major problem still actively researched. There isn't a simple solution I'm afraid :(
@@ValerioVelardoTheSoundofAI Yeah, I found the same answer in a research paper I was reading just now. Do you think a well-trained LSTM autoencoder could approximate a better result? I mean, if we use these corrupted istft outputs as input and the original waveforms as output, could we obtain a neural net that can reconstruct a better waveform? Or do you think it's only a waste of time? Thanks in advance for your attention!
@@massimomontanaro mmh... this is a highly dimensional problem. You'll need a MASSIVE dataset to try to get something decent. It may be worth an experiment, but I wouldn't be super confident.
I have downloaded that audio file, but it is still showing this error:
FileNotFoundError: [Errno 2] No such file or directory: 'blues.00000.wav'
Sir, a solution please?
Hello Valerio, I have 3 folders (go, yes, no) that together contain 30 .wav files; each folder has 10 wav files. How can I run this code over the 30 different wav files?
well made tutorial
Now I know why Fourier transforms were added to my degree syllabus.
Thanks for the video.
One question: Should we always use SR = 22050?
It depends on the problem. Most of the time sr = 16K is OK for sound/music classification problems.
@@ValerioVelardoTheSoundofAI Thanks for the reply. Does having a higher value increase the model accuracy?
@@AshwaniKumar04 not necessarily. If most of the patterns for classification are in the lower frequencies having a high sr can actually be counterproductive.
Thanks man, you are a hero.
Thanks!
I'm working on an audio project and your videos help me a lot.
Hello Valerio. Greetings from Colombia. I've been watching some of your videos on MFCCs but, as you'll understand, my English is a humble almost-B1 and I've been turning on subtitles for your videos; however, that wasn't the case here :C because the option doesn't appear. I would love for this video to have the subtitle option; I would be very grateful. I'd also like to know what comes after obtaining the MFCCs: what should be implemented in Python so that it finally makes the decision to classify a sound as X or Y? I'm very grateful for your help.
Hi, I'm interested in what you're saying, Jessica!
Great videos man! Is there a way to make a database from audio file metadata? Like labeling each file with BPM, key, etc., but automatically; building the database from scratch is going to take more time than the coding itself lol
Thanks! There are algorithms for extracting Key, BPM automatically. You'll then need to implement a DB and populate it with the metadata. The algorithms aren't perfect. They are also genre-dependent.
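As a rough sketch, one such off-the-shelf estimator is librosa's beat tracker; "track.wav" is a placeholder path, and the tempo estimate is genre-dependent as noted above:

import librosa

y, sr = librosa.load("track.wav", sr=22050)  # placeholder path
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)  # global tempo estimate plus beat positions
print("Estimated tempo (BPM):", tempo, "with", len(beat_frames), "beats detected")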
@@ValerioVelardoTheSoundofAI I want to automatically sort my samples, and obviously experiment with NNs and Python. What approach would you recommend?
Thanks for the great content! :)
Hello sir from where do I get the audio file that you have used here? Would you please provide me the link?
If I remember correctly, it comes from the Marsyas genre dataset (marsyas.info/downloads/datasets.html). I may have mentioned this in a previous video.
@@ValerioVelardoTheSoundofAI thank you :)
Thanks you dude!
You're welcome!
How long should the wav files used in preprocessing be?
That really depends on the problem you're working on and your dataset. Let me give you a couple of examples. In music processing, we usually use 15'' or 30'' of a song to analyse it. In keyword spotting systems, you would often have 1-second long clips.
@@ValerioVelardoTheSoundofAI I'm working on ML for voice recognition using a dataset that contains conversations in the form of wav files. Is there any suggestion you can give for a good .wav duration for my problem?
Thank you for the advice
There's no sound when you play that blues file.
Yeah... I had to remove it for copyright reasons :(
Hi there, I'm interested to know how I can clean my audio dataset (Google Speech Commands) if it contains faulty audio. For example, I should hear the word "three", but there is too much noise, or the word is cut off in the middle of pronunciation so it just says "thh..".
Any idea how to get rid of those audio files and clean my dataset without doing it manually?
Could you make a video on the inverse functions? signal->stft->istft->signal works fine, but signal->stft->amplitude_to_db->db_to_amplitude->istft->signal results in a distorted signal. Same with inverse.mfcc_to_audio.
This is a somewhat more advanced topic in DSP. I'm thinking of creating a series on audio DSP / music processing. I'll definitely cover the inverse functions in that series. Before engaging in the implementation, I'd like to dig deeper in the math behind FT/MFCC. You're totally right re the reconstruction of the signal from MFCCs. It's a long shot, and the result isn't that great.
@@ValerioVelardoTheSoundofAI I kinda get why we are losing information if we convert our spectrogram to a mel-spectrogram, but why are we already losing information when using amplitude_to_db on the stft? Isn't it "just" a log-function?
@@Gileadean Excellent question! I'm glad you've been playing around with these interesting DSP concepts :) Now, on to the answer. The STFT outputs a matrix of complex numbers. To arrive at the spectrogram, we calculate the absolute value of each complex number. This process removes the imaginary part of the complex values, which carries information about the phase of the signal. At this point you've already lost information! When you try to reconstruct the signal, the inverse STFT can't rely on phase information anymore. Hence, the somewhat distorted sound. As you correctly hinted at in your question, the conversion back and forth from amplitude to dB doesn't lose any additional vital info. I hope this helps!
@@ValerioVelardoTheSoundofAI Thanks for your quick replies! I somehow missed the np.abs(stft) and the warning message that occurs when calling amplitude_to_db on a complex input (phases will be discarded)
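A small sketch of the phase-loss point made above, assuming the usual 2048/512 frame parameters and a placeholder file path: reconstructing from the full complex STFT is near-perfect, while reconstructing from the magnitude alone (phase implicitly zero) sounds distorted.

import librosa
import numpy as np

signal, sr = librosa.load("blues.00000.wav", sr=22050)  # placeholder path

stft = librosa.stft(signal, n_fft=2048, hop_length=512)  # complex-valued: magnitude and phase
magnitude = np.abs(stft)                                 # the phase is discarded here

with_phase = librosa.istft(stft, hop_length=512)                          # near-perfect reconstruction
without_phase = librosa.istft(magnitude.astype(complex), hop_length=512)  # zero phase: distorted audio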
I think that the magnitude of the frequency in the FFT is the modulus, which is calculated by np.absolute(), not np.abs().
np.absolute() and np.abs() are completely identical. You can use either one.
@@ValerioVelardoTheSoundofAI Yeah, I see. Thanks.
Hi there,
I want to do music semantic segmentation (intro, chorus, verse, etc.). Could you please suggest how I should label my audio data, and what features I should use for that?
The task you're referring to is called "music segmentation" or "music structure analysis". I'm assuming you want to work with audio (e.g., WAV), not symbolic data (e.g., MIDI). There's a lot of literature on this topic. The techniques that work best are based on music processing algorithms which don't involve machine learning. The high-level idea is to extract a chromagram, manipulate it, and use a self-similarity matrix to identify similar parts of a song. The book "Fundamentals of Music Processing" has a chapter that discusses music segmentation in detail. Here's a slide presentation that summarises that book chapter: s3-us-west-2.amazonaws.com/musicinformationretrieval.com/slides/mueller_music_structure.pdf Hope this helps :)
Thank You :)
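A rough sketch of the chromagram + self-similarity idea described above, with "song.wav" as a placeholder path; turning the matrix into section boundaries needs further processing, as the book chapter explains:

import librosa

y, sr = librosa.load("song.wav", sr=22050)  # placeholder path
chroma = librosa.feature.chroma_cqt(y=y, sr=sr, hop_length=512)   # 12 x frames chromagram
ssm = librosa.segment.recurrence_matrix(chroma, mode="affinity")  # frames x frames self-similarity
print(chroma.shape, ssm.shape)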
I have an audio dataset where each audio file consists of letters that are spoken all in one go. How can I prepare these audio files for machine learning? I would like to have each letter in its own audio file. If anyone has an idea, please help.
Good Job
Hi, thanks a lot for these videos, they are very useful.
I was just wondering if it would be beneficial to represent the frequency scale logarithmically, as humans interpret sound in this way (since musical intervals/harmonics are represented by multiples of a frequency rather than an absolute difference). Are deep learning algorithms not trained with this scale, since it mimics human hearing more?
Great intuition! You can take the logarithm of the spectrogram, or, apply Mel filterbanks, and arrive at the so called Mel Spectrogram. I have another series called "Audio Signal Processing for ML" that dives deep into all of these topics, if you're interested.
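A minimal sketch of that log/Mel idea: compute a Mel spectrogram and convert the power values to decibels. The 2048/512/128 parameters are common defaults rather than values from the video, and the file path is a placeholder.

import librosa
import numpy as np

signal, sr = librosa.load("blues.00000.wav", sr=22050)  # placeholder path
mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_fft=2048, hop_length=512, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)  # log compression, closer to human loudness perception
print(log_mel.shape)  # (128, number_of_frames)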
Hi Valerio, thank you for your detailed explanation. I am sure that, like me, thousands of others are benefitting from your videos. I understood everything in the video, however I have one query: can we use the log spectrogram for deep learning instead of MFCCs? Or, in other words, why do we only use MFCCs in deep learning? One more concern: I have audio data that is recorded at 44100 Hz; can I use a sample rate of 44100 instead of 22050 (which you are using in this tutorial)? Thank you in advance.
Mel spectrograms are the feature of choice in DL. Of course, you can use an SR of 44.1 kHz.