Audio Classification with Machine Learning (EuroPython 2019)

Jon Nordby

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 30 січ 2025
Наука та технологія

КОМЕНТАРІ • 59

@jsbisht_ 4 роки тому ⁺⁶
Hi Jon. Great presentation. I am absolutely new to machine learning and found your talk really clear and useful. Thanks for sharing.
@sidalibourenane5377 2 роки тому
Hey Mr Hope you doing good !
Please Can you help me ? How Can we use speech recognition to detect falling in elderly people ?
Just another question how to combine audio with image to implement fall detection ??
Thank you
@Captura22 10 місяців тому ⁺¹
Hi Jon, I am doing a final year undergraduate project on bioacoustics, I am new to signal processing as well as your channel! I was just wondering - do you have a paper covering some of the stuff you've talked about, which I could reference?
@Jononor 10 місяців тому ⁺¹
Hi! Yes, this work is mostly in my master thesis. If you search Google Scholar for "Environmental sound classification on microcontrollers using Convolutional Neural Networks" you should find it. I would give you a link, but UA-cam tends to shadowblock messages with links...
@a2sirmotivationdoses782 2 роки тому
Respected Sir...
My project is to cancel the noise from audio... For this how can i train ML model? And how can i proceed for that plz help me....
@michaelwirtzfeld7847 4 роки тому ⁺³
Thank you. A very good presentation. Is Keras model code you showed (i.e. "block_1", "block_2", etc.) on a couple of your slides available in one of your GitHub repositories?
@Jononor 4 роки тому ⁺¹
Thank you Michael. Yes, all the Keras I tested in my thesis are in the following repo/folder. The one in question is probably in "strided.py" or "sbcnn.py"
github.com/jonnor/ESC-CNN-microcontroller/tree/0d3a1231831d3ee61c22a4f8b461a7511fae3de7/microesc/models
@R0se_ix1b 2 роки тому
Hi can you please explain how can we convert mp3 audio file into. Wav file
@Jononor 2 роки тому
For a single file use Audacity. For multiple files can use ffmpeg and shell to script it. To do it from Python, use librosa.load and soundfile.write
@GadisaGemechu-j2u 10 місяців тому ⁺¹
perfect bro. can you exchange an idea how to prepare dataset ?
@sadeghmohammadi5567 3 роки тому
Thank you very much for your very informative presentation. However, I have a question regarding one of your slides, Specifically on Aggregation analysis windows: Could you please explain further (possibly with an example). For instance windows = 6 is number of segment that you have extracted from you audio signals or it is length of windows (6*sampling_rate)? or bands=32?
Moreover, regarding base model, is the model that you presented in slide before (3 layers CNN?) so the logic is that we kind get the audio signals convert them into the sequence of windows and pass them through SB-CNN and propagate it over time and compute the average pooling and will use the output of average pooling to the softmax to conduct the prediction. is this logic is correct?
In advance thank you for you considerations.
@jayshaligram4474 4 роки тому ⁺²
Hi... great work! Thank you for uploading this video. If you had the exact frequency vs time data for a particular sample in text or csv format, How to use it to improve accuracy of a cnn? Can image data be correlated to corresponding frequency data to get more accurate predictions?
@jayshaligram4474 4 роки тому
Also.. is data augmentation (time shift, pitch shift etc,) manual or is there any automated process for achieving this?
@Jononor 4 роки тому
Hi Jay. The spectrograms contain basically all the time versus frequency data. But if you have some additional information available, there are way to incorporate that. If the data is always available (both training time and prediction time), then you can use it as an additional input to the neural network.
@Jononor 4 роки тому
Data augmentation is basically always automated. Either as a pre-processing batch job. Or done on-the-fly while training the neural network. This posts shows the code for common audio augmentations, medium.com/@makcedward/data-augmentation-for-audio-76912b01fdf6
@weirjwerijrweurhuewhr588 5 років тому ⁺¹
Interesting talk! In the example you showed, lots of the sounds are quite different from each-other, e.g. the children playing, a siren, and a jackhammer. Does it also work for sounds that are very similar? For example different crow calls or different type of chimpanzee sounds?
@Jononor 5 років тому ⁺¹
Hi Ramon. Yes the same basic approach can be used in such a case. Whether good results can be achieved depends on how hard the task is annd how good the data is.
@girishraghunathan2221 5 років тому ⁺²
Interesting Presentation !
@sigitpriyohartanto2129 4 роки тому
thanks you, for great presentation. i have question :
how to make comparisons between one person's voice and another.
@Jononor 4 роки тому ⁺¹
Search for "speaker recognition". I recommend looking into pretrained models based on X-vectors or I-vectors
@sigitpriyohartanto2129 4 роки тому
@@Jononor ok thanks
@cookingcriss 4 роки тому ⁺¹
Thank you so much for sharing the presentation with us! I m new in machine learning and I have some questions. From where could I download or use datasets of audio for my project? Thank you in advance !
@Jononor 4 роки тому ⁺¹
A good overview of environmental audio datasets can be found at www.cs.tut.fi/~heittolt/datasets
@idrisseahamadiabdallah7669 3 роки тому
Hello Jon , you did a great presentation. Thanks for sharing.
I am working on my master's thesis, specifically in Lung Sounds classification using CNN.
I am using mfcc's features. I am getting about 88% of accuracy.
Do you think that melspectogram can give a high accuracy than 88% ?
@Jononor 3 роки тому
Hi Idrisse! Thank you. Yes, I think that mel-spectrogram instead of MFCC might give you a slight increase in performance for your usecase, at least it is worth trying out!
@idrisseahamadiabdallah7669 3 роки тому
@@Jononor thank you
@idrisseahamadiabdallah7669 3 роки тому
@@Jononor thanks sir,
I would like to ask something, please bear me.
Step1 : original dataset 177 samples ( 3 classes , each class has 59 audios files).
Because of the small size of the data, I did data augmentation.
Step 2: After data augmentation, I extracted mfcc's features of the Audio files with its respective labels in order to create a useful dataset.
Step 3 : I splitted the new dataset into training, validation and testing sets.
Step 4: Feed the CNN with the training and validation sets for the training process.
Step 5: evaluated the CNN with the testing set, we are able to reach an accuracy around 90-93%.
Is correct ( logic) to test the model with the testing data that l got in step 3? Or I should split the data to training and testing sets before doing the data augmentation.? Doing so l got an accuracy around 40-43.
Thanks a lot for replying to me.
@Jononor 3 роки тому
@@idrisseahamadiabdallah7669 the testing set should be kept unmodified. Data Augmentation should only be applied to training. It sounds like your data augmentation may have introduced bigger changes than planned. Check the statistics of the data, it should still be very similar between augmented train and original train/test, otherwise you will get trouble
@idrisseahamadiabdallah7669 3 роки тому
@@Jononor okay I understood, thanks a lot.
One other question. Do you think that the 177 wav files , maybe enough to train a CNN model efficiently?
@xXDarQXx 3 роки тому
I was quite surprised that for classification you didn't feed the feature embeddings of the windows to an rnn and instead just used a post processing trick. Wouldn't an rnn work better, what about a transformer? Also, I know that mel spectrograms work better than just feeding raw audio, but how better? is it like +5% accuracy or is it game changing?
nvm 😅 both of these questions were answered at the end. another question that came to mind though is: what about speech recognition models or something similar, are spectrogram-based models still dominating or is it a different story?
@Jononor 3 роки тому ⁺¹
Temporal aggregation using mean or majority voting is simple and works pretty well. It can be done with an RNN, or AutoPool, or an attention function - and it can increase performance a bit
@Jononor 3 роки тому ⁺¹
Whether mel-spectrogram or raw audio works best depends on the task and dataset. It is much more challenging, and more data intensive, to make a system that learns from raw audio - but it sometimes performs better once it works. Though combining both tends to work the best. Not always worth the complexity though
@xXDarQXx 3 роки тому
@@Jononor jesus, that was quick XD
thank you so much for the reply! I really appreciate it. and that was great presentation btw. It was very easy to follow.
I hope you have a nice day ma, cheers :D.
@Jononor 3 роки тому
@@xXDarQXx Thank you :) Happy learning, have a nice day!
@peterm.4026 3 роки тому
I'm new to machine learning and I feel like I watched so many audio machine learning videos and the tips & tricks section to the end on this is the most practical and unique stuff I've seen. Thanks! Does the simple audio recognition by tensor flow tutorial still exist? I can't seem to find it? Also, in the audio augmentation slide you talk about adding noise to your data for benefit of the model but in the Q&A you talk about how de-noising is helpful. Could you clarify the different cases where you use both?
@Jononor 3 роки тому ⁺¹
Hi Peter. The Tensorflow simple audio tutorial still exists, but they keeping moving it around and renaming it. Currently it is called "Simple audio recognition: Recognizing keywords" at www.tensorflow.org/tutorials/audio/simple_audio
@Jononor 3 роки тому ⁺¹
Training with noise via data augmentation is almost always beneficial (possible exception, if one of your classes is very noise like). And given sufficient data, this will work well, and is the simplest solution. However, if one 1) has a small amount of data and 2) there are well known denoising methods that work well for the case - it may be worth a try. Examples of usecases where I have seen denoising step work well is bird audio spotting in remote monitoring cases (forests etc) - here it is often very quiet and the noise floor can be significant. It may be the noise is that of the microphones and electronics themselves, which is near constant, and relatively simple to denoise
@saleemjamali3521 3 роки тому
Sir can you share the code of your model?
@Jononor 3 роки тому ⁺¹
Hi Saleem. You can find the code here, github.com/jonnor/eSC-CNN-microcontroller
@saleemjamali3521 3 роки тому ⁺¹
@@Jononor thank you so much sir
@doyourealise 2 роки тому
i am here again, one question. Why don't you upload audio processing videos weekly ? Thanks !!!!!!
@Jononor 2 роки тому ⁺¹
Several reasons. But the main one is that I do not have the time right now. It takes around 10 hours to make a 10 minute lecture with solid content.
@doyourealise 2 роки тому
@@Jononor you are right! Its hard and sometimes a headache haha, anyway loved the old content!
@chacmool2581 3 роки тому
Great stuff.
How's the job market for this type of knowledge and skills? I am an old EE just starting a DS masters and I've turned my attention to audio classification.
@Jononor 3 роки тому
Hi Chac. For audio, image, video etc type of processing - the kind of companies that before would hire for Digital Signal Processing skills are today hiring for Machine Learning. If you have an EE background with skills around embedded systems, that is a very good compliment for many such companies. At the moment the demand for ML engineers is high - and many are trying to build new ML-based products and functionality - and there is a lack of skilled people. So pretty good I would say - but you need to go for the places that match your skill profile. A masters degree will set you apart from the large number of self-learners, in terms of demonstrated qualifications
@chacmool2581 3 роки тому ⁺¹
@@Jononor Thank you very much for that. Much appreciated.
@chacmool2581 3 роки тому
John, hate to bug you again, but I am actually kinda serious about this. My DS program is actually not geared or focused for 'TinyML' so I need to supplement it with other learning. What online program or set of courses would you recommend to get into 'TinyML'?
@Jononor 3 роки тому ⁺¹
@@chacmool2581 There is a TinyML book. Have not ready, but probably a good start. The TinyML youtube channel has many good talks, but they are on bleeding edge research - not a pedagogical resource. But apart from the usual embedded/DSP topics, the main part of TinyML is computationally efficient and small models. So focus on understanding how to choose and optimize for such models. For CNNs my master thesis has some pointers on that
@Jononor 3 роки тому ⁺¹
@@chacmool2581 Also, do a few practical projects. Get an ESP32 board and build something fun (does not have to be useful)
@tranthanh3060 5 років тому ⁺¹
I really like your presentation. Thank you very much. Since I'm trying to classify sound for my project now, could I ask you some more questions?
@Jononor 5 років тому ⁺¹
Just ask here, or create Stack Overflow questions and link them here. Then I can respond :)
@tranthanh3060 5 років тому
Could you help me explain more detail about mel spectrogram, more mathematical
@Jononor 5 років тому ⁺²
@@tranthanh3060 here is a good intro, haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
@tranthanh3060 5 років тому ⁺¹
@@Jononor Thank you so much for your prompt response, this is exactly what I need. Hope you have a nice day!
@tommygun296 4 роки тому
Fantastic!!!! **O** GrEAT insight! Thank you!

Наступне

Автоматичне відтворення

Environmental Noise Classification on Microcontrollers (tinyML Summit 2021)