Dude, your audio processing videos are excellent! I've watched both the original series and the new one, and they were by far the best study material I've seen on the topic. And this comes from a guy who just took a 60-hour NLP class.
I get this error when running the project you explain:
```
Traceback (most recent call last):
  File "C:/Users/HHH/Desktop/audio/Audio-Classification-master1/predict.py", line 2, in <module>
    from clean import downsample_mono, envelope
  File "C:\Users\HHH\Desktop\audio\Audio-Classification-master1\clean.py", line 1, in <module>
    import matplotlib.pyplot as plt
  File "C:\ProgramData\Anaconda3\envs\code\lib\site-packages\matplotlib\__init__.py", line 205, in <module>
    _check_versions()
  File "C:\ProgramData\Anaconda3\envs\code\lib\site-packages\matplotlib\__init__.py", line 190, in _check_versions
    from . import ft2font
ImportError: DLL load failed while importing ft2font: The specified module could not be found
```
This is amazing, from A to Z. Hoping we will have another series in computer vision or audio ML, dude.
Really love your videos, Adam. Keep up the good work.
Wow! ^-^ Great!!! I watched your old series not long ago, really good!!! Now I'm happy to add this series to my watch-later list :P
Hello Seth, Thank you for this excellent series! Very helpful!
Good content, keep it up bro! You explain really well, and your pictures helped me understand a lot better.
Hey, what does it mean when dealing with mel spectrograms of shape (128, 216, 3)? They use 3 window lengths (93 ms, 46 ms, and 23 ms), and in the end they write (128, 216, 3). What does the 3 represent here?
Isn't 256 exactly half of 512, not 257? I'm probably missing something. Can you make it so that the Mel filter tops out at 8 kHz?

I'm a vibration analyst by trade, and my job is to analyze the FFT and time waveform of sound files we collect from running machines. We use the Hanning window, and of course Nyquist is how we set up our data collection. When we are trained, we first learn how to use the FFT to decide where the frequencies of interest are; later we learn how to use the waveform to "see" what the equipment is doing and how it is moving.

My question is: would it be possible to use both the FFT and the waveform to properly classify the sound file? Why does the ML get just one when we get two? Putting the FFT and the waveform together moves the confidence level for us; we always use both, plus the sound file, to classify. I can easily reference a few short videos that will explain this to you if you don't already know.

I know of a service that uses ML to automatically calculate belt slip in every monitored machine that uses a belt. Any company that uses clean rooms is interested in belt slip, because they want to know how much the belts are slipping so they can estimate whether they can make it to a shutdown to change the belts. Otherwise the belts fail, the clean room is no longer clean, everything in the room is suspect, and it usually takes several days of testing to get a clean room back to operational status.

Anyway, this is part 1, so I'll watch parts two and three. I have done similar things to what you are doing, but I didn't use the Mel filter. If it were possible to use both the FFT and the waveform to confirm, the results should be above 95%, maybe 99+%?? Exciting. Using sound, or sound and vibration, to monitor industrial equipment should yield a tool that, when used properly, will increase industry output.

Reply to this and we can talk about sound files. I have some sound files on Kaggle, which I'm using as a storage place while I figure out how best to process them, but I'm learning as I go.
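(On the 257: an FFT over a 512-sample window yields 512 // 2 + 1 = 257 bins, the 256 positive-frequency bins plus the DC bin. And on combining representations: below is a rough sketch of the two-input idea, a Keras model with one branch for the waveform and one for the FFT magnitudes. Everything in it, shapes, layer sizes, class count, is an illustrative assumption, not something from the video series.)
```python
# Rough sketch of a two-input model: one branch sees the raw waveform,
# the other sees FFT magnitudes. All shapes, layer sizes, and the class
# count are illustrative assumptions, not taken from the series.
from tensorflow.keras import layers, Model

n_samples = 16000  # assumed: 1 s of mono audio at 16 kHz
n_bins = 257       # FFT magnitude bins for a 512-point window (512 // 2 + 1)

wave_in = layers.Input(shape=(n_samples, 1), name='waveform')
fft_in = layers.Input(shape=(n_bins, 1), name='fft_magnitude')

# Waveform branch: 1-D convolution over time
w = layers.Conv1D(16, 9, activation='relu')(wave_in)
w = layers.GlobalMaxPooling1D()(w)

# Spectrum branch: 1-D convolution over frequency bins
f = layers.Conv1D(16, 9, activation='relu')(fft_in)
f = layers.GlobalMaxPooling1D()(f)

merged = layers.concatenate([w, f])
out = layers.Dense(10, activation='softmax')(merged)  # 10 hypothetical classes

model = Model(inputs=[wave_in, fft_in], outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```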
Hi Seth... can you please guide us on how to use transfer learning models with Kapre? TIA
Seth, very clear explanation. Thanks.
Can this be used for classifying electromagnetic information?
Hello, I want to convert the h5 model to tflite, but I'm having issues. How do I remove those custom layers and simplify it?
I don't think that's possible.
It has been years since I tried that, but I think you would need something that can understand the custom layers.
@seth I’m trying to write a program to make audio predictions in real time. The inputs can be a mic or YouTube audio. Do you happen to know how I can do that?
I am also working on that...
I have had some requests for this from multiple people; however, it will probably happen as another series. If you just want a single stream to connect between a server and client, I would recommend a socket connection (using flask) that streams wav files. You simply pass the data through the model as it comes in and send the result back in real time.
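(A rough sketch of the server side of this idea, as a plain request/response endpoint rather than a true streaming socket. The model path, input shape, and endpoint name are assumptions, not from the series; a model containing kapre layers would also need its custom layers passed to load_model via custom_objects.)
```python
# Minimal sketch: a Flask endpoint that runs incoming audio through a
# saved Keras model. Assumes 'model.h5' exists and takes input of shape
# (batch, samples, 1); all names and shapes here are illustrative.
import numpy as np
from flask import Flask, request, jsonify
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model('model.h5')  # hypothetical path

@app.route('/predict', methods=['POST'])
def predict():
    # Client posts raw little-endian 16-bit PCM bytes in the request body
    wav = np.frombuffer(request.data, dtype=np.int16).astype(np.float32)
    probs = model.predict(wav.reshape(1, -1, 1))[0]
    return jsonify({'class_index': int(np.argmax(probs))})

if __name__ == '__main__':
    app.run()
```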
Mostafa Miroune thanks
@@seth8141 I am watching this series for my project, as my instructor wants me to implement an ML model in my e-commerce website. I am very new to machine and deep learning; essentially I know nothing about them. I am familiar with Python, and I usually work with React and Node. I want to learn machine learning. Could you point me in the right direction, where should I start??? Many thanks
Excellent!!
Hi, has the Melspectrogram layer been deleted from the kapre layers? It worked before, but when I try now, it says "cannot import name 'Melspectrogram' from 'kapre.time_frequency' (/home/dana/.local/lib/python3.7/site-packages/kapre/time_frequency.py)"
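(For anyone else hitting this: kapre 0.3+ removed the Melspectrogram layer in favor of composed layers. A minimal sketch of the replacement, assuming kapre 0.3.x; the sample rate and parameter values are illustrative.)
```python
# Sketch assuming kapre 0.3.x, where Melspectrogram was replaced by a
# composed layer; parameter values below are illustrative.
from tensorflow.keras.models import Sequential
from kapre.composed import get_melspectrogram_layer

sr = 16000  # assumed sample rate
model = Sequential([
    get_melspectrogram_layer(input_shape=(sr, 1),  # 1 s of mono audio, channels last
                             n_fft=512,
                             sample_rate=sr,
                             n_mels=128,
                             return_decibel=True),
])
```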
Excellent content. Thank you, and keep it up!
Godsend! I was just messing around with Keras Sequences to create a .npy audio generator. It can be quite annoying because there aren't a whole lot of resources for audio compared to images.
Thanks, Seth! Great content.
Thanks Filipe
Hi Seth Adams! I am a student studying audio classification, VAEs, and CNNs at the moment. I like your videos very much! They really help me a lot. If you do not mind, could I ask your permission to share your 3 new Audio Classification videos on another website in China, for the embarrassing reason that YouTube is blocked in China? Of course, I will credit your original channel as the source. Thank you very much!! :p
Hi Seth, I really appreciate what you are doing with these tutorials; they have helped me a lot. I am trying to implement an instrument classifier with this approach (spectrograms + CNN). The only problem is that my dataset contains wav files of various lengths, from 0.5 s to 15 s. What do you think, how should I segment the data to get spectrograms with the same dimensions?
If you want, you can split the wave files into segments of a specific duration and pad the last segment if necessary, all with the same label (or however they are labeled). This will give you spectrograms of a fixed width (rough sketch of this option below).
Otherwise, you can run the CNN on the spectrogram of the complete wave, similar to how YOLO does object detection: it automatically segments the data, and you can finally compute the loss over the n outputs thus obtained.
Hope this works! :)
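(A minimal sketch of the split-and-pad option, assuming mono wav files; split_with_padding is a hypothetical helper name, not something from the series.)
```python
# Split a mono wav file into equal-length segments, zero-padding the
# final, shorter segment. Helper name and default duration are illustrative.
import numpy as np
from scipy.io import wavfile

def split_with_padding(path, segment_seconds=1.0):
    rate, wav = wavfile.read(path)
    seg_len = int(rate * segment_seconds)
    segments = []
    for start in range(0, len(wav), seg_len):
        chunk = wav[start:start + seg_len]
        if len(chunk) < seg_len:
            # zero-pad the tail so every segment has the same length
            chunk = np.pad(chunk, (0, seg_len - len(chunk)))
        segments.append(chunk)
    return rate, segments
```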
Incredible! Thanks a lot
Could you please post the original code too?
Thank you very much! your videos are very useful!
Keep it up, bro!
Thanks for your content!
@Seth
I'm getting this error:
```
Traceback (most recent call last):
  File "sound_preprocess.py", line 119, in <module>
    save_mfcc(DATASET_PATH, JSON_PATH)
  File "sound_preprocess.py", line 90, in save_mfcc
    hop_length = hop_length
  File "/home/bitsy-chuck/anaconda3/lib/python3.7/site-packages/librosa/feature/spectral.py", line 1706, in mfcc
    S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))
  File "/home/bitsy-chuck/anaconda3/lib/python3.7/site-packages/librosa/feature/spectral.py", line 1831, in melspectrogram
    pad_mode=pad_mode)
  File "/home/bitsy-chuck/anaconda3/lib/python3.7/site-packages/librosa/core/spectrum.py", line 2530, in _spectrogram
    window=window, pad_mode=pad_mode))**power
  File "/home/bitsy-chuck/anaconda3/lib/python3.7/site-packages/librosa/core/spectrum.py", line 219, in stft
    y = np.pad(y, int(n_fft // 2), mode=pad_mode)
  File "<__array_function__ internals>", line 6, in pad
  File "/home/bitsy-chuck/anaconda3/lib/python3.7/site-packages/numpy/lib/arraypad.py", line 821, in pad
    "'constant' or 'empty'".format(axis)
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
```
Please help
Can you email the entire project directory to me? seth8141@gmail.com or just the entire traceback.
Or post the full traceback with OS system info in an issue on the github repo
Great content, thank you so much!!
Hello, I hope you are well. I followed your videos on deep learning for audio classification, and they were very interesting; thanks for everything.
But may I ask you something:
If I want to create a machine learning model for transcribing (audio -> written text) in a brand-new language, like an African language for example, how should I proceed?
Thank you.
Check out AssemblyAI.
Thank you for posting new content! Could you elaborate on why you decided to use the log mel spectrogram in this series instead of the Mel Frequency Cepstral Coefficients you used in the last series?
Because kapre doesn't support MFCCs. Extending it to support them is probably not too difficult, but the mel spectrogram should be fine for most people anyway.
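(If you do want MFCCs, a common workaround is to compute them offline with librosa rather than inside the model. A minimal sketch, with an illustrative filename and parameter values:)
```python
# Compute MFCCs offline with librosa instead of a kapre layer in the model.
# The filename, sample rate, and n_mfcc are illustrative.
import librosa

y, sr = librosa.load('example.wav', sr=16000, mono=True)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
```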
Can you make videos for autonomous-Pi?
Unfortunately, there are too many hardware requirements for me to make videos about it. I would probably have to run in simulation and go over specific topics.
Nice job
Hello, thanks so much for updating the video!! It helps a lot with what I am doing right now, since I am a total fresher in this field.
May I ask one simple question about the log mel filter bank?
My understanding is that when we try to classify sounds made by humans, we can put more stress on the lower frequencies.
For the sound of these instruments (which carry important data at relatively high frequencies), why does it help with audio feature extraction?
Does anybody have the answer? I would be so appreciative!!!
Yes, using a log mel filter bank puts more of a bias towards low frequency signals. If you think you need better precision on higher frequencies, you can just use a spectrogram from kapre. Just swap out the Melspectrogram for a regular one. github.com/keunwoochoi/kapre/blob/master/examples/example_codes.ipynb
You will likely need to change model parameters to handle the larger input shape, but the code on github should give a starting point.
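(A minimal sketch of that swap, assuming the kapre 0.1.x API used in this series; newer kapre versions renamed these layers, and the parameter values here are illustrative.)
```python
# Swap the Melspectrogram layer for a plain Spectrogram, assuming the
# kapre 0.1.x API; sample rate and parameter values are illustrative.
from tensorflow.keras.models import Sequential
from kapre.time_frequency import Spectrogram

sr = 16000  # assumed sample rate
model = Sequential([
    Spectrogram(n_dft=512, n_hop=160,
                input_shape=(1, sr),               # (channels, samples): 1 s mono
                return_decibel_spectrogram=True),  # log-scaled output
    # ...conv layers as before, but expect 257 frequency bins instead of 128 mels
])
```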
Thanks!
I love you. Soooooooo much.
How uninterested you are in explaining your concepts: just talking and talking, assuming the viewer already knows all the concepts. Try explaining in a better manner.