Dude, your audio processing videos are excellent! I've watched both the original series and the new one, and they were by far the best study material I've seen on the topic. And this comes from a guy who just took a 60-hour NLP class.
I get this error when running the project you explain:
```
Traceback (most recent call last):
  File "C:/Users/HHH/Desktop/audio/Audio-Classification-master1/predict.py", line 2, in <module>
    from clean import downsample_mono, envelope
  File "C:\Users\HHH\Desktop\audio\Audio-Classification-master1\clean.py", line 1, in <module>
    import matplotlib.pyplot as plt
  File "C:\ProgramData\Anaconda3\envs\code\lib\site-packages\matplotlib\__init__.py", line 205, in <module>
    _check_versions()
  File "C:\ProgramData\Anaconda3\envs\code\lib\site-packages\matplotlib\__init__.py", line 190, in _check_versions
    from . import ft2font
ImportError: DLL load failed while importing ft2font: The specified module could not be found
```
This is amazing, from A to Z. Hoping we will have another series in computer vision or audio ML, dude.
Really love your videos, Adam. Keep up the good work.
Wow! ^-^ Great!!! I watched your old series not long ago, really good!!! Now I'm happy to add this series to my watch-later list :P
Hello Seth, Thank you for this excellent series! Very helpful!
Good content, keep it up bro! You explain really well, and your pictures helped me understand a lot better.
Hey, what does it mean when dealing with mel spectrograms of shape (128, 216, 3)? They use 3 window lengths (93 ms, 46 ms, and 23 ms), and in the end they write (128, 216, 3). What does the 3 represent here?
Isn't 256 exactly half of 512, not 257? I'm probably missing something. Can you make it so that the Mel filter tops out at 8 kHz?

I'm a vibration analyst by trade, and my job is to analyze the FFT and time waveform of sound files we collect from running machines. We use the Hanning window, and of course Nyquist is how we set up our data collection. When we are trained, we first learn how to use the FFT to decide where the frequencies of interest are; later we learn how to use the waveform to "see" what the equipment is doing and how it is moving.

My question is: would it be possible to use both the FFT and the waveform to properly classify the sound file? Why does the ML get just one when we get two? Putting the FFT and the waveform together moves the confidence level for us; we always use both, plus the sound file, to classify. I can easily reference a few short videos that will explain this to you if you don't already know.

I know of a service that uses ML to automatically calculate belt slip in every monitored machine that uses a belt. Any company that uses clean rooms is interested in belt slip, because they want to know how much the belts are slipping so they can estimate whether they can make it to a shutdown to change the belts. Otherwise the belts fail, the clean room is no longer clean, everything in the room is suspect, and it usually takes several days of testing to get a clean room back to operational status.

Anyway, this is part 1, so I'll watch parts two and three. I have done similar things to what you are doing, but I didn't use the Mel filter. If it were possible to use both the FFT and the waveform to confirm, the results should be above 95%, maybe 99+%?? Exciting. Using sound, or sound and vibration, to monitor industrial equipment should yield a tool that, when used properly, will increase industry output.

Reply to this and we can talk about sound files. I have some sound files on Kaggle, which I'm using as a storage place while I figure out how best to process them, but I'm learning as I go.
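(On the 257: an FFT over a 512-sample window yields 512 // 2 + 1 = 257 bins, the 256 positive-frequency bins plus the DC bin. And on combining representations: below is a rough sketch of the two-input idea, a Keras model with one branch for the waveform and one for the FFT magnitudes. Everything in it, shapes, layer sizes, class count, is an illustrative assumption, not something from the video series.)
```python
# Rough sketch of a two-input model: one branch sees the raw waveform,
# the other sees FFT magnitudes. All shapes, layer sizes, and the class
# count are illustrative assumptions, not taken from the series.
from tensorflow.keras import layers, Model

n_samples = 16000  # assumed: 1 s of mono audio at 16 kHz
n_bins = 257       # FFT magnitude bins for a 512-point window (512 // 2 + 1)

wave_in = layers.Input(shape=(n_samples, 1), name='waveform')
fft_in = layers.Input(shape=(n_bins, 1), name='fft_magnitude')

# Waveform branch: 1-D convolution over time
w = layers.Conv1D(16, 9, activation='relu')(wave_in)
w = layers.GlobalMaxPooling1D()(w)

# Spectrum branch: 1-D convolution over frequency bins
f = layers.Conv1D(16, 9, activation='relu')(fft_in)
f = layers.GlobalMaxPooling1D()(f)

merged = layers.concatenate([w, f])
out = layers.Dense(10, activation='softmax')(merged)  # 10 hypothetical classes

model = Model(inputs=[wave_in, fft_in], outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```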
Hi Seth... can you please guide us on how to use transfer learning models with Kapre? TIA
Seth, very clear explanation. Thanks.
Can this be used for classifying electromagnetic information?
Hello, I want to convert the h5 model to tflite, but I'm having issues. How do I remove those custom layers and simplify it?
I don't think that's possible.
It has been years since I tried that, but I think you would need something that can understand the custom layers.
@seth I’m trying to write a program to make audio predictions in real time. The inputs can be a mic or YouTube audio. Do you happen to know how I can do that?
I am also working on that...
I have had some requests for this from multiple people; however, it will probably happen as another series. If you just want a single stream to connect between a server and client, I would recommend a socket connection (using flask) that streams wav files. You simply pass the data through the model as it comes in and send the result back in real time.
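(A rough sketch of the server side of this idea, as a plain request/response endpoint rather than a true streaming socket. The model path, input shape, and endpoint name are assumptions, not from the series; a model containing kapre layers would also need its custom layers passed to load_model via custom_objects.)
```python
# Minimal sketch: a Flask endpoint that runs incoming audio through a
# saved Keras model. Assumes 'model.h5' exists and takes input of shape
# (batch, samples, 1); all names and shapes here are illustrative.
import numpy as np
from flask import Flask, request, jsonify
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model('model.h5')  # hypothetical path

@app.route('/predict', methods=['POST'])
def predict():
    # Client posts raw little-endian 16-bit PCM bytes in the request body
    wav = np.frombuffer(request.data, dtype=np.int16).astype(np.float32)
    probs = model.predict(wav.reshape(1, -1, 1))[0]
    return jsonify({'class_index': int(np.argmax(probs))})

if __name__ == '__main__':
    app.run()
```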
Mostafa Miroune thanks
@@seth8141 I am watching this series for my project, as my instructor wants me to implement an ML model in my e-commerce website. I am very new to machine and deep learning; essentially I know nothing about them. I am familiar with Python, and I usually work with React and Node. I want to learn machine learning. Could you point me in the right direction, where should I start??? Many thanks
Excellent!!
Hi, has the Melspectrogram layer been deleted from the kapre layers? It worked before, but when I try now, it says "cannot import name 'Melspectrogram' from 'kapre.time_frequency' (/home/dana/.local/lib/python3.7/site-packages/kapre/time_frequency.py)"
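(For anyone else hitting this: kapre 0.3+ removed the Melspectrogram layer in favor of composed layers. A minimal sketch of the replacement, assuming kapre 0.3.x; the sample rate and parameter values are illustrative.)
```python
# Sketch assuming kapre 0.3.x, where Melspectrogram was replaced by a
# composed layer; parameter values below are illustrative.
from tensorflow.keras.models import Sequential
from kapre.composed import get_melspectrogram_layer

sr = 16000  # assumed sample rate
model = Sequential([
    get_melspectrogram_layer(input_shape=(sr, 1),  # 1 s of mono audio, channels last
                             n_fft=512,
                             sample_rate=sr,
                             n_mels=128,
                             return_decibel=True),
])
```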
Excellent content. Thank you, and keep it up!
Godsend! I was just messing around with Keras Sequences to create a .npy audio generator. It can be quite annoying because there aren't a whole lot of resources for audio compared to images.
Thanks, Seth! Great content.
Thanks Filipe
Hi Seth Adams! I am a student studying audio classification, VAEs, and CNNs at the moment. I like your videos very much! They really help me a lot. If you do not mind, could I ask your permission to share your 3 new Audio Classification videos on another website in China, for the embarrassing reason that YouTube is blocked in China? Of course, I will credit your original channel as the source. Thank you very much!! :p
Hi Seth, I really appreciate what you are doing with these tutorials; they have helped me a lot. I am trying to implement an instrument classifier with this approach (spectrograms + CNN). The only problem is that my dataset contains wav files of various lengths, from 0.5 s to 15 s. What do you think, how should I segment the data to get spectrograms with the same dimensions?
If you want, you can split the wave files into segments of a specific duration and pad the last segment if necessary, all with the same label (or however they are labeled). This will give you spectrograms of a fixed width (rough sketch of this option below).
Otherwise, you can run the CNN on the spectrogram of the complete wave, similar to how YOLO does object detection: it automatically segments the data, and you can finally compute the loss over the n outputs thus obtained.
Hope this works! :)
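(A minimal sketch of the split-and-pad option, assuming mono wav files; split_with_padding is a hypothetical helper name, not something from the series.)
```python
# Split a mono wav file into equal-length segments, zero-padding the
# final, shorter segment. Helper name and default duration are illustrative.
import numpy as np
from scipy.io import wavfile

def split_with_padding(path, segment_seconds=1.0):
    rate, wav = wavfile.read(path)
    seg_len = int(rate * segment_seconds)
    segments = []
    for start in range(0, len(wav), seg_len):
        chunk = wav[start:start + seg_len]
        if len(chunk) < seg_len:
            # zero-pad the tail so every segment has the same length
            chunk = np.pad(chunk, (0, seg_len - len(chunk)))
        segments.append(chunk)
    return rate, segments
```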
Incredible! Thanks a lot
Could you please post the original code too?
Thank you very much! your videos are very useful!
Keep it up, bro!
Thanks for your content!
@Seth
I'm getting this error:
```
Traceback (most recent call last):
  File "sound_preprocess.py", line 119, in <module>
    save_mfcc(DATASET_PATH, JSON_PATH)
  File "sound_preprocess.py", line 90, in save_mfcc
    hop_length = hop_length
  File "/home/bitsy-chuck/anaconda3/lib/python3.7/site-packages/librosa/feature/spectral.py", line 1706, in mfcc
    S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))
  File "/home/bitsy-chuck/anaconda3/lib/python3.7/site-packages/librosa/feature/spectral.py", line 1831, in melspectrogram
    pad_mode=pad_mode)
  File "/home/bitsy-chuck/anaconda3/lib/python3.7/site-packages/librosa/core/spectrum.py", line 2530, in _spectrogram
    window=window, pad_mode=pad_mode))**power
  File "/home/bitsy-chuck/anaconda3/lib/python3.7/site-packages/librosa/core/spectrum.py", line 219, in stft
    y = np.pad(y, int(n_fft // 2), mode=pad_mode)
  File "<__array_function__ internals>", line 6, in pad
  File "/home/bitsy-chuck/anaconda3/lib/python3.7/site-packages/numpy/lib/arraypad.py", line 821, in pad
    "'constant' or 'empty'".format(axis)
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
```
Please help
Can you email the entire project directory to me? seth8141@gmail.com or just the entire traceback.
Or post the full traceback with OS system info in an issue on the github repo
Great content, thank you so much!!
Hello, I hope you are well. I followed your videos on deep learning for audio classification, and they were very interesting; thanks for everything.
But may I ask you something:
If I want to create a machine learning model for transcribing (audio -> written text) in a brand-new language, like an African language for example, how should I proceed?
Thank you.
Check out AssemblyAI.
Thank you for posting new content! Could you elaborate on why you decided to use the log mel spectrogram in this series instead of the Mel Frequency Cepstral Coefficients you used in the last series?
Because kapre doesn't support MFCCs. Extending it to support them is probably not too difficult, but the mel spectrogram should be fine for most people anyway.
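(If you do want MFCCs, a common workaround is to compute them offline with librosa rather than inside the model. A minimal sketch, with an illustrative filename and parameter values:)
```python
# Compute MFCCs offline with librosa instead of a kapre layer in the model.
# The filename, sample rate, and n_mfcc are illustrative.
import librosa

y, sr = librosa.load('example.wav', sr=16000, mono=True)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
```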
Can you make videos for autonomous-Pi?
Unfortunately, there are too many hardware requirements for me to make videos about it. I would probably have to run in simulation and go over specific topics.
Nice job
Hello, thanks so much for updating the video!! It helps a lot with what I am doing right now, since I am a total fresher in this field.
May I ask one simple question about the log mel filter bank?
My understanding is that when we try to classify sounds made by humans, we can put more stress on the lower frequencies.
For the sound of these instruments (which carry important data at relatively high frequencies), why does it help with audio feature extraction?
Does anybody have the answer? I would be so appreciative!!!
Yes, using a log mel filter bank puts more of a bias towards low frequency signals. If you think you need better precision on higher frequencies, you can just use a spectrogram from kapre. Just swap out the Melspectrogram for a regular one. github.com/keunwoochoi/kapre/blob/master/examples/example_codes.ipynb
You will likely need to change model parameters to handle the larger input shape, but the code on github should give a starting point.
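(A minimal sketch of that swap, assuming the kapre 0.1.x API used in this series; newer kapre versions renamed these layers, and the parameter values here are illustrative.)
```python
# Swap the Melspectrogram layer for a plain Spectrogram, assuming the
# kapre 0.1.x API; sample rate and parameter values are illustrative.
from tensorflow.keras.models import Sequential
from kapre.time_frequency import Spectrogram

sr = 16000  # assumed sample rate
model = Sequential([
    Spectrogram(n_dft=512, n_hop=160,
                input_shape=(1, sr),               # (channels, samples): 1 s mono
                return_decibel_spectrogram=True),  # log-scaled output
    # ...conv layers as before, but expect 257 frequency bins instead of 128 mels
])
```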
Thanks!
I love you. Soooooooo much.
How uninterested you are in explaining your concepts: just talking and talking, assuming the viewer already knows all the concepts. Try explaining in a better manner.