Love the casual presentation of this material, so sophisticated and yet improvisatory…
I was setting a voice recognition password for my phone and a dog nearby barked and ran away. Now I'm still looking for that dog to unlock my phone....
Crazy
Awesome
In the end, instead of trying the LSTM network, you ran the Dense network by mistake!
Please check on it.
Watching a Jupyter notebook being executed live evokes a different level of interest than watching someone go through a finished notebook
Could you guys do a series where you guys make your own AI assistant?
@Weights & Biases where is the link to download more files?
That hairstyle adds 2.5 intelligence to his avatar.
I want to convert speech to text offline, at least for a limited amount of words. Can anybody help?
Thanks for the video, but I have a question: I don't know what Feature Descriptors are in animal sound recognition. Can you answer my question? My English is not good; I hope you understand me.
Thank you for sharing this informative video. Can you share some information related to speaker diarization in Python?
Is this the same as if we chose the topic "Speech spoofing detection"?
Hi sir, my professor gave me a mini project topic, [Improving speech recognition using bionic wavelet features], and said to do it as a Python program. Please help me do it!
I'm going to develop voice recognition software, thanks this is great, subscribed.
I would like to know about your voice recognition software. So how can I contact you?
Hello, how's your progress?
Can you please explain SER using CNN for a beginner?
I got excited when I clicked the video because I thought you were speaking of 1D CNNs. Please move on to 1D CNNs on raw audio!
Great resource. Instantly subscribed
What are the callbacks when fitting the model? You didn't scroll to that part.
Where was the dataset obtained from? Is there an original link?
How is this speech recognition? It's just spoken word classification.
A great and informative video, thank you!
Hi, how can we use this type of network when we are looking for a specific word in the input sound?
For example, we are looking for the word "hello", so the first label is "hello" and the second label is anything other than "hello".
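One minimal way to set that up is binary labeling: one class for the target keyword, one class for everything else. Here is a small sketch; the word list and target are just hypothetical stand-ins, not anything from the video.

```python
# Binary labeling for keyword spotting: class 1 for the target word
# ("hello"), class 0 for any other word. The word list below is a
# made-up stand-in for the labels in a real dataset.
def label_for(word, target="hello"):
    return 1 if word == target else 0

words = ["hello", "yes", "no", "hello", "cat"]
labels = [label_for(w) for w in words]
print(labels)  # [1, 0, 0, 1, 0]
```

With labels like these, the network's final layer would shrink to two softmax units (or a single sigmoid unit) instead of one unit per word.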
Sir, how can we label our audio file dataset?
Nice and informative video
Where can we download the data that's used here?
You can follow along the code and get the data here!
github.com/lukas/ml-class/tree/master/videos/cnn-audio
Great video
Hi, I was trying to run the project, but I get an error when I run audio.ipynb. I would appreciate it if somebody could help me with this error. Thank you
Using TensorFlow backend.
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
ImportError: numpy.core.multiarray failed to import
The above exception was the direct cause of the following exception:
SystemError Traceback (most recent call last)
~\Anaconda3\lib\importlib\_bootstrap.py in _find_and_load(name, import_)
SystemError: returned a result with an error set
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
ImportError: numpy.core._multiarray_umath failed to import
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
ImportError: numpy.core.umath failed to import
Those are library import errors, so I think you need to check your numpy installation. Or you can try the project in Google Colab first.
Hey Luis, this is fixed now if you pull the changes from Git.
Thank you for sharing your good work.
I am 20 seconds into this video and I had to pause it and write a comment. I can tell this is gonna be AMAZING.
Yep, it was amazing.
Thank you for source code ❤️
Is it a QCNN?
Hello, I have an issue while predicting. Can you please guide me on how to run prediction with this?
What sort of issue did you face?
Looking to start a voice recognition company but not tech savvy. If any tech gurus are interested, please let me know. Thanks, Zach
Why do we have to use and specify buckets?
For the MFCC transformation, the signal is first converted to the frequency domain using an FFT. This needs to be applied to small windows of the whole signal; the bucket specifies the length of those windows.
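To make that windowing concrete, here is a minimal numpy sketch of the framed-FFT step that sits underneath MFCC. The frame and hop sizes are illustrative, not the values used in the video.

```python
import numpy as np

# Split the signal into short frames (the "buckets"), apply a Hamming
# window to each, and take the magnitude of the FFT of each frame.
def framed_spectra(signal, frame_len=400, hop=160):
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hamming(frame_len)
        frames.append(np.abs(np.fft.rfft(frame)))  # magnitude spectrum
    return np.array(frames)

# 1 second of a 100 Hz sine at a 16 kHz sample rate
sr = 16000
t = np.arange(sr) / sr
spectra = framed_spectra(np.sin(2 * np.pi * 100 * t))
print(spectra.shape)  # (98, 201): frames x frequency bins
```

Real MFCC pipelines then map these spectra onto a mel filterbank and take a DCT, but the bucket/window idea is exactly this framing step.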
Hi, can you please explain how did you convert the audio files into a useful data ?
yasmine belhadj You can use some technique like MFCC; I'm using it for my project.
@@cabbagenguyen801 Thank you, I got it :D
@@yasminebelhadj9359 You're welcome ^^
@@cabbagenguyen801 What does MFCC do? Explain briefly. Also, explain how he converts audio into useful data.
@@zohaibramzan6381 You can Google it with the keyword "speech feature extraction with MFCC"
Why not Pytorch?
How do I create a confusion matrix for this tutorial?
Hey Souha!
We can make and log a confusion matrix for you, given the ground truth and the model predictions, with wandb.sklearn.plot_confusion_matrix. As the name implies, we use sklearn to generate the matrix, so head there if you want to calculate and plot the CM without logging it.
See some examples of confusion matrix calculation, and our other scikit integrations, here: docs.wandb.com/library/integrations/scikit
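For reference, here is what the bare-sklearn version of that calculation looks like, without any logging. The label lists are made-up stand-ins for real ground truth and model predictions.

```python
from sklearn.metrics import confusion_matrix

# Rows are true labels, columns are predicted labels, in the order
# given by the labels= argument.
y_true = ["yes", "no", "yes", "no", "yes"]
y_pred = ["yes", "no", "no", "no", "yes"]
cm = confusion_matrix(y_true, y_pred, labels=["yes", "no"])
print(cm)  # [[2 1]
           #  [0 2]]
```

wandb.sklearn.plot_confusion_matrix does the same computation and additionally logs the plot to your run.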
@@WeightsBiases thank you
New to ML here, but very much not new to audio. I have a specific use case with lots of data that I want to experiment with, involving six channels of low-sample-rate data rather than one. How would I go about separating each channel at the point where you opted to keep it at one?
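A minimal numpy sketch of that channel split, assuming the loader returns a (n_samples, n_channels) array as, e.g., soundfile.read does for multichannel WAVs (librosa.load with mono=False instead returns (n_channels, n_samples), so check your loader's convention):

```python
import numpy as np

# Stand-in for 1 second of six-channel audio at a 16 kHz sample rate.
audio = np.zeros((16000, 6))

# One 1-D signal per channel; each can then be fed through the same
# feature-extraction step the video applies to its single channel.
channels = [audio[:, c] for c in range(audio.shape[1])]
print(len(channels), channels[0].shape)  # 6 (16000,)
```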
Can we use the same code to make a model to identify if an audio is fake or real?
Great video! Thank you!!
Do you know where to find WAV files like the ones that you used?
I don't know if you're still looking, but try Google's Speech Commands Dataset.
Is there an example with recurrent techniques like LSTMs?
ua-cam.com/video/u9FPqkuoEJ8/v-deo.html hope this helps
Where is the data?
+Michael Fekadu can you elaborate?
@@WeightsBiases Sorry, I was not following along with the linked GitHub repository because I wanted to apply the knowledge from this video onto a different dataset. So, I did not realize that the save_data_to_array() and get_data_train_test() functions are inside of the preprocess.py file. Furthermore, the data is loaded from librosa via the librosa.load() call. In other words, I was watching the video out of context of the first video that suggests following along after setting up a local copy of the provided Git repository, which I had done previously and should have checked there before commenting.
Thank you for checking in!
Love the videos!
@@michaelfekadu6116 No problem, what are you applying this to?
Weights & Biases I plan to apply it to the DARPA TIMIT dataset that I found here:
www.kaggle.com/mfekadu/darpa-timit-acousticphonetic-continuous-speech
First I'll need to write some Python code that splits the data into just the words from the sentences, using the time-aligned orthographic annotation files.
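That splitting step could look roughly like this sketch. The annotation text here is invented for illustration, though real TIMIT .WRD lines do follow the "start_sample end_sample word" format.

```python
# Slice a sentence's sample array into per-word chunks using the
# time-aligned word annotations (sample offsets, one word per line).
def split_words(samples, wrd_text):
    words = []
    for line in wrd_text.strip().splitlines():
        start, end, word = line.split()
        words.append((word, samples[int(start):int(end)]))
    return words

samples = list(range(100))  # stand-in for the audio sample array
wrd = "0 30 she\n30 70 had\n70 100 your"
pieces = split_words(samples, wrd)
print([(w, len(s)) for w, s in pieces])  # [('she', 30), ('had', 40), ('your', 30)]
```

Each (word, chunk) pair can then be written out as its own WAV file to mirror the one-word-per-file layout the video's dataset uses.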
The video is amazing and it has helped me solve one of my projects. However, when I run the last part, validating the model, I get this error:
AttributeError: 'NoneType' object has no attribute 'item'
Could you help me, please?