13. Speech Recognition with Convolutional Neural Networks in Keras/TensorFlow

  • Published 27 Aug 2024

COMMENTS • 78

  • @MyerNore
    @MyerNore 2 years ago +3

    Love the casual presentation of this material, so sophisticated and yet improvisatory…

  • @MaestroBeats
    @MaestroBeats 4 years ago +82

    I was setting up a voice recognition password for my phone and a dog nearby barked and ran away. Now I'm still looking for that dog to unlock my phone....

  • @shanejohnpaul
    @shanejohnpaul 5 years ago +26

    In the end, instead of trying the LSTM network, you ran the Dense network by mistake!
    Please check on it.

  • @bags534
    @bags534 4 years ago +6

    Watching a jupyter notebook being executed live evokes a different level of interest than watching someone go through the notebook

  • @TecGFS
    @TecGFS 3 years ago +4

    Could you guys do a series where you make your own AI assistant?

  • @taptaplit1081
    @taptaplit1081 3 years ago +1

    @Weights & Biases Where is the link to download more files?

  • @slazerlombardi
    @slazerlombardi 4 years ago +3

    That hairstyle adds 2.5 intelligence to his avatar.

  • @rhinoara7119
    @rhinoara7119 3 years ago +1

    I want to convert speech to text offline... at least a limited set of words. Can anybody help?

  • @aquafina3708
    @aquafina3708 2 years ago

    Thanks for the video, but I have a question: I don't know what feature descriptors are in animal sound recognition. Can you answer my question? My English is not good; I hope you understand me.

  • @shobhitbishop
    @shobhitbishop 4 years ago +2

    Thank you for sharing this informative video. Can you share some information related to speaker diarization in Python?

  • @sidvlognlifestyle
    @sidvlognlifestyle 1 year ago

    Is this the same as choosing the topic "speech spoofing detection"?

  • @Pnr231
    @Pnr231 2 years ago +1

    Hi sir, my professor gave me a mini-project topic, [Improving speech recognition using bionic wavelet feature], and said to do it as a Python program. Please help me do it.

  • @ShaunJW1
    @ShaunJW1 3 years ago +5

    I'm going to develop voice recognition software. Thanks, this is great; subscribed.

    • @shashithadithya9744
      @shashithadithya9744 3 years ago

      I would like to know about your voice recognition software. So how can I contact you?

    • @waterspray5743
      @waterspray5743 2 years ago

      Hello, how's your progress?

  • @sreyamathew327
    @sreyamathew327 10 months ago

    Can you please explain SER using CNN for a beginner?

  • @kevinsasso1405
    @kevinsasso1405 4 years ago +1

    I got excited when I clicked the video because I thought you were speaking of a 1D CNN. Move to a 1D CNN on raw audio!
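
    A minimal sketch of the 1D-CNN-on-raw-audio idea raised in this comment (not the model built in the video); the 16 kHz one-second input, layer sizes, and ten-class output are illustrative assumptions:

    from tensorflow.keras import layers, models

    SAMPLE_RATE = 16000   # assume one-second clips at 16 kHz
    NUM_CLASSES = 10      # assume ten keywords

    model = models.Sequential([
        layers.Input(shape=(SAMPLE_RATE, 1)),                            # raw waveform, one channel
        layers.Conv1D(16, kernel_size=80, strides=4, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Conv1D(32, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Conv1D(64, kernel_size=3, activation="relu"),
        layers.GlobalMaxPooling1D(),                                     # collapse the time axis
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()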

  • @ar-visions
    @ar-visions 3 years ago +2

    Great resource. Instantly subscribed

  • @MS-fk8ec
    @MS-fk8ec 4 years ago

    What are the callbacks when fitting the model? You didn't scroll to that part.

  • @aliarslan6904
    @aliarslan6904 4 years ago +1

    Where is the dataset obtained from? What is the original link?

  • @shangethrajaa
    @shangethrajaa 5 years ago +4

    How is this speech recognition? It's just spoken-word classification.

  • @JS19190
    @JS19190 5 years ago

    A great and informative video, thank you!

  • @alikavari351
    @alikavari351 4 years ago

    Hi,
    How do we use this type of network when we are looking for a specific word in the input sound?
    For example, we are looking for the word "hello",
    so the first label is "hello" and the second label is anything other than "hello".

  • @inamullahshah7074
    @inamullahshah7074 4 years ago +1

    Sir, how can we label our audio file dataset?

  • @mrsilver8151
    @mrsilver8151 4 months ago

    nice and informative video

  • @azrflourish9032
    @azrflourish9032 2 years ago +1

    Where can we download the data that's used here?

    • @WeightsBiases
      @WeightsBiases 2 years ago

      You can follow along with the code and get the data here!
      github.com/lukas/ml-class/tree/master/videos/cnn-audio

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago

    Great video

  • @luisfernandoriveroslozano2859
    @luisfernandoriveroslozano2859 4 years ago +1

    Hi, I was trying out the project, but I get an error when I run audio.ipynb. Please, I would like somebody to help me with this error. Thank you.
    Using TensorFlow backend.
    ---------------------------------------------------------------------------
    ModuleNotFoundError Traceback (most recent call last)
    ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
    ---------------------------------------------------------------------------
    ImportError Traceback (most recent call last)
    ImportError: numpy.core.multiarray failed to import
    The above exception was the direct cause of the following exception:
    SystemError Traceback (most recent call last)
    ~\Anaconda3\lib\importlib\_bootstrap.py in _find_and_load(name, import_)
    SystemError: returned a result with an error set
    ---------------------------------------------------------------------------
    ImportError Traceback (most recent call last)
    ImportError: numpy.core._multiarray_umath failed to import
    ---------------------------------------------------------------------------
    ImportError Traceback (most recent call last)
    ImportError: numpy.core.umath failed to import

    • @cabbagenguyen801
      @cabbagenguyen801 4 years ago

      Those are errors with importing a library, so I think you need to check your numpy installation. Or you can try the project in Google Colab first.

    • @chrisvanpelt1677
      @chrisvanpelt1677 4 years ago

      Hey Luis, this is fixed now if you pull the changes from git.

  • @hygjob
    @hygjob 5 years ago +1

    Thank you for sharing your good work.

  • @mattymallz4207
    @mattymallz4207 4 years ago +1

    I am 20 seconds into this video and I had to pause it and write a comment. I can tell this is gonna be AMAZING.

  • @_mehmet
    @_mehmet 4 years ago +1

    Thank you for the source code ❤️

  • @phamthanhnhan9409
    @phamthanhnhan9409 3 years ago

    Is it QCNN??

  • @rudrakshshukla765
    @rudrakshshukla765 4 years ago

    Hello, I have an issue when predicting. Can you please guide me on how to run prediction with this model?

  • @zacharyblundell6994
    @zacharyblundell6994 4 years ago +1

    Looking to start a voice recognition company, but I'm not tech savvy. If any tech gurus are interested, please let me know. Thanks, Zach

  • @kopalsoni4780
    @kopalsoni4780 4 years ago

    Why do we have to use and specify buckets?

    • @ayushthakur3880
      @ayushthakur3880 4 years ago +1

      For the MFCC transformation, the signal is first converted to the frequency domain using an FFT. This needs to be applied to small windows of the whole signal; the bucket specifies the length of those windows.
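
      An illustrative sketch of that windowed feature extraction using librosa (not the exact preprocessing code from the video's repo); the file name and parameter values here are assumptions:

      import librosa

      signal, sr = librosa.load("example.wav", sr=16000)   # mono waveform at 16 kHz

      # The FFT runs over short overlapping windows of the signal:
      # n_fft sets the window length, hop_length the step between windows.
      mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=20,
                                  n_fft=2048, hop_length=512)

      print(mfcc.shape)   # (n_mfcc, number_of_windows)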

  • @yasminebelhadj9359
    @yasminebelhadj9359 5 years ago +2

    Hi, can you please explain how you converted the audio files into useful data?

    • @cabbagenguyen801
      @cabbagenguyen801 5 years ago +1

      yasmine belhadj You can use a technique like MFCC... I'm using it for my project.

    • @yasminebelhadj9359
      @yasminebelhadj9359 4 years ago +1

      @@cabbagenguyen801 Thank you, I got it :D

    • @cabbagenguyen801
      @cabbagenguyen801 4 years ago

      @@yasminebelhadj9359 You're welcome ^^

    • @zohaibramzan6381
      @zohaibramzan6381 4 years ago

      @@cabbagenguyen801 What does MFCC do? Explain briefly. Also, explain how he converts the audio into useful data.

    • @cabbagenguyen801
      @cabbagenguyen801 4 years ago

      @@zohaibramzan6381 You can Google the keyword "speech feature extraction with mfcc".

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago

    Why not Pytorch?

  • @souha5188
    @souha5188 3 years ago

    How do I create a confusion matrix for this tutorial?

    • @WeightsBiases
      @WeightsBiases 3 years ago

      Hey Souha!
      We can make and log a confusion matrix for you, given the ground truth and the model predictions, with wandb.sklearn.plot_confusion_matrix. As the name implies, we use sklearn to generate the matrix, so head there if you want to calculate and plot the CM without logging it.
      See some examples of confusion matrix calculation, and our other scikit integrations, here: docs.wandb.com/library/integrations/scikit
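
      A minimal sketch of that call; the project name, label names, and the hard-coded ground truth and predictions below are placeholders (in the tutorial they would come from the test set and model.predict(...).argmax(axis=1)):

      import numpy as np
      import wandb

      y_true = np.array([0, 1, 2, 1, 0])      # placeholder ground-truth class indices
      y_pred = np.array([0, 2, 2, 1, 0])      # placeholder model predictions
      class_names = ["bed", "cat", "happy"]   # placeholder label names

      wandb.init(project="cnn-audio")         # assumes you are logged in to W&B
      wandb.sklearn.plot_confusion_matrix(y_true, y_pred, labels=class_names)
      wandb.finish()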

    • @souha5188
      @souha5188 3 years ago

      @@WeightsBiases thank you

  • @pricesmith1793
    @pricesmith1793 2 years ago

    New to ML here, but very much not new to audio. I have a specific use case with lots of data that I want to experiment with, involving six channels of low-sample-rate data rather than one. How would I go about separating the channels at the point where you opted to keep it at one channel?

  • @mysteriousartiest542
    @mysteriousartiest542 3 years ago

    Can we use the same code to make a model that identifies whether an audio clip is fake or real?

  • @zaphbeeblebrox5333
    @zaphbeeblebrox5333 3 years ago

    Great video! Thank you!!

  • @pablinsky2006
    @pablinsky2006 3 years ago

    Do you know where to find WAV files like the ones that you used?

    • @Dr.Funknstein
      @Dr.Funknstein 1 year ago

      Idk if you're still looking, but try Google's Speech Commands dataset.
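
      One hedged way to get clips like the ones in the video is the Speech Commands dataset via TensorFlow Datasets (whether the "speech_commands" name is available depends on your tensorflow-datasets version):

      import tensorflow_datasets as tfds

      # Loads the Speech Commands keyword-spotting dataset (waveform audio plus an integer label).
      ds, info = tfds.load("speech_commands", split="train", with_info=True)
      print(info.features)
      for example in ds.take(1):
          print(example["audio"].shape, example["label"])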

  • @science.20246
    @science.20246 4 years ago

    Is there an example with recurrent techniques like LSTM?

    • @kishpawar
      @kishpawar 4 years ago

      ua-cam.com/video/u9FPqkuoEJ8/v-deo.html hope this helps

  • @michaelfekadu6116
    @michaelfekadu6116 5 years ago

    Where is the data?

    • @WeightsBiases
      @WeightsBiases 5 years ago +2

      +Michael Fekadu can you elaborate?

    • @michaelfekadu6116
      @michaelfekadu6116 5 years ago +3

      ​@@WeightsBiases Sorry, I was not following along with the linked GitHub repository because I wanted to apply the knowledge from this video onto a different dataset. So, I did not realize that the save_data_to_array() and get_data_train_test() functions are inside of the preprocess.py file. Furthermore, the data is loaded from librosa via the librosa.load() call. In other words, I was watching the video out of context of the first video that suggests following along after setting up a local copy of the provided Git repository, which I had done previously and should have checked there before commenting.
      Thank you for checking in!
      Love the videos!

    • @WeightsBiases
      @WeightsBiases 5 years ago

      @@michaelfekadu6116 No problem, what are you applying this to?

    • @michaelfekadu6116
      @michaelfekadu6116 5 years ago +1

      Weights & Biases I plan to apply it to the DARPA TIMIT dataset that I found here:
      www.kaggle.com/mfekadu/darpa-timit-acousticphonetic-continuous-speech
      First I’ll need to write some python code that splits the data into just the words from the sentences using the time-aligned orthographic annotation files.

  • @karenhdez7735
    @karenhdez7735 3 years ago

    The video is amazing and it has helped me solve one of my projects. However, when I'm running the last part, validating the model, I get this error:
    AttributeError: 'NoneType' object has no attribute 'item'
    Could you help me, please?