I honestly never take the time to comment or compliment on YouTube videos. You, my man, are simply amazing and I truly enjoy listening to you. Going all the way to your last video =)
Thank you for taking the time :)
I was just curious about sound processing and found your lecture series. After I started watching, I binge-watched the whole series! An absolute piece of art! PS: I started watching with absolutely zero knowledge of the subject.
bro your content is so helpful. very concise and straight to the point.
Thanks!
Hey Valerio,
This is just amazing content. I like the depth and the way you explain things in such simple terms. You satisfied my curiosity about this whole topic.
Love the understanding, the clarity of the content, and the excellent examples through applications! Love it.
I came here for language classification research, but now I'm amazed by the music side of things.
That's an incredibly deep rabbit hole :)
At 15:45, the picture that appears seems to have an error. The numbers of amplitude scale are out of order in binary notation.
Bravo! Excellent presentation of the material!
you are awesome! the best ML tutorial for audio signals
Thank you Joel!
Thank you so much for creating these videos, I am really enjoying them! I always worked with computers, music and sound when I was young, and still do. I have all the basic knowledge of programming, ML and music, but this goes so much deeper; I didn't know I liked this stuff. Thank you for creating a new passion for me. AI with music❤️❤️❤️
Great stuff Valerio, this is amazing content - very educational. When you cover the audio features, can you also cover MFCCs in depth, and how they are typically used? I have yet to see a good treatment of MFCCs that gives an intuitive feel for how they work.
Thank you! I intend to cover MFCCs quite in depth. Stay tuned :)
Amazing content Valerio! Thank you!
Loving your content. Thanks a lot!
@9:13 of the video, is it above or below the nyquist frequency?
Thanks a lot for this, It's helping through a project I'm working on. I'm really grateful
You're welcome!
Thanks for your wonderful job, beautifully done!
I’m not sure your aliasing demo actually has aliasing. Usually you would hear nasty artifacts when downsampling so much without an antialiasing filter. Audacity is likely applying an antialiasing filter to reduce the bandwidth of the signal before downsampling it.
appreciate all the details! great
Thank you for this amazing walkthrough; this is going to help me SO much with ML. Also, a question about the section at 17:37: do you know why we have to divide bit depth times sampling rate by 1,048,576 and then again by 8? Is there some resource on why this is the default? (I'm assuming this has to do with the way computers work.)
A bit can be either 1 or 0 (on / off).
A byte is a group of 8 bits. So if there are 8 bits, where each bit can only be 0 or 1, there are a total of 2^8 = 256 different values that a byte can represent.
So let's consider bit depth * sampling rate = 16 * 44,100 = 705,600 bits per second.
The video treats a megabit as 1,048,576 bits (mega means 1 million, but the closest power of two is 2^20 = 1,048,576; strictly speaking, that binary unit is called a mebibit). So 705,600 / 1,048,576 = 0.6729 megabits per second.
Remember that 8 bits = 1 byte? In the same way, 8 megabits = 1 megabyte. So we just need to divide by 8.
0.6729 / 8 = 0.0841 megabytes per second = 5.0468 megabytes per minute. I'm assuming the figure in the video is a typo.
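The arithmetic in the comment above can be sketched in a few lines of Python. This is only an illustration: the function name `audio_size_megabytes` and its parameters are made up for this example, it assumes uncompressed mono audio, and it follows the thread's convention of 1 megabyte = 1,048,576 bytes.

```python
def audio_size_megabytes(bit_depth, sample_rate_hz, seconds, channels=1):
    """Size of uncompressed audio, using 1 MB = 1024 * 1024 bytes (hypothetical helper)."""
    total_bits = bit_depth * sample_rate_hz * channels * seconds
    total_bytes = total_bits / 8          # 8 bits per byte
    return total_bytes / (1024 * 1024)    # bytes -> megabytes (binary convention)


# One minute of 16-bit, 44.1 kHz mono audio, as in the thread.
size_per_minute = audio_size_megabytes(16, 44_100, 60)
print(round(size_per_minute, 4))
```

Passing `channels=2` doubles the result, which is the usual case for stereo files.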
@@johnnyvishnevskiy8090 You're a freaking legend, thank you so much. This all makes sense and now I'm gonna read it like 1,048 times. lol
@@Drew_7 np!
Great one thanks ❤️
When you resample in Audacity, you are not hearing aliasing. Audacity uses a low-pass filter (as any good downsampler should) to avoid aliasing. What you're hearing is the high frequencies being filtered out.
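For anyone wondering what aliasing would look like if no low-pass filter were applied, here is a minimal sketch in plain Python. It is not what Audacity does internally; the helper `sample_cosine` and the chosen frequencies are assumptions for illustration. A 7 kHz cosine sampled at 8 kHz (Nyquist frequency 4 kHz) produces the same samples as a 1 kHz cosine, so after sampling the two tones cannot be told apart.

```python
import math


def sample_cosine(freq_hz, sample_rate_hz, num_samples):
    """Sample a unit-amplitude cosine at the given rate (hypothetical helper)."""
    return [
        math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]


sample_rate_hz = 8_000
nyquist_hz = sample_rate_hz / 2  # highest frequency that can be represented

# 7 kHz is above Nyquist, so it folds back to 8 kHz - 7 kHz = 1 kHz.
tone_above_nyquist = sample_cosine(7_000, sample_rate_hz, 16)
alias_tone = sample_cosine(1_000, sample_rate_hz, 16)

max_difference = max(abs(a - b) for a, b in zip(tone_above_nyquist, alias_tone))
print(nyquist_hz, max_difference)  # the difference is only floating-point noise
```

An antialiasing filter removes content above the Nyquist frequency before sampling, which is why a filtered downsample sounds duller rather than distorted.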
This is invaluable. Thanks so much
I got a little confused. Does aliasing mean we can hear frequencies higher than our hearing range after the signal is digitized?
Thank you for the good work. I have a question. (16x44100x60) / (8x1024x1024) = 5.046844
Why 5.49MB?
I guess he intended to say 5.047MB and made a typing error in the slide which he read out.
A question: does the music player in the computer assume/know that audio files are sampled at a particular frequency? Would the music player work fine if the audio files were sampled at a different rate?
Amazing content!!!
Thank you very much. It is a very educational video. ( But in the audio there are some short bass bursts. )
Valerio, thank you for this amazing work. You are helping me a lot, I am studying audio and you are answering all my questions. Do you have any book recommendation for me?
Thank you! Unfortunately, there aren't many resources about AI audio. I'm currently writing a book on the topic.
A book with "traditional" DSP approaches is "Fundamentals of Music Processing" www.springer.com/gp/book/9783319219448
@@ValerioVelardoTheSoundofAI I'll buy yours when it is ready!
@@JogosEtudoMais thanks!
Hi Valerio ! do you perform the conversion in librosa or it need to code from the beginning ? and can i have the example of pseudocode or algorithm for the adc conversion?
Is there any sense in buying headphones with a frequency range up to 80,000 Hz if humans are capable of hearing sounds "only" up to 20,000 Hz?
What is the memory storage format of audio and video files?
You're amazing, please keep it up!!!
I really enjoyed it. I couldn't understand the profile photo abstractness.
Can you list all the digital audio formats that are saved in memory? Please answer.
Would you suggest some reading material to accompany your videos?
Yes, this great book:
- Music Similarity and Retrieval www.springer.com/gp/book/9783662497203
@@ValerioVelardoTheSoundofAI Thanks for the suggestion.
Hi Valerio, I have one doubt. If an audio file has a sampling rate of 8000 Hz, is it correct to extract audio features after upsampling the audio to 44100 Hz or 32 kHz? Kindly give some suggestions on this.
I would stick with the files at 8 kHz, which has the advantage of resulting in lighter data.
@@ValerioVelardoTheSoundofAI I believe you are saying that I should keep 8000 Hz for audio feature extraction. Is it wrong, or will there be side effects, if I change the sampling rate to 44100 Hz and then extract audio features? Kindly reply.
@@venkatesanr9455 If you upsample, that would not hurt your signal... but it would require more data storage to accommodate more samples, as @Valerio mentioned. Moreover, if your signal is a bandpass signal around a carrier frequency (not a baseband signal), there is a limit on upsampling as well; I mean you can't upsample your signal to as high a rate as you want.
@@imamuddin8042 Thanks for your kind response, Sharif. I have recorded speech samples at a sample rate of 8000 Hz. While processing them, I wondered whether upsampling the speech data to 44.1 kHz or 32 kHz and then extracting audio features would have any effect.
@@venkatesanr9455 Since you have recorded a speech signal (which spans 20 Hz to 20 kHz) at a sample rate of 8 kHz, the recording already includes aliasing from content above 4 kHz. So I would recommend putting an antialiasing low-pass filter with a cutoff at 4 kHz in the analog domain, before you sample your data. Or you can use a sampling rate of 44.1 kHz from the start, instead of thinking about upsampling later.
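To put rough numbers on the two points made in this thread (upsampling costs storage, and an 8 kHz recording only holds content up to 4 kHz), here is a back-of-the-envelope sketch. The helper `nyquist_hz` and the variable names are made up for illustration; no actual resampling is performed.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent (hypothetical helper)."""
    return sample_rate_hz / 2


original_rate_hz = 8_000
target_rate_hz = 44_100

# Upsampling adds samples but cannot restore content that was never captured:
# the useful bandwidth stays at the original Nyquist frequency.
usable_bandwidth_hz = min(nyquist_hz(original_rate_hz), nyquist_hz(target_rate_hz))

# Number of samples (and therefore uncompressed storage) grows by this factor.
storage_growth = target_rate_hz / original_rate_hz

print(usable_bandwidth_hz, storage_growth)
```

In other words, features computed after upsampling see about 5.5 times more samples but no new information above 4 kHz.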
Thanks for the video
can you share the music?
What is the digital format in which audio is saved in memory?
By far the most relevant one is WAV.
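As a small illustration of the WAV format mentioned above, here is a sketch using Python's standard `wave` module. It writes one second of a 440 Hz tone to a temporary file and reads the header back; the file name `tone.wav`, the tone, and the chosen rate are arbitrary assumptions. The point is that a WAV file stores its own sampling rate and sample width in the header, so a player does not have to guess them.

```python
import math
import os
import struct
import tempfile
import wave

sample_rate_hz = 8_000
sample_width_bytes = 2          # 2 bytes per sample = 16-bit audio
num_frames = sample_rate_hz     # one second of mono audio

# Pack a half-amplitude 440 Hz sine as little-endian signed 16-bit samples.
frames = b"".join(
    struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / sample_rate_hz)))
    for n in range(num_frames)
)

wav_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
with wave.open(wav_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(sample_width_bytes)
    wav_file.setframerate(sample_rate_hz)
    wav_file.writeframes(frames)

# Read the header back: the rate and bit depth travel with the file.
with wave.open(wav_path, "rb") as wav_file:
    header_rate = wav_file.getframerate()
    header_bit_depth = wav_file.getsampwidth() * 8
    header_frames = wav_file.getnframes()

print(header_rate, header_bit_depth, header_frames)
```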
It is so so awesome.
this is great stuff
Thanks!
We have a project that recognizes speech. Can you help me with that?
If you need general feedback, I suggest you join The Sound of AI community (sign-up link in the description). If you need more involved help, I do consulting.
Ok thank you
Thanks a lot! =)
Great video. You look like a musician with that long hair. I wonder if you really play any musical instrument.
Indeed I'm a musician :) I play the piano.
@@ValerioVelardoTheSoundofAI That's wonderful
Acoustic instruments (like the piano) aren't analog instruments (: