How to Install & Use Whisper AI Voice to Text
- Published Jul 2, 2024
- In this step-by-step tutorial, learn how to transcribe speech into text using OpenAI's Whisper AI. Whisper AI is an AI speech recognition system that can transcribe and translate audio files in approximately 100 different languages.
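The commands the video walks through (one file, several files, a chosen model, another language, translation to English) can be sketched as a small Python helper that assembles the `whisper` command line without running it. This is only an illustrative sketch: `build_whisper_command` is a hypothetical name, the file names are placeholders, and the `small` default is an assumption taken from a viewer comment, since the tool's own default may differ by version. The flags `--model`, `--task`, and `--language` are ones the Whisper command-line tool accepts.

```python
import shlex


def build_whisper_command(files, model="small", task="transcribe", language=None):
    """Assemble (but do not run) a whisper command line as a list of arguments.

    Hypothetical helper for illustration; file names are supplied by the caller.
    """
    if not files:
        raise ValueError("at least one audio file is required")
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    # Several files can be listed one after another in a single command.
    command = ["whisper", *files, "--model", model, "--task", task]
    if language is not None:
        # Naming the spoken language skips automatic language detection.
        command += ["--language", language]
    return command


if __name__ == "__main__":
    # Placeholder file names, printed in a copy-and-paste friendly form.
    print(shlex.join(build_whisper_command(["interview.mp3"])))
    print(shlex.join(build_whisper_command(["a.mp3", "b.mp3"], model="medium")))
    print(shlex.join(build_whisper_command(["talk.mp3"], task="translate", language="Japanese")))
```

The resulting list could be passed to `subprocess.run` once Whisper and ffmpeg are installed; building it separately makes it easy to check the command before a long transcription starts.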
📚 RESOURCES
- Install Python: www.python.org/
- Install PyTorch: pytorch.org/get-started/locally/
- Install Chocolatey: chocolatey.org/
⌚ TIMESTAMPS
00:00 Introduction
00:40 Install overview
01:00 Install Python
02:31 Install PyTorch
03:55 Install Chocolatey package manager
04:53 Install ffmpeg
05:28 Install Whisper AI
05:59 Transcribe one file
07:18 Output files
07:58 Transcribe multiple files
08:39 Available models
09:51 Transcribe in other languages
10:31 Translate to English
11:06 Help
11:40 Quality
12:04 Uninstall
12:14 Wrap up
📺 RELATED VIDEOS
- Run Whisper AI in the cloud for free using Google Colab: • Best FREE Speech to Te...
😢 Uninstall instructions:
- Uninstall Whisper AI
In command prompt, enter:
pip uninstall openai-whisper
- Uninstall ffmpeg
In command prompt, enter:
choco uninstall ffmpeg
- Uninstall Chocolatey
In File Explorer, delete the folder:
"C:\ProgramData\chocolatey"
- Uninstall PyTorch
In Command Prompt, enter:
pip3 uninstall torch torchvision torchaudio
- Uninstall Python
Go to Installed Apps in Windows Settings, search for Python and Python Launcher, click the three dots, and then uninstall.
#stratvert #whisperai #openai - Science & Technology
Run Whisper AI in the cloud using Google Colab (requires no install and is also free): ua-cam.com/video/8SQV-B83tPU/v-deo.html
Didn't work for me. I just get error reports
Works great for me using Colab. Or on my hard drive. Both work great.
But here's something:
I have multiple Gmail accounts. And I have a number of tools, add-ons, extensions to Google Drive/Docs/Sheets, including Colab, Apps Script, etc.
And I initially set them all up on one Google account. But when I go to set up those same tools in my other Google Drive accounts, I get an error message, and can't do it.
It seems that I can't have stuff in Colab, for example, in more than one Google account.
Is there a way to use Whisper OFFLINE with the Windows installation?
@@francescooliva5951 once you install, you can use offline.
@@KevinStratvert So the only time I go online is to download the pre-trained model for the first time (tiny/medium/large, according to my choice)? I have an AMD Radeon 530 GPU, but Whisper doesn't seem to detect it. In fact, Task Manager shows 99% CPU usage. What is the average time to transcribe a medium-sized file?
Gosh, Kevin, this is the first video I've seen of yours and I am mightily impressed! I've been in IT for over 30 years and I can tell you that your presentation is one of the leanest and meanest I've ever seen. What a great contribution this is to the community. Thank you very much!
For those having issues with "file doesn't exist": make sure you add the file extension at the end, even if the file name doesn't show it. For example, if your file is named "file" and it's an mp3, then you must type whisper file.mp3. Hope this helps, because this was not specified.
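The tip above can be sketched as a small lookup that finds the full file name, extension included, when only the base name is known. This is a minimal sketch under stated assumptions: `find_audio_file` and the extension list are hypothetical and chosen for illustration, not part of Whisper.

```python
from pathlib import Path

# Illustrative list only; ffmpeg can read many more formats than these.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".mp4", ".mkv"}


def find_audio_file(folder, name):
    """Return the path of `name` in `folder`, trying known extensions if needed.

    Returns None when nothing matches.
    """
    folder = Path(folder)
    exact = folder / name
    if exact.is_file():
        # The name already included its extension.
        return exact
    # Otherwise look for files whose base name matches, e.g. "file" -> "file.mp3".
    matches = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.stem == name and p.suffix.lower() in AUDIO_EXTENSIONS
    )
    return matches[0] if matches else None
```

On Windows, File Explorer may hide known extensions, which is why a file can look like it is named just "file" even though its real name is "file.mp3".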
I need help: "FP16 is not supported on CPU; using FP32 instead". What does this mean?
@@lauram14 Nothing to worry about; it just means more RAM is used and it runs slower.
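To illustrate the reply above: the message is a warning, not an error. Whisper asks for half precision (FP16) by default and falls back to FP32 when running on the CPU. A minimal sketch of how one might request FP32 up front follows; `precision_args` is a hypothetical helper, while `--fp16` is an existing Whisper command-line option.

```python
def precision_args(device):
    """Return extra whisper CLI arguments for the given device name.

    Hypothetical helper: `device` is expected to be "cpu" or "cuda".
    """
    # Half precision is not available on the CPU, so ask for FP32 explicitly.
    # This should avoid the "FP16 is not supported on CPU" warning.
    if device == "cpu":
        return ["--fp16", "False"]
    # On a supported GPU, leave the default (FP16) untouched.
    return []
```

For example, `["whisper", "file.mp3", *precision_args("cpu")]` corresponds to typing `whisper file.mp3 --fp16 False`.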
Thank you, I was stuck for two weeks; now it works.
Still facing the issue with an m4a file... Is it possible that only certain file types are accepted?
wait why are you here?
This is probably my favorite video on YouTube ever. It is amazing. It takes a process that I found complicated and turns it into easy to follow steps. It actually takes what could be stress inducing and makes it relaxing with some unintentional ASMR presentation. Very well done.
Thank you for doing a complete walkthrough, unlike so many other YouTubers who act like they're being thorough but later find out they're skipping small but essential steps as if we already know!
Amazing walkthrough. Thank you. You've made something that would have been overwhelming for me and taken me hours (if I could do it at all) seem so easy and I was done in under half an hour!!
Thank you, Kevin, for sharing your knowledge and teaching skills in this and your other YouTube contributions. I followed this YouTube video to the letter and was able, with only a few hiccups (of my making), to transcribe very important audio files my wife recorded on her iPad. My Win 10 PC did the job flawlessly to my wife's stringent specifications. Happy wife, happy home. I first tried your excellent video, "Audio to Text" which was satisfactory for very small audio files due to the limited capacity available through Microsoft. The AI system worked very well on a 6 MB audio file (four pages of text in a MS Word file). I haven't yet tried a larger file size but believe it would work fine for larger files. Again thank you for all you do, for sharing your selfless talents and wonderful passion for what you present.
Dude, I can't tell you how many times you've saved me in my IT job. I'm an AV specialist and have never used Python, but with your videos I've been able to use Whisper on Google Colab for short pieces and with this, a 40 minute piece, with no struggle. Your written and video guides are pretty incredible.
I had previous success with your Stable Diffusion video for a local install. It was the only one I found that was clear and perfectly detailed! This video also was excellent, I just followed your step by step instructions and everything is working great!
Amazing job Kevin. My first attempt at installing Whisper was bad, but your video had me running in no time.
Hi Kevin. Been watching you for a while and just want to say thanks for all the explanations. Concise and interesting. You've helped me a lot and, again, I thank you. Keep it rolling!
It worked after some serious debugging but couldn't have done it without this video. Thank you a ton!!
You know, this is one of those videos that you wish you could like 100 times. Much appreciated, man. Amazing video. Thank you so much. Subscribed
Wow, many thanks Kevin. I had my own videos that I was planning to do Voiceover and found it very difficult to listen to and translate the video, this way I was able to generate Arabic text and it is pretty good and even the translate feature to English is excellent. This video solved a lot for me, and I have tested it, and very promising. Many thanks again.
Another incredibly useful video and so very easy to follow as well! It works perfectly for my large assembly recordings. Thanks so much Kevin. You're such a great teacher, I just love your stuff!
YOUR VIDEO IS AMAZING!!! It helped me so much with learning languages. I used this Whisper program to convert speech to text, and then I used ChatGPT as a super translator. IT IS ABSOLUTELY AMAZING. Thanks to this video, I did 4 days' worth of work in 1 day. The quality of Whisper is absolutely amazing. Kevin Stratvert is the BEST. Thank you!
This was indeed a helpful video, even if I wish you skipped package managers for ffmpeg installation. I got Whisper installed and working, testing transcription on a recording of a 70 minute meeting. With a fairly muscular PC, I tried with small, medium and large models. Surprisingly I got more accurate results with small, in addition to quicker results. Great tool, wonderful intro.
I was recently thinking how great it would be to have Whisper local, instead of online only. And, voila, here's Kevin! Readin' minds, and don't even know it; well, you do now. Thanks!
Many thanks for another excellent video. Some of the versions from this video have been updated but I was able to find the ones you mentioned so everything is working as expected.
I teach English (and digital literacy) and sometimes wanted a transcript for an interesting podcast. This is great as it is free, has no time limit and offers other languages which I am keen to test soon. I also like your video which shows how to use the online version of Word.
Btw, I use some of your tutorials in the classroom for my MS app classes and the students love your videos too. The only adjustment I do is slow down the playback as it is sometimes a little too fast for my learners :).
Many thanks again and please keep up the good work!
This is the first time I got a video playing straight after its release!
It worked on Python 3.11.4 and the latest PyTorch! I used a CPU and a 1 minute speech took 4 minutes to be transcribed using the small model, 10 minutes using the medium. The installation in the cloud (00:35) is much faster, with the result in under 1 minute. The medium model can recognize technical words. Thanks for showing this tool.
Thanks again, Kevin, for a very useful video. Nice to see Python at work. It reminds me of old-time programming - at least a little. I am 71 and wrote my first program using punch cards... :)
I dropped my whole stack of punch cards once :)
I was just telling my buddy about that. I think AI is going to be as big a jump as punch cards/numbered lines to named variables was
@@noreenstxs9605 I did that back around 1970!
I'm 72... Cards.. IBM 1130 Fortran Apple 2 Pascal 😅
@@hubertmallard7254 Yeah, same here. Programmed in Fortran, Cobol, Pascal, etc. What about the TRS-80? Remember those?
Thank you very much Kevin. Your channel helps even laymen like myself appear like tech nerds when I share these solutions with friends. And I always recommend your channel to them.
I use this to watch movies in other languages and this has boosted my language skills more than anything. I feel like language learners are never thought of because they make up such a small percentage of any user base and are quite silent. Thanks Kevin.
This is why I love internet! To execute a neural network you just have to follow simple guidelines!
There are issues and stuff to figure out yourself, but this is such a great jumpstart!
This is one of the best step-by-step instructions I've ever seen. Thank you!
It is really amazing how good it is at transcribing songs! Using that for my home build arranger/karaoke keyboard :)
Thank you so much. Great instructions with exactly the right level of detail. Got whisper running on first try.
Thank you! This is exactly what we needed to transcribe our tiny DnD podcast!
It's a great help to sort and summarize important info from a vidz 😊. Thank you mr. Kevs!
Wow! Really impressed how quick and easy this was. Would love a follow up video on how to incorporate something like pyannote to this so that we can also have speaker diarization!
As an educator, I really like your style of explaining. Thanks.
Love these videos, Kevin Keep them coming man!
Thank You! I was able to transcribe my mp3 file. Excellent technology for next week's online course.
Thank you, Kevin, for sharing your walkthrough. I'd been looking at paid platforms for transcription. So easy when you know how.
So useful and clearly presented, never stop making videos
bless your soul, my assignment would've never been submitted on time if it weren't for this video 🙏
Incredibly helpful. Thank you.
Whenever I want to use some (free and very useful) open-source tool, I'm always baffled by how difficult and unintuitive it is to get it running by yourself.
Outstanding tutorial as always Kevin. Thank you. I used this to transcribe my recording of a 45-minute webinar so I could read along and highlight as I listened to the replay. It took just 11 minutes on my high-end gaming computer with a Geforce RTX-3060 Ti graphics card. Very useful tool!. ‼
Sounds great, which model did you use? The default small model or a higher one?
@@generalgeert I used whisper --model medium
Which CPU did you have for that transcribe? Thank you
I have an RTX 7800xt, but when transcribing it is the CPU that does the work... how do I use the GPU?
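On the GPU question above: Whisper runs on PyTorch, and it uses the GPU only when PyTorch itself reports a usable CUDA device. A minimal sketch for checking this follows. `pick_device` is a hypothetical helper, while `torch.cuda.is_available()` is an existing PyTorch call. As a hedged note, PyTorch's CUDA builds target NVIDIA cards, so a CPU-only PyTorch install or a non-NVIDIA GPU would both typically lead to the CPU being used.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu".

    Hypothetical helper; falls back to "cpu" if PyTorch is not installed.
    """
    try:
        import torch  # imported lazily so the check works without PyTorch too
    except ImportError:
        return "cpu"
    # True only with a CUDA-enabled PyTorch build and a supported GPU + driver.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Whisper would most likely run on:", pick_device())
```

If this prints `cpu` on a machine with an NVIDIA card, reinstalling PyTorch with a CUDA option selected on the PyTorch install page is a common thing to try. The device can also be stated explicitly with the `--device` option of the `whisper` command.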
amazing tutorial. Thank you for this super high quality well thought out tutorial. went super smooth.
Excellent how-to, easy to follow and descriptive. Thanks!
UPDATE:
This is truly the holy grail.
For technical writers, journalists, people who do tons of interviews that need accurate transcription. For paralegals.
This is a game-changer.
I had used the one via Colab before, per Kevin's instructions.
But you are limited to three transcripts a day or something.
With this on my HARD DRIVE, I can translate multiple files.
I assume there's no cap, no limit.
Getting the transcript in all those multiple formats Kevin shows in the video? Almost too good to believe.
I don't have a dedicated graphics card, so I chose "CPU." (Hence the slowness, I reckon.)
I DO have an i7 processor.
But it's a laptop with only 8GB of RAM, and no ability to add more.
I want a desktop so that I can upgrade RAM, get a dedicated graphics card, upgrade processors, etc.
For more of this kind of thing. Automation. Some heavy lifting.
------------------
Okay. Seems to be running. Slowly, but running.
I now have at least two different versions of Python installed on my PC. Installed 3.10.10 just for Whisper. Already had 3.11 to run globally.
I always make choosing an installation location more complicated than it has to be.
But I don't want to run into compatibility problems with the various versions of Python -- plus, I don't know what the implications are as far as Environment Variables, and the fact that the various versions all have to call ffmpeg, Chocolatey, or Selenium, or whatever.
I installed 3.11 in the default location for Program Files.
I installed 3.10.10 in a folder directly on the C drive that I created for it, called python-3-10-10.
I think that part of the key to success here is following kevin's protocol of going to the folder where the audio files are at, and typing CMD directly into the address bar FROM THERE. (I've seen one or two other vids about python. No one mentioned this good tip.)
Anyway, with my limited knowledge, I think it's like this:
I've installed the following globally:
Chocolatey
ffmpeg
Python 3.11
PyTorch.
Then, I've installed 3.10 locally, in a folder on the C drive.
I bring my audio files into that 3.10 folder, enter CMD into the address bar there, and all is well.
I'm running 3.10, and still, I guess, accessing all those global resources that I need to.
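With more than one Python installed, as described above, it can be unclear which interpreter and which `whisper` command a Command Prompt window actually uses. The following is a minimal sketch for finding out; `describe_python` is a hypothetical helper built only on the standard library.

```python
import shutil
import sys


def describe_python():
    """Report which Python is running and where the whisper command resolves.

    Hypothetical helper; "whisper_on_path" is None when the command is not found.
    """
    return {
        # Version of the interpreter executing this script, e.g. "3.10.10".
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Full path of that interpreter.
        "executable": sys.executable,
        # First `whisper` found on PATH; this is what typing `whisper` would run.
        "whisper_on_path": shutil.which("whisper"),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(f"{key}: {value}")
```

The folder where CMD is opened only sets the working directory for the audio files. Which Python and which `whisper` run is decided by the PATH environment variable, so the location of the audio files should not need to match the Python install folder.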
I have a similar hardware setup (no CUDA, only CPU), and have been wondering how long it takes to transcribe a 1-hour video file using the large model. What's been your experience?
@@antipupsz2411 Yup, I believe it was faster using Colab. The advantage of using it on your hard drive, though, is transcribing multiple files. Set and forget it, go outside.
@@antipupsz2411 Hey, how did you manage to transcribe 1 hour? I tried a 1-hour .mkv file, but every time it only transcribes 1 minute :(
@@etnisu You have to wait a lot for it to keep transcribing
@@antipupsz2411 Hey I'm transcribing one hour as well and it's been like 4 hours and only now it's halfway. I'm using medium model with my 6GB GPU and this is very slow. How long did it take for you?
You are a legend. What an amazingly helpful and easy to listen to tutorial on this.
Awesome! Thank you so much. You helped me actually get this to work (after watching several other videos!).
My brother you have saved me literally over a thousand hours of work. This made a life-changing improvement on my productivity
Running flawlessly for me. What a fantastic guide. I had to download latest version of pip to work but no hitches installing anything for me.
dude thank you this actually worked compared to other tutorials!
Great instructional video. Clear and informative. Thank you.
Bro I dunno what to say but this is the thing that I have been looking for. Thank you a lot.
This exercise gave me some solid experience troubleshooting errors. I had to pull teeth to get Homebrew (using a Mac) to install properly, and then had an SSL certificate error, but Google & Stack Overflow came to the rescue, and Whisper is working like a charm. Thanks for the great video!
By the way, if anyone gets an SSL certificate error using Python3 (which apparently is common), just enter the following in terminal, exactly as written (but check your version*):
/Applications/Python\ 3.11/Install\ Certificates.command
* Just adjust the version number to match your release, in the example above, I updated it to 3.11
people like you further motivate me to share my knowledge with the internet. Thank you so much! you have saved me a ton of time.
I have this problem but when I try your solution the terminal says: "no such file or directory: /Applications/Python" Do you know how to fix that?
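The "no such file or directory: /Applications/Python" message in the reply above suggests that the shell split the path at the space, which can happen when the backslashes are lost while copying. A hedged sketch of building and quoting the path follows. `certificates_command` is a hypothetical helper, and the location is an assumption that holds only when Python was installed with the python.org macOS installer.

```python
import shlex
from pathlib import PurePosixPath


def certificates_command(major, minor):
    """Build the path of the certificate installer for a given Python version.

    Assumes a python.org installation under /Applications (an assumption;
    other installation methods may not provide this script at all).
    """
    return PurePosixPath(f"/Applications/Python {major}.{minor}") / "Install Certificates.command"


if __name__ == "__main__":
    path = certificates_command(3, 11)
    # shlex.quote wraps the path in quotes so the spaces survive in the terminal.
    print(shlex.quote(str(path)))
```

Pasting the quoted form into Terminal avoids the need for backslashes. If the file does not exist at all, Python was probably installed another way and this particular fix would not apply.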
Kevin, you are my tech genius! That came in the right time.
Thanks heaps for your amazing video:)
Very useful thanks. As always very clear succinct videos
Awesome tutorial. Thanks Kevin. Whisper AI is an amazing tool.
Very cool my dude, thank you for helping with this. I would have never gotten this on my own
Thanks so, so much for this great program. Right now I am running an English school. Covid-19 has hit my business really badly. I will share this useful app to help my students. Again, thanks so much.
Crystal clear tutorial. Worked the first time trying. Thanx buddy! 😁
Great to hear!
Incredible!! Thank you very much, I understood everything, it was super clear. Bravo, keep it up.
Thank you Kevin for what you do. I followed the instructions. I added the following in case some newbies wanted this.
I installed Python version 3.11.5 in Windows 11 and it works fine. In Windows Explorer, I created a folder under the C: Drive called Whisper. I then copied my mp3 audio file (from data drive) to C:\Whisper, typed in cmd in the address field to bring up the Command Prompt, and then typed
whisper filename.mp3 --model medium [and then Enter].
A 36-minute conversation (50mb) took a little over 39 minutes to run. I then cut all the files from C:\Whisper and pasted them into a folder on my data drive. Then I copied the text version into a version of Word that I don’t pay a monthly fee for and saved it. 😊
Hope this helps someone.
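The timing reported above (a 36-minute recording taking a little over 39 minutes with the medium model on that commenter's machine) can be turned into a rough planning estimate. This is a back-of-the-envelope sketch with hypothetical function names. It assumes processing time scales roughly linearly with audio length on the same hardware and model, which may not hold exactly.

```python
def real_time_factor(audio_minutes, processing_minutes):
    """Processing time divided by audio length (1.0 means real time)."""
    if audio_minutes <= 0:
        raise ValueError("audio_minutes must be positive")
    return processing_minutes / audio_minutes


def estimate_processing_minutes(audio_minutes, factor):
    """Scale a new recording's length by a previously measured factor."""
    return audio_minutes * factor


if __name__ == "__main__":
    factor = real_time_factor(36, 39)  # figures from the comment above
    print(round(factor, 2))  # 1.08
    # Rough estimate for a 60-minute recording on the same setup.
    print(round(estimate_processing_minutes(60, factor)))  # 65
```

Other figures in this thread vary widely (for example, 11 minutes for a 45-minute webinar on a GPU), so a factor measured on one machine should not be reused for another.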
I tried Python 3.11.5 too, but every time I go into my C:\Whisper folder, type CMD in the address bar, and then type whisper test.wav, it says:
FileNotFoundError: [WinError 2] The system cannot find the specified file
Do you know a solution?
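One common cause of this error, offered here as a possibility rather than a diagnosis, is that ffmpeg cannot be found: Whisper calls the `ffmpeg` program to read audio, and Windows reports a missing program as "cannot find the specified file" even when the audio file itself exists. A minimal sketch for checking follows; `missing_tools` is a hypothetical helper built on the standard library's `shutil.which`.

```python
import shutil


def missing_tools(tools=("ffmpeg", "whisper")):
    """Return the names of command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and whisper were both found.")
```

If ffmpeg is reported as missing right after installing it, opening a new Command Prompt window is worth trying first, since PATH changes are usually not picked up by windows that were already open.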
Well, I got it to work so I'm good. Your instructions are excellent!
Appreciate your teaching Kevin, love and respect from Singapore :)
I was able to transcribe and translate audio with Whisper!! Thank you so much!!
Amazing! Thanks for such a helpful video, dude!
Brooo! The CMD trick is so good!
This was very helpful. Thanks a lot!
Well, it is a wonderful video and useful too, but it's taking longer time to load the transcript. Thanks to you Kevin!!
Thank you for great video, Kevin!
Awesome work Kevin. Subscribed
God bless you! Thank you for explaining the process in a simple and easy to follow way.
THANK YOU - this tutorial is fantastic.
It's working !!! Thank you for help ))
Excellent tutorial. Great job. Thank you
Thanks, Kevin. Super helpful!
This is great, but I wish there was a way to output in a Word document and segment by speaker - more a comment on whisper functionality than the video. Great work!
Worked for me. Thanks... good content.
Fantastic video. I'm going to grab the transcript and start installing on another i7 laptop and see what happens. Thank you sir!
Thank you very much, great walkthrough, and thanks for the uninstall information too.
great explanation and all straightforward
extremely good tutorial! Thank you!
just 2 words for you.. you are incredibly awesome.
Works perfect! Thanks!
Thanks for the clear instructions to use the tool. It works on python 3.11.1 also albeit with a few errors that can be ignored
I'm running it on python 3.11.5 error free.
great illustration and I have successfully installed it on my computer. Thank you @kevin
omggggg!!!!!!!! this was so smooth .... Thanks!!
Worked for me, thank you.
Yes, I was! Thank you very much!😃
thank you for that guide, simple and to the point, but full of info, like.
And yes, I've installed and used Whisper, and it works. It sometimes loses the correct word endings or picks the wrong letter, but the transcription quality is insanely good, even for ordinary Russian speech.
Hi Kevin, thank you for the training video.
Thank you for all the time you put into making this step-by-step guide, all worked, yay! It did, however, take over 2.5 hours to transcribe a 40-minute interview in .wav. Is that how it's supposed to be? Anyone else noticing similar sluggishness? 🤔
Just awesome!!! Thanks a TON :buddy )
Your content is a jewel, ty!
I was not able to continue, since you covered Chocolatey but gave no further explanation for Mac users.
Anyway awesome work.
I would have never managed without this video. Thanks man
Also, 2.7 Gb for Torch. Wow!!
Good tutorial! Easy to follow
Thanks! Worked perfectly ;)
Super helpful, thank you so much
Amazing video dude, thanks!
Nice video. I was able to install Whisper on Ubuntu 22.04 LTS and transcribe without a hitch.😄 Well done.
worked perfectly TY
Thank you for the details. I like that your tutorial is logical and explains things from the ground up. I am curious about how the text is split into clips. What is the split based on? If the audio is a two-person conversation, will each clip correspond to one person? I am stuck on speaker identification using Whisper.
This worked great!