SUPER Fast AI Real Time Speech to Text Transcribtion - Faster Whisper / Python

All About AI

Published 15 Jan 2025

COMMENTS • 141

@OliNorwell 10 months ago ⁺⁸
Epic! - These videos are some of the best stuff on YouTube - love the idea with the image generation at the end
@theraybae 1 year ago ⁺⁷
This is amazing and inspiring. I love the ending of the video and can’t wait for Wednesday. As a dyslexic person I think you unlocked a new use case for learning.
@bim-techs 1 year ago ⁺⁴⁴
Tip: You can turn your device's audio output into a "microphone" on Windows, so you don't need to place your headphones over your microphone.
1. Press Windows key + R -> type "mmsys.cpl"
2. In the Recording tab, enable the Stereo Mix option. Now, "Stereo Mix" is an available microphone option! You can select it as the audio input.
@weekendmakeit7760 1 year ago ⁺⁴
this really helped me! Thank you!
@aoeu256 10 months ago ⁺³
This is a great idea, I was using Voicemeeter as a virtual audio thingy and it's complicated to use
@TimothyHuey 2 months ago ⁺¹
I enabled the microphone but I don't know how to select it in the code. It doesn't hear anything when I start the app.
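On the question of selecting Stereo Mix in code: the script from the video is not shown in this thread, so what follows is only a sketch. It assumes the audio devices are available as a list of dictionaries with `name` and `max_input_channels` keys, which is how the `sounddevice` package reports them. `find_input_device` and `find_stereo_mix` are hypothetical helper names, not part of the video's code.

```python
def find_input_device(devices, name_fragment):
    """Return the index of the first input-capable device whose name
    contains name_fragment (case-insensitive), or None if none matches."""
    wanted = name_fragment.lower()
    for index, info in enumerate(devices):
        has_input = info.get("max_input_channels", 0) > 0
        if has_input and wanted in info["name"].lower():
            return index
    return None


def find_stereo_mix():
    """Look up Stereo Mix among the real devices on this machine.

    Assumption: the sounddevice package is installed
    (pip install sounddevice). It is imported here, not at module
    level, so the helper above works without it.
    """
    import sounddevice as sd

    return find_input_device(list(sd.query_devices()), "stereo mix")
```

The index returned could then be passed as the input device when the recording stream is opened, for example `device=` on a `sounddevice.InputStream` or `input_device_index=` on PyAudio's `open()`. If it comes back as `None`, Stereo Mix is probably still disabled in the Recording tab.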
@filipphenderson6342 10 months ago ⁺¹²⁹
Pulling in people with a flashy thumbnail of Python code that works and then trying to monetize code built on a library that is already supposed to be open source is, in my opinion, BS. It is not fair to beginners who might not know Python or Whisper very well. For that I give you a thumbs down!
@christianmccauley7340 5 months ago
Wow, an AI channel scamming people? Who would’ve ever heard of such a thing!
Tired of the fucking grifternet man, how did this happen?
@adhinabdallah5587 29 days ago
@@majestechtn can you please share the code
@Peter55Craig 27 days ago
@@majestechtn Can you share the code
@Squidling991 1 day ago
Ohh man!! This is the exact tutorial I need.
Great ideas and usage of the Whisper AI tool.
Also planning to combine different AIs into a UI.
So cool!
@reddyparthu5978 9 months ago ⁺⁹
how to get the code for this?
@jaujud 2 months ago
5:51 Neutral = I'm gonna go troll now. Funny stuff, great video! Thanks
@ArmandoMenicacci 1 year ago ⁺²
Fantastic !!! A bit fast in explaining and showing, but I can always pause!
@benscottbongiben 1 year ago
Good to see transcription and generating responses as audio in real time for phone calls
@cristobalmunoz84 6 months ago
Nice video!! Thanks for your help with these topics!!
@ReadyMedia-no 11 months ago ⁺⁶
There is a product for live video transcription there. Live text services are expensive and do not work for many current languages. Set up a server/service that ingests an RTMP video source, delays the video, and overlays text on the video in perfect sync, then offer RTMP output with burned-in live text. :) There is a need for this service.
@radudamianov 1 year ago
Excellent! Thank you so much for sharing!
@unrealminigolf4015 1 year ago
Awesome bro! ❤
@ItsNsour 9 months ago ⁺²
can it translate?
@keeganclarkmusic 6 days ago
Could you modify this program to work with a text to speech program?
@ferluisch 7 months ago ⁺¹
Hey man this is really cool! I'd like to know:
1) Did you use the Whisper v3 model, or v2?
2) If you have seen the GPT-4 demos, they also showed that GPT's ASR is better than Whisper v3. I wonder if it will be open like Whisper.
@enesgul2970 1 year ago ⁺¹
You are really very good.
@HammerOnTheNet 1 year ago
Amazing and inspiring work! Kris, what about something less powerful but more accessible in terms of hardware?
@JPy90 1 month ago
What if, instead of putting the headphones over the mic to receive the signal, you want to send the audio from another app, like YouTube or Zoom?
@jotixh 9 months ago ⁺¹
Is there a way to connect a live streaming url?
@svenborgers6908 10 months ago ⁺⁵
I have tried to get this to run on an M1 MacBook. No joy. The CPU maxes out even with the tiny model. But then I tried the whisper.cpp implementation, which is compiled for Apple silicon. I found a whisper-cpp-python wrapper for that library. That actually runs and is far less CPU-bound. It has a bit of a stutter, it is not as clean, and it misses words between the chunk processing, but you can see that with just a little bit more power it could work.
@MrThaitrinh 10 months ago
Hi Sven, could you please share your code with me? Thank you very much!
@calvinapollos 9 months ago
Great video! Thanks for going through this in such an easy-to-understand way! Can you share the python scripts?
@henrijohnson7779 10 months ago
@Kris: I already joined as an Adept member on Jan 18th 2024 and requested access to the GitHub repo via email and also via Discord, but have not had any response from you yet?
@ytemre 9 months ago ⁺¹
I became a member. How do I get access to the code and the GitHub repo for this?
@AllAboutAI 9 months ago
Hello :D Send me an e-mail at kris@allabtai.com
@prakashsahu-xn6qy 4 months ago ⁺¹
How can I get the code you used in this video? I need the same code.
@royzac7829 11 months ago
How does the transcription performance compare to AssemblyAI?
@bigswede88 5 months ago
Go Sweden! Well done
@t-dsai 1 year ago
Thanks for sharing your knowledge/experience.
I'm a bit perplexed. The description here mentions 45+ prompts in the PDF book, the newsletter website says 40+, and the PDF doc says 35+. Which number is correct?
@gcardinal 2 months ago
None, it's a scam.
@kimsteinhaug 1 year ago
Interesting stuff on the image creation at the end while talking. Not sure if you are taking punctuation in your sentences into consideration? I'm pretty sure this would have to do with something cool, maybe keeping an overview of all the text that has moved out of the "buffer" for style? Looks like something I could have a lot of fun with. I do not have the GPU though :/ Colab, however.
@maverick1901 11 months ago
Running fully local is one thing ... doing this via the Web Audio API towards a backend is a different topic. Is an implementation for that foreseen as well?
@hjoseph777 5 months ago
I have been looking for where to start. Fantastic work! Where can I get the code for testing?
@110gotrek 1 year ago ⁺⁹
Now make it translate and do phone calls
@rne1223 1 year ago ⁺¹
Noooo…pls nooo. We got plenty auto callers already.
@ibrahimelshenhapy9179 10 months ago
@@rne1223
Where?
@luluw9699 3 months ago
Hello ur computer has a virus
@isaacmasinde1994 5 months ago
Which GPU are you using?
@haloBean 9 months ago
Hi,
Can I get the GitHub repo for the code above?
Thanks
@George-kx8fl 11 months ago
Would it be possible to do speaker recognition and then pipe it into translation?
@unleashAI23 2 months ago
where do I get the code sir?
@fredericpaillot2570 1 year ago
Hi Kris! I love what you do. I would like to become a member of your channel, but I can't access the page to subscribe. Do you have a direct link? The one in the description doesn't work for me. Have a good day!
@JohannaKarlsson 3 months ago
Hello, and great to see this kind of content.
I actually have a question about speech to text in another language, for example Swedish, and passing it through Llama for correction, maybe for a meeting or conference or something like that. What do you suggest?
@claudiobalderrama1599 10 months ago
Do you think this could be used to transcribe, for example, phone calls made through the browser? I would greatly appreciate your response :)
@ryanjames3907 1 year ago
Wow!! Great video!!! Thank you for being so generous and teaching this to us, this is epic stuff! I can already start to see all kinds of use cases. I can't wait to get it running, and I'm really looking forward to Wednesday's video. Thanks again from Canada
@thedoctor5478 1 year ago ⁺¹
I think there's an even faster whisper module but I forget what it's called
@AustinKang-wk8cl 3 months ago
did you find out?
@maizizhamdo 8 months ago
I love your videos man, please make a video about faster-whisper as a Docker API
@renatox5288 2 months ago
faster whisper or whisper turbo?
@crazyforhyunwoo119 10 months ago
Can I do this with JavaScript?
@thebigbigdaddy 1 year ago ⁺¹
how can we identify different speakers?
@ickorling7328 10 months ago
Microsoft Copilot in a Teams call recording transcription. It can't be a simple call, it needs to be a meeting call... subtle difference. Try 'Meet now' in the Teams calendar view, or make a calendar event.
@magnoliasphinkter8622 4 months ago
thanks this is great! Where can I find the actual code you have on your screen? Struggling to find it on the github
@fewyayocx1280 1 month ago
Did you get it?
@danielgh4814 10 months ago ⁺¹
Hi, I'm a subscriber but I do not have access to your GitHub. Can you help me please?
@thnmanucian7993 9 months ago
Hello. I'm a beginner in this field. How can I get your code for reference? Thank you
@gurbachhansingh5715 3 months ago
Confused. Can you please create a step-by-step video and provide the code as well?
@martinvizar6430 11 months ago
Impressive, thank you
@Siri-tz7dz 9 months ago
where do i get the setup/python code
@harshitsingh3061 1 year ago ⁺¹
where can we get the code
@crazyforhyunwoo119 10 months ago
github linked in the description.
@kylebolt5861 11 months ago ⁺¹
How do we join your community?
@AllAboutAI 11 months ago
Link in desc :) youtube member
@najafzawar8168 11 months ago
@@AllAboutAI just subscribed to your channel but not getting GitHub code..
@RicardoMaciasYepez6913 7 months ago
Can this run on a Raspberry Pi?
@ShariqueAM 3 months ago
I want to do speech to text on audio from the browser speaker and not from the mic. How can we do that in real time?
@knmrt2760 9 days ago
It's easily done by running Virtual Audio Cable (VAC). You can google it and there are a lot of resources out there on how to use this software.
@saqqara6361 6 months ago
How do I access your source code as a paid channel member?
@leucome 1 year ago
Faster Whisper and Insanely Fast Whisper don't seem to have AMD GPU support yet, so I had to go with an alternative for the 7900 XT. I used whisper.cpp with CUDA/HIP + a distilled Whisper model. Seriously, this combination is kinda real-time too, even when using distil-large-v2. There is a downside, though: the TTS and Whisper on the GPU gobble up about 8 GB of VRAM, which limits the LLM I can use at the same time.
@aoeu256 10 months ago
This will be a good tool for language immersion (Chinese / Japanese / Indonesian), along with the DeepL clipboard tool and the Edge browser's TTS engine.
@nouriensha2873 4 months ago
Can I convert this code to C++ and implement it on an Arduino without an API?
@TonyHoangPodcast 9 months ago
Does it support speaker diarization?
@maxstauss9579 7 months ago
I can't find the script for the real-time translation, pls help me find it :((
@aseel6910 9 months ago
If there is any way to translate this text into other languages, it will be awesome
@agardner-to7vi 6 months ago
That is awesome. So I am trying to do something like this. My sister is deaf and I want something that can also label who is speaking. So for a small group it would say user 1, user 2, user 3, and let the person know whoever is speaking. Do you think that is possible? How could I do that? I've got everything but that last part.
@joaopaulonadal8484 11 months ago
How can I get access to this code?
@digitalsoultech 1 year ago ⁺²
The accuracy sucks. Many words are incorrect which you can see in the image itself.
This isn't usable in the real world.
@himanshujaviya6021 8 months ago
Can we get the code used in this video? That would be really helpful
@MuhammadArslan-ch3hx 1 month ago
Can I get the code of it?
@Edward_ZS 1 year ago
Has anyone updated the code from the previous video to use this recording method instead?
@huhaifan 3 months ago
cannot find the code in github
@nusretalikok823 1 year ago
where can we find the code that you used?
@crazyforhyunwoo119 10 months ago
github linked in the description
@eliasbosc 4 months ago
Can you pls share your code?
@AlexPopov-hv3kp 7 months ago
What is the transcribe_chunk function in the code? It seems that it's not from faster_whisper?
@roygatz 1 month ago
It's model.transcribe() from faster-whisper example
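To illustrate the answer above: `transcribe_chunk` is not a faster-whisper function, so it is presumably a small helper defined in the video's script. The actual code is not shown in this thread. The version below is a guess at its shape, built on `WhisperModel.transcribe()`, which returns a `(segments, info)` pair. The `load_model` helper, the model size, and the device settings are illustrative assumptions.

```python
def transcribe_chunk(model, audio_path):
    """Transcribe one recorded chunk and return its text as one string.

    `model` is expected to behave like faster_whisper.WhisperModel:
    model.transcribe() returns (segments, info), and each segment has
    a .text attribute. Iterating over the segments is what actually
    runs the transcription.
    """
    segments, _info = model.transcribe(audio_path, beam_size=5)
    return " ".join(segment.text.strip() for segment in segments)


def load_model(size="base"):
    """Hypothetical loader, assuming faster-whisper is installed
    (pip install faster-whisper). Size, device and compute type are
    example values, not the settings used in the video."""
    from faster_whisper import WhisperModel

    return WhisperModel(size, device="cpu", compute_type="int8")
```

A call would look like `transcribe_chunk(load_model(), "chunk.wav")`, where `chunk.wav` stands for whatever temporary file the recording loop writes.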
@lutusp 1 year ago
Hey, it's in your video description, therefore easily fixed: the word is "transcription". Why not avoid the irony of a video that extols modern AI voice to text ... transcription ... in which the AI engine will surely avoid this mistake, and at the speed of light.
@mattaylor-qg4yw 8 months ago
just joined. would be good to get my grubby paws on the files for this.
@عبدالرحيمعبدالرحيم-غ5غ 1 year ago ⁺²
could you do another demo to see how it can translate in real time?
@gregh7457 1 year ago
yes! there are no really good or fast translation apps available. YouTube auto translate is horrible!
@ahmedelkamash9323 8 months ago
how can we download this script?
@TimothyHuey 2 months ago
All I get is "Thank you! Thank You! Thank you!" as my transcribed output....so weird
@naczelnyh8rpolskiegoyt167 2 months ago
hey, same problem here, actually exact same problem, have you figured it out?
@TimothyHuey 2 months ago
@@naczelnyh8rpolskiegoyt167 Yes I did. I went to Sound Recorder and made a test to see what was actually being recorded and playing it back. There was No Sound. Windows wasn't recording anything for some reason. I guess when nothing is recorded, Whisper hallucinates "Thank You" or sometimes just "You." So weird. But anyway, had to find a way to get the mic that this app was working with to record sound. So I would investigate that route, find out if the mic that this app is accessing is actually hearing anything at all.
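One way to guard against the behaviour described above is to check whether a recorded chunk contains any signal before handing it to Whisper. The sketch below assumes the chunk is raw 16-bit little-endian mono PCM, which is a common recording format but not confirmed for the video's script. The function names and the threshold of 100 are illustrative and would need tuning per microphone.

```python
import array
import math
import sys


def rms_level(pcm_bytes):
    """Root-mean-square amplitude of 16-bit little-endian mono PCM data.

    The byte string must hold whole samples (an even number of bytes).
    """
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    if sys.byteorder == "big":
        # array uses native byte order, so swap on big-endian machines.
        samples.byteswap()
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(pcm_bytes, threshold=100.0):
    """True when the chunk is quieter than the (assumed) threshold,
    in which case it can be skipped rather than transcribed."""
    return rms_level(pcm_bytes) < threshold
```

Printing `rms_level` for each chunk also shows quickly whether the selected input device is delivering any audio at all. faster-whisper additionally accepts `vad_filter=True` in `model.transcribe()`, which drops non-speech audio before decoding and may reduce this kind of output.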
@kebman 1 year ago
The sentiment analysis really scares me. I mean, there's absolutely no chance that'll be abused by big tech in terms of political marketing. I mean, like, there's no way in hell right?
@erenkaraboga8570 10 months ago
Can we get the source code?
@Velnio_Išpera 10 months ago
Can you use different languages?
@maxstauss4821 7 months ago
I am a member but I can't access the GitHub, pls HELP
@maxstauss4821 7 months ago
this is my github
maxaxaxaxxaxaxaax
@ayax5055 1 month ago
Doesn't work because it calls supertools and no file is called supertools. Thanks for your time but ...
@kate-pt2ny 1 year ago
Kris, you are a genius. Real-time speech transcription can do a lot of things. The last example is great. I can’t wait to watch the video released on Wednesday. My computer is a Mac M chip computer. I found the code in your github and changed it to run on the CPU. Later, some problems occurred, such as incomplete transcribed content and OSError. Can you release a version suitable for Mac computers? grateful
@kebman 1 year ago
I might be jaded but... I mean really, how about an AI that calculates the probability of drone attacks or artillery attacks? How about an AI that calculates the probability of soldiers hiding in terrain? I mean, there are already good search algorithms out there, that one may-or-may-not use to carry out artillery strikes. I'm just thinking aloud here. Probably nothing.
@vallu-Tech 9 months ago
Bro, can you make a video about live-streaming voice to text?
@avgplayer 1 year ago
Waiting for the in-depth video :) Btw your Discord invite link has expired.
@gurbachhansingh5715 3 months ago
Please provide the code
@mujahidali2369 5 months ago
Well done
@slimshady91bat 5 months ago
But is it free?
@Onlyindianpj 6 months ago ⁺¹
This is a presentation, not a tutorial
@ScaryLasers 6 months ago
how do i get access to the github?? TAKE MY MONEY!
lol no but seriously how
@MiguelCayazaya 5 months ago
pip install patience and kindness
@fufu9352 10 months ago
Zero latency? I have been checking your video timeline. The terminal output and the audio do not correspond. You must be living in a world 1-2 seconds ahead of our timeline. 😅
@ramadanhasan1574 1 year ago
Where is the link to this source code? Thanks, amazing
@nafila5084 10 months ago
did you get the code
@ramadanhasan1574 10 months ago
no @@nafila5084
@KaMingLeung-kk6ey 9 months ago
@@nafila5084 Can you share the code with me as well?
@curtisnewton895 11 months ago
transcriPtion
@AlphaScraperOne 10 months ago
🧡
@tharosen-g4q 1 year ago
🎈
@gcardinal 2 months ago
What a disgusting practice of hiding very basic and poorly written code behind a paywall. No effort, no skill, GPT generated based on million dollar investments shared for free - slamming it behind a paywall is as low as you can get as a YouTuber. But you don't care.
@MarxOrx 1 year ago
BROOOO 🎉 FIRST
@aspect7083 1 month ago
This video is so bad, yo ass did not explain anything
