Multi Speaker Transcription with Speaker IDs with Local Whisper
- Published Oct 1, 2024
- In this video, I will show you how to do speaker identification while doing speech-to-text transcription with Whisper. The speaker diarization is performed with the pyannote.audio package.
Want to Follow:
🦾 Discord: / discord
▶️️ Subscribe: www.youtube.co...
Want to Support:
☕ Buy me a Coffee: ko-fi.com/prom...
🔴 Support my work on Patreon: / promptengineering
Need Help?
📧 Business Contact: engineerprompt@gmail.com
💼Consulting: calendly.com/e...
Join this channel to get access to perks:
/ @engineerprompt
LINKS:
WhisperX Github: github.com/m-b...
pyannote Github: github.com/pya...
Transcription Notebook: tinyurl.com/mw...
Speaker Diarization Notebook: tinyurl.com/6v...
Whisper Transcription: • Use OpenAI Whisper For...
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu...
It is amazing; even better if you could do near real-time speaker diarization and speech-to-text.
How can I download the transcript in a .txt file? Thanks.
Is there a way to have live, real time transcription with diarization?
Also interested!
I would also like to know the answer to this
How do you get the .txt, .srt, .json, .tsv, and .vtt files? The model works, but I can't find these files after running it. Thanks in advance.
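Several comments ask how to get the text outputs. The WhisperX CLI writes these files automatically, but when you call it from Python you get back a result dict with a `segments` list and have to write the files yourself. A minimal sketch, assuming each segment dict carries `start`, `end`, `text`, and (after diarization) a `speaker` key:

```python
# Minimal sketch: write WhisperX-style segments to .txt and .srt files.
# Assumes result["segments"] entries carry "start", "end", "text",
# and (after diarization) a "speaker" key.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:01:02,500."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def write_outputs(segments, txt_path="transcript.txt", srt_path="transcript.srt"):
    # Plain text: one "[SPEAKER] text" line per segment.
    with open(txt_path, "w", encoding="utf-8") as txt:
        for seg in segments:
            speaker = seg.get("speaker", "UNKNOWN")
            txt.write(f"[{speaker}] {seg['text'].strip()}\n")
    # SRT: numbered cues with start --> end timestamps.
    with open(srt_path, "w", encoding="utf-8") as srt:
        for i, seg in enumerate(segments, start=1):
            srt.write(f"{i}\n")
            srt.write(f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n")
            srt.write(f"{seg.get('speaker', 'UNKNOWN')}: {seg['text'].strip()}\n\n")
```

The same loop extends to .vtt or .tsv by swapping the timestamp format and separators.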
This is unbelievably good timing for me! I just started researching how to do this with pyannote and whisper last night and gave up before starting to integrate the two, and woke up to this in my subscriptions 😅
glad it was helpful :)
Me too! I was building this from scratch using pyannote. I was considering using whisper, but was still sorting out the diarization aspect of it. I was planning on Diarization first, sorting out the gaps, etc. This may save me a considerable amount of time.
What if I want to use large-v3? Do I just change "model = whisperx.load_model("large-v2", device, compute_type=compute_type)" to model = whisperx.load_model("large-v3", device, compute_type=compute_type)?
After identifying two speakers, I want to completely cut out speaker 2 and create an audio file with only speaker 1 segments joined together. Is this possible?
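Cutting out one speaker is doable as a post-processing step: the diarized segments tell you which time ranges belong to each speaker, and any audio tool (e.g. pydub or ffmpeg) can then cut and join those ranges. A minimal sketch of the range-collection part, assuming WhisperX-style segment dicts with `start`, `end`, and `speaker` keys:

```python
# Sketch: collect the time ranges belonging to one speaker from
# diarized segments, merging ranges that touch or overlap.
# The audio cutting itself can then be done with pydub or ffmpeg
# using the returned (start, end) pairs.

def speaker_ranges(segments, speaker, gap=0.0):
    """Return merged (start, end) ranges in seconds for `speaker`.

    Ranges closer together than `gap` seconds are merged into one.
    """
    ranges = sorted(
        (seg["start"], seg["end"])
        for seg in segments
        if seg.get("speaker") == speaker
    )
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1] + gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend previous range
        else:
            merged.append([start, end])              # start a new range
    return [tuple(r) for r in merged]
```

Feeding the resulting ranges into ffmpeg's `-ss`/`-to` options (or pydub millisecond slices) and concatenating the pieces gives you a speaker-1-only file.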
That's great, thanks! Could you please explain how you turn the results into a readable txt file, as in the original whisper transcription?
I would love it if you could do real-time audio transcription using Whisper that outputs exactly the speaker, the start and end time, and the transcription at that point.
@PromptEngineering
This was working really well but it appears the latest WhisperX update broke the colab notebook. Could you please update it? Thanks
Is there a realtime audio transcription possible?
Yes, I have done it via the API, will make a tutorial on it if there is interest.
@@engineerprompt Would be nice to get this running locally for assistants like MemGPT, chat, or RP models
Thanks for the video. Would like to see Meta SeamlessM4T for speech-to-text and the reverse. It also supports more than 1,000 languages... I already use Whisper locally for speech-to-text.
>says local whisper
>shows only google colab
AssertionError Traceback (most recent call last)
in ()
1 embeddings = np.zeros(shape=(len(segments), 192))
2 for i, segment in enumerate(segments):
----> 3 embeddings[i] = segment_embedding(segment)
4
5 embeddings = np.nan_to_num(embeddings)
Hey, I am getting an assertion error; please let me know how to solve it. Thanks.
@engineerprompt I noticed you had AI instructions in the video for "separate speakers". Would you be able to create a video showing how you got MacWhisper to do this, and what the export results are? Thank you.
I completed all the steps and it works, but am I missing something? Isn't the purpose to transfer it into a .txt file so I can read it?
Can we also label the speakers with their names instead of saying SPEAKER_00 and SPEAKER_01?
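pyannote only emits anonymous labels, so mapping them to real names is a post-processing step you do yourself (you still have to decide which label is which person, e.g. by listening to the first segment of each label). A minimal sketch, assuming WhisperX-style segment dicts with a `speaker` key:

```python
# Sketch: rename generic diarization labels (SPEAKER_00, SPEAKER_01, ...)
# to real names via a user-supplied mapping. Labels not in the mapping
# are left unchanged.

def rename_speakers(segments, name_map):
    """Return a copy of segments with "speaker" replaced via name_map."""
    return [
        {**seg, "speaker": name_map.get(seg.get("speaker"), seg.get("speaker"))}
        for seg in segments
    ]
```

For example, `rename_speakers(result["segments"], {"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"})` before writing the transcript out.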
will the speaker ID Diarization work for multiple audio files with the same speakers? Like If I have multiple podcast episodes, will it always recognize the same speaker as the same ID ?
No, the speaker ID is local to a single file only. It will not be able to map the speaker IDs of two different audio files to each other.
How do I give custom instructions to my Whisper AI model in order to fine-tune it?
Why is using WhisperX faster than using pyannote directly?
Can you detect how many speakers are talking at the same time?
Hi... Can you please tell me how to transcribe, then translate, and then recognize multiple speakers?
This is so good. I wish there was a keyboard that uses Whisper locally on mobile. There is one, but it's not multilingual.
That's exactly what I was thinking! 😀
SwiftKey belongs to Microsoft, but I guess if they integrated Whisper into it, there'd be a huge spike in the computational resources needed for such a new feature rollout...
What's the name of the one you found (not multilingual)?
Sorry, I saw the message just now; the name is OpenAI Whisper Keyboard by Kai Soapbox.
But now I use FUTO voice input
You’re the best
Can we save the model by running it on Colab, then download the saved model to my CPU machine to do the transcription? Is that possible?
Yes, I think it's possible, but you will need a GPU to run it.
The Discord link is expired, OP.
Do we know how long it usually takes to get access to the diarization? I submitted my company name and website, but the API calls still aren't working after about 30min. Are those manually approved by the research team?
Try making a new token; that seemed to work for me.
Super useful, thanks!!!🙏🏻🙏🏻
Thank you
whisperx.load_align_model returned: "No default align-model for language: sl". Does this only work for English? :)
thanks a lot this was really good.
Thanks a lot!
Hi, does speaker diarization work with other languages?
It normally works for all languages supported by Whisper.
very very nice ! i like !
@Prompt Engineering And how does the transcript get exported? I don't see in the video where you show the export of the transcript to JSON, txt, srt, or vtt.
The result is JSON; you will just need to write it to disk.
@@engineerprompt I'm a newbie, how do I do this?
@@iluminathy3210 yes
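For anyone stuck at this step: a minimal sketch of writing the result to disk as JSON, assuming `result` is a WhisperX-style dict containing only plain Python lists, dicts, strings, and numbers:

```python
import json

# Sketch: persist a WhisperX-style result dict to disk as JSON.
# ensure_ascii=False keeps non-English characters readable in the file.

def save_json(result, path="transcript.json"):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
```

From there, the .txt and .srt outputs are just loops over `result["segments"]`.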
What do you use for adding subs to your youtube videos?
Descript.com
I really wish you had shown more end results of the diarization. I can barely tell if this will work for me. I really wanted to make sure it was worth the time and energy to make this happen.
waste of time, need standalone app
Care to venture a guess on how you would build something like this at scale?