Multi-Speaker Transcription with Speaker IDs Using Local Whisper

  • Published 1 Oct 2024
  • In this video, I will show you how to do speaker identification while doing speech-to-text transcription with Whisper. Speaker diarization is performed with the pyannote.audio package. (A minimal pipeline sketch follows the links below.)
    Want to Follow:
    🦾 Discord: / discord
    ▶️️ Subscribe: www.youtube.co...
    Want to Support:
    ☕ Buy me a Coffee: ko-fi.com/prom...
    🔴 Support my work on Patreon: / promptengineering
    Need Help?
    📧 Business Contact: engineerprompt@gmail.com
    💼Consulting: calendly.com/e...
    Join this channel to get access to perks:
    / @engineerprompt
    LINKS:
    WhisperX Github: github.com/m-b...
    pyannote Github: github.com/pya...
    Transcription Notebook: tinyurl.com/mw...
    Speaker Diarization Notebook: tinyurl.com/6v...
    Whisper Transcription: • Use OpenAI Whisper For...
    All Interesting Videos:
    Everything LangChain: • LangChain
    Everything LLM: • Large Language Models
    Everything Midjourney: • MidJourney Tutorials
    AI Image Generation: • AI Image Generation Tu...
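
    For readers who want the gist without opening the notebooks, here is a minimal sketch of the WhisperX transcribe-align-diarize pipeline, based on the WhisperX README; the audio path and HF_TOKEN are placeholders you must supply, and pyannote's diarization model is gated behind a Hugging Face access request:

        import whisperx

        device = "cuda"           # or "cpu"
        audio_file = "audio.mp3"  # placeholder path
        HF_TOKEN = "hf_..."       # placeholder Hugging Face token

        # 1. Transcribe with the batched faster-whisper backend
        model = whisperx.load_model("large-v2", device, compute_type="float16")
        audio = whisperx.load_audio(audio_file)
        result = model.transcribe(audio, batch_size=16)

        # 2. Align word timestamps with a wav2vec2 model for the detected language
        model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
        result = whisperx.align(result["segments"], model_a, metadata, audio, device)

        # 3. Diarize with pyannote and attach speaker labels to segments and words
        diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
        diarize_segments = diarize_model(audio)
        result = whisperx.assign_word_speakers(diarize_segments, result)

        print(result["segments"])  # each segment now carries a "speaker" key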

COMMENTS • 58

  • @AngusLou
    @AngusLou 5 months ago +5

    It is amazing; even better if you could do near real-time speaker diarization and speech-to-text

  • @jorgemorales5584
    @jorgemorales5584 14 days ago +1

    How can I download the transcript in a .txt file? Thanks.

  • @grantsolomon7417
    @grantsolomon7417 9 months ago +5

    Is there a way to have live, real time transcription with diarization?

    • @Tournicoton
      @Tournicoton 5 months ago +4

      Also interested!

    • @jamesonvparker
      @jamesonvparker 2 days ago

      I would also like to know the answer to this

  • @serroipjlan-lp4gv
    @serroipjlan-lp4gv 8 months ago +2

    How do you get the .txt, .srt, .json, .tsv, and .vtt files? The model works, but I can't find these files after running it. Thanks in advance.

  • @BenoitStPierre
    @BenoitStPierre 10 months ago +17

    This is unbelievably good timing for me! I just started researching how to do this with pyannote and whisper last night and gave up before starting to integrate the two, and woke up to this in my subscriptions 😅

    • @engineerprompt
      @engineerprompt  10 months ago +2

      glad it was helpful :)

    • @station2040
      @station2040 10 months ago +1

      Me too! I was building this from scratch using pyannote. I was considering using Whisper but was still sorting out the diarization aspect of it. I was planning on diarization first, sorting out the gaps, etc. This may save me a considerable amount of time.

  • @SmogyKev
    @SmogyKev 13 days ago

    What if I want to use large-v3? Do I just change model = whisperx.load_model("large-v2", device, compute_type=compute_type) to model = whisperx.load_model("large-v3", device, compute_type=compute_type)?
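
    That should be all it takes, assuming the installed WhisperX/faster-whisper version ships the large-v3 weights; everything downstream of load_model stays the same:

        # Swap the checkpoint name; the rest of the notebook is unchanged.
        model = whisperx.load_model("large-v3", device, compute_type=compute_type)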

  • @AbrahamKoshy
    @AbrahamKoshy 2 months ago +1

    After identifying two speakers, I want to completely cut out speaker 2 and create an audio file with only speaker 1 segments joined together. Is this possible?
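
    One way to do this (not covered in the video) is to slice the original audio with pydub at the diarized segment boundaries; a sketch, assuming result is the WhisperX output from the notebook and SPEAKER_00 is the speaker you want to keep:

        from pydub import AudioSegment  # pip install pydub (needs ffmpeg installed)

        audio = AudioSegment.from_file("audio.mp3")  # placeholder path
        keep = AudioSegment.empty()

        # Concatenate every segment attributed to SPEAKER_00 (pydub slices in ms)
        for seg in result["segments"]:
            if seg.get("speaker") == "SPEAKER_00":
                keep += audio[int(seg["start"] * 1000):int(seg["end"] * 1000)]

        keep.export("speaker1_only.mp3", format="mp3")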

  • @arthur3183
    @arthur3183 4 months ago +1

    That's great, thanks! Could you please explain how you turn the results into a readable .txt file, as in the original Whisper transcription?

  • @adhirajsingh483
    @adhirajsingh483 10 months ago +2

    I would love it if you could do a real-time audio transcription using Whisper that outputs exactly the speaker, the start and end times, and the transcription at that point.

  • @MrJames-d5x
    @MrJames-d5x 9 months ago +1

    @PromptEngineering
    This was working really well, but it appears the latest WhisperX update broke the Colab notebook. Could you please update it? Thanks

  • @Hasi105
    @Hasi105 10 months ago +2

    Is real-time audio transcription possible?

    • @engineerprompt
      @engineerprompt  10 months ago +10

      Yes, I have done it via the API, will make a tutorial on it if there is interest.

    • @Hasi105
      @Hasi105 10 months ago +1

      @@engineerprompt Would be nice to get this running locally for assistants like MemGPT, chat, or RP models

    • @henkhbit5748
      @henkhbit5748 10 months ago

      Thanks for the video. I would like to see Meta's SeamlessM4T for speech-to-text and the reverse; it supports more than 1,000 languages too... I already use Whisper locally for speech-to-text.

  • @J-io1uj
    @J-io1uj 7 months ago +4

    >says local whisper
    >shows only google colab

  • @shikharmishra7208
    @shikharmishra7208 7 months ago

    AssertionError                            Traceback (most recent call last)
    in ()
          1 embeddings = np.zeros(shape=(len(segments), 192))
          2 for i, segment in enumerate(segments):
    ----> 3     embeddings[i] = segment_embedding(segment)
          4
          5 embeddings = np.nan_to_num(embeddings)

    Hey, I am getting an assertion error; please let me know how to solve it. Thanks

  • @bryguy7290
    @bryguy7290 1 month ago

    @engineerprompt I noticed you had AI instructions in the video for "separate speakers" - would you be able to create a video showing how you got MacWhisper to do this, and what the export results are? Thank you.

  • @jasonhoneychurch1
    @jasonhoneychurch1 3 months ago

    I completed all the steps and it works, but am I missing something? Isn't the purpose to export it into a .txt file so I can read it?

  • @KunalMehta-u3s
    @KunalMehta-u3s 1 month ago

    Can we also label the speakers with their names instead of saying SPEAKER_00 and SPEAKER_01?
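
    Yes, at least as a post-processing step; a minimal sketch, assuming result is the WhisperX output and you have listened to the audio to learn which label is which person (the name mapping here is hypothetical):

        # Hypothetical label-to-name mapping; fill in after listening to the audio.
        names = {"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"}

        for seg in result["segments"]:
            label = seg.get("speaker", "UNKNOWN")
            seg["speaker"] = names.get(label, label)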

  • @MartinRadio2
    @MartinRadio2 9 months ago

    Will the speaker ID diarization work across multiple audio files with the same speakers? Like, if I have multiple podcast episodes, will it always give the same speaker the same ID?

    • @akashsahu9787
      @akashsahu9787 4 months ago

      No, the speaker IDs are local to a single file. It will not be able to map the speaker IDs across two different audio files.
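
      If you do need consistent identities across episodes, one workaround (not covered in the video) is to compare speaker embeddings between files; a rough sketch using pyannote's embedding model, where the segment times and the similarity threshold are assumptions you would tune on your own data:

          from pyannote.audio import Model, Inference
          from pyannote.core import Segment
          from scipy.spatial.distance import cosine

          # "pyannote/embedding" is gated; request access and pass your HF token.
          model = Model.from_pretrained("pyannote/embedding", use_auth_token="hf_...")
          inference = Inference(model, window="whole")

          # Embed a stretch of one diarized speaker from each episode, then compare.
          emb_a = inference.crop("episode1.wav", Segment(10.0, 20.0))
          emb_b = inference.crop("episode2.wav", Segment(5.0, 15.0))

          # 0.5 is a guessed cosine-distance threshold, not a calibrated value.
          same_speaker = cosine(emb_a, emb_b) < 0.5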

  • @thambidurai-qc7cx
    @thambidurai-qc7cx 5 months ago

    How do I give custom instructions to my Whisper AI model in order to fine-tune it?

  • @oumaimamouelhi5161
    @oumaimamouelhi5161 5 months ago

    Why is using WhisperX faster than using pyannote directly?

  • @harrisonford4103
    @harrisonford4103 1 month ago

    Can you detect how many speakers are talking at the same time?

  • @pragati_agrawal
    @pragati_agrawal 6 months ago

    Hi... Can you please tell me how to transcribe, then translate, and then recognize multiple speakers?

  • @opusmas7909
    @opusmas7909 10 months ago +1

    This is so good. I wish there was a keyboard that uses Whisper locally on mobile. There is one, but it's not multilingual.

    • @ilianos
      @ilianos 10 months ago

      That's exactly what I was thinking! 😀
      SwiftKey belongs to Microsoft, but I guess if they integrated Whisper into it, there'd be a huge spike in the computational resources needed for such a feature rollout...

    • @ilianos
      @ilianos 10 months ago

      What's the name of the one you found (not multilingual)?

    • @opusmas7909
      @opusmas7909 7 months ago

      Sorry, I saw the message just now. The name is OpenAI Whisper Keyboard by Kai Soapbox.
      But now I use FUTO Voice Input.

  • @10dollarbanana
    @10dollarbanana 10 months ago +1

    You’re the best

  • @Rollex-rr2xq
    @Rollex-rr2xq 7 months ago

    Can we save the model by running it on Colab, then download the saved model to my CPU machine to do the transcription?
    Is that possible?

    • @engineerprompt
      @engineerprompt  6 months ago

      Yes, I think it's possible, but you will need a GPU to run it.
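
      For what it's worth, faster-whisper (the backend WhisperX wraps) also has an int8 CPU mode, so a CPU-only run may work, just much more slowly; a sketch, assuming whisperx is installed on the CPU machine:

          # int8 quantization keeps memory manageable on CPU; expect slow runs.
          model = whisperx.load_model("large-v2", "cpu", compute_type="int8")
          result = model.transcribe(whisperx.load_audio("audio.mp3"), batch_size=4)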

  • @Webwerp-hs3gp
    @Webwerp-hs3gp 4 months ago

    The Discord link is expired, OP

  • @toastrecon
    @toastrecon 9 months ago

    Do we know how long it usually takes to get access to the diarization? I submitted my company name and website, but the API calls still aren't working after about 30min. Are those manually approved by the research team?

    • @ruffy9937
      @ruffy9937 8 months ago +1

      Try making a new token; that seemed to work for me

  • @aa-xn5hc
    @aa-xn5hc 10 months ago +1

    Super useful, thanks!!!🙏🏻🙏🏻

  • @Slokingseba
    @Slokingseba 8 months ago

    whisperx.load_align_model returned: "No default align-model for language: sl".
    Does this only work for English? :)
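
    Transcription itself is multilingual, but the word-level alignment step needs a language-specific wav2vec2 model, and WhisperX only bundles defaults for some languages. Where there is no default, you can pass a checkpoint explicitly; a sketch, where the model name is a hypothetical placeholder rather than a real repository:

        # "some-org/wav2vec2-xlsr-slovenian" is a placeholder; substitute an
        # actual wav2vec2 CTC checkpoint for your language from Hugging Face.
        model_a, metadata = whisperx.load_align_model(
            language_code="sl",
            device=device,
            model_name="some-org/wav2vec2-xlsr-slovenian",
        )
        result = whisperx.align(result["segments"], model_a, metadata, audio, device)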

  • @amortalbeing
    @amortalbeing 9 months ago

    Thanks a lot, this was really good.

  • @frazuppi4897
    @frazuppi4897 10 months ago

    Thanks a lot!

  • @amin816
    @amin816 10 months ago

    Hi, does speaker diarization work with other languages?

    • @spider279
      @spider279 9 months ago +1

      It normally works for all languages embedded in Whisper

  • @Nihilvs
    @Nihilvs 10 months ago

    Very, very nice! I like it!

  • @slofo22
    @slofo22 10 months ago

    @Prompt Engineering And how do I export the transcript? I don't see the part of the video where you show exporting the transcript to JSON, .txt, .srt, or .vtt.

    • @engineerprompt
      @engineerprompt  10 months ago

      The result is JSON; you will just need to write it to disk (see the sketch at the end of this thread).

    • @slofo22
      @slofo22 10 months ago +1

      @@engineerprompt I'm a newbie, how do I do this?

    • @slofo22
      @slofo22 4 months ago +1

      @@iluminathy3210 yes
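
      For anyone else stuck here, a minimal sketch of writing the WhisperX result dict to a plain .txt file; adapt the format string for .srt or .vtt. (The .srt/.vtt/.tsv files other commenters mention are written by the whisperx command-line tool; the notebook's Python API just returns this dict.)

          # Write "[start - end] SPEAKER: text" lines to a plain-text transcript.
          with open("transcript.txt", "w", encoding="utf-8") as f:
              for seg in result["segments"]:
                  speaker = seg.get("speaker", "UNKNOWN")
                  f.write(f"[{seg['start']:.2f}s - {seg['end']:.2f}s] "
                          f"{speaker}: {seg['text'].strip()}\n")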

  • @TirtharajBiswas
    @TirtharajBiswas 10 months ago

    What do you use for adding subs to your YouTube videos?

  • @carmenlanders6663
    @carmenlanders6663 6 months ago +1

    I really wish you had shown more end results of the diarization. I can barely tell if this will work for me. I really wanted to make sure it was worth the time and energy to make this happen.

  • @themodfather9382
    @themodfather9382 8 months ago +1

    waste of time, need standalone app

    • @picklenickil
      @picklenickil 1 month ago

      Care to venture how you would build something like this at scale?