transcription and speaker identification OpenAI-Whisper and Pyannote [Python]

Mastering Python

13,976 views

Published May 13, 2023
Hello guys, in this video I will show you how to transcribe audio and identify the speakers using OpenAI Whisper, Pyannote, and Pydub.
For Pyannote, you must register on the Hugging Face website to get an access token.
Support me by subscribing to my channel and leaving a like.
GitHub repository for the source code:
github.com/Mastering-Python-G...
OpenAI Whisper GitHub link:
github.com/openai/whisper
Pyannote GitHub link:
github.com/pyannote/pyannote-...
Pydub GitHub link:
github.com/jiaaro/pydub
#openai
#openai_whisper
#pyannote
#pydub
#python
#speaker_identification
#transcription
#diarization

COMMENTS • 36

@Yacine_zaki_abderrazzak 1 year ago ⁺¹
Thanks man, you deserve the best
@positivevibe142 7 months ago
Masha'Allah, God bless you, brother Mohamed... Thank you.
@bootneck2222 8 months ago
Great video. Thank you. Can the output be displayed on screen whilst it is processing?
@ryanschwartz3340 11 months ago ⁺¹
Nice video. Is the repo hard-coded to your directory structure? When I tried to change it, it said the format wasn't recognized.
@masteringpython 11 months ago
Do you mean the segment file?
@chungrandy780 3 months ago ⁺¹
Is there a Colab version?
@hrishikeshnamboothiri.v.n2195 8 months ago ⁺³
Try to include its requirements.txt as well...
Thanks
@lawrencemedina5593 8 months ago ⁺¹
conda activate open_chatting does not work on my computer. "EnvironmentNameNotFound: Could not find conda environment: open_chatting
You can list all discoverable environments with `conda info --envs`."
@masteringpython 8 months ago
Install conda, then create an environment called open_chatting by typing:
conda create --name open_chatting
After that, install the libraries that I mentioned in the video, then run the code.
@ThePikkutyyppi 10 months ago ⁺¹
Can I use this program to split speakers into their own files, or is this only for transcription?
@masteringpython 10 months ago
Read more about pyannote to see how to split speakers.
@ThePikkutyyppi 10 months ago
@masteringpython What? Where?
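Editor's note: the reply above does not say how to split speakers, so here is a minimal sketch of one possible approach, not the video author's method. It assumes the diarization step has already produced (start_ms, end_ms, speaker) tuples; the sample values and the group_by_speaker helper are hypothetical. The pydub part is left in comments because it needs pydub and an audio file to run.

```python
from collections import defaultdict


def group_by_speaker(segments):
    """Map each speaker label to its list of (start_ms, end_ms) intervals."""
    by_speaker = defaultdict(list)
    for start_ms, end_ms, speaker in segments:
        by_speaker[speaker].append((start_ms, end_ms))
    return dict(by_speaker)


# Hypothetical diarization output: (start_ms, end_ms, speaker label).
segments = [
    (998, 20622, "SPEAKER_01"),
    (21000, 25500, "SPEAKER_00"),
    (26000, 31000, "SPEAKER_01"),
]
intervals = group_by_speaker(segments)

# With pydub installed, each speaker's audio could then be assembled roughly as:
#   from pydub import AudioSegment
#   audio = AudioSegment.from_file("audio.mp3")
#   for speaker, spans in intervals.items():
#       track = AudioSegment.empty()
#       for start, end in spans:
#           track += audio[start:end]          # pydub slices by milliseconds
#       track.export(f"{speaker}.wav", format="wav")
```

Overlapping speech would end up in more than one speaker's file with this approach.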
@leoncezammit2502 7 months ago
I'm really struggling to get this working. Would I be able to send you my output log?
@user-iu8le1pl3x 6 months ago
Hi, thanks for the video. I need an approach for applying this solution to a large audio file with a duration of 3 hours.
@KamilKaczmarekSolutions 6 months ago
Chunks.
@KamilKaczmarekSolutions 6 months ago
Split the audio into chunks and save the text from each chunk to its own .txt file. Add logic to check which chunks already have output, so that if you hit an error you can come back and continue where it left off instead of starting over.
@WhiteShark010 28 days ago
You have chance.
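Editor's note: the chunk-and-resume idea suggested above could look roughly like the sketch below. It is an illustration under assumptions, not code from the video's repository: chunk_bounds, transcribe_in_chunks, the chunk_NNNN.txt naming, and the transcribe_chunk callback (which would wrap the actual Whisper/Pyannote call) are all hypothetical.

```python
from pathlib import Path


def chunk_bounds(total_ms, chunk_ms):
    """Return (start_ms, end_ms) pairs that cover the range [0, total_ms)."""
    return [(s, min(s + chunk_ms, total_ms)) for s in range(0, total_ms, chunk_ms)]


def transcribe_in_chunks(total_ms, chunk_ms, out_dir, transcribe_chunk):
    """Process chunks in order, skipping any whose .txt file already exists.

    transcribe_chunk(start_ms, end_ms) is a caller-supplied function that
    returns the text for that slice. Returns the indices processed this run.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for index, (start, end) in enumerate(chunk_bounds(total_ms, chunk_ms)):
        out_file = out_dir / f"chunk_{index:04d}.txt"
        if out_file.exists():
            continue  # finished in an earlier run, so resume past it
        text = transcribe_chunk(start, end)
        # Write to a temporary name first so a crash never leaves a
        # half-written file that looks finished.
        tmp_file = out_file.with_suffix(".tmp")
        tmp_file.write_text(text, encoding="utf-8")
        tmp_file.replace(out_file)
        processed.append(index)
    return processed
```

Cutting at fixed boundaries can split a word or a speaker turn in two; cutting at silences or overlapping the chunks slightly are possible refinements.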
@Hirotodoroki 1 year ago ⁺²
Trying to run this but getting "File contains data in an unknown format". Tried several files, including a wav file, but no luck.
@masteringpython 1 year ago
I advise you to use Anaconda to create a development environment. Then install OpenAI Whisper and run a simple test to check that everything works correctly. After that, install the pyannote library and run a simple test for it as well. (Read the installation guides carefully; you may have missed something while installing the libraries.)
@nadeembaig5943 1 month ago
@Hirotodoroki were you able to resolve the error (File contains data in an unknown format)?
@user-ej4ol8zv9y 11 months ago ⁺¹
Does this model work on languages other than English?
@masteringpython 11 months ago
Only English.
@PaweDuzy 4 months ago
@masteringpython Only English? What if I change model = whisper.load_model("small.en") to "small", according to the Whisper GitHub documentation?
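Editor's note: in the Whisper repository, the ".en" checkpoints are English-only, and the same size name without the suffix is the multilingual model. The sketch below only builds the model name; whisper_model_name is a hypothetical helper, and whether the rest of the video's pipeline behaves well on other languages is not confirmed here.

```python
# Sizes for which Whisper publishes an English-only ".en" variant.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def whisper_model_name(size, english_only):
    """Return the checkpoint name to pass to whisper.load_model."""
    if not english_only:
        return size
    if size not in ENGLISH_ONLY_SIZES:
        raise ValueError(f"no English-only variant for size {size!r}")
    return f"{size}.en"


# With openai-whisper installed, usage would look roughly like:
#   import whisper
#   model = whisper.load_model(whisper_model_name("small", english_only=False))
#   result = model.transcribe("audio.mp3")  # language is auto-detected by default
```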
@user-zz3iv1qz6v 1 year ago ⁺¹
Thanks for the demo. I am getting the following error, even while using your audio.mp3 file:
end = int(millisec(j[3]))
return (int)((int(spl[0]) * 60 * 60 + int(spl[1]) * 60 + float(spl[2])) * 1000)
ValueError: invalid literal for int() with base 10: ''
@user-zz3iv1qz6v 1 year ago
@mamido mami Yes, I did that, still getting the same error
@auflute 1 year ago
Same problem.
@user-uy7fc3sf8x 1 year ago
Same problem.
@jbatista2008 10 months ago
From the error message and the code, it seems the error occurs because the millisec function is trying to convert an empty string to an integer.
The millisec function splits a time string in the format "hh:mm:ss.sss" into hours, minutes, and seconds, and then converts these components to milliseconds.
Here is an example of the line being parsed, after it has been split on spaces:
['[', '00:00:00.998', '-->', '', '00:00:20.622]', 'G', 'SPEAKER_01']
When this loop runs, j[3] is the empty string, so 'end' is empty:
for l in range(len(k)):
    j = k[l].split(" ")
    start = int(millisec(j[1]))
    end = int(millisec(j[3]))
The array position you want for 'end' is 4, not 3. It also has a trailing ']' symbol, which must be stripped:
for l in range(len(k)):
    j = k[l].split(" ")
    start = int(millisec(j[1].rstrip(']')))  # remove trailing ']'
    end = int(millisec(j[4].rstrip(']')))  # remove trailing ']'
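Editor's note: the index fix above still depends on exactly how many spaces the diarization line contains. A less position-sensitive alternative is to pull the timestamps out with a regular expression. This is a sketch, not the repository's code: parse_segment and to_millis are hypothetical names, and the sample line is reconstructed from the split list quoted above.

```python
import re

# Matches timestamps such as 00:00:20.622 (hours:minutes:seconds.fraction).
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def to_millis(hours, minutes, seconds):
    """Convert hh, mm, ss.sss strings to whole milliseconds."""
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    # round() avoids losing a millisecond to floating-point truncation.
    return int(round(total_seconds * 1000))


def parse_segment(line):
    """Return (start_ms, end_ms, speaker) from one diarization line."""
    stamps = TIMESTAMP.findall(line)
    if len(stamps) < 2:
        raise ValueError(f"expected two timestamps in: {line!r}")
    start_ms, end_ms = (to_millis(*stamp) for stamp in stamps[:2])
    speaker = line.split()[-1]  # assumes the speaker label is the last field
    return start_ms, end_ms, speaker


line = "[ 00:00:00.998 -->  00:00:20.622] G SPEAKER_01"
start_ms, end_ms, speaker = parse_segment(line)
```

Because the pattern ignores brackets and whitespace, the same function works whether one or two spaces follow the arrow.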
@enriqueleonmacias249 9 months ago
Wow, the transcription takes about twice the duration of the file to process. I guess this solution wouldn't work for monitoring hours of call recordings unless you use GPU servers.
@masteringpython 9 months ago
It is recommended to use CUDA (an NVIDIA GPU) for speed.
CPU is very slow.
@patoyrigoyen 11 months ago ⁺¹
Does this need a GPU?
@masteringpython 11 months ago ⁺²
In this video I did not use a GPU, but if you want to use one, read the pyannote documentation.
@ghulamshabbir9532 9 months ago ⁺¹
Does this work offline?
@masteringpython 9 months ago ⁺¹
Yes.
@kmillanr 16 days ago
No code in the video.
