GPT-4o Low Latency Screen to Voice Tutorial - SUPER IMPRESSIVE OCR!
- Published 12 Jul 2024
👊 Become a member and get access to GitHub and Code:
/ allaboutai
🤖 Great AI Engineer Course:
scrimba.com/learn/aiengineer?...
🔥 Open GitHub Repos:
github.com/AllAboutAI-YT/easy...
📧 Join the newsletter:
www.allabtai.com/newsletter/
🌐 My website:
www.allabtai.com
Today we recap my livestream where I built a low-latency screen-to-voice reader with great OCR capabilities. It looks at the screen and answers any question or explains a problem, with pretty low latency, ahead of the new voice mode from GPT-4o.
00:00 GPT4o Screen to Voice Intro
00:57 GPT4o Flowchart
01:42 Let's Build The Screen Reader
06:05 First Test
07:05 Let's Build The Voice
09:48 Second Test with Voice
10:32 Adding Control Key
11:05 Final Tests
Science & Technology
Dude. That thumbnail is terrifying. 😂
Legit stuff. A real coder pwning the AI matrix ❤.
yeah he wrote so much code
@Ginto_O
Where is your code?
Cool. Your projects are always amazing. The local open-source projects are the most amazing and interesting to me.
Pretty cool project idea. If you don't mind, I stole it and used Gemini Flash to analyze the images; it's pretty fast too. You should try it.
You know I'm always here to support you
I need tech like that for my desktop virtual 3D assistant.
I have a 3D model of a character (AI agent) that has to interact with the computer in many interesting ways, up to controlling pixels of the screen by itself, for example if it wants to impose an object to interact with in virtual space. I hope soon enough we will have enough speed and power for AI agents to be sentient and working seamlessly with any type of information.
Amazing stuff 😮
Awesome
As a member, I have been trying to access the GitHub repo. I have sent multiple emails to the provided email address but have yet to receive a response; it has been 48 hours. Please advise.
This is great. Can we add voice prompt?
Is there a copy of the code you used in the documentation you sent to OpenAI in your first prompt?
Do you know when we will be getting access to the GPT-4o voice API?
I'm from Portugal; the Portuguese is a mixture of mostly Portuguese from Brasil and a little bit of Portuguese from Portugal heheh
Spanish is not my primary language, but it is not that bad either!
Btw nice project ! :D
Hey, how do I get access to the GitHub and Discord?
🎯 Key points for quick navigation:
00:00 *🖥️ Overview of the project setup*
- Setting up for screenshot analysis using GPT-4o
- Detailing the low latency approach for image understanding
- Collecting documentation and writing the initial iteration of the script
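The screenshot-to-GPT-4o step summarized above can be sketched roughly like this (hypothetical function names; the video's actual script is in the members-only repo). The message shape follows OpenAI's documented image-input format, with the screenshot inlined as a base64 data URL:

```python
import base64

def encode_image_bytes(image_bytes: bytes) -> str:
    """Base64-encode raw screenshot bytes for the vision payload."""
    return base64.b64encode(image_bytes).decode("utf-8")

def build_vision_message(b64_image: str, question: str) -> list:
    """Build a chat message carrying both the user's question and the
    screenshot as an inline data-URL image."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64_image}"}},
        ],
    }]
```

The resulting list can be passed as `messages` to a chat-completions call against `gpt-4o`; capturing the screen itself is typically done with a library like `mss` or `PIL.ImageGrab` (assumption, not confirmed by the video).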
02:18 *🛠️ Implementing functions and configurations*
- Fetching documentation from OpenAI for implementing GPT-4o with image inputs
- Inclusion of functions from prior projects to streamline the process
- Utilizing .env files to fetch the OpenAI key for configuration
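The .env-based configuration mentioned above boils down to reading the key from the process environment (a minimal sketch; the video may use a helper like python-dotenv to load the file first):

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Fetch the API key from the environment; fail early with a clear
    message so the script errors before any network call is made."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set - check your .env file")
    return key
```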
07:21 *🔊 Integrating text-to-speech functionality*
- Obtaining OpenAI documentation for the text-to-speech functionality
- Implementing a feature to read out responses using TTS
- Troubleshooting and fixing errors in the TTS APIs and configuration
10:55 *🎛️ Controlling the main function with a trigger key*
- Adding a feature to control the main function trigger using a key command
- Testing the control setup with screen prompts for AI responses
- Demonstrating the capability of the system to respond effectively with controlled triggers
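The trigger-key control described above can be approximated with a thread-safe flag that a hotkey callback arms and the main loop consumes (a sketch under assumptions: the `keyboard` package and the `ctrl+shift+s` binding are illustrative choices, not necessarily what the video uses):

```python
import threading

class Trigger:
    """Thread-safe one-shot trigger: the hotkey callback arms it,
    and the main loop consumes it exactly once per press."""
    def __init__(self):
        self._event = threading.Event()

    def fire(self):
        self._event.set()

    def consume(self) -> bool:
        """Return True once per fire(), then reset."""
        if self._event.is_set():
            self._event.clear()
            return True
        return False

# Wiring it up (requires the third-party `keyboard` package, which
# often needs elevated privileges to hook global hotkeys):
# import keyboard
# trigger = Trigger()
# keyboard.add_hotkey("ctrl+shift+s", trigger.fire)
# while True:
#     if trigger.consume():
#         analyze_screen_and_speak()  # hypothetical main function
```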
Made with HARPA AI
Does that need a GPU?
Can you please share the code
I am 3rd
What is Anal Ysing?
Spanish isn't really Spanish if it's spoken with a US accent...