GPT-4o Low Latency Screen to Voice Tutorial - SUPER IMPRESSIVE OCR!
- Published 12 Jul 2024
👊 Become a member and get access to GitHub and Code:
/ allaboutai
🤖 Great AI Engineer Course:
scrimba.com/learn/aiengineer?...
🔥 Open GitHub Repos:
github.com/AllAboutAI-YT/easy...
📧 Join the newsletter:
www.allabtai.com/newsletter/
🌐 My website:
www.allabtai.com
Today we recap my livestream where I built a low-latency screen-to-voice reader with great OCR capabilities. It looks at the screen and answers any question or explains a problem, with pretty low latency, ahead of the new voice mode from GPT-4o.
00:00 GPT4o Screen to Voice Intro
00:57 GPT4o Flowchart
01:42 Let's Build The Screen Reader
06:05 First Test
07:05 Let's Build The Voice
09:48 Second Test with Voice
10:32 Adding Control Key
11:05 Final Tests
Science & Technology
Dude. That thumbnail is terrifying. 😂
Legit stuff. A real coder pwning the AI matrix ❤.
yeah he wrote so much code
@Ginto_O
Where is your code?
Cool. Your projects are always amazing. The local open-source projects are the most amazing and interesting to me.
Pretty cool project idea. If you don't mind, I stole it and used Gemini Flash to analyze the images; it's pretty fast too. You should try it.
You know I'm always here to support you
I need tech like that for my desktop virtual 3D assistant.
I have a 3D model of a character (AI agent) that has to interact with the computer in many interesting ways, up to controlling pixels of the screen by itself, for example if it wants to impose an object to interact with in virtual space. I hope soon enough we will have enough speed and power for AI agents to be sentient and working seamlessly with any type of information.
Amazing stuff 😮
Awesome
As a member, I have been trying to access the GitHub repo. I have sent multiple emails to the provided email address but have yet to receive a response; it has been 48 hours. Please advise.
This is great. Can we add voice prompt?
Is there a copy of the code you used in the documentation you sent to OpenAI in your first prompt?
Do you know when we will be getting access to the GPT-4o voice API?
I'm from Portugal; the Portuguese is a mixture of mostly Portuguese from Brasil and a little bit of Portuguese from Portugal heheh
Spanish is not my primary language, but it is not that bad either!
Btw nice project ! :D
Hey, how do I get access to the GitHub and Discord?
🎯 Key points for quick navigation:
00:00 *🖥️ Overview of the project setup*
- Setting up for screenshot analysis using GPT-4o
- Detailing the low latency approach for image understanding
- Collecting documentation and writing the initial iteration of the script
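The screenshot-to-GPT-4o step summarized above can be sketched roughly like this (hypothetical function names; the video's actual script is in the members-only repo). The message shape follows OpenAI's documented image-input format, with the screenshot inlined as a base64 data URL:

```python
import base64

def encode_image_bytes(image_bytes: bytes) -> str:
    """Base64-encode raw screenshot bytes for the vision payload."""
    return base64.b64encode(image_bytes).decode("utf-8")

def build_vision_message(b64_image: str, question: str) -> list:
    """Build a chat message carrying both the user's question and the
    screenshot as an inline data-URL image."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64_image}"}},
        ],
    }]
```

The resulting list can be passed as `messages` to a chat-completions call against `gpt-4o`; capturing the screen itself is typically done with a library like `mss` or `PIL.ImageGrab` (assumption, not confirmed by the video).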
02:18 *🛠️ Implementing functions and configurations*
- Fetching documentation from OpenAI for implementing GPT-4o with image inputs
- Inclusion of functions from prior projects to streamline the process
- Utilizing .env files to fetch the OpenAI key for configuration
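The .env-based configuration mentioned above boils down to reading the key from the process environment (a minimal sketch; the video may use a helper like python-dotenv to load the file first):

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Fetch the API key from the environment; fail early with a clear
    message so the script errors before any network call is made."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set - check your .env file")
    return key
```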
07:21 *🔊 Integrating text-to-speech functionality*
- Obtaining OpenAI documentation for the text-to-speech functionality
- Implementing a feature to read out responses using TTS
- Troubleshooting and fixing errors in the TTS APIs and configuration
10:55 *🎛️ Controlling the main function with a trigger key*
- Adding a feature to control the main function trigger using a key command
- Testing the control setup with screen prompts for AI responses
- Demonstrating the capability of the system to respond effectively with controlled triggers
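The trigger-key control described above can be approximated with a thread-safe flag that a hotkey callback arms and the main loop consumes (a sketch under assumptions: the `keyboard` package and the `ctrl+shift+s` binding are illustrative choices, not necessarily what the video uses):

```python
import threading

class Trigger:
    """Thread-safe one-shot trigger: the hotkey callback arms it,
    and the main loop consumes it exactly once per press."""
    def __init__(self):
        self._event = threading.Event()

    def fire(self):
        self._event.set()

    def consume(self) -> bool:
        """Return True once per fire(), then reset."""
        if self._event.is_set():
            self._event.clear()
            return True
        return False

# Wiring it up (requires the third-party `keyboard` package, which
# often needs elevated privileges to hook global hotkeys):
# import keyboard
# trigger = Trigger()
# keyboard.add_hotkey("ctrl+shift+s", trigger.fire)
# while True:
#     if trigger.consume():
#         analyze_screen_and_speak()  # hypothetical main function
```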
Made with HARPA AI
Does that need a GPU?
Can you please share the code
I am 3rd
What is Anal Ysing?
Spanish isn't really Spanish if it's spoken with a US accent...