GPT-4o Low Latency Screen to Voice Tutorial - SUPER IMPRESSIVE OCR!

Поділитися
Вставка
  • Опубліковано 12 лип 2024
  • GPT-4o Low Latency Screen to Voice Tutorial - SUPER IMPRESSIVE OCR!
    👊 Become a member and get access to GitHub and Code:
    / allaboutai
    🤖 Great AI Engineer Course:
    scrimba.com/learn/aiengineer?...
    🔥 Open GitHub Repos:
    github.com/AllAboutAI-YT/easy...
    📧 Join the newsletter:
    www.allabtai.com/newsletter/
    🌐 My website:
    www.allabtai.com
    Today we recap my livestream where i built a low latency screen to voice reader with great ocr capabilites. This will look at the screen, answer any question or explain a problem, with pretty low latency pre new voice mode from GPT4o.
    00:00 GPT4o Screen to Voice Intro
    00:57 GPT4o Flowchart
    01:42 Lets Build The Screen Reader
    06:05 First Test
    07:05 Lets Build The Voice
    09:48 Second Test with Voice
    10:32 Adding Control Key
    11:05 Final Tests
  • Наука та технологія

КОМЕНТАРІ • 24

  • @JohnSmith762A11B
    @JohnSmith762A11B Місяць тому +7

    Dude. That thumbnail is terrifying. 😂

  • @Ms.Robot.
    @Ms.Robot. Місяць тому +6

    Legit shit. A real coder pwning the Ai matrix❤.

    • @Ginto_O
      @Ginto_O Місяць тому

      yeah he wrote so much code

    • @watchdog163
      @watchdog163 29 днів тому

      @@Ginto_O
      Where is your code?

  • @BThunder30
    @BThunder30 Місяць тому

    Cool. You projects are always amazing. The local open source projects are the most amazing and interesting to me.

  • @3choff
    @3choff Місяць тому +2

    Pretty cool project idea. If you don't mind, I stole it and use Gemini Flash to analyze the images; it's pretty fast too. You should try it.

  • @lokeshart3340
    @lokeshart3340 Місяць тому

    U know always here to support u

  • @ksem1337
    @ksem1337 Місяць тому +1

    I need of tech like that for my desktop virtual 3d assistant.
    I have a 3d model of a character (AI agent) that has to interact with computer in many interesting ways up to controlling pixels of the screen by itself, for example if it want to impose a an object to interact with virtual space. I hope soon enough we will have enough speed and power for AI agents to be sentient and working seamlessly with any type of information.

  • @dniliveact
    @dniliveact Місяць тому

    Amazing stuff 😮

  • @taoxu1798
    @taoxu1798 Місяць тому

    Awesome

  • @protimaranipaul7107
    @protimaranipaul7107 Місяць тому +1

    Being a member I have been trying to access the github repo, I have sent multiple emails to the provided email address, yet to receive a response it has been 48hrs. Please advise.

  • @enthuesd
    @enthuesd Місяць тому

    This is great. Can we add voice prompt?

  • @3-deez
    @3-deez Місяць тому

    is there a copy of the code you used in the documentation you sent to OpenAI in your first prompt?

  • @branislannjemec9050
    @branislannjemec9050 21 день тому

    Do you know when will be having an access to gpt 4o voice api

  • @pedrorafaelnunes
    @pedrorafaelnunes Місяць тому

    Im from Portugal, the portuguese is a mixture of mostly Portuguese from Brasil and a lil bit of Portuguese from Portugal heheh
    Spanish is not my primary language but it is not that bad also !

  • @abhishekrakhe2788
    @abhishekrakhe2788 Місяць тому

    Hey how do i get access to git and discord?

  • @PTHastings
    @PTHastings Місяць тому

    🎯 Key points for quick navigation:
    00:00 *🖥️ Overview of the project setup*
    - Setting up for screenshot analysis using GPT-4o
    - Detailing the low latency approach for image understanding
    - Collecting documentation and writing the initial iteration of the script
    02:18 *🛠️ Implementing functions and configurations*
    - Fetching documentation from OpenAI for implementing GPT-4o with image inputs
    - Inclusion of functions from prior projects to streamline the process
    - Utilizing EnV files to fetch the OpenAI key for configuration
    07:21 *🔊 Integrating text-to-speech functionality*
    - Obtaining OpenAI documentation for speech-to-text-to-speech functionalities
    - Implementing a feature to read out responses using TTS
    - Troubleshooting and fixing errors in the TTS APIs and configuration
    10:55 *🎛️ Controlling the main function with a trigger key*
    - Adding a feature to control the main function trigger using a key command
    - Testing the control setup with screen prompts for AI responses
    - Demonstrating the capability of the system to respond effectively with controlled triggers
    Made with HARPA AI

  • @tumbalasu3718
    @tumbalasu3718 Місяць тому

    Is that need gpu?

  • @protimaranipaul7107
    @protimaranipaul7107 Місяць тому

    Can you please share the code

  • @lokeshart3340
    @lokeshart3340 Місяць тому +1

    I am 3rd

  • @MudroZvon
    @MudroZvon Місяць тому

    What is Anal Ysing?

  • @luisvictorf
    @luisvictorf Місяць тому +2

    Spanish isn't really Spanish if it's speaking with an US accent...