Learn How he reproduced Karpathy's GPT-2 for Audio!!!

  • Published 1 Oct 2024
  • 🔗 Links 🔗
    Building GPT2o - Part 1 : Audio
    / building-gpt2o-part-1-...
    GPT-2 for Audio - github.com/niv...
    Srinivas Billa Twitter
    x.com/sbeastwindy
    Srinivas Billa Linkedin
    / srinivasbilla
    Andrej Karpathy's GPT-2 Video - • Let's reproduce GPT-2 ...
    ❤️ If you want to support the channel ❤️
    Support here:
    Patreon - / 1littlecoder
    Ko-Fi - ko-fi.com/1lit...
    🧭 Follow me on 🧭
    Twitter - / 1littlecoder
    Linkedin - / amrrs

COMMENTS • 32

  • @Srinivas_Billa
    @Srinivas_Billa 3 months ago +20

    Thanks for having me!

  • @CubicPostcode
    @CubicPostcode 3 months ago +2

    I was thinking about how OpenAI could come up with nice voices without exposing itself to lawsuits. I came up with this idea: generate voices randomly, and provably so, so that it would be possible to prove the voices were generated randomly. People could then upvote or downvote voices, and the most popular ones according to the crowdsourced polling would be featured in the app. Since the voices were randomly generated, no one could say they were an imitation of someone. It would also be no fault of OpenAI's that people preferred some of them. This also seems a better approach than just allowing users to upload a sample of their desired voice, since it avoids misuse for deepfaking.

  • @maulikmadhavi
    @maulikmadhavi 3 months ago +2

    thanks for sharing 😊

  • @petersobolewski1354
    @petersobolewski1354 3 months ago

    How about training it on animal sounds? Will it learn to speak with them?

  • @mshonle
    @mshonle 3 months ago

    Very cool! I wonder if you could leverage available text models to do something like the model mashups or Franken-merges? For example, you could do a LoRA-like fine-tuning, but focused on all layers, with additional layers added at both ends (to translate from the audio encodings to the pretrained model’s hidden embeddings and then back to decodable audio again).

  • @christaylor-gz6mi
    @christaylor-gz6mi 3 months ago +1

    Thank you! This is fantastic content!

  • @WebWizard977
    @WebWizard977 3 months ago +1

    It feels like speech-to-text that converts the audio to text and then sends it to a GPT server 🤔

    • @1littlecoder
      @1littlecoder 3 months ago

      This is one single native audio model

    • @WebWizard977
      @WebWizard977 3 months ago

      @1littlecoder Wow, that's amazing

    • @efexzium
      @efexzium 3 months ago

      It's a Large Multimodal Model?

    • @WebWizard977
      @WebWizard977 3 months ago +1

      @efexzium Yeah, sure, brother. I have my own LLM too, but thanks for the theory update

    • @efexzium
      @efexzium 3 months ago

      @WebWizard977 Me too, but I just cherry-pick the best model from the internet.

  • @haileycollet4147
    @haileycollet4147 3 months ago

    HF datasets/jhu-clsp/seamless-align-expressive
    The English half of this is ~3500 hours I think.

  • @satyamtiwari3839
    @satyamtiwari3839 3 months ago

    It is inspiring

  • @chickenp7038
    @chickenp7038 3 months ago

    awesome video.

  • @OumarDicko-c5i
    @OumarDicko-c5i 3 months ago +1

    Mind blowing

  • @1msirius
    @1msirius 3 months ago

    cool bro!

  • @TheRealUsername
    @TheRealUsername 3 months ago

    GPT-4o ?

    • @1littlecoder
      @1littlecoder 3 months ago

      He mentions that's his motivation

    • @TheRealUsername
      @TheRealUsername 3 months ago

      @1littlecoder I mean, it could be the same tech behind GPT-4o

    • @Srinivas_Billa
      @Srinivas_Billa 3 months ago +1

      This is a very naive way to do it, but yeah. There are probably other optimisations to make, but I'd rather have something to play with than not, I guess?

    • @TheRealUsername
      @TheRealUsername 3 months ago

      @Srinivas_Billa So that does mean the open-source community is capable of building something similar to GPT-4o

    • @Srinivas_Billa
      @Srinivas_Billa 3 months ago +2

      @TheRealUsername Of course! I plan to try video next and then ultimately combine them together. The tools are there to do it. Compute is the issue.