Learn How he reproduced Karpathy's GPT-2 for Audio!!!
- Published Oct 1, 2024
- 🔗 Links 🔗
Building GPT2o - Part 1 : Audio
/ building-gpt2o-part-1-...
GPT-2 for Audio - github.com/niv...
Srinivas Billa Twitter
x.com/sbeastwindy
Srinivas Billa Linkedin
/ srinivasbilla
Andrej Karpathy's GPT-2 Video - • Let's reproduce GPT-2 ...
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - ko-fi.com/1lit...
🧭 Follow me on 🧭
Twitter - / 1littlecoder
Linkedin - / amrrs
Thanks for having me!
Crazy work bro
Do you have a Discord?
Thanks for sharing this with us!
thanks for sharing your code and explanation!
I wonder if any augmentation could help overcome the overfitting issue.
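A minimal sketch of what such augmentation could look like, assuming the training clips are available as raw mono waveforms (floats in [-1, 1]) before any tokenization step. The function name, parameters, and default values here are illustrative assumptions, not taken from the video or its repository, and whether this actually reduces the overfitting is untested.

```python
import math
import random


def augment_waveform(samples, rng, max_shift=160, gain_range=(0.8, 1.2), noise_std=0.005):
    """Return a randomly perturbed copy of a mono waveform.

    Hypothetical helper: applies a circular time shift, a random gain,
    and additive Gaussian noise, then clips back into [-1, 1].
    """
    samples = list(samples)
    n = len(samples)
    if n == 0:
        return []

    # Circular time shift by a random number of samples (may be zero).
    shift = rng.randint(-max_shift, max_shift) % n
    shifted = samples[shift:] + samples[:shift]

    # One random gain per clip, plus independent noise per sample.
    gain = rng.uniform(gain_range[0], gain_range[1])
    out = []
    for x in shifted:
        y = gain * x + rng.gauss(0.0, noise_std)
        out.append(max(-1.0, min(1.0, y)))  # clip to the valid range
    return out


# Example: augment 0.1 s of a 440 Hz tone sampled at 16 kHz.
rng = random.Random(0)  # seeded so runs are repeatable
clip = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
augmented = augment_waveform(clip, rng)
```

Each epoch would then see a slightly different version of every clip, which is the usual motivation for augmentation when a small dataset is being memorized.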
I was thinking about how OpenAI could come up with nice voices without being exposed to lawsuits, and I came up with this idea. It would generate voices randomly, and provably so: it would be possible to prove the voices were generated randomly. People could then upvote or downvote voices, so that the most popular ones according to the crowdsourced polling would be the ones featured in the app. Since the voices were randomly generated, no one could say they were an imitation of someone. It would also be no fault of OpenAI that people preferred some of them. This also seems a better approach than just allowing users to upload a sample of their desired voice, since it avoids misuse such as deepfaking.
thanks for sharing 😊
Thanks for watching!
How about training it on animal sounds? Will it learn to speak with them?
Very cool! I wonder if you could leverage available text models to do something like the model mashups or Franken-merges? For example, you could do a LoRA-like fine-tuning focused on all layers, with additional layers added at both ends (to translate from the audio encodings to the pretrained model's hidden embeddings and then back to decodable audio again).
Thank you! This is fantastic content!
Glad you enjoyed it!
It feels like speech-to-text, which converts audio to text and then sends it to a GPT server 🤔
This is one single native audio model
@1littlecoder wow, that's amazing
It's a large multimodal model?
@efexzium Yeah sure, brother, I have my own LLM too, but thanks for the theory update
@WebWizard977 Me too, but I just cherry-pick the best model from the internet.
HF datasets/jhu-clsp/seamless-align-expressive
The English half of this is ~3500 hours I think.
It is inspiring
awesome video.
Mind blowing
🚀
cool bro!
✌
GPT-4o ?
He mentions that's his motivation
@1littlecoder I mean, it could be the same tech behind GPT-4o
This is a very naive way to do it, but yeah. There are probably other optimisations to make, but I'd rather have something to play with than not, I guess?
@Srinivas_Billa So that does mean the open-source community is capable of building something similar to GPT-4o
@TheRealUsername Of course! I plan to try video next and then ultimately combine them together. The tools are there to do it. Compute is the issue.