The Tech Trance
OpenAI SORA - the technical breakdown made easy
It's the era of video generation. A Machine Learning Engineer provides a technical explanation of SORA in an easy-to-understand way. We discuss the dataset collection, the dataset labeling, the model framework of SORA, space-time patches, its multi-modal abilities, its limitations, and a possible solution to those limitations. This presentation is designed for both beginners and developers!
TIMESTAMPS ⏰
0:00 Wow
0:53 Gather dataset
3:30 Label dataset
4:10 Model framework
6:45 Space time patches
9:37 Conditioning with prompts
12:52 Model all together
15:45 Multi-modal capabilities
19:04 Limitations + solution
21:31 Evaluation
22:24 Wow in pink
SOURCES
Sora landing page openai.com/sora/
Sora technical report openai.com/index/video-generation-models-as-world-simulators/
Sora reverse engineering arxiv.org/pdf/2402.17177
Diffusion Transformer models arxiv.org/pdf/2212.09748
DISCLAIMER
This video provides an explanation of OpenAI's newest SORA model based on my personal research and insights. While I’ve made every effort to ensure accuracy, please note that some details may not be entirely correct, as the information is based on my own interpretation of available sources. I encourage viewers to consult official documentation and sources for the most up-to-date and precise information.
Views: 1,386

Videos

SORA creates beautiful videos. But what does a Machine Learning Engineer see?
1.2K views · 14 days ago
Shipmas is done, evaluations are in! The third day of Shipmas brought us SORA - the long-awaited text-to-video generator. SORA can create worlds with the click of a button - but how good are they? A machine learning engineer evaluates the qualitative performance of SORA. We inspect temporal consistency, physical consistency, lighting, occlusion, object behavior, and much more. We also discuss t...
A Machine Learning Engineer goes to an A.I. conference
2.7K views · 2 months ago
This AI conference is packed with innovation and nerds everywhere! Come with me, a Machine Learning Engineer, as I attend the AI conference that OpenAI speaks at every year. This is the Ray Summit 2024, and Ray is the AI compute engine that companies like OpenAI and Uber rely on for their ML work. We get insights into the industry, talk about the latest models o1 and Llama 3.2, and of course, g...
OpenAI o1 - the biggest black box of all. Let’s break it open.
12K views · 3 months ago
A Machine Learning Engineer provides the most detailed and technical explanation of o1 out there. We go through o1’s reinforcement learning algorithm, its training procedure, test-time compute, and how its design compares with GPT-4 models. With its reasoning capabilities, it’s the biggest black box in the AI industry yet. Let’s use our human chain of thought and break it down 💪 TIMESTAMPS 00:...
Code with me: Machine learning on a Macbook GPU (works for all M1, M2, M3) for a 10x speedup
8K views · 4 months ago
Step aside, NVIDIA CUDA! Apple MacBooks now have powerful M1, M2, and M3 chips that are great for machine learning. This is your complete guide on how to run PyTorch ML models on your Mac’s GPU, instead of the CPU or CUDA. A machine learning engineer walks you through the easy, simple code changes needed to tap into your GPU - with only 5 lines of code! As a result, you’ll see a 10-20x speedup when r...
“Smart glasses will be used by billions“ and other A.I. strategies revealed at SIGGRAPH 2024
425 views · 4 months ago
NVIDIA CEO Jensen Huang and Meta CEO Mark Zuckerberg come together publicly for the first time. They have a fruitful conversation at SIGGRAPH 2024 on the next steps for AI and the strategic reasoning behind them, which include smart glasses, custom agents, and open sourcing. As a machine learning engineer who has worked in Big Tech, I provide technical insights for each of the takeaways. TIMESTAM...
GPT-4o is so smart, it flirts. There’s a reason for that.
29K views · 7 months ago
GPT-4o is so smart, it can flirt 🔥 With my machine learning background, I break down the differences between GPT-4 and GPT-4o, where we shift from multiple single-modality models to a single multi-modality model, aka "omni-model". With this new AI model able to express emotion, we are one step closer to human-like intelligence! TIMESTAMPS 00:00 Flirtier, but not smarter? 01:56 OpenAI descriptio...
AI vs. braces: Who will win?
1.7K views · 8 months ago
Every AI model suffers from edge cases. Are braces one of them? Here I test AI cloning and image generation on how well they handle this atypical dental scenario. I'm a machine learning engineer and I investigate this question in a fun but critical way, providing explanations through the lens of an ML engineer. TIMESTAMPS 00:00 A baddie with braces! 00:36 Model 1: AI Cloning 02:32 My reaction 04:...
NVIDIA 2024 Keynote: 2 hours in 20 minutes, narrated by ML Engineer
963 views · 9 months ago
NVIDIA presents their AI Revolution. The original keynote is 2 hours, but here it is summarized in 20 minutes with detail by a Machine Learning Engineer. We cover the 5 components of NVIDIA's AI revolution, including the new Blackwell GPU, NIMs for their pre-packaged AI models, NEMO for helping companies finetune their large AI models, and lastly, the Omniverse that enables physical AI such as a...
How I became a Machine Learning Engineer
3K views · 9 months ago
This is how I became a Machine Learning Engineer in Silicon Valley. It involved me going on a worldwide journey to explore my life and career options and pursuing machine learning opportunities. It wasn’t a straightforward path, but that’s what makes it more fun! Here I share my journey, the lessons I learned, and my advice on how to become a machine learning engineer. Hope you enjoy! TIMESTAMPS ...

COMMENTS

  • @toufisaliba2806
    @toufisaliba2806 4 days ago

    Does it pick up its own poop? 💩

  • @nirokay136
    @nirokay136 4 days ago

    sounds strongly like an ad

    • @TheTechTrance
      @TheTechTrance 3 days ago

      Not an ad! I include the tech specs bc it's good to know what the dog is capable of / made with

  • @gileneusz
    @gileneusz 4 days ago

    RIP Hollywood, now I can create next episodes of LOTR at home

    • @TheTechTrance
      @TheTechTrance 4 days ago

      @gileneusz Careful, Gollum’s appearance might look more abnormal than usual ;)

    • @gileneusz
      @gileneusz 4 days ago

      @TheTechTrance my precioussssssss

  • @checksinthemail
    @checksinthemail 4 days ago

    People haven't dressed them up yet - waiting for that comical moment.... only $4k! Breed - lol - I can't wait to see it at the next Purina Dog show!

    • @TheTechTrance
      @TheTechTrance 4 days ago

      Haha they can be dressed in stickers ✨

  • @vanesagomez-gonzalez6532
    @vanesagomez-gonzalez6532 5 days ago

    Where can I get the cat-bot???

  • @vanessaaa.paaark
    @vanessaaa.paaark 5 days ago

    At least it doesn’t poop 💩

  • @tiffany33094
    @tiffany33094 5 days ago

    A bit freaky…

  • @jameswilliams7224
    @jameswilliams7224 5 days ago

    Thanks tech trance !

  • @samathmikabk
    @samathmikabk 5 days ago

    This is an amazing video! I had a hard time understanding the paper on SORA. Extremely glad I came across your video🙌 thank you for this🥺🙌

    • @TheTechTrance
      @TheTechTrance 5 days ago

      @samathmikabk I’m happy to hear that! Glad it helped :)

  • @EnglishTipsWithElaheh
    @EnglishTipsWithElaheh 5 days ago

    Will your opinion be valued in the future? I want to start, but I don't know where and how. And is there any guarantee that this job will have a place in the future?

    • @TheTechTrance
      @TheTechTrance 4 days ago

      There’s no guarantee for anything, but AI engineers have strong prospects. AI is here to stay :)

    • @EnglishTipsWithElaheh
      @EnglishTipsWithElaheh 4 days ago

      @TheTechTrance If you were to put yourself in the place of a recent graduate who is very interested in programming and mathematics, what would you suggest?

  • @dheerajakula7
    @dheerajakula7 8 days ago

    Really great video, I now understand the science behind the magic. I don't want to pretend that I understand everything, but the diffusion transformer model part didn't make sense to me. I assume we need transformers to understand the temporal coherence, but why would we need to add noise and then train again to remove the noise?

    • @TheTechTrance
      @TheTechTrance 8 days ago

      I'm glad it was helpful! To your question, adding noise is the diffusion *process* and removing noise is the diffusion *model*'s task. Both adding and removing are there during the training phase so that the Diffusion Transformer Model can be trained. During inference, we start with random noise already, so only a removal of noise will happen. Working with noise gives the video generation process a lot of flexibility in molding the noise into whatever it sees fit. I hope that helps!

    • @dheerajakula7
      @dheerajakula7 8 days ago

      Thanks for the reply, that makes sense. This reminds me of how a GAN's generator network uses random inputs to generate images
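The exchange above, adding noise during training and removing it at inference, can be sketched in a few lines. This is a minimal illustration using the standard DDPM-style mixing formula, not Sora's actual (unpublished) implementation. The function names `add_noise`, `noise_prediction_loss`, and `recover_clean`, and the value `alpha_bar = 0.3`, are assumptions made for the example. A real model learns to predict the noise; here the true noise stands in for a perfect prediction.

```python
import math
import random


def add_noise(clean, alpha_bar, rng):
    """Forward diffusion step: blend clean values with Gaussian noise.

    alpha_bar near 1 keeps most of the signal; near 0 it is almost pure noise.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noisy = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
        for x, n in zip(clean, noise)
    ]
    return noisy, noise


def noise_prediction_loss(predicted_noise, true_noise):
    """Training objective: mean squared error between predicted and true noise."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_noise, true_noise)]
    return sum(squared) / len(squared)


def recover_clean(noisy, predicted_noise, alpha_bar):
    """Undo the blend, given an estimate of the noise that was added."""
    return [
        (y - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
        for y, n in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)
clean_patch = [0.5, -1.0, 2.0]  # hypothetical stand-in for one latent patch
noisy_patch, true_noise = add_noise(clean_patch, alpha_bar=0.3, rng=rng)

# A perfect "model" would predict exactly the noise that was added.
perfect_loss = noise_prediction_loss(true_noise, true_noise)
restored = recover_clean(noisy_patch, true_noise, alpha_bar=0.3)
```

At inference there is no clean patch to start from: generation begins with pure noise and applies the learned noise estimate repeatedly over many steps.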

  • @EnglishTipsWithElaheh
    @EnglishTipsWithElaheh 10 days ago

    Is it true???

  • @algopasaconmerry
    @algopasaconmerry 10 days ago

    From someone that works within tech but is not technical these videos are suuuper useful 🙌🏻 Thanks for the deep breakdown

    • @TheTechTrance
      @TheTechTrance 10 days ago

      I aim to make the material suitable for both technical and non-technical people - glad you enjoyed it!

  • @HelloWorlds__JTS
    @HelloWorlds__JTS 10 days ago

    Great vids! Q: What makes you conclude that the augmented prompt for the alien in the city didn't just model the existing SORA ad video?

    • @TheTechTrance
      @TheTechTrance 10 days ago

      I conclude that the augmented prompt *does* model the SORA video! The coincidence between what ChatGPT shows and what SORA shows is too uncanny. Thus they are both likely using the same LLM for augmenting prompts AND the LLM was instruction-tuned a little too hard :)

    • @HelloWorlds__JTS
      @HelloWorlds__JTS 10 days ago

      @TheTechTrance What you have reiterated here was already made clear in your video. What I'm asking is how you can conclude there's a forward correlation, when it could've just as well been due to the reverse correlation: the augmented prompt matches the video because it had already been used to create that video before you even entered the prompt!

    • @TheTechTrance
      @TheTechTrance 8 days ago

      I see what you mean. I think it’s unlikely that ChatGPT’s outputs were trained based on Sora’s outputs. And simply bc Sora relies on an LLM for augmentation and ChatGPT is an LLM, that’s why I think the forward correlation is more likely :)

  • @tiffany33094
    @tiffany33094 10 days ago

    You’re the only one who does technical explanations of AI. Everyone is just talking about the hype. Glad to have found your channel

    • @TheTechTrance
      @TheTechTrance 10 days ago

      Thank you, I try to contribute that way. Glad you enjoyed!

  • @checksinthemail
    @checksinthemail 11 days ago

    Thanks TechTrance, "now my brain is full" :) It was dense, so I'm going to have to watch/listen to this again to fully get it. A couple of parts stood out: the self-attention mechanism to keep it temporal (I didn't think about that!) and "limited semantic precision" (ugh, yes, DALL-E does the same). On limitations: it sounds like a horrible reason to melt the earth even more, but couldn't there be (along with a physics-first engine) some sort of o1-ish test-time compute, to make a bunch of videos and have an internal voting mechanism as to the best? Scratch this idea, it sounds horrible. Happy new year!

    • @TheTechTrance
      @TheTechTrance 10 days ago

      I'm glad you enjoyed it! And it is dense - so yea definitely rewatch parts over! Your idea is actually good - I think we'll see a lot of techniques used on LLMs being applied to vision models. Possibly even RL-based ones for vision. Happy new year to you too!
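The idea raised in this thread, generating several candidate videos and letting an internal vote pick the best, can be sketched as follows. This is a toy illustration only: the candidates are plain dictionaries standing in for generated videos, and the judge functions and their score keys (`temporal`, `physics`, `prompt_match`) are hypothetical names, not part of any Sora or OpenAI API.

```python
from collections import Counter


def vote_for_best(candidates, judges):
    """Each judge votes for its top-scoring candidate; most votes wins.

    Ties are broken in favor of the earliest candidate.
    """
    votes = Counter()
    for judge in judges:
        scores = [judge(candidate) for candidate in candidates]
        votes[scores.index(max(scores))] += 1
    winner = max(votes, key=lambda index: (votes[index], -index))
    return winner, dict(votes)


# Hypothetical quality scores for three generated candidates.
candidates = [
    {"temporal": 0.2, "physics": 0.9, "prompt_match": 0.1},
    {"temporal": 0.8, "physics": 0.7, "prompt_match": 0.9},
    {"temporal": 0.5, "physics": 0.4, "prompt_match": 0.6},
]

# Each judge scores one aspect of a candidate (assumed criteria).
judges = [
    lambda video: video["temporal"],
    lambda video: video["physics"],
    lambda video: video["prompt_match"],
]

winner_index, vote_counts = vote_for_best(candidates, judges)
```

The cost grows with the number of candidates generated, which is the compute concern the comment raises.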

  • @gillricky29
    @gillricky29 11 days ago

    💪

  • @teachingcomputershowtotalk
    @teachingcomputershowtotalk 11 days ago

    love the presentation - very informative and easy to watch!

  • @hectornetdev
    @hectornetdev 11 days ago

    Thank you and Happy New Year!

  • @IN-hw8it6
    @IN-hw8it6 11 days ago

    First! ❤

  • @algopasaconmerry
    @algopasaconmerry 14 days ago

    Yayyyyy one last video before the end of the year 🙌🏻 love the bonus

    • @TheTechTrance
      @TheTechTrance 14 days ago

      It's not the last one of the year just yet 😄💖

  • @TrungTran-hq2ys
    @TrungTran-hq2ys 17 days ago

    WUBABABABALUBABABA A Hit as always. Beep boop bob beep

  • @mattfarmerai
    @mattfarmerai 18 days ago

    Great insights

  • @cryptamie
    @cryptamie 18 days ago

    Thanks for lending us a pair of ML engineer lenses to see how you think through these projects - really inSIGHTful

  • @gizmomismo7071
    @gizmomismo7071 18 days ago

    Great video! I’d love to see a comparison between Google’s Veo 2 model and Sora. Since Sora doesn’t rely as much on marketing tricks as Google, it’s really hard to know which one is objectively more advanced.

    • @TheTechTrance
      @TheTechTrance 18 days ago

      Thanks for the suggestion! It would be interesting to see how they compare indeed. I'll see what I can do :)

  • @hectornetdev
    @hectornetdev 18 days ago

    Happy Holidays!!!

  • @checksinthemail
    @checksinthemail 18 days ago

    Merry Christmas, lots of hamming it up for fun, and great breakdown as always. Looking forward to the technical deep dive!

    • @TheTechTrance
      @TheTechTrance 18 days ago

      Happy holidays to you as well! See you soon for the deep dive!

  • @jameswilliams7224
    @jameswilliams7224 18 days ago

    Great video and commentary Tech Trance 👽🔥🙏

  • @A_Me_Amy
    @A_Me_Amy 19 days ago

    I wonder how long it will be until the next level of SORA comes out... It seems (my abuse is relevant here for some reason, I feel like crap, or I am mad and defending myself, or whatever you want? Or not want? idk, why do I comment this? no clue...) that if the GPT-3 to o3 track is repeated, then by the end of next year this will all be hyper-lifelike. And it is exciting (hope I get to see it all and don't die/become blind/burn in hell before then)...

    • @TheTechTrance
      @TheTechTrance 18 days ago

      I think the next SORA will come out pretty quick. They’ll likely incorporate physics modeling models or … reinforcement learning. Seems to be the magic sauce for everything now lol. I hope you feel better! Happy holidays :)

  • @gillricky29
    @gillricky29 19 days ago

    🎄🎅🧑‍🎄

  • @tiffany33094
    @tiffany33094 19 days ago

    She’s baaaaack! So happy to see another knowledge drop from you 🤩

  • @lucasalvarezlacasa2098
    @lucasalvarezlacasa2098 20 days ago

    What are the specs of your Mac? I'm currently looking to buy one.

    • @TheTechTrance
      @TheTechTrance 18 days ago

      I have an M2 Max 32GB (base model). It works great, even with my heavy usage of it. The latest models are surely even better!

  • @MichelCourtine
    @MichelCourtine 23 days ago

    Well done! Really great video. Straight to the point, no bs, only quality content in a packed and pleasant format. Thank you. Don't often subscribe organically but you got my attention. Keep up the good work!

    • @TheTechTrance
      @TheTechTrance 22 days ago

      @MichelCourtine Thank you for your attentiveness and compliments!

  • @TKingstom
    @TKingstom 27 days ago

    Hello

  • @batmanatkinson1188
    @batmanatkinson1188 1 month ago

    Does Apple let you shoot around the campus?

  • @toufisaliba2806
    @toufisaliba2806 2 months ago

    U adorbzzz,,, gosh, stop it!! Lol. And hilarious, :)

  • @____2080_____
    @____2080_____ 2 months ago

    Looking forward to Graph of Thought thinking inference

  • @quantumspark343
    @quantumspark343 2 months ago

    I think Q* stands for Quiet STaR (thinking and self taught reasoner), which is another paper, not the Q learning with A*

    • @TheTechTrance
      @TheTechTrance 2 months ago

      I believe you are right, good catch!

    • @quantumspark343
      @quantumspark343 2 months ago

      @TheTechTrance wow thanks, wasn't expecting that 😳

  • @StaceyAlGhawas
    @StaceyAlGhawas 2 months ago

    Thank you for taking the time to create this informational and interesting summary of the AI conference!

    • @TheTechTrance
      @TheTechTrance 2 months ago

      @StaceyAlGhawas Glad you enjoyed it! It was my pleasure

  • @lautarosuarez5393
    @lautarosuarez5393 2 months ago

    Hey, cool recap. I’m researching inference. Would you recommend any blogs or lectures to go deeper into it? Thanks!

    • @TheTechTrance
      @TheTechTrance 2 months ago

      What about inference exactly are you interested in?

  • @mattfarmerai
    @mattfarmerai 2 months ago

    Great video 🦾🤖

  • @williamcase426
    @williamcase426 2 months ago

    I'm gonna chat with chatGPT

  • @Samanbeachhikkaduwa
    @Samanbeachhikkaduwa 2 months ago

    🙋‍♂️...congratulations... 🎉

  • @gillricky29
    @gillricky29 2 months ago

    🙌

  • @timalete
    @timalete 2 months ago

    Welcome to the AI social media stage with a thumbs up. Your video skill equals the depth of your technical knowledge. May your site grow rapidly among your peers. It deserves that on merit and will be successful. Are you a student or a teacher of recursive learning? A student recursive learner as well as a teacher. Thanks for bridging the gap for my student learning in the chain of things.

    • @TheTechTrance
      @TheTechTrance 2 months ago

      Glad you found it helpful! And thank you for your encouraging words!

  • @mohdjibly6184
    @mohdjibly6184 2 months ago

    Nice video... thanks for sharing... newly subscribed :)

  • @EliudSibuor
    @EliudSibuor 2 months ago

    Awesome video, thanks. 🎈❤️ from Kenya 😂🇰🇪

    • @TheTechTrance
      @TheTechTrance 2 months ago

      Hujambo, asante! (I hope that’s right)

    • @EliudSibuor
      @EliudSibuor 2 months ago

      @TheTechTrance On point! 🥰🥰 I can't believe it

  • @claudioagmfilho
    @claudioagmfilho 2 months ago

    🇧🇷🇧🇷🇧🇷🇧🇷👏🏻, Amazing video!

  • @estyalasu
    @estyalasu 2 months ago

    Oooohhh I wish I could’ve been there! You make it seem fun 🎉😊