The Tech Trance
United States
Joined Jun 5, 2023
Hi, I’m Tam! I’m a Machine Learning Engineer based in Silicon Valley. I've done computer vision and deep learning at Apple, and robotic arms at other tech companies. You can expect videos on tech that are easy and fun to understand. See you around!
COMING SOON: Videos with coding sessions, ML explanations, interviews with authors of research papers.
OpenAI SORA - the technical breakdown made easy
It's the era of video generation. A Machine Learning Engineer provides a technical explanation of SORA in an easy-to-understand way. We discuss dataset collection, labeling the dataset, SORA's model framework, space-time patches, its multi-modal abilities, its limitations, and a possible solution to those limitations. This presentation is designed for both beginners and developers!
TIMESTAMPS ⏰
0:00 Wow
0:53 Gather dataset
3:30 Label dataset
4:10 Model framework
6:45 Space time patches
9:37 Conditioning with prompts
12:52 Model all together
15:45 Multi-modal capabilities
19:04 Limitations + solution
21:31 Evaluation
22:24 Wow in pink
SOURCES
Sora landing page openai.com/sora/
Sora technical report openai.com/index/video-generation-models-as-world-simulators/
Sora reverse engineering arxiv.org/pdf/2402.17177
Diffusion Transformer models arxiv.org/pdf/2212.09748
DISCLAIMER
This video provides an explanation of OpenAI's newest SORA model based on my personal research and insights. While I’ve made every effort to ensure accuracy, please note that some details may not be entirely correct, as the information is based on my own interpretation of available sources. I encourage viewers to consult official documentation and sources for the most up-to-date and precise information.
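The "space-time patches" idea covered in the breakdown can be sketched in a few lines of PyTorch. The video dimensions and patch sizes below are illustrative assumptions, not SORA's actual configuration:

```python
import torch

# Illustrative sketch of space-time patches (made-up dimensions, not
# SORA's real setup): a video tensor is cut into small 3D blocks that
# span both space and time, then each block is flattened into a token
# a Transformer can attend over.
video = torch.randn(16, 3, 64, 64)   # (frames, channels, height, width)
pt, ps = 4, 16                       # patch size in time and space (assumed)

patches = (
    video
    .unfold(0, pt, pt)               # cut along time
    .unfold(2, ps, ps)               # cut along height
    .unfold(3, ps, ps)               # cut along width
)
# Group the patch-index dims together, keeping each patch's content contiguous.
patches = patches.permute(0, 2, 3, 1, 4, 5, 6)
tokens = patches.reshape(-1, 3 * pt * ps * ps)  # one flat token per 3D patch
print(tokens.shape)  # torch.Size([64, 3072])
```

Each row of `tokens` is one flattened space-time block, so a single Transformer sequence can attend across both space and time at once.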
Views: 1,386
Videos
SORA creates beautiful videos. But what does a Machine Learning Engineer see?
1.2K views • 14 days ago
Shipmas is done, evaluations are in! The third day of Shipmas brought us SORA - the long-awaited text-to-video generator. SORA can create worlds with the click of a button - but how good are they? A machine learning engineer evaluates the qualitative performance of SORA. We inspect temporal consistency, physical consistency, lighting, occlusion, object behavior, and much more. We also discuss t...
A Machine Learning Engineer goes to an A.I. conference
2.7K views • 2 months ago
This AI conference is packed with innovation and nerds everywhere! Come with me, a Machine Learning Engineer, as I attend the AI conference that OpenAI speaks at every year. This is the Ray Summit 2024, and Ray is the AI compute engine that companies like OpenAI and Uber rely on for their ML work. We get insights into the industry, talk about the latest models o1 and Llama 3.2, and of course, g...
OpenAI o1 - the biggest black box of all. Let’s break it open.
12K views • 3 months ago
A Machine Learning Engineer provides the most detailed and technical explanation of o1 out there. We go through o1’s reinforcement learning algorithm, its training procedure, test-time compute, and how its design compares with GPT-4 models. With its reasoning capabilities, it’s the biggest black box in the AI industry yet. Let’s use our human chain of thoughts and break it down 💪 TIMESTAMPS 00:...
Code with me: Machine learning on a Macbook GPU (works for all M1, M2, M3) for a 10x speedup
8K views • 4 months ago
Step aside, NVIDIA CUDA! Apple MacBooks now have powerful M1/M2/M3 chips that are great for machine learning. This is your complete guide on how to run PyTorch ML models on your Mac’s GPU, instead of the CPU or CUDA. A machine learning engineer walks you through the easy, simple code changes needed to tap into your GPU - with only 5 lines of code! As a result, you’ll see a 10-20x speedup when r...
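The change described above boils down to selecting PyTorch's `mps` device. A minimal sketch (the model and tensor shapes here are placeholders, not the video's actual code):

```python
import torch

# Prefer Apple's Metal GPU (MPS) when available; otherwise fall back to CPU.
# Moving the model and its inputs to the "mps" device is often the only
# change needed for a large speedup on M1/M2/M3 Macs.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

model = torch.nn.Linear(128, 64).to(device)   # move model weights to the device
x = torch.randn(32, 128, device=device)       # create inputs on the same device
y = model(x)                                  # forward pass runs on that device
print(y.shape)  # torch.Size([32, 64])
```

The key rule is that the model and all of its inputs must live on the same device, otherwise PyTorch raises a device-mismatch error.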
“Smart glasses will be used by billions“ and other A.I. strategies revealed at SIGGRAPH 2024
425 views • 4 months ago
NVIDIA CEO Jensen Huang and Meta CEO Mark Zuckerberg come together publicly for the first time. They have a fruitful conversation at SIGGRAPH 2024 on the next steps for AI and their strategic reasonings for it, which include smart glasses, custom agents, and open sourcing. As a machine learning engineer who has worked in Big Tech, I provide technical insights for each of the takeaways. TIMESTAM...
GPT-4o is so smart, it flirts. There’s a reason for that.
29K views • 7 months ago
GPT-4o is so smart, it can flirt 🔥 With my machine learning background, I break down the differences between GPT-4 and GPT-4o, where we shift from multiple single-modality models to a single multi-modality model, aka "omni-model". With this new AI model able to express emotion, we are one step closer to human-like intelligence! TIMESTAMPS 00:00 Flirtier, but not smarter? 01:56 OpenAI descriptio...
AI vs. braces: Who will win?
1.7K views • 8 months ago
Every AI model suffers from edge cases. Are braces one of them? Here I test AI cloning and image generation on how well they handle this atypical dental scenario. I'm a machine learning engineer, and I investigate this question in a fun but critical way, providing explanations through the lens of an ML engineer. TIMESTAMPS 00:00 A baddie with braces! 00:36 Model 1: AI Cloning 02:32 My reaction 04:...
NVIDIA 2024 Keynote: 2 hours in 20 minutes, narrated by ML Engineer
963 views • 9 months ago
NVIDIA presents their AI revolution. The original keynote is 2 hours, but here it is summarized in 20 minutes, in detail, by a Machine Learning Engineer. We cover the 5 components of NVIDIA's AI revolution, including the new Blackwell GPU, NIMs for their pre-packaged AI models, NeMo for helping companies finetune their large AI models, and lastly, the Omniverse that enables physical AI such as a...
How I became a Machine Learning Engineer
3K views • 9 months ago
This is how I became a Machine Learning Engineer in Silicon Valley. It involved going on a worldwide journey to explore my life and career options and pursuing machine learning opportunities. It wasn’t a straightforward path, but that’s what makes it more fun! Here I share my journey, my lessons learned, and my advice on how to become a machine learning engineer. Hope you enjoy! TIMESTAMPS ...
Does it pick up its own poop? 💩
I think it's a poop-free model
sounds strongly like an ad
Not an ad! I include the tech specs bc it's good to know what the dog is capable of / made with
RIP Hollywood, now I can create next episodes of LOTR at home
@@gileneusz Careful, Gollum’s appearance might look more abnormal than usual ;)
@@TheTechTrance my precioussssssss
People haven't dressed them up yet - waiting for that comical moment.... only $4k! Breed - lol - I can't wait to see it at the next Purina Dog show!
Haha they can be dressed in stickers ✨
Where can I get the cat-bot???
Octopus bot too pls
At least it doesn’t poop 💩
It’s a feature 💁♀️
A bit freaky…
It is indeed
Thanks tech trance !
This is an amazing video! I had a hard time understanding the paper on SORA. Extremely glad I came across your video🙌 thank you for this🥺🙌
@@samathmikabk I’m happy to hear that! Glad it helped :)
In your opinion, will this be valued in the future? I want to start, but I don't know where or how. And is there any guarantee that this job will have a place in the future?
There’s no guarantee for anything, but AI engineers have strong prospects. AI is here to stay :)
@TheTechTrance If you were to put yourself in the place of a recent graduate who is very interested in programming and mathematics, what would you suggest?
Really great video, I now understand the science behind the magic. I don't want to pretend that I understand everything, but the diffusion transformer model part didn't make sense to me. I assume we need transformers to understand the temporal coherence, but why would we need to add noise and then train again to remove the noise?
I'm glad it was helpful! To your question, adding noise is the diffusion *process* and removing noise is the diffusion *model*'s task. Both adding and removing are there during the training phase so that the Diffusion Transformer Model can be trained. During inference, we start with random noise already, so only a removal of noise will happen. Working with noise gives the video generation process a lot of flexibility in molding the noise into whatever it sees fit. I hope that helps!
@ Thanks for the reply that makes sense, this reminds me of how GANS generator network uses random inputs to generate images
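The training-vs-inference split described in that reply can be sketched as follows. Everything here is a toy illustration (a linear layer standing in for the Diffusion Transformer, made-up shapes and step size), not SORA's actual code:

```python
import torch

# Toy sketch of the diffusion idea: during training we corrupt clean data
# with noise (the diffusion *process*) and teach a model to predict that
# noise (the diffusion *model*'s task). At inference we start from pure
# noise and repeatedly subtract the model's noise prediction.
model = torch.nn.Linear(16, 16)  # stand-in for a Diffusion Transformer

def training_step(clean_batch):
    noise = torch.randn_like(clean_batch)
    noisy = clean_batch + noise                   # forward process: add noise
    predicted = model(noisy)                      # model learns to find the noise
    return torch.nn.functional.mse_loss(predicted, noise)

def generate(steps=10):
    x = torch.randn(1, 16)                        # inference begins from random noise
    for _ in range(steps):
        x = x - 0.1 * model(x)                    # crude iterative denoising step
    return x

loss = training_step(torch.randn(4, 16))          # both add + remove happen in training
sample = generate()                               # only removal happens at inference
```

This mirrors the reply: noise is added only so the model has something to learn to remove; once trained, generation is pure denoising from scratch.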
Is it true???
Yep!
From someone that works within tech but is not technical these videos are suuuper useful 🙌🏻 Thanks for the deep breakdown
I aim to make the material suitable for both technical and non-technical people - glad you enjoyed it!
Great vids! Q: What makes you conclude that the augmented prompt for the alien in the city didn't just model the existing SORA ad video?
I conclude that the augmented prompt *does* model the SORA video! The coincidence between what ChatGPT shows and what SORA shows is too uncanny. Thus they are both likely using the same LLM for augmenting prompts, AND the LLM was instruction-tuned a little too hard :)
@@TheTechTrance what you have reiterated here was already made clear in your video. What I'm asking is how you can conclude there's a forward correlation, when it could've just as well been due to the reverse correlation; the augmented prompt matches the video because it had already been used to create that video before you even entered the prompt!
I see what you mean. I think it’s unlikely that ChatGPT’s outputs were trained based on Sora’s outputs. And simply bc Sora relies on an LLM for augmentation and ChatGPT is an LLM, that’s why I think the forward correlation is more likely :)
You’re the only one who does technical explanations of AI. Everyone is just talking about the hype. Glad to have found your channel
Thank you, I try to contribute that way. Glad you enjoyed!
Thanks TechTrance, "now my brain is full" :) It was dense, so I'm going to have to watch/listen to this again to fully get it. A couple of parts stood out: the self-attention mechanism to keep it temporal - I didn't think about that! And "limited semantic precision" - ugh, yes, DALL-E does the same. On limitations - it sounds like a horrible reason to melt the earth even more, but couldn't there be (along with a physics-first engine) some sort of o1-ish test-time compute, to make a bunch of videos and have an internal voting mechanism pick the best? Scratch this idea, it sounds horrible. Happy new year
I'm glad you enjoyed it! And it is dense - so yea definitely rewatch parts over! Your idea is actually good - I think we'll see a lot of techniques used on LLMs being applied to vision models. Possibly even RL-based ones for vision. Happy new year to you too!
💪
love the presentation - very informative and easy to watch!
glad you enjoyed it!
Thank you and Happy New Year!
Happy new year to you too!
First! ❤
Yayyyyy one last video before the end of the year 🙌🏻 love the bonus
It's not the last one of the year just yet 😄💖
WUBABABABALUBABABA A Hit as always. Beep boop bob beep
Great insights
Thanks for lending us a pair of ML engineer lens to see how you think through these projects - really inSIGHTful
🤓😎🤓😎🤓😎
Great video! I’d love to see a comparison between Google’s Veo 2 model and Sora. Since Sora doesn’t rely as much on marketing tricks as Google, it’s really hard to know which one is objectively more advanced.
Thanks for the suggestion! It would be interesting to see how they compare indeed. I'll see what I can do :)
Happy Holidays!!!
Happy holidays!
Merry Christmas, lots of hamming it up for fun, and great breakdown as always. Looking forward to the technical deep dive!
Happy holidays to you as well! See you soon for the deep dive!
Great video and commentary Tech Trance 👽🔥🙏
Thank you!
I wonder how long it will be until the next level of SORA comes out... It seems (my abuse is relevant here for some reason, I feel like crap, or I am mad and defending myself, or whatever you want? Or not want? idk, why do I comment this? no clue...) that if the GPT-3 to o3 track is repeated, then by the end of next year this will all be hyper-lifelike. And it is exciting (hope I get to see it all and don't die/become blind/burn in hell before then)...
I think the next SORA will come out pretty quick. They’ll likely incorporate physics modeling models or … reinforcement learning. Seems to be the magic sauce for everything now lol. I hope you feel better! Happy holidays :)
🎄🎅🧑🎄
She’s baaaaack! So happy to see another knowledge drop from you 🤩
What are the specs of your Mac? I'm currently looking to buy one.
I have a M2 Max 32GB (base model). It works great, even with my heavy usage of it. The latest models are surely even better!
Well done! Really great video. Straight to the point, no bs, only quality content in a packed and pleasant format. Thank you. Don't often subscribe organically but you got my attention. Keep up the good work!
@MichelCourtine Thank you for your attentiveness and compliments!
Hello
Does Apple let you shoot around the campus?
U adorbzzz,,, gosh, stop it!! Lol. And hilarious, :)
Looking forward to Graph-of-Thought thinking inference
I think Q* stands for Quiet-STaR (thinking quietly + Self-Taught Reasoner), which is another paper, not Q-learning with A*
I believe you are right, good catch!
@@TheTechTrancewow thanks, wasnt expecting that 😳
Thank you for taking the time to create this informational and interesting summary of the AI conference!
@@StaceyAlGhawas Glad you enjoyed it! It was my pleasure
Hey, cool recap. I’m researching inference - would you recommend any blogs or lectures to go deeper into it? Thanks!
What about inference exactly are you interested in?
Great video 🦾🤖
@@mattfarmerai thank you!
I'm gonna chat with ChatGPT
That's a good idea!
🙋♂️...congratulations... 🎉
🙌
Welcome to the AI social media stage with a thumbs up. Your video skill equals the depth of your technical knowledge. May your site grow rapidly among your peers. It deserves that on merit and will be successful. Are you a student or a teacher of recursive learning? A student recursive learner as well as a teacher. Thanks for bridging the gap for my student learning in the chain of things.
Glad you found it helpful! And thank you for your encouraging words!
Nice video...thanks for sharing......newly subs :)
Thanks! And welcome :)
Awesome video, thanks. 🎈❤️ from Kenya 😂🇰🇪
Hujambo, asante! (I hope that’s right)
@@TheTechTrance On point! 🥰🥰 I can't believe it
🇧🇷🇧🇷🇧🇷🇧🇷👏🏻, Amazing video!
Obrigado!! 🫶
Oooohhh I wish I could’ve been there! You make it seem fun 🎉😊
It's how nerds party :D