No Priors Ep.61 | OpenAI's Sora Leaders Aditya Ramesh, Tim Brooks and Bill Peebles

  • Published 20 May 2024
  • AI video generation is not just leveled-up image generation; rather, it could be a big step forward on the path to AGI. This week on No Priors, the team from Sora is here to discuss OpenAI’s recently announced generative video model, which can take a text prompt and create realistic, visually coherent, high-definition clips that are up to a minute long.
    Sora team leads Aditya Ramesh, Tim Brooks, and Bill Peebles join Elad and Sarah to talk about developing Sora. The generative video model isn’t yet available for public use, but the examples of its work are very impressive. The team believes we’re still in the GPT-1 era of AI video models, and they are focused on a slow rollout to ensure the model offers as much value as possible to users and, more importantly, that every feasible safety measure is in place to guard against deepfakes and misinformation. They also discuss what they’re learning from implementing diffusion transformers, why they believe video generation takes us one step closer to AGI, and why entertainment may not be the main use case for this tool in the future.
    Show Notes:
    0:00 Sora team Introduction
    1:05 Simulating the world with Sora
    2:25 Building the most valuable consumer product
    5:50 Alternative use cases and simulation capabilities
    8:41 Diffusion transformers explanation
    10:15 Scaling laws for video
    13:08 Applying end-to-end deep learning to video
    15:30 Tuning the visual aesthetic of Sora
    17:08 The road to “desktop Pixar” for everyone
    20:12 Safety for visual models
    22:34 Limitations of Sora
    25:04 Learning from how Sora is learning
    29:32 The biggest misconceptions about video models
  • Science & Technology

COMMENTS • 30

  • @jonkraghshow 25 days ago +4

    Really great interview. Thanks to all.

  • @garsett 24 days ago +2

    Smart! 😊
    Personalisation and esthetics. Cool.
    But also PRACTICAL worldbuilding please.
    How can this help create quality lifestyles? Happy communities? A convivial society?

  • @erniea5843 24 days ago +1

    Cool interview, awesome to see a glimpse into the innovation being done to develop these video models

  • @leslietetteh7292 25 days ago +4

    Interesting video! It really highlights the potential of using 3D tokens with time as an added dimension :). My experience with diffusion models and video generation didn't show anything quite like Sora's temporal coherence. Looking ahead, I'm excited about the prospects of evolving from polygon rendering to photorealism via image-to-image inference. While I might be biased due to my interest in this rendering, I think incorporating 'possibility' as an additional dimension, as suggested by "imagining higher dimensions", could address issues like the leg switching effects we currently see. Such physics-consistent behavior could potentially be borrowed from game engine scenarios, where, unlike an apple that behaves predictably when dropped, a leg has specific movement constraints (also affected by perspective shifts). It’s a speculative route, but it might be worth exploring if it promises substantial improvements.
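
    The "3D tokens with time as an added dimension" idea above can be made concrete with a tiny sketch: a clip is cut into spacetime patches, so time is treated as just another axis of the token grid. The function name, shapes, and patch sizes below are illustrative assumptions, not Sora's actual implementation.

    ```python
    # Hypothetical sketch: turning a (T, H, W, C) video into flattened
    # spacetime patches ("3D tokens"). Patch sizes are arbitrary examples.
    import numpy as np

    def to_spacetime_patches(video, pt=4, ph=8, pw=8):
        """Split a video into tokens, each spanning pt frames and a ph x pw window."""
        T, H, W, C = video.shape
        assert T % pt == 0 and H % ph == 0 and W % pw == 0
        # Group frames and pixels into a grid of patches, then flatten each patch.
        v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
        v = v.transpose(0, 2, 4, 1, 3, 5, 6)       # (t, h, w, pt, ph, pw, C)
        return v.reshape(-1, pt * ph * pw * C)     # (num_tokens, token_dim)

    # Example: a 16-frame 64x64 RGB clip becomes 256 tokens of dimension 768.
    clip = np.random.rand(16, 64, 64, 3).astype(np.float32)
    print(to_spacetime_patches(clip).shape)        # (256, 768)
    ```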

    • @tianjiancai1118 24 days ago

      Maybe internal 3D modeling should be introduced to solve the issue you mentioned (leg switching, or so-called "entity inconsistency").

    • @leslietetteh7292 21 days ago

      @tianjiancai1118 How so? (NB: are you familiar with how diffusion models work? It's just learning to denoise an image, or a cube in this case. I'm just suggesting that it learn to denoise the branching possibilities rather than a cube, so it knows what is not a possibility; I'm suggesting, not guaranteeing, that the idea will work. There are things like ControlNets, though, so if this internal 3D modelling is a valid idea, please share.)
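
      To make the "learning to denoise a cube" description concrete, here is a toy epsilon-prediction diffusion loss over a spatio-temporal latent cube. The noise schedule, tensor shapes, and stand-in model are made-up assumptions for illustration, not how Sora is actually trained.

      ```python
      # Toy diffusion training objective on a latent "cube" (batch, T, H, W, C).
      # Everything here is illustrative; it is not Sora's architecture or schedule.
      import torch

      def denoising_loss(model, x0, num_steps=1000):
          """Mix random noise into the clean cube x0 and ask the model to predict it."""
          b = x0.shape[0]
          t = torch.randint(0, num_steps, (b,))                              # random timestep per sample
          alpha_bar = torch.cos(t.float() / num_steps * torch.pi / 2) ** 2   # toy cosine noise schedule
          alpha_bar = alpha_bar.view(b, 1, 1, 1, 1)
          noise = torch.randn_like(x0)
          x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise       # noisy cube at step t
          return torch.mean((model(x_t, t) - noise) ** 2)                    # epsilon-prediction MSE

      # Stand-in "model" that ignores the timestep, just to show the call signature.
      model = lambda x, t: torch.zeros_like(x)
      x0 = torch.randn(2, 8, 16, 16, 4)   # (batch, frames, height, width, channels)
      print(denoising_loss(model, x0).item())
      ```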

    • @tianjiancai1118 21 days ago

      Sorry, to clarify: internal 3D modeling is hard to achieve in a diffusion model (as far as I know). What I mean is something like a totally new architecture.

  • @Glowbox3D 24 days ago +1

    As a 3D artist, filmmaker, and actor, I'm super excited about SORA. I can't wait to play around with this tech. It's pretty crazy how all these modalities are happening at once: image, video, voice, sound effects, and music; all the pipelines needed to create media. There will be a time, not far off, when we can plug in a prompt and SORA 5 will create all the needed departments. As the human working with this, I would of course be heavily involved in the iterative generation and direction of each piece of media, and in the end the edit would be mine. I wonder how much 'authorship' a creator will have or be given.

    • @boonkiathan 21 days ago

      But prior to commercially utilizing SORA's output, there must be clarity on the source of the training data. It can't just be OpenAI pushing it to creators and the creators saying they trust OpenAI; this is almost exactly the same issue as with text generation. For fun and brainstorming, fair use, I suppose.

  • @EnigmaCodeCrusher 24 days ago +1

    Great interview

  • @JustinHalford 25 days ago +1

    Compute and data are converging on becoming interchangeable sides of the same coin. Flops are all you need.

  • @amritbro 25 days ago +1

    I'm definitely following these three talented guys on X. Really great interview, and without a doubt Sora is already making an impact in Hollywood like Pixar once did during the Steve Jobs era.

  • @AIlysAI 25 days ago

    Really, all these amazing things are possible just with transformers; there isn't much innovation beyond applying transformers to X and scaling it up. The most innovative thing they did was the tokenization method using boxes; the rest is mechanics.

    • @leslietetteh7292 25 days ago

      Adding another axis in the form of imaginary numbers improved our ability to model higher dimensional interactions before. That's negative, bordering on bias - if it isn't innovation, then why didn't everyone else do it?

  • @BadWithNames123 21 days ago +1

    vocal fry contest

  • @oiuhwoechwe 25 days ago +11

    I'm old. These guys look like they just left high school.

    • @voncolborn9437 24 days ago

      Haha, I'm 71. I know exactly what you mean. The average age of the developers of the first Mac was 28. The AI community seems so young, but that gives these super smart people a lot of years to get things straightened out.

    • @mosicr 23 days ago

      They almost have. Peebles is just out of university.

  • @phen-themoogle7651 24 days ago

    The Matrix basically

  • @jeffspaulding43 25 days ago +1

    Our subconscious does a much better job at modeling physics. Your conscious mind imagines the apple falling only vaguely, while your subconscious mind can learn to juggle several apples without dropping them, so it knows when they will be where.

    • @leslietetteh7292 24 days ago

      We perceive possibility (which can be thought of as an extra dimension, an idea from "imagining extra dimensions"). I would think that if it were trained on branching "possibilities", the physics would be much more consistent. But especially with the idea of polygon-rendering-to-photoreal image-to-image inference on the horizon, there's more of a focus on speeding up inference these days (see Meta's amazing work on "Imagine Flash" with Emu). With this sort of temporal consistency, if OpenAI manages to get inference speed up, you could just use a traditional videogame physics engine with photoreal inference laid on top. It'll probably sell a lot, especially if they map electrical signals through the spinal cord to touch input and replicate that. Seeing and touching the real world through VR will be epic, and yeah, it'll probably sell loads. You could train the next gen of AI engineers (think deep-sea or deep-space repair) in a simulation that looks identical to, and behaves identically to, the real world.

    • @tianjiancai1118 24 days ago

      Branching possibilities increase cost exponentially, so knowing how to predict something (relatively) precisely is also important. Humans certainly learn possibility, and we learn certainty too.

    • @leslietetteh7292 23 days ago

      @tianjiancai1118 Certainly. I'm almost sure it'd have a positive effect on modelling what are essentially 4D interactions effectively, but with the sort of inference speed-ups we're seeing now, I'm pretty sure image-to-image inference from polygon rendering to photorealistic output is the way to go for the easy win.

    • @tianjiancai1118 23 days ago

      You mentioned an "easy win". I would argue that any generation without understanding its nature can't be precise enough. Inference speed is important, but inference quality is also important to achieve an indistinguishable (or so-called no-mistake) result. Even if you can speed up inference and offer real-time generation, there are still cases requiring reasonable results.

    • @leslietetteh7292 21 days ago

      @@tianjiancai1118 "Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation" is a really good paper by Meta that you should read, its achieves super-fast inference without really compromising on quality. there are some pretty good demos of the quality they're achieving with real-time inference.

  • @davidh.65 24 days ago

    Why would they hype Sora up and then not even have a timeline for releasing a product??

    • @tianjiancai1118 24 days ago

      Because they are still working on preventing misuse.