MIT 6.S191: Reinforcement Learning

Alexander Amini

63,872 views


Published Dec 21, 2024

COMMENTS

@visheshphutela 6 months ago ⁺⁴⁶
Babe wake up new 6.S191 lecture just dropped
@BheezHandle 6 months ago ⁺²
Lol...
@VisatoVino 6 months ago ⁺¹
@BheezHandle Feel the vibessssss
@crarewhiteheadpoin9471 5 months ago ⁺²
U got it
@bookish3018 2 days ago
One of the best presentations of deep reinforcement learning concepts, thanks a bunch for sharing it.
@izharulhaq2436 6 months ago ⁺¹⁴
One of the best intros to RL. I recommend this amazing lecture to every student interested in the field. I just finished it at 1:40 AM... Now waiting for the Actor-Critic type RL agent lecture to be released soon... Thanks and good night.
@artukikemty 6 months ago ⁺¹
Amazing intro to the subject. Since it is closely related to control theory, a good background in control theory, such as state-space models and optimal control, is essential.
@Asif-fp8gy 6 months ago ⁺⁵
Awesome job. Just curious: can someone explain how the target part of the loss function was computed at 26:40?
@ravenclaw3693 5 months ago
Immediate reward + discounted best possible future reward, i.e. r + γ · max over a' of Q(s', a').
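The reply above can be sketched in a few lines, assuming the standard Q-learning target. The names `q_target` and `q_loss` are hypothetical, and a plain Python list stands in for the Q-values a network would predict for the next state; this is an illustration, not the lecture's lab code.

```python
def q_target(reward, next_q_values, gamma=0.99, done=False):
    """Target = immediate reward + discounted best possible future value.

    next_q_values: Q(s', a') for every action a' in the next state
    (assumed here to be a plain list of floats).
    """
    if done:
        # No future reward after a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def q_loss(predicted_q, target):
    """Squared error between the target and the predicted Q(s, a)."""
    return (target - predicted_q) ** 2


# Illustrative numbers only.
target = q_target(reward=1.0, next_q_values=[0.5, 2.0, 1.0], gamma=0.5)
loss = q_loss(predicted_q=1.5, target=target)
print(target, loss)  # 2.0 0.25
```

In a deep Q-network the target is usually treated as a constant when taking gradients, so only the predicted Q(s, a) is updated.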
@gamalieliissacnyambacha3029 6 months ago ⁺¹
I'm curious to listen to this lecture. I need more concepts to apply in my thesis. I'm looking forward to seeing this happen soon.
@melvinkuriakose2708 6 months ago ⁺²
10:30: the equation for total reward should be a summation of rewards from t=0 to t=t, right? But in the equation it's from t to infinity... why?
@rorisangsitoboli4601 4 months ago
The total reward runs from time t to some later time, possibly far in the future (t → ∞). The first reward is r_t, the next ones are r_{t+1}, r_{t+2}, ..., until termination, which is assumed to happen at some point in the future but can be chosen by the user, e.g. time t+n. Remember that you can be rewarded now (t) or at any time in the far future (∞), so you sum over the entire duration.
@xxyyzz8464 2 months ago ⁺¹
You're correct, the lecturer slipped up here. What he says out loud does not match the equation he shows. His equation is the expected return (total future rewards) from time t, given no uncertainty in future rewards as you follow the policy until the end of an episode. In words, though, he claims it is the sum of all rewards from time t=0 to time t, which is clearly not what the equation states. I haven't finished the lecture, but most likely the equation is right and the spoken description is wrong, given that he then shows the form where future rewards are discounted. You would not discount past rewards, which is why I think the equation is right and he just describes it incorrectly.
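A small sketch may help with the question in this thread. Under the common convention, the return at time t sums the rewards from t onward, each discounted by how far in the future it arrives; rewards before t are not included. The function name `discounted_return` and the sample rewards are illustrative assumptions, not taken from the lecture.

```python
def discounted_return(rewards, gamma=0.9, t=0):
    """Sum of gamma**k * r_{t+k} for k = 0, 1, ... until the episode ends.

    rewards: list of per-step rewards for one finite episode.
    t: the time step from which the return is measured.
    """
    total = 0.0
    for k, r in enumerate(rewards[t:]):
        total += (gamma ** k) * r
    return total


# Illustrative episode with a reward of 1 at every step.
episode = [1.0, 1.0, 1.0]
r0 = discounted_return(episode, gamma=0.5, t=0)  # 1 + 0.5 + 0.25
r1 = discounted_return(episode, gamma=0.5, t=1)  # 1 + 0.5
print(r0, r1)  # 1.75 1.5
```

Measuring from t=1 drops the reward collected at step 0, which is why the sum starts at t rather than at 0.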
@ViolentWarrior 1 month ago
What are the system requirements?
@hrishabhg 6 months ago
Lovely lecture.❤
A self-driving car operates in a dynamic environment compared to a gaming environment. That could be mentioned.
@artukikemty 6 months ago ⁺³
Transformers can be used as a direct replacement for DRL since they can process sequences as well. There is an article on Medium about this alternative.
@collinspo 2 months ago
Got a link?
@anoopitiss 6 months ago
Following for 3 years.
@Crashrapescrypto 6 months ago
Can you advise my startup? We applied to YC. We want to set up an Indian team and use RLHF as well as SIMPO to agentify the hospital system and remove the inefficiencies in current hospital systems. I'm an Aussie coming to America. We have hardware as well; I've been in Guangzhou for the last 6 weeks finding the best containers and cameras, trying to train a model to gauge container volume for measuring remaining stock.
@Huayi-x3p 4 months ago
Hi, when I tried to run the model-building part of lab 1, the line "tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None])," does not work, and the error says batch_input_shape is an unrecognized keyword argument to Embedding. Has anyone else encountered this problem? I looked up the tf.keras.layers.Embedding documentation and couldn't find anything to replace it... What did you guys do to solve it? Thanks!
@Yeanpc 3 months ago ⁺¹
Hi, from my reading of the TF documentation, Embedding doesn't take batch_input_shape as a parameter. I just went ahead and created the embedding as tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim) and it worked for me.
@christianrink4093 4 months ago
Can one conclude from the AlphaGo vs. AlphaZero showcase that the bottleneck to "achieving" AGI/ASI is us humans and the ethical/safety restrictions we have set?
@Radiant-84 4 months ago ⁺¹
Both AlphaGo and AlphaZero rely on world models (and self-play), which they can use to try out or plan different moves based on the simulated results. While it's super easy to do this simulation in board games, where the rules are deterministic, creating such a world model for something drastically more complex, like the real world, is far more challenging. Algorithms like MuZero, which use learned models, are getting there, but technically speaking, DeepMind's got a lot more work to do before they can make Alpha-terminator ;)
@TheNewton 6 months ago
Please repeat the questions; the question askers' audio is blown out or unintelligible.
Some of the questions, but not all, make it into the captions.
The professor's mic is perfect, however, with a great mix. This is one of the few series where you don't have to be at max volume all the time.
@ssrwarrior7978 3 months ago
This is Awesome !!!!!
@ikpesuemmanuel7359 6 months ago
Is there an application of reinforcement learning for subsurface reservoir simulation?
@foregroundtreble05 6 months ago ⁺¹
Needed u
@wangfenjin 6 months ago ⁺¹
So awesome!
@breezecreator8751 6 months ago
🎉
@Diego0wnz 6 months ago
👏
@Yume-x9v 6 months ago
Kenchin kokoro no.tabi.Study of the waste.
