Deep Q-Networks Explained!

CodeEmporium

Published 27 Nov 2023
Let's talk about deep Q-learning, a popular reinforcement learning algorithm.
ABOUT ME
⭕ Subscribe: ua-cam.com/users/CodeEmporiu...
📚 Medium Blog: / dataemporium
💻 Github: github.com/ajhalthor
👔 LinkedIn: / ajay-halthor-477974bb
PLAYLISTS FROM MY CHANNEL
⭕ Reinforcement Learning: • Reinforcement Learning...
⭕ Natural Language Processing: • Natural Language Proce...
⭕ Transformers from Scratch: • Natural Language Proce...
⭕ ChatGPT Playlist: • ChatGPT
⭕ Convolutional Neural Networks: • Convolution Neural Net...
⭕ The Math You Should Know : • The Math You Should Know
⭕ Probability Theory for Machine Learning: • Probability Theory for...
⭕ Coding Machine Learning: • Code Machine Learning
MATH COURSES (7 day free trial)
📕 Mathematics for Machine Learning: imp.i384100.net/MathML
📕 Calculus: imp.i384100.net/Calculus
📕 Statistics for Data Science: imp.i384100.net/AdvancedStati...
📕 Bayesian Statistics: imp.i384100.net/BayesianStati...
📕 Linear Algebra: imp.i384100.net/LinearAlgebra
📕 Probability: imp.i384100.net/Probability
OTHER RELATED COURSES (7 day free trial)
📕 ⭐ Deep Learning Specialization: imp.i384100.net/Deep-Learning
📕 Python for Everybody: imp.i384100.net/python
📕 MLOps Course: imp.i384100.net/MLOps
📕 Natural Language Processing (NLP): imp.i384100.net/NLP
📕 Machine Learning in Production: imp.i384100.net/MLProduction
📕 Data Science Specialization: imp.i384100.net/DataScience
📕 Tensorflow: imp.i384100.net/Tensorflow

COMMENTS • 41

@CodeEmporium 6 months ago ⁺⁹
If you like this video and you think I deserve it, please consider giving this video a like. Subscribe for more!
@0xabaki 3 months ago
I second this statement
@neetpride5919 6 months ago ⁺¹⁷
Where does the target network come from? And if it's the ideal "conscience", why not just use that? If we already have the ideal network, why bother training a second one?
@CodeEmporium 6 months ago ⁺¹³
Good question. Maybe my rhetoric was not super clear here. Essentially, without that target network, the Q-network would compute the loss by comparing to itself. In practice, this can lead to unstable values, as it is chasing a moving target. Hence a slightly delayed network is introduced to stabilize training.
Note that this target network isn't the final iteration of the “ideal conscience”. It is rather an iteration in the direction of the ideal conscience. I say “ideal conscience” in this context to illustrate that the loss is computed based on this target network's value. But this target network also gets better over time.
@zerge69 2 months ago ⁺⁶
The target network should be called the "snapshot network". It's simply an older version of the Q-network, over which you improve.
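The replies above describe the target network as a slightly delayed snapshot of the Q-network. Below is a minimal pure-Python sketch of that idea. It assumes a "hard update" that copies the online parameters into the target every `sync_every` steps, with the target starting as an exact copy of the Q-network. This is a common DQN choice; some implementations instead blend the weights gradually. The one-parameter "network", the `train_with_snapshot` helper, and the `+= 1.0` update are hypothetical placeholders, not code from the video.

```python
import copy


def train_with_snapshot(num_steps, sync_every):
    """Track how a target network lags behind the online Q-network.

    The single float weight is a hypothetical stand-in for a real network's
    parameters, and "+= 1.0" is a placeholder for one gradient update.
    """
    q_params = {"w": 0.0}                    # online Q-network parameters
    target_params = copy.deepcopy(q_params)  # target starts as an exact copy
    history = []
    for step in range(1, num_steps + 1):
        q_params["w"] += 1.0                 # placeholder for one SGD step on the Q-network
        if step % sync_every == 0:
            # Hard update: overwrite the target with a fresh snapshot.
            target_params = copy.deepcopy(q_params)
        history.append((q_params["w"], target_params["w"]))
    return history


if __name__ == "__main__":
    hist = train_with_snapshot(num_steps=250, sync_every=100)
    for step in (1, 99, 100, 150, 250):
        online, target = hist[step - 1]
        print(f"step {step}: online w = {online}, target w = {target}")
```

Between syncs, the target stays frozen while the online weights keep moving. That frozen value is what stops the Q-network from "chasing a moving target" at every single update.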
@sharonkevin9906 3 months ago
Love your videos mehn. They’ve really helped me understand the concepts
@royvivat113 2 months ago
Great video, this was helpful for me. The only thing that I found pretty confusing was the target network explanation, which I saw you address in another comment. You described it as the ideal conscience, which really made it seem like it's the optimal Q-network that we're comparing to (which would defeat the purpose of training if we had that). In fact, since it gets updated every few batches, it's less ideal than the Q-network.
@johantchassem1553 3 months ago
Thanks for the explanation.
@florentb8578 2 months ago
Brilliant explanation, well done
@deviduttanayak2684 5 months ago ⁺⁶
quiz 1=A
q2=B
q3=C
@sotasearcher 6 months ago ⁺²
A scenario where a computer could benefit from learning on its own: I remember Google reporting research on a model that used RL and was able to find more efficient assembly code for a sorting algorithm.
@sotasearcher 6 months ago ⁺²
It was AlphaDev
@hakunamatata1o1 2 months ago
GOOD EXPLANATION
@gayatri8728 27 days ago
Amazing explanation 🎉🎉🎉🎉
@katnip1917 3 days ago
Great Video!! Thank you for the explanation. My question is, why not use the current state in the target network, instead of the next state?
@axelolafsson7312 1 month ago
this video is great
@zerge69 2 months ago
Awesome explanation, thanks. Except the quizzes.
@ajaytaneja111 3 months ago ⁺⁴
Sorry Ajay, I'm not sure I'm getting it. What do you mean by an idealised network (you say Frank's idealised conscience)? Where does it come from? It looks like you're saying that's the actual solution (idealised conscience), but what's its origin?
@amithapa1994 5 months ago ⁺²
Quiz 2:
B. It stores Q for future reference
@bean217 3 months ago
Is the target network also randomly initialized? Is it initialized with the same parameters as the Q-network?
From what I gather, the Q-network is acting as our behavior policy, and the target network is acting as our target policy. The way you describe it here makes it seem like the target network is already learned, but that would defeat the purpose of the algorithm in the first place.
@rpraver1 6 months ago ⁺²
This was a good video, but I would love to see a deeper dive into your transformer series. That was the best, but I am still missing clarity on some of the steps. Your explanations are the best and I would love to see more.
I have re-watched your videos at least 10 times and have many questions; we need more of your explanations. Keep it up.
@CodeEmporium 6 months ago ⁺¹
Thanks! Yeah, I am trying to get core concept videos out first, and I would love to soon dive into a series where I implement this system too :)
@hamzaali98 5 months ago
@CodeEmporium Hey! A decision transformer video would be really appreciated.
@harshsonar9346 26 days ago
✨Quiiizz Timmmeeee✨
@eliasblancocastro9677 9 days ago
Amazing video and explanation! I have a question: can I use SGD instead of MSE?
@CodeEmporium 9 days ago
Thanks! SGD is an optimizer (an algorithm that describes HOW a model learns), while MSE is a loss function (a function that describes WHAT to minimize). They serve different purposes. But in general, you can replace loss functions with appropriate counterparts. They may not work exactly as described, but they can work in general.
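The split between "what to minimize" and "how to learn" can be illustrated with a minimal pure-Python sketch. It fits a one-weight model `pred = w * x` on toy data generated by y = 3x, assumed purely for illustration. The same plain-SGD update rule is reused unchanged while the loss gradient is swapped between MSE and a Huber loss, a common counterpart in DQN code. The helper names (`mse_grad`, `huber_grad`, `sgd_step`, `fit`) and the learning rate are hypothetical choices, not from the video.

```python
def mse_grad(pred, target):
    """Gradient of the squared error (pred - target)**2 w.r.t. pred: WHAT to minimize."""
    return 2.0 * (pred - target)


def huber_grad(pred, target, delta=1.0):
    """Gradient of the Huber loss w.r.t. pred: quadratic near zero, linear beyond delta."""
    err = pred - target
    if abs(err) <= delta:
        return err
    return delta if err > 0 else -delta


def sgd_step(w, x, target, loss_grad, lr):
    """One plain SGD update for the model pred = w * x: HOW the weight moves."""
    pred = w * x
    grad_w = loss_grad(pred, target) * x  # chain rule: dL/dw = dL/dpred * dpred/dw
    return w - lr * grad_w


def fit(loss_grad, data, lr=0.05, epochs=200):
    """Run SGD over the data for a fixed number of epochs, starting from w = 0."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            w = sgd_step(w, x, y, loss_grad, lr)
    return w


if __name__ == "__main__":
    data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # toy data generated by y = 3x
    print("SGD + MSE:  ", fit(mse_grad, data))
    print("SGD + Huber:", fit(huber_grad, data))
```

Both runs converge toward w = 3. Only the loss gradient changed; the optimizer did not. So "SGD instead of MSE" is not a substitution that makes sense, whereas "Huber instead of MSE" is.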
@user-sx1rt2lm6b 6 months ago
I did not understand where the target network comes from. And if it exists, why should a new one be trained?
@CodeEmporium 6 months ago
Good question. From my understanding, the answer is more practical than theoretical.
The target network ensures the Q-network isn't chasing a moving target. If the network were compared against itself at every iteration, training would not be stable. Hence another, slightly delayed network is introduced to ensure this stability.
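Here is where the delayed network enters the loss for a single transition (s, a, r, s'), as a minimal pure-Python sketch. It assumes the standard one-step Bellman target y = r + γ · max over a' of Q_target(s', a'), with y = r at a terminal state. The Q-values are passed in as plain lists rather than produced by a real network. The function names and the example numbers are hypothetical illustrations, not code from the video.

```python
def td_target(reward, next_q_from_target, gamma, done):
    """Bellman target y = r + gamma * max_a' Q_target(s', a'); just r if s' is terminal."""
    if done:
        return reward
    return reward + gamma * max(next_q_from_target)


def dqn_squared_error(q_online, action, reward, next_q_from_target, gamma, done=False):
    """Squared TD error for one transition (s, a, r, s').

    q_online: Q-network outputs for the current state s (this side gets gradients).
    next_q_from_target: target-network outputs for the next state s'.
    The target y is treated as a constant, so no gradient flows into the target network.
    """
    y = td_target(reward, next_q_from_target, gamma, done)
    return (q_online[action] - y) ** 2


if __name__ == "__main__":
    q_online = [1.0, 2.0]       # hypothetical Q(s, .) from the online network
    next_q_target = [0.5, 3.0]  # hypothetical Q_target(s', .) from the snapshot
    print(dqn_squared_error(q_online, 1, 1.0, next_q_target, gamma=0.5))             # 0.25
    print(dqn_squared_error(q_online, 1, 1.0, next_q_target, gamma=0.5, done=True))  # 1.0
```

The target network is evaluated at the next state s' because the Bellman target combines the observed reward with the value of what follows. Evaluating it at the current state s would leave the reward out and simply compare the Q-network with a copy of itself on the same input.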
@himanshumeena745 2 months ago
Quiz time: the answer to 3 is C. I'm right, aren't I, CodeEmporium bro?
@sotasearcher 6 months ago ⁺³
2:40 A
@CodeEmporium 6 months ago ⁺²
That’s right! Nice!
@riadhossainbhuiyan4978 4 months ago ⁺¹
B
@NaveenKumar-vn7vx 6 months ago ⁺²
A
@CodeEmporium 6 months ago ⁺¹
A! Yep that’s right for Quiz 1
@meguellatiyounes8659 6 months ago
teach him how to use a pencil
@CodeEmporium 6 months ago ⁺⁶
I am a pencil
@user-vp6fh8gx7z 6 months ago
QAnon Network.
@riadhossainbhuiyan4978 4 months ago
Q3: A
@jongxina3595 5 months ago
Very cringey but good video nonetheless 👍
@hakunamatata1o1 2 months ago ⁺²
SHUDDAP
HE'S GIVING A GOOD VIBE
