Thanks Hugo! Out of many available videos on this topic, yours is the most lucid and easy to follow. Big help for me!
Thanks for your kind words!!
Thanks a lottt Hugo! I have become a great fan of your work!
The step at 11:20 seems a little hand-wavy. I don't see how it follows that the fraction is p(h_j | x) just because fraction describes a probability distribution. How do I know it describes the distribution p(h_j | x)? You say it *must* be so. But why?
Good question! It's hard to give the full derivation here, but we know that p(h_j=1|x) = sigmoid(W_{j,.} x + b_j) and that p(h_j=0|x) = 1 - sigmoid(W_{j,.} x + b_j). Then, if you do the exercise of calculating the expression I highlight at 11:20 for h_j = 1 and for h_j = 0, you'll see that they match.
Hope this helps!
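For concreteness, here is a quick numerical sanity check of that exercise (a sketch with made-up values, assuming the expression at 11:20 is exp(h_j (W_{j,.} x + b_j)) normalized over h_j in {0,1}):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W_j = rng.normal(size=5)                      # hypothetical row W_{j,.} of the weight matrix
b_j = 0.3                                     # hypothetical hidden bias b_j
x = rng.integers(0, 2, size=5).astype(float)  # a random binary visible vector

a = W_j @ x + b_j                             # pre-activation for hidden unit j

# The fraction at 11:20: exp(h_j * a), normalized over h_j in {0, 1}
Z = np.exp(0 * a) + np.exp(1 * a)
print(np.exp(1 * a) / Z, sigmoid(a))       # matches p(h_j=1|x)
print(np.exp(0 * a) / Z, 1 - sigmoid(a))   # matches p(h_j=0|x)
```

Both printed pairs agree, since exp(a) / (1 + exp(a)) = 1 / (1 + exp(-a)) = sigmoid(a).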
Hi Hugo!
Amazing video!
Can you please help me with the derivation at 4:42? Is there a supporting video or document?
Thanks
@16:38 the denominator is a bit confusing. I mean, why do we use the neighbors of z instead of z' in the denominator's factor function?
There's no deeper "reason": this is simply a statement of what the local Markov property is. In other words, this is how it is defined. Also, note that z' in the denominator is just a dummy variable ranging over the possible values of z itself (with z's neighbors held fixed), so the relevant neighborhood is still that of z.
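In symbols, with \mathcal{N}(z) denoting the neighbors of z in the graph (generic Markov-network notation, which may differ cosmetically from the lecture's):

```latex
p\big(z \mid \text{all other variables}\big) = p\big(z \mid \mathcal{N}(z)\big)
```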
Thank you :)
Really nice vids! I like your slide style.
It would be interesting if you showed an execution example of the RBM on a small dataset. Anyway, thank you for the explanation. Keep up the great work! =)
Many thanks for the detailed explanation!
Can you explain @7:00 how the nested sum over the hidden units comes about?
I strongly recommend you check out his lectures on CRFs. But basically, you have to sum over all possible values of all of the hidden units, because that is how many hidden units are involved.
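For reference, here is the factorization at play, as a sketch assuming binary hidden units and the standard RBM energy (the visible-bias term is omitted since it doesn't involve h):

```latex
\sum_{h \in \{0,1\}^H} \prod_{j=1}^{H} e^{h_j \left(b_j + W_{j,\cdot}\, x\right)}
= \prod_{j=1}^{H} \sum_{h_j \in \{0,1\}} e^{h_j \left(b_j + W_{j,\cdot}\, x\right)}
= \prod_{j=1}^{H} \left(1 + e^{b_j + W_{j,\cdot}\, x}\right)
```

Each factor depends on a single h_j, so the nested sums over all 2^H hidden configurations distribute into a product of H two-term sums.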
Very good explanation. Thanks a lot!
That's very, very detailed.
Which book did you follow?
Thanks for your kind words! I didn't follow any book actually :-)
@hugolarochelle thanks to you man!
Hi, could you provide the HDRBM experiment code from the paper named "Classification using DRBM"? I'm trying to recreate the experiment and am stuck on the HDRBM.
Thank you Hugo, it helps a lot.
Would you also do some screencasts on RNNs?
Unfortunately no :-( Maybe I'll make some someday. In the meantime, I'd consider reading this:
www.iro.umontreal.ca/~bengioy/dlbook/rnn.html
Hugo Larochelle thanks for the tip. I'll definitely check that out.
I really enjoyed watching your lectures; you have this extraordinary gift of explaining things clearly.
So sad I can't speak French.
Ji Feng Thanks, I really appreciate your kind words :-)
The link to the RNN resource is down... any other suggestions?
Hi, Hugo Larochelle, thanks for your video. I am confused about the difference between the sum over j (numerator) and the sum over h'_j (denominator). Could you explain it?
Never mind, I figured it out lol
Good job! :-)
Is this explanation based on the assumption that both h and x are either 1 or 0? Do I understand correctly that they can't take values between 0 and 1?
Good question! Yes, the explanation is specific to the case where the values of x and h are 0 or 1. But it would be possible to derive a version where x and h take any continuous value between 0 and 1 (the derivation is just a bit more complicated, requiring integrals).
Hope this helps!
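To sketch what changes in the continuous case (my notation, assuming h_j takes values in [0,1] under the same energy function, and writing a_j = W_{j,.} x + b_j): the two-term sum in the normalizer becomes an integral,

```latex
p(h_j \mid x) = \frac{e^{h_j a_j}}{\int_0^1 e^{h' a_j}\, dh'}
             = \frac{a_j\, e^{h_j a_j}}{e^{a_j} - 1} \qquad (a_j \neq 0)
```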
@hugolarochelle thanks heaps for answering, and so quickly!
Hey,
could you upload the presentations?
Sure! Everything is here: info.usherbrooke.ca/hlarochelle/neural_networks/content.html
Hugo Larochelle
Thank you!
Hugo,
Can you also share the assignments/exams associated with the course ? It would help me calibrate how much of the material I have correctly assimilated.
You'll find 3 assignments here: info.usherbrooke.ca/hlarochelle/neural_networks/evaluations.html
Thank you!
Thank you for the video, it is very helpful!
Hello, can you please explain how the hidden layer is set to 0 or 1 after obtaining the probability (from the activation function)?
Sure! Once you've computed the value of the sigmoid (let's call that value p), you sample the value of the corresponding unit by sampling a real number between 0 and 1 from a uniform distribution, and if that number is smaller than p, then you set the unit to 1. Otherwise, you set it to 0.
Hope this helps!
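If it's useful, here is a minimal NumPy sketch of that sampling step (the pre-activation values are made up for illustration):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(42)

# Hypothetical pre-activations W_{j,.} x + b_j for three hidden units
pre_activations = np.array([-1.2, 0.0, 2.5])
p = sigmoid(pre_activations)      # p(h_j = 1 | x) for each unit

u = rng.uniform(size=p.shape)     # one uniform draw in [0, 1) per unit
h = (u < p).astype(int)           # set h_j = 1 exactly when u_j < p_j

print(p)   # roughly [0.23 0.5 0.92]
print(h)   # a binary sample of the hidden layer
```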
@6:44, on the first line, the denominator should probably be \sum p(x|h'), not \sum p(x,h').
Nope, it's indeed \sum p(x,h'). That is because we have p(h|x) = p(x,h) / p(x), and p(x) = \sum_{h'} p(x,h').
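In display form, the chain is:

```latex
p(h \mid x) = \frac{p(x, h)}{p(x)} = \frac{p(x, h)}{\sum_{h'} p(x, h')}
```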
OMG THANK YOU!!!!!
My pleasure :-)
cool!
45 minutes of math and proofs, no exercises, no code so far.
Not saying I could do better, but maybe someone could.
You should understand this. I don't think any language will hold on in the future; if you know this, you won't have to worry in the future.
You can skip these lectures if you know this material already. Otherwise, you cannot code anything on your own. You can always look for implementations by others though, but most probably you won't get a deep understanding of the subject that way.