SVM starts from 46:20
Thank you very much!
Thanks bro
Thanks a lot for the timestamp.
51:00 to be precise
saved my life
32:07
It helps me to think of Laplace smoothing as
Pr(observation gets label) = (count of observations with label)/(number of observations) -->
Pr(observation gets label) = (count of observations with label + 1)/(number of observations + number of possible labels)
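A hedged restatement of the same idea in equation form (my own notation, not from the lecture): with m observations and K possible labels,

$$P(\text{label} = k) \;=\; \frac{\#\{i : y^{(i)} = k\} + 1}{m + K}.$$

Drop the +1 and +K and you recover the plain counting estimate.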
26:30 memo: he explains the difference between the multinomial event model and the multivariate Bernoulli event model.
At 19:15, wouldn't it be more accurate to say multinoulli instead of multinomial, since the concept of a number of trials, which is a parameter of the multinomial distribution, doesn't really apply here?
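For what it's worth, a quick way to see the relationship (standard definitions, not something stated in the lecture): the multinoulli/categorical distribution is just the multinomial with n = 1 trial, so the combinatorial factor drops out:

$$P(x_1, \dots, x_k) = \frac{n!}{x_1! \cdots x_k!} \prod_{j=1}^{k} p_j^{x_j} \;\xrightarrow{\, n = 1 \,}\; \prod_{j=1}^{k} p_j^{x_j}, \quad \text{with exactly one } x_j = 1.$$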
Don't buy drugs, guys.
I did drugs so I could become a machine learner!
A lot of software engineers take Adderall and micro doses of molly, shrooms, and acid 😂
too late
Even Jabba the Hutt was interested and asked a question.
Thanks for the great video! One question about 8:00: if NIPS is one of your features, can you even train the model when none of your emails contains NIPS? The MLE formula will yield a probability of 0. (Or is there no real "training" step here, since you get the analytic solution directly and prediction just uses the counting solution?) Thanks in advance for any advice!
We estimate the parameters in the analytical solutions using MLE. If NIPS never occurred, we can resolve the resulting 0/0 problem with Laplace smoothing.
Or perhaps you mean NIPS is not in your training dictionary, in which case a sentinel value is used to represent all words not present in the training data.
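To make the fix concrete, here is a minimal Python sketch (my own toy data and function names, assuming the multivariate Bernoulli Naive Bayes model from the lecture): without smoothing, the MLE for a never-seen word is exactly 0, so any email containing it gets zero likelihood; with Laplace smoothing (+1 in the numerator, +2 in the denominator since each x_j is binary) the estimate stays strictly positive.

```python
import numpy as np

# Toy illustration (made-up data, not from the lecture):
# X[i, j] = 1 if word j appears in email i; y[i] = 1 if email i is spam.
X = np.array([[1, 0, 0],   # word index 2 (say, "NIPS") never appears in any email
              [1, 1, 0],
              [0, 1, 0],
              [1, 0, 0]])
y = np.array([1, 1, 0, 0])

def estimate_phi(X, y, label, laplace=False):
    """Estimate P(x_j = 1 | y = label) for every word j."""
    Xl = X[y == label]
    if laplace:
        # Laplace smoothing: +1 in the numerator, +2 in the denominator
        # (each binary feature x_j has 2 possible values: 0 or 1).
        return (Xl.sum(axis=0) + 1) / (len(Xl) + 2)
    return Xl.sum(axis=0) / len(Xl)

print(estimate_phi(X, y, label=1))                # word 2 gets probability exactly 0
print(estimate_phi(X, y, label=1, laplace=True))  # word 2 gets a small nonzero probability
```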
Just had a doubt: at 54:56, what does g(z) denote? Is it the sigmoid function?
Yes, it's the sigmoid function: when theta transpose x > 0, the sigmoid output is > 0.5.
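For reference, the standard definition (assuming g is the logistic sigmoid used earlier in the course):

$$g(z) = \frac{1}{1 + e^{-z}}, \qquad g(0) = \tfrac{1}{2}, \qquad \theta^T x > 0 \;\Longrightarrow\; g(\theta^T x) > \tfrac{1}{2}.$$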
too many side quests in this level
Nice and helpful, but initially I got a little confused between the geometric and functional margin: why do we define two terms just for normalization?
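In case it helps others, the two definitions from the SVM part of the lecture (standard CS229 notation, my summary) differ only in the normalization by ||w||:

$$\hat{\gamma}^{(i)} = y^{(i)}\big(w^T x^{(i)} + b\big), \qquad \gamma^{(i)} = \frac{\hat{\gamma}^{(i)}}{\|w\|}.$$

The functional margin can be made arbitrarily large just by rescaling (w, b); the geometric margin is invariant to that rescaling, which is why both terms are introduced.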
SVM from 46th min
thanks
A doubt: when talking about the NIPS conference producing a zero probability in Naive Bayes, shouldn't the probability of the word NIPS not come up in the calculation of P(x|y=0) in the first place? The binary column vector of 10,000 elements won't have this word in it, since it isn't among the top 10,000 words, because it only started appearing very recently.
I think he said a dictionary with 10k words where NIPS is the 6017th word; the dictionary isn't necessarily the top 10k words.
NIPS is the 6017th word in the 10,000-word dictionary, but since the word doesn't appear in the emails received early on, its MLE estimate is zero. The likelihood is a product, so it goes to zero, and when the word later starts appearing in emails the model's prediction is still zero, because that zero factor is already in the product.
They lost 😭
The exact thing I was wondering
so it doesn't need to do Laplace smoothing
At 35:21, shouldn't it be n_i in general instead of the 10,000 that is being added?
No, n_i is the number of words in the i-th email; the term we add to the denominator in Laplace smoothing is the number of possible values each word position can take, which in Andrew's example is the dictionary size = 10,000.
I was wondering the same thing for a second
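Expanding on the reply above, my recollection of the smoothed estimate for the multinomial event model (standard CS229 notes notation, with |V| = 10,000 the dictionary size):

$$\phi_{k \mid y=1} = \frac{1 + \sum_{i=1}^{m} \sum_{j=1}^{n_i} 1\{x_j^{(i)} = k \wedge y^{(i)} = 1\}}{|V| + \sum_{i=1}^{m} 1\{y^{(i)} = 1\}\, n_i}.$$

So n_i does appear in the unsmoothed denominator, but the Laplace term added on top of it is |V|, not n_i.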
1/4 done!😵
Where can I find the class notes for this lecture? Does anyone know?
cs229.stanford.edu/lectures-spring2022/main_notes.pdf
laplace smoothy
Done!
Camera person: please don't move the camera so frequently next time. It should stay focused on what is written on the board. You keep tracking the professor and losing the content; we can match his voice to what is on the board, so whatever he is talking about should always be in view. A bit of your hard work and his got wasted.
He needs to learn to speak louder and more clearly... Otherwise it's a good lecture 👍🏾
Turn up your headphones. I listen at 2x speed and can understand him; when I went back to normal speed I understood less.
His voice/volume is more than enough for me at laptop volume 35-40. Maybe it's your phone/device that's at fault.