Herman Kamper
Joined 8 Apr 2020
Can we solve inequality in South Africa? Interview with Dieter von Fintel (TGIF 2024)
Views: 351
Videos
Reinforcement learning from human feedback (NLP817 12.3)
576 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/ PPO theory: ua-cam.com/video/3uvnoVjM8nY/v-deo.html Proximal policy optimization explained: ua-cam.com/video/HrapVFNBN64/v-deo.html
The difference between GPT and ChatGPT (NLP817 12.2)
218 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/
Large language model training and inference (NLP817 12.1)
291 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/ Andrej Karpathy's LLM video: ua-cam.com/video/zjkBMFhNj_g/v-deo.html Byte pair encoding: ua-cam.com/video/20xtCxAAkFw/v-deo.html Transformers: ua-cam.com/play/PLmZlBIcArwhOPR2s-FIR7WoqNaBML233s.html
Extensions of RNNs (NLP817 9.7)
131 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Andrej Karpathy's blog: karpathy.github.io/2015/05/21/rnn-effectiveness/
Solutions to exploding and vanishing gradients (in RNNs) (NLP817 9.6)
115 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Gradient descent: ua-cam.com/video/BlnLoqn3ZBo/v-deo.html Colah's blog: colah.github.io/posts/2015-08-Understanding-LSTMs/
Vanishing and exploding gradients in RNNs (NLP817 9.5)
163 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Vector and matrix derivatives: ua-cam.com/video/xOx2SS6TXHQ/v-deo.html
Backpropagation through time (NLP817 9.4)
390 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Vector and matrix derivatives: ua-cam.com/video/xOx2SS6TXHQ/v-deo.html Computational graphs for neural networks: ua-cam.com/video/fBSm5ElvJEg/v-deo.html Forks in neural networks: ua-cam.com/video/6mmEw738MQo/v-deo.html
RNN definition and computational graph (NLP817 9.3)
237 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
RNN language model loss function (NLP817 9.2)
223 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
From feedforward to recurrent neural networks (NLP817 9.1)
476 views · 7 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
Embedding layers in neural networks
526 views · 7 months ago
Full video list and slides: www.kamperh.com/data414/ Introduction to neural networks playlist: ua-cam.com/play/PLmZlBIcArwhMHnIrNu70mlvZOwe6MqWYn.html Word embeddings playlist: ua-cam.com/play/PLmZlBIcArwhPN5aRBaB_yTA0Yz5RQe5A_.html
Git workflow extras (including merge conflicts)
127 views · 8 months ago
Full playlist: ua-cam.com/play/PLmZlBIcArwhPFPPZp7br31Kbjt4k0NJD1.html Notes: www.kamperh.com/notes/git_workflow_notes.pdf
A Git workflow
367 views · 8 months ago
Full playlist: ua-cam.com/play/PLmZlBIcArwhPFPPZp7br31Kbjt4k0NJD1.html Notes: www.kamperh.com/notes/git_workflow_notes.pdf
Evaluating word embeddings (NLP817 7.12)
378 views · 9 months ago
Full playlist: ua-cam.com/play/PLmZlBIcArwhPN5aRBaB_yTA0Yz5RQe5A_.html Lecture notes: www.kamperh.com/nlp817/notes/07_word_embeddings_notes.pdf Course website: www.kamperh.com/nlp817/
Skip-gram with negative sampling (NLP817 7.10)
1.1K views · 9 months ago
Continuous bag-of-words (CBOW) (NLP817 7.9)
277 views · 9 months ago
Skip-gram as a neural network (NLP817 7.7)
741 views · 9 months ago
Skip-gram model structure (NLP817 7.5)
294 views · 9 months ago
Skip-gram loss function (NLP817 7.4)
410 views · 9 months ago
One-hot word embeddings (NLP817 7.2)
208 views · 9 months ago
What can large spoken language models tell us about speech? (IndabaX South Africa 2023)
151 views · 11 months ago
Hidden Markov models in practice (NLP817 5.13)
268 views · a year ago
Why expectation maximisation works (NLP817 5.11)
174 views · a year ago
Exactly what I needed for my upcoming exam! Tysm
Thank you so much!
Don't be sorry. Thanks for such content going through the basics.
Thank you for uploading these! One thing is, I am so confused about the notation of X in these videos. Are the X values here the literal symbol input or the word embedding?
Awesome series!!
This is indeed the best video, it cleared up my concept of training, validation, and testing a model's dataset.
At 5:55 when calculating negative log likelihood, is it base-10 log or natural log?
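A note added for context (not part of the original comment thread): the base only rescales the loss by a constant, since for any probability p,

-\log_{10} p = -\frac{\ln p}{\ln 10},

so base-10 and natural-log NLL differ by the factor 1/\ln 10 and have the same minimiser; deep learning libraries typically use the natural log by convention.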
I'm not sure, but at 12:12 I think using an FFN to transform would be very cool. However, I don't know whether it's better to put the net after the red vector and then concat, or to concat and then apply the FFN.
At 11:55, when you finished the sentence, I realized I am immensely enjoying math after years.
One important question: at 2:51 you say ŷ₁ is a vector of probabilities. But isn't that just the word embedding that the model has predicted, which is then projected to vocab size and softmaxed to get the output word? Or am I understanding it wrong?
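A minimal numpy sketch of the step the question is about, under illustrative assumptions (the names and sizes are made up and this is not the lecture's code): the hidden state is projected to vocabulary size and softmaxed, which is what makes ŷ a vector of probabilities rather than an embedding.

import numpy as np

hidden_dim, vocab_size = 8, 20
rng = np.random.default_rng(0)

h_t = rng.normal(size=hidden_dim)                  # RNN hidden state at time step t
W_out = rng.normal(size=(vocab_size, hidden_dim))  # output projection to vocabulary size
b_out = np.zeros(vocab_size)

logits = W_out @ h_t + b_out                       # one score per word in the vocabulary
y_hat = np.exp(logits - logits.max())              # numerically stable softmax
y_hat /= y_hat.sum()

print(y_hat.shape, y_hat.sum())                    # (20,) and ~1.0: a probability distribution over the vocabulary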
Wow. I am attending a god-awful university for my bachelor's and subjects are explained in the most superficial way possible. Don't get me wrong, our professors are very kind and welcoming, but the environment is not, especially with other students who are not interested in these subjects. Watching these, I realize how badly I wanted to be in your classes xD
Thank you! The flow and explanation in this series is concise, informative and on-point!
Super useful, thanks!
SO happy this helps! :)
Got to learn many things about RNNs, thanks!
Thanks a ton for the kind message! :)
😘😘😘😘😘 Thank YOu SOo soo muchhhh
This is incredible, thank you for providing such high quality resources online for free. My university teacher could not do in 1 semester what you taught me in 1 video.
Thanks so much for the encouragement!!
Why are we writing that k value in the vector at 4:57, as our prediction will already contain some value at that point?
Great explanation, the original images are so misleading.
Nice explanation, sir.
Nice video. The Wikipedia link proved useful for my econometrics 101 class.
Tried but failed again lol... thanks a lot.
Your explanation just keeps getting better and better into the video, incredible job!
Thank you so much, this is the best explanation I have come across. I went through 10+ videos from popular instructors and institutes, but this was clear and thorough.
Great explanation.
Sir, which book is this?
This is amazing!!!
17:35 What confuses me about this is: can we do the comparison to figure out whether the same word is in the signal, or whether both signals came from the same speaker? (IIRC the algorithm used for this is called DTW, which is very similar to the edit distance algorithm.)
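For readers of the question above, a minimal dynamic time warping sketch under illustrative assumptions (the function name and toy sequences are not from the video): the table-filling is indeed close to edit distance, but with a real-valued local distance in place of 0/1 costs.

import numpy as np

def dtw_cost(x, y):
    # Cumulative alignment cost between two 1-D sequences.
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])          # local distance between frames
            D[i, j] = d + min(D[i - 1, j],        # "insertion"
                              D[i, j - 1],        # "deletion"
                              D[i - 1, j - 1])    # "match/substitution"
    return D[n, m]

print(dtw_cost([1, 2, 3, 4], [1, 2, 2, 3, 4]))    # low cost: same pattern, just stretched in time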
Cool video, thanks.
Simple to understand. Thank you for writing the intermediate steps out. It really helps!
Great video, but I still don't understand why you would have to use sine and cosine. You can just adjust the frequency of sin or cos and then get unique encodings and still maintain relative distance relationships between tokens. Why bother with both sine and cosine? I know it has to do with the linear transform, but I don't see why you can't perform a linear transform with cos or sin only.
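A worked identity for the question above (a standard trigonometric argument, not quoted from the video): with a sine-cosine pair at the same frequency, shifting the position by k amounts to multiplying by a fixed rotation matrix,

\begin{pmatrix} \sin(\omega(t+k)) \\ \cos(\omega(t+k)) \end{pmatrix}
= \begin{pmatrix} \cos(\omega k) & \sin(\omega k) \\ -\sin(\omega k) & \cos(\omega k) \end{pmatrix}
\begin{pmatrix} \sin(\omega t) \\ \cos(\omega t) \end{pmatrix},

whereas \sin(\omega(t+k)) also depends on \cos(\omega t), so no t-independent linear map can produce it from \sin(\omega t) alone. That is the usual argument for keeping both.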
Hey, I love your explanations and I use your UA-cam channel for machine learning almost exclusively. Could you please make a playlist to explain SVMs?
So happy this helps! :) I should negotiate with my boss so I can make videos full time...
But how do I use polynomial regression when I have multiple points?
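A minimal sketch for the question above, under illustrative assumptions (toy data, numpy's polyfit as the fitting routine, degree 2): polynomial regression is fit to all the points jointly by least squares, so having multiple points is exactly the intended setting.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # several observed points
y = np.array([1.0, 0.5, 2.0, 6.5, 15.0])

coeffs = np.polyfit(x, y, deg=2)           # least-squares fit of a degree-2 polynomial to all points
y_hat = np.polyval(coeffs, x)              # predictions at the same inputs

print(coeffs)                              # fitted coefficients, highest degree first
print(np.mean((y - y_hat) ** 2))           # mean squared error over all points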
Bro this is like the best video I have seen on this topic
Thanks so much! Really appreciate! :)
Please do one on hierarchical clustering
Your lectures are amazing! Could you do some videos on hierarchical clustering?
Thanks so much for the encouraging message! :) I wish I had time to just make lecture videos... But hierarchical clustering is high on the list!
@kamperh Thanks!!
K,Q,V is one of those concepts that will go into the history of Computer Science as one of the most unfortunate metaphors ever.
Great
Great lecture, Prof. May I ask what d=6 and d=7 are here? Is it the embedding dimension? If so, for d=6, we should have 3 pairs of sine-cosine waves, right?
Hey Arnab! Sorry if this was a bit confusing. No, d=6 is the 6th dimension of the positional embedding. The embedding dimensionality itself will typically be the dimensionality of the word embeddings. If you jump to 14:00-ish, you will see the complete positional embedding. The earlier plots would be one dimension of this (when I show d=6, that would be the 6th dimension within this embedding).
Thanks so much for the prompt clarification Prof!
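A minimal sketch related to the exchange above, assuming the common sinusoidal formulation with an illustrative d_model = 16 (not the lecture's code): each row of the matrix is the complete positional embedding for one position, and a single column is one dimension such as the d = 6 curve discussed.

import numpy as np

d_model, max_pos = 16, 50
pos = np.arange(max_pos)[:, None]                        # positions t = 0..49
i = np.arange(d_model)[None, :]                          # dimension index d = 0..15
angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)

P = np.where(i % 2 == 0, np.sin(angle), np.cos(angle))   # (max_pos, d_model) positional embedding matrix

print(P.shape)   # (50, 16): one row = the full positional embedding for one position
print(P[:, 6])   # a single dimension (here d = 6), the kind of curve plotted in the video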
And since B is the max element, this justifies the interpretation of the log-sum-exp as a 'smooth max operator'
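In equation form (a standard identity, added for context): with B = \max_i x_i,

\log \sum_i e^{x_i} = B + \log \sum_i e^{x_i - B},

and since each e^{x_i - B} \le 1 with at least one term equal to 1, the value lies between B and B + \log n, which is why log-sum-exp acts as a smooth maximum.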
Really well explained, thanks.
Great video.
Drake Forges
I'm confused about the derivative of a vector function at 5:40. I think the gradient of a function f: R^n → R^m should be a matrix of size m×n. Not sure about it.
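For context, the convention the comment refers to (a standard definition, not quoted from the video): for f : \mathbb{R}^n \to \mathbb{R}^m the Jacobian is the m \times n matrix

J_{ij} = \frac{\partial f_i}{\partial x_j},

while "gradient" is usually reserved for scalar-valued f (m = 1), where it is the n-vector with entries \partial f / \partial x_j; courses also differ between numerator and denominator layout, which can make the matrix appear transposed.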
2:59 Actually, the pseudo-algorithm you are using is 0-indexed.
Wonderful Lecture. Thank you
Cuz world wars!
For anyone having any doubts about the relation between NLL and cross-entropy, this is a must watch!!!
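The link the comment alludes to, in equation form (a standard identity, added for context): for a one-hot target y with correct class c and predicted probabilities \hat{y},

H(y, \hat{y}) = -\sum_k y_k \log \hat{y}_k = -\log \hat{y}_c,

so cross-entropy with a one-hot target is exactly the negative log-likelihood of the correct class.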
This helped a lot. Fantastic intuitive explanation.
Super happy that it helped! :)
Great video series! The algorithm video was the one that finally got me to "get" DTW!
Your content is amazing !!!
Thanks Aditya!