- 161
- 390 149
Herman Kamper
Joined 8 Apr 2020
Can we solve inequality in South Africa? Interview with Dieter von Fintel (TGIF 2024)
Views: 260
Videos
Reinforcement learning from human feedback (NLP817 12.3)
Views: 264 · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/ PPO theory: ua-cam.com/video/3uvnoVjM8nY/v-deo.html Proximal policy optimization explained: ua-cam.com/video/HrapVFNBN64/v-deo.html
The difference between GPT and ChatGPT (NLP817 12.2)
Views: 125 · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/
Large language model training and inference (NLP817 12.1)
Views: 186 · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/ Andrej Karpathy's LLM video: ua-cam.com/video/zjkBMFhNj_g/v-deo.html Byte pair encoding: ua-cam.com/video/20xtCxAAkFw/v-deo.html Transformers: ua-cam.com/play/PLmZlBIcArwhOPR2s-FIR7WoqNaBML233s.html
Extensions of RNNs (NLP817 9.7)
Views: 83 · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Andrej Karpathy's blog: karpathy.github.io/2015/05/21/rnn-effectiveness/
Solutions to exploding and vanishing gradients (in RNNs) (NLP817 9.6)
Views: 53 · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Gradient descent: ua-cam.com/video/BlnLoqn3ZBo/v-deo.html Colah's blog: colah.github.io/posts/2015-08-Understanding-LSTMs/
Vanishing and exploding gradients in RNNs (NLP817 9.5)
Views: 87 · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Vector and matrix derivatives: ua-cam.com/video/xOx2SS6TXHQ/v-deo.html
Backpropagation through time (NLP817 9.4)
Views: 190 · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Vector and matrix derivatives: ua-cam.com/video/xOx2SS6TXHQ/v-deo.html Computational graphs for neural networks: ua-cam.com/video/fBSm5ElvJEg/v-deo.html Forks in neural networks: ua-cam.com/video/6mmEw738MQo/v-deo.html
RNN definition and computational graph (NLP817 9.3)
Views: 100 · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
RNN language model loss function (NLP817 9.2)
Views: 96 · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
From feedforward to recurrent neural networks (NLP817 9.1)
Views: 249 · 3 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
Embedding layers in neural networks
Views: 258 · 4 months ago
Full video list and slides: www.kamperh.com/data414/ Introduction to neural networks playlist: ua-cam.com/play/PLmZlBIcArwhMHnIrNu70mlvZOwe6MqWYn.html Word embeddings playlist: ua-cam.com/play/PLmZlBIcArwhPN5aRBaB_yTA0Yz5RQe5A_.html
Git workflow extras (including merge conflicts)
Views: 109 · 4 months ago
Full playlist: ua-cam.com/play/PLmZlBIcArwhPFPPZp7br31Kbjt4k0NJD1.html Notes: www.kamperh.com/notes/git_workflow_notes.pdf
A Git workflow
Views: 289 · 4 months ago
Full playlist: ua-cam.com/play/PLmZlBIcArwhPFPPZp7br31Kbjt4k0NJD1.html Notes: www.kamperh.com/notes/git_workflow_notes.pdf
Evaluating word embeddings (NLP817 7.12)
Views: 225 · 5 months ago
Full playlist: ua-cam.com/play/PLmZlBIcArwhPN5aRBaB_yTA0Yz5RQe5A_.html Lecture notes: www.kamperh.com/nlp817/notes/07_word_embeddings_notes.pdf Course website: www.kamperh.com/nlp817/
Skip-gram with negative sampling (NLP817 7.10)
Views: 585 · 5 months ago
Continuous bag-of-words (CBOW) (NLP817 7.9)
Views: 153 · 5 months ago
Skip-gram as a neural network (NLP817 7.7)
Views: 403 · 5 months ago
Skip-gram model structure (NLP817 7.5)
Views: 171 · 5 months ago
Skip-gram loss function (NLP817 7.4)
Views: 221 · 5 months ago
One-hot word embeddings (NLP817 7.2)
Views: 127 · 5 months ago
What can large spoken language models tell us about speech? (IndabaX South Africa 2023)
Views: 142 · 7 months ago
Hidden Markov models in practice (NLP817 5.13)
Views: 206 · 9 months ago
Why expectation maximisation works (NLP817 5.11)
Views: 151 · 9 months ago
And since B is the max element, this justifies the interpretation of the log-sum-exp as a 'smooth max operator'
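To see the "smooth max" interpretation concretely, here is a small illustrative sketch (not code from the video) of the numerically stable log-sum-exp, which subtracts the max element B before exponentiating:

```python
import math

def logsumexp(xs):
    """Numerically stable log-sum-exp: subtract the max B before exponentiating."""
    b = max(xs)
    return b + math.log(sum(math.exp(x - b) for x in xs))

xs = [1.0, 2.0, 10.0]
# When one element dominates, the correction term log(sum exp(x - B)) is
# close to log(1) = 0, so the result sits just above max(xs) = 10.0.
print(logsumexp(xs))
```
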
Really well explained, thanks.
Great video.
I'm confused about the derivative of a vector function at 5:40. I think the gradient of a function f: R^n → R^m should be a matrix of size m×n. Not sure about it.
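For what it's worth, the m×n shape can be checked numerically. This is a hypothetical finite-difference sketch (the function f below is made up, not from the video):

```python
import numpy as np

def f(x):
    # Example map f: R^3 -> R^2
    return np.array([x[0] * x[1], x[1] + x[2] ** 2])

def numerical_jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian: one row per output, one column per input."""
    y = f(x)
    J = np.zeros((y.size, x.size))  # shape (m, n)
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - y) / eps
    return J

x = np.array([1.0, 2.0, 3.0])
print(numerical_jacobian(f, x).shape)  # (2, 3), i.e. m x n
```
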
2:59 Actually, the pseudo-algorithm you are using is 0-indexed.
Wonderful Lecture. Thank you
Cuz world wars!
For anyone having any doubts about the relation between NLL and cross-entropy, this is a must watch!
This helped a lot. Fantastic intuitive explanation.
Super happy that it helped! :)
Great video series! The algorithm video was the one that finally got me to "get" DTW!
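For anyone else working through it, a minimal sketch of the DTW dynamic programming recursion (illustrative only, not the lecture's code) looks like this:

```python
import numpy as np

def dtw_cost(x, y):
    """Classic dynamic-programming DTW cost between two 1-D sequences."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Best of: insertion, deletion, or match from the previous cell
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Repeating a value costs nothing, since DTW allows one-to-many alignment
print(dtw_cost([0, 1, 2], [0, 0, 1, 2]))  # 0.0
```
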
Your content is amazing !!!
Thanks Aditya!
So if I have a list of categorical inputs where the order does imply their closeness, then I should not use one-hot encoding but just use numerical values to represent the categories. Is that right?
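A toy illustration of the trade-off (hypothetical category names, not from the video): an integer encoding preserves the ordering, while one-hot makes every pair of categories equidistant:

```python
import numpy as np

# Ordered categories ("small" < "medium" < "large"): an integer encoding
# keeps the ordering, so distances reflect closeness.
sizes = ["small", "medium", "large"]
ordinal = {c: i for i, c in enumerate(sizes)}  # small=0, medium=1, large=2

# One-hot encoding throws the ordering away: all pairs are equally far apart.
onehot = {c: np.eye(len(sizes))[i] for i, c in enumerate(sizes)}

print(abs(ordinal["small"] - ordinal["medium"]))           # 1
print(abs(ordinal["small"] - ordinal["large"]))            # 2
print(np.linalg.norm(onehot["small"] - onehot["medium"]))  # sqrt(2)
print(np.linalg.norm(onehot["small"] - onehot["large"]))   # sqrt(2)
```
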
Love this series! You explained the concepts really well and dove into the details!
So super grateful for the positive feedback!!!
You look like Benedict Cumberbatch
The nicest thing that anyone has ever said!
Thanks for posting Herman, super insightful!
Thanks a ton for the feedback! :)
You are good
The nicest thing anyone has ever said ;)
Awesome, really great video!!
I am not a student at your university, but I am glad that you are such a good prof.
Very happy you find this helpful!! 😊
I learned a lot as an Azerbaijani student. Thanks a lot <3
Really great explanations. I also really like your calm way of explaining things. I get the feeling that you distill everything important before recording the video. Keep up the great work!
Thanks a ton for this!! I enjoy making the videos, but it definitely takes a bit of time :)
Thank you
bro just keep teaching, that is great!
These videos are sorely underrated. Your explanations are concise and clear, thank you for making this topic so easy to understand and implement. Cheers from Pittsburgh.
Thanks so much for the massive encouragement!!
Working in NLP myself, I very much enjoy your videos as a refresher on current developments. Continuing from your epilogue, will you cover the DPO process in detail?
Thanks for the encouragement @Aruuuq! Yep, I still have one more video in this series to make (hopefully next week). It won't explain every little detail of the RL part, but hopefully the big stuff.
Your way of explaining is very good.
Thomas 🤣
good sir
Thank you very much, professor.
One of the best explanations on PCA relationship with SVD!
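The relationship can be checked numerically. A small sketch (illustrative only, with random data, assuming a centred data matrix): the eigenvalues of the covariance matrix match the squared singular values of the centred data divided by n − 1:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)          # centre the data first

# PCA route: eigendecomposition of the sample covariance matrix
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# SVD route: singular values of the centred data matrix
s = np.linalg.svd(Xc, compute_uv=False)
eigvals_from_svd = s ** 2 / (len(Xc) - 1)

print(np.allclose(eigvals, eigvals_from_svd))  # True
```
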
Why is it preferred to frame the problem as minimising the cross-entropy rather than minimising the NLL? Are there more useful properties in doing it that way?
Thank you, really great explanation, I think I can understand it now.
Thanks for lecture.
With regards to the clock analogy (0:48): "If you know where you are on the clock then you will know where you are in the input". Why not just a single clock with very small frequency? A very small frequency will guarantee that even for large sentences there will be no "overlap" at the same position in the clock for different positions in the input.
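One way to see the problem with a single very slow clock (an illustrative sketch with made-up frequencies, not from the video): adjacent positions become nearly indistinguishable, whereas a mix of fast and slow sinusoids keeps neighbours well separated while the slow components still disambiguate distant positions:

```python
import numpy as np

positions = np.arange(100)

# Single very slow clock: positions never overlap, but neighbouring
# positions are nearly identical, so they are hard to tell apart.
slow = np.sin(positions * 2 * np.pi / 10000)
print(abs(slow[1] - slow[0]))  # tiny gap between adjacent positions

# Sinusoids at several frequencies (as in transformer positional encodings):
# fast components separate neighbours, slow ones distinguish far positions.
freqs = 1 / 10000 ** (np.arange(4) / 4)
multi = np.sin(positions[:, None] * freqs[None, :])
print(np.linalg.norm(multi[1] - multi[0]))  # much larger gap
```
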
The best explanation!
Great explanation!! Thank you so much for uploading!
Great video. That meow from the cat though
Thanks ! great video
This is one of the better explanations of how the heck we go from maximum likelihood to using NLL loss to log of softmax. Thanks!
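That chain from maximum likelihood to NLL to log-softmax can be verified in a few lines (a sketch with made-up logits, not code from the video):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # stable softmax
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])
target = 0  # index of the correct class

# Maximising the likelihood of the target class is the same as
# minimising the negative log of its softmax probability...
nll = -np.log(softmax(logits)[target])

# ...which equals the cross-entropy against a one-hot target.
onehot = np.eye(len(logits))[target]
cross_entropy = -(onehot * np.log(softmax(logits))).sum()

print(np.isclose(nll, cross_entropy))  # True
```
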
Great Explanation
Thank you
Sticking to a simple Git workflow is beneficial, particularly using feature branches. However, adopting a 'Gitflow' working model should be avoided as it can become a cargo cult practice within an organization or team. As you mentioned, the author of this model has reconsidered its effectiveness. Gitflow can be cognitively taxing, promote silos, and delay merge conflicts until the end of sprint work cycles. Instead, using a trunk-based development approach is preferable. While this method requires more frequent pulls and daily merging, it ensures that everyone stays up-to-date with the main branch.
Thanks a ton for this, very useful. I think we ended up doing this type of model anyway. But good to know the actual words to use to describe it!
Very clear explanation, thank you very much!
Does this algorithm work with negative instances? I mean, can I use vectors with both negative and positive values?
Good explanation. Thank you Herman
Hello Herman, first of all a very informative video! I have a question: How are the weight matrices defined? Are the matrices simply randomized in each layer? Do you have any literature on this? Thank you very much!
This is a good question! These matrices will start out being randomly initialised, but then -- crucially -- they will be updated through gradient descent. Stated informally, each parameter in each of the matrices will be wiggled so that the loss goes down. Hope that makes sense!
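A tiny sketch of that idea (illustrative only: a single weight vector rather than full layer matrices, with made-up data): random initialisation followed by gradient descent steps that "wiggle" each parameter so the loss goes down:

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights start out randomly initialised...
w = rng.normal(scale=0.1, size=2)
x = np.array([1.0, 2.0])
y_true = 3.0

# ...and are then updated with gradient descent: each parameter is
# nudged in the direction that makes the squared-error loss go down.
for _ in range(200):
    y = w @ x                    # forward pass
    grad = 2 * (y - y_true) * x  # gradient of (y - y_true)^2 w.r.t. w
    w = w - 0.05 * grad          # gradient descent step

print(w @ x)  # close to y_true = 3.0 after training
```
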