Herman Kamper

Videos

Reinforcement learning from human feedback (NLP817 12.3)
264 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/ PPO theory: ua-cam.com/video/3uvnoVjM8nY/v-deo.html Proximal policy optimization explained: ua-cam.com/video/HrapVFNBN64/v-deo.html
The difference between GPT and ChatGPT (NLP817 12.2)
125 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/
Large language model training and inference (NLP817 12.1)
186 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/ Andrej Karpathy's LLM video: ua-cam.com/video/zjkBMFhNj_g/v-deo.html Byte pair encoding: ua-cam.com/video/20xtCxAAkFw/v-deo.html Transformers: ua-cam.com/play/PLmZlBIcArwhOPR2s-FIR7WoqNaBML233s.html
Extensions of RNNs (NLP817 9.7)
83 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Andrej Karpathy's blog: karpathy.github.io/2015/05/21/rnn-effectiveness/
Solutions to exploding and vanishing gradients (in RNNs) (NLP817 9.6)
53 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Gradient descent: ua-cam.com/video/BlnLoqn3ZBo/v-deo.html Colah's blog: colah.github.io/posts/2015-08-Understanding-LSTMs/
Vanishing and exploding gradients in RNNs (NLP817 9.5)
87 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Vector and matrix derivatives: ua-cam.com/video/xOx2SS6TXHQ/v-deo.html
Backpropagation through time (NLP817 9.4)
190 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Vector and matrix derivatives: ua-cam.com/video/xOx2SS6TXHQ/v-deo.html Computational graphs for neural networks: ua-cam.com/video/fBSm5ElvJEg/v-deo.html Forks in neural networks: ua-cam.com/video/6mmEw738MQo/v-deo.html
RNN definition and computational graph (NLP817 9.3)
100 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
RNN language model loss function (NLP817 9.2)
96 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
From feedforward to recurrent neural networks (NLP817 9.1)
249 views · 3 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: ua-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
Embedding layers in neural networks
258 views · 4 months ago
Full video list and slides: www.kamperh.com/data414/ Introduction to neural networks playlist: ua-cam.com/play/PLmZlBIcArwhMHnIrNu70mlvZOwe6MqWYn.html Word embeddings playlist: ua-cam.com/play/PLmZlBIcArwhPN5aRBaB_yTA0Yz5RQe5A_.html
Git workflow extras (including merge conflicts)
109 views · 4 months ago
Full playlist: ua-cam.com/play/PLmZlBIcArwhPFPPZp7br31Kbjt4k0NJD1.html Notes: www.kamperh.com/notes/git_workflow_notes.pdf
A Git workflow
289 views · 4 months ago
Full playlist: ua-cam.com/play/PLmZlBIcArwhPFPPZp7br31Kbjt4k0NJD1.html Notes: www.kamperh.com/notes/git_workflow_notes.pdf
Evaluating word embeddings (NLP817 7.12)
225 views · 5 months ago
Full playlist: ua-cam.com/play/PLmZlBIcArwhPN5aRBaB_yTA0Yz5RQe5A_.html Lecture notes: www.kamperh.com/nlp817/notes/07_word_embeddings_notes.pdf Course website: www.kamperh.com/nlp817/
GloVe word embeddings (NLP817 7.11)
233 views · 5 months ago
Skip-gram with negative sampling (NLP817 7.10)
585 views · 5 months ago
Continuous bag-of-words (CBOW) (NLP817 7.9)
153 views · 5 months ago
Skip-gram example (NLP817 7.8)
158 views · 5 months ago
Skip-gram as a neural network (NLP817 7.7)
403 views · 5 months ago
Skip-gram optimisation (NLP817 7.6)
207 views · 5 months ago
Skip-gram model structure (NLP817 7.5)
171 views · 5 months ago
Skip-gram loss function (NLP817 7.4)
221 views · 5 months ago
Skip-gram introduction (NLP817 7.3)
267 views · 5 months ago
One-hot word embeddings (NLP817 7.2)
127 views · 5 months ago
Why word embeddings? (NLP817 7.1)
335 views · 5 months ago
What can large spoken language models tell us about speech? (IndabaX South Africa 2023)
142 views · 7 months ago
Hidden Markov models in practice (NLP817 5.13)
206 views · 9 months ago
The log-sum-exp trick (NLP817 5.12)
712 views · 9 months ago
Why expectation maximisation works (NLP817 5.11)
151 views · 9 months ago

COMMENTS

  • @jcamargo2005 · 5 days ago

    And since B is the max element, this justifies the interpretation of the log-sum-exp as a 'smooth max operator'
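
    A minimal numpy sketch of the trick this comment refers to (the numbers are made up for illustration): shifting by the maximum element B before exponentiating keeps the sum finite, and because the shifted terms are at most 1, the result sits just above max(a), which is the "smooth max" reading.

        import numpy as np

        def logsumexp(a):
            """Numerically stable log(sum(exp(a))): shift by the max before exponentiating."""
            b = np.max(a)                        # B, the largest element
            return b + np.log(np.sum(np.exp(a - b)))

        a = np.array([1000.0, 1001.0, 1002.0])
        print(logsumexp(a))                      # ~1002.41, finite
        print(np.log(np.sum(np.exp(a))))         # overflows to inf without the shift
        print(np.max(a))                         # 1002.0: logsumexp is only slightly above the true max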

  • @aditya3984 · 5 days ago

    Really well explained, thanks.

  • @aditya3984 · 5 days ago

    Great video.

  • @RuthClark-f1j · 9 days ago

    Drake Forges

  • @awenzhi · 10 days ago

    I'm confused about the derivative of a vector function at 5:40. I think the gradient of a function f: R^n → R^m should be a matrix of size m×n, but I'm not sure about it.
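
    A small numpy check of the shape question (a hypothetical f(x) = Wx, not the video's example): in numerator layout the Jacobian of f: R^n → R^m is indeed m×n; some texts use the transposed (denominator) layout, which is a common source of confusion.

        import numpy as np

        m, n = 3, 5
        W = np.random.randn(m, n)
        f = lambda x: W @ x                      # f: R^n -> R^m

        # Finite-difference Jacobian J[i, j] = d f_i / d x_j
        x = np.random.randn(n)
        eps = 1e-6
        J = np.zeros((m, n))
        for j in range(n):
            e = np.zeros(n)
            e[j] = eps
            J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)

        print(J.shape)                           # (3, 5): m x n in numerator layout
        print(np.allclose(J, W, atol=1e-5))      # True: the Jacobian of Wx is W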

  • @BradleyCooper-f1n · 16 days ago

    Brown Paul Wilson Deborah Harris Shirley

  • @andrefreitas9936 · 18 days ago

    2:59 Actually, the pseudocode algorithm you are using is 0-indexed.

  • @Gwittdog · 18 days ago

    Wonderful Lecture. Thank you

  • @warpdrive9229 · 20 days ago

    Cuz world wars!

  • @SarahSanchez-b2w · 21 days ago

    Allen Shirley Miller Elizabeth Thomas Linda

  • @RahulSinghChhonkar · 22 days ago

    For anyone having any doubts about the relation between NLL and cross entropy, this is a must watch!!!

  • @Josia-p5m · 23 days ago

    This helped a lot. Fantastic intuitive explanation.

    • @kamperh · 23 days ago

      Super happy that it helped! :)

  • @EdwardHernandez-l4z · 24 days ago

    Hall Margaret Jones Angela Wilson Larry

  • @nschweiz1 · 25 days ago

    Great video series! The algorithm video was the one that finally got me to "get" DTW!

  • @JoellaAlberty-z5c · 25 days ago

    Lopez Sharon Davis George Taylor Laura

  • @adityasonale1608 · 1 month ago

    Your content is amazing !!!

    • @kamperh · 29 days ago

      Thanks Aditya!

  • @tgzhu3258 · 1 month ago

    So if I have a list of categorical inputs where the order does imply their closeness, then I should not use one-hot encoding but just use numerical values to represent the categories, is that right?
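
    A tiny sketch of the trade-off behind this question (the labels are made up): integer codes do preserve order, but they also assert that neighbouring categories are equally spaced, whereas one-hot (or a learned embedding) makes no such assumption.

        import numpy as np

        sizes = ["small", "medium", "large"]              # categories with a meaningful order

        # Ordinal encoding: keeps the order, but implies equal spacing between categories
        ordinal = {c: i for i, c in enumerate(sizes)}     # small=0, medium=1, large=2

        # One-hot encoding: no order, every pair of categories is equally far apart
        one_hot = {c: np.eye(len(sizes))[i] for i, c in enumerate(sizes)}

        print(ordinal["large"])                           # 2
        print(one_hot["large"])                           # [0. 0. 1.]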

  • @tgzhu3258 · 1 month ago

    Love this series! You explain the concepts really well and dive into the details!

    • @kamperh · 1 month ago

      So super grateful for the positive feedback!!!

  • @viswanathvuppala4526 · 1 month ago

    You look like Benedict Cumberbatch

    • @kamperh · 1 month ago

      The nicest thing that anyone has ever said!

  • @molebohengmokapane3311 · 1 month ago

    Thanks for posting Herman, super insightful!

    • @kamperh · 1 month ago

      Thanks a ton for the feedback! :)

  • @Alabsi3A · 1 month ago

    You are good

    • @kamperh · 1 month ago

      The nicest thing anyone has ever said ;)

  • @cuongnguyenuc1776 · 1 month ago

    Awesome, very great video!!

  • @pleasebitt · 1 month ago

    I am not a student at your university, but I am glad that you are such a good prof.

    • @kamperh · 1 month ago

      Very happy you find this helpful!! 😊

  • @rahilnecefov2018 · 2 months ago

    I learned a lot as an Azerbaijani student. Thanks a lot <3

  • @rrrmil · 2 months ago

    Really great explanations. I also really like your calm way of explaining things. I get the feeling that you distill everything important before recording the video. Keep up the great work!

    • @kamperh · 2 months ago

      Thanks a ton for this!! I enjoy making the videos, but it definitely takes a bit of time :)

  • @liyingyeo5920 · 2 months ago

    Thank you

  • @rahilnecefov2018 · 2 months ago

    bro just keep teaching, that is great!

  • @josephengelmeier9856 · 2 months ago

    These videos are sorely underrated. Your explanations are concise and clear, thank you for making this topic so easy to understand and implement. Cheers from Pittsburgh.

    • @kamperh · 2 months ago

      Thanks so much for the massive encouragement!!

  • @Aruuuq · 2 months ago

    Working in NLP myself, I very much enjoy your videos as a refresher on current developments. Continuing from your epilogue, will you cover the DPO process in detail?

    • @kamperh · 2 months ago

      Thanks for the encouragement @Aruuuq! Yep, I still have one more video in this series to make (hopefully next week). It won't explain every little detail of the RL part, but hopefully the big stuff.

  • @OussemaGuerriche · 2 months ago

    your way of explanation is very good

  • @shylilak · 2 months ago

    Thomas 🤣

  • @MuhammadSqlain · 2 months ago

    good sir

  • @TechRevolutionNow · 2 months ago

    thank you very much professor.

  • @ozysjahputera7669 · 2 months ago

    One of the best explanations on PCA relationship with SVD!
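
    A minimal numpy sketch of the relationship mentioned here (generic random data, not the video's notation): the right singular vectors of the centred data matrix are the principal directions, and the squared singular values divided by (N - 1) are the variances along them.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 4))            # 200 samples, 4 features
        Xc = X - X.mean(axis=0)                  # centre the data

        # PCA via the eigendecomposition of the covariance matrix
        cov = Xc.T @ Xc / (len(Xc) - 1)
        eigvals, eigvecs = np.linalg.eigh(cov)

        # PCA via the SVD of the centred data matrix (columns of V = principal directions)
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

        # Same variances, up to ordering
        print(np.allclose(np.sort(eigvals), np.sort(S**2 / (len(Xc) - 1))))  # True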

  • @martinpareegol5263 · 2 months ago

    Why is it preferred to frame the problem as minimising the cross entropy rather than minimising the NLL? Are there properties that make this more efficient?
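
    A quick numeric illustration of why the two coincide here (my own numbers, not from the video): with a one-hot target, the cross entropy between the target and the softmax output is exactly the negative log probability of the correct class, so minimising one is minimising the other.

        import numpy as np

        logits = np.array([2.0, 0.5, -1.0])
        probs = np.exp(logits) / np.exp(logits).sum()   # softmax
        y = 0                                           # correct class
        target = np.eye(3)[y]                           # one-hot target

        cross_entropy = -np.sum(target * np.log(probs))
        nll = -np.log(probs[y])

        print(cross_entropy, nll)                       # identical values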

  • @chetterhummin1482 · 3 months ago

    Thank you, really great explanation, I think I can understand it now.

  • @zephyrus1333 · 3 months ago

    Thanks for lecture.

  • @adosar7261 · 3 months ago

    With regard to the clock analogy (0:48): "If you know where you are on the clock then you will know where you are in the input". Why not just a single clock with a very small frequency? A very small frequency would guarantee that, even for long sentences, there is no "overlap" at the same position on the clock for different positions in the input.
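
    A small sketch of the standard sinusoidal encoding this analogy describes (following the usual Transformer formulation, not necessarily the video's exact notation): each pair of dimensions is a clock at a different frequency. One common argument for many clocks is that with a single very slow clock, neighbouring positions map to almost identical values and become hard to tell apart numerically.

        import numpy as np

        def sinusoidal_positions(num_positions, d_model):
            """Sinusoidal positional encodings: many 'clocks' at different frequencies."""
            pos = np.arange(num_positions)[:, None]
            i = np.arange(d_model // 2)[None, :]
            angles = pos / (10000 ** (2 * i / d_model))
            pe = np.zeros((num_positions, d_model))
            pe[:, 0::2] = np.sin(angles)
            pe[:, 1::2] = np.cos(angles)
            return pe

        pe = sinusoidal_positions(50, 16)
        slow = np.sin(np.arange(50) / 10000.0)     # a single very slow clock
        print(np.abs(slow[1] - slow[0]))           # ~1e-4: positions 0 and 1 nearly coincide
        print(np.linalg.norm(pe[1] - pe[0]))       # ~1: the faster dimensions separate them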

  • @ex-pwian1190 · 3 months ago

    The best explanation!

  • @frogvonneumann9761 · 3 months ago

    Great explanation!! Thank you so much for uploading!

  • @Le_Parrikar · 3 months ago

    Great video. That meow from the cat though

  • @kobi981 · 3 months ago

    Thanks ! great video

  • @harshadsaykhedkar1515 · 4 months ago

    This is one of the better explanations of how the heck we go from maximum likelihood to using NLL loss to log of softmax. Thanks!
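
    A compact sketch of that chain on a made-up three-example "dataset" (the numbers are arbitrary): the likelihood of the data is the product of the softmax probabilities of the correct classes, so maximising it is the same as minimising the sum of -log softmax terms, i.e. the NLL loss.

        import numpy as np

        logits = np.array([[ 2.0, 0.1, -1.0],     # model scores for each example
                           [ 0.3, 1.5,  0.0],
                           [-0.5, 0.2,  2.2]])
        labels = np.array([0, 1, 2])              # correct class for each example

        log_probs = logits - np.log(np.sum(np.exp(logits), axis=1, keepdims=True))  # log softmax
        nll = -log_probs[np.arange(3), labels].sum()           # negative log-likelihood loss

        likelihood = np.prod(np.exp(log_probs)[np.arange(3), labels])
        print(np.isclose(likelihood, np.exp(-nll)))            # True: max likelihood == min NLL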

  • @shahulrahman2516 · 4 months ago

    Great Explanation

  • @shahulrahman2516 · 4 months ago

    Thank you

  • @yaghiyahbrenner8902 · 4 months ago

    Sticking to a simple Git workflow is beneficial, particularly using feature branches. However, adopting a 'Gitflow' working model should be avoided as it can become a cargo cult practice within an organization or team. As you mentioned, the author of this model has reconsidered its effectiveness. Gitflow can be cognitively taxing, promote silos, and delay merge conflicts until the end of sprint work cycles. Instead, using a trunk-based development approach is preferable. While this method requires more frequent pulls and daily merging, it ensures that everyone stays up-to-date with the main branch.

    • @kamperh · 4 months ago

      Thanks a ton for this, very useful. I think we ended up doing this type of model anyway. But good to know the actual words to use to describe it!

  • @basiaostaszewska7775 · 4 months ago

    A very clear explanation, thank you very much!

  • @bleusorcoc1080 · 4 months ago

    Does this algorithm work with negative instances? I mean, can I use vectors with both negative and positive values?

  • @kundanyalangi2922 · 4 months ago

    Good explanation. Thank you Herman

  • @niklasfischer3146 · 4 months ago

    Hello Herman, first of all a very informative video! I have a question: How are the weight matrices defined? Are the matrices simply randomized in each layer? Do you have any literature on this? Thank you very much!

    • @kamperh · 4 months ago

      This is a good question! These matrices will start out being randomly initialised, but then -- crucially -- they will be updated through gradient descent. Stated informally, each parameter in each of the matrices will be wiggled so that the loss goes down. Hope that makes sense!
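
      A minimal sketch of what this reply describes (a generic single linear layer, not the video's model): the weight matrix starts out random and every entry is then nudged down the gradient of the loss.

          import numpy as np

          rng = np.random.default_rng(0)
          W = rng.normal(scale=0.1, size=(2, 3))   # random initialisation of the weight matrix
          x = np.array([1.0, 0.5, -0.2])
          y = np.array([1.0, 0.0])                 # target output
          lr = 0.1

          for step in range(100):
              pred = W @ x                         # simple linear layer
              error = pred - y
              loss = 0.5 * np.sum(error ** 2)      # squared-error loss
              grad_W = np.outer(error, x)          # dloss/dW
              W -= lr * grad_W                     # "wiggle" every parameter so the loss goes down

          print(loss)                              # close to zero after training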