Multi-Armed Bandit : Data Science Concepts

  • Published 28 Nov 2024

COMMENTS • 146

  • @jc_777
    @jc_777 2 years ago +20

    Enough exploration for a good YouTube lecture on ML; I should keep exploiting this guy. 0 regret guaranteed :)

  • @Sad-mm8tm
    @Sad-mm8tm 4 years ago +113

    I hope you will continue making videos forever. Your explanations are the best I've ever seen anywhere + the wide choice of topics gives me food for thought when dealing with my own optimization problems.

    • @ritvikmath
      @ritvikmath  4 years ago +4

      Thank you :) I'm happy to help

    • @VahidOnTheMove
      @VahidOnTheMove 1 year ago +1

      If he makes videos forever, we'll get zero regrets.

    • @Xaka-waka
      @Xaka-waka 9 months ago

      @@ritvikmath don't let this channel die, man

  • @marcelobeckmann9552
    @marcelobeckmann9552 3 years ago +18

    Your explanations, didactics, and dynamism are amazing, way better than several university professors. Well done!

  • @faadi4536
    @faadi4536 3 years ago +10

    What an amazing explanation. I am taking a machine learning course and the instructor tried to explain the concept using bandits, but I couldn't quite grasp it in detail. I understood what we were trying to figure out but wasn't quite there yet. You have made it so much easier. Kudos to you, brother.

  • @bilalbayrakdar7100
    @bilalbayrakdar7100 2 years ago +4

    Bro, I completed my CS degree with your help, and now that I've been accepted for a master's you are still here to help. You are a true man, thx mate

  • @abdulsami5843
    @abdulsami5843 3 years ago +11

    A thing I absolutely like is how palatable you make these concepts, not too mathematical/theoretical and not overly simplified, just the right balance (ε-greedy is set right 😉)

  • @savasozturk00
    @savasozturk00 5 months ago

    After watching 5 videos, I finally found the best lecturer for this topic. The examples are great, thanks.

  • @AnasHawasli
    @AnasHawasli 21 days ago

    Thank you so much for this simple explanation
    It was impossible for me to understand this concept without your video
    NOT EVERYONE SPENT THEIR LIFE IN A CASINO; I am not familiar with this armed bandit stuff
    Here is a sub!

  • @softerseltzer
    @softerseltzer 4 years ago +21

    Love your videos, the quality just keeps going up!
    P.S. The name of the slot machine is "one-armed bandit", because of the long arm-like lever that you pull to play.

    • @irishryano
      @irishryano 4 years ago +3

      ...and the "bandit" because it has the WORST odds in every casino

    • @spicytuna08
      @spicytuna08 2 years ago

      I guess the slot machine is a bandit because it keeps robbing money from the players.

  • @111dogger
    @111dogger 3 years ago +2

    This is the best explanation I have come across so far for the Upper Confidence Bound concept. Thank you!

  • @itssosh
    @itssosh 2 years ago +4

    It would be great if you made a whole playlist where you explain the statistics for machine learning by explaining the formulas in an intuitive way like you do (you make me understand them all). For example, explain the various distributions and their meaning, statistical tests (p-value), etc. Thank you so much for the work you do and the knowledge you share!

  • @malice112
    @malice112 1 year ago

    What a great and easy to understand explanation of MAB - thank you for this!!!!

  • @spicytuna08
    @spicytuna08 2 years ago

    We need a person like you to democratize these important concepts. I cannot express how grateful I am to finally understand ideas I have struggled with in the past.

  • @SURBHIGUPTA-o4w
    @SURBHIGUPTA-o4w 4 months ago

    Thanks Ritvik! This is the best explanation I have come across so far!

  • @Dr.RegulaSrilakshmi
    @Dr.RegulaSrilakshmi 7 months ago

    You are just awesome. Any person who doesn't have any knowledge of reinforcement learning can understand. Keep up the spirit... cheers

  • @shahnazmalik6553
    @shahnazmalik6553 4 years ago +2

    Your teaching method is highly appreciated. Please make lectures on statistics and machine learning algorithms

  • @heteromodal
    @heteromodal 3 years ago +2

    Great video, and it's really nice listening to you! Thank you :)

  • @hameddadgour
    @hameddadgour 2 years ago

    I just realized that I need to explore more to maximize my happiness. Thank you, Multi-Armed Bandit :)

  • @adanulabidin
    @adanulabidin 7 months ago

    What an amazing explanation! Thank you so much. Keep making such videos.

  • @jonathanarias2729
    @jonathanarias2729 2 years ago +4

    Why is 330 the regret in the exploitation example? Shouldn't it be
    3000 - 2396 = 604??

  • @SDKIM0211
    @SDKIM0211 2 years ago +2

    Love your videos. To understand the average regret value for exploitation, which extra material should we refer to? Why not 604?

  • @maxencelaisne4141
    @maxencelaisne4141 4 years ago +2

    Thank you so much, I passed my exam thanks to your explanation :)

  • @traiancoza5214
    @traiancoza5214 3 years ago

    Perfectly explained. Genius.

  • @michaelvogt7787
    @michaelvogt7787 4 months ago

    "Multi-armed bandit" is a misnomer, really; it should be the multi-one-armed-bandit problem. Slot machines were called one-armed bandits because they have a single arm that you pull, and the odds of winning are stacked against the player, making them bandits. The goal is not so much to find out which machine to play, which would become apparent given enough plays, but to determine how to spread N plays across the group, settling on the mix that balances exploration against exploiting the best-returning bandit. I am a career research scientist who has been pioneering in this field for 40 years. I am always reviewing videos to share with students and learners, and YOURS have Returned the greatest value for my Exploration, so I will be Exploiting YOURS by sharing them the most with my students. It's the best compliment I can think of. Cheers. Dr. Vogt ;-)
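
For readers who want to see the exploration/exploitation balance described in the comment above in code, here is a minimal epsilon-greedy sketch in Python. The restaurant means, the reward noise, and the value of epsilon are illustrative assumptions, not numbers taken from the video.

    import random

    means = [8.0, 10.0, 5.0]        # assumed mean happiness per restaurant
    epsilon = 0.1                   # assumed exploration probability
    days = 300

    counts = [0] * len(means)       # visits per restaurant
    estimates = [0.0] * len(means)  # running average happiness per restaurant
    total = 0.0

    for day in range(days):
        if day < len(means) or random.random() < epsilon:
            # explore: try each restaurant once, then pick at random with prob. epsilon
            arm = day if day < len(means) else random.randrange(len(means))
        else:
            # exploit: go to the restaurant that currently looks best
            arm = estimates.index(max(estimates))
        reward = random.gauss(means[arm], 2.0)   # assumed noisy happiness
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward

    print(f"total happiness {total:.0f}, regret vs. always-best {days * max(means) - total:.0f}")

Running it a few times shows the tradeoff: a larger epsilon wastes more days on the worse restaurants, while epsilon = 0 can lock onto the wrong restaurant after an unlucky first taste.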

  • @sahanar8612
    @sahanar8612 1 month ago

    Great explanation! Thank you 😊

  • @gabrieldart9943
    @gabrieldart9943 1 year ago

    This is so cool! Thanks for your clear explanation.

  • @vahidsohrabi94
    @vahidsohrabi94 3 years ago

    I'm grateful to you for this great tutorial.

  • @jroseme
    @jroseme 1 year ago

    This was a useful supplement to my read of Reinforcement Learning by Sutton & Barto. Thanks.

  • @CaioCarneloz
    @CaioCarneloz 2 years ago

    The way you explain is stunning, what an awesome lesson.

  • @kunalkasodekar8562
    @kunalkasodekar8562 4 months ago

    Perfect Explanation!

  • @khaalidmcmillan9260
    @khaalidmcmillan9260 1 year ago

    Well said, needed a refresher after not seeing this for a while and this nailed it. Hopefully you've gone into more advanced topics like MAB reinforcement learning

  • @A.n.a.n.d.k.r.
    @A.n.a.n.d.k.r. 1 year ago

    Awesome, cool technique. Just got hooked on this.

  • @llescarini
    @llescarini 3 years ago +1

    Subscribed a few days ago; your videos are more than excellent! Amazing skill for teaching, thanks a lot.

  • @NoNTr1v1aL
    @NoNTr1v1aL 3 years ago +1

    Amazing video!

  • @anaydongre1226
    @anaydongre1226 4 years ago +1

    Thanks so much for explaining this in detail !!

  • @francisliubin
    @francisliubin 1 year ago

    Thanks for the great explanation. What is the essential difference between the contextual bandit (CB) problem and the multi-armed bandit (MAB) problem? How does the difference impact the strategy?

  • @rikki146
    @rikki146 1 year ago

    I cannot thank you enough for making this excellent vid!

  • @DarkNinja-24
    @DarkNinja-24 2 years ago

    Wow, great example and amazing explanation!

  • @jinpark9871
    @jinpark9871 4 years ago +1

    Thanks, your work is really awesome.

  • @abogadorobot6094
    @abogadorobot6094 3 years ago

    WOW! That was brilliant! Thank you!

  •  3 months ago

    Awesome! Thank you! You helped me a lot!

  • @dr.nalinfonseka7072
    @dr.nalinfonseka7072 1 year ago

    Excellent explanation!

  • @fridmamedov270
    @fridmamedov270 10 months ago

    Simple and accurate. That is it. Thanks!!!

  • @aryankr
    @aryankr 1 year ago

    Thank you for a great explanation!!

  • @nassehk
    @nassehk 4 years ago +2

    I am new to your channel. You have a talent for teaching, my friend. I enjoy your content a lot. Thanks.

  • @soundcollective2240
    @soundcollective2240 2 years ago

    Thanks, it was quite useful, heading to your Thompson Sampling video :)

  • @raphaeldayan
    @raphaeldayan 3 years ago

    Amazing explanation, very clear, thank you sir

  • @stanislavezhevski2877
    @stanislavezhevski2877 4 years ago +11

    Great explanation! Can you leave a link to the code you used in the simulations?

    • @ritvikmath
      @ritvikmath  4 years ago +5

      Thanks! I have a follow up video on Multi-Armed Bandit coming out next week and the code will be linked in the description of that video. Stay tuned!

  • @nastya831
    @nastya831 3 years ago +1

    thanks man, this is truly helpful! 6 min at 2x and I got it all

  • @nintishia
    @nintishia 3 years ago

    Very clear explanation. Thanks for this video.

  • @michaelvogt7787
    @michaelvogt7787 5 months ago

    Nicely done.

  • @SamuelOgazi
    @SamuelOgazi 9 months ago

    Thank you so much for the clarity in this video!
    However, I thought the regret for the exploit-only strategy would be 3,000 - 2396 = 604.
    Kindly clarify.

  • @welidbenchouche
    @welidbenchouche 1 year ago

    This is more than enough for me

  • @amirnouripour5501
    @amirnouripour5501 2 years ago

    Thanks a lot. Very insightful!

  • @Status_Bleach
    @Status_Bleach 1 year ago

    Thanks for the vid boss. How exactly did you calculate the average rewards for the Exploit Only and Epsilon-Greedy strategies though?

  • @yongnaguo8772
    @yongnaguo8772 3 years ago

    Thanks! Very good explanation!

  • @krittaprottangkittikun7740
    @krittaprottangkittikun7740 3 years ago

    This is so clear to me. Thank you for making this video!

  • @rutgervanbasten2159
    @rutgervanbasten2159 1 year ago

    really nice job! thank you

  • @zahrashekarchi6139
    @zahrashekarchi6139 1 year ago

    Thanks a lot for this video!
    Just one thing I would like to find out: where do we store the result of our learning? Is it some policy or parameter that gets updated?

  • @josemuarnapoleon
    @josemuarnapoleon 3 years ago

    Nice explanation!

  • @warreninganji7881
    @warreninganji7881 4 years ago

    Crystal clear explanation, worth a subscription for more 👌

  • @velocfudarks8488
    @velocfudarks8488 3 years ago

    Thanks a lot! Really good representation!

  • @georgiak7877
    @georgiak7877 1 year ago

    This is amazing !

  • @abdulrahmankerim2377
    @abdulrahmankerim2377 2 years ago +1

    Thanks!

  • @뇌공학박박사
    @뇌공학박박사 1 year ago

    Best example ever!!!

  • @TheMuser
    @TheMuser 1 year ago

    Nicely explained!

  • @dr.kingschultz
    @dr.kingschultz 2 years ago

    You are very good! Please explore this topic more. Also include the code and explain it.

  • @bobo0612
    @bobo0612 3 years ago +4

    Hi! Thank you for your video. I have a question at 6:28. Why is the regret ρ not simply 3000 - 2396?

    • @senyksia
      @senyksia 3 years ago +1

      2396 was the happiness for that specific case, where restaurant #2 was chosen to exploit. 330 is the (approximate) regret averaged over every possible case.
      So 3000 - 2396 would be correct if you were only talking about that unique case.

    • @myoobies
      @myoobies 3 years ago

      @@senyksia Hey, what do you mean by average regret for every case? I'm still having trouble wrapping my head around this step. Thanks!

    • @madmax2442
      @madmax2442 3 years ago

      @Bolin WU I know it's 8 months already but I wanted to know whether you got the answer or not. I also have the same doubt.

  • @avadheshkumar1488
    @avadheshkumar1488 3 years ago

    excellent explanation!!! thanks

  • @rifatamanna7895
    @rifatamanna7895 4 years ago +1

    It was an awesome technique
    👍👍 thanks

  • @sbn0671
    @sbn0671 10 months ago

    Well explained!

  • @seowmingwei9426
    @seowmingwei9426 3 years ago

    Well explained! Thank you!

  • @vijayjayaraman5990
    @vijayjayaraman5990 5 months ago

    Very helpful. How is the regret 330 in the second case? Shouldn't it be 3000 - 2396 = 604?

  • @Trucmuch
    @Trucmuch 4 years ago +3

    Slot machines were not called bandits but one-armed bandits (they "stole" your money, and the bulky box with one lever on its side kind of looked like a one-armed man).
    So the name of this problem is kind of a pun: a slot machine with more than one lever you can pull (here three) is a multi-armed bandit. ;-)

    • @ritvikmath
      @ritvikmath  4 years ago

      Wow I did not know that, thanks !!

  • @snehotoshbanerjee1938
    @snehotoshbanerjee1938 7 months ago

    Best explanation!!

  • @alirezasamadi5804
    @alirezasamadi5804 2 years ago

    You explained it so well

  • @hypebeastuchiha9229
    @hypebeastuchiha9229 2 years ago

    My exam is in 2 days and I'm so close to graduating with the highest grades.
    Thanks for your help!

  • @TheMuser
    @TheMuser 1 year ago

    I have explored and finally decided that I am going to exploit you!
    *Subscribed*

  • @shahulrahman2516
    @shahulrahman2516 5 months ago

    Great video

  • @davidkopfer3259
    @davidkopfer3259 4 years ago

    Very nice explanation, thanks!

  • @annahuo6694
    @annahuo6694 3 years ago +1

    Great videos! Thanks for your clarification. It's much clearer for me now. But I just wonder how you calculated the 330 regret in the exploitation-only case?

    • @ritvikmath
      @ritvikmath  3 years ago +1

      Good question. You can get that number by considering all possible cases of visiting each restaurant on the first three days: what is the probability that restaurant 1 looks best after those visits, vs. restaurant 2, etc.? You can do this with pencil and paper, but I'd recommend writing a simple computer simulation instead.

    • @annahuo6694
      @annahuo6694 3 years ago

      @@ritvikmath Thank you for this prompt response. I think I get the idea from the epsilon-greedy formula (option number 3 in the example). Thanks a lot, your video is really helpful :)
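
Since several comments ask how the roughly 330 average regret for exploit-only is obtained, here is a minimal Monte Carlo sketch of the simulation suggested in the reply above: taste each restaurant once, commit to whichever tasted best, and average the regret over many 300-day runs. The happiness distributions below are assumed placeholders, so the printed number will not be exactly 330.

    import random

    means = [10.0, 8.0, 5.0]         # assumed mean happiness per restaurant
    sds = [3.0, 3.0, 3.0]            # assumed standard deviations
    days, runs = 300, 10_000
    best_total = days * max(means)   # 3000 with these assumed means

    total_regret = 0.0
    for _ in range(runs):
        first_tastes = [random.gauss(m, s) for m, s in zip(means, sds)]
        pick = first_tastes.index(max(first_tastes))   # commit to the apparent winner
        # happiness = the three exploration days + the expected value of the remaining days
        happiness = sum(first_tastes) + (days - len(means)) * means[pick]
        total_regret += best_total - happiness

    print(f"average exploit-only regret over {runs} runs: {total_regret / runs:.0f}")

The 604 figure that several comments mention is the regret of one particular run (the case where restaurant #2 happened to look best after the first tastes); the number near 330 is the average over all possible runs.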

  • @bassamry
    @bassamry 1 year ago

    Very clear and simple explanation!

  • @jams0101
    @jams0101 3 years ago

    awesome video ! thanks so much

  • @tariqrashid6748
    @tariqrashid6748 3 years ago

    Great explanation

  • @wenzhang5879
    @wenzhang5879 1 year ago

    Could you explain the difference between the MAB problem and the ranking and selection problem? Thanks

  • @yitongchen75
    @yitongchen75 4 years ago +1

    Cool explanation. Can you also talk about Upper Confidence Bound Algorithm relating to this?

    • @ritvikmath
      @ritvikmath  4 years ago +1

      Good timing! I have a video scheduled about UCB for Multi-Armed Bandit. It will come out in about a week :)

  • @PhilipKirkbride
    @PhilipKirkbride 4 years ago

    Related to regret, we never really know the true distributions (since we can only infer from taking samples). Would you basically just use your estimated distributions at the end of the 300 days as the basis for calculating regret?

  • @yannelfersi3510
    @yannelfersi3510 7 months ago

    Can you share the calculation for the regret in the exploitation-only case?

  • @debashishbhattacharjee1112
    @debashishbhattacharjee1112 1 year ago

    Hello Ritvik
    This was a very helpful video. You have explained a concept so simply. Hope you continue making such informative videos.
    Best wishes.

  • @quanghoang3801
    @quanghoang3801 1 year ago

    Thanks! I really wish the RLBook authors could explain the k-armed bandit problem as clearly as you do, their writing is really confusing.

  • @victorkreitton2268
    @victorkreitton2268 2 years ago

    What ML books do you recommend or use?

  • @TheFobJang
    @TheFobJang 1 year ago

    Would you say the exploit-only strategy is the same as the explore-then-commit strategy (also known as explore-then-exploit)?

  • @thinkanime1
    @thinkanime1 1 year ago

    Really good video

  • @sau002
    @sau002 1 year ago

    Brilliant

  • @sunIess
    @sunIess 4 years ago +7

    Assuming a finite horizon (known beforehand), aren't you (in expectation) better off doing all the exploration before starting to exploit?

    • @ritvikmath
      @ritvikmath  4 years ago +3

      You've just made a very good point. One strategy I did not note is an epsilon-greedy strategy where the probability of exploring is very high in the beginning and then goes to 0 over time. This would likely be a good idea.
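
A minimal sketch of the decaying-epsilon idea in the reply above: explore almost every day at the start, then let the exploration probability shrink toward zero as the horizon runs out. The decay schedule and the reward means are assumptions chosen only for illustration.

    import random

    means = [8.0, 10.0, 5.0]          # assumed mean happiness per restaurant
    days = 300
    counts = [0] * len(means)
    estimates = [0.0] * len(means)

    for day in range(days):
        epsilon = 1.0 / (1.0 + day / 10.0)     # assumed decay: ~1 early, ~0.03 by day 300
        if random.random() < epsilon:
            arm = random.randrange(len(means))         # explore
        else:
            arm = estimates.index(max(estimates))      # exploit
        reward = random.gauss(means[arm], 2.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    print("visits per restaurant:", counts)

Printing the visit counts shows the behaviour asked about above: most of the later visits concentrate on whichever restaurant looked best during the early, exploration-heavy days.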

  • @calinobrocea7502
    @calinobrocea7502 2 years ago

    Hello, thank you for the awesome explanation, it really helped me a lot. But I want to ask one additional question on this topic: do you know of a method for tuning the epsilon parameter? I tried searching on Google, but I did not find anything helpful. Thank you!

  • @manabsaha5336
    @manabsaha5336 3 years ago

    Sir, please make a video on the softmax approach.

  • @shantanurouth6383
    @shantanurouth6383 3 years ago

    I could not understand how it turned out to be 330; could you explain, please?

  • @sampadmohanty8573
    @sampadmohanty8573 4 years ago

    I knew everything from the start. Ate at the same place for 299 days and got pretty bored. So watched youtube and found this video. Now I am stuck at this same restaurant on the 300th day to minimize my regret. Such a paradox. Just kidding. Amazing explanation and example.

  • @Phil-oy2mr
    @Phil-oy2mr 4 years ago

    In the exploit-only case, would there be a way to compute the regret mathematically without a simulation?

    • @softerseltzer
      @softerseltzer 4 years ago +1

      You could calculate the probability of picking one restaurant over the others, and then sum the expected rewards weighted by those probabilities. For example, if one restaurant is clearly much better, you will most likely pick it in the initial one-shot exploration phase, so its probability will be close to 1.
      The probability of picking one restaurant over another could be derived from the cumulative distribution functions of the initial reward distributions. One could imagine a simple example with discrete instead of continuous distributions, say with each restaurant having only three options: a certain probability of a bad meal (reward 1), a mediocre meal (reward 2), and a good meal (reward 3).
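
To make the pencil-and-paper route concrete, here is a small Python sketch of the discrete example suggested above: two restaurants whose meals are bad, mediocre, or good (rewards 1, 2, 3), one taste of each, then commit. The reward probabilities are assumed for illustration; the same enumeration idea extends to continuous distributions via the CDFs mentioned in the reply.

    from itertools import product

    rewards = [1, 2, 3]                 # bad, mediocre, good meal
    p_a = [0.2, 0.3, 0.5]               # assumed probabilities for restaurant A (mean 2.3)
    p_b = [0.5, 0.3, 0.2]               # assumed probabilities for restaurant B (mean 1.7)
    days = 300

    mean_a = sum(r * p for r, p in zip(rewards, p_a))
    mean_b = sum(r * p for r, p in zip(rewards, p_b))
    best_total = days * max(mean_a, mean_b)

    expected_happiness = 0.0
    for (ra, pa), (rb, pb) in product(zip(rewards, p_a), zip(rewards, p_b)):
        joint = pa * pb                              # chance of this pair of first tastes
        mean_pick = mean_a if ra >= rb else mean_b   # commit to the better first taste (ties go to A)
        expected_happiness += joint * (ra + rb + (days - 2) * mean_pick)

    print(f"expected exploit-only regret: {best_total - expected_happiness:.1f}")

Every possible pair of first tastes is enumerated together with its probability, so no simulation is needed; this is the weighted sum of expected rewards described in the reply above.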