Hyperparameter Optimization - The Math of Intelligence #7

  • Published 27 Jul 2017
  • Hyperparameters are the magic numbers of machine learning. We're going to learn how to find them in a more intelligent way than just trial-and-error. We'll go over grid search, random search, and Bayesian Optimization. I'll also cover the difference between Bayesian and Frequentist probability.
    Code for this video: github.com/llSourcell/hyperpa...
    Noah's Winning Code:
    github.com/NoahLidell/math-of...
    Hammad's Runner-up Code:
    github.com/hammadshaikhha/Mat...
    More learning resources:
    www.iro.umontreal.ca/~bengioy...
    thuijskens.github.io/2016/12/...
    jmhessel.github.io/Bayesian-O...
    arimo.com/data-science/2016/b...
    dhnzl.files.wordpress.com/201...
    blog.revolutionanalytics.com/2...
    nlpers.blogspot.nl/2014/10/hy...
    neupy.com/2016/12/17/hyperpara...
    Join us in the Wizards Slack channel:
    wizards.herokuapp.com/
    And please support me on Patreon:
    www.patreon.com/user?u=3191693
    Thanks to Veritasium (bayesian animation) & Angela Schoellig (drone clip)
    Follow me:
    Twitter: / sirajraval
    Facebook: / sirajology Instagram: / sirajraval
    Signup for my newsletter for exciting updates in the field of AI:
    goo.gl/FZzJ5w
    Hit the Join button above to sign up to become a member of my channel for access to exclusive content! Join my AI community: chatgptschool.io/ Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available):
    www.wagergpt.co
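A minimal sketch of the first two strategies named in the description, grid search and random search. The objective `validation_loss` and the hyperparameter names `c` and `gamma` are hypothetical stand-ins for training a real model and scoring it on validation data; they are not taken from the video's code.

```python
import itertools
import random


def validation_loss(c, gamma):
    # Hypothetical stand-in for "train a model, measure validation loss".
    # By construction the minimum is at c = 3.0, gamma = 0.1.
    return (c - 3.0) ** 2 + 10.0 * (gamma - 0.1) ** 2


def grid_search(c_values, gamma_values):
    # Evaluate every combination on a fixed grid and keep the best one.
    return min(itertools.product(c_values, gamma_values),
               key=lambda pair: validation_loss(*pair))


def random_search(n_trials, seed=0):
    # Sample each hyperparameter independently from an assumed range
    # and keep the best sampled pair.
    rng = random.Random(seed)
    candidates = [(rng.uniform(0.0, 10.0), rng.uniform(0.0, 1.0))
                  for _ in range(n_trials)]
    return min(candidates, key=lambda pair: validation_loss(*pair))


best_grid = grid_search([1.0, 3.0, 5.0], [0.01, 0.1, 1.0])
best_random = random_search(n_trials=9)
print("grid search:", best_grid, validation_loss(*best_grid))
print("random search:", best_random, validation_loss(*best_random))
```

Both searches spend nine evaluations here. The grid only ever tries three distinct values per hyperparameter, while random search tries nine distinct values of each, which is the usual argument for preferring it when only a few hyperparameters matter.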

COMMENTS • 138

  • @JulianHarris 5 years ago +5

    Holy crap I can say with confidence this is the funniest introduction to hyperparameter optimisation there will ever be. Ever. Genius work. You don't call any more, but that's ok. Live your life, enjoy it! Be free! Be yourself!

  • @DaredevilGotU 5 years ago +1

    Every time I visit the page, I learn a new technique. Thanks Siraj

  • @surajthapa4160 3 years ago +4

    You are one of the rare educators who can make their viewers smile in between learning, which makes learning flawless. I believe I can watch your 1 hr long content too without any stop. Thanks for making learning easy and funny.

  • @dexterdev 6 years ago +3

    Thanks for the effort you are taking for these videos. I respect it. :)

  • @tumul1474 5 years ago +7

    dude u making learning so awesome !! great work

  • @phil.s 6 years ago

    Thank you for this video. I am currently testing Scikit-Optimize to optimize the network I am working on.
    It supports Bayesian optimization and is simple to implement, whereas hyperas likes to give errors.

  • @randompast 5 years ago

    Good explanation, illuminated a few things for me, thank you.

  • @elsmith1237 6 years ago +1

    This is amazing! Thanks for the video :)

  • @Ninja-iq2xt 6 years ago

    Loved this pace, at least it makes us understand what's going on, as compared to the previous videos, which are quantity over quality.

  • @peretz7 6 years ago +19

    Your geeky / cringey jokes are the best! Don't stop. Seriously.

  • @rajatmaheshwari916 6 years ago +59

    Et tu, Brute Force... I laughed so hard at this point.

  • @Skythedragon 6 years ago +51

    Yo dang, I heard you like optimizers, so I made an optimizer to optimize your optimizer

  • @_jiwi2674 5 years ago +1

    Hi Siraj, thanks tons for the video! I am unsure of what you meant by the utility of the expectation of function f. You said it tells us which regions of the domain of f are best to sample from, but I can't quite follow what you mean by that. Would highly appreciate some help with this!

  • @irenesforeheadforlyve5820 5 years ago

    Is it possible for me to optimize the neurons inside convolution layer for image classification?

  • @justinwhite2725 2 years ago

    Here I was thinking early in the video 'would a Monte Carlo approach work?' when you got into talking about exploration /exploitation I think it might.
    This higher level math you are doing here I don't get (or maybe I need someone else to explain it) but Monte Carlo is something I've used before and I think it might be good enough.
    You could seed in a set of likely values and let it add new ones when it heads to an upper or lower bound. The nice thing about Monte Carlo is that it would explore possibilities as the model matures and switch over to something if it winds up performing better.
    This obviously works better for integer parameters than for gradual values.

  • @deepak3303 6 years ago

    For a classification model, how to optimize hyper parameters using CAP curve analysis?

  • @vipulsonar4561 5 years ago

    Cool..!!
    Can you plz tell me some best algorithm which can be used for video summarisation....!!

  • @adityashinde6202 6 years ago

    How about using evolutionary algorithms to search for optimum values of hyperparameters? I'm not sure how well it works in comparison to bayesian optimization though.

  • @manmoffatt8497 4 years ago

    Love the energy Siraj

  • @ketanpandey2561 5 years ago +1

    What about genetic algorithms? Can they be used to optimize hyperparameters? For example using the TPOT library.

  • @deniscandido4116 6 years ago

    Is this already implemented on some library like sklearn or keras? I never read about this before and looks very promising

  • @keturananny6559 3 years ago

    I enjoyed this so much!

  • @s3wannabesaliha238 4 years ago

    Pretty cool video...good job Siraj.. thankyou...

  • @masteronepiece6559 6 years ago

    Great video .
    Thanks .

  • @larryteslaspacexboringlawr739 6 years ago

    thank you for hyperparameters video

  • @manwa5192 6 years ago

    What are you doing in Amsterdam brother? You work there now?

  • @kaikewesleyreis 6 years ago

    There's going to be a video about Feedforward neural net?

  • @antonylawler3423 6 years ago

    I've only ever seen the Kernel Trick glossed over. I'd love it if you could find an opportunity to spend a few minutes on it.

  • @powerrabbit 6 years ago

    This is the coolest channel on YouTube!

  • @gbengaomotara2102 5 years ago

    Any schemes for initializing the likelihoods?

  • @jasneetsingh4018 6 years ago

    Docker tutorial please...muchh needed!!

  • @zihanqiao2850 4 years ago

    Love your videos.

  • @benbenjamin5 6 years ago

    Hey man, I've got a sorta unrelated question.. Have you heard of useaible and what do you think, I've heard some pretty crazy stuff but I can't really find much on it.. Is it legit? Anyway thanks, great video as always.

  • @abhilashjoy 10 months ago

    Give this man a raise!

  • @RaymondWong 6 years ago

    For tuning hyperparameters, how does Bayesian optimization compare to PSO? Any risk of overfitting when tuning the hyperparameters?

    • @lucasnildaimon7598 5 years ago

      To answer the second question, yes, overfitting still is an open problem in Hyperparameter Optimization. You can find some information about some adopted methods that try to avoid this in Section 1.6.4 of this book: www.automl.org/wp-content/uploads/2019/05/AutoML_Book_Chapter1.pdf

  • @gunjannaik7575 6 years ago

    How can we use this to predict new parameters?

  • @Nola1222Piano 6 years ago +3

    Want to make a neural network that converts fiction books to movie scripts. And then based on the character descriptions in the book find the best actor in a db. And based on the information in the book find good filming locations. I'm very new to AI and don't know anything. Is this possible with AI? Should I train on 3 different datasets and how? And what NN should I use to do all of that at the same time?

    • @SirajRaval 6 years ago +1

      would be great, use IMDB dataset

  • @vijaykoravi7583 6 years ago

    hey siraj can you tell us about replika..

  • @thoughtsofapeer 6 years ago

    Hi Siraj, I was thinking if you would like to make a video for all of us new CS-students out there on "Good to know basics".
    I am from Denmark, so we don't have quite the same educational system. I am coming from the equivalent of high school and have just been accepted to the Danish University of Technology where I will study Software technology. This is a bachelor which I will get in three years, then continuing with a two-year candidate/masters. I have no prior knowledge of programming or discrete math whatsoever 😱
    ty
    Edit: I will be starting September 5th :D

  • @rahulahoop1 3 years ago +2

    thank you for making data science entertaining for reals, would you be able run some more examples with the concepts as you explain them in future videos?:)

  • @FinanceLogic 1 year ago

    Bayes is not as random as it seems you think around 5 minutes in. But I did learn a lot here. thanks.

  • @planktonfun1 6 years ago

    frequentist, bayesian: their results are almost the same for the first 20% of result data, but bayesian also includes uncertainty so there's that.

  • @xumeixi382 6 years ago

    Great video! really makes me laugh

  • @luck3949 6 years ago +11

    Can we train a neural network to optimize hyperparameters?

    • @SirajRaval 6 years ago +5

      ive never read a paper thats done that, but totally possible! All functions are neural networks if you stare at them long enough, you should definitely try it out

    • @freediugh416 6 years ago +1

      -----------> . (tactical dot in case OP wants to share results)

    • @AkashSwamyBazinga 6 years ago

      Yes, i guess. Try using GPyOpt which is basically a black-box function optimization library written in python.

    • @deeplearningpartnership 6 years ago

      Yes.

    • @hfkssadfrew 5 years ago

      If you have hundreds of hyperparameters, this would be better than GP, but usually we don't.

  • @swamysriman7147 3 years ago

    So.....Gradient Descent is a special case of Bayesian Optimization ri8?

  • @solidsnake013579 5 years ago +4

    i was drinking my tea when i heard biggie and 2pac. jesus almost spitted my tea out

  • @phillipotey9736 3 years ago

    Just figured something out with nodes. Length amount of nodes is cleverness, height of nodes is smartness.

  • @karankatiyar5414 6 years ago

    how do we do it in tensorflow ?

  • @quebono100 6 years ago

    Why should TF-IDF be a better strategy than Bag-of-Words? I think it depends on the application.

  • @floopybits8037 2 years ago

    Nice explanation

  • @qunchongqa 1 month ago

    interesting and useful

  • @hammadshaikhha 6 years ago

    I am looking for clarification on the homework this week because I think I have gotten confused between Bayesian regression and Bayesian optimization for finding hyperparameters. Is it correct to say that in a linear regression the hyperparameter is the gradient descent learning rate, and not the slope coefficients? So we first use Bayesian optimization to find a good learning rate, and then run gradient descent to estimate the coefficient parameters? If this is true, I imagine we still want to minimize the sum of squared errors?
    Someone let me know if I am on the right track, thx.

    • @SirajRaval 6 years ago

      Hey Hammad! Great question. You can choose to do either, both are really cool ideas. Example of Bayesian regression: github.com/tdomhan/pyblr & for Bayesian optimization for linear regression, what you said is correct, it's used to first find the optimal learning rate, while gradient descent estimates the coefficient parameters.

    • @hammadshaikhha 6 years ago

      Thanks for the clarification Siraj. I am going to do the Bayesian linear regression notebook, hopefully someone else does the Bayesian optimization to find gradient descent parameter.

    • @laidbackmedia 6 months ago

      Bias routines involve illusions
      Diverge or continue
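The split described in the thread above (an outer search picks the learning rate, gradient descent estimates the coefficients) can be sketched roughly as follows. This is an illustration under stated assumptions, not the homework solution: a small log-spaced grid stands in for Bayesian optimization to keep the example dependency-free, and the toy data, `candidate_rates`, and helper names are hypothetical.

```python
def gradient_descent(xs, ys, learning_rate, steps=500):
    # Inner loop: fit y = w * x + b by minimizing mean squared error.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(xs, ys, w, b):
    # The objective the outer search sees: final loss after training.
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Outer loop: the learning rate is the hyperparameter being searched.
# A Bayesian optimizer would propose these values adaptively instead.
candidate_rates = [10 ** exponent for exponent in (-4, -3, -2, -1)]

results = []
for rate in candidate_rates:
    w, b = gradient_descent(xs, ys, rate)
    results.append((mse(xs, ys, w, b), rate, w, b))

best_loss, best_rate, best_w, best_b = min(results)
print("best learning rate:", best_rate)
print("coefficients:", best_w, best_b, "loss:", best_loss)
```

The coefficients `w` and `b` are never searched directly; they are whatever gradient descent returns for a given learning rate, and the squared-error loss is still the quantity being minimized at both levels.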

  • @deepak3303 6 years ago

    Why not use a binary search algorithm to eliminate half of the possible hyperparameters rather than brute force?

  • @user-ro4mi2td1p 6 years ago

    Cool skunk, thank you

  • @FinanceLogic 1 year ago

    It just clicked how a random forest really works less than 1 minute into this video. i feel sick because the world is so interesting.

  • @williamchamberlain2263 5 years ago

    Isn't the Kernel Trick that you don't really transform the data points at all? You just use a similarity function that is equivalent to the inner product calculation that _would_ happen after transforming to a high-dimension space with some kernel: the Kernel Trick is that there is no kernel.
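The point made in the comment above can be checked numerically. A minimal sketch, assuming a degree-2 polynomial kernel on 2-D inputs (the video may have used a different kernel): the kernel value computed in the original space matches the inner product of the explicitly transformed points, so the transformation never has to be carried out. The helper names `phi`, `dot`, and `poly_kernel` are hypothetical.

```python
import math


def phi(v):
    # Explicit feature map for the kernel k(x, z) = (x . z)^2 on 2-D inputs.
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def poly_kernel(x, z):
    # Same quantity, computed without ever building phi(x) or phi(z).
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))   # inner product in the 3-D feature space
implicit = poly_kernel(x, z)     # inner product in the 2-D input space, squared
print(explicit, implicit)
```

For this kernel the feature space has only three dimensions, so the saving is small; for kernels such as the RBF kernel the corresponding feature space is infinite-dimensional and only the implicit form is computable.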

  • @shreysharma7806 5 years ago

    From where Bayesian Optimization get the initial value of C and gamma?

    • @lucasnildaimon7598 5 years ago

      It's a prior belief, so it means that you or the person coding should assume their initial values.

  • @heri_prieto 6 years ago

    Siraj, where can I go to get the latest in deep learning publications so I can then replicate the results?? Thank you! You are the shit!

  • @GuillaumeVerdonA 6 years ago

    HAHAHA that "mmm look at that Gaussian" meme has a pic from a McGill prof I knew

  • @rawiasammout5833 6 years ago

    please need to understand svm and pso

  • @alvincay100 6 years ago +8

    Just when you think you've heard every pronunciation of Gaussian possible...

    • @SirajRaval 6 years ago

      haha always something new

  • @chicken6180 6 years ago

    💯

  • @tommyeastman2999 6 years ago

    that song is a jam

  • @chengjunli4518 6 years ago +1

    Can I get the subtitles? Thanks

  • @colox97 6 years ago

    quantum computers may be very very useful for this kind of task, they'd parallelize the entire process and allow REAL BIG data to be handled much better.
    isn't this a P problem?
    am i right?

  • @normannborg 3 years ago

    came here for Hyperparameter Optimization, found SVM explanation

  • @edouarddelaire1939 6 years ago

    I've already seen people using genetic algorithms in order to find hyperparameters, but I think that's not very efficient :/

  • @blindfoldchess7762 5 years ago +1

    Yey

  • @Egop3105 6 years ago +3

    For a tutorial on how to install and use Spearmint (an awesome Bayesian Optimization library by Jasper Snoek) check out this link: bitbucket.org/uhasseltmachinelearning/spearmint

  • @AIwithAniket 2 years ago

    Man have to make new videos 🙌. Don't lose hope

  • @chasegraham246 6 years ago +3

    1:35 Getting kind of edgy, Siraj.

  • @matthewdaly8879 6 years ago +4

    What about gradient descent?

    • @eloyeligon6676 6 years ago

      The problem is how do you calculate the gradient

  • @JosephQPham 6 years ago

    humor and intelligence

  • @CrazyGamerSidh 6 years ago +1

    Background 🙃

  • @michaelvarney. 6 years ago

    Gauss, as in louse, not Gauss, as in boss.

  • @shuvendubikash3792 6 years ago +6

    You have a problem. Your videos have no learning sequence. I don't understand where to start and where to go

    • @diogojvc 6 years ago +6

      You have to train your biological neural network to learn new things based on past experiences. xD
      The best way to start is to ... start. I mean start by building the most basic thing and then as you watch new videos you start to mess with some new things, at least is what i have been doing. That being said the most important videos to start the most basic thing are the "math of intelligence" videos. Hope it helps and good luck ;)

    • @hammadshaikhha 6 years ago +2

      I felt the same way when I first found this channel and was watching random videos in no order. Currently you are watching the 7th video in this series, have you watched 1-6 already? He does have a sequence, and it's becoming better and getting connected together over time. If you go to his channel and look at playlists, he has 1) Python for Data Science, 2) Math of Intelligence, I think these would be the starting points.

    • @shuvendubikash3792 6 years ago

      @Akujin yes it does.
      But what you mean by "math of intelligence"? . This playlist or anything else

    • @diogojvc 6 years ago +1

      Yes, i meant the playlist.

    • @SirajRaval 6 years ago +1

      what ran domness said

  • @jmoz 4 years ago

    Equally interesting and ridiculous.

  • @killordie2412 6 years ago

    Can you share your collection of memes please?

    • @chicken6180 6 years ago

      No you see his memes change over time he doesnt even find memes anymore he has software to crawl the web and predict which memes siraj will most enjoy

    • @SirajRaval 6 years ago

      what spark said

    • @Ur.Podcast_Buddy 3 years ago

      @@SirajRaval waiting for new video

  • @Privacy-LOST 4 years ago +1

    If you think Siraj is exciting, have a look at this awesome dude on the same topic :
    ua-cam.com/video/con_ONbhD2I/v-deo.html 😂

  • @titoadesanya9369 1 year ago

    lmaooo et tu brute force

  • @MrPanthershah 3 years ago

    I somehow find all that animation distracting to get the point across. Mehhh

  • @yb801 6 years ago

    I suck

  • @nickmcneely5601 6 years ago +11

    G-owwww-sian, not G-awwwww-sian.

    • @nickmcneely5601 6 years ago

      Igotattitude93 That's what I said.

    • @nickmcneely5601 6 years ago

      Igotattitude93 I know plenty of Americans who say it correctly. Same for Euler. Shit, even Nietzsche.

    • @_sudipidus_ 6 years ago +3

      Pronunciation depends on your hyperparameter selection :P

    • @SirajRaval 6 years ago

      thank you

  • @exec9292 2 years ago +1

    who did u copy to make this video lol

  • @ismaelgoldsteck5974 6 years ago

    Boi I'm early

  • @averageengineeer 6 years ago

    Headache !!! :(

    • @SirajRaval 6 years ago

      please clarify, what specifically gave you a headache? thanks

  • @AkarshSundareswar 6 years ago +5

    First :p

    • @SirajRaval 6 years ago +2

      congrats

    • @AkarshSundareswar 6 years ago +7

      I want to thank my parents, teachers, brother, sister and my dog for this great opportunity. Without them, this would not be possible.

  • @vaptua4109 6 years ago

    Second

  • @PCCoooler 3 years ago

    I can't tell you how much I hate this guy, but this is the only video that explains what I want to know :'(