Predicting Horse Race Winners Using Advanced Statistical Methods

  • Published 6 May 2024
  • Conditional Logistic Regression with Frailty applied to predicting horse race winners in Hong Kong.
    www.helios.ai
    Since Bill Benter first proposed it in 1994, Conditional Logistic Regression (CLR) has been an extremely popular tool for estimating the probability of horses winning a race.
    I propose a new prediction process composed of two innovations to the common CLR model and a unique goal for parameter tuning. First, I modify the likelihood function to include a "frailty" parameter borrowed from the epidemiological use of the Cox Proportional Hazards model. Second, I use a LASSO penalty on the likelihood, with profit as the target to be maximized (as opposed to the much more common goal of maximizing likelihood).
    Finally, I implemented a Cyclical Coordinate Descent algorithm to fit the model in high-speed parallelized code that runs on a Graphics Processing Unit (GPU), allowing me to rapidly test many tuning parameter settings.
    Historical data from 3,681 races in Hong Kong were collected, and 10-fold cross-validation was used to find the optimal tuning parameter settings. Simulated betting on a hold-out set of 20% of races yielded a return on investment of 36.73%.
  • Science & Technology
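
A minimal sketch of the base model the description refers to, not the author's code: a conditional logit gives each horse a score and normalizes within the race, and a LASSO (L1) penalty is added to the negative log-likelihood. The frailty term and the profit-based tuning described above are omitted, and the feature values, `beta`, and `lam` are made up for illustration.

```python
import math

def win_probabilities(features, beta):
    """Conditional logit: P(horse i wins) = exp(x_i . beta) / sum_j exp(x_j . beta)."""
    scores = [sum(x * b for x, b in zip(row, beta)) for row in features]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def penalized_neg_log_likelihood(races, beta, lam):
    """Negative log-likelihood over races plus an L1 (LASSO) penalty.

    races: list of (features, winner_index) pairs, one pair per race.
    lam:   penalty strength (a tuning parameter).
    """
    nll = 0.0
    for features, winner in races:
        nll -= math.log(win_probabilities(features, beta)[winner])
    return nll + lam * sum(abs(b) for b in beta)

# Toy data: two races, two hypothetical features per horse.
races = [
    ([[1.0, 0.2], [0.5, 0.1], [0.0, 0.9]], 0),
    ([[0.3, 0.4], [0.8, 0.0]], 1),
]
beta = [0.0, 0.0]
loss_at_zero = penalized_neg_log_likelihood(races, beta, lam=0.1)
```

With all coefficients at zero every horse in a race is equally likely, so the loss reduces to the sum of the logs of the field sizes. A fitting routine would search over `beta` for each candidate `lam`.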

COMMENTS • 58

  • @tylergramling424
    @tylergramling424 4 years ago +1

    this was way before its time. Thanks for the great upload!!

  • @1minnows
    @1minnows 5 years ago +6

    This is all nice, but tell me which horse is going to win the first race at Aqueduct tomorrow.

  • @vwazp
    @vwazp 6 years ago +1

    Mr. Silverman, thanks for your talk. I'm wondering if you can suggest any sources for a novice who has no statistical background and wants to bet on horses using statistically proven methods. Thank you.

  • @thegoodbetdotcom3069
    @thegoodbetdotcom3069 5 years ago +7

    Five years old but still an interesting talk. I thought about adding some thoughts on a few points, but I realise there are so many different ways to predict races that my thoughts probably won't make a difference. Bill Benter's story (and that of his associates) is pretty far out considering the technology of the time when he was doing his thing with horse racing. Maybe he is still betting alongside his academic pursuits, I don't know. Personally, I bet on horses every day using my own spreadsheet formulas, but the latest thing is that, with the help of a data science guy, we are developing a deep supervised-learning ANN using the parameters I know work best for the data sets. It's working somewhat, but time will tell how well that goes. As for ROI, accuracy of prediction and staking methods, I believe that when someone is getting tangible results over a period of time, they probably won't be telling others how it's done. Even if they do, I remember something Bill Benter said that really resonated with me: many people don't want to roll up their sleeves and do the hard work. The countless hours I have put into writing formulas or making data sets, lol, I don't even want to think about it. Anyway, it's all fun :)

  • @mattwilsn
    @mattwilsn 5 years ago +1

    Noah,
    Got two questions for you:
    1) Does using conditional probability give you any advantage over just using the probability and ranking the horses grouped by race?
    2) Are you able to provide any detail on the features that you've used?
    I'm looking at doing something similar for my MSc dissertation.
    thanks,
    Matt

  • @gabrielbejenaru2549
    @gabrielbejenaru2549 4 years ago +1

    The problem that we gamblers all face in the end is how the race is going to develop, given that the betting companies know before the start (1) the number of bets placed on a particular horse and (2) the value of these bets.
    Basically, everything can be controlled (manipulated) for the benefit of the betting companies; otherwise these companies would cease to exist. If I misspelled something, please pardon my French, but English is not my first language.

  • @rdomer2010
    @rdomer2010 7 years ago

    Thanks for a great overview of your modeling. Are you using any open source libraries to do your conditional logistic regression and the LASSO optimization? Did you write this in C++ for the Mac? Thanks for any information you can provide on your algorithms.

    • @NoahSilverman
      @NoahSilverman 7 years ago +2

      This project was all custom code, some in R and some in C++.

  • @shanwu2739
    @shanwu2739 5 years ago

    Super work. I am Chinese and very interested in Hong Kong racing research. How can I learn this, and can I use your data for it?

  • @SergeantKeel
    @SergeantKeel 7 years ago +2

    Hi Noah, thanks for the great talk. I was wondering how you came up with 186 variables?! And how many of these did LASSO manage to get rid of?

    • @NoahSilverman
      @NoahSilverman 7 years ago +3

      Thanks James,
      The 186 variables were deduced from reading a ton of literature on the subject, speaking to experts, and a lot of trial and error. The LASSO I used was an L2, so some variables were pushed to small values, but none to 0.

    • @TrueSaintly
      @TrueSaintly 7 years ago +1

      You can break down handicapping factors in a variety of ways.
      Average prize money won by the jockey, the horse's success from an outside barrier, jockey/trainer strike rate for the last 12 months. A lot can be made redundant, but one of the more successful high-profile horse players, Alan Woods, used something like 130+ factors.

  • @Ricatellez682
    @Ricatellez682 4 years ago

    I need more information about econometric methods and sports betting, please.

  • @robertspence8638
    @robertspence8638 7 years ago +3

    Hey Noah, excellent talk. How did you get the .3 to .4 correlation between the odds and rank outcomes? Is that a number that you computed or something that comes from the academic literature? If you could provide a reference I'd be very grateful. Thanks.

    • @NoahSilverman
      @NoahSilverman 7 years ago +2

      Empirical correlation from dataset. If you want a formal "academic literature" reference, see my paper published on the topic.

  • @vishwajithkp1418
    @vishwajithkp1418 8 years ago

    Dear Dr. Noah Silverman,
    Thanks for uploading such an informative video; at my level of knowledge it is a little hard to understand. I want to know how the parameters for the Benter correction in the Harville formula can be obtained.
    Thanks in advance.

    • @NoahSilverman
      @NoahSilverman 8 years ago +1

      +VISHU JITH You'll have to find that one on your own.

    • @vishwajithkp1418
      @vishwajithkp1418 8 years ago

      +Noah Silverman Thanks for your immediate response. Sorry, that was a typographical mistake. I meant: how can the parameters for the Benter correction be obtained?

  • @chevalierdeloccident5949
    @chevalierdeloccident5949 6 years ago +1

    Judging from the video description the betting public underestimates the winning chance of a horse in 2 out of 10 races in Hong Kong, enough to overcome the track takeout over the long term. Is that correct? What betting strategy was simulated? Flat betting the bare minimum or a fixed proportion of the bankroll? This is important to know because horse racing typically doesn't encourage the implementation of a Kelly Strategy with a large bankroll relative to the size of the parimutuel pool.

    • @NoahSilverman
      @NoahSilverman 6 years ago +2

      For that academic study, I used a fairly standard Kelly strategy. In "real life", it would be something more complex to manage risk.
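
For reference, a sketch of the textbook Kelly fraction for a single win bet, assuming fixed decimal odds. The function name and numbers are illustrative; it ignores the effect of the bet on a parimutuel pool, which the question above raises, and it is not the exact strategy used in the study.

```python
def kelly_fraction(p, decimal_odds):
    """Kelly stake as a fraction of bankroll for a single win bet.

    p:            estimated win probability.
    decimal_odds: total payout per unit staked, including the stake.
    """
    b = decimal_odds - 1.0          # net profit per unit staked on a win
    edge = p * decimal_odds - 1.0   # expected profit per unit staked
    if b <= 0 or edge <= 0:
        return 0.0                  # no bet without a positive edge
    return edge / b

# Hypothetical horse: model says 25%, the board pays 5.0.
f = kelly_fraction(0.25, 5.0)
```

Here `edge / b` is the usual (b*p - q) / b with q = 1 - p. Many bettors stake only a fraction of this amount to reduce variance.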

  • @damien2198
    @damien2198 8 years ago

    Thank you so much.
    Since then, have you played with LSTMs or convolutional networks on this or a similar project? Any better results?

    • @NoahSilverman
      @NoahSilverman 8 years ago

      I have not. The challenge with any ANN is setting up the conditional probability (the probabilities for horses in a race must sum to 1.0).
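
One way to meet the sum-to-one constraint mentioned in the reply is to apply a softmax within each race to whatever raw scores a model produces. A minimal sketch, assuming hypothetical race IDs and scores; it is not taken from the talk.

```python
import math
from collections import defaultdict

def normalize_by_race(race_ids, scores):
    """Softmax raw scores within each race so they sum to 1 per race."""
    groups = defaultdict(list)
    for idx, race in enumerate(race_ids):
        groups[race].append(idx)

    probs = [0.0] * len(scores)
    for members in groups.values():
        top = max(scores[i] for i in members)  # for numerical stability
        weights = {i: math.exp(scores[i] - top) for i in members}
        total = sum(weights.values())
        for i in members:
            probs[i] = weights[i] / total
    return probs

# Five hypothetical runners across two races.
race_ids = ["R1", "R1", "R1", "R2", "R2"]
scores = [2.0, 1.0, 0.5, -0.3, 0.4]
probs = normalize_by_race(race_ids, scores)
```

In a neural network the same grouping would have to be applied inside the loss, so that the gradient for one horse depends on the scores of its rivals in the same race.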

  • @vishwajithkp1418
    @vishwajithkp1418 8 years ago

    Dear Dr. N. Silverman, can you please help me find the parameters for the Benter correction in the Harville formula? How do I get the maximum likelihood estimator on a sample of past data?
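
A sketch under one common reading of this question: the Harville formula derives place probabilities from win probabilities, and a Benter-style adjustment raises the win probabilities to a power `gamma` before renormalizing among the remaining horses. The exponent can then be chosen to maximize the likelihood of observed runner-up finishes in past races. The data below are made up, and a simple grid search stands in for a proper optimizer.

```python
import math

def second_place_prob(win_probs, first, second, gamma=1.0):
    """P(horse `second` finishes 2nd | horse `first` won).

    Win probabilities are raised to the power gamma and renormalized over
    the horses other than the winner. gamma = 1.0 is the plain Harville formula.
    """
    weights = [p ** gamma for p in win_probs]
    remaining = sum(w for k, w in enumerate(weights) if k != first)
    return weights[second] / remaining

def fit_gamma(history, grid):
    """Return the gamma on `grid` that maximizes the log-likelihood of the
    observed runners-up. history: list of (win_probs, first, second)."""
    def log_lik(gamma):
        return sum(math.log(second_place_prob(p, f, s, gamma))
                   for p, f, s in history)
    return max(grid, key=log_lik)

# Made-up history: (win probabilities, index of winner, index of runner-up).
history = [
    ([0.5, 0.3, 0.2], 0, 2),
    ([0.6, 0.25, 0.15], 0, 1),
    ([0.4, 0.4, 0.2], 1, 2),
]
grid = [g / 100 for g in range(10, 201, 5)]
best_gamma = fit_gamma(history, grid)
```

A separate exponent for third place can be fitted the same way, conditioning on the first two finishers.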

  • @michaelbarson9898
    @michaelbarson9898 9 years ago

    Hello Noah, I have been doing similar things with a Benter-style two-step regularised conditional regression on Australian races and have read your dissertation thoroughly. My question is: does using the frailty/strength term from the odds have an effect similar to using the Kelly criterion? You are weighting horses that your model favours more than the public (odds) with a greater final probability? Are you then placing a uniform bet across all races? Wouldn't that be the same as finding a win probability that is unweighted by the odds and using a Kelly bet to modify your stake to maximize your winnings?

    • @NoahSilverman
      @NoahSilverman 9 years ago

      The two are not mutually exclusive. You can use weights in training AND Kelly for betting. They're separate things.

    • @michaelbarson9898
      @michaelbarson9898 9 years ago

      Noah Silverman Thanks for replying. I suppose you can do both, and I guess they both do a similar thing. Interesting to see how a Kelly strategy works for your already weighted system, could be more robust due to the regression but also more non-linear as similar information is being used twice. Thanks again!

  • @jbeaz11
    @jbeaz11 8 years ago +1

    Noah Silverman, how can I get a copy of your study and apply it to U.S. horse racing?

    • @NoahSilverman
      @NoahSilverman 8 years ago +1

      +Joe Beasley Data Science Ltd offers consulting services for the gaming markets.

    • @jbeaz11
      @jbeaz11 8 years ago +1

      +Noah Silverman what's their website address?

    • @NoahSilverman
      @NoahSilverman 8 years ago

      +Joe Beasley www.datascience.io

    • @NoahSilverman
      @NoahSilverman 7 years ago +1

      New website: www.helios.ai

  • @tonzafundetsme
    @tonzafundetsme 8 years ago

    Noah, was the quoted ROI calculated off closing prices?

    • @NoahSilverman
      @NoahSilverman 8 years ago

      Daniel Wishart I don't actually remember. This talk was from several years ago, and things have advanced significantly beyond the work presented.

  • @joshcolbert5613
    @joshcolbert5613 4 years ago

    Is this only optimal for Hong Kong, or could it be used at Fonner Park in Nebraska?

  • @dennismontoro7312
    @dennismontoro7312 7 years ago

    Are you saying you would combine the public's implied odds (strength) with your coefficients? You're using public odds as a coefficient?

    • @NoahSilverman
      @NoahSilverman 7 years ago

      A lot of racing models use the public odds as *one* of several factors. There is information in there.

    • @NoahSilverman
      @NoahSilverman 7 years ago

      And, to clarify: We have "factors" in the model, and then use machine learning techniques to estimate the coefficients (weights of the factors). So, the public odds are a "factor", not a coefficient.

    • @dennismontoro7312
      @dennismontoro7312 7 years ago

      So this differs slightly from Benter, as he suggested running a second logit model that combines the public estimate with your fundamental estimate?

    • @NoahSilverman
      @NoahSilverman 7 years ago

      There are many ways to do this.

    • @dennismontoro7312
      @dennismontoro7312 7 years ago

      Noah Silverman, last question: do you think your model's R-squared outperforming the public model's R-squared is a good indicator of potential success (along with OOS testing for ROI)?
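
A rough sketch of the second-stage idea raised in this thread, as it is usually described for Benter-style models: blend the fundamental model's probability with the public's odds-implied probability on the log scale, then renormalize within the race. The weights `alpha` and `beta` and all numbers are hypothetical; in practice they would be estimated by fitting a second conditional logit on past races.

```python
def combine(fundamental, public, alpha, beta):
    """Blend two probability estimates for one race.

    The combined probability is proportional to
    fundamental_i ** alpha * public_i ** beta, which is the same as a
    softmax over alpha * log(fundamental_i) + beta * log(public_i).
    """
    weights = [f ** alpha * p ** beta for f, p in zip(fundamental, public)]
    total = sum(weights)
    return [w / total for w in weights]

fundamental = [0.40, 0.35, 0.25]   # hypothetical model output
public = [0.50, 0.30, 0.20]        # hypothetical odds-implied probabilities
combined = combine(fundamental, public, alpha=0.5, beta=0.5)
```

Setting `alpha=1, beta=0` recovers the fundamental estimate and `alpha=0, beta=1` recovers the public one, so the fitted weights show how much each source contributes.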

  • @bodylove2009ab
    @bodylove2009ab 4 years ago

    By the way, Benter had hired journalists so they could get him some insider info.

  • @samiab6077
    @samiab6077 4 years ago

    At 4:05, if I remember my 8th-grade math correctly, does ∝ mean that there is a constant in the formula, or am I an idiot?
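
In general, a ∝ b means a = k·b for some constant k. Assuming the slide at 4:05 shows the conditional logit win probability (an assumption, since the slide is not reproduced here), the constant is fixed by requiring the probabilities in a race to sum to 1, where x_i is the feature vector for horse i and β the coefficient vector:

```latex
P(i \text{ wins}) \propto \exp(x_i^\top \beta)
\quad\Longrightarrow\quad
P(i \text{ wins}) = \frac{\exp(x_i^\top \beta)}{\sum_{j} \exp(x_j^\top \beta)}
```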

  • @acwchangs
    @acwchangs 7 years ago +3

    What will happen if a quantum computer gives you an optimized result in a fraction of a second and ruins the whole industry?

    • @NoahSilverman
      @NoahSilverman 7 years ago +7

      Nice fantasy, but things don't work that way. Just because a machine is "quantum" doesn't mean it has infinite insight into any phenomenon in the world.

    • @acwchangs
      @acwchangs 7 years ago +2

      But if you have a model, then all that is left is to get an optimized answer, which I think the D-Wave quantum machine at Google could do, couldn't it?

  • @pwnycny
    @pwnycny 6 years ago +3

    Unless someone has inside info about a race, there is no reliable way of predicting the outcome of a thoroughbred horse race. There are too many variables, not the least of which is the horse itself, whose temperament and condition at post time are known only to the horse, and the horse is keeping that a secret. The fact that even the most successful jockeys win only a small fraction of their races is proof that, presuming that the races are legitimate, the outcome is not a sure bet. Recently, in a maiden claiming race, the 75 to 1 longshot won by two lengths while the 6 to 5 favorite came in eighth. Predicting races is entertaining, but don't expect the horses to cooperate. They have other concerns that have nothing to do with money.

    • @NoahSilverman
      @NoahSilverman 6 years ago +11

      I respectfully disagree (of course)

    • @MikeKleinsteuber
      @MikeKleinsteuber 5 years ago +2

      Tell Bill Benter that there's no reliable way of predicting the outcome of a horse race lol

    • @3DComputing
      @3DComputing 5 years ago +2

      "Predicting races is entertaining, but don't expect the horses to cooperate. They have other concerns that have nothing to do with money." LOL FOFL My coffee nearly came out of my nose. GOOD ONE

    • @lklim3914
      @lklim3914 5 years ago

      You would have to use Kelly and the law of large numbers to mitigate uncertainty and bad luck. Is that what you would do, Noah?

    • @wesley621375
      @wesley621375 5 years ago

      The reason people can win money in Hong Kong is that the pool is a pari-mutuel pool with many uninformed punters, so there is room for differences between the true probabilities and the odds.
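
To make the gap between odds and probability concrete, here is a small sketch that turns decimal win odds into the public's implied probabilities and computes the expected return of a bet under a model's own estimate. The odds and the model probability are hypothetical, and normalizing inverse odds is only one simple way to remove the takeout.

```python
def implied_probabilities(decimal_odds):
    """Normalize inverse decimal odds so they sum to 1 across the race.

    Returns the implied probabilities and the overround (the raw sum of
    inverse odds, which exceeds 1 when there is a takeout).
    """
    inverse = [1.0 / o for o in decimal_odds]
    overround = sum(inverse)
    return [v / overround for v in inverse], overround

def expected_return(model_prob, decimal_odds):
    """Expected profit per unit staked on a win bet."""
    return model_prob * decimal_odds - 1.0

odds = [2.0, 4.0, 5.0, 10.0]            # hypothetical four-horse race
public, overround = implied_probabilities(odds)

# A bet is only attractive when the model's probability is high enough
# relative to the odds to overcome the takeout.
value = expected_return(0.30, odds[1])  # model says 30% for the second horse
```

In this toy race the inverse odds sum to 1.05, so the public's implied probability for the second horse is slightly below 0.25, and a model estimate of 0.30 gives a positive expected return at odds of 4.0.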

  • @Crispytastyduck
    @Crispytastyduck 5 years ago

    Soo.... Have you made your billions yet?