G'day lads, if you have a question let me know here 🔥
Need a spreadsheet built? Visit: www.excelladz.com
How did you get the SLpM of every fighter? It is not in the dataset provided.
i can see slpm for individual fighters but not R = however many and B = however many
Is it possible for Excel to say how accurate every prediction is?
@KC.06 To test the accuracy of a model you've got to either backtest it on past data or test it going forward. It's important to test on data the model hasn't been "trained" on, as otherwise the results will be biased. Hope that helps lad.
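A minimal sketch of the out-of-sample idea in the reply above, in Python rather than Excel. The toy fight list and the `pick_red_if_positive` rule are hypothetical placeholders, not the video's model; the point is only that accuracy is measured on fights that sit after the training cut-off.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split rows (assumed sorted oldest to newest) into train and test parts."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def accuracy(predict, rows):
    """Share of rows where predict(features) matches the recorded outcome."""
    if not rows:
        raise ValueError("no rows to score")
    hits = sum(1 for features, outcome in rows if predict(features) == outcome)
    return hits / len(rows)


# Hypothetical toy data: (Red-minus-Blue stat difference, 1 if Red won else 0),
# ordered oldest first.
fights = [(0.4, 1), (-0.2, 0), (0.1, 1), (-0.5, 0), (0.3, 0),
          (0.6, 1), (-0.1, 1), (0.2, 1), (-0.3, 0), (0.5, 1)]

train, test = chronological_split(fights, train_fraction=0.8)


def pick_red_if_positive(diff):
    """Placeholder rule; a real model would be fitted on `train` only."""
    return 1 if diff > 0 else 0


print("in-sample accuracy:", accuracy(pick_red_if_positive, train))
print("out-of-sample accuracy:", accuracy(pick_red_if_positive, test))
```

With only a handful of held-out fights the out-of-sample number is very noisy, so a larger test window gives a more trustworthy figure.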
Thank you for this video. I used your model along with ELO and the implied probability from the opening line to handicap UFC 298. Your model worked pretty well. Your model was 10/12, ELO was 10/12 and the opening line was 11/12. I thought results of your model might be improved by using the most recent fights but that wasn't the case. Using fight data from the last two years the model correctly predicted 9/12 and using the last three fights the model correctly predicted 10/12 again. Thank you again.
Excellent video as always. Thank you!
It’s just so based on matchups but I reckon if you combine the model with your own eye test it could be very profitable
Over the long term, I had difficulty making this model profitable in UFC betting. The MMA judging rules changed in 2017 (1 Jan 2017) with the new unified MMA rules. The data set that you used covers 1993-2021. I suspect the results of the model would improve if you used only data from 2017 and later. Rerunning the regression is on my list of things to do. When I get to it, I'll post the results here. Thanks again.
I agree or at least since Nov 17 2000.... You could run both models and see which is best at predicting 2021 results.
This is really good. Thank you.
love this video I always wanted something like this !!!!
Lad, updating the fighters stats automatically would be a nice improvement to the model (power query maybe). Anyway, you rock!
Haha thanks for the support lad 🙏 That was the aim when building this model, but I haven’t yet found a good website to pull this data all at once into Excel. 😞
@@excel_ladz Yeah, I just noticed that the Kaggle dataset contains stats from 1993 to 2021, so my previous comment doesn't make sense.
@@excel_ladz Lad, I checked the Kaggle dataset files but I wasn't able to find the same exact stats that you've shown right at the beginning of the video. Did you change the name of the columns or something?
Any reason (or references) for why you need to subtract square root functions when you're calculating differences in accuracy? (E.g. 4:39)
Hi lad, the square root function is used to calculate a fighter's 'true expected striking accuracy'. For example, Fighter Red has a 60% accuracy and their opponent has a defence rate of 50%. The SQRT function determines that Fighter Red's accuracy decreases to about 54.8%: if Fighter Red lands 60% and Fighter Blue defends 50%, then a midpoint has to be found. This is done for both fighters, so each fighter has a 'true expected striking accuracy'. The two values are then subtracted, just as is done for the other stats, to see the advantage the Red Fighter has over the Blue Fighter 👍
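A rough Python sketch of the square-root adjustment described in the reply above. The exact spreadsheet formula isn't quoted in the thread, so the form used here, the geometric mean of the attacker's accuracy and the share of strikes the defender fails to stop, is an assumption that happens to reproduce the 60% vs 50% example. The Blue fighter's numbers are made up for illustration.

```python
import math


def expected_accuracy(attacker_accuracy, defender_defence):
    """Assumed form: geometric mean of the attacker's accuracy and the
    fraction of strikes the defender does NOT defend."""
    return math.sqrt(attacker_accuracy * (1.0 - defender_defence))


# Example from the thread: Red lands 60%, Blue defends 50%.
red_expected = expected_accuracy(0.60, 0.50)

# Hypothetical Blue numbers: Blue lands 45%, Red defends 60%.
blue_expected = expected_accuracy(0.45, 0.60)

# Red-minus-Blue difference, used like the other stat differentials.
accuracy_diff = red_expected - blue_expected

print(round(red_expected, 4), round(blue_expected, 4), round(accuracy_diff, 4))
```

The geometric mean sits slightly below the plain average (55% in this example), which is why the adjusted figure is a bit under the midpoint.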
Looks good again. Can you also make a system like this for darts that would be cool.
My best wishes for the new year ;)
72% seems very impressive, but I’m a little confused where you got the data from. You said it was from the ufc stats website, which is what I’m using for my model as well, however you don’t say where exactly. If you’re using data from the ufc fighter’s page, this is updated after every fight so if the fight already happened you are using data from the future relative to the fight you’re trying to predict. While 72% is pretty good, I’d argue we can do a bit better if we allow ourselves to use data from the future.
You're on fire! Great video! Can Logistic regression be used for other binary outcome sports, say basketball for example?
For sure 🔥 As long as the predictors are relevant (a good rule of thumb is for each predictor's initial p-value, after running the regression, to be below 0.05) then you can absolutely do that 👍 I suppose you could also turn events with more than two possible outcomes into two-way events, e.g. under/over 220 points 😃
@@excel_ladz Great, thanks Lad.
Great video. But where did you get the statistics at the beginning, like SLpM and Str Acc? The data set on Kaggle does not contain those numbers. Did you somehow extract them from somewhere else?
Hi lad, thanks for watching 🔥 I selected the latest 1,500 rows from the ‘data.csv’ dataset. All I grabbed from this dataset was the two fighters and the winner. I then grabbed the Red and Blue Fighter’s 8 different UFC Stats from the ‘raw_fighter_details.csv’, and added them into the table. I hope that helps lad 👍
@@excel_ladz How did you know which one was red and which one was blue? In that dataset they don't differentiate.
did you figure it out?
Very impressive, I’ll watch a few more times.
Question though, is the model basically making a prediction off of the fighter with better stats?
Would be interesting to find a way to quantify out-of-ring edges, such as strength of gym, previous strength of schedule, and here lately new dads have been locking it down
I’m curious if finding a way to quantify things like mentioned above would yield a tighter prediction
Very cool stuff and super cool of you to upload all of it and let us learn with you!
Love the video, thank you!
1. For column C, how did you determine who was going to win?
2. If I want to add more variables to correlate, can I just add them, so long as there is a differential to the stat? (i.e. KO rate diff., SUB rate diff, Height Diff and reach Diff)?
3. How can I add a weighted variable to this, such as age?
Maybe even something like wins and losses against what type of fighting style etc
I was wondering the same thing
Great movie! Could you please show us how to create a model for tennis matches?
This is great stuff. Thanks for sharing! Could you explain why zeroing out the stats affords the red fighter a few percentage points? Does the calculation of actual data (values > 0) remove this seeming bias? Additionally, when swapping the red fighters stats to the blue fighter (and vice versa), the percentages don't equate to the same values. This seems problematic. Is there an explanation for this, or a solution?
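One possible explanation for both observations above, sketched in Python. It assumes the spreadsheet is a standard logistic regression on Red-minus-Blue differences with an intercept term; the intercept and coefficients below are hypothetical, not the video's fitted values. Under that assumption, zeroed stats leave only the intercept, and swapping corners flips the sign of every difference but not of the intercept, so the two probabilities do not sum to 100%.

```python
import math


def win_probability(intercept, coefs, diffs):
    """Logistic model: P(Red wins) = 1 / (1 + exp(-z)), z = b0 + sum(b_i * x_i)."""
    z = intercept + sum(c * d for c, d in zip(coefs, diffs))
    return 1.0 / (1.0 + math.exp(-z))


intercept = 0.1                      # hypothetical fitted intercept
coefs = [0.8, 1.5]                   # hypothetical coefficients
diffs = [0.5, -0.1]                  # Red minus Blue for two stats
swapped = [-d for d in diffs]        # same fighters with corners swapped

p_zero = win_probability(intercept, coefs, [0.0, 0.0])   # all stats zeroed
p_red = win_probability(intercept, coefs, diffs)
p_swapped = win_probability(intercept, coefs, swapped)

print(round(p_zero, 3))              # above 0.5 because the intercept is positive
print(round(p_red + p_swapped, 3))   # not 1.0 when the intercept is non-zero

# Without an intercept the model is symmetric in the two corners.
p_sym = win_probability(0.0, coefs, diffs) + win_probability(0.0, coefs, swapped)
print(round(p_sym, 3))
```

If that is the cause, refitting without an intercept, or averaging the two orientations, would make the swap symmetric. Whether that is desirable depends on whether the Red corner really did win more often in the training data.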
Also, I really think taking stats from before, say, 2010 is almost a misdirection. Judging, fight rules and basically everything else were totally different from modern MMA, so I wonder if it could skew the data
Thanks for the vid. Can you explain why you're calculating the win probability with the EXP() function?
That thumbnail 🥵
Does the dataset need to be updated with more recent fights as time goes on or is it just plug and play and time proof please ?
If you had the data for when the fight ended such as the round and how it ended Sub, KO/TKO, or Dec could you use this same process to determine what round and by how the fight would be decided based on the previous data?
Is this model available for download? 👀
Yep! See the first link in the description 🔥
I wonder if you can do one for greyhound races, UK & AUS?
Great videos, i was just wondering what type of math you applied in the nba prediction videos, i saw you use poisson for fb and logistic regression for ufc. Thanks
Hi lad, in the NBA Model I used the binomial distribution to simulate a player’s points. Specifically, the BINOM.INV function to simulate a player’s expected shots, their distribution of shots (ie the number of threes, twos and fts) and finally the number of shots made out of those attempted 🏀
@@excel_ladz Thanks lad
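For anyone wanting to try the BINOM.INV idea from the reply above outside Excel, here is a rough Python sketch. `binom_inv` mirrors what Excel's BINOM.INV(trials, probability, alpha) returns (the smallest number of successes whose cumulative probability reaches alpha), and feeding it a uniform random alpha gives one simulated draw. The attempt counts and percentages are hypothetical, and unlike the NBA model described above, the number of attempts is fixed here rather than simulated.

```python
import math
import random


def binom_inv(trials, prob, alpha):
    """Smallest k with P(X <= k) >= alpha for X ~ Binomial(trials, prob)."""
    cumulative = 0.0
    for k in range(trials + 1):
        cumulative += math.comb(trials, k) * prob**k * (1 - prob)**(trials - k)
        if cumulative >= alpha:
            return k
    return trials  # guards against floating-point shortfall when alpha is ~1


def simulate_points(rng, threes_att, threes_pct, twos_att, twos_pct, fts_att, ft_pct):
    """One simulated game: made shots drawn per shot type, then scored."""
    threes = binom_inv(threes_att, threes_pct, rng.random())
    twos = binom_inv(twos_att, twos_pct, rng.random())
    fts = binom_inv(fts_att, ft_pct, rng.random())
    return 3 * threes + 2 * twos + fts


rng = random.Random(0)  # seeded so runs are repeatable
# Hypothetical player: 6 threes at 36%, 10 twos at 52%, 5 free throws at 80%.
games = [simulate_points(rng, 6, 0.36, 10, 0.52, 5, 0.80) for _ in range(10_000)]
print("average simulated points:", sum(games) / len(games))
```

The simulated average should land near the expected value of about 20.9 points (6 × 0.36 × 3 + 10 × 0.52 × 2 + 5 × 0.80), and the spread of `games` gives a feel for over/under probabilities.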
would this account for intangibles like power or speed or finishes before the 5th round or is it just stat for stat
Do you have this in googlesheets form? that way the data is live and auto updates?
how did you copy all the data into excel quickly from the csv
For football and soccer please
72% doesn't mean anything if the odds are worse than -300...
Hi lad, that's right if the average odds are below $1.39. However there's a good article by Pinnacle Sports saying that the bookie favourite in UFC Fights wins 66% of the time, so any model with an accuracy above this is considered good 👍
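The $1.39 figure above is just the break-even decimal odds for a 72% hit rate (1 / 0.72). A small Python sketch of that arithmetic, assuming flat one-unit stakes and the same odds on every bet, which real betting won't match exactly:

```python
def break_even_decimal_odds(win_rate):
    """Decimal odds at which a bettor with this win rate neither wins nor loses."""
    return 1.0 / win_rate


def american_to_decimal(american):
    """Convert American odds (e.g. -300, +150) to decimal odds."""
    if american < 0:
        return 1.0 + 100.0 / abs(american)
    return 1.0 + american / 100.0


def expected_profit_per_unit(win_rate, decimal_odds):
    """Average profit per 1-unit stake at the given win rate and odds."""
    return win_rate * decimal_odds - 1.0


print(round(break_even_decimal_odds(0.72), 2))                 # 1.39
print(round(american_to_decimal(-300), 3))                     # 1.333
print(round(expected_profit_per_unit(0.72, american_to_decimal(-300)), 3))  # -0.04
```

So a 72% model betting everything at -300 would lose about 4 cents per unit staked on average, while the same hit rate at average odds above 1.39 would be profitable.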