A Mathematician's Guide to the World Cup

  • Published 16 Nov 2022
  • In 2010, Paul the Octopus 'correctly' predicted results at the World Cup. However, these days the experts are the analysts who trawl through the reams of data about players and teams. And where there is data there is mathematics. And, particularly, mathematical models.
    Joshua Bull is a mathematical modeller. He was also the winner of the 2020 Fantasy Football competition, out of over eight million entrants. So when it came to the Oxford Mathematics 2022 World Cup predictor, Josh fitted the bill perfectly. Homing in on the data, applying his modelling skills, and adding a pinch of the assumptions that inform modelling (disclaimer: he is an Ipswich Town fan), Josh has come up with the answers, or rather, likely outcomes. See what you think.
    PS: some people have commented that Josh has got the last 16 wrong because those combinations cannot come out of the groups. However, he explicitly says in the video that this is an overall prediction, not a specific one. For a specific prediction and more forecasts and thoughts, please visit www.maths.ox.ac.uk/ for links to our social media pages.

COMMENTS • 848

  • @leonardogoes2031
    @leonardogoes2031 1 year ago +1870

    I think the most efficient way to test this is to run the exact same model on previous World Cups and see if it works.

    • @manaharchowdhury2402
      @manaharchowdhury2402 1 year ago +35

      I think they have done it.

    • @w.s1097
      @w.s1097 1 year ago +12

      @@manaharchowdhury2402 And what were the results from those past cups?

    • @upendra8050
      @upendra8050 1 year ago +19

      According to them, the focus is data from all international matches since 2018, as well as xG.

    • @Grimeyhoob
      @Grimeyhoob 1 year ago +119

      Yeah, that's a backtest of the model.
      Thing is, though, just because an outcome is the most likely doesn't necessarily mean that it will eventuate. So you really need to backtest it over a long enough period for the Central Limit Theorem to apply to the normally distributed noise he spoke of.

    • @psychic8872
      @psychic8872 1 year ago +10

      You should also avoid overfitting.
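
A backtest along the lines suggested in this thread could be sketched like this: score each past match's predicted home-win/draw/away-win probabilities against the actual result using the multi-class Brier score (lower is better), and compare it with a naive baseline. The forecasts and results below are hypothetical placeholders, not outputs of Josh's model.

```python
def brier_score(forecasts, outcomes):
    """Mean multi-class Brier score.

    forecasts: list of (p_home_win, p_draw, p_away_win) tuples
    outcomes: list of result indices, 0 = home win, 1 = draw, 2 = away win
    """
    total = 0.0
    for probs, result in zip(forecasts, outcomes):
        # Squared distance between the forecast and a one-hot vector of the result
        total += sum((p - (1.0 if k == result else 0.0)) ** 2
                     for k, p in enumerate(probs))
    return total / len(forecasts)


# Hypothetical forecasts for three past matches, and what actually happened
past_forecasts = [(0.60, 0.25, 0.15), (0.30, 0.30, 0.40), (0.50, 0.30, 0.20)]
past_results = [0, 2, 1]

score = brier_score(past_forecasts, past_results)
# Baseline: always give each outcome a one-in-three chance
baseline = brier_score([(1 / 3, 1 / 3, 1 / 3)] * len(past_results), past_results)
print(f"model {score:.3f} vs uniform baseline {baseline:.3f}")
```

With only three matches the score is itself very noisy, which is the point made above about needing a long enough backtest before trusting it.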

  • @pswire1117
    @pswire1117 1 year ago +461

    Great video! My highly scientific analysis is similar: it's called "The Gabriel Martinelli factor". It works like this: you analyze the team and check whether Gabriel Martinelli is on the squad roster. If he is, you're going to win the World Cup.

    • @tomasbrazy1887
      @tomasbrazy1887 1 year ago +29

      Arsenal fans have been down bad so long, this is how you fanboy your good players 😂

    • @suhasguddeti2375
      @suhasguddeti2375 1 year ago +8

      Arsenal winning the World Cup 👏 replace Martinelli with Depay

    • @advancewarstournamentseries
      @advancewarstournamentseries 1 year ago

      Martinelli did nothing in the first game haha, sorry

    • @ceticamente
      @ceticamente 1 year ago +4

      @@advancewarstournamentseries he had less than 10 minutes on the pitch and still made some good plays... what are you talking about?

    • @michaelthe1713
      @michaelthe1713 1 year ago

      Stupidity model

  • @mohamedelhady190
    @mohamedelhady190 1 year ago +820

    There is a HUGE mistake.
    Each group should send 2 teams to the round of 16.
    In your model, Argentina's group has only one team
    and Brazil's group has 3 teams.
    This means you have to redo the round of 16 and beyond.
    But really,
    great work.

    • @norf8
      @norf8 1 year ago +11

      What do you mean? Argentina and Mexico + Brazil and Switzerland.
      *Edit*: I think there is an updated table floating around on LinkedIn with Belgium and Brazil in the final.

    • @OxfordMathematics
      @OxfordMathematics 1 year ago +144

      @@norf8 Yes, Josh makes the same point towards the end of the video: this is just a general guide. You can find his final model here: twitter.com/UniofOxford/status/1593564445715881984

    • @8YvY
      @8YvY 1 year ago +20

      @@OxfordMathematics Man, where are Poland and Mexico here??? Did you miss them?

    • @l1mbo69
      @l1mbo69 1 year ago +2

      @@8YvY Mexico is there on the right. Poland doesn't advance from the group stage.

    • @funhasnoname
      @funhasnoname 1 year ago +9

      Yes, and there is another issue: some teams from the same group meet as early as the quarter-finals, while they should be on opposite sides of the bracket, for instance France and Denmark.

  • @fajarwp8148
    @fajarwp8148 1 year ago +20

    I really appreciate the way you built assumptions and improved them when something unlikely showed up.
    I believe the model can be more deterministic.

    • @Anoyzify
      @Anoyzify 1 year ago

      Deterministic is not subject to gradation. It either is or isn’t. 🤓
      Cheers,
      S. Cooper

  • @mr.d5050
    @mr.d5050 1 year ago +97

    The next step is finding the teams with the best betting payoff relative to the prediction and layering several bets to maximize the expected payout. Cool stuff. Thanks for posting.

    • @bengoacher4455
      @bengoacher4455 1 year ago +14

      Mathematician: Look at all this data and the predictions we can draw from it, isn't it pretty?
      The internet: Cool, let's use it to make bets.

    • @uwotm8
      @uwotm8 1 year ago +3

      @@bengoacher4455 might as well make some money from it. Otherwise, what's the point 🤷‍♂️
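
The "best betting payoff" idea reduces to expected value. With decimal odds d and a model probability p that the bet wins, a stake of 1 returns a profit of d - 1 with probability p and loses 1 otherwise, so the expected profit is p(d - 1) - (1 - p) = pd - 1. A minimal sketch follows; the probability and odds are hypothetical, and, as other comments point out, bookmakers' margins and their own models make a genuine edge unlikely.

```python
def implied_probability(decimal_odds):
    """Win probability at which a bet at these odds exactly breaks even."""
    return 1.0 / decimal_odds


def expected_profit(p_win, decimal_odds, stake=1.0):
    """Expected profit of one bet: win (odds - 1) * stake, or lose the stake."""
    return p_win * stake * (decimal_odds - 1) - (1 - p_win) * stake


# Hypothetical: the model gives a team a 20% chance; two different prices on offer
for odds in (6.0, 4.0):
    ev = expected_profit(0.20, odds)
    print(f"odds {odds}: break-even p = {implied_probability(odds):.3f}, EV = {ev:+.3f}")
```

A bet has positive expectation only when the model's probability exceeds the implied probability 1/d, and that is only worth anything if the model is better calibrated than the bookmaker.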

  • @PedroHenrique-mb1hh
    @PedroHenrique-mb1hh 1 year ago +6

    That's why I love YouTube. Random videos like this just make my day. Great work, by the way, fantastic analysis.

  • @ger128
    @ger128 1 year ago +3

    This is a really insightful and entertaining video -- a rare combination when it comes to explaining probabilistic models!

  • @davidclark9406
    @davidclark9406 1 year ago +160

    First of all, xG minus xG allowed would have been more useful than straight xG for capturing teams' defensive capabilities. But frankly, international team xG is just not a large or uniform enough data set (given the significant split between friendlies and competitive matches) for this to work as discussed. The overvaluing of Belgium and the undervaluing of England are two examples of this.
    Using current player market value would almost certainly be more predictive than weighting straight xG.
    That said, it's a fun video and it teaches the iterative process of this kind of modelling very well, so A+ for maths communication.
    But if anyone is here looking for a betting edge, lol, I guarantee the odds makers have thought about this more and modelled it more thoroughly, so maybe don't.

    • @jpa_fasty3997
      @jpa_fasty3997 1 year ago +21

      Agree with your diagnosis, but I don't agree that player market value is a more valuable measure. Firstly, English players are priced higher on average at equal skill level due to the EPL homegrown rule, amongst other things. Secondly, value says nothing about how a team will perform together on the pitch, which is why a midfield of Scholes, Beckham, Gerrard and Lampard never won anything. Thirdly, value says nothing about a fella in a waistcoat putting all the best attacking players on the bench in favour of 12 defenders. 😄

    • @smooth_operator65
      @smooth_operator65 1 year ago +1

      @@jpa_fasty3997 Rather than market value, I would suggest something like the median wage of a 16-player rotation or so 🤔

    • @frederikbrandt424
      @frederikbrandt424 1 year ago +1

      @@smooth_operator65 That doesn't work either, because English players are also severely overpaid for the same reasons JPA mentions.

    • @smooth_operator65
      @smooth_operator65 1 year ago

      @@frederikbrandt424 Yes, sure, but then again this would be easier to control for than market value.

    • @danielrowson3379
      @danielrowson3379 1 year ago +9

      Use them all! xG, player value, scores, home/away, weather, etc. Throw it all into a nice regression model (maybe XGBoost) and validate out of sample and out of time.
      N.b. It still won’t beat the bookies.
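
One simple way to combine the "xG for" and "xG allowed" ideas from this thread is a multiplicative attack/defence adjustment: scale a team's average xG created by how much its opponent concedes relative to the average team. This is a common textbook construction, not the method used in the video, and the team figures below are hypothetical.

```python
def matchup_xg(xg_for, opp_xg_against, avg_xg):
    """Expected goals for a team against one specific opponent.

    xg_for: the team's average xG created per match
    opp_xg_against: the opponent's average xG conceded per match
    avg_xg: average xG per team per match across all teams
    """
    attack_strength = xg_for / avg_xg            # > 1 means a better-than-average attack
    defence_weakness = opp_xg_against / avg_xg   # > 1 means a leakier-than-average defence
    return avg_xg * attack_strength * defence_weakness


# Hypothetical averages: (xG for, xG against)
team_a = (1.8, 0.7)
team_b = (1.2, 1.1)
avg = 1.3

a_vs_b = matchup_xg(team_a[0], team_b[1], avg)
b_vs_a = matchup_xg(team_b[0], team_a[1], avg)
print(f"A expects {a_vs_b:.2f} xG, B expects {b_vs_a:.2f} xG")
```

A team that concedes little lowers its opponents' expected goals even if its own xG is modest, which is the effect several commenters wanted the model to capture.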

  • @GeorgeZoto
    @GeorgeZoto 1 year ago +46

    Thank you for putting this together and for sharing it, great intuitive explanations and graphs :)

  • @erick.tokuda7299
    @erick.tokuda7299 1 year ago +36

    Mathematical modelling in a clear and fun way! Very good

  • @mothra3477
    @mothra3477 1 year ago +4

    14:45 that chart aged well

  • @daianaflor3025
    @daianaflor3025 1 year ago +3

    Joshua Bull knows a lot about mathematical research but has zero knowledge of football. Greetings from ARGENTINA

  • @tonywilde1709
    @tonywilde1709 1 year ago +4

    As a long-suffering Wolverhampton Wanderers fan, the first few slides of your excellent presentation got me thinking about our recent 4-0 capitulation at home to Leicester City. According to the statistics I found online, the xG for that game was Wolves 1.62 - 0.99 Leicester, despite the real scoreline!
    So judging by the distribution model given at 5:36, there appears to have been a 1-2% chance of Leicester scoring four goals that day, and indeed around a 0.7% [35% x 2%] chance of the game ending 4-0. When your luck's out...!
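
The arithmetic in this comment can be checked under a plain Poisson assumption: treat each team's goals as an independent Poisson variable whose mean is its xG. That is an assumption; the distribution shown in the video at 5:36 may be built differently, so the numbers need not match the comment's reading of the chart. Under this plain model, Leicester scoring exactly four comes out at about 1.5%, consistent with the 1-2% above, while Wolves failing to score from 1.62 xG comes out nearer 20% than 35%, so the exact 0-4 scoreline lands nearer 0.3%.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# xG figures quoted in the comment
xg_wolves, xg_leicester = 1.62, 0.99

p_leicester_4 = poisson_pmf(4, xg_leicester)
p_wolves_0 = poisson_pmf(0, xg_wolves)
# Treating the two teams' goal counts as independent is itself an assumption
p_score_0_4 = p_wolves_0 * p_leicester_4

print(f"P(Leicester score 4)    = {p_leicester_4:.2%}")
print(f"P(Wolves score 0)       = {p_wolves_0:.2%}")
print(f"P(Wolves 0-4 Leicester) = {p_score_0_4:.2%}")
```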

  • @goldsteinresist
    @goldsteinresist 1 year ago +1

    Great video and great job Joshua putting all this together!

  • @Pax_Veritas
    @Pax_Veritas 1 year ago +1

    I really enjoyed your video despite being highly sceptical from the start. To get an accurate prediction there are several other factors that must be included and could realistically be modelled:
    1) The individual player ratings and their performance as a group (e.g. Liverpool had a terrible defence up until the signing of V. van Dijk. The Liverpool defence was improved again by Alisson, this made them into a league-winning team). Individual ratings can be crudely taken from transfermarkt via their market value, which is a reflection of both the team they play for and the league in which they play as well as their positions. The individual ratings could then be further modified by performance stats - goals, assists, key passes, chances created, tackles per 90, interceptions, clean sheets, etc.
    The key point is one player being added or removed can radically change the performance of a collective. Take the star player and give them a "star rating" (weighting) in each area of the pitch (GK, Def, Def Mid, Att Mid, Creator/Winger and Striker/goal scorer). Obvious examples in this area include Kane for England, Kane having scored proportionally more goals for England than any other player for any other major nation. Take away Kane and England's xG and win % is going to fall off a cliff, similar to how England somewhat collapsed when Rooney got injured in previous major tournaments. If Thiago Silva is taken out of Brazil, VvD taken out of Netherlands, Mbappe taken out of France, Messi from Argentina, KDB from Belgium, etc, these are going to create disproportionately large movements in their respective tournament chances. (see form below)
    2) The continuity and form of the players and the collective - e.g. Liverpool after van Dijk got injured and after his return became defensively brittle and conceded the first goal in 9 out of 11 premier league matches. If the star players are out of form or have been recently injured, these factors will weigh heavily on team performance. We have numerous examples from history but again Rooney and Ronaldo (Brazil) have had disproportionate influences on their team's chances of success. This may be particularly useful when measuring defensive prowess as defenders require continuity and a run of games in order to develop an understanding and water-tight defence. Changing either the GK or the key centre back can weaken defensive continuity/form significantly.
    3) The leadership and mentality of the players. This could be measured by how often they contribute to a team gaining points from a losing or drawing position and how often they lose points from a winning/drawing position. In this respect Liverpool, Man City and the Man Utd teams under Fergie could be used as benchmarks for the positive (coming from behind, scoring late goals). The Arsenal, Barcelona, Netherlands, Brazil and Argentina teams of various eras could be used as negative examples of a collective of excellent individual players who lack leadership, leading to them dropping points from winning positions, having a number of aberrant results (losing against weak teams) and performing particularly badly in matches against the strongest opposition and main rivals (e.g. Arsenal getting bullied by Chelsea or Barcelona losing to Real Madrid and rivals in the Champions League, Netherlands, Brazil and Argentina collapsing in World Cups)
    4) The ability of the manager/coaches. This could be measured in a number of ways including player development under their leadership, the consistency of their form, performance against weaker and stronger teams, points gained/dropped from losing/drawing/winning positions late in games, performance against rivals and performance in important matches. All of these criteria and more have been legislated for (often poorly) in games such as Championship/Football Manager. There are also ratings for managers from transfermarkt and fourfourtwo amongst others.
    5) A measure of the UEFA and FIFA coefficients is a must - it's virtually certain the winner of the World Cup will come from the top 11 currently ranked teams (Germany are historically low at 11th). In the last 70 years there have only been 7 WC winners and one or more of those 7 teams have made the final in every WC except for the notable exception of the Netherlands (never winners) and of course Croatia who unexpectedly made the final in 2018. If we are using the past to predict the future then there are only 8 possible WC winners - Brazil, Argentina, Germany, Spain, England, Italy, Netherlands and France. That immediately reduces your odds of a random choice winner from 1/32 to 1/8. The UEFA coefficients could be incorporated by assigning a player in each national team the coefficient of their team and league. This would apply even to Argentina and Brazil as the majority/ALL of their major players play in European competitions or did so in the past
    6) The odds given by the major bookmakers in each country - self explanatory but the bookies rarely get it wrong
    7) There are numerous other fine-tuning factors such as the performance of star players vs star players of other teams - e.g. does Messi have VvD on toast, does Fernando Torres perform well against Nemanja Vidic. If a star player has long-shooting prowess is this negated by a star GK? E.g. is KDB more of a danger when shooting against Pickford (a smaller GK) or against Neuer (a much larger GK)? Which systems temporally perform better against other systems? Does 4-4-2 perform better against 3-5-2, 4-5-1, 3-4-3, etc?
    8) New rules for this WC in particular the added time. There's never been so much added time in football matches in history. A game used to average around 94 minutes but in this WC they are averaging 106 minutes. This will highly favour teams with greater fitness as the majority of goals are scored late in games when players are tired and make mistakes. Fitness and the strength of the bench will be more important at this WC than any football tournament in history.
    9) The weather/climate is a major factor with certain teams better adapted to playing in hot/arid/humid/cold/wet conditions. The WC winner and finalist statistics heavily favour nations with similar climatic conditions to the host WC nation - i.e. when the WC is played in South America/North America or Asia, then South American teams win. When the WC is played in Europe then European teams win. There have only been two exceptions since 1930 when Germany won the WC in Brazil (2014) and when Brazil won the WC in Sweden (1958), although Brazil did have by far the best team including Pele, Garrincha and several other exceptional players in 1958
    Based on the above I would pick Spain based on what I've seen so far, although Brazil are favourites for a reason.

  • @ErtjonMecka
    @ErtjonMecka 1 year ago +3

    Great job. It was very fun to watch.

  • @ThaGamingMisfit
    @ThaGamingMisfit 1 year ago +4

    Loving this! I think what this mostly proves is that England supporters tend to overestimate their team's chances! Be honest, they got the lucky side of the draw in both of the last tournaments, which is why I think the model before you made the tweaks is actually closer to reality than the later ones. Same with my country, Belgium, by the way, and that proved to be correct as they went out in the group stage :D

  • @DiegoSaavedraDavila
    @DiegoSaavedraDavila 1 year ago +5

    Currently getting my butt kicked by an intro to probability class, and you mentioning Poisson/normal distributions and seeing how the mean and variance showed up in the graphs was really refreshing and gave me some hope that it isn't just theory.

    • @ConClasher3
      @ConClasher3 1 year ago

      Frrrr, really cool to see it in practice, even if this is just a quick simulation.

    • @advancewarstournamentseries
      @advancewarstournamentseries 1 year ago

      Well, it's still just theory, since you're only trying to predict the future and there's no guarantee it will turn out like that. But yeah, the results will most likely be along those lines, so it has practical use in the end (though I would argue that anyone who follows soccer a little could predict the results just as well, if not better).
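
For readers connecting this to a probability course, the core simulation step can be sketched in plain Python: draw each team's goals from a Poisson distribution whose mean is its expected goals, replay the match many times, and count the outcomes. The sampler uses Knuth's multiplication method on top of the standard `random` module; the xG values are hypothetical and this is not the video's own code. The first part also checks the Poisson property mentioned in the comment: the mean and the variance both equal the rate parameter.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method.

    Multiplies uniform draws until the product falls to exp(-lam) or below;
    the number of extra draws needed is Poisson distributed. Fine for small lam.
    """
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate_match(xg_a, xg_b, n_sims=100_000, seed=0):
    """Estimate (P(A wins), P(draw), P(B wins)) by repeated simulation."""
    rng = random.Random(seed)
    wins_a = draws = wins_b = 0
    for _ in range(n_sims):
        goals_a = sample_poisson(xg_a, rng)
        goals_b = sample_poisson(xg_b, rng)
        if goals_a > goals_b:
            wins_a += 1
        elif goals_a == goals_b:
            draws += 1
        else:
            wins_b += 1
    return wins_a / n_sims, draws / n_sims, wins_b / n_sims


# Poisson check: sample mean and variance should both be close to lam = 1.5
rng = random.Random(1)
samples = [sample_poisson(1.5, rng) for _ in range(50_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"mean {mean:.3f}, variance {var:.3f}")

# Hypothetical match: team A averages 1.8 xG, team B 1.1 xG
print(simulate_match(1.8, 1.1))
```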

  • @mbogitechconpts
    @mbogitechconpts 1 year ago +24

    Everyone and everything may now be working against this model, including VAR decisions and match officials. Injuries, red cards, weather and many other unforeseen factors may also play a big part in who eventually wins the World Cup. I wonder if biases or noise were introduced to account for all of these, but we are keen to see how the model performs. Great effort 👌 Joshua and the University of Oxford team.

    • @Grimeyhoob
      @Grimeyhoob 1 year ago +10

      Those noisy elements are assumed to follow a normal distribution, i.e. over a long enough sample of matches they average out to zero.
      That’s how the simulations account for it.

    • @mbogitechconpts
      @mbogitechconpts 1 year ago

      @@Grimeyhoob Yeah, kind of agree with you. Thanks 😊.

    • @randellberry6846
      @randellberry6846 1 year ago +1

      Spot on!

    • @mbogitechconpts
      @mbogitechconpts 1 year ago

      @@randellberry6846 Hahahhaha

    • @googm
      @googm 1 year ago +1

      bruv, in my opinion the offside calls in Qatar v. Ecuador today would have been made maybe half as often in a normal match.

  • @gfr253
    @gfr253 1 year ago +16

    I appreciate your effort in adjusting parameters, but this simulation is still biased, as it doesn't consider many other parameters, such as the team combinations in the knockout stage up to the final, the dates of those matches, and more. By my prediction, Portugal should reach the final and face one of Brazil, the Netherlands or Argentina. My pick for the final would still be Portugal, given that they would have one more day to rest and would have faced weaker opponents on the way to the final than the other 3 teams I mentioned above.

  • @tomassterbinsky7433
    @tomassterbinsky7433 1 year ago +2

    Awesome! Is there a chance to get hold of the model? I'm quite curious about its technical parts. Please let me know.

  • @theuzlivid
    @theuzlivid 1 year ago +8

    This aged poorly

  • @luisfelipetrigo
    @luisfelipetrigo 1 year ago

    Fascinating, thanks for sharing.

  • @somanyraquazas3167
    @somanyraquazas3167 1 year ago +5

    I'm sure it would be hopelessly complicated, but for this World Cup the form of individual players over the last few months is going to be very important, so it would be cool to factor that in.

  • @sigfreed11
    @sigfreed11 1 year ago +1

    Is it possible to take a previous set of games as a fixed output, feed in a bunch of data sets (possession, tackles in the opponents' third, etc.) and have the model vary their weightings to reproduce that outcome? It could be interesting to see how the model defines the significance of each stat, and it could give a relatively accurate model.
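
What this comment describes is essentially regression: hold past results fixed as the target and let a fit choose the weight of each statistic. A minimal ordinary-least-squares sketch in plain Python follows. The two features and all numbers are hypothetical, a Poisson regression would suit goal counts better, and in practice the weights should be checked on held-out matches to guard against the overfitting mentioned earlier in the comments.

```python
def fit_weights(features, goals):
    """Least-squares weights w minimising sum((goals - features . w) ** 2).

    features: one row per match, e.g. [possession_share, tackles_in_final_third]
    goals: observed goals for each match
    Solves the normal equations (X^T X) w = X^T y by Gaussian elimination.
    """
    n = len(features[0])
    # Augmented matrix [X^T X | X^T y]
    a = [[sum(row[i] * row[j] for row in features) for j in range(n)]
         + [sum(row[i] * y for row, y in zip(features, goals))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (a[i][n] - sum(a[i][j] * w[j] for j in range(i + 1, n))) / a[i][i]
    return w


# Hypothetical matches: [possession share, tackles in the opponents' third]
features = [[0.60, 10], [0.40, 5], [0.55, 8], [0.50, 12]]
goals = [2.2, 1.3, 1.9, 2.2]  # constructed as 2 * possession + 0.1 * tackles
weights = fit_weights(features, goals)
print([round(w, 3) for w in weights])
```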

  • @nazz2k8
    @nazz2k8 1 year ago +2

    Is it possible to incorporate a Bayesian Bradley-Terry model ?
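
A Bayesian version needs priors and a sampler, which is beyond a short sketch, but the plain (non-Bayesian) maximum-likelihood Bradley-Terry model can be fitted with the classic minorization-maximization (MM) update, p_i ← W_i / Σ_j n_ij / (p_i + p_j), where W_i is team i's total wins and n_ij the number of games between i and j. The model says P(i beats j) = p_i / (p_i + p_j). Draws are ignored here, and the head-to-head record is made up.

```python
def fit_bradley_terry(wins, n_iter=500):
    """Maximum-likelihood Bradley-Terry strengths via the MM algorithm.

    wins[i][j] is the number of times team i beat team j (draws ignored).
    Assumes the results are well connected (e.g. every team has both wins
    and losses); otherwise the maximum-likelihood estimate may not exist.
    Returns strengths summing to 1, with P(i beats j) = p[i] / (p[i] + p[j]).
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom)
        # The model is scale-free, so renormalise to keep the numbers tidy
        scale = sum(new_p)
        p = [x / scale for x in new_p]
    return p


teams = ["A", "B", "C"]
# wins[i][j]: how often team i beat team j (hypothetical record)
wins = [[0, 3, 4],
        [1, 0, 3],
        [1, 2, 0]]
strengths = fit_bradley_terry(wins)
for name, s in zip(teams, strengths):
    print(name, round(s, 3))
p_a_beats_b = strengths[0] / (strengths[0] + strengths[1])
```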

  • @bobp1069
    @bobp1069 1 year ago

    Good start! We also now need to introduce a new term to factor the effects of VAR.

  • @NeoJackBauer
    @NeoJackBauer 1 year ago +2

    Nice video. How about comparing how well this model agrees with the results of previous tournaments?

  • @joaopedrofernandes910
    @joaopedrofernandes910 1 year ago

    Amazing video!!!! Congrats

  • @brandonofviolet
    @brandonofviolet 1 year ago

    The residuals look nice in the xG and rating difference plots. Not a lot of outliers. Any chance the p-values that were calculated could be shared?

  • @tyewaichun
    @tyewaichun 1 year ago +3

    Will you be re-iterating the model during the group stage? Squad composition and injuries may affect your model too.

  • @cagemitchellgaming461
    @cagemitchellgaming461 1 year ago

    Where do you get this type of data? Is it straight xG data, or did you find data sets of shots and locations and whether or not they went in?

  • @feliperodriguesphd9007
    @feliperodriguesphd9007 1 year ago +9

    Hi Josh, great video! A simple yet coherent model for sure. As a Brazilian, I'll take those odds!
    Did you go so far as to test it against previous World Cups? I wonder how the actual results compare with the expected outcomes! Cheers, mate!

    • @OxfordMathematics
      @OxfordMathematics 1 year ago +5

      Josh's focus is data from all international matches since 2018, as well as xG.

    • @Grimeyhoob
      @Grimeyhoob 1 year ago +1

      @@OxfordMathematics It may be interesting to see whether there's any mean-reverting behaviour in teams over a longer time horizon, e.g. how Brazil and Germany eventually come back on top after a fallow period. Some kind of behavioural element and pedigree.

    • @tonybryan5068
      @tonybryan5068 1 year ago

      I was coming here to ask that

    • @advancewarstournamentseries
      @advancewarstournamentseries 1 year ago

      @@Grimeyhoob Yeah, that could be really interesting: the cyclical behaviour in each team's performance.

  • @Razvan199736
    @Razvan199736 1 year ago +6

    Saudi Arabia: hold my kebab

  • @scottbromley8299
    @scottbromley8299 1 year ago +1

    If you used your model on past tournaments, how often was it correct?

  • @freefireusashop5795
    @freefireusashop5795 1 year ago +5

    Almost 50% of the R16 predictions turned out wrong 😂😂😂.... Nothing to say about the rest

  • @dan_rad
    @dan_rad 1 year ago +10

    I love that despite how complex this model is, when you look at your final model, intuition gets you the exact same outcome.

    • @lucasng4712
      @lucasng4712 1 year ago

      no

    • @australianpatriot
      @australianpatriot 1 year ago

      argentina suck and england are the best

    • @JohnM-ch4to
      @JohnM-ch4to 1 year ago +2

      Shows that our brain is more complex than any model, and 'intuition' is one of the greatest calculations we make without noticing.

    • @dan_rad
      @dan_rad 1 year ago +1

      @@lucasng4712 I mean if you know football, obviously. And by final model I mean the one they put on Twitter.

    • @lucasng4712
      @lucasng4712 1 year ago

      @@JohnM-ch4to no

  • @rpx1979
    @rpx1979 1 year ago

    Great job. Well done!

  • @hisao1291
    @hisao1291 1 year ago

    What I'd like to see when the World Cup ends is how likely the actual course of the tournament was, if that makes sense. Maybe not so precise as the exact number of goals scored in every match, but in terms of which team beat which. I'm interested because there have been some pretty unexpected results at this World Cup!
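
This can be computed from the model's own probabilities: treating matches as independent (an assumption), the probability of the whole observed tournament is the product of the probabilities the model gave to each actual result. Summing logarithms avoids numerical underflow over 64 matches, and the geometric mean per match is easier to read than the tiny overall product. The probabilities below are hypothetical.

```python
import math


def log_likelihood(result_probs):
    """Sum of log-probabilities the model assigned to the results that happened."""
    return sum(math.log(p) for p in result_probs)


# Hypothetical: probability the model gave to each result that actually occurred
observed = [0.55, 0.20, 0.35, 0.08]
ll = log_likelihood(observed)
per_match = math.exp(ll / len(observed))  # geometric mean per match
# Baseline: a "model" that gives every result a one-in-three chance
baseline = math.exp(log_likelihood([1 / 3] * len(observed)) / len(observed))
print(f"whole sequence: {math.exp(ll):.2e}, "
      f"per match: {per_match:.3f} vs 1/3 baseline {baseline:.3f}")
```

A per-match figure below the one-in-three baseline would suggest the tournament was more surprising than the model expected, though a handful of matches says little on its own.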

  • @bennaceurmostefa502
    @bennaceurmostefa502 1 year ago

    I'd like to know whether you are planning to open-source the prediction model. Thanks

  • @kenmilne5133
    @kenmilne5133 1 year ago

    Was R the primary tool you used to make the graphs from the data?

  • @lanzer22
    @lanzer22 1 year ago +6

    Great analysis, and it at least identified one of the two finalists. It was surprising to see Brazil losing so early, but what can you do when penalty kicks decide the outcome?

  • @szymonmarek1558
    @szymonmarek1558 1 year ago

    Are you going to update the forecast after group matches?

  • @sowki_tv
    @sowki_tv 1 year ago +6

    Great explanation and analysis. I found an error in the final tournament outcome prediction: there are 3 teams from group G and only one team from group C advancing to the knockout stage. I wonder if this would change a lot further down the bracket.

    • @juandanielcastanierrivas9545
      @juandanielcastanierrivas9545 1 year ago +3

      It's because it's not a prediction of matches, it's a prediction of the likelihood of reaching a certain stage of the tournament. His model gave 3 countries from one group better chances of reaching the round of 16 than the second-best team of the other group. He definitely should add that restriction. That result only tells us that group G is more competitive than group C.

    • @OxfordMathematics
      @OxfordMathematics 1 year ago +1

      Josh does acknowledge this. A more precise prediction is here: twitter.com/OxUniMaths/status/1593933134256553989
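
The distinction drawn in this thread can be shown in a few lines: ranking every team by its probability of reaching the round of 16 is not the same as taking the two most likely teams from each group. The group labels and probabilities below are invented for illustration; each group's probabilities sum to 2 because exactly two teams advance.

```python
# Hypothetical probabilities of reaching the round of 16, keyed by group
advance_prob = {
    "C": {"C1": 0.85, "C2": 0.45, "C3": 0.40, "C4": 0.30},
    "G": {"G1": 0.90, "G2": 0.55, "G3": 0.50, "G4": 0.05},
}


def top_overall(probs, n):
    """The n most likely teams overall, ignoring group structure."""
    flat = [(p, team) for group in probs.values() for team, p in group.items()]
    return [team for p, team in sorted(flat, reverse=True)[:n]]


def top_two_per_group(probs):
    """The two most likely teams from each group, as the format requires."""
    picks = []
    for group in probs.values():
        ranked = sorted(group, key=group.get, reverse=True)
        picks.extend(ranked[:2])
    return picks


print(top_overall(advance_prob, 4))     # three teams from G, one from C
print(top_two_per_group(advance_prob))  # two from each group
```

Both lists are legitimate summaries of the same model; only the second is a bracket that could actually happen.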

  • @trapdooroodpart
    @trapdooroodpart 1 year ago

    loved this! thank you

  • @Abelard0
    @Abelard0 1 year ago +2

    I am guessing there is a method similar to the one the Oakland Athletics used in the movie Moneyball, which is supposedly based on real life. Anyhow, I think football results (or those of any other sport) can be predicted in the short term with a fairly high success rate, but for a long tournament like the World Cup it is almost impossible to predict a winner, especially with knockout games, where many different factors influence the score, e.g. luck, a bad refereeing decision, a red card, fatigue, injury, players' emotions, etc. I don't think any of these can be predicted. Anyway, thank you for this, I learned something new and I appreciate it.

    • @salimalkharsa6627
      @salimalkharsa6627 1 year ago

      Yeah, what they did was basically lean heavily on the equivalent of the xG stat to build a team. Their idea was "Why spend $20M on 1 guy who has 1.5 xG when you can get 3 players who have 1.5 xG combined, cost $15M total and cover more positions?" In baseball that stat is slugging percentage for batters and ERA for pitchers.

  • @EB-zn4hs
    @EB-zn4hs 1 year ago +20

    (20:13) You have three teams coming out of Group G (Brazil, Switzerland, and Serbia) and only one coming out of Group C (Argentina).

    • @PoeticPoker262
      @PoeticPoker262 1 year ago +2

      I'm 3 minutes into the video and can tell already how irrelevant and embarrassing the results were going to be. Thank you for showing me there's no point watching more.

    • @OxfordMathematics
      @OxfordMathematics 1 year ago +7

      Maybe stick with it and let it unfold. It is about how models need changing, and by how much is the challenge. Take care.

    • @PoeticPoker262
      @PoeticPoker262 1 year ago +2

      I don't think FIFA are going to have 3 teams qualify from one group and only 1 from another, though... unless England finish 3rd 😂

    • @EB-zn4hs
      @EB-zn4hs 1 year ago +1

      @@OxfordMathematics I get what you mean. Everything else in the knockout stage looks similar to what I predicted in my office pool.
      I was just pointing out that something went wrong in the group stage.

    • @OxfordMathematics
      @OxfordMathematics 1 year ago +3

      @@EB-zn4hs Yes, thanks, understood. We just don't want people reading the comments to think Josh has made a mistake. He explicitly says later in the video that it is a general prediction. Enjoy the games.

  • @hbobenicio
    @hbobenicio 1 year ago

    I'm brazilian and I liked your simulation very much :)
    I'm cheering for your simulation!

  • @thomascox6524
    @thomascox6524 1 year ago +1

    Hello, from a footballing point of view it is interesting to see England so low down, despite their not-terrible results in the 2018-2022 period. I feel this may be because they are a more defensive team, so they won't generate a high xG, and "xG against" should be considered in the model, perhaps by plotting "xG for" of team A against "xG against" of team B over the dataset and finding a correction factor, similar to 14:13.
    Fun video and good insight into how to develop models from the ground up!

    • @JohnnyMaverik
      @JohnnyMaverik 1 year ago

      They have recently had some pretty terrible results, notably in the Nations League, where they got relegated.

  • @TheKazzerscout
    @TheKazzerscout 1 year ago +1

    How does xG realistically take into account goalkeeper positioning and quality for each shot? Does it consider that? Or how other players block the goalkeeper's vision, distract him, etc.?

  • @mrcaljoe1
    @mrcaljoe1 1 year ago

    How did you find data on international mean xG over a period of time? I can only find tournament-based xG.

  • @dylan522p
    @dylan522p 1 year ago

    Please update this throughout the tournament, maybe once or twice, for example after the group stage.

  • @jamaldini3
    @jamaldini3 1 year ago +1

    I can now say that your major is not math, it's art.

  • @siaahmadi413
    @siaahmadi413 1 year ago

    This was such an excellent presentation. Finally, I understand what xG really means and how betting websites calculate their winning odds.

  • @takis4897
    @takis4897 1 year ago

    Great analysis, I loved it. I wanted to suggest a multiplier that determines how much a match result should count. For example, for a World Cup game we take the actual xG from the match, but for a Nations League game it would be xG times 0.8, and for a friendly maybe xG times 0.4, because teams tend not to play at maximum effort when the game is less important. That way, if England beat Portugal in the Nations League, the final model would treat it as more important than if they beat Portugal in a friendly, for example.

    • @OxfordMathematics
      @OxfordMathematics 1 year ago

      Yes, these are important issues, and we hope it makes people aware of how models, which want to home in on the important things, have a lot to choose from. Watch our social media for updates and analysis after the round of 16.
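
One way to implement the match-importance idea is as a weighted average rather than by multiplying the xG itself: shrinking a friendly's xG to 40% would make teams that play many friendlies look weaker than they are, whereas a weighted average keeps xG in its own units and only reduces how much a friendly counts. The weights follow the suggestion above and are otherwise hypothetical, as are the match figures.

```python
# Importance weights as suggested in the comment (hypothetical values)
WEIGHTS = {"world_cup": 1.0, "nations_league": 0.8, "friendly": 0.4}


def weighted_mean_xg(matches):
    """Importance-weighted average xG.

    matches: list of (xg, competition) pairs.
    Each match's xG is kept as-is; only its influence on the average changes.
    """
    total_weight = sum(WEIGHTS[comp] for _, comp in matches)
    return sum(xg * WEIGHTS[comp] for xg, comp in matches) / total_weight


# Hypothetical record for one team
games = [(2.0, "world_cup"), (1.2, "nations_league"), (0.5, "friendly")]
print(round(weighted_mean_xg(games), 3))
```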

  • @stevencooke6451
    @stevencooke6451 1 year ago

    Another thing missing is accounting for the fact that some teams have players who exceed their xG. A Kylian Mbappe will score more often from a given location on the field than, say, Jonathan David (of my country's team, Canada). Teams with better defenders and keepers will lower the success rate of shots that on average have a high xG.
    I'm assuming stats exist that compare a player's shot success with the average from a given location relative to the goal.

  • @dougiesavage4483
    @dougiesavage4483 1 year ago +3

    I think adding expected goals conceded as well as xG would be better, since some teams focus on not conceding rather than on scoring. Very cool video though, dude.

  • @Loesoeman
    @Loesoeman 1 year ago +2

    'I'll remain unbiased here' while wearing an Ipswich shirt 😂😂

  • @MarcoPastorello
    @MarcoPastorello 1 year ago +1

    Hi, I hope this is a problem with the flags in the picture and not with the model... In the graphic at 19'38'' we have just one team from group C (Argentina) advancing from the first phase and playing against Switzerland, and three teams from group G (Brazil, Serbia and Switzerland) advancing...

    • @OxfordMathematics
      @OxfordMathematics 1 year ago +1

      Josh acknowledges this later in the video. This is not a specific game-by-game prediction. That is here for the last 16: twitter.com/OxUniMaths/status/1593933134256553989

  • @ninja4x2mcoc22
    @ninja4x2mcoc22 1 year ago

    this was so entertaining, making me love maths

  • @clivedoyisi1898
    @clivedoyisi1898 1 year ago

    Which statistical package are you using?

  • @physicspete6264
    @physicspete6264 1 year ago

    Holding up pretty well so far

  • @l.lawliet164
    @l.lawliet164 1 year ago

    Great idea to make more people interested in maths: we love the World Cup, so we love anyone who can predict the results.

  • @ckq
    @ckq 1 year ago

    Nice, but if you follow 538, you'll know they found that draws are ~10% more likely than a Poisson model predicts.
    Also, I'd consider a blend of xG and actual goals, and do some testing on whether the model is well calibrated.
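The draw adjustment mentioned here can be illustrated as follows: compute win/draw/loss probabilities from two independent Poisson scorelines, then scale the draw probability up and the other two down so they still sum to 1. This is a sketch; the 10% factor is the figure quoted in the comment, and the proportional rescaling of wins and losses is an assumption, not FiveThirtyEight's published method.

```python
import math


def poisson_pmf(k, lam):
    """P(exactly k goals) for a Poisson rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probs(lam_home, lam_away, max_goals=15):
    """Win/draw/loss probabilities for two independent Poisson scores."""
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, lam_home) * poisson_pmf(j, lam_away)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    total = win + draw + loss   # just under 1 because scores are truncated
    return win / total, draw / total, loss / total


def inflate_draws(win, draw, loss, factor=1.10):
    """Raise P(draw) by `factor`; shrink win and loss proportionally to keep the sum at 1."""
    new_draw = min(draw * factor, 1.0)
    scale = (1.0 - new_draw) / (win + loss)
    return win * scale, new_draw, loss * scale
```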

  • @rajathurairatheesan9906
    @rajathurairatheesan9906 1 year ago

    Can we download the model document?

  • @benstallone6784
    @benstallone6784 1 year ago

    After all that brilliant analysis I mostly enjoyed the fudge FCH factor

  • @MrGA555
    @MrGA555 1 year ago

    Question: is predicting 9 of the 16 round-of-16 teams correctly a good result? Also, one of your finalists has already been knocked out. That said, you predicted two quarter-final games correctly.
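Whether 9 of 16 is a good hit rate depends on the baseline. A naive "no skill" baseline picks 16 of the 32 teams at random, so the number of correct picks follows a hypergeometric distribution. The sketch below computes it; it ignores team strength entirely, so it is only a floor, not a serious benchmark.

```python
from math import comb


def overlap_pmf(k, n_teams=32, n_qualify=16, n_picked=16):
    """P(exactly k of the picked teams actually qualify) under random picking."""
    return (comb(n_qualify, k) * comb(n_teams - n_qualify, n_picked - k)
            / comb(n_teams, n_picked))


# Expected number of correct picks: 16 * 16 / 32 = 8.
expected = sum(k * overlap_pmf(k) for k in range(17))
# Probability that random picking does at least as well as 9 of 16.
p_at_least_9 = sum(overlap_pmf(k) for k in range(9, 17))
```

`p_at_least_9` comes out at about 0.36, so a 9-of-16 hit rate is only modestly above this naive baseline. Picking two teams at random from each group gives the same expectation of 8 (one correct pick per group on average).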

  • @jackwakefield05
    @jackwakefield05 1 year ago

    Including the conversion rates relative to xG for different countries would also improve the prediction. For example, Harry Kane scores about 90% of his penalties, which are worth 0.75 xG, but another player may score below 75% of the time.

  • @szydlinho
    @szydlinho 1 year ago +1

    Great job explaining that in a simple way. I found one inaccuracy in the model: you predicted three teams from Group G advancing to the next round (Brazil, Switzerland and Serbia). Just one too many. Apart from that it seems very likely :)

    • @OxfordMathematics
      @OxfordMathematics 1 year ago +1

      Josh acknowledges this. Fuller prediction: twitter.com/OxUniMaths/status/1593933134256553989

  • @moutazelias7246
    @moutazelias7246 1 year ago +3

    Your model has an obvious error. You have three teams qualifying from Brazil's group (Serbia, Brazil and Switzerland) but only Argentina from Argentina's group. This cannot happen under the rules of the tournament.

    • @OxfordMathematics
      @OxfordMathematics 1 year ago

      Yes, Josh says later in the video that he is not being match specific in this video. Go to our social media for the precise prediction.

  • @trini8042
    @trini8042 1 year ago +1

    this video aged really well

  • @billa38000
    @billa38000 1 year ago +3

    Actually, this was a brilliant presentation! I don't see how the results could be improved much further, apart from very complex per-player calculations.
    Any chance you could share some of your data and code on GitHub?

  • @yihanwang2233
    @yihanwang2233 1 year ago

    Is there a detailed write-up for this model?

  • @sirprintalot
    @sirprintalot 1 year ago +2

    20:04 England are playing Belgium in the Last 16 in your example, but Group A and Group B teams are set to play each other in the Last 16 (so England will play Netherlands, Senegal, Ecuador or Qatar). You might need to reconsider the structure of the knockout bracket as part of your prediction.

    • @IsidroChannel
      @IsidroChannel 1 year ago

      He says that these don't represent match results but rather which placings the teams reach

    • @OxfordMathematics
      @OxfordMathematics 1 year ago

      Yes, Josh says later in the video that he is not being match specific in this video. Go to our social media for the precise prediction.
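For reference, the 2022 bracket pairs group winners with runners-up from a neighbouring group, and places the two qualifiers from any one group in opposite halves of the draw so they can only meet in the final. The sketch below encodes that structure; `R16_SLOTS` and `round_of_16` are hypothetical names, and the input format (group letter mapped to teams in finishing order) is an assumption.

```python
# Round-of-16 slots for the 2022 World Cup as (group of the winner, group of
# the runner-up), in bracket order: the first four matches make up one half
# of the draw and the last four the other half. Consecutive pairs of matches
# within a half feed the same quarter-final.
R16_SLOTS = [
    ("A", "B"), ("C", "D"), ("E", "F"), ("G", "H"),   # first half
    ("D", "C"), ("B", "A"), ("F", "E"), ("H", "G"),   # second half
]


def round_of_16(standings):
    """Return the eight round-of-16 fixtures as (group winner, runner-up) pairs.

    `standings` maps each group letter to its teams in finishing order;
    only the top two of each group are used.
    """
    return [
        (standings[winner_group][0], standings[runner_up_group][1])
        for winner_group, runner_up_group in R16_SLOTS
    ]
```

With the actual final group tables this reproduces the real 2022 round of 16, for example Netherlands (1A) v USA (2B) and England (1B) v Senegal (2A).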

  • @aleksandrvashchuk1045
    @aleksandrvashchuk1045 1 year ago

    Very interesting! I bet bookmakers use something like this.
    But a few remarks:
    1. Group “Argentina - Mexico”: only 1 team qualified for the round of 16 in your model
    2. Group “Brazil - Cameroon”: 3 teams qualified
    3. The round-of-16 match-ups do not match (e.g. Brazil should have played Uruguay in the round of 16 according to your model)

    • @OxfordMathematics
      @OxfordMathematics 1 year ago +1

      Josh says this in the video - this is the chance of each team going to each stage, not the specific group results. Take care

  • @seanmccloskey3816
    @seanmccloskey3816 1 year ago

    Questions: Why use Elo to adjust xG instead of xG to adjust Elo, and then use Elo as your primary predictive variable without Poisson sampling? Or use both in a multivariate calculation? If you're using xG to predict scores and outcomes, and weighting more recent results, why not update xG during each round of the simulation? I would put a heavy weight on those results, since they not only take tournament momentum into account but are also based on current rosters. Also, I think shots on goal is a better stat than xG. Lots of shots on goal lead to rebounds or continuations that result in goals.

  • @blitzer2062
    @blitzer2062 1 year ago +1

    If I understand correctly, for each game you model a stochastic xG (based on Elo ratings) and then stochastic goals scored, based on the modelled xG. So you have two levels of stochasticity. Is there any analysis showing that this gives better predictions than simply estimating stochastic goals directly from Elo?

    • @seanmccloskey3816
      @seanmccloskey3816 1 year ago

      I had a similar question. Why not use xG to adjust Elo and then use Elo as the primary independent variable? Or better yet, a multivariate regression using both? xG seems quite flawed for many reasons, but one not mentioned is that many goals in soccer are scored from rebounds. I don't see how xG takes that into account. There are lots of other stats that could be useful: number of shots on goal, possession time, number of corner kicks. Also, why not have your model update xG during the tournament? This would not only account for momentum but would also help mitigate the problem of your dataset being largely built on outdated team rosters.
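On the "two levels of stochasticity" question: if a team's goal rate is itself random (for example gamma-distributed) and goals are Poisson given that rate, the resulting goal count is negative binomial, which has a larger variance than a Poisson with the same mean. The simulation below illustrates this. It is a sketch under the assumption that the random xG layer behaves like a gamma distribution; the video does not say which distribution is used, and whether this predicts better than sampling goals directly from Elo is an empirical question.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Knuth's multiplication method; fine for the small rates seen in football."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(mean_xg, shape, n, seed=0):
    """Draw goals with one level and with two levels of randomness.

    one level:  goals ~ Poisson(mean_xg)
    two levels: rate ~ Gamma(shape, scale=mean_xg / shape), goals ~ Poisson(rate)
    Both have the same mean; the gamma layer is an assumed stand-in for 'random xG'.
    """
    rng = random.Random(seed)
    one = [sample_poisson(mean_xg, rng) for _ in range(n)]
    two = [sample_poisson(rng.gammavariate(shape, mean_xg / shape), rng)
           for _ in range(n)]
    return one, two


one, two = simulate(1.5, 2.0, 20000)
print(statistics.mean(one), statistics.variance(one))
print(statistics.mean(two), statistics.variance(two))
```

For mean 1.5 and shape 2, theory gives variance 1.5 for the one-level sample and 1.5 + 1.5²/2 = 2.625 for the two-level sample, so the extra layer mainly widens the spread of scorelines.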

  • @joemoyes06
    @joemoyes06 1 year ago

    Hi Josh, thanks so much for this video! I really enjoyed it. I would be very grateful if you could shed some light on how the xG values were adjusted given the rating difference? I understand conceptually what is being achieved by doing this, but I'm struggling to visualise the computational steps taken from xG to adjusted xG.

    • @napier995
      @napier995 1 year ago

      He won't reply to you if you support Norwich.

    • @OxfordMathematics
      @OxfordMathematics 1 year ago +1

      Josh may put out his full model if he has time. Keep an eye on his social media: @JoshuaABull on Twitter
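The video does not spell out how xG is adjusted for rating difference, so the following is only one plausible form, and it is hypothetical: rescale each past match's xG according to how strong the opponent was relative to a reference rating, using the same 400-point scale as the standard Elo expected-score formula. `opponent_adjusted_xg` and its `strength` parameter are illustrative assumptions.

```python
def elo_expected_score(rating_a, rating_b):
    """Standard Elo expectation for A (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def opponent_adjusted_xg(raw_xg, opp_rating, reference_rating, strength=0.25):
    """Hypothetical: rescale a past match's xG to an opponent of reference strength.

    xG created against a stronger-than-reference opponent is scaled up, and
    against a weaker one scaled down. The exponential form on Elo's 400-point
    scale and the `strength` value are illustrative assumptions only.
    """
    gap = (opp_rating - reference_rating) / 400.0
    return raw_xg * 10 ** (strength * gap)
```

Under these assumptions, 1.2 xG created against an opponent rated 200 points above the reference would count as about 1.2 × 10^0.125 ≈ 1.6 xG.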

  • @djdjdnje4014
    @djdjdnje4014 1 year ago

    Could you try to make a model with xG + xGA, please? It could be interesting to watch if something will change.

  • @tonybryan5068
    @tonybryan5068 1 year ago

    Is there somewhere I can see the results and number of goals for all matches?

  • @sayednab
    @sayednab 1 year ago

    Which program is it based on? Can you share the code?

  • @eliasmantilla523
    @eliasmantilla523 1 year ago +1

    Hey! Where can I find the code?!

  • @AbuAmaanKhan
    @AbuAmaanKhan 6 months ago

    Can I have the slides of this presentation? I have a presentation coming up this week and my topic is the same. Please send me the ppt.

  • @Cleisthenes2
    @Cleisthenes2 1 year ago +2

    Great stuff. Can you say why you assume a team's goals per game follow something like a Poisson distribution rather than a normal distribution? Or is the Fish distribution just a version of the normal distribution that allows for discrete variables? (As you can tell, I know very little about statistics.)

    • @CoombesJD
      @CoombesJD 1 year ago +3

      Source: Physics degree.
      "All models are wrong, but some are useful" -George Box.
      Theoretical answer:
      Poisson distributions are good for measuring "counting statistics" - it counts the number of events that happen in a time frame - Physics uses it to count photons entering detectors.
      I tried to write a better explanation from memory about combining Bernoulli trials. Roughly, the model pretends "either a goal happens in this minute, or it doesn't" again and again, combining the results after 90 minutes.
      The Poisson distribution is the limit of the Binomial distribution when you make sensible assumptions for this football problem.
      Poisson is the screwdriver, this problem looks like a screw. Watching the goal line and counting how many goals go in, this is the right tool for the job.
      'Mutual Information' on yt has a bunch of really good videos about how different distributions relate to help choose the right tool for the job.
      Experimental answer:
      Go to your Google account and open a Colab notebook.
      Go to Google dataset search
      datasetsearch.research.google.com/
      and grab international goals scored per game.
      Take the international goals scored per game data and split it 80%/20%.
      For the 80% training set, plot the graph of goals scored per game in Python & matplotlib, fit the Poisson parameter to the data, plot the fitted distribution using Python's scipy.stats library, and run a Kolmogorov-Smirnov test to measure 'goodness of fit'.
      We have the goodness of fit for the poisson distribution to 80% of our data.
      Now put this block of code inside a big for loop that measures goodness of fit for the other 50 distributions in the scipy.stats library.
      Choose the distribution from the 50 with the highest goodness of fit, and then test this holds up on the remaining 20% data held back from earlier.
      I reckon if you actually did this, there would probably be models with better goodness of fit, but are less parsimonious for a 20 minute maths communication lecture on youtube :)
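The "experimental answer" above can be sketched with the standard library alone. Two deliberate substitutions: the Poisson rate is fitted by maximum likelihood (which for a Poisson is just the sample mean), and goodness of fit is measured with a Pearson chi-square statistic rather than the Kolmogorov-Smirnov test, because KS assumes a continuous distribution while goal counts are discrete. The helper names and the pooling of 5+ goals into one bin are assumptions.

```python
import math
import random
from collections import Counter


def fit_poisson(goals):
    """Maximum-likelihood Poisson rate, which is simply the sample mean."""
    return sum(goals) / len(goals)


def chi_square_stat(goals, lam, max_bin=5):
    """Pearson chi-square statistic comparing observed scores with Poisson(lam).

    Scores of `max_bin` goals or more are pooled into one tail bin so that
    expected counts do not get too small.
    """
    n = len(goals)
    observed = Counter(min(g, max_bin) for g in goals)
    stat = 0.0
    prob_below = 0.0
    for k in range(max_bin):
        p = math.exp(-lam) * lam ** k / math.factorial(k)
        prob_below += p
        expected = n * p
        stat += (observed.get(k, 0) - expected) ** 2 / expected
    expected_tail = n * (1.0 - prob_below)
    stat += (observed.get(max_bin, 0) - expected_tail) ** 2 / expected_tail
    return stat


def train_test_check(goals, train_frac=0.8, seed=0):
    """Fit lambda on a random 80% of the data, then score the fit on the other 20%."""
    shuffled = list(goals)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]
    lam = fit_poisson(train)
    return lam, chi_square_stat(test, lam)
```

Smaller statistics mean a better fit; to turn the statistic into a p-value, compare it with a chi-square distribution (for example via `scipy.stats.chi2`).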

    • @Cleisthenes2
      @Cleisthenes2 1 year ago +1

      @@CoombesJD Thanks! Very helpful. I also just realized that what I said above was nonsense because obviously lots of discrete variables are normally distributed.
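The Binomial-to-Poisson limit described in the reply above can be checked numerically: split a match into `n_slots` independent "goal or no goal" trials, each with probability `lam / n_slots`, and compare the resulting Binomial probabilities with Poisson(`lam`). This is an idealised sketch (real goals are not equally likely in every minute), and `max_gap` is a hypothetical helper name.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(k events) for a Poisson rate lam."""
    return exp(-lam) * lam ** k / factorial(k)


def max_gap(lam, n_slots, k_max=10):
    """Largest pmf difference between Binomial(n_slots, lam / n_slots) and Poisson(lam)."""
    p = lam / n_slots
    return max(abs(binomial_pmf(k, n_slots, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))
```

With one trial per minute (`n_slots=90`) and `lam=1.5`, the two distributions already agree to within a few thousandths, and the gap shrinks further as the trials get finer.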

  • @mintberrycrunch6657
    @mintberrycrunch6657 1 year ago

    what a great mathematical tool the FCH factor is, incoming Fields medal predicted

  • @Gerald-iz7mv
    @Gerald-iz7mv 1 year ago

    Do you need to retrain the model after Argentina's first game against Saudi Arabia? How does the prediction change?

  • @eduardoribeiro383
    @eduardoribeiro383 1 year ago +7

    Brazilian banker here. I just LOVED the concept of the FCH constant. But you have to adjust it to the FPAEOTT (Football prefers anything else other than tea), add the CITB (Cachaca is the BEST) and NWR (Neymar will Rock). I measured it. Funny is that my numbers ended in a 123,667% chance of yellow jersey winning.

    • @00vulture
      @00vulture 1 year ago

      KKKKKKKKKK

    • @eduardoribeiro383
      @eduardoribeiro383 1 year ago +1

      @@00vulture Oops, the cachaça just took a Kale on the head. Congrats Croatia. This WC is killing any math, just like football should. Now I am for the Gin (the Dutch are the real inventors of gin, for those who do not know)

  • @pollystyrene99
    @pollystyrene99 1 year ago

    Joshua, do you update this model with each match played?

  • @graphitegalore
    @graphitegalore 1 year ago

    Thanks for the vid. I feel like this model could be more robust if it used more parameters instead of focusing only on xG.

  • @ayushmishra7558
    @ayushmishra7558 1 year ago

    One more factor that should come into play is that matches in the Euros and the World Cup should be weighted more than friendlies because certain teams do better in those tournaments than in friendlies

  • @jarurotetippayachai8220
    @jarurotetippayachai8220 1 year ago +4

    This prediction cannot resist the curse of cats. 😂 Cats win.

  • @shahzad2
    @shahzad2 1 year ago

    Do betting companies use the same kind of models?

  • @bic7boi
    @bic7boi 1 year ago

    Would it not be useful to also factor in a side's probability of conceding alongside their probability of scoring?

  • @ehriol1405
    @ehriol1405 1 year ago

    Quarter-finals: Netherlands vs Argentina, and on the other side Brazil vs Spain. So according to your calculations, the semi-final is Argentina vs Brazil. This means the other finalist must come from France and Belgium, but there the margins are closer.

  • @rossmorebaz
    @rossmorebaz 1 year ago +1

    Very interesting... and huge respect to Josh for winning fantasy football, no mean feat from 8 million players. I think the model is largely accurate, but we must also take some very important things into account. It doesn't really matter what's happened in the last 4 years; all that matters is the results in the next 4 weeks. In a World Cup momentum is key, and a good start is essential. Anything can happen on the day, any of the top teams could win or lose on penalties, and what about injuries to key players, red cards, VAR-disallowed goals, offsides etc.? For example, if Kevin De Bruyne gets injured then Belgium aren't getting to the final. But absolutely fascinating nonetheless.

    • @callumyoung7785
      @callumyoung7785 1 year ago +2

      But every team is just as likely to get a key injury/red card. Doesn't affect the odds too much.

    • @Grimeyhoob
      @Grimeyhoob 1 year ago +2

      That’s a solid point: this comes back to how sensitive the model is to the most recent form.
      So the model can give very different outputs depending on whether it ingests results from two matches later or from two matches earlier.

    • @smenci
      @smenci 1 year ago

      It depends on what you mean by momentum. Argentina were stronger with Maradona, and weaker before and after. That is a real effect, and it is covered by the four-year window and the time-based weighting he talks about.
      The other kind of momentum some people talk about, like winning streaks or NBA players with hot hands, does not exist. There are studies showing that these streaks just happen; plain statistics explains them.
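The "plain statistics" point can be illustrated by simulating results with no momentum at all: every game is an independent coin flip, yet long streaks still appear. This is a toy sketch (fair 50/50 results, hypothetical helper names), not a test of the hot-hand studies the comment refers to.

```python
import random


def longest_run(seq):
    """Length of the longest run of identical consecutive outcomes (seq must be non-empty)."""
    best = current = 1
    for prev, cur in zip(seq, seq[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best


def average_longest_run(n_games=100, p_win=0.5, trials=2000, seed=0):
    """Average longest streak over many sequences of independent results."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        results = [rng.random() < p_win for _ in range(n_games)]
        total += longest_run(results)
    return total / trials
```

Over 100 independent 50/50 games the longest run of identical results averages roughly 7, so a six- or seven-game streak on its own is weak evidence of momentum.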

  • @tol-mol-ke-bol
    @tol-mol-ke-bol 1 year ago +1

    @19:40 MINOR ERROR. Pause the frame at 19:40. The winners and runners-up of Groups A and B play each other in the 'Round of 16', but this simulation does not follow that rule. For example, the simulation shows Ecuador playing Brazil in the 'R of 16' (but Ecuador is in Group A and Brazil is in Group G, so these two should not meet in the 'R of 16'). The simulation takes Iran & England from Group B and has them play Denmark & Belgium (Group D) respectively in the 'R of 16'. So essentially the program has a minor error in how the group stage maps to the knockout stage, but it matters, since the computer calculations are based on the wrong matches. A minor fix, and we might see a different result. Who knows!! Cheers

    • @OxfordMathematics
      @OxfordMathematics 1 year ago +1

      Josh acknowledges this later in the video. This is not a specific game-by-game prediction. The last-16 prediction is here: twitter.com/OxUniMaths/status/1593933134256553989

    • @tol-mol-ke-bol
      @tol-mol-ke-bol 1 year ago

      @@OxfordMathematics got it. all good then. cheers.

  • @luanbatista5128
    @luanbatista5128 1 year ago +2

    Wasn't expecting Paul Dano explaining to me how Brazil is mathematically going to win the World Cup

  • @jemhor
    @jemhor 1 year ago

    Hi Joshua, thanks for the knowledge shared. It was a real eye-opener.
    I don't know if there's an error in the knockout stage you modelled. Teams from the same group are not supposed to meet each other until the final, but in the image you have some teams from the same group meeting each other in the quarter-finals. Some groups have up to 3 teams in the knockout stages.
    Other than that, it was a fun video that shows the power of mathematics in representing the real world.

    • @OxfordMathematics
      @OxfordMathematics 1 year ago

      Yes. Josh acknowledges this in the video. Full last 16 here: twitter.com/OxUniMaths/status/1593933134256553989

  • @comradeozzbug
    @comradeozzbug 1 year ago

    I wonder if there’s been a last minute recalculation after the Argentina Saudi Arabia result

  • @jairomejia616
    @jairomejia616 1 year ago +2

    Just seeing an expert explain his decision-making in creating an AI model was worth every single minute. Thanks for the fun explanation. I want to know why the xG gets a Poisson distribution; I did not quite understand why it was not a normal distribution. Thanks!

    • @mullachv
      @mullachv 1 year ago

      The number of goals scored is a non-negative integer, and so Poisson (rather than normal)

    • @fireworker8205
      @fireworker8205 1 year ago

      The normal distribution is a continuous probability distribution. It does not make much sense to predict that a team will score 3.17 goals, I guess xD. Draws would also be impossible that way. But no idea why the Poisson distribution would be the most obvious discrete distribution here.