Three Statistical Tests Every Game Developer Should Know

  • Published 20 Aug 2024
  • In this 2016 GDC session, Insomniac Games' Elan Ruskin gives a how-to on statistics for answering questions like "does this new camera control scheme make players happier?", "how many players do I need to test this design change on to prove whether it works better?" and "does the framerate really get faster when I do this thing or is it just a fluke of measurement?"

COMMENTS • 211

  • @iestynne
    @iestynne 3 years ago +335

    I'm proud to have worked with Elan for several years. As you can tell, he always puts a great deal of effort into preparing for his presentations. Amazingly though, this is actually his normal level of conversational speed, clarity and humor :)

    • @snarfymcsnarfface2323
      @snarfymcsnarfface2323 3 years ago +3

      I thought he was just nervous or trying to fit it all into a short time slot lol

    • @0netwoguy54
      @0netwoguy54 3 years ago +18

      Wait what do you mean "normal"? Does he have a turbo mode???

    • @Dekharen
      @Dekharen 3 years ago +12

      @@0netwoguy54 GAS GAS GAS

  • @_lime.
    @_lime. 3 years ago +25

    13:00, this is a really good one. With Minecraft, Mojang came to a realization that very few players had ever been to the Nether (based on the percent of the population that had the achievement "We need to go deeper!" which is received upon entering the Nether). They ended up realizing that very few non-hardcore players (players that didn't consume game-related content outside of the game, like videos, guides, articles, etc...) knew that the Nether existed.
    This is why they added obsidian monoliths and broken portals around the Overworld to give you hints.

    • @Sarmachus
      @Sarmachus 2 years ago

      Where did they say this? I’m having a hard time finding it.

    • @_lime.
      @_lime. 2 years ago +6

      @@Sarmachus Sorry I can't remember exactly. I saw it in a game development video a year or so ago, and I believe he flashed a tweet from one of the MC devs on the screen.
      Regardless of its authenticity it still serves as a good example and valuable lesson.

    • @Sarmachus
      @Sarmachus 2 years ago +2

      @@_lime. Thanks for clarifying

  • @ToriTheChicken
    @ToriTheChicken 6 years ago +324

    Some of the GDC talks are very badly presented for YouTube videos. Not this one. This was great, in just about every way.

  • @ReeseEifler
    @ReeseEifler 6 years ago +506

    Not only is this an amazingly useful talk, it's essentially a perfect presentation. Dope shit.

    • @smort123
      @smort123 3 years ago +4

      @CruzZ fake news

    • @dontfk
      @dontfk 3 years ago +3

      @CruzZ what are you talking about. This guy provided a ton of real world examples where statistics could help solve a problem. That doesn’t mean people will always use statistics for good though, he even mentions that in the presentation with an example. Just because big gaming companies suck at stats doesn’t mean his presentation wasn’t phenomenal!

    • @ailurusfulgens1849
      @ailurusfulgens1849 3 years ago +4

      @@dontfk Big gaming companies most definitely do not suck at stats, if anything, that's the one thing they master above all else. It's just that most statistics are not relevant to the players' enjoyment. They are very relevant to shareholders tho.

    • @dontfk
      @dontfk 3 years ago +2

      @@ailurusfulgens1849 You're right, I used poor word choice there. What I meant by that was that they don't always use their stats for good intentions

  • @infcaat
    @infcaat 5 years ago +217

    wow, he is a fantastic speaker. charismatic, to-the-point, funny and practical.

  • @PrimerBlobs
    @PrimerBlobs 2 years ago +70

    "Any actual statisticians are totally cringing." Yep. It's not just pedantry. People will literally not know what their test means, and then they will judge whatever change they make in hindsight anyway.

    • @aleksaa24
      @aleksaa24 9 months ago

      funny seeing you here, love your vids

    • @Attewir
      @Attewir 8 months ago +1

      Easier to digest and more accurate statistics content on PrimerBlobs's channel
      And the currently 1.7 million subscribers agree

  • @kittykittylicization
    @kittykittylicization 3 years ago +5

    As a Biologist (MS)... i was indeed shouting at my screen when you were talking about P values....and then you called it out so im happy now.

  • @colinstreck710
    @colinstreck710 3 years ago +12

    That was fantastic. Their presentation skills are off the charts.

  • @ZZaarraakkii
    @ZZaarraakkii 2 years ago +6

    @12:53 A thing to note is that in that example, people have been playing the "hard" puzzle before and the "easy" puzzle is a novelty, which may cause players to spend more time on it for the experiment, without it being the better solution long term.

  • @Vospi
    @Vospi 6 years ago +108

    As an educator and a grateful listener: that was bril-li-ant.

  • @NunSuperior
    @NunSuperior 6 years ago +22

    Thanks for the talk. I had to learn this stuff on the job at Big Software, Inc. when we started measuring PC boot time impact. There were large variances between each boot.

  • @ArsenicDrone
    @ArsenicDrone 3 years ago +6

    While I wouldn't do everything identically, I didn't have any large complaints, which is not generally what happens listening to quick statistics intros. A good talk.

  • @Discipol
    @Discipol 6 years ago +268

    Excellent presentation. THIS was simplified? I am afraid of the scenic route xD I wish to know more, and more practical applications on game dev.

    • @2Cerealbox
      @2Cerealbox 6 years ago +82

      Extremely simplified. Statistics is, like, a whole field of mathematics.

    • @stephenborntrager6542
      @stephenborntrager6542 6 years ago +9

      It matters a lot for procedural generation, as statistical distribution is a huge part of random number generation. It can also be used to approximate various things... replacing physics in some cases. Sometimes called an "analytical" solution, you can see this show up in some games' oceans, etc. The ocean is based on statistical analysis of real oceans, instead of trying to actually simulate fluid dynamics. I'm sure there are more uses than that, especially outside the game.

    • @LoudSodaCaleb
      @LoudSodaCaleb 5 years ago +15

      Yeah, that thing he did was called hypothesis testing. That took me a good 20 hours during a single week to figure out how to do it by hand at school. Finding out that it could be done in a minute in excel blew my mind.

    • @joshelguapo5563
      @joshelguapo5563 3 years ago +9

      As a data scientist... it's a LOT. But really, you don't need all the math to do it practically. You really just need to know the basic definitions, and what the test does. And there you go you got analysis. If you're a game dev, assuming you got some programming experience, you can already do a lot of these things in the language R, with very little effort, and even very easily build some machine learning models.

    • @jonaza2105
      @jonaza2105 2 years ago

      I essentially got most of this stuff during my semester of statistics class. As he said, he pretty much blazes through it, you mostly need time to understand when what is used, why to use it, what the downsides of using it are, etc and lastly of course, HOW to use it.

  • @JohnDoe-mx1sq
    @JohnDoe-mx1sq 3 years ago +3

    This video has existed for almost 4 years and it feels like not a single game dev has ever watched it.
    Their sales division has warehouses of supercomputers simulating human brain functions trying to figure out how crap a game can be before you will buy it, and just how much you will spend on DLC just to play the game at all.

  • @phillipA123
    @phillipA123 4 years ago +13

    a semester of stats in 30min. thanks guy.

  • @Brindlebrother
    @Brindlebrother 2 years ago +5

    People are awful at five-star ratings whether that be a game, book, movie, show, item, etc. Basically, people will give 4-5 if the product was at all fun or engaging, or a 1 if there was a problem/complaint/issue or any offense taken.
    Good video. Statistics are fun.

    • @SakuraWulf
      @SakuraWulf 2 years ago

      Chik-fil-a is not a five-star establishment, people >_>

  • @jonasnockert
    @jonasnockert 4 years ago +13

    Love this talk! I spent quite some time trying to derive the 8.14 confidence interval in the first example and finally had to install Excel to verify. I couldn't see it at first but the slides actually mix five and six observations. At ~7:38 there are five observations. At 8:19, the confidence interval is calculated using six observations, i.e. T_DIFFMEANS(A2:A7, ...) rather than the 2 x 5 observations shown on the left.

  • @maximeflageole770
    @maximeflageole770 5 years ago +15

    Most interesting and useful presentation about statistics I've ever watched.

  • @summonsays2610
    @summonsays2610 3 years ago +3

    God, statistics is why I can't ever tell anyone I am sure of something.
    "Hey does this code work like X?"
    "Well, I was there during requirements gathering, I wrote the code, deployed it, and no one has changed it since. So I think so!"
    "Yes or no?" .... uhhhhhh

  • @CineGoodog
    @CineGoodog 2 years ago +3

    I took an entire statistics course in college and I can remember almost everything he said

  • @hamsandwich780
    @hamsandwich780 4 years ago +3

    One of the best explanations of the T test I have ever seen, read, or perceived in any medium.

  • @KillerBearsaw
    @KillerBearsaw 3 years ago +4

    Absolutely fantastic presentation, would love to hear him speak more

  • @zikarisg9025
    @zikarisg9025 3 years ago +1

    Excellent, used this to explain the p-Value to some colleagues, since our data science team is not able to explain their models that well...

  • @perfectloveweddings
    @perfectloveweddings 5 years ago +13

    You talk exactly like Jesse Eisenberg from the Social Network when he's coding. It's fantastic.

  • @FreekHoekstra
    @FreekHoekstra 6 years ago +52

    at 19:00 mins, do you care about the median? I think that's a rather brazen assumption!
    sometimes it's better to have some people who are really invested and really care, and thus are willing to spend on your product, rather than a lot of people that will play for free but don't care enough to spend money, or come back repeatedly.
    Great talk overall though!!

    • @AngleSideSideThm
      @AngleSideSideThm 4 years ago +5

      This depends on assumptions; the assumption here probably is "I am optimizing my game for ability for at least most of the initial group to make it through".

    • @Aidiakapi
      @Aidiakapi 4 years ago +15

      That wasn't actually the point he was trying to make. Especially with a small sample size, outliers greatly skew the mean.
      As for the point of a few dedicated players willing to spend money, that only works if it's a game that does not depend on having an active online community.

    • @stuartconrod8364
      @stuartconrod8364 3 years ago +8

      I know I'm really late to finding your comment, but I thought the same thing! Also, Mark Rosewater (of Magic: The Gathering) has a presentation on YouTube about Game Design and in HIS opinion, that highly polarized distribution is better. It's better to make something that SOME people love even if some other people hate it, instead of something that everyone gives a 'meh' to.
      In game design, I think it's the difference between "cult classic that some people love and play forever" and "totally forgettable game that disappears in two weeks". If at least some group loves it, it can spread by word-of-mouth and certain reviews. Provided your budget was appropriate to build a niche game, you can have a success... while some game that everyone merely tolerates probably makes no impact and loses money.

    • @FreekHoekstra
      @FreekHoekstra 3 years ago +1

      @@stuartconrod8364 exactly :)
      Better to be hated by 90%, ignored by 5% and loved by 5%
      than hated by 20%, ignored by 80% and loved by none.
      Who is going to spend money on a product they don’t love when they have so many alternatives? Plus all those haters are free press too!
      I think we should lean into the fans more. Look at Dark Souls: it's brutal, unforgiving and very niche, but clearly doing fine.
      League of Legends: unforgiving, brutal player interactions, but doing fantastically well. Counter-Strike, same thing.
      Yes, I do think we should keep games accessible, but not at the cost of what the fans love.
      I think, for example, what Halo Infinite is doing is great: bringing back bots to practice offline before going into the fray.
      That allows the multiplayer to be as cutthroat and great as it always was, not with unlockable weapons that give you an edge at the start of the round.
      No, everyone starts with the same weapons, and you need to earn and fight over better ones, so it's a true skill matchup.
      That's why it's so unforgiving to new players, but also why it's so incredibly good.

    • @ferinzz
      @ferinzz 3 years ago +6

      @@FreekHoekstra it really REALLY depends on how you make money off your game.
      If it's some recurring revenue, then you need to retain a decent number of players.
      If it's a game which has interaction between users, then you need a decent player pool.
      If it's a one-time purchase, you can keep it mediocre across the board.
      If it's for e-sport publicity, you better make that as balanced as possible. Make the goals easy to understand and controls simple enough to get players pouring in.
      Overall, no matter the game, a larger pool of players will bring more potential spenders, and of those players only 20% of them will be providing your entire income.
      Money keeps a business going. So making a game for only 2 people is a ridiculous endeavor unless each piece of content is a guaranteed buy and they cannot continue into the new 'season' without making their purchases... Though if only two people are playing they'll need to be spending hundreds of thousands each time you release content.
      in a free to play game competition drives purchases. You need some fodder for the big spenders to show off their purchases/power to, or they have no reason to buy the newest released item/cosmetic the day it comes out.

  • @lan1ord
    @lan1ord 1 year ago

    The first talk where I needed to decrease the playback speed instead of increasing. Great material! =)

  • @ArneBab
    @ArneBab 2 years ago +1

    Actually your boss wants to know how large the probability is of being wrong: that you pay more than you save.
    So you want the t-test of the SSDs compared to (HDD minus the time difference needed to pay for the SSDs). You’re not below 0.05 for that with your 4 runs, so your boss cannot be sure enough that she’ll be right.
    But that’s nitpicking and I really like your video :-)

  • @LoudSodaCaleb
    @LoudSodaCaleb 5 years ago +16

    His style reminds me of the professor that made me fall in love with stats.

  • @gabrote42
    @gabrote42 3 years ago +2

    I always tell people that basic statistics and sourcing should be taught at age 11. Would reduce the number of no-argument-freds and would reduce the fake news plausibility rate

  • @jonwatte4293
    @jonwatte4293 5 years ago +80

    "p values" aren't just complicated; they're a root cause of reproduction problems in studies with small sample sizes, and a general frequentist foible. Bayesians of the world, unite!
    (Interestingly, the "pick sub-samples" illustrations could lead to an IMO much better solution!)

    • @hamm8934
      @hamm8934 3 years ago +5

      Bayesians can play around with their Bayes factors all they like, but at the base, they’re still operating under a frequentist model if they’re gunna do any form of null hypothesis testing.
      Without a criterion to reject the null (p val), you can’t falsify a hypothesis. So collect all the data you want and build up those Bayes factors, but you’re not escaping the problem of induction. :)
      Frequentists of the world, unite (and not be undermined by a single black swan)!

    • @jonwatte4293
      @jonwatte4293 3 years ago

      @@hamm8934 The belief that you can "reject" the null hypothesis based on a single yes/no measurement IS THE PROBLEM. (Sorry, got a little loud there.)
      Look at the PDF.
      Draw conclusions about underlying behaviors.
      Make better predictions and test again.
      Do not pretend that "there's a 96% probability in this case" and "there's a 94% probability in this case" are vastly different, binary outcomes.

    • @hamm8934
      @hamm8934 3 years ago +4

      @@jonwatte4293 what statistician or scientist worth their salt believes that a single positive or negative outcome is sufficient? That’s a bit of a straw man. Of course you either (1) directly replicate the result or (2) perform an extension with a different operationalization of the same hypothesis. If it isn’t replicating approximately 95% of the time, it’s quite safe to say the effect isn’t there (assuming adequate power). If it is replicating approximately 95% of the time, it’s quite safe to say the effect is there.
      The point I (and other frequentists) make is you have to have a criterion of falsification for null hypothesis testing. If you don’t, the very logic of hypothesis testing collapses as you are no longer able to discern a success from a failure. You have to make a judgement call for null hypothesis testing to exist. This whole notion that Bayesian stats somehow avoids or overcomes this judgement call is a complete failure to acknowledge that you are still making a judgement call, just with a different threshold. (See chp 1 and 2 of The Logic of Scientific Discovery).
      Get those Bayes factors as juicy as you want. It just takes 1 falsification for them to be undone. We’ll see which method is more fruitful :)

    • @neur0leptic782
      @neur0leptic782 2 years ago +4

      ​@@hamm8934 bruh I feel like you're still being incredibly disingenuous about this whole thing. The key issue with NHST is that a p-value *only* tells you p(Data | H0 = TRUE)-that's it, full stop. The far more interesting question is p(H | D), and that's entirely beyond the realm of classical frequentist methods. 'Rejecting the null' with p < .05 doesn't mean that there's a 95% chance the null is indeed false, or that the alternative is actually true. What we should be doing is systematically pitting models against each other, and this, I think, is something Bayesian methods are exquisitely well-suited for. And sure, there are some rules of thumb when you're doing Bayesian model comparison and trying to figure out how 'meaningful' the difference between models is, but it's a laughably false equivalence to say that the process of multi-model inference (literally comparing the evidence in favor of competing models) is anything close to a binary NHST decision based on differences in means or a correlation. Not to mention you can compare models based not only on the parameters you include, but on your priors, or the underlying likelihood function... Shit, you don't even need to use Bayes Factors-it's super trivial to compare models via their posterior predictive densities using Bayesian cross-validation with PSIS-LOO. All of this ranting is basically just to say that 'all models are wrong, but some are useful'-and I think if we really want to find the best models that explain (or even better, can *generate*) our data, you're gunna have a bad time with frequentist NHST.

    • @PrimerBlobs
      @PrimerBlobs 2 years ago

      @@neur0leptic782 Preach
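
The distinction drawn in this thread between p(D | H0) and p(H | D) can be written out with Bayes' theorem. This is the standard textbook identity rather than anything shown in the talk; the symbols H (hypothesis) and D (data) follow the comment above.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)},
\qquad
P(D) = P(D \mid H)\, P(H) + P(D \mid \neg H)\, P(\neg H)
```

A p-value is computed only from the sampling distribution under H0: it is the probability, assuming H0, of a result at least as extreme as the one observed. Turning that into P(H0 | D) additionally needs the prior P(H0) and the likelihood under the alternative, neither of which a significance test supplies.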

  • @xGriffy93
    @xGriffy93 3 years ago +16

    But Fred didn't hypothesise that SSDs don't make any difference to build times, he was questioning the return on investment the SSDs would bring. Or am I off the mark here?

    • @zacsnowbank7632
      @zacsnowbank7632 3 years ago +12

      He needed to prove SSDs had any improvement at all first. After that he had a good idea on how much it improved, and eventually he proved Fred right. It would take too many daily builds for SSDs to be worth it.
      But before that, he needed to know what the difference even was, and after that he used a simple formula to see how much money it saved. Poor Fred just had some words put in his mouth to make the presentation go a little smoother at the beginning.

    • @donanderson3653
      @donanderson3653 3 роки тому +3

      To be fair, that wasn't daily builds, it was total builds, since SSDs are a one-time investment. Getting even the lowball estimate of 210 builds out of the lifetime of the SSD is probably easily achievable, so SSDs would be a worthwhile investment.

    • @nlb137
      @nlb137 3 роки тому +1

      He covered that briefly with the discussion of dev time cost and how many builds you'd have to do for the SSD to pay for itself.
      You have to have a null hypothesis to test, and "X isn't worth it" isn't possible, IIRC. It's been a while, but I think your test *has* to basically 'touch zero'; either x=0, x>0, etc. An "even if it does save time, does it save *enough* time" hypothesis requires a test that is basically "is x >= y" (where y is the 'threshold' where SSDs pay for themselves). It's either easier to first prove that there *is* a time difference, then calculate the 'value' of the time difference, or it's not even possible to do it the other way (or at least not with 101 statistics).

    • @tomasxfranco
      @tomasxfranco 3 years ago

      @@donanderson3653 Also, SSDs can speed up OS and App boot times as well as many other tasks, so it's ignoring a lot of the other benefits they give.
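
The pay-for-itself reasoning in this thread comes down to a short calculation. The sketch below is an illustration under assumed inputs: the function name `break_even_builds` and every number in it (drive price, seconds saved, hourly rate) are hypothetical and are not taken from the talk.

```python
import math


def break_even_builds(ssd_cost, seconds_saved_per_build, hourly_rate):
    """Number of builds after which the saved developer time covers the SSD.

    ssd_cost and hourly_rate share one currency; time saved is in seconds.
    """
    if seconds_saved_per_build <= 0:
        raise ValueError("no time saved, so the SSD never pays for itself")
    # Value of one build's saving = seconds / 3600 * hourly_rate.
    # Rearranged so the division happens once, which keeps round numbers exact.
    builds = ssd_cost * 3600 / (seconds_saved_per_build * hourly_rate)
    return math.ceil(builds)


# Hypothetical inputs: a 200-unit drive, a developer costing 60 units per hour.
conservative = break_even_builds(200, 30, 60)  # lower bound of time saved
optimistic = break_even_builds(200, 60, 60)    # upper bound of time saved
print(conservative, optimistic)  # prints: 400 200
```

Feeding in the lower end of a confidence interval for the time saved gives the conservative (largest) build count, and the upper end gives the optimistic one, which is the sense in which a "lowball estimate" is discussed above.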

  • @jarrakul
    @jarrakul 3 years ago +9

    Very good talk, even if I'm kind of screaming at the use of p-values as "the chance that Fred is right." But you clearly know that, and are simplifying because p-values are confusing and don't actually measure quite what we use them to measure. Which is a good reason to switch to subjectivist statistics, but you can hardly explain how to responsibly use priors in a 30-minute talk.

  • @KrossX
    @KrossX 6 years ago +23

    Happy new year!

  • @RglMrn
    @RglMrn 11 months ago +1

    Incredible talk. Thank you so much!

  • @buttonasas
    @buttonasas 1 year ago

    Hours played for different versions being radicalised is pretty normal and there are often very good reasons for that because games have lots of humps or steep curves or brick walls. There might be something _terribly_ wrong in the tutorial that makes x% of people just not get past that.
    And, honestly, I prefer 20% of players go "this is amazing" and the other "bad game" than everyone saying it was "just ok".

  • @PoppyGaming43
    @PoppyGaming43 3 years ago +4

    youtube: *recommends me this video*
    me, who's literally never gonna use any of this: *interesting*

  • @alfredoeleazarorozcoquesad2988
    @alfredoeleazarorozcoquesad2988 2 years ago

    Hi! Great talk, thanks!! A QUICK TIP for A/B testing! (I'm an economist) You could randomly choose who goes into the experimental/control group :) That way you don't have to switch, you just have to apply the procedure to many people once, like this: 1) New player enters 2) You generate a random number (between 0 and 1, say) 3) Is it greater than 0.5? Experimental. No? Control 4) Register their group and their target number :D Even if they play only once (you don't need multiple rounds), you can compare the means between those groups ;) Thanks again for the talk!
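
The four steps in this comment can be sketched in a few lines of Python. Everything named here (`assign_group`, `assign_players`, `group_means`, the player IDs and the minutes played) is hypothetical and only illustrates the procedure; the seed is there purely to make the example repeatable.

```python
import random
import statistics


def assign_group(rng):
    """Steps 2-3: draw a number in [0, 1) and split at 0.5."""
    return "experimental" if rng.random() > 0.5 else "control"


def assign_players(player_ids, seed=None):
    """Steps 1 and 4: assign each arriving player once and record the group."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return {player_id: assign_group(rng) for player_id in player_ids}


def group_means(assignments, metric_by_player):
    """Mean of the target metric per group, ignoring players with no metric."""
    values = {"experimental": [], "control": []}
    for player_id, group in assignments.items():
        if player_id in metric_by_player:
            values[group].append(metric_by_player[player_id])
    return {group: statistics.mean(v) for group, v in values.items() if v}


# Hypothetical data: minutes played by six players.
minutes = {"p1": 12.0, "p2": 30.0, "p3": 8.0, "p4": 45.0, "p5": 22.0, "p6": 17.0}
assignments = assign_players(minutes.keys(), seed=7)
print(group_means(assignments, minutes))
```

Comparing the two means still calls for a significance test (for example the unpaired t-test from the talk), since a raw difference between group means says nothing about how much of it is noise.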

  • @GameTesterBootCamp
    @GameTesterBootCamp 1 year ago

    As a math dummy, this talk made my brain implode.

  • @lookatnow5730
    @lookatnow5730 6 years ago +5

    Wonderful talk

  • @dominicparker6124
    @dominicparker6124 2 years ago

    how he answered that first question was amazing, you can see he knows his shit.

  • @brandonwilbur2146
    @brandonwilbur2146 2 years ago

    Okay YouTube recommendations, I clicked it.

  • @julio1148
    @julio1148 4 months ago

    Great intro, but as an artist, I WISH it took a year to be Rembrandt lol
    Great talk too!

  • @Alex-re3qm
    @Alex-re3qm 3 years ago +2

    This kinda stuff is what game dev tycoon is missing

  • @mrichards
    @mrichards 3 years ago +9

    Wasn’t he wrong in choosing two tailed t-test? Since he is testing whether SSDs are faster, not just that SSD load times come from a different population than HDD’s

    • @davidfoley8546
      @davidfoley8546 3 years ago +2

      Yes.

    • @ArsenicDrone
      @ArsenicDrone 3 years ago

      Fair question. His reasoning was pretty sound. He would want the one-tailed t-test if it were a safe assumption that SSDs are always either faster or the same (an assumption about the underlying distribution). Making that assumption (which is a bad assumption) is not the same as being mostly interested in finding out if they are faster (which is valid, but does allow for them being slower). His test concluded that they were different distributions, and he could also see that the difference was to SSDs' benefit.

    • @mrichards
      @mrichards 3 years ago +1

      ​@@ArsenicDrone The boss was specifically asking if SSDs were worth it (i.e. sufficiently faster that their mean speeds come from a different, faster, population than HDD mean load speeds). Wouldn't it be a mistake to intentionally test a broader hypothesis than you require just to verify your actual, narrower hypothesis by observation at the end?

    • @ArsenicDrone
      @ArsenicDrone 3 years ago +5

      @@mrichards Ah, one of many not-so-intuitive things about statistics. It really comes down to only making the assumptions that you can justify. What the boss was interested in doesn't determine what's possible to test or what assumptions are valid. Notice that his p-value is half as large for the one-tailed test (the result is even more significant). The test got substantially more powerful, but that power doesn't come for free, it comes by making this unjustified assumption. (It's not justified because before he runs the test, he really doesn't know which outcome will happen, and it could actually be slower.)
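
The halving of the p-value mentioned here can be checked on a small example. The sketch below swaps in an exact permutation test for the t-test from the talk, so that it runs with the standard library alone; the build times are hypothetical and the function name `permutation_p_values` is made up for illustration.

```python
from itertools import combinations


def permutation_p_values(a, b):
    """Exact permutation test on mean(a) - mean(b).

    Returns (one_sided, two_sided). The one-sided value tests whether `a`
    tends to be larger than `b`.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = mean_diff(sum(a))
    # Every way of relabelling n_a of the pooled values as group `a`.
    diffs = [mean_diff(sum(group)) for group in combinations(pooled, n_a)]
    one_sided = sum(d >= observed for d in diffs) / len(diffs)
    two_sided = sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)
    return one_sided, two_sided


# Hypothetical build times in whole seconds (integers keep the sums exact).
hdd = [312, 305, 330, 298, 321]
ssd = [280, 291, 275, 302, 286]
one_sided, two_sided = permutation_p_values(hdd, ssd)
print(one_sided, two_sided)
```

With equal group sizes and a non-zero observed difference, the two-sided value comes out at exactly twice the one-sided one, because every relabelling has a mirror image with the opposite sign. The one-sided test gets its smaller p-value by committing to a direction before looking at the data, which is the trade-off being argued over in this thread.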

    • @davidfoley8546
      @davidfoley8546 3 years ago +6

      ​@@ArsenicDrone No, he really is mistaken. Whether or not it is a safe assumption that SSDs are always faster is actually irrelevant. What is relevant is that the hypothesis he's testing is a one-sided hypothesis--that SSDs are faster. If he had measured SSDs to be slower, by any magnitude, the hypothesis would have been rejected.

  • @TomiTapio
    @TomiTapio 5 years ago +2

    Worth a listen.

  • @aakk100011
    @aakk100011 4 years ago +6

    20:12 When you say Fred being right is 3%, but we are using a two-tailed test. I think the conclusion should be Orange version is different than the old version, it's either better or worse.

  • @DawnBriarDev
    @DawnBriarDev 3 years ago

    24:14
    I dunno, I love my coffee black and I think that study has a point xD

  • @Joeofiowa
    @Joeofiowa 6 years ago +4

    Absolutely brilliant.

  • @raventhorX
    @raventhorX 6 years ago +8

    this guy is my new idol lol.

  • @franksonjohnson
    @franksonjohnson 4 years ago +2

    Watched the Spiderman talk then this one. Just, damn, passion. Awesome.

  • @kaloqnchyyy
    @kaloqnchyyy 3 years ago

    the best presentation I have ever seen

  • @lushen952
    @lushen952 3 years ago +3

    Problem with your cupcake mode example.
    Making the game easier may have a positive impact in the short term and may have a negative impact long term. Short term statistics can only measure short term results.

    • @jacobb5484
      @jacobb5484 3 years ago

      The test was simply to determine whether difficulty had an effect on time played in either direction greater than the margin of error for the sample size.
      These are great as backup tests to ensure the results aren't just a fluke without an unreasonably large sample size.

    • @lushen952
      @lushen952 3 years ago

      @@jacobb5484 Doesn't matter. If I'm a tester and only testing the game for 10-15 minutes, if it's too hard I'm going to report that it's too hard. If the game gets made easier and released and I pick it up and find that 30 mins in it's too easy, I'm going to get bored and quit.
      I think he oversimplifies the situation.

    • @jacobb5484
      @jacobb5484 3 years ago +1

      @@lushen952 It's a simple example of a t-test on a paired sample. This isn't for small engaged focus groups with detailed subjective data, but rather big data statistics such as the example of a sub mode being beta tested.
      In this example, the t-test gives a percentage chance of either:
      A. The change had the effect of either increasing OR decreasing what's being measured by a notable amount.
      B. The data is probably skewed due to bad sampling and falls within the margin of error.
      Once you rule that out, you can make further changes and run detailed tests to actually make an improvement.

  • @andrewneedham3281
    @andrewneedham3281 4 years ago +2

    It was great, right up to the "always use the 2-tailed value." Tons of circumstances where it's better to use a one-sided t-test.

    • @davidfoley8546
      @davidfoley8546 3 years ago +3

      In fact, his own first example should have been a one-tailed test.

    • @andrewneedham3281
      @andrewneedham3281 2 years ago +1

      @Richard Sejour A 2 tailed test splits your significance level on both tails, so it's only half as strong as a one tailed test when showing a difference between groups IN A SPECIFIC DIRECTION.
      Frankly, a 2-tailed test is a sloppy but acceptable way to test, but it really shouldn't be used when you have a specific direction of difference between the groups in mind. A 1 tailed test has more power at the same alpha level. It's basically weakening your hypothesis to hedge your bets by using a 2-tailed test when you should be using one.
      That's why I don't like this lecture. It's a computer programmer with a SINGLE statistical tool he knows, so everything looks good to apply that tool on. It's like that old adage that if you have only a hammer, everything looks like a nail. If he were a statistician, he'd know better. But he's sitting there spouting off like he does, when in fact he's dead wrong.

    • @andrewneedham3281
      @andrewneedham3281 2 years ago

      @Richard Sejour Sure. I never said that he shouldn't use a 2-tailed test in that situation. I merely said that it's foolish to say "Always use the 2-tailed value."
      Edit: In science, if you have a hypothesis, your hypothesis generally has directionality to it, or you've written a piss-poor hypothesis. So, frankly, I'm often using 1-tailed tests to show that X is strictly less than/strictly greater than, on some real life data, such as, "Are female babies truly smaller than male babies?" or "Did the biodiversity index for the Upper Nooksack area truly increase due to our conservation measures?" In those cases, as a scientist trying to get published in a peer reviewed paper, I'd get laughed right out of publication for trying to use a 2-tailed test in those or many other situations where I find myself relying on statistical inference. Just saying.

    • @andrewneedham3281
      @andrewneedham3281 2 years ago +1

      @Richard Sejour That's an out and out lie that "most papers" use a 2-tailed test. In Lombardi and Hurlbert's study (2009), about 20% of papers in the field of biology and animal research were 1-tailed, with another 20% not telling whether their p-value was 1 or 2-tailed. So, anywhere from 20 to 40% in biology. However, I would defer to you that in sociology, for example, probably no more than 5% are one-tailed. And if you want to keep on this conversation just for the sake of being a blowhard pedant, I'd invite you to casually refrain. I'm not about to argue further with some stranger as to why my research sometimes uses a one-tailed test to support a claim. Sometimes deviations are only mathematically possible in one direction. It happens. Get over it.

  • @neruba2173
    @neruba2173 3 years ago +3

    I'll throw out a question that's out of fashion these days: how many players are having fun with my game, and are thus eager to buy anything at all at my shop?

    • @MrDavidCollins
      @MrDavidCollins 3 years ago +1

      If your game has a shop you've already failed.

    • @drumer960
      @drumer960 3 years ago

      @@MrDavidCollins that's just objectively wrong. Lots of incredibly good and fun games have shops.

  • @yungthunder2681
    @yungthunder2681 3 years ago +3

    If you're a game developer, and didn't take AP statistics, please tell me how you became a game developer?

    • @jacobb5484
      @jacobb5484 3 years ago +2

      lots of practice by making mods, level design, digital modeling, etc.?

    • @MrDavidCollins
      @MrDavidCollins 3 years ago +1

      I took statistics and didn't become a game developer (at a company, I just make it all myself now).
      College costs too much

  • @motbus3
    @motbus3 1 year ago

    With moderate power comes moderate responsibility

  • @iwersonsch5131
    @iwersonsch5131 3 years ago +1

    23:17 That's 45 two-sided tests so you go look for p values below 0.00056. That gives you a 5% false positive rate overall, but I can tell you that you're almost guaranteed to find a true positive unless the classes are carbon copies of one another
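
The arithmetic behind that threshold can be sketched as a Bonferroni correction. This assumes 10 character classes (since C(10, 2) = 45) and a 5% family-wise rate; the variable names are illustrative only. The 0.00056 quoted above appears to match the per-tail figure rather than the per-test one.

```python
from math import comb

classes = 10                      # assumption: 45 = C(10, 2) pairwise matchups
alpha = 0.05                      # desired family-wise false positive rate
tests = comb(classes, 2)          # number of class-vs-class comparisons
per_test = alpha / tests          # threshold for each two-sided p-value
per_tail = per_test / 2           # threshold when each tail is checked separately

print(tests, round(per_test, 5), round(per_tail, 5))
```

Bonferroni is conservative, so the true family-wise error rate ends up at or below 5%.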

    • @droidBasher
      @droidBasher 3 years ago +3

      That works if you want all 1 vs 1 fights to be mostly fair. Think of something like Street Fighter where you can't change your character mid-match. A rock paper scissors relationship would be fair but then if you are playing rock and the opponent is paper then the match isn't a good test of skill, the game was over at the character select screen.
      Depending on your context (something like Team Fortress or StarCraft) you might need to instead find the Nash equilibrium to make sure all units have their niche. But looking purely at win rates might mislead you if your player base is not playing optimally. Even if you can trust your win rate statistics, finding a Nash equilibrium is computationally hard in general (PPAD-complete for general games, though two-player zero-sum games can be solved with linear programming), so each new character class can make the problem considerably harder.
      And there's probably units like the SCV where the kill death ratio is exceedingly bad but you can't win without them because their role is non-combat. Or a unit like the carrier (maybe? I'm not a pro) that isn't resource efficient but is a way to force the game to end if you are already ahead in resources and tech. If that's true and you analyze the carrier per unit, it might look overpowered, if you look at it per resource it might look underpowered, but it still has a niche.
      I guess that all I'm saying is that it's a hard problem, and game theory might be useful, but could still be difficult to apply if you have a game that is interestingly complex.

  • @Fmlad
    @Fmlad 3 years ago

    Incredible talk

  • @robelbelay4065
    @robelbelay4065 3 years ago

    Great talk and amazing delivery :)

  • @georhodiumgeo9827
    @georhodiumgeo9827 4 years ago +3

    This makes me so happy! Great talk, I learned a lot.
    We had 100 barrels at work that were documented to have 50 kg in each. You could quickly tell none of them were empty and it looked like our written inventory was close. The accountant (not my boss) told me to measure all of them to see how accurate we were. I measured 8 and calculated the standard deviation.
    Joke's on you, I’m not going to break my back and work my ass off to learn something I already know. I’m sorry if you don’t understand what I’m doing, I’ll send you a Wikipedia link after I’m done.
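
A rough sketch of the calculation described above: estimate the mean barrel weight from a sample of 8 and attach a 95% confidence interval. The weights are made up for illustration and are not the commenter's measurements. The critical value 2.365 is the two-sided 95% Student's t value for 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical weights in kg for 8 sampled barrels (illustrative data only).
weights = [49.6, 50.3, 50.1, 49.8, 50.4, 49.9, 50.2, 49.7]

n = len(weights)
avg = mean(weights)
sd = stdev(weights)               # sample standard deviation (n - 1 denominator)
t_crit = 2.365                    # two-sided 95% critical value, 7 degrees of freedom
margin = t_crit * sd / sqrt(n)    # half-width of the confidence interval

print(f"mean = {avg:.2f} kg, sd = {sd:.2f} kg, "
      f"95% CI = [{avg - margin:.2f}, {avg + margin:.2f}] kg")
```

This gives a statement about the average barrel. It does not verify any individual barrel that was not weighed.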

  • @QuietSnake-xs5vx
    @QuietSnake-xs5vx 4 years ago +2

    I understood only half....need to brush up on my probability

  • @Weckacore
    @Weckacore 6 years ago +4

    This is probably very helpful, but just forget everything he said if you're taking a class on stats...
    EDIT: This does an amazing job of teaching intuition and importance, good talk

  • @elizaknight6980
    @elizaknight6980 6 years ago +2

    This is enjoyable, thanks :)

  • @yottawatts9470
    @yottawatts9470 3 years ago

    I didn't even watch this but scrolled through a few times and could tell this is an amazing presentation. Will watch later bravo.

    • @simlife445
      @simlife445 3 years ago

      it is.. but its is not its a video on how poor ftp gamers flock because lack of money... and how to get them to spend more.... and about how bad ssd are... in 2016 but are now 40-60% cheaper per gigabyte and much much faster... bravo to skipping the description and basic computer imp in the last 5-6 years....

    • @yottawatts9470
      @yottawatts9470 3 years ago

      @@simlife445 Moron alert. You confirmed that it is indeed a good presentation then went on some personal rant of the content you didn't like? I don't give a damn sheesh.

  • @tanagato3721
    @tanagato3721 2 years ago

    Damn, I'm not a game developer. I have never googled this topic. I just wrote down an idea for a computer game that accidentally came to mind and described the game mechanics in the notes app on my Android smartphone, and YouTube immediately recommended this video to me. Coincidence? Now I do not know whether it is good or bad...

  • @Brodysseus113
    @Brodysseus113 2 years ago +1

    Something I'd like to add to the graph at 19:00, the blue analytics are healthier because it produced a stronger reaction. Those are the people who are willing to put money into your game.

  • @nimm90
    @nimm90 3 years ago

    I still have no idea how Fred is not convinced with an upgrade that generates $34 per 100 players of profit.

  • @Adaministrator
    @Adaministrator 4 years ago

    excellent talk

  • @andrewcamden
    @andrewcamden 2 years ago +1

    More often than not the data you do NOT have is more important than the data you do have. For instance, I and probably millions of other people didn't buy Dead Space 3 BECAUSE it was infested with microtransactions. There is no data for that though since a lost sale literally doesn't show up on the balance sheet.
    Game devs who decide NOT to "leave money on the table" by making real games without microtransactions are actually leaving a great deal of money on the table in lost sales for which they don't have any data.
    Game devs need leadership, empathy (essential for understanding customers even if you have no moral concerns whatsoever) and common sense to make good decisions. There isn't any amount of data that can substitute for these attributes.

  • @Parker--
    @Parker-- 3 years ago +8

    17:27 Watching him shit on gullible health journalists in the COVID timeline. It's like he knew.

  • @Preaplanes
    @Preaplanes 3 years ago

    Guy dismissed me in the first 21 seconds.
    Won't pretend I'm not tempted to continue watching. Statistics as a science (rather than bad statistics as a political tool) is the only kind of math I can say I greatly enjoy.

  • @AdrianTache
    @AdrianTache 3 years ago

    Statistics are a fun way to compare datasets but unfortunately sample size and methodology usually mean that whatever conclusions you draw might be completely irrelevant. And as he's saying, the more questions you ask, the more likely you are to be completely wrong.

  • @inguanara
    @inguanara 6 years ago +2

    that was awesome

  • @lmartinson6963
    @lmartinson6963 3 years ago

    I'm pretty sure people who drink their coffee black being sociopathic is entirely factual

  • @KHamurdik
    @KHamurdik 5 years ago +2

    I feel educated

  • @roeyshapiro4878
    @roeyshapiro4878 4 years ago

    Did anyone else look at the picture of Rembrandt that he had up there and think that it looked peculiarly similar to him?

  • @jerrygreenest
    @jerrygreenest 3 years ago

    12:37 negative less? Wait, that's more!

  • @YT775
    @YT775 3 years ago +1

    @15:30 "As opposed to 20 to 22", doesn't he mean 21% instead of 22% or am I missing something?

    • @dezimal9143
      @dezimal9143 3 years ago

      If you have 20% of something... let's say all IBM shares, and you increase your holdings by 5% = now you have 22%. But when you say you have increased it by 5 percentage POINTS you went from 20%=>25%.

    • @YT775
      @YT775 3 years ago +1

      @@dezimal9143 I bamboozled myself. meant to say 21% sry.
      How is 5% of 20 = 2?

    • @dezimal9143
      @dezimal9143 3 years ago +1

      @@YT775 Actually it isn't 2% I didn't check the math xD. And you are right it should be 21 vs 25%.

    • @YT775
      @YT775 3 years ago +1

      Thanks, so I guess there's no hidden meaning, it was just a minor error/inaccuracy of the speaker. :)
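
The distinction this thread settles on can be checked in a few lines. The sketch below uses the 20% starting share and the 5% figures from the comments; the variable names are illustrative.

```python
base = 20.0                               # starting share, in percent

relative = base * (1 + 5 / 100)           # a 5% relative increase
points = base + 5                         # a 5 percentage-point increase

print(f"5% relative increase: {base:.0f}% -> {relative:.0f}%")
print(f"5 percentage points:  {base:.0f}% -> {points:.0f}%")
```

Going from 20% to 22% would take a 10% relative increase, which is consistent with the thread's conclusion that "20 to 22" was a slip.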

  • @Daniels2l
    @Daniels2l 3 years ago +3

    If I were this guy's boss I'd be like OK, if I buy the f*ck*ng SSDs will you shut up?!!

  • @garryiglesias4074
    @garryiglesias4074 2 years ago

    14:36 - Historical and linguistic horror: Sans-culottes MEANT with pants...

  • @tomasxfranco
    @tomasxfranco 3 years ago +2

    Not that it's the point of the presentation, but this misses the other marginal benefits of working on SSDs all the time, not just in builds.
    Additionally, if build time doesn't change when moving to SSDs, then the bottleneck is elsewhere and could be tackled via a different component or algorithmic improvement.

    • @simlife445
      @simlife445 3 years ago

      or that this is a 5 year old GDC session (read the description), so this data is insanely old. SSDs are 60% cheaper per gig and much faster

    • @13b78rug5h
      @13b78rug5h 3 years ago

      Yeah and long build times is actually one of the biggest blockers to ci/cd, which the lack of is usually the best indicator of long lead time which is the best indicator for slow development trapping more resources inside the system, increasing the number of bugs, less feedback, less data, less experimentation and less revenue. Overall meaning slower delivery and lower quality product and/or requiring more resources to deliver. And in the end you should not generally build on your local machine but do it automatically on a build server.

  • @IntrusiveThot420
    @IntrusiveThot420 3 years ago

    Any presentation that's got a 538 joke in there is a good presentation

  • @laureven
    @laureven 3 years ago

    Gold on YouTube :)

  • @Intrexa
    @Intrexa 3 years ago +1

    "The relative risk of somebody in control group b buying pants.."
    Relative risk of buying pants? What are you making these pants out of?

  • @mano_lamancha4716
    @mano_lamancha4716 3 years ago +2

    The question posed in the thumbnail says everything about why video game quality has plummeted in the past decade.

  • @gabrieldta
    @gabrieldta 4 years ago

    Sony is following the monocle example right now by giving 10USD credit to random accounts. Fo'sure that's Sony's ulterior motive: Measure how much more likely people are to engage with the store and (if they're lucky) top up that 10USD to buy more expensive games... =)

  • @stefanomaggio5109
    @stefanomaggio5109 3 years ago

    pls tell me the name of the book where i can find all this shit in detail specifically applied for game cases

  • @slavskee
    @slavskee 4 years ago

    God - like speaker

  • @CraigNicoll
    @CraigNicoll 6 years ago +2

    Holy sh*$ a math talk I could ACTUALLY follow! BEST math GDC talk of all time. *awards him Golden YouTube User ClickedCookie.

  • @crg78lf
    @crg78lf 3 years ago

    Don't forget: If you buy an SSD to improve build time, make sure to put your SWAP memory on the SSD. If you don't have a lot of RAM, the extra memory used by the compiler/linker will then go on the SSD as well, drastically improving your build time

  • @Ralke1
    @Ralke1 12 days ago

    cool math bro

  • @poyi1013
    @poyi1013 3 years ago +1

    I’m more confused after the video~~~

  • @frankvonfrauner
    @frankvonfrauner 3 years ago +4

    This is why gaming is so terrible and toxic now.
    Optimizing for maximum amount of time played and money spent above all else.
    They used this same sort of statistical analysis to make cigarettes more addictive.
    Not even a consideration for the well being of their customers, and not even an acknowledgement that they're spending hundreds of millions of dollars to make their product as addictive as possible so it can be deployed against children.

    • @NoActuallyGo-KCUF-Yourself
      @NoActuallyGo-KCUF-Yourself 3 years ago

      Well, yeah. Welcome to capitalism. Until the underlying philosophy and society of capitalism is minimized, companies _can't_ do it any other way. It is mandatory for them to optimize profit above all else. Any other consideration is secondary at best.

  • @Skronkful
    @Skronkful 3 years ago

    The only thing that made me cringe was when he said people should ignore the one-sided p-value, when his example (and most things you'd want to test in real life) is a one-sided hypothesis. It's not necessarily that we assume/know that SSDs are faster, it's that if we find that SSDs are significantly slower, we shouldn't be rejecting the test. He is actually doing a test of size 2.5% instead of 5%.

    • @f.p.5410
      @f.p.5410 3 years ago

      If only that was the only cringe part... His explanation of p-values is statistical illiteracy 101.
      I was really surprised when I heard him making that mistake, interpreting p-values as Pr(H0).
      I thought that this statistical concept entered pop culture (kind of like "correlation is not causation" already did)... Amazing that people like him have the confidence to give talks on statistics.
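
One way to see the objection: a p-value is the probability of data at least this extreme given that H0 is true, not the probability that H0 is true. The simulation below is a sketch under simplifying assumptions (a two-sided z-test with known sigma on normal data, not the talk's t-test). The helper `two_sided_p`, the seed, and the sample sizes are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)                    # fixed seed so the run is repeatable
std_normal = NormalDist()

def two_sided_p(sample_a, sample_b, sigma=1.0):
    """Two-sided z-test p-value for equal means, assuming known sigma
    and equal sample sizes."""
    n = len(sample_a)
    z = (mean(sample_a) - mean(sample_b)) / (sigma * sqrt(2 / n))
    return 2 * (1 - std_normal.cdf(abs(z)))

trials, n = 2000, 20
hits = 0
for _ in range(trials):
    # H0 is true by construction: both groups share one distribution.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    if two_sided_p(a, b) < 0.05:
        hits += 1

rate = hits / trials
print(f"H0 true in every trial, yet p < 0.05 in {rate:.1%} of them")
```

H0 holds with certainty in every trial, yet roughly one run in twenty still produces p < 0.05. So a small p-value cannot be read as "the chance that the boss is right".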

    • @f.p.5410
      @f.p.5410 2 years ago +1

      @Richard Sejour he always says that the p-value is the probability of the boss (can't remember the name) being right. In other words, the probability of H0 being true. So that p

    • @f.p.5410
      @f.p.5410 2 years ago

      @Richard Sejour But that's the problem, a two-tailed test is NOT the best choice here.
      Look at it from the decisional perspective, what evidence would lead you to take action? Only evidence of SSDs being faster than HDDs leads to an action (buying SSDs to replace HDDs).
      Evidence of SSDs < HDDs or even SSDs = HDDs would lead to the same (in)action, i.e. not replacing the current HDDs.
      Thus, H0 = {speed of SSDs ≤ speed of HDDs} and H1 = H0^C = {speed of SSDs > speed of HDDs}. This is a one-tailed test. Period.
      He's using two-tailed tests because obviously he lacks the statistical experience to choose a proper significance threshold, so he chooses the canonical 5% threshold and halves it with this idiotic two-tailed choice.
      It's not surprising that he's not able to choose a proper threshold (after all, it's a pretty complex art, definitely inaccessible to someone who can't even understand p-values) but it's pretty surprising that he feels qualified to give a talk on statistics...

    • @f.p.5410
      @f.p.5410 2 years ago

      @Richard Sejour Yes, it's unfortunate that common statistical knowledge is so poor. Basically any soft science (ab)uses statistics and as a consequence, many psychology/economics/medicine/sociology professors feel entitled to teach statistics. They usually don't know what they're talking about, but online sources are filled with their misconceptions.
      But trust me when I tell you that I'm not too harsh. My comments may look excessively nitpicky, but they're absolutely fundamental to trustworthy statistics. The very idea of what hypothesis testing is and what it represents, must be 100% clear before trying to use statistics to aid in the decision making process.
      People like the speaker in the video, who take decisions using statistical tests that they don't understand, are extremely dangerous and scary. What they're basically doing is rolling a dice and hiding behind the veneer of "science" or "statistics". But don't let them deceive you, they're still just rolling a dice.
      Thankfully the speaker works in a relatively "irrelevant" field like game design, but this problem is extremely widespread in more "relevant" disciplines like psychology/medicine/economics. It's not that I'm too harsh, it's that statistics can be extremely dangerous when practiced lightly.

  • @frostknight7687
    @frostknight7687 3 years ago

    Send this to chris

  • @MrIronJustice
    @MrIronJustice 2 years ago +1

    I think 'every game developer' isn't really apt here... more like, 'games as a service' with F2P model developers. Making a single player game has nothing to do with any of these concepts.

    • @ruukinen
      @ruukinen 2 years ago

      What do you mean? Even single player games can gather data about what the players do and then test those stats for valuable insights. Paradox games or Total War series comes to mind.

  • @anonymoususer3561
    @anonymoususer3561 3 years ago

    This could have been half as long, probably

  • @rustyshackleford2022
    @rustyshackleford2022 3 years ago +1

    I watched this for 10 minutes before I realized he wasn't saying memes. I was waiting for a joke that never happened.

  • @RetiredRhetoricalWarhorse
    @RetiredRhetoricalWarhorse 3 years ago +3

    Let's make a bet: Everyone who even understands half of this will NEVER be able to read "journalism" again without major frowning going on :D.

  • @13b78rug5h
    @13b78rug5h 3 years ago

    The build time itself isn't the only thing you save with faster builds. The longer a build takes, the less often you build, which increases your lead time from feature idea to working feature, trapping value inside the system and slowing down the feedback of data or, in some cases, revenue. Waiting on your project and files to open also decreases developer productivity and gets on their nerves. But in the end, all this is a false dichotomy, as you should have a build server that does all the builds automatically and not rely on local manual builds.
    Continuous integration and delivery are a cornerstone of all high performing engineering cultures for a damn good reason.