I left the system because I felt that the organisation of academia, modeled off of Catholic Latin schools coming out of the Dark Ages, is fundamentally flawed and probably not going to be repaired from the inside alone. I think hypothesis papers should be a regular thing before considering a results paper. I also think there needs to be an incentive to publish null results, and to punish antisocial journals with eugenic or manipulative focuses, like the Journal of Controversial Ideas, which tends to produce bad stuff.
You can not "fix" the system. The system exists for the purpose of getting attention to scientific information, real or not. In the modern sense, attention is money and fame. Without it, the information does not matter.
There's a mistaken belief that if the data does not fit the hypothesis then the experiment has "failed." The hypothesis may be unsupported by the data, but assuming all the procedures were followed and necessary controls implemented, the experiment was a success as new knowledge was gained.
How about leaving the outlier in if it helps your case? That is also p-hacking. P-hacking is not manipulation of the data, it is biased manipulation of the data.
I always thought p-hacking originally referred to the process of enumerating multiple hypotheses *after* data collection. If you generate 100 random null hypotheses, 5% will have p-values less than 0.05 by chance.
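Not the video's data, just a quick way to see that 5% figure for yourself: a minimal Python sketch with simulated data, where every null hypothesis is true by construction.

```python
# Run 100 two-sample t-tests where the null hypothesis is true by construction,
# and count how many come out "significant" at p < 0.05 purely by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n_per_group = 100, 30
false_positives = 0

for _ in range(n_tests):
    a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)  # same distribution: no real effect
    _, p = stats.ttest_ind(a, b)
    false_positives += p < 0.05

print(f"{false_positives} of {n_tests} null tests were 'significant' at p < 0.05")
# Expect roughly 5 of them, which is exactly the 5%-by-chance point above.
```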
That "pesky outlier" is a HUGE discrepancy in data. It's a little frustrating to see laypeople not getting it and saying such a small range is no big deal, but I counter by saying if you had diabetes and your doctor had a 5% chance of screwing up your dose so badly you went into a diabetic coma and died, is that still no big deal?
I believe we used to call that the snowball effect. One starts with small things like removing the pesky outliers but every time we remove an outlier, another one appears. Keep taking out the outliers until there is a neat set that agrees with your hypothesis.
@WobblesandBean Another way to look at it is that the p scale accounts for the outlier and is why there is a margin for error in the first place. Removing outliers gives a greater margin of error for the remaining data.
As long as "publish or perish" reigns supreme and nice stories get celebrated while null results and replication studies go into the wastebin, stuff like this will be the main output of the scientific system, at least in the social sciences...
...and don't forget the politics of the grant money$€£¥!!! You definitely cannot do any study that might give results that offends the politics of the "benefactors"!🤮
When I was working on my PhD in the early 90s, it was obvious to me the entire science foundation was broken. To get my doctorate, I needed a positive outcome for my model. I collected the data. I analyzed the data. I wrote up the results. Everything was biased to get a positive result. The same process occurs with getting hired and getting tenured. Positive outcomes are rewarded. Negative outcomes are ignored and cumulatively punished. I wrote a paper then suggesting that statisticians be paired with researchers to consult on methodologies (sufficient n to examine parameters, the right statistical methodology to examine the data, etc.). As long as careers are determined by outcomes and researchers can cheat, they will.
Not only that: Many researchers, thinking they will help society, are subsidizing drug companies: "Pharmaceutical companies receive substantial U.S. government assistance in the form of publicly funded basic research and tax breaks, yet they continue to charge exorbitant prices for medications". The medical industry abuses the taxpayer by overcharging, and takes advantage of both the taxpayer and universities by exploiting publicly funded research. Why? It is by design. It is all to reap shareholder profits. The medical industry serves the shareholders and investors, and not the patients.
Proving a model wrong would advance science by showing that something is wrong. As Wolfgang Pauli put it: "That is not even wrong" ("Das ist nicht nur nicht richtig, es ist nicht einmal falsch!").
_"You pushing an unpaid PhD-student into salami slicing null-results intro 5 p-hacked papers and you shame a paid postdoc for saying 'no' to do the same."_ - quote of the week
Soon AI will research and publish papers that will be peer reviewed and found that the AI results cannot be ignored or refuted as the research will be based on a database that is catholic or universal in scope. PhD's will mean sh^t by comparison. Social science will mean sh^t also. But you already sense that, if truth be told.
Fourth problem: researchers are not rewarded for publishing negative results, and are incentivised for original research but not for replicating prior studies.
Yes and this may be the most crucial problem. One of the most fundamental principles in science is that a result is not valid unless it can be replicated.
We have to be careful what we wish for. If we incentivize replicating prior study results, we are likely to end up in a quagmire of fake replication studies. The paper mills will spin at full capacity to create studies that "replicate" other studies. At least right now, if a study is replicated, we can be somewhat sure that it was actually replicated.
Part of the problem with replication is that it's unlikely to get funded. Grant issuers want to fund original research. Even if a researcher wanted to do replication studies, getting the funding would be difficult. Maybe all grants should include funding for follow-up replication, and require it to be done?
Pete, you're laughing when you read out the emails, but what he does is literally the bread-and-butter approach of data analysts and data scientists in business. I've been fighting this stuff for 13 years and it's exhausting.
It happens in a lot of fields that rely on data. I am even guilty of this, not p-hacking, but manipulating data to reach a desired result. I did real estate appraisals before and during the real estate crash. We routinely screwed with data sets and comparables in order to get a valuation the bank required for issuing a loan in order for the client to purchase the property. While we did have some wiggle room if the data said one number and the number that was required was only a percentage or two higher, a significant number of real estate appraisers would swing for double digit manipulation. Thankfully the trainer I had was fairly honest and we did not do as many bogus appraisals as others. Instead we would violate the portion where we gave a rough ballpark figure so the bank could decide if they wished to hire us or find another appraiser willing to be really shady. Unfortunately banks were notorious for blacklisting appraisers that tanked real estate deals by not "hitting numbers." I only had one property that I feel was extremely wrong. The person I did the appraisal for happened to be someone that was able to get us a client that sent us enough work to keep two people employed. We received about 200k a year in wages from that company.
Not sure if it will make you feel better or not, but I’ve been fighting this for 33 years. It’s beyond p-hacking, it’s outright denial of business people to accept facts that go against their opinions.
Yeah, in industry it's common practice. My boss and other collaborators always ask, "why don't you do more slicing?" We literally do tens or hundreds of tests without adjustment. I tried to get them to pre-specify a few sub-populations, but that's futile. Whenever the experiment doesn't turn out as expected, the response I get is always more slicing.
The weird thing about the buffet study is that finding no relationship between cost and satisfaction is also an interesting finding. There is value in that kind of finding, too.
Totally agree - you can learn something from almost every study if it's done properly. BTW, "p-hacking" is a skill that is highly rewarded in the corporate world, sadly.
@animula6908 The issue is, if given the choice, most people would choose 10 dollars. I wonder if they had a choice, making the data imbalanced toward 10 dollars over 50.
For a number of years, I was the Chair of my university's Institutional Review Board (which reviews and approves/disapproves research involving humans). The amount of crap research that we had to review from the social sciences was appalling...we had a number in which N (the size of the research group) was 1-2, from which they drew "conclusions". If there was any pushback from the IRB, they just made it a "qualitative" study or "preliminary study" to not have to do statistics. And the disregard for Federal guidelines for using research involving humans was scary. Luckily, what the IRB said could not be overruled by anyone, including the president of the university. But I made a lot of enemies across campus.
Yeah, reminds me of when I was a grad student and was the research assistant for an IE professor. He worked with a gem of a tenure-track mechanical engineering professor on a research topic, where the ME professor was responsible for the physical simulation and recording of results, and the IE professor (my boss) was responsible for the experimental design and data analysis, both of which he deferred to me. So they do a few sample runs so we can get a good idea of the variance of the results. (Two between-subject factors, and a repeated-measure, in case anyone cares.) We discovered that there was actually little variance, so we have a meeting where we (IE professor and me) are happy to tell the ME professor and the client that we can probably have as few as 3 runs per cell of our design. This ME professor then asks why we can't do just one run per cell. I was appalled that someone would say something THAT much out of left field in front of our client, who was quite knowledgeable on experimental design and, you know, basic calculations regarding degrees of freedom. I was afraid my professor was going to have a stroke or something, so I quickly just pointed out that the math wouldn't work out. I think that was the day that any respect I had for the title Ph.D. in itself died.
Thank you for your work. I'm a female biomedical engineer who specializes in tissue and genetic engineering and did a lot of research during my time at my university, and we always had a running joke amongst engineers in my department: "biologists can't do math, and social scientists can't do anything, period." It's a real problem. Social science students are NOT science students in terms of STEM education and scientific practice - I want to make that clear. There is nothing scientific about their "studies" about 90% of the time, and their curriculum in school is the embarrassment of the scientific community. Most don't even take calculus, let alone physics, chemistry or advanced natural sciences. They take mostly fluff courses like very basic anthropology (which is mostly about cultural norms), literature-based courses, sociology-based courses (which are also cultural in content), and psychology courses, which can be practically anything in terms of content at course levels beyond the beginner course PSY101/PSY111, which is standardized. So the base curriculum itself is severely lacking in terms of scientific education, especially when compared to the education other STEM students are receiving.

Due to the lack of a proper education in STEM, when it comes to social science studies, these people develop a foregone conclusion that they're trying to prove based on a fictional story they want to tell, instead of collecting existing data from a literature review or previous work and THEN formulating a hypothesis. Basically, they're developing an idea out of nowhere and then trying to prove it, which doesn't work, generally speaking. That's backwards.

Typically, you are simply studying a subject; let's say "the human behavior of purchase satisfaction." Once you've decided on a subject of study, you conduct a literature review to see what has already been published on it. You may choose to peer review another study on that subject, or you may look through the data and methods of previous studies to look for trends. Once you have identified a study to peer review, or trends in previous literature, THEN you develop a hypothesis based on previous research on that subject or a similar subject, so that you are an educated expert on that subject before you start, and so you're making an educated guess with your hypothesis based on something tangible and real, not a wild guess based on your personal version of reality. This is also so that you don't inadvertently repeat a study that has already taken place without realizing that you're peer reviewing, or repeat a study that has already been conducted and verified many times, so that resources are spent to either strengthen previous findings or develop new findings. We don't want to waste resources on useless subjects or on subjects that have already been extensively studied.

Then you build your study around what you've learned from all of humanity's collective previous knowledge on the subject, collect your data, and look for trends with a large randomized sample size. If you don't find trends, you go back to the literature, try to understand what went wrong, and try again after making modifications, or you choose another niche/subject. If you do actually find significance, you perform similar studies to collect more data and publish. Social science is doing the opposite.
They are formulating a wild-guess hypothesis based on their imagination (not anything concrete), putting together a poorly planned study, due to their lack of scientific education, with terrible sample sizes, no randomization, and no controls, based on nothing but their fantasies, then collecting data in a way that doesn't make sense or is missing important aspects of data collection, then using Microsoft Excel to analyze the results for them because they don't understand the statistics themselves. And when they don't see anything significant (shocker!), because they based their study on nothing but their daydreams, they delete outliers, eliminate dozens of data points until their sample size is tiny, or use even more brazen methods of data manipulation until their study's data fits their preconceived narrative.

If Dr. Wansink had reviewed previous literature, he would have found that human beings in the situation he set up for his study will likely experience the same level of satisfaction. Why? Because they don't realize that they paid more, or paid less, than other people did for the same item. This is a study where the participants are not told that they paid more or less. They are simply served a meal at a given price, then polled on their satisfaction. Two people given the same food will experience the same level of satisfaction, especially with the price difference being unknown to them. The previous literature indicates this. Only when people are *told* that they over- or underpaid for an item or service do their feelings of satisfaction begin to shift. Like so many other social scientists, Dr. Wansink simply wanted to write a good fiction "story" to get published and picked up by the media, instead of doing real, useful science.

I've yet to find a social science study that is practically useful, fully replicated and peer reviewed, scientifically sound, and not just common sense. I've seen this more times than I can count out of both biologists and social scientists, but mostly social scientists. Thank you for acting as a barrier between bad science and unwarranted funding/access to more resources, including unwitting animal and human subjects. I'm sure you didn't make a lot of friends, but you did god's work. You should be proud of your integrity. Cheers to you, my friend.
I'm not surprised; I've heard a bunch of excuses from the social sciences as to why garbage results are acceptable. Apparently, the hard sciences are just easier to do reliably and always have been. There certainly hasn't been an effort over the course of centuries to wring as much reliability out of the methods as possible. It seems to me that so much of this is in part the result of considering crap results, like those with a P in the .7-.8 range, as quality work, when really that just indicates that there's potentially a lot more to what's going on and that there should be work to get better results, as .75 is not that much better than 50/50. It's certainly not good enough to do much with. Yes, humans can and do vary, but that doesn't excuse the attitude that there's no need to figure out ways of wringing more reliability and more reproducibility out of the test participants you can get. Sure, the results will never be as precise or generalizable as you get from physics or chemistry, but there's a lot more that could be done if folks were expecting more when they designed and executed the experiments.
LOTS of social scientists start with a lit review and form a hypothesis after years of study. What you are talking about sounds abnormal. Also, the basic anthropology that is a science is physical anthropology, which ranges from evolution to genetics, biomechanics, and forensics.
Yay, you did my suggestion. I was a grad student at Cornell and I had a class taught by Brian Wansink right before this story blew up. He came across as an “oblivious diva”, but tbh I think Cornell’s administration is also to blame, both for not firing him and for not helping his grad students into new labs/research once he “chose to retire”.
100x this. Exposing the academics who p-hack is one thing; another is asking the question: who was their department head for so many years and so many papers, and never bothered to do any quality control on the output of their own faculty? Once you dive into the data, it is often not too difficult to spot patterns of iffy science, especially if you are in the same building and hear the rumours about their postdocs refusing to do a certain project, etc. Quality control should be the job of the department heads, who in general should know the subject matter quite well and should be competent scientists themselves. Now the only control is quantity control, which can be done by the secretary of the department... So a shaming of the department heads of cheating scientists would maybe help to create an environment where science quality rather than quantity is rewarded.
At least Wansink didn't get Cornell to threaten legal action against the people who drew attention to his dumb blog post. Like another "academic diva" I could name.
You scroll quickly past Wansink's response to the guy asking if the post was a joke. He says that he wishes his tutors had pushed him to do this when he was younger, as this way he would have published more and wouldn't "have been turned down for tenure twice." So beyond this doofus spilling the beans, isn't it also an indictment of the whole field? I mean, if he's bragging about it, it probably means that this is utterly commonplace and even expected in his academic circles, no?
It is. The institutions care more about outward trappings than the essence of science, and the people in charge who don't actually do the science are especially like this. It's also a language problem: these frauds can use the same language as genuine actors, because that surface level is the only one they care about.
He doesn’t seem like one of the bad ones really. It’s more how he looked down on his employees. I don’t think it’s bad to look for what unites those who the effect does hold true for. He should follow up with another experiment to see if it’s significant or just a one off coincidence, too, but it sounds like looking for something, not fabricating it to me.
Correct. Wansink's mistake wasn't the fraud, it was failing to realize the fraud was in fact fraud and needed to be kept quiet. All of his colleagues will continue to do exactly what he does, but they know better than to open their mouths
The biggest positive change for academia (imo) would be for journals to publish papers where the researchers’ hypothesis was not ultimately supported by their data (either there were no findings either way, or the data showed an effect completely different from what the researchers predicted). I know that this is less exciting for the news media, but when science is so driven by exciting results and leaves out the “boring” stuff, it heavily incentivizes dishonesty in researchers.
This requires the invention of an incentive. Say I run a journal. I want it to be a prestigious journal. I convince scientists that it is prestigious and that they want to submit their papers to my journal by maintaining a high rate of citations. Lots of people cite papers published here, so publish with us! If I now begin accepting null-result papers, I am accepting papers that will tend to receive far fewer citations. That makes my journal look bad. I run a business, and you’re proposing a money-losing idea. Maybe journals could be required to have at least a certain percentage of their papers report null results? So it won’t punish the journals that are encouraging honest science?
I sort of agree with your publication stance, but being able to cite a failed experiment based on "x" data and "y" hypothesis, can work wonders for metadata studies. Variables have to be accounted for, of course, but it is far easier to get grant and research money when you can say, "it's been done this way and that and failed so many times-we should look at this other hypothesis instead."
I reviewed a paper for a high-impact journal once where, imho, the graphs indicated results almost completely opposite to the text. This would not be obvious on a cursory read, since the classification of the data and the fits obscured it. I wrote this in my review but still saw the results in a less prestigious journal about a year later. To their credit, they communicated more uncertainty in the final published product, but the motivated analysis was still super clear. I suspect I also engage in at least some motivated analysis, but when it becomes sufficiently widespread that a large community is chasing the same expectation, it can get out of hand really fast.
I remember when this dude's shenanigans got revealed. Strengthened my resolve to ignore all "cutting edge" Psychology findings. If it holds up under rigorous scrutiny, I'll hear about it later. If it doesn't, I never needed to entertain the thought.
Something tells me there's no such thing as "cutting edge" psychology findings. If someone comes up with surprising new findings you've found your cheater.
Interesting. Books and philosophies (and music) that are "trending" really aren't on my personal radar till after a few years. Time separates the wheat from the chaff.
It's not just him. It's the entire academic system--they're the ones who TELL THEM to do that. My husband is a mathematician, & they wanted him to publish papers, regardless of how good they were. This baffled my husband, & he eventually left academia for the public sector. It burns me up that honest people like him had to leave while con artists flourish.
That's good tho. Publish data even when the results aren't what you expected or disprove your hypothesis. It means people in the future will know this has already been tried and they can make better decisions about what to try next.
I don't think the garbage p hacking has any bearing on the field of mathematical publishing. Theorems are either sound or unsound, there is no 'wiggle room'.
That's bad tho... The OG comment never said that the papers disproved his (husband's) theories, just that they were bad - maybe too short to give a real answer to the question/problem the paper was about, or the funding was not sufficient, so some results are heavily influenced.

Imagine posting a paper about how vegetables aren't healthy when the study itself had a duration of one year and only enough money for 2 test persons, and at the end everyone would think "damn, I always thought vegetables were healthy, but this paper said otherwise; guess I was wrong all along, and of course there's no need to redo such a study because it's already been done - maybe badly, but it has been done." And at the end it turns out the test person who ate no vegetables didn't smoke (privately) and the one who did smoked 4 packs a day, but because the project didn't have enough money, these 2 were the cheapest option, so you chose them. (Or the sugar/health sector/industry even encouraged wrong studies about it because it boosts their revenue.)

This would be absolutely shitty and actually happened in the past. Like: cholesterol (fat in red meat etc.) is soooo unhealthy and at fault for most heart diseases, and not sugar/high-fructose syrup - and because it was backed by studies, no one fought it for decades, because, as you said, people had already "tested" it.
How do you fix the system? LET PEOPLE PUBLISH NULL RESULTS. the publish or perish system combined with having to have something significant to publish incentivizes people to "make the data significant" however they may. If a study can be published even if not "successful", this behavior will likely decrease significantly
This would also need a way to easily look up the set of total results for a particular topic so researchers can get a better understanding of why something has null results vs something else having "significant" results.
@kuanged I suppose grant money could be bestowed upon truly life changing scientific results that would bring us closer to solving world problems. Instead of just giving grants to studies that just confirm biases, maybe be a bit more prejudiced about which studies it goes to. OR, the government could be more strict. Require the results of a study to be able to be reproduced by groups not connected with the primary study. There are so many ways to overhaul a system that seems to reward shoddy work. 💁♂️
Null results shouldn't be seen as bad, imo. It's definitely possible to have a null result that's just as notable or surprising as a positive or negative result.
I think the recent surge in anti-science rhetoric has forced the scientific and academic community to FINALLY crack down on bad faith actors like Brian. How can we convince society that science is trustworthy if the establishment keeps on letting this kind of nonsense slide?
This is actually absolutely hilarious. He seemed to have been almost oblivious to the fact that his way of conducting research isn't legitimate at all, but surely that can't be right?
Not seemed to be, was; he starts his blog post by saying p-hacking is bad, but a deep data dive is good, so he clearly thought what he was doing was not p-hacking.
That's how a lot of these frauds are, willfully ignorant and only dealing with the superficial language level of reality but they game the social system for status really well
A few notes from a statistician. 3:33 Having data before a hypothesis is not *necessarily* bad science. It just needs to be understood as retrospective research. This happens a lot in medicine and is a cheap way to see whether moving forward with a larger, prospective study would be worthwhile. 4:20 One issue that perpetuates P hacking like this is referring to negative studies as "failed" studies. If we don't find a link between two things, that is as worthy of publication as finding a link. 5:12 This caused me physical pain.
For a statistician you seem remarkably uninformed. With roughly 90% error rates, doctors cannot successfully pass a basic test on probability and statistics. "Hypothesis Testing as Perverse Probabilistic Reasoning", Westover, Westover, Bianchi, BMC Medicine. Poor patients...
A lot of the issues with p-hacking would go away if we shifted to Bayesian methods. Hypotheses from cherry-picked data would have a lower prior. It's mind-boggling that in 2024 we're still talking about p-values.
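For anyone curious what that looks like in practice, here is a rough, illustrative sketch on simulated data. It uses the BIC approximation to the Bayes factor rather than a full prior specification, which is only one of several Bayesian approaches, so treat it as a sketch rather than a recipe.

```python
# Rough sketch: compare a classical p-value with an approximate Bayes factor
# (BIC approximation, Wagenmakers 2007) for a one-sample test of "mean = 0"
# vs "mean is free". Data are simulated, not from any real study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=40)   # a smallish, noisy "effect"
n = len(x)

# Classical p-value for H0: mean = 0
t, p = stats.ttest_1samp(x, 0.0)

def loglik(data, mu):
    # Normal log-likelihood at the maximum-likelihood variance for a fixed mean
    sigma2 = np.mean((data - mu) ** 2)
    return -0.5 * len(data) * (np.log(2 * np.pi * sigma2) + 1)

bic0 = 1 * np.log(n) - 2 * loglik(x, 0.0)         # H0: mean fixed at 0 (only sigma fitted)
bic1 = 2 * np.log(n) - 2 * loglik(x, np.mean(x))  # H1: mean and sigma both fitted

bf10 = np.exp((bic0 - bic1) / 2)  # >1 favours H1, <1 favours H0
print(f"p = {p:.3f}, approximate Bayes factor BF10 = {bf10:.2f}")
# A "significant" p can still correspond to only weak evidence for H1,
# which is part of the argument the comment above is making.
```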
In medicine, we often do retrospective studies because doing so would violate ethical considerations to do it prospectively. This is probably the most common reason rather than "It's just easier". However, while we have the data before the hypothesis is developed, we expect the hypothesis to be formulated before evaluating the data. However, one can easily cheat so it probably happens often.
@DocPetron This. Exactly. "The answer is in the data" is bad science, which in a data-obsessed world is one of the reasons why science is going wrong. The answer is in the hypothesis!
My experience in academics was that there are enough people who are willing to simply follow the incentives that it becomes impossible for the ordinary person to participate. You either have to be a workaholic borderline genius or highly unethical. For example, my attempts at collaboration were quickly rewarded by a professor with a paper mill getting a grad student to publish my idea without me. I learned to always have a fake idea to tell people I am working on and keep my real idea secret until it's formed enough to get it onto the public record. People publishing stuff that barely worked was routine.
A good friend and former student (I had her as an undergrad) lost 2 years of her PhD work, delaying her own degree, when she and her fellow grad students turned in their neuroscience professor/advisor for manufacturing data. After working for several years in her field of expertise, she has decided to return to her home country and go back to being an accountant. She has had trouble finding work that doesn't leave her in poverty. I've wondered how much of that trouble is related to her reputation as someone who won't put up with fudged data.
"You either have to be a workaholic borderline genius or highly unethical." Or a mix of both. My PhD supervisor was a genius AND highly unethical. He p-hacked his way through his entire career, because he saw that was the way to game the system (see: unethical).
Preregistration of studies seems to be a good start. I wonder if this could be pushed further into some form of open journaling. Researchers would not just log their intentions but also their important steps. You could see how the value of n changed over the history of the study and could see their justifications for excluding samples. This could also be a good way to semi-publish negative results. Something like "tried X but did not work" could not only be valuable information for other researchers but also ease the pressure to publish positive results since there is still a public display of research being done.
BTW for anyone watching, there is literally NOTHING wrong with "cutting up and slicing" the data to look for relationships and results, so long as you test whether those results hold on a different data set, ideally one gathered separately.
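A minimal sketch of that explore-then-confirm workflow; all data, column names (outcome, paid_more, sex, weekday), and groupings here are made up for illustration.

```python
# Slice freely on an exploration set, but only report subgroup effects that
# survive a test on a held-out confirmation set the slicing never touched.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "outcome": rng.normal(size=n),
    "paid_more": rng.integers(0, 2, size=n),        # hypothetical treatment flag
    "sex": rng.choice(["m", "f"], size=n),
    "weekday": rng.choice(["weekday", "weekend"], size=n),
})

explore, confirm = df.iloc[: n // 2], df.iloc[n // 2 :]

# Exploration: test the treatment effect within every subgroup we can think of.
candidates = []
for col in ["sex", "weekday"]:
    for level in explore[col].unique():
        sub = explore[explore[col] == level]
        a = sub.loc[sub.paid_more == 1, "outcome"]
        b = sub.loc[sub.paid_more == 0, "outcome"]
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            candidates.append((col, level, p))

if not candidates:
    print("no 'significant' slices found in exploration")

# Confirmation: any "finding" must hold up on data it has never seen.
for col, level, p_explore in candidates:
    sub = confirm[confirm[col] == level]
    a = sub.loc[sub.paid_more == 1, "outcome"]
    b = sub.loc[sub.paid_more == 0, "outcome"]
    _, p_confirm = stats.ttest_ind(a, b)
    print(f"{col}={level}: exploratory p={p_explore:.3f}, confirmatory p={p_confirm:.3f}")
```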
Precisely. Looking for interesting stuff is how we generate hypotheses, but finding a weird artifact and then creating a reason post-hoc is stupidly weak argumentation.
I was just looking for this comment. It's an entirely valid method for identifying possible future hypotheses for further research, but shouldn't be considered conclusive in and of itself.
I was wondering about that. I'm not a scientist, but it seems like looking for patterns is what scientists _should_ be doing; albeit that should just be one step towards a future study, rather than trying to retroactively fix the previous one.
> "for anyone watching, there is literally NOTHING wrong with "cutting up and slicing" the data" J. B. Rhine of Duke University used this method -- he picked the results that best agreed with his ideas about ESP. After he passed on, his spectacular confirmations for ESP evaporated once someone put the less exciting (and sometimes negative) results back into the data. That "someone" is now called a scientist.
Rutgers University Psych department has a professor named Alan Gilchrist who told grad students that he would not sign a dissertation unless it produced results supporting his model. The department kept him for decades even though he worked with grad students but never graduated a single one until 1991 when I defended a thesis to the other three committee members. It was the first PhD granted against the recommendation of the faculty advisor. He made sure I could never get a job with my PhD. After several decades, I sent the PhD diploma to the landfill
Does any one person have such power to deny someone a job in his field, anywhere in the country? And is the alleged victim so important to him that the director would take the trouble?
We need a journal of nonsignificant findings. Somewhere where the status quo, if supported, is documented. That would go a long way to giving credit to researchers for the work that they do even when the results aren't "interesting".
9:20 Absolutely INSANE that this guy just plainly put "hello, please do scientific malpractice on this paper and get back to me" in writing on his university e-mail and he didn't immediately face disciplinary action.
It’s pretty much institutionalized in pharma, finance, food science, energy, politics - likely many others. If someone told me this Cornell guy used to work for Kraft or similar, it would all make sense.
The root problem is institutions incentivizing constant success. They will only give grants and coveted tenured positions to people who are consistently churning out desirable results. But that's not realistic; the world doesn't work that way. It's like business investors demanding that profit margins must ALWAYS be increasing, every fiscal year, forever, as if inflation weren't a thing and there were unlimited resources, manpower, and disposable income to throw around. It's delusional thinking.
There is a slippery slope between an explorative study and p-hacking. It is legit to make an explorative study of a dataset, to look at it in different ways and generate different hypotheses for later validation on another dataset. E.g., look at small dataset A, get an idea of what could be the case, then collect large independent dataset B to test that hypothesis. After all, this is how technically all hypotheses are generated; at the very moment you decide to focus on one hypothesis to test, you exclude several other hypotheses you could have had. Publishing the step of looking at dataset A separately is not p-hacking. Similarly, you can also slice up dataset A and test for multiple things, but then you have to correct for multiple testing across literally everything you looked at. I.e., for something you find to still count as significant, the p-value threshold has to get smaller and smaller the more things you tested.
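For example, a small sketch of that kind of correction (Holm's step-down method; the p-values below are invented purely for illustration):

```python
# Minimal sketch of "correct for literally everything you looked at":
# if you run m tests on slices of the same dataset, adjust each p-value
# before calling anything significant.
p_values = [0.004, 0.02, 0.03, 0.21, 0.47, 0.61, 0.74, 0.85]

def holm_adjust(pvals):
    """Holm step-down adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, adj)   # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

for raw, adj in zip(p_values, holm_adjust(p_values)):
    verdict = "significant" if adj < 0.05 else "not significant"
    print(f"raw p = {raw:.3f} -> adjusted p = {adj:.3f} ({verdict})")
# Only the smallest raw p-value survives once the other seven tests are accounted for.
```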
@doctorlolchicken7478 Say you work for a car insurance company and they ask you to find population segments where your rates are mispriced. The data is fixed, and there are endless possible hypotheses (age, gender, car color, number of previous accidents, etc). How do you proceed?
4:20 Why was that called a failed experiment? One that produces results opposite those expected has not failed. An experiment fails only when it fails to give an answer, or if something rendered the conclusion invalid, or something similar. 9:00 P < 0.05 is very close to two standard deviations away with a normal distribution.
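Quick check of that last point, for anyone who wants to verify it:

```python
# Two-sided p = 0.05 under a normal distribution corresponds to roughly
# 1.96 standard deviations from the mean.
from scipy.stats import norm
print(norm.ppf(1 - 0.05 / 2))   # ~1.96
```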
7:22 I think the best bit about this is that he was told how bad this looked, given an opportunity to say it was a joke, and he still replies "I meant it all as serious"!
Awesome job exposing and shaming these frauds! I think that is part of the solution to fixing the academic system. Because if nobody exposes and shames frauds, there is almost no downside to faking your data, and many people do it. On the other hand, the more fraudsters are exposed, shamed, and fired, the less enticing it will be for others to commit fraud. Keep up the good work!
a single rejection of the null hypothesis establishes and proves nothing. It just casts suspicion. All inductive or statistical hypotheses must be established by repeated observation and testing, even the absolute rejection of the null.
I wish we could reward research with valuable or interesting null results. Knowing that there may not be any relationship between the price of food and satisfaction should be worthwhile on its own. We should also incentivize replicating studies. I'm sure the worry is that people will not make new science and will continuously repeat old studies, or publish papers about null results without surprising insights. But we've gone too far in the other direction, where researchers are heavily incentivized to fake results, chase fads, and churn out lots of low-quality papers that are hopefully "new" or "interesting" enough to keep the researchers employed.
For me it was interesting. I thought that paying more would trigger some kind of sunk-cost reaction, but it looks like humans are more resilient than I thought. Which is great.
Pete, I seriously love your content and suggest it to all grad students or aspiring researchers I know. You deliver the content in a unique way and the focus of your channel is so damn important. People need to know!!
Thank you, Pete for sharing a refreshing point of view - what can we do to incentivize PROFESSORS to be truthful? - how ironic that we have to even ask this question - and good for you to not only share the problem, but move us toward brainstorming solutions - this is the kind of thinking that can help people grow toward healing and restoration.
Just like there’s “jury duty”, there should be “replicating analysis duty” (in addition to reviewing). Right now, only competitors have incentives to critically analyze others’ research. When they do it, there is suspicion of bias (which sometimes is real). The idea is that if you’re on duty, you have to endorse, criticize, or say you cannot do either.
Remember that, by definition, a p-value of 0.05 is 5% likely to happen by pure chance even if your hypothesis is wrong. If you test your hypothesis on 20 subsets of your data, you might well find one that comes out at p < 0.05 by luck alone.
Actually, the p-value starts with the assumption that your hypothesis is wrong (i.e. the null hypothesis is true) and tells you how frequently random chance would produce the results you have observed. So a p-value of 0.05 suggests that random chance would produce the observed results once in 20 identical experiments. The p-value tells you nothing about the probability that your results are due to chance. That is a Bayesian question, and p-values are frequentist statistics.
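To put rough numbers on the "20 subsets" example above (treating the tests as independent, which real overlapping subgroups usually are not):

```python
# With 20 independent tests at alpha = 0.05 and every null hypothesis true:
alpha, m = 0.05, 20
expected_false_positives = m * alpha          # = 1.0 spurious "hit" on average
prob_at_least_one = 1 - (1 - alpha) ** m      # ~0.64 chance of at least one
print(expected_false_positives, round(prob_at_least_one, 2))
```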
I think there are two ways. Neither is easy. First, change the current culture or system of reward (and punishment) in research. Our current culture celebrates and rewards researchers who work on trendy topics that break new ground, while ignoring the majority of hardworking researchers doing normal (Kuhn's terminology) science. Without the latter, the former could not rest on firm foundations, nor could practical applications that benefit humanity be made from them. Both types of research are important. Second, change or emphasise the good values that individuals hold themselves accountable for: commitment to the truth, service to others, doing no harm, sacrifice, etc.
The wild part of the first dataset slicing example is that there are ways he could have looked at the data more ethically. If he did not want to just publish the null result, I could understand trimming the outliers and publishing both together in a transparent way. Or, he could have had the student investigate the data to try and inform the development of a different hypothesis for a follow up study as opposed to butchering the data for the sake of significance sausage. Null results are underrated anyways. A good experiment can still produce profound null results. One of my favorite papers I've had the pleasure of working on is largely a null result because of the value getting that result still provides.
As a former graduate student in statistics, this was very painful to watch. I believe that qualified statisticians should be consulted for statistical analysis of scientific studies, and that statisticians should be very involved in the peer review process prior to scientific journal publication.
Getting statisticians involved is a great idea, and unfortunately, won't work under the current framework. Delaying publishing, splitting authorship, and working with someone on equal footing whose primary job is to find gaps in my research?
@kap4020 Finding gaps in your research is the primary goal of peer review. Done poorly in many journals, but most of the top ones take this seriously, fortunately. I submitted to Cell recently and the reviewers found many 'gaps', for which I am very grateful, because I actually want to write as accurate and scientifically rigorous a paper as possible. This is how science should be done.
I have mixed feelings on the idea, because depending on the research one might need experts in other fields to vet one's work. The lines between science and engineering subfields are blurred, so the list of experts needed grows quickly. Plus, I believe most academics won't bother to check someone's work unless they are getting some of that research money and/or are included in the author list. I can only speak for the physical sciences, and I know that scientists and engineers could brush up on their statistics. Either they need to take more statistics, or take the standard introductory sequence that statistics majors take, where the fundamentals are explored in more depth. I regret taking the one-semester programming course that a lot of non-computer-science majors take instead of the two-semester sequence that CS majors took. But statistics can only go so far; bad data is bad data.
If the original hypothesis doesn’t work out in a study, but a new revelation is found, why can a scientist not roll with that? Or must they just do a whole new study with new parameters to ensure it’s valid? Just a non scientist wondering!
A great question! Going off the cuff here, but I think it comes down to what that p-value really means. p = 0.05 means that, if there were really no effect, you'd still see a result this extreme about 5% of the time by coincidence, so if you slice up the data 20+ ways, you're essentially hunting for that coincidence and then not telling people how many times you jumbled your analysis, so it looks like the new "result" is what you were looking for from the outset. You're right that there's a thread of a good idea here, which is that it also might not be coincidence, but you would have to do a new study with new data that you didn't slice and dice to prove that. Edit: scare quotes on the word 'result'
The thing is, that this revelation can be due to a random chance, so you would have to account for all possible revelations when calculating statistical significance, and you would need to be honest about what you did in the paper (which he wasn't) for the calculation to be done correctly.
There are also ways to publish exploratory studies, but it has to be done in an honest way. If you want to describe a phenomenon/population/disorder/etc. that is not yet well known, and you do not have many hypotheses about the results, you can run a study where you gather a large amount of data, then analyse it and look at exploratory relationships, then publish a descriptive study where you say "we don't know a lot about this yet; here are some first explorations". The key part, however, is being honest about it, and not chopping your dataset into tens of subgroups just to find some effects. In truth, real exploratory research quite often has some form of interesting result, even if there are no group differences or no special new effects, because we still get new knowledge about a phenomenon or a population. But if you design an experimental study testing something pretty specific, like the effect of paying half-price on the satisfaction given by a meal, and you base your hypothesis and method on flawed, p-hacked literature, it's not really surprising that nothing will come out of it "naturally". So yes, @catyatzee4143, it's true that new revelations are often found randomly, and it's exciting when that happens! But it should always be stated when that is the case. Scientists should not try to pass these discoveries off as something they always thought was going to come out, based on their vast knowledge and great research expertise (lol). And more importantly, as @MR-ym3hg said, there have to be follow-up studies where researchers try to replicate the results and dig deeper into them.
Everything that everyone said above 💯 The problem with coincidental findings in an experiment is exactly that - they're coincidental! These kinds of findings are specifically atheoretical - the experiment was not testing for the random finding, it just kind of jumped into being. If you want to test for the coincidental finding later, and you replicate appropriately, that's great! But you can't say it's a genuine finding until it's tested. TL;DR: real science is resource intensive, and coincidental findings are not "bonus efficiency".
I think the Universities should make sure that when a PhD is graduating the candidate knows at least a minimum of research methodology and research ethics. It probably differs from institution to institution, and discipline to discipline, and maybe it has gotten better over the years, but when I graduated (in the STEM field) in the early 2000s, these aforementioned skills were optional, not compulsory. And I am not even sure, I even could take a class on research ethics. Now, years later, at the agency where I am working, an ethics board has recently been established and we have had courses and workshops about these issues to guarantee the quality of our research. Hopefully, we as a global research community are maturing. Good work Pete, for bringing this up.
My institution has research integrity seminars as part of the mandatory training for all new PhD students. I think this is becoming more common, but it won't fix the incentives problem.
@@RichardJActon Yeah, incentives is an additional dimension. Where I work now we are only vaguely affected by the "publish or perish" paradigm. It is secondary to solving the problems at hand of our clients, which are mostly other agencies.
Good stuff. Just because you asked for suggestions: I think you could maybe do a long episode with Chris Kavanagh and Matt Browne from Decoding the Gurus; they do Decoding Academia episodes that are great. They had a panel that talked about open science initiatives, etc. Can’t seem to find it on YouTube. Thanks for all of your hard work.
Anyone noting the satisfaction of postdocs paid $8 and those paid $0? Those paid $0 seem to be more willing to participate. Is this consistent with his hypothesis?
No. Postdocs and PhD students used to get paid much less and corruption was not nearly the problem it is now. The younger generations have all grown up in an environment where cheating and stealing (i.e. digital music, movies) carry no negative connotations, this is why there are so many corrupt people now.
I have a bit of a problem with some of what Pete Judo says. To be clear, my objections are not meant to defend Brian Wansink.

First, Pete calls the initial Italian buffet experiment a 'failed experiment' because the result obtained without slicing and dicing the data was that there was no relationship between customer satisfaction and meal price. A study that finds that there is no relationship between satisfaction and price is in no way a failure. It simply produced a result that disproves the hypothesis that satisfaction and price are correlated - in the specific case of moderately priced Italian restaurants (in central NY state, I imagine). Pete suggests the data set should have been "put in the file drawer". That makes no sense. Data showing that price and satisfaction are not correlated (at least for customers of moderately priced Italian restaurants) ought to be published - that conclusion would have been as interesting as the conclusion that satisfaction and price are correlated. If Pete doesn't bother to publish data that shows there is no relationship between two things that people have hypothesized might be linked, then he's failed to enlighten scientists in his discipline that there is no such relationship. Notice I've used the word 'failed' there.

Second, consider what seems to be Wansink's first email (at ~5:15) to the Turkish student about slicing and dicing the Italian restaurant data. If all Wansink had done so far was look at the bulk data and find no relationship, which is what the email suggests to me, then it seems to me that you'd certainly want to look for weird outliers, and you'd certainly want to know whether the satisfaction-versus-price relationship might be different between men and women. If that specific example is 'textbook p-hacking' then I'm all for it. Where things went after Wansink's initial suggestions with that Italian restaurant I don't know, because Pete doesn't explain.

Again, for the record: I'm not defending Wansink. I just think this part of Pete's discussion is problematic. Does p-hacking happen? I'm sure it does.
It should be industry standard for authors to first submit their hypothesis and methodology to a journal, be accepted on the basis of the quality of that, and then be guaranteed publication no matter whether the results are positive, inconclusive, or negative. With today's practice the p-value isn't very meaningful even when results aren't tampered with; the distortion happens anyway because negative results are almost never published. I am astonished this isn't done yet, even though it isn't a new idea at all. It's such an obvious flaw in today's science and really not that hard to fix.
This concept is known as 'registered reports' - check out the Center for Open Science's pages on it for more details. Quite a number of journals now accept submissions in this form (even high-profile ones like Nature), but they are not yet the standard - unfortunately. There is also some talk of integrating this into the grant process - i.e. funding at least part of a study based on the proposed hypothesis and methodology. This avoids too much overlap between the review of grants and of the study registrations, as they may otherwise end up being partially redundant.
@RichardJActon I wonder if this could be pushed further into some form of log, where you not only register the study but also update the most important information, like the value of n. Each change of n would require a justification that could be verified by editors (if they bothered to audit).
@samm7334 In short, yes. Using git and something like semantic versioning: major version number bumps only occur after a review, minors for small corrigenda/errata, patch for inconsequential stuff like typos; tie this in with versioned DOIs. Submit papers as literate programming documents where all the stats and graphs are generated by the code in the document. That way, to get the final paper, the document has to build and generate all the outputs from the inputs as part of the submission process. The method submitted with the registered report would ideally include running the code on example data, where possible for both a positive and a negative case. Then once you generate the real data, all you do is swap the example data for the real data and re-run the code. Then you can add another section for incidental findings if needed, but it is clearly separate from the tested prediction.
As an academic, you MUST publish whatever, and a LOT, in order to get a job or a scholarship. And since readers only "trust" what they read and have no critical thinking, you can only escape that by being an outsider and doing research on your own. Brian Wansink is not the problem; it is the system itself, it is the university as a whole, and also people who "trust science" (that is an oxymoron) or "trust" whatever is published in a "trusted" journal. It is also problematic to think that you are a better researcher just because you have published X articles in Q1 or Q2 journals. Academia is broken. Thank God the internet exists; there is a lot of excellent stuff out there, in public repositories or foreign journals. Btw, this channel is cool, I have subscribed.
Common practice when I was studying was for lecturers to force us to use source material published by writers I'd never heard of. Now, being a bit of a psychology nerd, I'd read work published by famous, unknown, and infamous researchers, simply out of interest. So, being presented with quotable 'authorities' I'd never heard of or read made me curious, so I did some digging. Long and short was, I found that there was a long-established mutual back-scratching circle, where lecturers would plagiarise researchers' work, write poorly written books built on poorly understood psychology without crediting the originator (or doing so in such vague terms that it was more like a polite mention than credit), with each lecturer in the circle then forcing their students to use other members of the circle as their quotable source. Not surprising to say that the majority of my fellows came away with the understanding that cheating is fine, so long as you don't get caught, or have the power/connections to squash your accusers. Perhaps starting at this basic level might be the way to go?
Pete, this is the third or fourth presentation I've seen from you, and your sincerity comes over very convincingly, as well as your superb diction. It is only in this film (the others dealt with the Gino scandal) that you turn to the possibility of a systemic problem, on which I agree with you. Clearly, solid data and reliable experimental work are the bedrock of your quest to clean up research science, and I cannot but suppose where problems might lie. And that is precisely the kind of surmising that you want to eradicate. I wish you good fortune and will follow your work with interest. You have now earned a subscription. Well done.
I think you have to reform the attitudes of university administrators and professors. Too many plagiarized and hacked their way into positions of authority and very few are willing to give up that power.
Maybe if there was a way to set up a system that rewarded positive replication of your work by only unaffiliated researchers you'd fix the system. No clue how you'd set that up or what it would look like, but if somehow you could create a reward structure around that, you'd fix science in general.
"How can we ever get anything published if you're going to be so damn ethical?" And. "What's P-Hacking?" Nothing like outing yourself. I guess we just need to keep a closer eye on this kind of thing.
Fascinating and distressing. And yet we have also the opposite problem as well. I have a colleague with an interesting yet speculative paper in physics. The physics and math is complicated and correct, but not in line with the mainstream. He cannot get it published in any journal. How do peer-reviewed journals publish trash and keep out interesting work? What criteria do they use? It's a rhetorical question.
Pete, how about a video talking about how the journals make money. Researchers have to publish to justify their existence. Journal ask peer reviewers to work for free, and either authors pay a charge to get it published or the journal requires payment from readers. Researchers sign over their copyright and often pay for the privilege
4:15 It's not even right to call it a failed experiment. It would be failed if the experimental procedure was flawed or they had to stop partway through. A successful experiment may give you a negative result. Your language here shows how ingrained this attitude is in science; even someone as disillusioned as you are is still talking this way.
Two things I think could've been added to this video: (1) An explanation for non-scientists as for why the salami-slicing of data is problematic (I don't think it's obvious to a layperson what exactly is the problem with this approach), (2) the notion that this is a decent approach for an explorative study to develop ideas / hypotheses, but that you then need to collect a new dataset specifically for this hypothesis to test whether the hypothesis holds without p-hacking.
@johntippin Here you go: there is randomness (noise) in every measurement you make. Even clearly one-sided empirical observations can happen by chance in a given sample. Say you compare one variable between two groups: any ratio of the variable between the two groups that you measure could be the result of random chance. Luckily, we can, using math, determine the probability of getting the observed outcome by pure chance alone (i.e., without there actually being any difference between the two groups). This probability of getting the observed result by pure chance alone is the p-value. Thus, p = 0.05 means there is a 5% chance of seeing a result like this even if there is no real difference between the two groups, and p < 0.05 is often taken as the criterion to accept a hypothesis.

The problem with salami-slicing is that you now perform this test not for one variable but for many variables, and you apply the 5% criterion to each of them. Thus, salami-slicing is another word for doing "multiple testing". If you look at 100 different variables in the dataset and apply the 5% criterion to each of them, then on average you will find about 5 variables that show a "significant" (p < 0.05) effect by pure chance alone, even when there is no real effect anywhere.
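If it helps, here is a small simulation of exactly that salami-slicing effect. The data are simulated with no real effect anywhere, and the slices are treated as independent groups, which is a simplification of how overlapping subgroups behave in a real dataset.

```python
# One dataset with NO real effect, sliced into more and more subgroups:
# the more slices you test, the more likely the best-looking one dips
# below p < 0.05 by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def prob_of_spurious_hit(n_slices, n_per_group=25, n_sims=2000):
    hits = 0
    for _ in range(n_sims):
        best_p = 1.0
        for _ in range(n_slices):
            a = rng.normal(size=n_per_group)
            b = rng.normal(size=n_per_group)  # identical distributions: no effect
            _, p = stats.ttest_ind(a, b)
            best_p = min(best_p, p)
        hits += best_p < 0.05
    return hits / n_sims

for k in (1, 5, 10, 20):
    print(f"{k:2d} slices -> P(at least one 'significant' slice) ~ {prob_of_spurious_hit(k):.2f}")
# Roughly 0.05, 0.23, 0.40, 0.64: slicing manufactures "findings" out of noise.
```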
If journals required authors to provide their original data, taking into consideration reasonable privacy concerns, I bet it would reduce p-hacking. Some years ago, there was a physicist at Princeton working in solid state physics. He was denied tenure. He then went on to publish a slew of papers, and received an offer as a full professor at the University of Illinois. His comment on the process was priceless: “Those bastards can’t read, but they can count!”
I work in the Data Analytics sphere, and it is constantly the same issue everywhere: 1st - you have the hypothesis you want to prove, and 2nd - you find the piece of data which supports and proves your idea (and discard all the data which does not support your hypothesis).
I think some percentage of all research funding should go towards a kind of science police. There are already policies in place for helping out underrepresented minorities in science. I think we should add "honest scientists" to that list😁
I advocate for a version of this that I call 'review bounties', modeled on the concept of bug bounties in the software development space. It is designed to solve two issues: the problem of unpaid reviews and the lack of incentives to find errors in published works. Instead of an article processing fee, someone wanting to publish posts a review bounty. Someone with a role like that of a journal editor in the current system arranges the review and hosts the resulting publication for a cut of the bounty; the reviewers each get a cut of the bounty if the editor thinks their review is of sufficient quality; and a portion of the bounty is retained as a bug bounty. If someone can find an error in the paper, they can claim the remaining funds, provided some combination of the reviewers, editor, and authors agree there is an error. If the bounty goes unclaimed, it accrues back to the authors. This also potentially allows grant-awarding bodies or other parties interested in the quality of a result to add to the bug bounty pot on results. That would let grant makers incentivize correctness/quality in the publications arising from their awards in a way they currently cannot. It could also let companies or investors with a financial interest in a result, e.g. one underpinning the development of a new drug, incentivize additional outside scrutiny so they can avoid sinking money into a project based on a flawed study.
The problem is that the people who are already supposed to do this are bad. In fact, those in academia who can't actually do science are disproportionately bad, and the same goes for this guy. You would basically need to force certain people into the job and build an institution with good incentive structures, though one of the clearest ways to get a good system is to get rid of the bloat and let a good system emerge. You get this thing where people think that because children are taught to read at a certain age through schooling today, they were incapable of learning it in the past. So people act like you can't have science without all this top-down regulation and bureaucracy, when in reality it's the other way around: all the bureaucracy is a sign that an area of society has become high status, and therefore people are trying to get in on it for personal gain.
Thanks to the comments explaining the term "p-hacking". Accelerates my crumbling respect for academics, done over the internet, as well as anything else done over the internet.
Hey, nice video! Have a question; not being disingenuous, honest. What's actually wrong with p-slicing, if the slices are big enough to be interesting? Or put another way, is it so wrong to start with data and look for patterns that you hadn't anticipated?
@@desertdude540 Does that really follow? I mean, other teams should test the theory later by getting new data, but in the case we saw in the video (pizza buffet goers) why can't the researcher say "Here is the data I gathered and here are the conclusions I drew from it"?
Carefully chosen data can show anything we want. For example, there are disproportionately fewer atheists in American prisons than Christians. Does that mean atheists have less criminal tendency than Christians? Not really. Atheists tend to be wealthier and more educated, and both wealth and education are related to lower incarceration regardless of someone's belief system. On the one hand, wealthy people don't have to be involved in crime in order to survive. On the other hand, a well-paid lawyer allows you to avoid prison time whether you're guilty or not. Not to mention that participating in a religious program can give you an early release for "good behaviour". It's more worthwhile for an atheist to lie about being a believer than for a believer to lie about rejecting their God.
It's something of a statistical problem. If you look for one specific relation, there's a probability it can occur by chance in your sample even if that relation doesn't hold generally. That's what the p-value is about, how likely is the data to be this significant by chance. The problem with p-hacking is that if you keep looking for more and more things, one of them is likely to occur by chance. It makes the study much less statistically significant than it appears.
I wanna send this video to all my old research methods profs to show their students as a worst case example. Just insane. Every last bit of what we are taught NOT to do, nevermind that he exposed himself!!
You are quite scintillating in your delivery. You talk fast, but you analyze faster. Keep going. It is refreshing to see what happens when science becomes a con job instead of a way to a better future.
Researchers can be very competitive, which leads to discouragement if they think they can't win whatever game they're playing (get published). This is made worse by sponsors and journals only promoting positive results. There needs to be a way for null results to be published and celebrated, because knowing if an answer is wrong is just as important as knowing if it's right.
in Paul's reply to Brian: "...if I'd been driving lots of projects fwd that a more experienced mentor was directing..." reading between the lines, going along to get along is the key to success in ego driven fields. conversely, if you rock the boat with honesty, you'll be thrown overboard. I don't miss my career in academia even a little.
Around 5:30: Just to know, would slicing the data like that to try to find a correlation or something, then making a study (with a proper sample) that focuses on that be alright? I feel like it would.
I had never encountered the term "p-hacking" before, but I had certainly encountered the concept; we used to call it "cherry-picking." I am not a scientist, merely an intelligent layperson, but it seems to me this brings up several issues. 1. There should be (and I'm pretty sure there are) two kinds of experiments: • Experiments designed to test hypotheses, which are the kind we're talking about here. • Experiments designed to answer questions. Say, "We've seen a sudden rise in deaths from (fill in the blank); what is causing it?" Good question, and one that needs answering, but we don't have enough information yet to form a hypothesis, so we need to collect information (AKA "data") that will suggest a hypothesis we can then test. Am I wrong about this? Are scientists ALWAYS supposed to begin with a hypothesis? 2. Scientists are judged by the number of papers they publish. Am I alone in thinking that this sounds lazy? It actively SELECTS for p-hacking and discriminates against ethical scientists, all because it's an easy statistic to get; it requires no one to dig deeper. I would be very interested to know what happened to the ethical grad student Wansink shamed in his blog post. 3. This is related to my second point: Apparently journals aren't interested in publishing null results. I can certainly understand this; if they did publish null results they could easily be overwhelmed. But null results are important; a list of what doesn't work is a good first step to finding what does work. And if scientists had a way to publish their null results there would be less pressure to engage in p-hacking. Perhaps someone should establish a new journal that looks only at null results.
I feel like finding out that people's opinions aren't based on how much they paid is pretty significant. Maybe needs a few more studies to confirm it, but that's pretty useful data if you're working out how to price food service.
I feel sorry for this daughter being pulled into the vortex of bad science. I hope she realises how she is being duped before too long. But it seems she is already being groomed by her father to go down the same path of bad science, submitting his guided research projects into school science fairs etc.
Sadly, even obviously ghostwritten papers in obviously worthless pay-to-publish journals can be a significant help in getting into certain programs at certain colleges. Some admissions offices really like to see "published research" without thinking much about the chances that a high-school student would, primarily by their own efforts, generate worthwhile research of general interest publishable in an academic journal. In this sense, it may sadly be the case that for an immediate purpose she's being helped, not duped.
By no means is this an excuse, but rather this is meant to serve as a precaution. Having met Brian, he was more of an activist than a scientist. He believed in the cause behind his research (choosing healthy foods over unhealthy foods), which led him to rationalize his methods. To spell out the cautionary part of my comment, and perhaps the obvious, we need to be especially vigilant for bias and methodological shortcuts when the research is aligned with our beliefs.
@@KaiHouston-m6j that is not what I said, nor do I believe it is what Brian would have thought he was doing. Reread what I wrote, or here read this briefer explanation: as an activist, he may have been naive to the (strong) role of bias in his method.
@@KaiHouston-m6j You seem very angry and this appears to be affecting your reading comprehension. I am not making excuses for him. Let me try again, using even fewer words this time: a scientist who is an activist may be vulnerable to bias.
So faking results is "OK" and you get out of it by calling the concern "bullying". If you love frauds so much, ask yourself how much of what you believe is truth, and how much is BS. Then look in the mirror. @@boundedlyrational
How do we make status in academia correlate more with scientific quality, and therefore with making predictions about immanent reality? A very tough question. A good place to start would be to destroy the bloat and let standards emerge from those who do.
My wife is doing her PhD in clinical psychology at Notre Dame...has about six months left in a 5-year program...and has straight up said that most of her colleagues are manipulating data to support radical feminist theories. Mostly women, but a surprising number of men. They're basically trying to validate an ideology that can't stand on its own by hijacking the science itself, which is inaccessible to most average people. It's disgusting. Another woman in her program had submitted two case studies to the U for review and was told to shut up, and then had her funding for her own lab slashed. She sent a letter from an attorney saying 'make this right or we're suing you', with the primary goal of forcing them and the study's authors to reveal what they knew/know during the discovery process if indeed it goes that far...we'll see what happens, but evidently ND has restored all of her funding and is basically trying to fast-track her to a professorship even though she hasn't completed her program yet, which suggests that they are kind of freaking out.
"They're basically trying to validate an ideology that cant stand on its own by hijacking the science itself" In other news water is wet... ;) "which is inaccessible to most average people." Average people kind of know anyway what is going in softer sciences, if anything it leads average people to also distrust legitimate research. As side note, on one channel dude was already showing analogies how in Great Britain in aftermath of reformation universities lost their fledgling scientific prestige and become effectively childcare for higher nobility with high emphasis on the dominating ideology which was Anglicanism at that time.
I wonder how long this has been going on.. even with other ideologies. Something needs to change with academic research. I think we would feel sick knowing the extent of the lies we have been told
The problem is that "clinical psychology" is not a 'science' by any stretch of the imagination. You wife is getting a PhD, I'm sure it is very hard work. but that field is not a 'science'. The problematic behavior you describe is a very good indication that it is not a science. Scientific results are reproducible by anybody not just your political ally. when your 'science' require you to be a 'believer' to see the effect. for your 'science' to work.. that is called a scam. Anybody can try to step over a clif, and no matter your politics or religion: you will fall. Gravity does not care what you think. Similarly, Biology explain clearly (and has for more than a century in great details) that sex is determined at fecundation, and not 'assigned at birth'
I *love* the fact that you give props to the author of the source material that you based this episode on. It's almost like you're being... intellectually honest!
You nailed the root of the problem near the end - quantity is rewarded, while quality often is not. A researcher who publishes 10 mediocre papers will be rewarded more than a researcher who publishes 1 excellent paper.
As well as Bad Pharma, it's also excellent - Ben was great on this stuff. He seems to have been relatively quiet of late though; I wonder what he's up to these days.
Subscribed. Looking forward to learning more. P-hacking is endemic in much of the "truths" and "facts" touted to support many current political stands.
03:01 Not based off of. Based ON. You cannot base something off of something else.
Only one listed citation in the video description? That's bad science, and bad reporting.
When most people say "p-hacking", they mean something like "removing that pesky outlier to get from p > 0.05 to p < 0.05".
Sigh. This. All of this.
Publish or perish doesn't apply to identity hires - clearly.
Science research abhors sloth. Justify your professional existence or choose another field.
Social Guesses you mean.
With all the labs I've taken in school so far, and putting the data together, it's not often that I'd get p < 0.05.
I refused to cheat, defended my thesis and got black-balled for not being "a team player". American science is fake news.
Imagine what people will do when millions of dollars are on the line to get the "correct" results.
Proving a model wrong would advance science, by showing that something is wrong. As Wolfgang Pauli put it, "That is not even wrong" ("That is not only not right, it is not even wrong!").
_"You pushing an unpaid PhD-student into salami slicing null-results intro 5 p-hacked papers and you shame a paid postdoc for saying 'no' to do the same."_ - quote of the week
“Shame” a paid postdoc. Spelling error.
The postdoc was very proud of his long hair and beard before Wansink shaved him as punishment
After he said that Orzge Sircher was a researcher from Turkey, I couldn't unhurr this enturr virdeo being rerd like the GersBermps merme
@@B3Band
-Heh. Bri"ish people-
Soon AI will research and publish papers that will be peer reviewed and found that the AI results cannot be ignored or refuted as the research will be based on a database that is catholic or universal in scope.
PhD's will mean sh^t by comparison. Social science will mean sh^t also. But you already sense that, if truth be told.
Fourth problem: researchers are not rewarded for publishing negative results, and are incentivised for original research but not for replicating prior studies.
Yes and this may be the most crucial problem. One of the most fundamental principles in science is that a result is not valid unless it can be replicated.
We have to be careful what we wish for. If we incentivize replicating prior study results we are likely to end up in a quagmire of fake replication studies. The paper mills will spin in full capacity to create studies that "replicate" other studies.
At least right now, if a study is replicated, we can be at least somewhat sure that it was actually replicated.
so so so many papers end up being unreplicated
That is very unfortunate. I guess people forget that in science even failure is a result.
Part of the problem with replication is that it's unlikely to get funded. Grant issuers want to fund original research. Even if a researcher wanted to do replication studies, getting the funding would be difficult. Maybe all grants should include funding for follow-up replication, and require it to be done?
Pete, you're laughing when you read out the emails, but what he does is literally the bread-and-butter approach of data analysts and data scientists in business. I've been fighting this stuff for 13 years and it's exhausting
It’s standard academic practice at this point, even if no one wants to admit it publicly.
I’ve seen things like this even in the “hard sciences”.
@@theondono it's worse, they're proud of it! When I do experiments my boss wants a 20% type-I error rate and still ignores the result of the test 😂
It happens in a lot of fields that rely on data. I am even guilty of this, not p-hacking, but manipulating data to reach a desired result.
I did real estate appraisals before and during the real estate crash. We routinely screwed with data sets and comparables in order to get a valuation the bank required for issuing a loan in order for the client to purchase the property.
While we did have some wiggle room if the data said one number and the number that was required was only a percentage or two higher, a significant number of real estate appraisers would swing for double digit manipulation.
Thankfully the trainer I had was fairly honest and we did not do as many bogus appraisals as others. Instead we would violate the portion where we gave a rough ballpark figure so the bank could decide if they wished to hire us or find another appraiser willing to be really shady. Unfortunately banks were notorious for blacklisting appraisers that tanked real estate deals by not "hitting numbers."
I only had one property that I feel was extremely wrong. The person I did the appraisal for happened to be someone that was able to get us a client that sent us enough work to keep two people employed. We received about 200k a year in wages from that company.
Not sure if it will make you feel better or not, but I’ve been fighting this for 33 years. It’s beyond p-hacking, it’s outright denial of business people to accept facts that go against their opinions.
Yeah, in industry it's common practice. My boss and other collaborators always ask "why don't you do more slicing?" We literally do tens or hundreds of tests without adjustment. I tried to get them to pre-specify a few sub-populations, but that's futile. Whenever the experiment doesn't turn out as expected, the response I get is always more slicing.
The weird thing about the buffet study is that finding no relationship between cost and satisfaction is also an interesting finding. There is value in that kind of finding, too.
Totally agree - you can learn something from almost every study if it's done properly. BTW, "p-hacking" is a skill that is highly rewarded in the corporate world, sadly.
Yup. I was wondering if it would be different if it was $10 vs $50 buffet. $4 and $8, obviously everyone still got a great bargain.
@@animula6908 I've had expensive buffets that I've enjoyed and I've had expensive buffets that were over-hyped ripoffs.
but it isn't good for the headlines
@@animula6908 The issue is, if given the choice, most people would choose 10 dollars. I wonder if they had a choice. Making the data imbalanced to 10 dollars over 50.
For a number of years, I was the Chair of my university's Institutional Review Board (which reviews and approves/disapproves research involving humans). The amount of crap research that we had to review from the social sciences was appalling...we had a number in which N (the size of the research group) was 1-2, from which they drew "conclusions". If there was any pushback from the IRB, they just made it a "qualitative" study or "preliminary study" to not have to do statistics. And the disregard for Federal guidelines for using research involving humans was scary. Luckily, what the IRB said could not be overruled by anyone, including the president of the university. But I made a lot of enemies across campus.
Yeah, reminds me of when I was a grad student and was the research assistant for an IE professor. He worked with a gem of a tenure-track mechanical engineering professor on a research topic, where the ME professor was responsible for the physical simulation and recording of results, and the IE professor (my boss) was responsible for the experimental design and data analysis, both of which he deferred to me.
So they do a few sample runs so we can get a good idea of the variance of the results. (Two between-subject factors, and a repeated-measure, in case anyone cares.) We discovered that there was actually little variance, so we have a meeting where we (IE professor and me) are happy to tell the ME professor and the client that we can probably have as few as 3 runs per cell of our design. This ME professor then asks why we can't do just one run per cell. I was appalled that someone would say something THAT much out of left field in front of our client, who was quite knowledgeable on experimental design and, you know, basic calculations regarding degrees of freedom. I was afraid my professor was going to have a stroke or something, so I quickly just pointed out that the math wouldn't work out.
I think that was the day that any respect I had for the title Ph.D. in itself died.
thank you for your work. I'm a female biomedical engineer who specializes in tissue and genetic engineering who did a lot of research during my time at my university, and we always had a running joke amongst engineers in my department, "biologists can't do math, and social scientists can't do anything period". its a real problem.
social science students are NOT science students in terms of STEM education and scientific practice - I want to make that clear. there is nothing scientific about their "studies" about 90% of the time, and their curriculum in school is the embarrassment of the scientific community. most don't even take calculus, let alone physics, chemistry or advanced natural sciences. they take mostly fluff courses like very basic anthropology (which is mostly about cultural norms), literature based courses, sociology based courses (which are also cultural in content), and psychology courses, which can be practically anything in terms of content at course levels beyond the beginner level course PSY101/PSY111 which is standardized. so the base curriculum itself is severely lacking in terms of scientific education, especially when compared to the education other STEM students are receiving.
due to the lack of a proper education in STEM, when it comes to social science studies, these people develop a foregone conclusion that they're trying to prove based on a fictional story they want to tell, instead of collecting existing data from a literature review or previous work THEN formulating a hypothesis. basically, they're developing an idea out of nowhere and then trying to prove it, which doesn't work, generally speaking. that's backwards. typically, you are simply studying a subject; let's say "the human behavior of purchase satisfaction". at this point, once you've decided on a subject of study, you conduct a literature review to see what has already been published on a subject. you may choose to peer review another study on that subject, or you may look through the data and methods of other previous studies to look for trends. once you have identified a study to peer review, or trends in previous literature, THEN you develop a hypothesis based on previous research on that subject or a similar subject, so that you are an educated expert on that subject before you start, and so you're making an educated guess with your hypothesis based on something tangible and real, not a wild guess based on your personal version of reality. this is also so that you don't inadvertently repeat a study that has already taken place, without realizing that you're peer reviewing, or repeating a study that has already been conducted and verified many times, so that resources are being spent to either strengthen previous findings, or develop new findings. we don't want to waste resources on useless subjects or on subjects that have already been extensively studied.
then you build your study around what you've learned from all of humanity's collective previous knowledge on the subject, collect your data, and look for trends with a large randomized sample size. if you don't find trends, you go back to the literature, and try to understand what went wrong, and try again after making modifications, or you choose another niche/subject. if you do actually find significance, you perform similar studies to collect more data and publish.
social science is doing the opposite. they are formulating a wild guess hypothesis based on their imagination (not anything concrete), putting together a poorly planned study due to their lack of scientific education with terrible sample sizes, no randomization, no controls, which is based on nothing but their fantasies, then collecting data in a way that doesn't make sense or is missing important aspects of data collection, then use Microsoft Excel to analyze the results for them because they don't understand the statistics themselves, and when they don't see anything significant (shocker!) due to the fact that they based their study on nothing but their daydreams, they delete outliers, eliminate dozens of data points until their sample size is tiny, or even more brazen methods of data manipulation until their study's data fits their pre-conceived narrative.
if Dr. Wansink had reviewed previous literature, he would have found that human beings in the situation he set up for his study will likely experience the same level of satisfaction. why? because they don't realize that they paid more, or paid less, than other people did for the same item. this is a study where the participants are not told that they paid more or less. they are simply served a meal, at a given price, then polled on their satisfaction. two people given the same food will experience the same level of satisfaction, especially with the price difference being unknown to them. the previous literature indicates this fact. only when people are *told* that they over or under paid for an item or service do their feelings on satisfaction begin to shift. like so many other social scientists, Dr. Wansink simply wanted to write a good fiction "story" to get published and picked up by the media, instead of doing real, useful science. I've yet to find a social science study that is practically useful, fully replicated and peer reviewed, scientifically sound, and not just common sense.
I've seen this more times than I can count out of both biologists and social scientists, but mostly social scientists. thank you for acting as a barrier between bad science and unwarranted funding/access to more resources including unwitting animal and human subjects. I'm sure you didn't make a lot of friends, but you did god's work. you should be proud of your integrity. cheers to you my friend.
I'm not surprised; I've heard a bunch of excuses from the social sciences as to why garbage results are acceptable. Apparently, the hard sciences are just easier to do reliably and always have been. There certainly hasn't been an effort over the course of centuries to wring as much reliability out of the methods as possible. It seems to me that so much of this is in part the result of treating crap results, like those with a P in the .7-.8 range, as quality work, when really that just indicates there's potentially a lot more to what's going on and that more work should be done to get better results, as .75 is not that much better than 50/50. It's certainly not good enough to do much with.
Yes, humans can and do vary, but that doesn't excuse the attitude that there's no need to figure out ways of wringing more reliability and reproducibility out of the test participants you can get. Sure, the results will never be as precise or generalizable as what you get from physics or chemistry, but there's a lot more that could be done if folks expected more when they designed and executed the experiments.
I agree with everything you said in 941 words. But I can easily say it in 751. Précis and word count are the engineer's friends. @@scarlett8782
LOTS of social scientists start with lit review, and form a hypothesis after years of study. What you are talking about sounds abnormal. Also the basic Anth as science is physical anthropology which ranges from evolution to genetics, biomechanics, forensics.
Yay, you did my suggestion. I was a grad student in Cornell and I had a class taught by Brian Wansink right before this story blew up. He came across as an “oblivious diva”, but tbh I think Cornell’s administration is also to blame for both not firing him and also not helping his grad students into new labs/research once he “chose to retire”
100x this. Exposing the academics who p-hack is one thing, another is asking the question: who was their department head for so many years and so many papers, and never bothered to do any quality control on the output of their own faculty? Once you dive into the data, it is often not too difficult to spot patterns of iffy science, especially if you are in the same building and hear the rumours about their postdocs refusing to do a certain project etc. etc. Quality control should be the job of the department heads, who in general should know the subject matter quite well and should be competent scientists themselves. Now the only control is quantity control, which can be done by the secretary of the department.... So shaming the department heads of cheating scientists would maybe help create an environment where scientific quality rather than quantity is rewarded.
At least Wansink didn't get Cornell to threaten legal action against the people who drew attention to his dumb blog post. Like another "academic diva" I could name.
@@Obladgolated there is always someone worse, still doesn’t make him a good person
@@pedromenchik1961your hypothesis is that there are infinitely many people?
I don’t think we even need to test that
Pretty much it takes an act of Congress to get someone fired from academia, literally!
You scroll quickly past Wansink's response to the guy asking if the post was a joke. He says that he wishes his tutors had pushed him to do this when he was younger, as this way he would have published more and wouldn't "have been turned down for tenure twice."
So beyond this doofus spilling the beans, isn't it also an indictment of the whole field? I mean if he's bragging about it, it probably means that this is utterly commonplace and even expected in his academic circles, no?
I think he was completely oblivious that this is wrong
It is. The institutions care more about outward trappings than the essence of science, and the people in charge who don't actually do science are especially like this. But it's also a language problem: these frauds can use the same language as genuine actors, and that surface level is the only one the former care about.
He doesn’t seem like one of the bad ones really. It’s more how he looked down on his employees. I don’t think it’s bad to look for what unites those who the effect does hold true for. He should follow up with another experiment to see if it’s significant or just a one off coincidence, too, but it sounds like looking for something, not fabricating it to me.
@@animula6908 He's looking for something to justify his fabricated hypothesis. Same thing.
Correct. Wansink's mistake wasn't the fraud, it was failing to realize the fraud was in fact fraud and needed to be kept quiet. All of his colleagues will continue to do exactly what he does, but they know better than to open their mouths
The biggest positive change for academia (imo) is for journals to publish papers where the researchers’ hypothesis was not ultimately supported by their data (either there was no findings either way or even if the data showed an effect completely different from what the researchers predicted). I know that this is less exciting for the news media but when science is so driven by exciting results and leaves out the “boring” stuff, it heavily incentivizes dishonesty in researchers.
You can learn as much from failure as from success.
This requires the invention of an incentive.
Say I run a journal. I want it to be a prestigious journal. I convince scientists that it is prestigious and that they want to submit their papers to my journal by maintaining a rate of citations. Lots of people cite papers published here, so publish with us!
If I now begin accepting null hypothesis papers, I am accepting papers that will tend to receive far fewer citations. Makes my journal look bad. I run a business, and you’re proposing a money losing idea.
Maybe journals could be required to have at least a certain percentage of their papers report null results? That way it won’t punish the journals that are encouraging honest science.
I sort of agree with your publication stance, but being able to cite a failed experiment based on "x" data and "y" hypothesis, can work wonders for metadata studies. Variables have to be accounted for, of course, but it is far easier to get grant and research money when you can say, "it's been done this way and that and failed so many times-we should look at this other hypothesis instead."
I reviewed a paper for a high impact journal once, where imho the graphs indicated results almost completely opposite to the text. This would not be obvious on a cursory read since the classification of the data and the fits obscured it. I wrote this in my review but still saw the results in a less prestigious journal about a year later. To their credit they communicated more uncertainty in the final published product, but the motivated analysis was still super clear. I suspect I also engage in at least some motivated analysis, but when it becomes sufficiently widespread that a large community is chasing the same expectation, it can get out of hand really fast.
Honestly, I think somebody should start journals dedicated to publishing repeat studies and rejected hypotheses.
I remember when this dude's shenanigans got revealed. Strengthened my resolve to ignore all "cutting edge" Psychology findings. If it holds up under rigorous scrutiny, I'll hear about it later. If it doesn't, I never needed to entertain the thought.
Something tells me there's no such thing as "cutting edge" psychology findings. If someone comes up with surprising new findings you've found your cheater.
Interesting. Books and philosophies (and music) that are "trending" really aren't in my personal radar till after a few years. Time separates the wheat and the chaff.
It's not just him.
It's the entire academic system--they're the ones who TELL THEM to do that.
My husband is a mathematician, & they wanted him to publish papers, regardless of how good they were. This baffled my husband, & he eventually left academia for the public sector.
It burns me up that honest people like him had to leave while con artists flourish.
That's good tho. Publish data even when the results aren't what you expected or they disprove your hypothesis. It means people in the future will know this has already been tried and they can make better decisions about what to try next
I don't think the garbage p hacking has any bearing on the field of mathematical publishing. Theorems are either sound or unsound, there is no 'wiggle room'.
That's bad tho...
The OG comment never said that the papers disproved her husband's theories, just that they were bad - maybe too short to give a real answer to the question/problem the paper was about, or the funding was not sufficient, so some results were heavily influenced.
Imagine publishing a paper about how vegetables aren't healthy when the study itself had a duration of one year and only enough money for 2 test subjects, and in the end everyone would think "damn, I always thought vegetables were healthy, but this paper said otherwise, guess I was wrong all along - and of course there's no need to redo such a study because it's already been done, maybe badly, but it has been done".
And in the end, the test subject who ate no vegetables didn't smoke (privately) and the one who did smoked 4 packs a day, but because the project didn't have enough money and these 2 were the cheapest option, you chose them.
(Or the sugar/health industry even encouraged wrong studies about it because it boosts their revenue.)
This would be absolutely shitty, and it actually happened in the past.
Like how cholesterol (fat in red meat etc.) was supposedly soooo unhealthy and at fault for most heart disease, and not sugar/high-fructose syrup - and because it was backed by studies, no one fought it for decades, because as you said, people had already "tested" it.
@@funfungerman8401 Even such a bad paper should be published. It's up to the people who read and want to cite it to ascertain its veracity.
@@likemy I didn't say P-hacking was in math. I said the pressure to publish was there, regardless of how good the paper was.
How do you fix the system? LET PEOPLE PUBLISH NULL RESULTS.
the publish or perish system combined with having to have something significant to publish incentivizes people to "make the data significant" however they may. If a study can be published even if not "successful", this behavior will likely decrease significantly
This is the way. It should be about accurate results, not results that soothe corporate interests.
This would also need a way to easily look up the set of total results for a particular topic so researchers can get a better understanding of why something has null results vs something else having "significant" results.
Journals won't do this because it drives down citations, thereby impact factor, and thereby the journal's incoming high-quality research.
How would you award grant money if everyone gets to publish?
@@kuanged I suppose grant money could be bestowed upon truly life changing scientific results that would bring us closer to solving world problems. Instead of just giving grants to studies that just confirm biases, maybe be a bit more prejudiced about which studies it goes to.
OR, the government could be more strict. Require the results of a study to be able to be reproduced by groups not connected with the primary study.
There are so many ways to overhaul a system that seems to reward shoddy work. 💁♂️
Null results shouldn't be seen as bad imo. It's definitely possible to have a null result that's just as notable or surprising as a positive or negative result.
The irony of a Behavior Scientist falling into bad behavior because of the "reward system".
It's great that we're learning how to trust the science, but not the scientists. For too long we've been woefully gullible.
I think the recent surge in anti-science rhetoric has forced the scientific and academic community to FINALLY crack down on bad faith actors like Brian. How can we convince society that science is trustworthy if the establishment keeps on letting this kind of nonsense slide?
I trust the scientific method. I most certainly do NOT trust The Science, which often has very little to do with the scientific method.
it doesn't matter how much awareness you spread, humans are always going to fall for it 😂
@@sssspider
im not sure if "sc13nce" is c~nsored, but your comment doesn't show unless sorting by new
@@WobblesandBean
not sure if "sc~ience" is c~ensored but your comment doesn't appear unless sorting by new
This is actually absolutely hilarious. He seemed to have been almost oblivious that his way of conducting research isn't legitimate at all, but that surely can't be right?
This happens at every university. There are multiple labs at every research university that do these things.
@@hipsterbm5134 It's one thing to take part in shady behavior, it's another to brag about it on the Internet.
People lie to themselves all the time. In every walk of life.
Not seemed to be, was; he starts his blog post by saying p-hacking is bad, but a deep data dive is good, so he clearly thought what he was doing was not p-hacking.
That's how a lot of these frauds are, willfully ignorant and only dealing with the superficial language level of reality but they game the social system for status really well
A few notes from a statistician.
3:33 Having data before a hypothesis is not *necessarily* bad science. It just needs to be understood as retrospective research. This happens a lot in medicine and is a cheap way to see whether moving forward with a larger, prospective study would be worthwhile.
4:20 One issue that perpetuates P hacking like this is referring to negative studies as "failed" studies. If we don't find a link between two things, that is as worthy of publication as finding a link.
5:12 This caused me physical pain.
For a statistician you seem remarkably uninformed. With error rates around 90%, doctors cannot successfully pass a basic test on probability and statistics. "Hypothesis Testing as Perverse Probabilistic Reasoning", Westover, Westover, Bianchi, BMC Medicine. Poor patients.......
A lot of the issues with p-hacking would go away if we shifted to Bayesian methods. Hypotheses from cherry-picked data would have a lower prior. It's mind-boggling that in 2024 we're still talking about p-values.
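For illustration only, here is a minimal sketch of one Bayesian alternative the comment gestures at: a conjugate Beta-Binomial comparison of two satisfaction rates, assuming numpy. The counts and the skeptical prior are made-up choices, not anything from the video or the original study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical counts: "satisfied" diners out of the total in each buffet price group.
successes_a, total_a = 42, 60   # cheap buffet (made-up numbers)
successes_b, total_b = 45, 60   # expensive buffet (made-up numbers)

# Skeptical Beta prior centred on "no difference": cherry-picked blips have to
# overcome this prior mass before the posterior moves much.
prior_alpha, prior_beta = 10, 10

# Conjugate update: the posterior for each group's satisfaction rate is also a Beta.
post_a = rng.beta(prior_alpha + successes_a, prior_beta + total_a - successes_a, size=100_000)
post_b = rng.beta(prior_alpha + successes_b, prior_beta + total_b - successes_b, size=100_000)

prob_b_higher = np.mean(post_b > post_a)
print(f"Posterior probability that the expensive group is more satisfied: {prob_b_higher:.2f}")
# Rather than a binary "significant / not significant", this gives a graded degree of belief.
```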
In medicine, we often do retrospective studies because doing so would violate ethical considerations to do it prospectively. This is probably the most common reason rather than "It's just easier". However, while we have the data before the hypothesis is developed, we expect the hypothesis to be formulated before evaluating the data. However, one can easily cheat so it probably happens often.
@@DocPetron This. Exactly. The 'answer is in the data' is bad science. Which in a data obsessed world is one of the reasons why science is going wrong. The answer is in the hypothesis!
My experience in academics was that there are enough people willing to simply follow the incentives that it becomes impossible for the ordinary person to participate. You either have to be a workaholic borderline genius or highly unethical.
For example, my attempts at collaboration were quickly rewarded by a professor with a paper mill getting a grad student to publish my idea without me. I learned to always have a fake idea to tell people I am working on, and to keep my real idea secret until it's formed enough to get it onto the public record.
People publishing stuff that barely worked was routine.
A good friend and former student (I had her as an undergrad) lost 2 years of her PhD work, delaying her own degree, when she and her fellow grad students turned in their neuroscience professor/advisor for manufacturing data. After working for several years in her field of expertise, she has decided to return to her home country and go back to being an accountant. She has had trouble finding work that doesn't leave her in poverty. I've wondered how much of that trouble is attributable to her reputation as someone who won't put up with fudged data.
What field of science?
"You either have to be a workaholic borderline genius or highly unethical." Or a mix of both. My PhD supervisor was a genius AND highly unethical. He p-hacked his way through his entire career, because he saw that was the way to game the system (see: unethical).
This is a matter of philosophy.
Preregistration of studies seems to be a good start. I wonder if this could be pushed further into some form of open journaling. Researchers would not just log their intentions but also their important steps. You could see how the value of n changed over the history of the study and could see their justifications for excluding samples. This could also be a good way to semi-publish negative results. Something like "tried X but did not work" could not only be valuable information for other researchers but also ease the pressure to publish positive results since there is still a public display of research being done.
BTW for anyone watching, there is literally NOTHING wrong with "cutting up and slicing" the data to look for relationships and results, so long as you test whether those results hold on a different data set, ideally one gathered separately.
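A minimal sketch of that discipline, assuming Python with numpy/scipy and entirely made-up variable names and data: slice freely on an exploratory half, then run exactly one pre-chosen test on a held-out half (a weaker stand-in for collecting a genuinely new dataset, as the comment recommends).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Made-up dataset: one outcome and several candidate grouping variables.
n = 400
outcome = rng.normal(size=n)
groups = {name: rng.integers(0, 2, size=n).astype(bool)
          for name in ["male", "repeat_customer", "ordered_dessert", "sat_near_window"]}

explore = np.arange(n) < n // 2   # first half: look for anything interesting
confirm = ~explore                # second half: touched only once, at the end

# Exploratory pass: slice freely and note the most promising split (no claims yet).
candidate = min(groups, key=lambda g: stats.ttest_ind(outcome[explore & groups[g]],
                                                      outcome[explore & ~groups[g]]).pvalue)

# Confirmatory pass: ONE pre-chosen test on data that played no part in the search.
t_stat, p_value = stats.ttest_ind(outcome[confirm & groups[candidate]],
                                  outcome[confirm & ~groups[candidate]])
print(f"Chosen split: {candidate}; confirmatory p = {p_value:.3f}")
```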
Precisely. Looking for interesting stuff is how we generate hypotheses, but finding a weird artifact and then creating a reason post-hoc is stupidly weak argumentation.
I was just looking for this comment. It's an entirely valid method for identifying possible future hypotheses for further research, but shouldn't be considered conclusive in and of itself.
I was wondering about that. I'm not a scientist, but it seems like looking for patterns is what scientists _should_ be doing; albeit that should just be one step towards a future study, rather than trying to retroactively fix the previous one.
As a non-scientist but with a science based degree that was my immediate thought, so thanks for confirming it.
> "for anyone watching, there is literally NOTHING wrong with "cutting up and slicing" the data"
J. B. Rhine of Duke University used this method -- he picked the results that best agreed with his ideas about ESP. After he passed on, his spectacular confirmations for ESP evaporated once someone put the less exciting (and sometimes negative) results back into the data. That "someone" is now called a scientist.
Rutgers University Psych department has a professor named Alan Gilchrist who told grad students that he would not sign a dissertation unless it produced results supporting his model. The department kept him for decades even though he worked with grad students but never graduated a single one until 1991 when I defended a thesis to the other three committee members. It was the first PhD granted against the recommendation of the faculty advisor. He made sure I could never get a job with my PhD. After several decades, I sent the PhD diploma to the landfill
so sorry to hear about what you had to go through :(
sounds like a mentally sick man to me
Does any one person really have the power to deny someone a job in his field, anywhere in the country? And is the alleged victim so important to him that the director would take the trouble?
😂
We need a journal of nonsignificant findings. Somewhere where the status quo, if supported, is documented. That would go a long way toward giving credit to researchers for the work that they do when the results aren't "interesting".
During my own PhD program, I became discouraged and frustrated by the "Matthew Effect". Given that p
9:20 Absolutely INSANE that this guy just plainly put "hello, please do scientific malpractice on this paper and get back to me" in writing on his university e-mail and he didn't immediately face disciplinary action.
all points to credentialed LOW IQ, very dangerous. But the other fools in the department did not catch it either. LOL
I think a lot of p-hackers really don't realize they're doing anything wrong, although maybe that's been changing recently.
It’s pretty much institutionalized in pharma, finance, food science, energy, politics - likely many others. If someone told me this Cornell guy used to work for Kraft or similar, it would all make sense.
The root problem is with institutions incentivizing constant success. That's not realistic. They will only give grants and coveted tenured positions to people who are consistently churning out desirable results. But that's not realistic, the world doesn't work that way.
It's like business investors demanding that profit margins must ALWAYS be increasing, every fiscal year, forever. As if inflation wasn't a thing and there were unlimited resources, manpower, and disposable income to throw around. It's delusional thinking.
There is a slippery slope between an explorative study and p-hacking. It is legit to make an explorative study about a dataset to look at it in different ways and generate different hypotheses for later validation on another dataset. E.g. Look at small dataset A, get an idea of what could be the case, then collect large independent dataset B to test that hypothesis. After all this is how technically all hypotheses are generated, at the very moment you decide to focus on one hypothesis to test you exclude several other hypotheses you could have had. Publishing the step of looking at dataset A separately is not p-hacking.
Similarly, you can also slice up dataset A and test for multiple things, but then you have to correct for multiple testing across literally everything you looked at. I.e., for something you find to still count as significant, the p-value threshold has to get smaller and smaller the more things you tested.
@@doctorlolchicken7478 Say you work for a car insurance company and they ask you to find population segments where your rates are mispriced. The data is fixed, and there are endless possible hypotheses (age, gender, car color, number of previous accidents, etc). How do you proceed?
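One hedged answer to that question, sketched in Python (numpy/scipy assumed; the "segments" here are simulated noise, not real insurance data): screen as many segments as you like, but shrink the per-test threshold, e.g. with a Bonferroni correction, so the chance of even one false alarm across the whole search stays near 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical loss data for 25 customer segments compared against the book as a whole.
n_segments = 25
alpha = 0.05
p_values = []
for _ in range(n_segments):
    segment = rng.normal(loc=0.0, scale=1.0, size=200)   # pretend: segment loss ratios
    book = rng.normal(loc=0.0, scale=1.0, size=2000)     # pretend: overall book
    p_values.append(stats.ttest_ind(segment, book).pvalue)
p_values = np.array(p_values)

# Naive screen: how many segments look "mispriced" at the raw 5% level?
print("raw hits:", int((p_values < alpha).sum()))

# Bonferroni: each test must clear alpha / number-of-tests, which keeps the overall
# chance of even one false alarm at roughly alpha across the whole search.
print("Bonferroni hits:", int((p_values < alpha / n_segments).sum()))
```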
Well they were rewarded with money and fame for it. Why would they think it was wrong?
Heard at an academic conference:
Questioner: Your regression is terrible!
Presenter: We have done worse together.
ha ha ha, that's a really sick academic burn!
I would be more trusting of a 0.06 value than a 0.05
The 0.05 threshold itself was arbitrarily set, so...
@@aDarklingEloi Not the point.
Lol, funny, my advisor told me to do all these things too, but he was never dumb enough to send those requests in emails... only in group meetings
Yeah, I guess that's how these so-called 'sciences' will solve the problem: learn to cheat better
4:20 Why was that called a failed experiment? One that produces results opposite those expected has not failed. An experiment fails only when it fails to give an answer, or if something rendered the conclusion invalid, or something similar.
9:00 P < 0.05 is very close to two standard deviations away with a normal distribution.
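A quick check of that two-standard-deviations remark, assuming scipy is available:

```python
from scipy import stats

# Two-sided p = 0.05 on a standard normal corresponds to |z| above this cutoff:
z_cutoff = stats.norm.ppf(1 - 0.05 / 2)
print(f"|z| cutoff for two-sided p = 0.05: {z_cutoff:.3f}")  # about 1.960, i.e. roughly 2 standard deviations
```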
Great job! Things will only get better once we call out bad practices, and not treat scientists as infallible
7:22 I think the best bit about this is that he was told how bad this looked, given an opportunity to say it was a joke, and he still replies "I meant it all as serious"!
Awesome job exposing and shaming these frauds! I think that is part of the solution to fixing the academic system. Because if nobody exposes and shames frauds, there is almost no downside to faking your data, and many people do it. On the other hand, the more fraudsters are exposed, shamed, and fired, the less enticing it will be for others to commit fraud. Keep up the good work!
We should publish and reward people who get null results. Knowing what is not true and does not work is actually valuable.
a single rejection of the null hypothesis establishes and proves nothing. It just casts suspicion. All inductive or statistical hypotheses must be established by repeated observation and testing, even the absolute rejection of the null.
Pre registration of methods is a pretty simple system to implement and I think it would solve the problem of positive result bias pretty well
I wish we could reward research with valuable or interesting null results. Knowing that there may not be any relationship between the price of food and satisfaction should be worthwhile on its own.
We should also incentivize replicating studies.
I'm sure the worry is that people will not make new science and will continuously repeat old studies, or publish papers about null results without surprising insights. But we've gone too far in the other direction, where researchers are heavily incentivized to fake results, chase fads, and churn out lots of low-quality papers that are hopefully "new" or "interesting" enough to keep them employed.
For me it was interesting. I thought that paying more would trigger some kind of sunk-cost reaction, but it looks like humans are more resilient than I thought
Which is great
Pete, I seriously love your content and suggest it to all grad students and aspiring researchers I know. You deliver the content in a unique way and the focus of your channel is so damn important. People need to know!!
Thank you, Pete for sharing a refreshing point of view - what can we do to incentivize PROFESSORS to be truthful? - how ironic that we have to even ask this question - and good for you to not only share the problem, but move us toward brainstorming solutions - this is the kind of thinking that can help people grow toward healing and restoration.
My 12yo daughter loves research and your videos have opened her eyes! Thank you!
Just like there’s “jury duty” there should be “replication analysis duty” (in addition to reviewing). Right now, only competitors have an incentive to critically analyze others’ research, and when they do, there is suspicion of bias (which sometimes is real).
The idea is that if you’re on duty, you have to endorse, criticize or say you cannot do either.
Remember that, by definition, a p-value of 0.05 means a result at least that extreme happens 5% of the time by pure chance, even if your hypothesis is wrong. If you test your hypothesis on 20 subsets of your data, you might well find one with p < 0.05 just by chance.
There is an xkcd comic about just this: it reports a correlation between jelly beans and acne.
If you do 20 tests _and make the appropriate statistical adjustment for multiple comparisons,_ and still get a result with an adjusted _p_ below 0.05, that result is far more credible.
Actually, the p-value starts with the assumption that your hypothesis is wrong (i.e. the null hypothesis is true) and tells you how frequently random chance would produce the results you have observed. So a p-value of 0.05 suggests that random chance would produce the observed results once in 20 identical experiments. The p-value tells you nothing about the probability that your results are due to chance. That is a Bayesian question, and p-values are frequentist statistics.
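A small simulation sketch of my own (not from the paper or video; the group sizes are arbitrary) makes that frequentist reading concrete: when two groups are drawn from the same distribution, about 1 in 20 comparisons still comes out "significant".

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    # Both groups come from the SAME distribution, so the null hypothesis is true.
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    if ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

# Roughly 5% of experiments come out "significant" even though nothing is there.
print(false_positives / n_experiments)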
I am looking forward to how you think it can be fixed! This whole topic goes far beyond even just academic issues, but hits all walks of life!
I think there are two ways. Neither is easy.
First, change the current culture and reward (and punishment) system in research. Our current culture celebrates and rewards researchers who work on trendy topics that break new ground, while ignoring the majority of hardworking researchers doing normal (in Kuhn's terminology) science. Without the latter, the former could not rest on firm foundations or be turned into practical applications that benefit humanity. Both types of research are important.
Second, change or emphasise the good values that individuals hold themselves accountable to: commitment to the truth, service to others, doing no harm, sacrifice, etc.
The wild part of the first dataset slicing example is that there are ways he could have looked at the data more ethically. If he did not want to just publish the null result, I could understand trimming the outliers and publishing both together in a transparent way. Or, he could have had the student investigate the data to try and inform the development of a different hypothesis for a follow up study as opposed to butchering the data for the sake of significance sausage.
Null results are underrated anyways. A good experiment can still produce profound null results. One of my favorite papers I've had the pleasure of working on is largely a null result because of the value getting that result still provides.
Preregistration of hypotheses and compulsory publication of non-findings.
As a former graduate student in statistics, this was very painful to watch. I believe that qualified statisticians should be consulted for statistical analysis of scientific studies, and that statisticians should be very involved in the peer review process prior to scientific journal publication.
Getting statisticians involved is a great idea, and unfortunately, won't work under the current framework.
Delaying publishing, splitting authorship, and working with someone on equal footing whose primary job is to find gaps in my research?
@@kap4020 Finding gaps in your research is the primary goal of peer review. Done poorly in many journals, but most of the top ones take this seriously, fortunately. I submitted to Cell recently and the reviewers found many 'gaps', for which I am very grateful, because I actually want to write as accurate and scientifically rigorous a paper as possible. This is how science should be done.
I have mixed feelings on the idea because depending on the research one might need experts in other fields to vet one’s work. The lines between science and engineering subfields are blurred. So, the list of experts needed grows quickly. Plus, I believe most academics won’t bother to check someone’s work unless they are getting some of that research money and/or included in the author list.
I can only speak for the physical sciences, and I know that scientists and engineers could brush up on their statistics. Either they need to take more statistics, or take the standard introductory sequence that statistics majors take, where the fundamentals are explored in more depth. I regret taking the one-semester programming course that a lot of non-computer-science majors take instead of the two-semester sequence that CS majors took.
But statistics can only go so far, bad data is bad data.
If the original hypothesis doesn’t work out in a study, but a new revelation is found, why can a scientist not roll with that? Or must they just do a whole new study with new parameters to ensure it’s valid?
Just a non scientist wondering!
A great question! Going off the cuff here, but I think it comes down to what that p-value really means. p = 0.05 means that, if there were no real effect, a result this extreme would show up about 5% of the time by coincidence. So if you slice up the data 20+ ways, you're essentially hunting for that coincidence and then not telling people how many times you jumbled your analysis, so it seems like the new "result" is what you were looking for from the outset. You're right that there's a thread of a good idea here: it also might not be coincidence, but you would have to do a new study, with new data that you didn't slice and dice, to prove that.
Edit: scare quotes on the word 'result'
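A rough back-of-the-envelope check of that "20 slices" point (my own sketch, assuming the slices are independent, which real subgroups usually aren't):

```python
# Chance of at least one spurious p < 0.05 across 20 independent slices of the
# data when there is no real effect anywhere (independence is a simplification).
k = 20
alpha = 0.05
p_at_least_one_false_hit = 1 - (1 - alpha) ** k
print(p_at_least_one_false_hit)  # ~0.64: roughly a 64% chance of "finding" something
```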
The thing is, this revelation can be due to random chance, so you would have to account for all the possible revelations you looked for when calculating statistical significance, and you would need to be honest about what you did in the paper (which he wasn't) for the calculation to be done correctly.
There are also ways to publish exploratory studies, but it has to be done in an honest way. If you want to describe a phenomenon/population/disorder/etc. that is not yet well known, and do not have many hypotheses about the results, you can run a study where you gather a large amount of data, analyse it, look at exploratory relationships, and then publish a descriptive study that says "we don't know a lot about this yet, here are some first explorations." The key part, however, is being honest about it, and not chopping your dataset into tens of subgroups just to find some effects. In truth, real exploratory research quite often produces some form of interesting result, even if there are no group differences or special new effects, because we still gain new knowledge about a phenomenon or a population. But if you design an experimental study testing something as specific as the effect of paying half-price on the satisfaction given by a meal, and you base your hypothesis and method on flawed, p-hacked literature, it's not really surprising that nothing comes out of it "naturally".
So yes, @catyatzee4143, it's true that new revelations are often found by chance, and it's exciting when that happens! But it should always be stated when that is the case. Scientists should not try to pass these discoveries off as something they always knew would come out, based on their vast knowledge and great research expertise (lol). And more importantly, as @MR-ym3hg said, there have to be follow-up studies where researchers try to replicate the results and dig deeper into them.
Everything that everyone said above 💯
The problem with coincidental findings in an experiment is exactly that - they’re coincidental!
These kinds of findings are specifically atheoretical - the experiment was not testing for the random finding, it just kind of jumped into being.
If you want to test for the coincidental finding later, and you replicate appropriately, that’s great!
But you can’t say it’s a genuine finding until it’s tested.
TL;DR: real science is resource intensive, and coincidental findings are not “bonus efficiency”
@@JS-oh2dp Nice explanation, a simple thumb up did not feel like enough of an appreciation for it :-)
I think universities should make sure that a graduating PhD candidate knows at least a minimum of research methodology and research ethics. It probably differs from institution to institution and discipline to discipline, and maybe it has gotten better over the years, but when I graduated (in a STEM field) in the early 2000s, these skills were optional, not compulsory. I'm not even sure I could have taken a class on research ethics. Now, years later, at the agency where I am working, an ethics board has recently been established and we have had courses and workshops about these issues to guarantee the quality of our research. Hopefully, we as a global research community are maturing. Good work Pete, for bringing this up.
My institution has research integrity seminars as part of the mandatory training for all new PhD students. I think this is becoming more common, but it won't fix the incentives problem.
@@RichardJActon Yeah, incentives are an additional dimension. Where I work now we are only vaguely affected by the "publish or perish" paradigm. It is secondary to solving the problems at hand for our clients, which are mostly other agencies.
Good stuff. Just because you asked for suggestions: I think you could do a long episode with Chris Kavanaugh and Matt Browne from Decoding the Gurus; they do Decoding Academia episodes that are great. They had a panel that talked about open science initiatives, etc. Can't seem to find it on YouTube.
Thanks for all of your hard work.
Anyone noting the satisfaction of postdocs paid $8 and those paid $0?
Those paid $0 seem to be more willing to participate.
Is this consistent with his hypothesis?
No. Postdocs and PhD students used to get paid much less and corruption was not nearly the problem it is now. The younger generations have all grown up in an environment where cheating and stealing (e.g. digital music, movies) carry no negative connotations; this is why there are so many corrupt people now.
@@ohsweetmystery No.
I have a bit of a problem with some of what Pete Judo says. To be clear, my objections are not meant to defend Brian Wansink.
First, Pete calls the initial Italian Buffet experiment a 'failed experiment' because the result obtained without slicing and dicing the data was that there was no relationship between customer satisfaction and meal price. A study that finds that there is no relationship between satisfaction and price is in no way a failure. It simply produced a result that disproves the hypothesis that satisfaction and price are correlated--in the specific case of moderately priced Italian restaurants (in central NY state I imagine). Pete suggests the data set should have been "put in the file drawer". That makes no sense. Data showing that price and satisfaction are not correlated (at least for customers of moderately priced Italian restaurants) ought to be published--that conclusion would have been as interesting as the conclusion that satisfaction and price are correlated. If Pete doesn't bother to publish data that shows there is no relationship between two things that people have hypothesized might be linked, then he's failed to enlighten scientists in his discipline that there is no such relationship. Notice I've used the word 'failed' there.
Second, consider what seems to be Wansink's first email (at ~5:15) to the Turkish student about slicing and dicing the Italian restaurant data. If all Wansink had done so far was look at the bulk data and find no relationship, which is what the email suggests to me, then it seems to me that you'd certainly want to look for weird outliers, and you'd certainly want to know whether the satisfaction-versus-price relationship might differ between men and women. If that specific example is 'textbook p-hacking' then I'm all for it. Where things went after Wansink's initial suggestions with that Italian restaurant data, I don't know, because Pete doesn't explain.
Again for the record: I'm not defending Wansink. I just think this part of Pete's discussion is problematic. Does p-hacking happen? I'm sure it does.
How about exposing the journals that actually allow these papers to be published? Don't they know what he's all about at this point?
Glad to hear the plan, several others must have advised the same - use your platform to help fix this rubbish. You have the majority on your side
It should be industry standard for authors to first submit their hypothesis and methodology to a journal, be accepted on the basis of the quality of those, and then be guaranteed publication no matter whether the results are positive, inconclusive, or negative. With today's practice the p-value isn't very meaningful even if results aren't tampered with; the distortion happens anyway because negative results are almost never published. I am astonished this hasn't been done yet, even though it isn't a new idea at all. It's such an obvious flaw in today's science and really not that hard to fix.
I like this idea a lot
This concept is known as 'registered reports' - check out the Center for Open Science's pages on it for more details. Quite a number of journals now accept submissions in this form (even high-profile ones like Nature), but they are not yet the standard, unfortunately. There is also some talk of integrating this into the grant process - i.e. funding at least part of a study based on the proposed hypothesis and methodology. That would avoid too much overlap between the review of grants and of study registrations, as they may otherwise end up being partially redundant.
Yes, registered reports are a great idea.
@@RichardJActon I wonder if this could be pushed further into some form of log, where you not only register the study but also update the most important information, like the value of n. Each change of n would require a justification that could be verified by editors (if they bothered to audit).
@@samm7334 In short, yes. Use git and something like semantic versioning: major version bumps only occur after a review, minor versions for small corrigenda/errata, patch versions for inconsequential stuff like typos, and tie this in with versioned DOIs. Submit papers as literate-programming documents where all the stats and graphs are generated by the code in the document. That way, to get the final paper, the document has to build and generate all the outputs from the inputs as part of the submission process. The method submitted with the registered report would ideally include running the code on example data, where possible for both a positive and a negative case. Then, once you generate the real data, all you do is swap the example data for the real data and re-run the code. You can then add another section for incidental findings if needed, but it's clearly separate from the tested prediction.
As an academic, you MUST publish whatever, and a LOT, in order to get a job or a scholarship. And since readers only "trust" what they read and have no critical thinking, you can only escape that by being an outsider and doing research on your own. Brian Wansink is not the problem; it is the system itself, it is the university as a whole, and also the people who "trust science" (that is an oxymoron) or "trust" whatever is published in a "trusted" journal. It is also problematic to think that you are a better researcher just because you have published X articles in Q1 or Q2 journals. Academia is broken. Thank God the internet exists; there is a lot of excellent stuff out there, in public repositories or foreign journals. Btw, this channel is cool, I have subscribed
I'm a registered dietitian and Wansink's work was regarded as revolutionary when I was in grad school and shortly after. Very disappointing.
easily deceived
4:05 This is not a "failed experiment." It's a successful experiment with the result that the price of food had no clear effect on satisfaction.
Common practice when I was studying was for lecturers to force us to use source material published by writers I'd never heard of. Being a bit of a psychology nerd, I'd read work published by famous, unknown, and infamous researchers, simply out of interest. So being presented with quotable 'authorities' I'd never heard of or read made me curious, and I did some digging. Long and short of it: I found a long-established mutual back-scratching circle, where lecturers would plagiarise researchers' work, write poorly written books built on poorly understood psychology without crediting the originator (or doing so in such vague terms that it was more a polite mention than credit), and each lecturer in the circle would then force their students to use other members of the circle as their quotable sources. Not surprisingly, the majority of my fellow students came away with the understanding that cheating is fine, so long as you don't get caught or have the power/connections to squash your accusers. Perhaps starting at this basic level might be the way to go?
was this a U.S university?
@@kap4020 I share awareness of a significant academic fraud, and your takeaway question is where did it happen? OK.
@@fatedtolive667 It seems like a reasonable question IMO.
Pete, this is the third or fourth presentation I've seen from you, and your sincerity comes over very convincingly, as well as your superb diction. It is only in this film (the others dealt with the Gino scandal) that you turn to the possibility of a systemic problem, on which I agree with you. Clearly, solid data and reliable experimental work are the bedrock of your quest to clean up research science, and I cannot but suppose where problems might lie. And that is precisely the kind of surmising that you want to eradicate. I wish you good fortune and will follow your work with interest. You have now earned a subscription. Well done.
I think you have to reform the attitudes of university administrators and professors. Too many plagiarized and hacked their way into positions of authority and very few are willing to give up that power.
clever but brutish animals
Great work illuminating only one of the problems with "science" today!
Maybe if there was a way to set up a system that rewarded positive replication of your work by only unaffiliated researchers you'd fix the system. No clue how you'd set that up or what it would look like, but if somehow you could create a reward structure around that, you'd fix science in general.
"How can we ever get anything published if you're going to be so damn ethical?" And. "What's P-Hacking?" Nothing like outing yourself. I guess we just need to keep a closer eye on this kind of thing.
Fascinating and distressing. And yet we have also the opposite problem as well. I have a colleague with an interesting yet speculative paper in physics. The physics and math is complicated and correct, but not in line with the mainstream. He cannot get it published in any journal. How do peer-reviewed journals publish trash and keep out interesting work? What criteria do they use? It's a rhetorical question.
“does it serve me” and “maybe do i like him”, i think its the former…
Great !!! Great !!! thanks Peter Judo, new to your channel! excellent content! Thanks, seriously, your content does a lot of good to humanity!!!
Man i love nerd drama, i have no idea who he is talking about but its so juicy.
Pete, how about a video about how the journals make money? Researchers have to publish to justify their existence. Journals ask peer reviewers to work for free, and either authors pay a charge to get published or the journal requires payment from readers. Researchers sign over their copyright and often pay for the privilege.
What institutional changes could prevent incentivizing cheating?
Career advancement not being based so much on journal publications, and accepting fewer people into PhDs so there's less competition.
4:15 It's not even right to call it a failed experiment. It would be failed if the experimental procedure was flawed or they had to stop partway through. A successful experiment may give you a negative result. Your language here shows how ingrained this attitude is in science; even someone as disillusioned as you are still talks this way.
My teacher of Statistics called the people who adhere to p
P
Two things I think could've been added to this video: (1) An explanation for non-scientists as for why the salami-slicing of data is problematic (I don't think it's obvious to a layperson what exactly is the problem with this approach), (2) the notion that this is a decent approach for an explorative study to develop ideas / hypotheses, but that you then need to collect a new dataset specifically for this hypothesis to test whether the hypothesis holds without p-hacking.
Could you explain why salami-slicing is bad to me?
@@johntippin here you go:
There is randomness (noise) in every measurement you make. Even clearly one-sided empirical observations can happen by chance in a given sample. Say you compare one variable between two groups: any ratio of the variable between the two groups that you measure can be the result of random chance. Luckily, we can, using math, determine the probability of obtaining a result at least as extreme as the one observed by pure chance alone (i.e., without there actually being any difference between the two groups). This probability is the p-value. Thus p = 0.05 means there is a 5% chance that a result this extreme would arise by pure chance even if the groups did not really differ (note this is not the same as a 95% chance that a real difference exists - that would be a Bayesian question). A p-value below 0.05 is often taken as the criterion for rejecting the null hypothesis.
The problem with salami-slicing is that you now perform this test not for one variable but for many variables, and you apply the 5% criterion to each of them. Salami-slicing is thus another word for "multiple testing". If you look at 100 different variables in the dataset and apply the 5% criterion to each of them, then on average you will find about 5 variables that show a "significant" (p < 0.05) difference purely by chance, even when there is no real effect at all.
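To make the "100 variables, about 5 false positives" point above concrete, here is a small simulation sketch of my own (the group sizes are arbitrary assumptions):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_variables = 100   # how many "slices"/variables we test
group_size = 50     # arbitrary group size

# Two groups that genuinely differ on nothing: every variable is pure noise.
group_a = rng.normal(size=(group_size, n_variables))
group_b = rng.normal(size=(group_size, n_variables))

spurious_hits = sum(
    ttest_ind(group_a[:, v], group_b[:, v]).pvalue < 0.05
    for v in range(n_variables)
)

# Typically around 5 of the 100 variables look "significant" purely by chance.
print(spurious_hits)
```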
If journals required authors to provide their original data, taking into consideration reasonable privacy concerns, I bet it would reduce p-hacking.
Some years ago, there was a physicist at Princeton working in solid state physics. He was denied tenure. He then went on to publish a slew of papers, and received an offer as a full professor at the University of Illinois.
His comment on the process was priceless: “Those bastards can’t read, but they can count!”
Could you link the source in the description?
Done :)
@@PeteJudo1 Thanks
I work in the data analytics sphere, and it is constantly the same issue everywhere: first, you have the hypothesis you want to prove, and second, you find the piece of data which can support and prove your idea (and discard all the data which does not support your hypothesis).
I think some percentage of all research funding should go towards a kind of science police. There are already policies in place for helping out underrepresented minorities in science. I think we should add "honest scientists" to that list😁
I advocate for a version of this that I call 'review bounties', modeled off the concept of bug bounties in software development. It is designed to solve two issues: the problem of unpaid reviews, and the lack of incentives to find errors in published works. Instead of an article processing fee, someone wanting to publish posts a review bounty, and someone with a role like that of a journal editor in the current system arranges the review and hosts the resulting publication for a cut of the bounty. The reviewers each get a cut of the bounty if the editor thinks their review is of sufficient quality, and a portion of the bounty is retained as a bug bounty. If someone can find an error in the paper, they can claim the remaining funds, provided some combination of the reviewers, editor, and authors agree there is an error. If the bounty goes unclaimed, it accrues back to the authors. This also potentially allows grant-awarding bodies or other parties interested in the quality of a result to add to the bug-bounty pot on results, letting grant makers incentivize correctness and quality in publications arising from their awards, which they currently cannot. It could also let companies or investors with a potential financial interest in a result, e.g. one underpinning the development of a new drug, incentivize additional outside scrutiny so they can avoid sinking money into a project based on a flawed study.
The problem is that the people who are already supposed to do this are bad at it. In fact, those who can't do in academia are disproportionately bad, same as with this guy. You would basically need to force certain people into the job and have an institution with good incentive structures, though one of the clearest ways of getting a good system is to strip out the bloat and let one emerge. You get this thing where people assume that because children are taught to read at a certain age through schooling today, they were incapable of learning it in the past. So people act like you can't have science without all this top-down regulation and bureaucracy, when in reality it's the other way around: all the bureaucracy is a sign that an area of society has become high status, and therefore people are trying to get in on it for personal gain.
Thanks to the comments explaining the term "p-hacking". It accelerates my crumbling respect for academics, done over the internet, as well as anything else done over the internet.
Hey, nice video! Have a question; not being disingenuous, honest. What's actually wrong with p-slicing, if the slices are big enough to be interesting? Or put another way, is it so wrong to start with data and look for patterns that you hadn't anticipated?
There's nothing wrong with doing it to form a hypothesis, but any experiment to test that hypothesis must rely entirely on new data.
@@desertdude540 Does that really follow? I mean, other teams should test the theory later by getting new data, but in the case we saw in the video (pizza buffet goers) why can't the researcher say "Here is the data I gathered and here are the conclusions I drew from it"?
Carefully chosen data can show anything we want
For example, there are disproportionately fewer atheists in American prisons than Christians. Does that mean atheists have fewer criminal tendencies than Christians?
Not really
Atheists tend to be wealthier and more educated, and both wealth and education are related to lower incarceration regardless of someone's belief system. On the one hand, wealthy people don't have to be involved in crime in order to survive. On the other hand, a well-paid lawyer allows you to avoid prison time whether you're guilty or not.
Not to mention that participating in religious programs can get you an early release for "good behaviour". It's more worthwhile for an atheist to lie about being a believer than for a believer to lie about rejecting their God.
It's something of a statistical problem. If you look for one specific relation, there's a probability it can occur by chance in your sample even if that relation doesn't hold generally. That's what the p-value is about, how likely is the data to be this significant by chance. The problem with p-hacking is that if you keep looking for more and more things, one of them is likely to occur by chance. It makes the study much less statistically significant than it appears.
I wanna send this video to all my old research methods profs to show their students as a worst case example. Just insane. Every last bit of what we are taught NOT to do, nevermind that he exposed himself!!
You are quite scintillating in your delivery. You talk fast, but you analyze faster. Keep going. It is refreshing to see what happens when science becomes a con job instead of a way to a better future.
Researchers can be very competitive, which leads to discouragement if they think they can't win whatever game they're playing (get published). This is made worse by sponsors and journals only promoting positive results. There needs to be a way for null results to be published and celebrated, because knowing if an answer is wrong is just as important as knowing if it's right.
It's not just positive results the journals are looking for, but frequently there is an agenda (depending on the subject and what's "hot").
in Paul's reply to Brian:
"...if I'd been driving lots of projects fwd that a more experienced mentor was directing..."
reading between the lines, going along to get along is the key to success in ego driven fields. conversely, if you rock the boat with honesty, you'll be thrown overboard.
I don't miss my career in academia even a little.
It seems to me that some of these studies like Elmo stickers on apples are pretty silly and Cornell researchers have way too much time on their hands.
Around 5:30: Just to know, would slicing the data like that to try to find a correlation or something, then making a study (with a proper sample) that focuses on that be alright? I feel like it would.
So I have this guy to blame for shitty school lunches?
I had never encountered the term "p-hacking" before, but I had certainly encountered the concept; we used to call it "cherry-picking."
I am not a scientist, merely an intelligent layperson, but it seems to me this brings up several issues.
1. There should be (and I'm pretty sure there are) two kinds of experiments:
• Experiments designed to test hypotheses, which are the kind we're talking about here.
• Experiments designed to answer questions. Say, "We've seen a sudden rise in deaths from (fill in the blank); what is causing it?" Good question, and one that needs answering, but we don't have enough information yet to form a hypothesis, so we need to collect information (AKA "data") that will suggest a hypothesis we can then test.
Am I wrong about this? Are scientists ALWAYS supposed to begin with a hypothesis?
2. Scientists are judged by the number of papers they publish. Am I alone in thinking that this sounds lazy? It actively SELECTS for p-hacking and discriminates against ethical scientists, all because it's an easy statistic to get; it requires no one to dig deeper. I would be very interested to know what happened to the ethical grad student Wansink shamed in his blog post.
3. This is related to my second point: Apparently journals aren't interested in publishing null results. I can certainly understand this; if they did publish null results they could easily be overwhelmed. But null results are important; a list of what doesn't work is a good first step to finding what does work. And if scientists had a way to publish their null results there would be less pressure to engage in p-hacking. Perhaps someone should establish a new journal that looks only at null results.
to be clear, he published that paper with his daughter obviously so that she can put it on her college applications
I feel like finding out that people's opinions aren't based on how much they paid is pretty significant. Maybe needs a few more studies to confirm it, but that's pretty useful data if you're working out how to price food service.
I feel sorry for this daughter being pulled into the vortex of bad science. I hope she realises how she is being duped before too long. But it seems she is already being groomed by her father to go down the same path of bad science, submitting his guided research projects into school science fairs etc.
Sadly, even obviously ghostwritten papers in obviously worthless pay-to-publish journals can be a significant help in getting into certain programs at certain colleges. Some admissions offices really like to see "published research" without thinking much about the chances that a high-school student would, primarily by their own efforts, generate worthwhile research of general interest publishable in an academic journal. In this sense, it may sadly be the case that for an immediate purpose she's being helped, not duped.
8:00 LOL, he must have thought to himself: "Now I'm going to remove this post from the internet..." - LOLz ROFLCOPTER...
By no means is this an excuse; rather, it is meant to serve as a precaution. Having met Brian, he was more of an activist than a scientist. He believed in the cause behind his research (choosing healthy foods over unhealthy foods), which led him to rationalize his methods. To spell out the cautionary part of my comment, and perhaps the obvious: we need to be especially vigilant for bias and methodological shortcuts when the research is aligned with our beliefs.
So..."The ends, justify the means"? smh
@@KaiHouston-m6j that is not what I said, nor do I believe it is what Brian would have thought he was doing. Reread what I wrote, or here read this briefer explanation: as an activist, he may have been naive to the (strong) role of bias in his method.
Naive? F'er has a Ph.D and looks to be in his late 40's. Making excuses is exactly how stuff like this happens.@@boundedlyrational
@@KaiHouston-m6j You seem very angry and this appears to be affecting your reading comprehension. I am not making excuses for him. Let me try again, using even fewer words this time: a scientist who is an activist may be vulnerable to bias.
So faking results is "OK" and get out with the concern bullying. If you love frauds so so much, ask your self how much of what you believe is truth, and how much is BS. Then look in the mirror.@@boundedlyrational
How do we make status in academia correlate more with scientific quality, and therefore with making accurate predictions about immanent reality? A very tough question. A good place to start would be to destroy the bloat and let standards emerge from those who do.
Can you talk about the recent plagiarism accusations against Harvard's Gay? It is difficult for a normal person to follow.
Coming up!
Highly recommend what James Heathers wrote about that situation; great resource
@@iantingen Heathers is great. Where can I find that?
Thanks for making this video and getting this story heard
My wife is doing her PhD in clinical psychology at Notre Dame...has about six months left in a 5-year program...and has straight up said that most of her colleagues are manipulating data to support radical feminist theories. Mostly women, but a surprising number of men. They're basically trying to validate an ideology that can't stand on its own by hijacking the science itself, which is inaccessible to most average people. It's disgusting. Another woman in her program had submitted two case studies to the U for review and was told to shut up, and then had her funding for her own lab slashed. She sent a letter from an attorney saying 'make this right or we're suing you', with the primary goal of forcing them and the study's authors to reveal what they knew/know during the discovery process, if indeed it goes that far...we'll see what happens, but evidently ND has restored all of her funding and is basically trying to fast-track her to professorship even though she hasn't completed her program yet, which suggests that they are kind of freaking out.
Interesting, been wanting to go into clinical psych myself.
"They're basically trying to validate an ideology that cant stand on its own by hijacking the science itself" In other news water is wet... ;)
"which is inaccessible to most average people." Average people kind of know anyway what is going in softer sciences, if anything it leads average people to also distrust legitimate research.
As side note, on one channel dude was already showing analogies how in Great Britain in aftermath of reformation universities lost their fledgling scientific prestige and become effectively childcare for higher nobility with high emphasis on the dominating ideology which was Anglicanism at that time.
I wonder how long this has been going on.. even with other ideologies. Something needs to change with academic research. I think we would feel sick knowing the extent of the lies we have been told
If by "radical feminist theories" you mean propping up male perversions then sure, ill believe you
The problem is that "clinical psychology" is not a 'science' by any stretch of the imagination. You wife is getting a PhD, I'm sure it is very hard work. but that field is not a 'science'.
The problematic behavior you describe is a very good indication that it is not a science. Scientific results are reproducible by anybody not just your political ally. when your 'science' require you to be a 'believer' to see the effect. for your 'science' to work.. that is called a scam. Anybody can try to step over a clif, and no matter your politics or religion: you will fall. Gravity does not care what you think.
Similarly, Biology explain clearly (and has for more than a century in great details) that sex is determined at fecundation, and not 'assigned at birth'
I *love* the fact that you give props to the author of the source material that you based this episode on. It's almost like you're being... intellectually honest!
There's never enough good Italian pizza, doesn't matter if you paid $4 or $8.
No pineapple!🙀
@@capt.bart.roberts4975, he said Italian Pizza, so the lack of Pineapple is implied, lol 😂
If you don't get any cannoli, the reviews are going to suffer (at least that's my hypothesis...).
You nailed the root of the problem near the end - quantity is rewarded, while quality often is not. A researcher who publishes 10 mediocre papers will be rewarded more than a researcher who publishes 1 excellent paper.
Bad science is excellent, read the whole book!
As is Bad Pharma - it's also excellent. Ben was great on this stuff; he seems to have been relatively quiet of late though, I wonder what he's up to these days.
@@RichardJActon I must read Bad Pharma
Subscribed. Looking forward to learning more. P-hacking is endemic in much of the "truths" and "facts" touted to support many current political stands.