7:40 I think this is unfair. Sometimes the dataset is simply too large to host online. Sometimes the data is confidential. As for the code, the ideal is indeed to release it. From experience, though, well-documented working code that reproduces all the results, with graphs, is sometimes almost as much work as writing the manuscript itself. But most of the time when I wrote to authors for their code, they were kind enough to share it with me.
I think the uncut video recording of the experiments must also be published, unless such video would compromise people's identities. For example, if you are studying microbiology, the video recordings of you preparing blots, putting them in the microscope, etc. need to be published.
Regarding the scientists publishing a ridiculously high number of papers: is that really a sign of fraud, or does it just mean that the scientist is running a huge lab and all the junior researchers are putting his name on their work?
Even with a big group, 200 papers is way beyond reasonable. There is no way that person has seriously contributed to those papers. I'd be surprised if they had even read them all.
@@danielschein6845 Indeed, but the two seem to be conflated in a lot of comments on this type of video. Also, it inflates the risk that errors or fraud by the main author or contributor go unspotted.
Great video like always, thanks! I wonder, though, if you could also look into the darker fallacies of science? The ones that are done more on an unconscious level. The obvious examples are (and I think you already touched upon those in past videos):
- Statistical correction for multiple testing. In biology, p < 0.05 is widely accepted as "significant", after Bonferroni (etc.) correction for multiple testing. That still means that, on average, 1 in 20 tests of a true null hypothesis will yield a "significant" result. Well, in certain fields, hundreds of papers are published every year - so we end up seeing all the false positives, while the negative results are much less likely to reach the eye of the reader.
- Statistics in general (I won't go into cohorts here, that's a topic all by itself). Statistics is based on an assumed random distribution, whatever that distribution might be. Especially biologists, who are not experts in statistics, working with statisticians, who are not experts in biology, can easily produce way too optimistic p-values without being aware of it. In my experience, reviewers only rarely catch that, because they are either biologists or statisticians.
- Reproducibility. This is particularly apparent in GWAS studies; only a tiny fraction can ever be reproduced by anyone else, and even then, never exactly. Yet the claims of GWAS ("you have a 14% higher chance of getting this or that disease if...") are often widely reported by the media, hence the incentive to do these studies. Even if the statistics used are sound, this comes down to case vs. control selection, sampling bias, etc., which invalidates the underlying assumptions. I think there is a need to also publish negative results with respect to reproducibility, as well as positive ones. Because the proof is ultimately in the pudding.
All in all, yes, there are the obvious bad apples in the scientific community, the ones who willfully falsify results. I think these are in the minority though, and that most false results are published out of (tolerated) ignorance.
I think about this all the time. Great description of p-values, thank you for the sanity check. A 1-in-20 chance of accidentally rejecting the null hypothesis is pretty damn high!
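To make the 1-in-20 point above concrete - a minimal simulation sketch (an illustration added here, not anything from the video; the test counts and seed are arbitrary): every test below has a true null, yet roughly 5% come out "significant" uncorrected, while the Bonferroni threshold catches almost none.

```python
import numpy as np
from scipy import stats

# Simulate 1,000 two-sample t-tests where the null hypothesis is TRUE
# for every single test: both groups are drawn from the same distribution.
rng = np.random.default_rng(0)
n_tests, n_per_group = 1000, 30

p_values = np.array([
    stats.ttest_ind(rng.normal(size=n_per_group),
                    rng.normal(size=n_per_group)).pvalue
    for _ in range(n_tests)
])

alpha = 0.05
print("significant, uncorrected:", int((p_values < alpha).sum()))            # ~50, i.e. ~1 in 20
print("significant, Bonferroni: ", int((p_values < alpha / n_tests).sum()))  # almost always 0
```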
Coming at this from a Computerphile angle, I think that in addition to publishing the original data, an academic standard akin to hash-fingerprinting the data behind each visual ought to be developed, if such a thing is possible, along with provenance standards for data and visuals.
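A minimal version of what that fingerprinting could look like - a sketch assuming the raw data behind a figure lives in a single file (the file name and figure number are hypothetical; this is not an established standard):

```python
import hashlib

def fingerprint(dataset_path: str) -> str:
    """Return a SHA-256 digest of a raw data file, suitable for embedding
    in a figure caption so readers can verify the figure's provenance."""
    h = hashlib.sha256()
    with open(dataset_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage: publish this digest alongside Figure 3.
# print("Figure 3 data:", fingerprint("raw_measurements.csv"))
```

Publishing the digest in the caption would let any reader verify, years later, that the archived dataset is byte-for-byte the one the figure was drawn from.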
What do you mean, primary author? If you only mean author, I don't think so... because sometimes you need collaboration across fields for a piece of research, for example biologists and physicians; then, if it is the work of students, you need to include the supervisors and PI. If you need a technology only available in another lab, you'll have to include whoever conducted the experiment in the other lab... I'd say a max of 8-10 authors is sufficient for most research.
This type of mindset is what excluded me from 2 papers to which I had contributed significantly. They thought there were too many authors listed, so they excluded the undergrad students.
It depends a lot on the area. In some areas like particle physics, you'll see A LOT of authors. In other areas, like math or philosophy, multiple authors is rare. That said, I do agree with your wife that author lists have gotten pretty out of hand in some areas... social psychology, I'm talking about you.
"And occasionally these things do happen in SCIENCE, right by NATURE they have to." Was this pun intended when talking about rare and surprising results making it to high-impact journals?
#6: genealogical connections to peerage. (These are invariably the wastrel children of hidden elites gifted sinecures in prestigious fields, with ghost-drones creating "their" work; this is easy to get away with in fields where big cheeses have lots of assistants under them, and where "findings" generally do not have any immediate practical benefits such that there's an incentive to attempt duplication during the course of product development. E.g., much if not most of theoretical physics has been treasury-soaking fraudulent hokum for nearly a century.) #7: any "public scientist" (which should be an oxymoron by now) should be heavily scrutinized.
Very good discussion. I think point 5 can be a problem in some areas of research, where privacy requirements pull against data sharing. For example, getting permission to release data from actual patients or minors can be quite onerous. That said, bad faith researchers can and do make use of these requirements as a shield. One of the worst examples I've encountered in my professional existence---won't name names but I'm talking B grade horrid, not A grade like the ones that have been getting outed to substantial publicity, such as that real peach Francesca Gino---loved to hide behind the IRB. In my time working with that team, I didn't see any data made up, but I did see quite a lot of really shady data practices.
@@User-y9t7u I'm not naming names on UA-cam, but I filed complaints to the relevant authorities, including the grant agency. As to the practices, basically the PI would state that data would not be released due to FERPA (educational data privacy law) but then was perfectly happy to do things that pretty clearly violated FERPA, such as give out datasets to people on the team who had no business having things like participant names. Simply put, she didn't want to have anyone else seeing the data. I wasn't convinced she understood the difference between research and data analysis done for course improvement, quite honestly. She's one of those people who's charismatic and clever, not actually smart.
Data availability: for one paper, I wanted to look at the data myself. It was supplied as a PDF; however, the data was protected by a password. I was able to copy the results by hand, but this was a nasty trick.
It's funny when they say "Upon reasonable request," as if there is such a thing as a request to see data (personally identifying information notwithstanding) that isn't reasonable.
Many of the papers in computer science now provide the data and the programs used to create the data, so the results can be reproduced. However, it is problematic to recreate some of these things because of the amount of time and the knowledge it takes to set up the actual "experiment". In the case of biology, providing the data usually means providing the readings taken in the lab. In the case of computer science, providing the program and the results means you can redo the lab experiments. Setting up these experiments and doing them takes a lot of work; however, this is the classic "reproduction" that we expect from people who do pure science. That is, if you say you created a cure for a disease, we usually don't trust that you have a perfect cure until other people have seen it. Unfortunately, with many drugs there is no "100% cure", so people can fudge the data and then say: well, my results were different because of the statistical sample; I had a 20% remission rate and you did not see it because of this other factor. When there is no clear answer, i.e., a 100% cure, it is much easier to misrepresent the results and get away with it.
It depends hugely on what the "author's" role is in producing the paper. A senior prof may have a peripheral mentoring role for a large number of PhD students and post-docs. BUT it is a red flag for extra scrutiny.
I used to love publishing papers. But... There is science, and there is science. One is pursuing truth, and the other is pursuing paper quotas. While you may pat your own shoulder for catching the bad scientists, you'll find that those are found in the latter branch, but in the process you're also punishing the former, truth-seeking branch, raising suspicion of everything. Now, it may come as a surprise to you, but only institutionalised scientists are under constant pressure to publish or perish. So I, as an independent scientist with a PhD in electrical engineering in my pocket and no pressure at all to play a rigged game - won't play it. Without any institutional backup, and with the ever-rising participation fees, it is also much easier on my pocket. Now, whose loss would that be? Also, the small matter of contribution. A single paper is good with a single contribution, however insignificant. Real researchers used to pour their whole research into a paper, with many significant contributions in a single paper. But if you're pursuing quotas, you'll make sure that 20 or so people milk a single contribution to death and back. Such skimmed papers are now the norm, and real research is now suspicious, as out of place and too bombastic. Go figure.
I am not a scientist. The discouraging aspect is that this undermines my confidence in any research. Even if I try to look into a paper, it would be almost impossible for a layman like myself to establish its validity.
Thus, social media's fascination with citing academic papers to support their opinions, yet matched with revulsion to the science establishment as a whole. You get people who simultaneously swear that anti-parasitics must work against covid because the FDA has approved them for human use, and yet that the same FDA must have rushed the vaccine emergency use approval. 🙃
I've published a few papers and the group I am in publishes many. Journals will not host our data as it is too large. Some fields have community solutions for these issues, most don't.
Do you consider results that are very socially popular as falling in the "conflicts of interest" category? Maybe affecting the works of the likes of Roland Fryer and others who research such touchy topics?
If a pseudo-scientist hits on results which the general public WANT to see as being valid and true, you can bet that the P-hacking will commence, baby! 💪😎✌️ There's HUUUGE money in telling billions of people what they already hoped was "true". 😂
Do you "first author"? I find it very surprising you didn't specify this. Last authors, for example, are typically lab/science area heads who will appear on every paper authored by their team.
Another way is looking at how diverse the fields of publication are. If someone publishes 40 papers a year, but all of them are LLM research, OK: LLMs are exploding right now, and with enough GPUs you can automate lots of experiments and some of them will get interesting results. If they publish 10 papers - one in AI, one in cybersecurity, one in robotics, etc. - you may wonder how a person can be an expert in so many fields.
I've read many papers where the abstract reads more like a Billy Mays ad than a summary of the research, and I've read plenty of papers with spelling/grammar/formatting mistakes. I have no idea if there's a higher incidence rate of fraud in such papers, but it always makes me way more suspicious.
It’s important to change the whole apparatus built around the scientific method. Having to pay to be part of the scientific discussion is not acceptable, peer reviewers who review badly are unacceptable, Publish or Perish is unacceptable.
My father did brain and vision research and he did surgery on his "control" animals to adjust them for his desired results. Nobody ever checked his animal colonies, just his data. Fooled his graduate students, too, who assisted in the experiments. The data was real, but the theories and results were bunk. He wept, routinely, carrying on about how much he loved his research animals for their "beautiful and selfless sacrifice" for science. He always said it in exactly the same way he said he loved me. I think he really enjoyed the power to kill them slowly while taking apart their brains and giving them stimulus to respond to as they died. I caught him when he brought home one of his animals, a cat, whose eye was torn. I accused him of cheating when he told me its cornea was torn in ocular surgery, because he introduced it as a retired control animal. (For some reason he initially assumed I was upset that it wouldn't be able to sleep or shut out light.) That was the only time he blatantly and openly threatened my life. His theories were eventually discredited, since none of his results could be replicated. And he lost his position at the university after years of sexual advances and predatory pressure on his female students. I guess sociopaths gotta be sociopathic. I felt more relief than anything, when I heard he had died.
Beware of generalizations... "data available upon request" or "data available at http..." is in many cases a move in the good direction you state (and which I support), and the only means of telling your potential readers that you're willing to share your data.
Can you please make a video on how to spot AI-generated text in papers? My professors say "it's so obvious by the way it is formatted", but I can't really see it. Can you please share your thoughts on how to spot it?
A red flag that should prompt investigation would be cultivating loyal followers and building celebrity status. These are invalid reasons for believing an individual's hypotheses. BUT there is a big complication: some very good scientists have also produced false theories and suppressed rivals' better theories, e.g. Georges Cuvier's suppression of Lamarck's evolution, and Karl Pearson's suppression of research into causal inference. In both cases, men who had made major contributions to their field also held their field back for three generations.
Realistically, how would you implement “publishing all the data alongside the article”? Data is often expensive and proprietary. Once that data is published because someone published a paper with it, no one else would ever buy the data. They can just download it from the journal. The data providers will then never sell the data to academics in the first place.
An orange flag, if not quite a red one, is someone getting too many successful grant applications in fields other than their own, for example, AI to address some topic in the humanities.
We need a "Day zero" reset before AI completely undermines conventional fraud detection. Removing all bad actors now would act as a strong deterrent, but it would also remove the commercial market for fraud. The last 20 years of papers should be for fraud using image checking and similar software. Then, not only should all affected papers be redacted, but also all authors, labs, and companies involved in the fraud should be blacklisted for future publication. That blacklisting should only be lifted with increased scrutiny of publications on an individual basis.
The solution is to make it irrelevant to have large numbers of papers. Use a logarithmic metric, with a maximum at - say - log(12 papers/year), decreasing for larger numbers of papers per year, down to zero if you publish - say - 24 papers or more per year. Actually, I think the numbers should be closer to half of those.
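A sketch of that metric as I read it (the 12/24 numbers are the commenter's "say" values; the +1 inside the log is an added assumption so that zero papers scores zero):

```python
import math

def publication_score(papers_per_year: float,
                      peak: float = 12.0,
                      cutoff: float = 24.0) -> float:
    """Score grows logarithmically up to `peak` papers/year,
    then falls linearly to zero at `cutoff` and beyond."""
    n = max(papers_per_year, 0.0)
    if n <= peak:
        return math.log(1 + n)  # log growth; +1 keeps score(0) == 0
    if n >= cutoff:
        return 0.0
    # linear ramp from the peak value down to zero
    return math.log(1 + peak) * (cutoff - n) / (cutoff - peak)

# e.g. publication_score(6) ≈ 1.95, publication_score(12) ≈ 2.56,
#      publication_score(18) ≈ 1.28, publication_score(30) == 0.0
```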
How do you get the scientific community and journals change their baseline practices? Is there a committee/professional organization that defines these?
I publish a dozen or more papers a year, though only a few as first/sole author. In planetary science we often work in broad, flat teams of individual 'independent' scientists associated with space missions (without the more feudal lab-head/minion hierarchy associated with a lab). The mission data are contractually obliged to be publicly available.
Administration takes +40% off of any incoming grant, claiming administration costs to support research efforts. "There just isn't the manpower or infrastructure doing this kind of work." Ah yes, if there's anything universities are light on it's the infrastructure and oversight...
The 10-papers-per-year limit is too low imo; it really depends on the scientist's position and career stage. For young scientists, 3-4 would already be a high number, since they typically run the studies. However, renowned scientists often collaborate with many different labs and supervise multiple people. Many of them end up co-authoring 20 papers without any cheating involved. The "reasonable" number of publications depends a lot on what work is being done, and how well-established the person is. Professors typically do a lot of *reviewing* of nearly finished papers, which takes much, much less time than *conducting and writing* the study. The number of studies presented to them increases as they gain more connections within their field. Some people (especially professors) are also workaholic human machines who will review your work as soon as you ask, even if they're in the middle of their vacation in a remote mountain range. Now, if we are talking about 10 first-author papers, I would agree that the number is high regardless of position.
10 as primary author, yeah, would be suspicious, but I think my advisor averages around 10 overall per year, although only 1 or 2 of them with him as primary author that weren't invited articles. Nothing suspicious about it, we do a very niche kind of physics simulation, and he ends up being cited for fine tuning and advising other groups on planning their simulations. 50+ is very blatantly sketchy, though, even as a secondary author.
A software solution that sweeps for things that aren't straight up copies sounds very difficult and potentially problematic. I think that, at best, something like that should flag stuff for human review. It could be difficult to use unless it can give the human reviewer an account of its reasoning (which modern AI systems mostly cannot).
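For context, "sweeps for things that aren't straight-up copies" usually means perceptual hashing. A minimal sketch (a difference hash; the distance threshold and file names are illustrative assumptions) that, per the comment's suggestion, would only flag image pairs for a human to review:

```python
from PIL import Image

def dhash(path: str, size: int = 8) -> int:
    """Difference hash: similar images get similar hashes,
    so it catches near-duplicates, not just exact copies."""
    img = Image.open(path).convert("L").resize((size + 1, size))
    px = list(img.getdata())  # row-major grayscale pixels
    bits = 0
    for row in range(size):
        for col in range(size):
            left = px[row * (size + 1) + col]
            right = px[row * (size + 1) + col + 1]
            bits = (bits << 1) | (left > right)  # 1 bit per adjacent-pixel comparison
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Flag suspiciously similar figure pairs for HUMAN review (threshold is a guess):
# if hamming(dhash("fig2a.png"), dhash("fig3b.png")) <= 10: ...
```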
From personal experience in the role: if expert reviewers of publications have to go through all the original data, then a far larger pool of reviewers will be needed. Experts already have to devote quite a lot of time to reviewing just the bare publication - work which is usually unpaid by the journal editorial boards, I should add. Where are these extra experts going to come from?
I'm relatively new to academia and struggling to get one paper published a year because I spend too long sweating the details. It really disheartens me to see people publishing loads and loads every month!
Regarding the "10 Papers a year" thing. The reason why I am stating this, is because I assume to be an author you need to at least have "read" and "understood" it. Both of these need time. if you are publishing a paper a day, when are you actually doing the data generation part?
I'd say 2-3 is average, but 5 is pushing it. Several papers published within a year is not only a sign of fraud but also of predatory tactics and exploitation, such as taking undeserved credit, forcing publication quotas on post-grad students, and blatant authorship selling/trading within circles.
"Publish or Perish" was always a huge mistake. In only incentivizes bad behavior. Seriously, some dude is really going to put out over 500 papers this year? He's clearly gaming the system somehow
If you are a lab boss, you will be an author in everything from your lab. Publication rate isn't as simple, especially for collaborations and big institutions.
You should cover what the scientific community is proposing to do about this (fake data, but also p-hacking, replicability, publication bias, etc.); I think it's a known problem to many scientists. Also the history of how, and the reasons why, we got such ridiculous-seeming systems, like not having to publish the data alongside your paper.
Here’s how to tell (follow the money). For example, Sam Bankman-Fried gave the researchers millions of dollars he embezzled from FTX immediately AFTER they announced ivermectin was not effective.
This depends on what is meant by "published a paper". Co-author is not equivalent to doing any work. Anyone who's been on a conference call with the IPCC probably has their name stuck on a hundred papers without their permission.
I just finished reading another Carl Elliott book, his latest: The Occasional Human Sacrifice. Fraud isn't the half of it. Look into studies with many deaths.
These are all reasonable ideas to catch basic frauds. We should remember though that this is an adversarial system and fraudsters will get better at beating basic automated checks, so we'll really need to go deeper over time and maybe even change the way science is done if we really want to root out fraud.
We're all way too careful. Don't be afraid to hurt the feelings of the 0.1% if the upside is removing the filth of bad research. We just need to require more evidence from suspicious actors. More than 10 papers a year? You must supply all of the raw data, with very specific requirements for the details. I think the requirement should be: "An external, non-field-related person should be able to deduce the details of the experiment: how the participants were contacted, what they were given and at what time. If this was a mechanical/chemical experiment, there have to be equipment specs and settings details." So that replication is no longer guesswork, and there aren't "obvious" things that field-related scientists like to omit. For instance, I know every detail of a MaxDiff study. I may just say RLH was X and expect the reader to understand. Such expectations should no longer be allowed. Specify the formula and how you obtained it. Was it individual scores, aggregate scores, etc.?
I don't agree with you on the Data Availability Statement (suggestion below). Data in biochemistry etc. is typically very simple, so there it can be standard practice. However, raw data is often impossible to handle if you're not an expert in the field - typically there are maybe 100-1000 true experts in each specific subfield. Therefore, first, randomly uploading data for the community won't really benefit anyone; the chance that people will mishandle the data for lack of expertise is just really high. Second, by making everything available we will just teach AI what raw data should look like, which will then be used by asshole scientists to create and deposit fake raw data. Therefore, another suggestion. First, here in Germany, the German Research Foundation requires universities to store ALL data; if you don't do that, you won't get funding. Second, nowadays every journal provides paper statistics - just add one for "Requests / Data shared / Not shared". That's even better, because honest scientists (who are the majority) love it if somebody shows interest AND uses their data and has to cite them! In this way data is available, support is provided by the authors, the community communicates, and we will easily spot the assholes.
More than an average of 2 papers per research staff member in your lab, where you are the first or corresponding author, is where I would say you have to be gaming the system somehow. I have 7 people in my lab, 5 of whom I would consider actual researchers, and 10 would be absolutely insane from our group.
It's crazy how my view has changed over the entirety of my education; in my undergrad, I automatically assumed that anyone who was "smart enough" to publish a paper was deemed to be a pillar of the industry, only to enter grad school and find out that scumbags exist in every field :)
The two people I know who got published in undergrad did jack shit to get on the paper. It was "thanks for cleaning the dishes and attending meetings, here's a pub".
@@Im0nJupiter And being a good-looking chick helps IMMENSELY.
It's pretty much how the world works in general, not just academia.
The US rewards a scam economy.
The problem is the incentive structure. Reproduce studies? No, find the next "big thing". Publish in a lower-impact journal? No, it must be in Nature. Access to raw data? No, too much work.
I realized that while I was an undergrad and volunteering in a lab. The sheer amount of work needed to publish anything of value would be way beyond the physical limit of most people. Just doing literature review alone is more than a full time job. A very well organized lab, with people specializing in specific things (lit review, benchwork, data analysis, etc) would maybe churn out 2 or 3 good papers a year. Most seemed to be doing way more, but when you read the papers, you realize it's essentially the same work and data that has been republished over and over in different journals. Never mind that all the citations are from prior work done in the same lab, or another closely related lab in the same university/institute.
@@Skank_and_Gutterboy that's true in every field, not just science.
The main thing that determines if someone gets tenured at a college isn't their ability to teach.
It's how many times they've been published.
You can be the best teacher in your field and get passed over in favor of an incompetent teacher with no communication skills whatsoever if his bibliography is bigger than yours.
And once you become tenured, it doesn't end, because your bibliography is still the most important metric for advancement and funding.
As long as "Publish or Perish" is the norm at colleges, you're going to have rampant academic fraud.
Yep - it's baked in. It's necessary to be a fraudster.
Frauds can also be detected if the following points are all true to some extent:
a) fails to keep MS/PhD students in their lab,
b) has lots of collaborations, getting ideas from collaborators, making students do everything with 0% help, and securing the corresponding-author position while doing essentially nothing,
c) hires too many postdocs,
d) reaches out to prominent professors and does their side projects with the help of highly paid postdocs, thereby landing a co-PI position on the next grant,
e) is co-PI on many grants but PI on none.
This is a good list, though your point (e) can be wrong depending on the person's specialization. For example, someone who's a technical specialist, e.g., a statistician or specialist in certain data gathering methods, is quite frequently a co-PI but quite rarely a PI.
@@crimfan I agree with you. The points I mentioned are all linked together. So, the points are all "AND"s
This should be the pinned comment.
Even without fraud that's definitely a shitty workplace
@@samsonsoturian6013 Sure is. I was a co-PI with someone who was a PI like that. Terrible. There was massive turnover that definitely affected the work. Bleh. What a shitshow.
5 is a major eyebrow-raiser... and that guy had better be a workaholic with 15 rock-star grad students and ZERO personal/family/social life, graduating at least 4 PhDs per year. Moreover, they only keep up that pace for a few years at a time.
It depends on how people operate.
I'll use the example of group A: a lab is in a saturated field, is fully independent, and the grad students have nearly identical skills but are working on competing projects. The publication rate isn't going to scale very well for the grad students or the PI. In this case, a sustained rate of 5 publications a year is impressive.
Consider group B: the PI has developed a very specific measurement technique that almost nobody else can do and has a reputation for being easy to work with. Already well-vetted experiments will pop up from other groups (who do most of the work and suffer most of the costs from the failures) and publications will scale well for everyone in the group if the PI is fair. I think you should be able to get 10 publications a year at least.
Group C: Experimental particle physics or astronomy groups... Everyone blobs together with massive author counts and inflates publication/citation rates so much that nobody looks at H-index anymore.
We are seeing how people who lack integrity and honesty can easily game the system. And that's not just in publishing scientific papers; it's epidemic throughout our society. Great video!
Based on my observations: A lot of people don’t care about scientific fraud
A lot of people don't care about fraud
They won't even check. Essentially, if a news blurb comes out: "Dark chocolate now proven to make you wealthier", you can *GUARANTEE* that dark choco sales will increase. Guaranteed.
A lot of people WANT fraud to go unnoticed. It's like the Olympics, we want to entertain power fantasies about what is possible, also fraud is the only way to change demographic trends in academia
They're happy to have their pre-existing views supported, yep.
They have literally been trained and coerced, with the threat of social ostracism, to listen to The Experts and trust The Science, no matter what. Acknowledging that scientists may be not only wrong, but *willfully* corrupt and serving a non-scientific agenda - is basically sacrilege of The Science, the modern church that has little to do with the scientific method.
"Right to jail."
🤣😂
"Undercook, overcook"
I think the number one way to catch bad science ought to be replication.
Requiring independent replication of results would be a much more reliable way of finding bad science and bad scientists than counting publications, auditing for conflicts of interest, or investigating images. Theories built on bad data will not stand up under replication, regardless of whether the data were fraudulently made up, the result of unintentional methodological problems, or bad luck.
Unfortunately replicating results doesn't get big fancy grant money. 🙃 So now we have 20 years of dodgy claims with no incentive to check them, except in the big fields like cancer research
@2:22 "the manpower or the infrastructure" If you can popularize lawsuits against scientific fraud it will become a self driving mechanism.
That will NEVER happen. The wealthy cannot possibly allow it; it would destroy them at the very core.
Be careful of what you wish for because lawsuits can and are used by bullies who want to cause problems. Most scientists don't have deep enough pockets to ward that off.
Or just criminalize it as fraud. Technically no new laws are needed, as anyone can reasonably expect financial gain from getting published
@@crimfan Yeah, it's easy to imagine corporations abusing it to suppress studies they don't like
@@_human_1946 then include something similar to SLAPP but with more teeth.
What is the point of academic journals if they don't already do all of this??!! All this time, effort, and resources that could be used to do ACTUAL RESEARCH have to be wasted on this - not to mention all the funding that they take from honest academics!! For the autism article ALONE, The Lancet and everyone involved should go "straight to jail!" 😡
Money dude, to make money
Academic journals are for-profit publishers. It's the crux of the entire issue, really
Academic journals rely on peer review to detect bad science. But peer review is often sort of a cursory check-the-box deal. At worst, it's a mutual back scratching deal. When I was reviewing scientific papers, I would see short papers about analysis of one or two samples using the research group's favorite methodology. I would recommend rejection because 1) little to be learned from one or two samples, and 2) the group had already published multiple papers demonstrating their techniques, so there's nothing new here. It was thirty years ago, but even then I doubt my approach was commonplace. It's probably much worse now.
7:45 That is the single most true statement I’ve encountered so far in academia. I’m a high schooler (or rather was, I graduated as of June 2024) and I had to write a couple of papers for the International Baccalaureate programme.
I was writing a paper on breast cancer rates, and since I couldn't conduct a study on that myself, I needed to find a database - an unaltered one. Using an analyzed or altered database (i.e. the data that would be present in a paper with all the conclusions) would result in me failing, as it would be considered plagiarism. So I contacted many authors of many papers asking for their raw datasets that were said to be available upon request. I was either told to ask someone else, ignored, or straight up denied.
In the end, I didn’t get any of the datasets I wanted and spent about 3 months until I found in a very roundabout way a database that could offer me the data I sort of needed… not exactly, but it worked.
Cc your reply/reminder to their denials to provide the promised data to the journal editors.
I went to a college that was known for hands-on learning, and in my last couple of years they were making a shift toward wanting professors to publish, which had never been an issue before. I had my thesis advisor pushing me to ignore problematic data just because he wanted to publish, and it really rubbed me the wrong way.
I think it's time to treat scientific papers like we treat open source code: Something that EVERYONE can audit and analyze, without any paywalls or any other impediment. Also, I think that each university should host its own server for provisioning data about the publications and have them authenticated through a decentralized blockchain method.
For my data analysis, I've preselected 1 TB of real and simulated data. There's no way my university is going to keep this data longer than I'm working on it; after that it will be removed, because the cost of storing such big data for decades without any further use is just ridiculously expensive.
@@arabusov Seriously? 1 terabyte is, like, what, $40?
Universities do increasingly have long-term storage for research data. You can always reach out to the researchers and ask if they'd share their data with you or if it's being hosted somewhere. Of course, lots of data can't actually be shared even when it is stored, for reasons related to privacy laws, ethics policies, or commercial agreements.
@@BenjaminGatti It must be done by the universities; they need to dedicate people and hardware to just this task. It's expensive.
Important genome data like from the UK biobank is kept under lock and key so people don't get any unapproved ideas about things like race
My understanding is it could be very challenging to convince journals that the original dataset must be published, because it could be an ethical liability if participants have consented to scientists looking at their data but not to it being published unabridged, and it'd make anonymising data more complicated in terms of how much information makes a participant identifiable versus removing necessary context.
The survey you did could have benefited from lower response options. 70% of respondents choosing a single option is absurdly high. It is possible that many of your viewers consider more than 5 papers a year to be too many, but there wasn't an option for that so they choose the nearest one: 10+. However, this does not contradict your conclusions in any way. I just felt it important to comment on it, given that the video is about scientific accuracy. Keep fighting the good fight.
I fully agree with you sir. Anyone publishing more than one article a year raises my eyebrows.
I agree. Anything more than 2 or 3 is already suspicious. Over five is definitely a strong indication of something fishy.
In cases where you see people putting out tons of papers, there’s a simple answer. They are the head of the department and are putting their name on every paper that comes out of their department, even if they never even saw the paper.
And the underlings who actually do the work don’t complain because a) they have no power and no choice and b) having their famous professor’s name on the paper increases their chances of actually getting published.
This is extremely common in the medical field.
Thomas Edison was particularly adept at this.
Sciencing like a TRUE scientist. This is godtier work
I agree most strongly with the last point. Data needs to be more easily available throughout the scientific community, and should really be available to the public too
I’ve read through the code of one published paper that was gibberish. There were functions called that weren't defined in the script and did not exist in any of the packages. I'm familiar with the language (R) and the packages used. I even looked through older versions of the same packages and couldn't find the functions they had called. If they've provided the code, it's often worth looking over.
great summary. would be interesting to see how journals react to recent developments and fraud cases.
The question depends not only on field but on what “publication” is.
There’s a “tradition” of putting as authors people who have contributed to the paper, so if you built some machine that, say, 30 grad students are using, and they each publish 2 papers, it wouldn't be rare to have 60 publications without you having to do pretty much anything.
I hate how academics trade and game publications, but I wouldn’t jump to conclusions based solely on the number of publications.
the violin doesn't work with speech
Pete, you should do a statistical analysis on all types of scientific fraud to establish a general understanding on the extent and magnitude of this problem.
That is a massive undertaking for one person
@@DrJTPhysio shoot I'll help if he shows me what to do lol
That would be something that would require a pretty massive research grant to do and would be field-specific as he said, but I think the basic signals are there: Unbelievable sustained publication rate is, in my view, the sine qua non. Unfortunately, universities and granting agencies really do like their rock stars, not us more ordinary working faculty who are the ones who make the university run and publish at a more reasonable rate. Some years we have a lot of articles (my best year had 7 articles) and other years are more down years (uh... 0, or 1).
@@DrJTPhysio - Start with a pilot study. Sample of N papers selected at random from all published original studies within X field (or from a search term), with primary authors who are university faculty, between Jan. XXXX and Dec. XXXX.
Or
N primary research papers selected from total pubs on new pharmaceutical agents in 1989 versus 2024 that demonstrate conflict of interest as defined by yada yada.
@@ColdHawk there's a bit more to it, but I get where you're going. I'm working on a study that implements a similar research design. Doing this alone would be pure pain.
The first thing that should be done is to pay reviewers for their time and have a grading system for reviewers. Reviewers are incentivized to spend as little time as possible on reviewing a paper. Reviewing time should be mandated at an institutional level and paid for out of the cost-to-publish fee charged by the for-profit publishing industry. Yes, this may lead to higher publishing costs and slower publishing, but also to fewer fraud cases and a lower retraction rate.
The ones with this number of papers published per year could be working in large physics collaborations, like those at CERN: ATLAS, CMS, and even the progressively smaller ones such as LHCb, ALICE and so forth. For example, last year the ATLAS collaboration published 111 papers, each one having an author list of about 3000 people. So far in 2024, there have been 69 ATLAS papers published. Of course, these large physics experiments are a bit funny that way, because they couldn't even happen if they weren't run this way. But I don't doubt that there are a lot of fraudsters claiming to publish hundreds of papers per year in small teams or on their own. That's a completely different story.
Yes indeed that is quite true.
And why should that bad habit be acceptable?
@@Anton-tf9iw did you miss the part where I said that this is the only way to run this type of physics experiment? In high-energy physics, experiments aren't tabletop affairs. They are giant apparatuses, some of the most complex devices ever created. You think people would contribute to building them without any recognition? Of those 3000 ATLAS authors, a few dozen are actual physics theoreticians, a few hundred do physics data analysis (which could be any other type of data analysis in most cases), and the vast majority do engineering research, development, manufacturing, logistics, etc. All of this is done by people working in research institutes, where publications are essential for career advancement. So nobody would contribute if their contributions were not recognized, and without all those contributions, nothing would be accomplished. Criticizing the way things are done only shows you don't understand how things are done.
@@Anton-tf9iw Whether you think it's acceptable or not, it doesn't mean the papers are fake. Those are two different issues.
@@jeffersondaviszombie2734 Engineers, developers, machinists, and warehouse operators do not need career advancement in physics. They may have other ways to advance their careers, or they could be happy to continue working in their positions as professionals. I think it is ludicrous to include them as authors on physics publications even if they all work at an institution.
After hearing about Harvard’s disgraced ex-dean, I am not surprised scammers fill the top-rung of the healthcare industrial complex.
👍Stanford too
Healthcare?
The number really depends a lot on authorship level, field, and subfield. For example, Erik Demaine, who is in my field, and overlapping with some subfields of mine, publishes around 20 papers a year and is well respected. Since it is CS theory (AKA math stuff), there is no data to be faked, and I haven't read any flawed proofs of his. But certainly, the more papers someone publishes, the more suspicious I get. For social sciences and physical sciences, I would be very suspicious of anyone even matching Erik's publication rate.
It also depends on what kind of authorship we’re talking about. Being the first author for 10+ papers might be suspicious but if you’re the PI of a larger research group then you might be entitled to co-authorship to 20-30+ papers, or even more. The PI’s actual contribution could be put into question but academia is, in some ways, a bit of a pyramid scheme. I.e. the people on the top get most of the resources, such as publications.
Single authorship on all or most papers someone publishes is another red flag.
Another eye should be kept on journal editors, as they can help their buddies publish their research before someone else via simply rejecting a paper that tested that hypothesis first. Also, journal editors who gatekeep new directions in the field as a favour to their buddies is a form of scientific misconduct.
As someone else pointed out, the number of harassment complaints against a researcher can be an indication that they have also engaged in scientific misconduct.
The music is a tad loud at the start of the video, just fyi
7:40 I think this is unfair. Sometimes the dataset is just too large to host online. Sometimes the data is confidential. As for the code, the ideal is indeed to release it. From experience, though, well-documented working code that reproduces all the results, with graphs, is sometimes almost as much work as writing the manuscript itself. But most of the time when I wrote to authors for their code, they were kind enough to share it with me.
Graphic distortions, e.g. a truncated axis on a graph with an uninterrupted plot. Not fraudulent so much as deceptive.
I think the uncut video recording of the experiments must also be published unless such video would compromise the identities of people. For example if you are studying microbiology, the video recordings of you preparing blots, putting them in the microscope etc. need to be published.
Regarding the scientists publishing a ridiculously high number of papers. Is that really a sign of fraud or does it just mean that the scientist is running a huge lab and all the junior researchers are putting his name on their work?
That would be a possible sign of overreaching. What could be his contribution? Some of these "authors" never read the paper.
Even with a big group, 200 papers is way beyond reasonable. There is no way that person has seriously contributed to all of those papers. I'd be surprised if they had even read them all.
@@AndrewJacksonSE True - But signing your name on to a paper you haven’t even read is a very different issue from research fraud.
@@danielschein6845 indeed, but seems to be conflated in a lot of comments on this type of video. Also, it inflates the risk of errors or fraud by the main author or contributor not getting spotted.
Except that is also fraud since it isn't their work
Great video like always, thanks!
I wonder, though, if you could also look into the darker fallacies of science? The ones that are done more on an unconscious level. The obvious examples are (and I think you already touched upon those in past videos):
- Statistical correction for multiple testing. In biology, p < 0.05 is widely accepted as "significant", after Bonferroni (etc.) correction for multiple testing. That threshold means that, on average, 1 in 20 tests of a true null will come out "significant" by chance (see the toy simulation below). In certain fields, hundreds of papers are published every year, so we end up seeing all the false positives, while the negative results are much less likely to reach the eye of the reader.
- Statistics in general (I won't go into cohorts here; that's a topic all by itself). Statistics is based on a random distribution, whatever that distribution might be. Biologists, who are not experts in statistics, working with statisticians, who are not experts in biology, can easily produce way too optimistic p-values without being aware of it. In my experience, reviewers only rarely catch that, because they are either biologists or statisticians.
- Reproducibility. This is particularly apparent in GWAS studies: only a tiny fraction can ever be reproduced by anyone else, and even then, never exactly. Yet the claims of GWAS ("you have a 14% higher chance of getting this or that disease if...") are often widely reported by the media, hence the incentive to do these studies. Even if the statistics used are sound, this comes down to case vs. control selection, sampling bias, etc., which invalidates the underlying assumptions. I think there is a need to publish negative results on reproducibility as well as positive ones, because the proof is ultimately in the pudding.
All in all, yes, there are the obvious bad apples in the scientific community, the ones who willfully falsify results. I think these are in the minority though, and that most false results are published out of (tolerated) ignorance.
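The first point is easy to demonstrate with a toy simulation in R (entirely made-up null data, nothing from any real study): run many "studies" of 20 true-null tests each and see how often at least one comes out "significant", with and without Bonferroni correction.

```r
# Toy simulation: every test compares two samples from the SAME distribution,
# so any "significant" result is a false positive by construction.
set.seed(1)
m    <- 20    # tests per study, all true nulls
reps <- 2000  # simulated studies
hit <- replicate(reps, {
  p <- replicate(m, t.test(rnorm(30), rnorm(30))$p.value)
  c(raw  = any(p < 0.05),
    bonf = any(p.adjust(p, "bonferroni") < 0.05))
})
rowMeans(hit)  # raw: ~0.64 of null studies yield a "finding"; bonf: ~0.05
```

Uncorrected, roughly two thirds of pure-noise studies produce at least one "significant" result, which is exactly why uncorrected multiple testing plus publication bias is so corrosive.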
I think about this all the time. Great description of p-values, and thank you for the sanity check. Accidentally rejecting the null hypothesis 1 in 20 times is pretty damn high!
Coming at this from a Computerphile angle: in addition to publishing the original data, an academic standard akin to hash-fingerprinting data and visuals ought to be developed, if such a thing is possible, along with provenance standards for data and visuals.
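A minimal sketch of the fingerprinting half in base R, with hypothetical file names; a real provenance standard would presumably want SHA-256 or stronger (e.g. via the digest package) rather than MD5:

```r
# Fingerprint the raw data and figure files at submission time and publish
# the manifest, so later readers can verify nothing was quietly swapped.
files <- c("raw_data.csv", "figure1.png", "figure2.png")  # hypothetical names
manifest <- data.frame(file = files,
                       md5  = unname(tools::md5sum(files)))
write.csv(manifest, "provenance_manifest.csv", row.names = FALSE)
```

The journal (or an independent archive) would store the manifest at acceptance; any later edit to data or figures then becomes detectable rather than deniable.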
My wife is a scientist. I think more than four as primary author is suspicious.
What do you mean by primary author? If you just mean author, I don't think so, because sometimes you need collaboration across fields for a piece of research, for example biologists and physicians. Then, if it is the work of students, you need to include the supervisors and PI. If you need a technology only available in another lab, you'll have to include whoever conducted the experiment in the other lab... I understand that a max of 8-10 authors is sufficient for most research.
This type of mindset is what excluded me from 2 papers to which I had significantly contributed. They thought there were too many authors listed, so they excluded the undergrad students.
@@kheming01 He means "equally contributing authors", those considered to have done the critical parts of the backbone of the paper.
It depends a lot on the area. In some areas like particle physics, you'll see A LOT of authors. In other areas, like math or philosophy, multiple authors is rare. That said, I do agree with your wife that author lists have gotten pretty out of hand in some areas... social psychology, I'm talking about you.
@Draconisrex1 Yeah, 4 is def a large amount as a PI, especially if it's experimental research and if it's human subjects.
"And occasionally these things do happen in SCIENCE, right by NATURE they have to."
Was this pun intended when talking about rare and surprising results making it to high-impact journals?
#6: genealogical connections to peerage. (These are invariably the wastrel children of hidden elites gifted sinecures in prestigious fields, with ghost-drones creating "their" work; this is easy to get away with in fields where big cheeses have lots of assistants under them, and where "findings" generally do not have any immediate practical benefits such that there's an incentive to attempt duplication during the course of product development. E.g., much if not most of theoretical physics has been treasury-soaking fraudulent hokum for nearly a century.) #7: any "public scientist" (which should be an oxymoron by now) should be heavily scrutinized.
Very good discussion.
I think point 5 can be a problem in some areas of research, where privacy requirements pull against data sharing. For example, getting permission to release data from actual patients or minors can be quite onerous.
That said, bad faith researchers can and do make use of these requirements as a shield. One of the worst examples I've encountered in my professional existence---won't name names but I'm talking B grade horrid, not A grade like the ones that have been getting outed to substantial publicity, such as that real peach Francesca Gino---loved to hide behind the IRB. In my time working with that team, I didn't see any data made up, but I did see quite a lot of really shady data practices.
You should name them or at least the practices
@@User-y9t7u I'm not naming names on UA-cam, but I filed complaints to the relevant authorities, including the grant agency.
As to the practices, basically the PI would state that data would not be released due to FERPA (educational data privacy law) but then was perfectly happy to do things that pretty clearly violated FERPA, such as give out datasets to people on the team who had no business having things like participant names. Simply put, she didn't want to have anyone else seeing the data. I wasn't convinced she understood the difference between research and data analysis done for course improvement, quite honestly. She's one of those people who's charismatic and clever, not actually smart.
@@crimfan send the deets to Judo my man
Data availability: in one paper I wanted to look at the data myself. It was apparently a PDF; however, the data was protected by a password. I was able to copy the results by hand, but this was a nasty trick.
It's funny when they say "Upon reasonable request," as if there is such a thing as a request to see data (personally identifying information notwithstanding) that isn't reasonable.
Many of the papers in computer science are now providing the data and the programs used to create the data so the results can be reproduced.
However, it is problematic to recreate some of these things because of the amount of time and knowledge it takes to set up the actual "experiment". In the case of biology, providing the data usually means providing the readings taken in the lab. In the case of computer science, providing the program and the results means you can redo the experiments. Setting up these experiments and running them takes a lot of work; however, this is the classic "reproduction" we expect from people doing pure science. That is, if you say you created a cure for a disease, we usually don't trust that you have a perfect cure until other people have seen it.
Unfortunately, with many drugs, there is no "100% cure", so people can fudge the data and then say, well, my results were different because of the statistical sample. I had a 20% remission rate and you did not see it because of this other factor.
When there is no clear answer, i.e., 100% cure, it is much easier to misrepresent the results and get away with it.
It depends hugely what the "author's" role is in producing the paper. A senior prof may have a peripheral mentoring role for a large number of PhD students and post-docs. BUT it is a red flag for extra scrutiny.
Those are some great points to look out for and quite nice how easy it is to check for red flags.
I used to love publishing papers. But...
There is science, and there is science. One is pursuing truth, and the other is pursuing paper quotas. While you may pat yourself on the shoulder for catching the bad scientists, you'll find that those belong to the latter branch, but in the process you're also punishing the former, truth-seeking branch by raising suspicion of everything. Now, it may come as a surprise to you, but only institutionalised scientists are under constant pressure to publish or perish. So I, as an independent scientist with a PhD in electrical engineering in my pocket and no pressure at all to play a rigged game, won't play it. Without any institutional backup, and with the ever-rising participation fees, it is also much easier on my pocket. Now, whose loss would that be?
Also, the small matter of contribution. A single paper is fine with a single contribution, however insignificant. Real researchers used to pour their whole research into a paper, with many significant contributions in a single piece. But if you're pursuing quotas, you'll make sure that 20 or so people milk a single contribution to death and back. Such skimmed papers are now the norm, and real research now looks suspicious, as out of place and too bombastic. Go figure.
I am not a scientist. The discouraging aspect is that this undermines my confidence in any research. Even if I try to look into a paper, it would be almost impossible for a layman like myself to establish validity.
Thus, social media's fascination with citing academic papers to support their opinions, yet matched with revulsion to the science establishment as a whole. You get people who simultaneously swear that anti-parasitics must work against covid because the FDA has approved them for human use, and yet that the same FDA must have rushed the vaccine emergency use approval. 🙃
I've published a few papers and the group I am in publishes many. Journals will not host our data as it is too large. Some fields have community solutions for these issues, most don't.
Is statcheck or a similar software regularly used? As the name suggests, it checks the reported statistics for plausibility.
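The core of that kind of check is easy to sketch by hand in R: recompute the p-value from the reported test statistic and degrees of freedom and compare it to the reported one. The numbers below are invented, and the 0.005 tolerance is arbitrary, purely for illustration:

```r
# Hypothetical reported result: t(48) = 2.31, p = .013 (two-tailed).
reported_t <- 2.31
df         <- 48
reported_p <- 0.013
recomputed <- 2 * pt(abs(reported_t), df, lower.tail = FALSE)
round(recomputed, 3)                  # ~0.025, not 0.013
abs(recomputed - reported_p) > 0.005  # TRUE: worth a closer look
```

A mismatch like that isn't proof of fraud; transcription errors and one-tailed tests happen. But a paper full of them is a pattern.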
Do you consider results that are very socially popular as falling in the "conflicts of interest" category? Maybe affecting the works of the likes of Roland Fryer and others who research such touchy topics?
If a pseudo-scientist hits on results which the general public WANT to see as being valid and true, you can bet that the P-hacking will commence, baby! 💪😎✌️ There's HUUUGE money in telling billions of people what they already hoped was "true". 😂
If the results support existing political rhetoric then it's fraud. I find this when reading history all the time, no further study necessary: Fraud.
200 papers a year?!!! That's beyond suspicious
The way I catch them is by watching Pete judo's channel
Do you "first author"? I find it very surprising you didn't specify this. Last authors, for example, are typically lab/science area heads who will appear on every paper authored by their team.
Another way is looking at how diverse the fields of publication are. If someone publishes 40 papers a year, but all of them are LLM research, OK: LLMs are exploding right now, and with enough GPUs you can automate lots of experiments, and some of them will get interesting results. But if they publish 10 papers, one in AI, one in cybersecurity, one in robotics, etc., you may wonder how one person can be an expert in so many fields.
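A toy version of that heuristic in R, with an invented publication list just to show the shape of the check:

```r
# Invented data: one row per paper, tagged with its (rough) field.
pubs <- data.frame(
  author = c("A", "A", "A", "B", "B", "B"),
  field  = c("LLMs", "LLMs", "LLMs", "AI", "cybersecurity", "robotics")
)
# Count distinct fields per author; breadth, not volume, is the flag here.
spread <- tapply(pubs$field, pubs$author, function(f) length(unique(f)))
names(spread[spread >= 3])  # "B": prolific-but-focused "A" passes
```

Volume alone isn't damning; volume spread across unrelated specialties is what deserves a second look.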
The use of "novel statistical methods" can also be a hint at shenanigans.
I've read many papers where the abstract reads more like a Billy Mays ad than a summary of the research, and I've read plenty of papers with spelling/grammar/formatting mistakes. I have no idea if there's a higher incidence rate of fraud in such papers, but it always makes me way more suspicious.
It’s important to change the whole apparatus built around the scientific method. Having to pay to be part of the scientific discussion is not acceptable, peer reviewers who review badly are unacceptable, and Publish or Perish is unacceptable.
My father did brain and vision research and he did surgery on his "control" animals to adjust them for his desired results. Nobody ever checked his animal colonies, just his data. Fooled his graduate students, too, who assisted in the experiments. The data was real, but the theories and results were bunk. He wept, routinely, carrying on about how much he loved his research animals for their "beautiful and selfless sacrifice" for science. He always said it in exactly the same way he said he loved me. I think he really enjoyed the power to kill them slowly while taking apart their brains and giving them stimulus to respond to as they died.
I caught him when he brought home one of his animals, a cat, whose eye was torn. I accused him of cheating when he told me its cornea had been torn in ocular surgery, because he had introduced it as a retired control animal. (For some reason he initially assumed I was upset that it wouldn't be able to sleep or shut out light.) That was the only time he blatantly and openly threatened my life.
His theories were eventually discredited, since none of his results could be replicated. And he lost his position at the university after years of sexual advances and predatory pressure on his female students.
I guess sociopaths gotta be sociopathic. I felt more relief than anything, when I heard he had died.
Beware of generalizations... "data available upon request" or "data available at http..." is in many cases a move in the good direction you describe (and which I support), and the only means of telling your potential readers that you're willing to share your data.
It's a good generalization because the data is not available
Thank you Pete.
Can you please make a video on how to spot AI-generated text in papers? My professors say "it's so obvious from the way it is formatted", but I can't really see it. Can you please share your thoughts on how to spot it?
"right to jail" hahahahaha, great video
#6 With a scientist-sized net
A red flag that should prompt investigation would be cultivating loyal followers and building celebrity status. These are invalid reasons for believing an individual's hypotheses.
BUT there is a big complication: some very good scientists have also produced false theories and suppressed rivals' better theories,
e.g. Georges Cuvier's suppression of Lamarck's evolution, and Karl Pearson's suppression of research into causal inference.
In both cases, men who had made major contributions to their field also held their field back for three generations.
The French researcher Didier Raoult ticks all the boxes.
Realistically, how would you implement “publishing all the data alongside the article”? Data is often expensive and proprietary. Once that data is published because someone published a paper with it, no one else would ever buy the data. They can just download it from the journal. The data providers will then never sell the data to academics in the first place.
Not a red but orange flag is someone getting too many successful grant applications from fields other than their own, for example, AI to address some topic in the humanities.
The wealthy always win. Always. 💪😎✌️ No exceptions. Facts don't matter. Science is for geeks. MONEY rules all.
That's just fraud/embezzlement
We need a "Day zero" reset before AI completely undermines conventional fraud detection. Removing all bad actors now would act as a strong deterrent, but it would also remove the commercial market for fraud. The last 20 years of papers should be for fraud using image checking and similar software. Then, not only should all affected papers be redacted, but also all authors, labs, and companies involved in the fraud should be blacklisted for future publication. That blacklisting should only be lifted with increased scrutiny of publications on an individual basis.
The solution is to make it irrelevant to have large numbers of papers. Use a logarithmic metric, with a maximum at, say, log(12 papers/year), decreasing for larger numbers of papers per year, down to zero if you publish, say, 24 papers or more per year. Actually, I think the numbers should be closer to half of those. A sketch of one way to read this is below.
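One way to read that as a concrete formula, purely my own interpretation of the shape using the numbers above: credit grows like a logarithm up to the peak at 12 papers/year, then falls linearly to zero at 24 and stays there.

```r
# Sketch of the proposed metric: log growth to a peak, linear decay to zero.
paper_score <- function(n, peak = 12, cutoff = 24) {
  ifelse(n <= peak,
         log1p(n),
         pmax(0, log1p(peak) * (cutoff - n) / (cutoff - peak)))
}
paper_score(c(1, 6, 12, 18, 24, 40))  # 0.69 1.95 2.56 1.28 0.00 0.00
```

Halving the numbers, as suggested, just means peak = 6 and cutoff = 12.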
How do you get the scientific community and journals change their baseline practices? Is there a committee/professional organization that defines these?
I effectively spent a year on my MS thesis. Under other circumstances, I could maybe have broken it into 3 or 4 papers, max.
What peer review journals are publishing authors with a large number of papers per year?
this is effectively the essence of scientific work isn't it?
I publish a dozen or more papers a year, though only a few as first/sole author. In planetary science we often work in broad, flat teams of individual 'independent' scientists associated with space missions (without the more feudal lab-head/minion hierarchy associated with a lab). The mission data are contractually obliged to be publicly available.
Administration takes 40%+ off of any incoming grant, claiming administrative costs to support research efforts. "There just isn't the manpower or infrastructure to do this kind of work." Ah yes, if there's anything universities are light on, it's infrastructure and oversight...
The 10 papers per year limit is too low imo, it really depends on the scientists' position and career advancement. For young scientists, 3-4 would already be a high number since they typically run the studies. However renowned scientists are often collaborating with many different labs and supervising multiple people. Many of them end up co-authoring 20 papers without any cheating involved.
The "reasonable" number of publications depends a lot on what work is being done, and how well-estalished the person is. Professors typically do a lot of *reviewing* of nearly finished papers, which takes much much less time than *conducting and writing* the study. The amount of studies being presented to them increases when they have more connections within their field. Some people (especially professors) are also workaholics human machines who will review your work as soon as you ask them, even if they're in the middle of their vacations in the middle of a remote mountain range.
Now if we are talking about 10 first-author papers, I would agree that the number is high regardless of the position.
10 as primary author, yeah, would be suspicious, but I think my advisor averages around 10 overall per year, although only 1 or 2 of them with him as primary author that weren't invited articles. Nothing suspicious about it, we do a very niche kind of physics simulation, and he ends up being cited for fine tuning and advising other groups on planning their simulations.
50+ is very blatantly sketchy, though, even as a secondary author.
The music/voice balance is a bit off?
Great topic, thanks
A software solution that sweeps for things that aren't straight up copies sounds very difficult and potentially problematic. I think that, at best, something like that should flag stuff for human review. It could be difficult to use unless it can give the human reviewer an account of its reasoning (which modern AI systems mostly cannot).
From personal experience in the role, If expert reviewers of publications have to go through all the original data then a far larger pool of reviewers will be needed. Experts have to devote quite a lot of time already to reviewing just the bare publication - work which is usually unpaid by the journal editorial boards I should add. Where are these extra experts going to come from?
I'm relatively new to academia and struggling to get one paper published a year because I spend too long sweating the details. It really disheartens me to see people publishing loads and loads every month!
Regarding the "10 Papers a year" thing. The reason why I am stating this, is because I assume to be an author you need to at least have "read" and "understood" it. Both of these need time. if you are publishing a paper a day, when are you actually doing the data generation part?
I'd say 2-3 is average, but 5 is pushing it. Publishing several papers within a year is a sign not only of fraud but also of predatory tactics and exploitation, such as taking undeserved credit, forcing publication quotas on post-grad students, and blatant authorship selling/trading within circles.
"Publish or Perish" was always a huge mistake. In only incentivizes bad behavior. Seriously, some dude is really going to put out over 500 papers this year? He's clearly gaming the system somehow
If you are a lab boss, you will be an author on everything from your lab. Publication rate isn't that simple a signal, especially for collaborations and big institutions.
You should cover what the scientific community is proposing to do about this (fake data, but also p-hacking, replicability, publication bias, etc.); I think it's a known problem to many scientists.
Also the history of how, and the reasons why, we got such ridiculous-seeming systems, like not having to publish the data alongside your paper.
Great content BTW man.
Here’s how to tell: follow the money. For example, Sam Bankman-Fried gave researchers millions of dollars he embezzled from FTX immediately AFTER they announced that ivermectin was not effective.
This depends on what is meant by "published a paper". Being a co-author is not equivalent to doing any of the work. Anyone who's been on a conference call with the IPCC probably has their name stuck on a hundred papers without their permission.
I just finished reading another Carl Elliott book, his latest: The Occasional Human Sacrifice. Fraud isn't the half of it.
Look into studies with many deaths.
Brooo don't leave us hanging on that ominous note. People be hiring hitmen off the dark web to off academic researchers or what
You mean where people may or may not have died due to an experiment? That probably doesn't happen much; even frauds feel bad when people get hurt.
Ultimately, it's up to scientists to conduct replication studies to check each other. The system isn't incentivizing replication remotely enough.
These are all reasonable ideas to catch basic frauds. We should remember though that this is an adversarial system and fraudsters will get better at beating basic automated checks, so we'll really need to go deeper over time and maybe even change the way science is done if we really want to root out fraud.
We're all way too careful. Don't be afraid to hurt the feelings of the 0.1% if the upside is removing the filth of bad research. We just need to require more evidence from suspicious actors. More than 10 papers a year? Then you must supply all of the raw data, with very specific requirements for detail. I think the requirement should be: "an external, non-field-related person should be able to deduce the details of the experiment: how the participants were contacted, and what they were given at what time. If it was a mechanical/chemical experiment, there have to be equipment specs and settings details." That way replication is no longer guesswork, and there aren't "obvious" things that field insiders like to omit. For instance, I know every detail of a MaxDiff study. I might just say the RLH was X and expect the reader to understand. Such expectations should no longer be allowed: specify the formula and how you obtained it. Was it individual scores, aggregate scores, etc.?
Excellent video.
Keep up the good work.
I think more than two papers is suspicious. That's one every six months.
I don't agree with you on the Data Availability Statement (a suggestion below). Data in biochemistry etc. is typically very simple, so there it can be standard practice. However, the raw data is often impossible to handle if you're not an expert in the field; typically there are maybe 100-1000 true experts in each specific subfield.
Therefore, first, randomly uploading data for the community won't really benefit anyone. The chance that people will mishandle the data for lack of expertise is just really high. Second, by making everything available we will just teach AI what convincing raw data looks like, which will then be used by asshole scientists to create fake raw data and deposit it.
Therefore, another suggestion:
First, here over in Germany, the German Research Foundation requires universities to store ALL data. If you don't do that, you won't get funding.
Second, nowadays every journal provides paper statistics. Just add one for "Requests / Data shared / Not shared". That's even better, because honest scientists (who are the majority) love it if somebody shows interest AND uses their data and has to cite them!
This way the data is available, support is provided by the authors, the community communicates, and we will easily spot the assholes.
More than an average of 2 papers per year, where you are the first or corresponding author, per research staff member in your lab is where I would say you have to be gaming the system somehow. I have 7 people in my lab, 5 of whom I would consider actual researchers, and 10 would be absolutely insane for our group.
Anything above 2 papers is already suspicious. It has taken 10 years to do my thesis 😥
I would think one significant paper per year as lead author, or several related ones as lead author, is the maximum.