
A large-scale analysis of millions of cancer studies has uncovered patterns suggesting that a significant portion of the literature may not be as reliable as it appears.
Researchers have built a machine learning tool that spots signs of mass-produced science, and its first major test suggests the scale could be startling. The system flagged more than 250,000 cancer research papers that may be tied to so-called “paper mills,” groups that churn out manuscripts for sale.
The study was led by QUT researcher Professor Adrian Barnett from the School of Public Health and Social Work and Australian Centre for Health Services and Innovation (AusHSI), working with an international team. Reported in The BMJ, the project reviewed 2.6 million cancer studies published from 1999 to 2024 and looked for repeated writing habits linked to papers that were later withdrawn.
Instead of searching for obvious red flags like duplicated figures or impossible data, the tool focuses on language itself. The researchers found more than 250,000 papers whose writing patterns resembled those seen in articles already retracted for suspected fabrication, suggesting that template-driven writing can leave behind recognizable traces.
How Paper Mills Operate
“Paper mills are companies that sell fake or low-quality scientific studies. They are producing ‘research’ on an industrial scale, and our findings suggest the problem in cancer research is far larger than most people realized,” Professor Barnett said.

Paper mills can offer everything from a paid author slot to an entire completed manuscript. To produce work quickly, they may recycle blocks of text, rely on unnatural phrasing, or invent supporting data and images, creating papers that can look plausible at a glance while still being unreliable.
“Most likely, they’re relying on boilerplate templates which can be detected by large language models that analyze patterns in texts,” Professor Barnett said.
To detect those patterns, Barnett’s team trained a language model called BERT to recognize subtle textual “fingerprints” that show up again and again in known paper mill products.
When the model was evaluated using verified examples, it correctly identified suspicious papers 91 percent of the time, pointing to a potential new way to help publishers and researchers decide what deserves closer scrutiny.
“We’ve essentially built a scientific spam filter,” Professor Barnett said.
“Just like your email system can spot unwanted messages, our tool flags papers that match the writing style and structure we see in retracted, fraudulent work.”
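The spam-filter comparison can be illustrated with a much simpler stand-in than the team's BERT model. The sketch below is not the researchers' method: it assumes that template reuse shows up as three-word phrases shared across retracted texts, and it scores a new text by how many of its phrases match. The function names, the `min_docs` setting, and the 0.3 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def trigrams(text):
    """Return the set of lowercase word trigrams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def build_fingerprints(retracted_texts, min_docs=2):
    """Keep trigrams that appear in at least `min_docs` retracted texts."""
    counts = Counter()
    for text in retracted_texts:
        # trigrams() returns a set, so each text counts a trigram once.
        counts.update(trigrams(text))
    return {gram for gram, n in counts.items() if n >= min_docs}


def template_score(text, fingerprints):
    """Fraction of a text's trigrams that match known template trigrams."""
    grams = trigrams(text)
    if not grams:
        return 0.0
    return len(grams & fingerprints) / len(grams)


def flag_for_review(text, fingerprints, threshold=0.3):
    """Flag a text for human review (assumed threshold, not proof of fraud)."""
    return template_score(text, fingerprints) >= threshold


# Toy sentences invented for this example, not taken from real papers.
retracted = [
    "our results provide a novel insight into the progression of gastric cancer",
    "these data provide a novel insight into the progression of liver cancer",
]
fingerprints = build_fingerprints(retracted)
candidate = "we provide a novel insight into the progression of lung cancer"
print(flag_for_review(candidate, fingerprints))
```

A language model such as BERT learns far subtler signals than exact phrase overlap, but the workflow has the same shape: learn from known retracted examples, score new text, and pass anything above a threshold to a human reviewer.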
Key Trends and Areas of Concern
Key findings from the large-scale analysis include:
- Flagged papers have increased dramatically over two decades, rising from around 1 percent in the early 2000s and peaking at over 16 percent in 2022.
- The issue affects thousands of journals across major publishers, including high-impact titles.
- The problem is most concentrated in fields such as molecular cancer biology and early-stage laboratory research.
- Some cancer types, including gastric, liver, bone, and lung cancer, show especially high rates of suspicious papers.
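The yearly percentages in the first finding are a simple ratio: flagged papers divided by all papers published that year. A minimal sketch of that arithmetic follows, assuming each paper is represented as a hypothetical `(year, was_flagged)` pair. The records are invented toy data, not figures from the study.

```python
from collections import defaultdict


def flagged_share_by_year(records):
    """Map each year to the percentage of its papers that were flagged.

    `records` is an iterable of (year, was_flagged) pairs.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for year, was_flagged in records:
        totals[year] += 1
        if was_flagged:
            flagged[year] += 1
    return {year: 100.0 * flagged[year] / totals[year] for year in sorted(totals)}


# Invented toy records, for illustration only.
toy_records = [(2002, False), (2002, False), (2002, True), (2022, True), (2022, False)]
print(flagged_share_by_year(toy_records))
```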
Three scientific journals are already piloting the tool as part of their editorial screening. It will allow editors to identify potentially fabricated manuscripts before they are sent for peer review.
The team plans to expand the tool to other fields of research and improve the model as more confirmed cases of paper-mill activity become available. They stress that flagged papers are not confirmed cases of research fraud and should be checked by human specialists.
“Cancer research influences clinical trials, drug development, and patient care,” Professor Barnett said.
“If fabricated studies make their way into the evidence base, they can mislead real scientists and ultimately slow progress for patients. That’s why it’s vital we get ahead of this problem.”
Reference: “Machine learning based screening of potential paper mill publications in cancer research: methodological and cross sectional study” by Baptiste Scancar, Jennifer A Byrne, David Causeur and Adrian G Barnett, 30 January 2026, BMJ.
DOI: 10.1136/bmj-2025-087581
4 Comments
If big pharma is left to determine what is fake and what is not fake research, God help us! Big corporations fund research that supports their industries. As long as the research relies on “Where can we make the most profit?”, legitimate research will always be tainted.
You suggest that ‘big pharma’ is responsible for a growth in fake research papers. Perhaps our working definitions are different. I consider ‘big pharma’ to be companies that develop and sell medicine. If they chase profitable products, based on fabricated research, it won’t take long before the consumer and prescribing doctors observe that the medicine is no better than a placebo, assuming that the medicine makes it past the regulatory agencies that check for efficacy and risks in the use.
I think that you have adopted a meme that is politically popular, but illogical. The real problem is more complex. We have an educational system that rewards publication of successful research. That is, “publish or perish.” Thus, there is pressure for recent graduates to have a curriculum vitae (CV) with many high-profile publications just to find a job in academia. That track record has to be continued to receive promotions and salary increases. It should be clear that the incentive for fraud lies with the academic institutions, their graduates, and perhaps to some extent private research, if they don’t weed out the under-performers.
However, journal publishers run a risk of publishing work that cannot be duplicated and having their reputation tarnished, resulting in lower subscription rates. Similarly, ‘big pharma’ runs a risk of getting a reputation of selling ‘snake oil’ if they promote and sell medicines that are worthless, or even worse, dangerous. The metric by which medicines are judged is quite demanding. We have eliminated polio, smallpox, and measles, to name just a few. If new treatments aren’t as effective on the same or new diseases, people will ask “Why not?” We know what to expect for the longevity of those with terminal illnesses. If people don’t live longer with new treatments, it is de facto evidence that the treatment regimen is ineffective or at least not an improvement.
You come across as someone who is quick to ‘jump on the bandwagon’ by repeating accusations, but knowing little about the topic. There are plenty of justified complaints about medical and pharmaceutical practices. Find one that you can support with real evidence rather than just repeating things that you have read. The essence of this article is doing that.
When your studies have produced nothing, the need to show something becomes a strong and urgent need. Your funding often depends on it.
We are being flooded by fake stuff, AI news, and liars in general. When I see something online…I no longer believe it.