
AI-generated X-rays are now realistic enough to fool doctors, and their misuse could undermine trust in medical records across the healthcare system.
A new study published today (March 24) in Radiology, the journal of the Radiological Society of North America (RSNA), finds that both radiologists and advanced multimodal large language models (LLMs) struggle to reliably distinguish real X-rays from artificial intelligence (AI)-generated “deepfake” versions. The results point to growing risks from synthetic medical images and underscore the urgent need for better detection tools and specialized training to protect the integrity of medical records.
A “deepfake” is any image, video, or audio that appears authentic but has been created or altered using AI.
“Our study demonstrates that these deepfake X-rays are realistic enough to deceive radiologists, the most highly trained medical image specialists, even when they were aware that AI-generated images were present,” said lead study author Mickael Tordjman, M.D., a post-doctoral fellow at the Icahn School of Medicine at Mount Sinai in New York.
“This creates a high-stakes vulnerability for fraudulent litigation if, for example, a fabricated fracture could be indistinguishable from a real one. There is also a significant cybersecurity risk if hackers were to gain access to a hospital’s network and inject synthetic images to manipulate patient diagnoses or cause widespread clinical chaos by undermining the fundamental reliability of the digital medical record.”
Study Design and Global Participation
The study involved 17 radiologists from 12 medical centers across six countries (United States, France, Germany, Turkey, United Kingdom, and United Arab Emirates). Their experience levels ranged from newcomers to experts with up to 40 years in the field. Researchers analyzed a total of 264 X-ray images, split evenly between real and AI-generated scans.
Participants reviewed two separate sets of images. One included a mix of real X-rays and images generated by ChatGPT across different parts of the body. The second focused on chest X-rays, again evenly divided between real images and those created by RoentGen, an open-source generative AI diffusion model developed by Stanford Medicine researchers.
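For readers unfamiliar with how such images are produced, the sketch below shows how an open-source text-to-image diffusion model like RoentGen is typically invoked through the Hugging Face diffusers library. This is a generic illustration only: the model path and prompt are placeholders, not details taken from the study.

```python
# Illustrative only: how a Stable Diffusion-style model such as RoentGen is
# typically run with the Hugging Face diffusers library. The model path below
# is a placeholder (obtain actual weights from the model's authors), and the
# prompt is an arbitrary example, not one used in the study.
import torch
from diffusers import StableDiffusionPipeline

MODEL_PATH = "path/to/roentgen-weights"  # placeholder, not a real hub ID

pipe = StableDiffusionPipeline.from_pretrained(
    MODEL_PATH, torch_dtype=torch.float16
).to("cuda")

# Generate one synthetic chest X-ray from a free-text prompt.
image = pipe(
    "Frontal chest X-ray, no acute findings",
    num_inference_steps=50,
    guidance_scale=4.0,
).images[0]
image.save("synthetic_cxr.png")
```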

Detection Accuracy Remains Limited
When radiologists were not told that fake images were included and were simply asked to assess image quality, only 41% flagged the AI-generated X-rays on their own. Once they were informed that synthetic images were present, their average accuracy in distinguishing real from fake scans rose to 75%.
Performance varied widely. Some radiologists correctly identified as few as 58% of the AI-generated images, while others reached 92%. The AI systems themselves showed similar limitations. Four multimodal LLMs (GPT-4o and GPT-5 from OpenAI, Gemini 2.5 Pro from Google, and Llama 4 Maverick from Meta) achieved accuracy rates between 57% and 85%. Even GPT-4o, the same model used through ChatGPT to generate the deepfake images, failed to identify all of them, though it outperformed the other models.
For chest X-rays created by RoentGen, radiologists achieved accuracy rates between 62% and 78%, while AI models ranged from 52% to 89%.
Interestingly, years of professional experience did not improve detection ability. However, musculoskeletal radiologists performed better than other subspecialists.
Signs of AI-Generated X-Rays
Researchers found that synthetic X-rays often share subtle visual patterns.
“Deepfake medical images often look too perfect,” Dr. Tordjman said. “Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform, and fractures appear unusually clean and consistent, often limited to one side of the bone.”
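As a purely illustrative exercise, one of these cues can be probed in a few lines of code. The toy script below scores how mirror-symmetric a chest X-ray is; it is a hypothetical heuristic inspired by the “overly symmetrical lungs” observation, not the study’s method or a validated detector.

```python
# Toy heuristic inspired by the "overly symmetrical lungs" cue: compare a
# chest X-ray with its left-right mirror image. Real anatomy (heart position,
# hilar vessels) is asymmetric, so an unusually LOW mirror difference could be
# a weak hint of synthesis. Illustrative only; not a validated detector.
import numpy as np
from PIL import Image

def mirror_asymmetry_score(path: str) -> float:
    """Mean absolute difference between the image and its left-right mirror,
    normalized to [0, 1]. Lower values mean higher symmetry."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
    return float(np.abs(img - img[:, ::-1]).mean())

if __name__ == "__main__":
    score = mirror_asymmetry_score("chest_xray.png")  # hypothetical file
    print(f"asymmetry score: {score:.3f} (suspicious if very low)")
```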
Risks and Potential Safeguards
The findings raise serious concerns about how AI-generated medical images could be misused. Experts warn that fake X-rays could be used in legal cases or introduced into hospital systems to influence diagnoses.
To address these risks, the researchers recommend stronger digital protections. These include invisible watermarks embedded directly in images and cryptographic signatures linked to the technician who captured the scan, helping verify authenticity.
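As a rough illustration of the second idea, the sketch below signs a scan together with a technician identifier using the Python cryptography package, so that altering either the pixels or the attribution invalidates the signature. The field names and workflow here are assumptions for demonstration, not the authors’ specific proposal.

```python
# Minimal sketch of signing a scan at acquisition time and verifying it later.
# Uses Ed25519 from the 'cryptography' package; the technician ID field and
# the overall workflow are illustrative assumptions.
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

def sign_scan(image_bytes: bytes, technician_id: str,
              key: Ed25519PrivateKey) -> bytes:
    # Bind the pixel data and the acquiring technician into one digest,
    # so tampering with either invalidates the signature.
    digest = hashlib.sha256(image_bytes + technician_id.encode()).digest()
    return key.sign(digest)

def verify_scan(image_bytes: bytes, technician_id: str,
                signature: bytes, pub: Ed25519PublicKey) -> bool:
    digest = hashlib.sha256(image_bytes + technician_id.encode()).digest()
    try:
        pub.verify(signature, digest)
        return True
    except InvalidSignature:
        return False

# Example: sign at the scanner, verify before the image enters the record.
key = Ed25519PrivateKey.generate()
scan = b"...raw pixel bytes..."  # stand-in for the acquired image
sig = sign_scan(scan, technician_id="tech-0042", key=key)
assert verify_scan(scan, "tech-0042", sig, key.public_key())
assert not verify_scan(scan + b"x", "tech-0042", sig, key.public_key())
```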
What Comes Next for AI Medical Imaging
“We are potentially only seeing the tip of the iceberg,” Dr. Tordjman said. “The logical next step in this evolution is AI-generation of synthetic 3D images, such as CT and MRI. Establishing educational datasets and detection tools now is critical.”
To support training and awareness, the research team has released a curated deepfake dataset that includes interactive quizzes for educational use.
Reference: “The Rise of Deepfake Medical Imaging: Radiologists’ Diagnostic Accuracy in Detecting ChatGPT-generated Radiographs” by Mickael Tordjman, Murat Yuce, Amine Ammar, Mingqian Huang, Fadila Mihoubi Bouvier, Maxime Lacroix, Anis Meribout, Ian Bolger, Efe Ozkaya, Himanshu Joshi, Amine Geahchan, Rayane El Rahi, Haidara Almansour, Ashwin Singh Parihar, Carolyn Horst, Samet Ozturk, Muhammed Edip Isleyen, Gul Gizem Pamuk, Ahmet Tan Cimilli, Timothy Deyer, Arvin Calinghen, Enora Guillo, Rola Husain, Jean-Denis Laredo, Zahi A. Fayad, Xueyan Mei and Bachir Taouli, 24 March 2026, Radiology.
DOI: 10.1148/radiol.252094
Collaborating with Dr. Tordjman were Murat Yuce, M.D., M.S., Amine Ammar, M.D., Mingqian Huang, M.D., Fadila Mihoubi Bouvier, M.D., Maxime Lacroix, M.D., Anis Meribout, M.D., Ian Bolger, M.S., Efe Ozkaya, Ph.D., Himanshu Joshi, Ph.D., Amine Geahchan, M.D., Rayane El Rahi, M.D., Haidara Almansour, M.D., Ashwin Singh Parihar, M.D., Carolyn Horst, M.D., Samet Ozturk, M.D., Muhammed Edip Isleyen, M.D., Gul Gizem Pamuk, M.D., Ahmet Tan Cimilli, M.D., Timothy Deyer, M.D., Arvin Calinghen, M.D., Enora Guillo, M.D., Rola Husain, M.D., Jean-Denis Laredo, M.D., Zahi A. Fayad, Ph.D., Xueyan Mei, Ph.D., and Bachir Taouli, M.D., M.H.A.
Reader Comment
What may be necessary in the future is a chain-of-custody protocol, or audit trail, for medical imagery to ensure that it is authentic. Perhaps some sort of difficult-to-forge watermark will be necessary to ensure that an image is what it purports to be.
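A minimal sketch of the hash-chained audit trail the commenter describes might look like the following, where each record embeds the hash of the previous one so that retroactively inserting or altering an image breaks every later link. All field names here are illustrative assumptions.

```python
# Minimal hash-chain audit trail: each record includes the hash of the
# previous record, so any retroactive tampering is detectable. Field names
# are illustrative assumptions, not part of any real standard.
import hashlib
import json
import time

def add_record(chain: list, image_bytes: bytes, event: str) -> None:
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {
        "event": event,  # e.g. "acquired", "viewed", "transferred"
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    chain.append(record)

def chain_is_intact(chain: list) -> bool:
    for i, rec in enumerate(chain):
        body = {k: v for k, v in rec.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if rec["hash"] != expected:
            return False
        if i > 0 and rec["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain: list = []
add_record(chain, b"...pixel data...", "acquired")
add_record(chain, b"...pixel data...", "viewed")
assert chain_is_intact(chain)
chain[0]["event"] = "forged"  # any tampering breaks the chain
assert not chain_is_intact(chain)
```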