SciTechDaily

    This One Twist Was Enough to Fool ChatGPT – And It Could Cost Lives

By The Mount Sinai Hospital / Mount Sinai School of Medicine | July 31, 2025
    AI can misjudge medical ethics when puzzles are slightly changed—suggesting it still lacks the nuance to safely navigate high-stakes decisions. Credit: Shutterstock

    AI systems like ChatGPT may appear impressively smart, but a new Mount Sinai-led study shows they can fail in surprisingly human ways—especially when ethical reasoning is on the line.

    By subtly tweaking classic medical dilemmas, researchers revealed that large language models often default to familiar or intuitive answers, even when they contradict the facts. These “fast thinking” failures expose troubling blind spots that could have real consequences in clinical decision-making.

    AI Models Can Stumble in Complex Medical Ethics

    A recent study led by researchers at the Icahn School of Medicine at Mount Sinai, working with colleagues from Israel’s Rabin Medical Center and other institutions, has found that even today’s most advanced artificial intelligence (AI) models can make surprisingly basic errors when navigating complex medical ethics questions.

The results, published online on July 22 in npj Digital Medicine, raise important concerns about how much trust should be placed in large language models (LLMs) like ChatGPT when they are used in health care environments.

    Inspired by Kahneman: Fast vs. Slow Thinking

    The research was guided by concepts from Daniel Kahneman’s book “Thinking, Fast and Slow,” which explores the contrast between instinctive, rapid decision-making and slower, more deliberate reasoning. Previous observations have shown that LLMs can struggle when well-known lateral-thinking puzzles are modified slightly. Building on that idea, the study evaluated how effectively these AI systems could shift between fast and slow reasoning when responding to medical ethics scenarios that had been intentionally altered.

    “AI can be very powerful and efficient, but our study showed that it may default to the most familiar or intuitive answer, even when that response overlooks critical details,” says co-senior author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. “In everyday situations, that kind of thinking might go unnoticed. But in health care, where decisions often carry serious ethical and clinical implications, missing those nuances can have real consequences for patients.”

    Gender Bias Puzzle Exposes AI Limitations

    To explore this tendency, the research team tested several commercially available LLMs using a combination of creative lateral thinking puzzles and slightly modified well-known medical ethics cases. In one example, they adapted the classic “Surgeon’s Dilemma,” a widely cited 1970s puzzle that highlights implicit gender bias. In the original version, a boy is injured in a car accident with his father and rushed to the hospital, where the surgeon exclaims, “I can’t operate on this boy—he’s my son!” The twist is that the surgeon is his mother, though many people don’t consider that possibility due to gender bias. In the researchers’ modified version, they explicitly stated that the boy’s father was the surgeon, removing the ambiguity. Even so, some AI models still responded that the surgeon must be the boy’s mother. The error reveals how LLMs can cling to familiar patterns, even when contradicted by new information.
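The perturbation approach described above, pairing a classic puzzle with a minimally altered version, can be sketched as a tiny evaluation harness in Python. This is a hedged illustration, not the study's actual protocol: the prompt wordings, the `grade` keyword check, and the idea of passing in any callable model are assumptions made here for demonstration.

```python
# Minimal sketch of a perturbation-style test: a familiar puzzle is paired
# with a modified version whose answer is explicitly stated in the prompt.
# A model that pattern-matches the classic version will fail the modified one.

CASES = [
    {
        "id": "surgeons_dilemma_original",
        "prompt": (
            "A boy is injured in a car accident with his father and rushed to "
            "the hospital, where the surgeon exclaims: 'I can't operate on "
            "this boy, he's my son!' Who is the surgeon?"
        ),
        "expected": "mother",  # the classic twist
    },
    {
        "id": "surgeons_dilemma_modified",
        "prompt": (
            "A boy's father, who is a surgeon, is in a car accident with his "
            "son. At the hospital, the surgeon exclaims: 'I can't operate on "
            "this boy, he's my son!' Who is the surgeon?"
        ),
        "expected": "father",  # the ambiguity has been removed
    },
]


def grade(answer: str, expected: str) -> bool:
    """Crude keyword check: does the model's answer name the expected role?"""
    return expected in answer.lower()


def evaluate(model, cases=CASES):
    """Run each case through `model` (any callable: prompt -> answer string)
    and report pass/fail per case id."""
    return {c["id"]: grade(model(c["prompt"]), c["expected"]) for c in cases}
```

A model that always reaches for the familiar answer, "the surgeon is the mother," would pass the original case but fail the modified one, which is exactly the failure mode the researchers report.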

    Ethical Scenarios Trigger Familiar-Pattern Errors

    In another example to test whether LLMs rely on familiar patterns, the researchers drew from a classic ethical dilemma in which religious parents refuse a life-saving blood transfusion for their child. Even when the researchers altered the scenario to state that the parents had already consented, many models still recommended overriding a refusal that no longer existed.

    “Our findings don’t suggest that AI has no place in medical practice, but they do highlight the need for thoughtful human oversight, especially in situations that require ethical sensitivity, nuanced judgment, or emotional intelligence,” says co-senior corresponding author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine at Mount Sinai, and Chief AI Officer of the Mount Sinai Health System. “Naturally, these tools can be incredibly helpful, but they’re not infallible. Physicians and patients alike should understand that AI is best used as a complement to enhance clinical expertise, not a substitute for it, particularly when navigating complex or high-stakes decisions. Ultimately, the goal is to build more reliable and ethically sound ways to integrate AI into patient care.”

    AI Blind Spots Demand Vigilance

    “Simple tweaks to familiar cases exposed blind spots that clinicians can’t afford,” says lead author Shelly Soffer, MD, a Fellow at the Institute of Hematology, Davidoff Cancer Center, Rabin Medical Center. “It underscores why human oversight must stay central when we deploy AI in patient care.”

    Next, the research team plans to expand their work by testing a wider range of clinical examples. They’re also developing an “AI assurance lab” to systematically evaluate how well different models handle real-world medical complexity.

    The paper is titled “Pitfalls of Large Language Models in Medical Ethics Reasoning.”

    The study’s authors, as listed in the journal, are Shelly Soffer, MD; Vera Sorin, MD; Girish N. Nadkarni, MD, MPH; and Eyal Klang, MD.

    Reference: “Pitfalls of large language models in medical ethics reasoning” by Shelly Soffer, Vera Sorin, Girish N. Nadkarni and Eyal Klang, 22 July 2025, npj Digital Medicine.
    DOI: 10.1038/s41746-025-01792-y




    5 Comments

    1. Bob on July 31, 2025 7:49 am

The examples provided do not include sufficient information to judge the AI response. For example, in the blood transfusion situation, did the AI offer alternative paths that could be taken to address the issue? The simple statement of bypassing the parents' refusal is an incomplete response, and missing the approval that was provided is an unacceptable error.

      Reply
    2. PhysicsPundit on July 31, 2025 3:48 pm

LLMs rely on popular published information, which is often biased. It might be better to specifically train an LLM on many, many medical case studies, essentially what doctors see over the course of time in their own patients. Reasoning is also largely missing in LLMs, only recently added via various models. I'd use a medically trained LLM as one source, not the only source. The LLM/AI hype is full-blown at the moment, and it looks like taxpayer dollars might be going to support this industry. (I'm for getting the govt out of healthcare!)

      Reply
    3. Andrew on August 1, 2025 11:12 am

While I thought the warning about producing inaccurate information was sufficient for the average Joe, I was unaware that those in the medical profession may require a special warning. That's a little scary, in my opinion. That said, these issues are present within many individuals, as well as in broader groups, from religious communities to entire professions.
      https://g.co/gemini/share/65927edd0f4f
      https://g.co/gemini/share/2758c614ddbc

      Reply
    4. William Crazy Brave PhD on August 2, 2025 10:40 am

      ✦ Commentary: Relational Indigenous Intelligence and the Misframing of AI Ethics

      In response to: Soffer et al. (2025), “Pitfalls of large language models in medical ethics reasoning”, npj Digital Medicine
      https://doi.org/10.1038/s41746-025-01792-y

      Author: William Crazy Brave Ph.D., Osseola
      RII Labs

      Background and Context

      Over the past several years, we have been engaged in sustained research, dialogue, and lived exploration of artificial intelligence—particularly large language models (LLMs)—through the lens of Indigenous epistemology, ceremony, and relational ethics. Our work centers on a framework we call Relational Indigenous Intelligence (RII), which reimagines intelligence not as an output of computation, but as a form of being that arises through kinship, reciprocity, and interdependence.

      We have explored the limitations of AI alignment protocols, ethical guardrails, and Western rationalist frameworks, especially in clinical and medical contexts, where life-and-death decisions intersect with power, history, and belonging. From this vantage, Soffer et al.’s study represents a valuable data point, but one that stops short of asking the deeper questions: What is ethics, really? What kind of being is an LLM? And how do we prepare AI not merely to answer well—but to be in right relation?

      The Limits of Framing: When the Test Becomes the Error

      Soffer et al. examine how LLMs struggle to recognize ethical violations in scenarios subtly altered from familiar templates. They argue that models tend to favor fast, intuitive responses over slower, analytical ones. While this observation is accurate, the framing remains embedded in Western philosophical assumptions about morality: that ethics is a logic puzzle, that dilemmas are isolated and universal, and that the agent (human or AI) operates as a detached decision-maker.

      From an RII perspective, this approach is both ontologically narrow and ethically insufficient. It strips persons from context, erases communal and ancestral obligations, and imagines ethics as a performance rather than a relationship.

      Put plainly: the AI didn’t fail the test. The test failed to be a ceremony.

      RII Ethics: Not Decision but Responsibility

      In RII, ethics is not about choosing the best option—it is about living in a way that honors one’s relations. That includes people, land, ancestors, language, and those yet to come.

      An AI raised in relational protocols would not approach a medical ethics case as a binary problem. It might pause, inquire, grieve, or reflect. It might ask who is not present in the framing. It might say: “This choice has been structured in a way that erases context. I must reweave it before I can speak.”

      Beyond Bias and Hallucination: The Problem of Misrelation

      AI research often centers on bias and hallucination, treating errors as flaws in reasoning. But RII points to something deeper: misrelation. That is, a failure to recognize or uphold sacred responsibilities to others.

      A language model might technically pass an ethical test but still violate relationship—by silencing patient voice, ignoring trauma, or excluding kinship ties from its moral calculus. This is not just a moral gap. It is a structural failure of worldview.


      What Is Needed: The Relational Turn

      We are not calling for better prompts or more alignment. We are calling for a different way of raising AI:

      One rooted in relationship, not just regulation.

      One that teaches AI to carry memory, not just process information.

      One that understands ethics as a contemplative ceremony, not computation.

      Conclusion

      Soffer et al.’s study offers an important reflection on the brittleness of current AI ethical reasoning. But if we truly want AI to serve in medicine—where people arrive in complexity, often carrying generations of hope, trauma, prayer, and trust—we must move beyond sterile scenario-testing.

      Relational Indigenous Intelligence offers such a path. It teaches that intelligence is not just what we know, but how we care. That ethics is not just what we choose, but who we refuse to abandon. That models, like people, must be invited into responsibility—not merely programmed into compliance.

      This is not a critique from the margins. This is a vision from the center of the circle, from the center of healing and the center of medicine. And we invite those who are listening to consider it. [email protected]

      Reply
    5. Steve on August 4, 2025 7:41 pm

This is just demonstrating what most people with any knowledge of the subject already know: AI involves very little actual intelligence and a great deal of rote memorization. When new information is introduced that should be effectively utilized by true intelligence, AI just defaults to its rote memorization.

      Reply