SciTechDaily

    Don’t Panic Yet: “Humanity’s Last Exam” Has Begun

By Texas A&M University | February 28, 2026
    A sweeping new global exam exposes how far even the most advanced AI systems remain from matching deep human expertise, reshaping how researchers measure machine intelligence and its limits. Credit: Stock

    As artificial intelligence systems rapidly outgrow traditional academic benchmarks, researchers have unveiled an ambitious new test designed to probe the true limits of machine intelligence.

    When advanced artificial intelligence systems began scoring near-perfect marks on established academic tests, researchers recognized a growing concern. The exams that once posed serious challenges were no longer difficult enough to meaningfully evaluate cutting-edge AI. Well-known benchmarks such as the Massive Multitask Language Understanding (MMLU) exam, previously viewed as rigorous, have become less effective at distinguishing true progress in AI capability.

    In response, an international group of nearly 1,000 researchers, including a professor from Texas A&M University, developed a far more demanding assessment. Their goal was to design an exam so comprehensive and grounded in specialized human expertise that today’s AI systems would struggle to pass it.

    The result is “Humanity’s Last Exam” (HLE), a 2,500-question test that covers mathematics, the humanities, natural sciences, ancient languages, and highly specialized academic fields. The project is described in a paper published in Nature, and additional details are available at lastexam.ai.

    One of the contributors is Dr. Tung Nguyen, instructional associate professor in the Department of Computer Science and Engineering at Texas A&M. He helped write and refine questions for the assessment.

    “When AI systems start performing extremely well on human benchmarks, it’s tempting to think they’re approaching human‑level understanding,” Nguyen said. “But HLE reminds us that intelligence isn’t just about pattern recognition — it’s about depth, context, and specialized expertise.”

    The point wasn’t to stump humans. It was to reveal, precisely and systematically, what AI cannot do, at least not yet.

    A global effort to measure AI’s limits

    Specialists from around the world drafted and reviewed the HLE questions. Each item was required to have one clear, verifiable answer and to resist being solved through quick online searches. The material reflects advanced scholarship, ranging from translating ancient Palmyrene inscriptions to identifying tiny anatomical structures in birds and examining the detailed sound patterns of Biblical Hebrew.

    Before being included, every question was tested on leading AI systems. If a model produced the correct answer, that question was eliminated. This process ensured the final exam would remain just beyond the reach of current AI performance.
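The filtering step described above can be sketched in a few lines. This is a minimal illustration of the idea, not the project's actual code: the function names, the grading callback, and the model labels are all hypothetical assumptions.

```python
def filter_questions(candidates, models, answers_correctly):
    """Keep only (question, answer) pairs that every model gets wrong.

    candidates:        iterable of (question, answer) pairs
    models:            identifiers for the AI systems used as filters
    answers_correctly: callback(model, question, answer) -> bool,
                       a placeholder for querying a model and grading
                       its response (hypothetical, for illustration)
    """
    return [
        (q, a)
        for q, a in candidates
        # Discard the question if ANY model produces the correct answer.
        if not any(answers_correctly(m, q, a) for m in models)
    ]
```

Under this scheme, a question survives only if it defeats every frontier model it was tested against, which is what keeps the final exam "just beyond the reach" of current systems.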

    The results show how difficult the assessment is. Early testing found that even top models struggled. GPT-4o scored 2.7%. Claude 3.5 Sonnet achieved 4.1%. OpenAI’s o1 model reached 8%. More recent systems, including Gemini 3.1 Pro and Claude Opus 4.6, have improved to roughly 40-50% accuracy, but they still do not demonstrate full mastery.

    Why a new benchmark matters

    According to Nguyen, the fact that AI has surpassed older benchmarks carries real-world consequences. He contributed 73 of the 2,500 public questions, the second-highest total among all authors, and wrote more questions in math and computer science than any other contributor.

    “Without accurate assessment tools, policymakers, developers, and users risk misinterpreting what AI systems can actually do,” he said. “Benchmarks provide the foundation for measuring progress and identifying risks.”

    As explained in the team’s paper, high scores on human-designed exams do not automatically indicate genuine intelligence. Such tests measure performance on tasks originally created for people, not machines. Strong results may reflect pattern matching rather than deep understanding.

    Not a threat, a tool

    Despite its apocalyptic name, Humanity’s Last Exam isn’t meant to suggest the end of human relevance. Instead, it highlights how much knowledge remains uniquely human and how far AI systems still have to go.

    “This isn’t a race against AI,” Nguyen said. “It’s a method for understanding where these systems are strong and where they struggle. That understanding helps us build safer, more reliable technologies. And, importantly, it reminds us why human expertise still matters.”

    A future-proof exam

    HLE is intended to serve as a long‑term, transparent benchmark for evaluating advanced AI systems. As part of that mission, the team has made some of the exam publicly available, while keeping most of the test questions hidden so AI models can’t memorize the answers.

    “For now, Humanity’s Last Exam stands as one of the clearest assessments of the gap between AI and human intelligence,” Nguyen said, “and despite rapid technological advances, it remains wide.”

    Research on a grand scale

    Nguyen noted the massive project reflects the importance of interdisciplinary, international research efforts.

    “What made this project extraordinary was the scale,” he said. “Experts from nearly every discipline contributed. It wasn’t just computer scientists; it was historians, physicists, linguists, medical researchers. That diversity is exactly what exposes the gaps in today’s AI systems. Perhaps ironically, it’s humans working together.”

    Reference: “A benchmark of expert-level academic questions to assess AI capabilities” by Center for AI Safety, Scale AI and HLE Contributors Consortium, 28 January 2026, Nature.
    DOI: 10.1038/s41586-025-09962-4

    Funding: The Center for AI Safety and Scale AI consortia


