
    Don’t Panic Yet: “Humanity’s Last Exam” Has Begun

    By Texas A&M University, February 28, 2026
    A sweeping new global exam exposes how far even the most advanced AI systems remain from matching deep human expertise, reshaping how researchers measure machine intelligence and its limits. Credit: Stock

    As artificial intelligence systems rapidly outgrow traditional academic benchmarks, researchers have unveiled an ambitious new test designed to probe the true limits of machine intelligence.

    When advanced artificial intelligence systems began scoring near-perfect marks on established academic tests, researchers recognized a growing concern. The exams that once posed serious challenges were no longer difficult enough to meaningfully evaluate cutting-edge AI. Well-known benchmarks such as the Massive Multitask Language Understanding (MMLU) exam, previously viewed as rigorous, have become less effective at distinguishing true progress in AI capability.

    In response, an international group of nearly 1,000 researchers, including a professor from Texas A&M University, developed a far more demanding assessment. Their goal was to design an exam so comprehensive and grounded in specialized human expertise that today’s AI systems would struggle to pass it.

    The result is “Humanity’s Last Exam” (HLE), a 2,500-question test that covers mathematics, the humanities, natural sciences, ancient languages, and highly specialized academic fields. The project is described in a paper published in Nature, and additional details are available at lastexam.ai.

    One of the contributors is Dr. Tung Nguyen, instructional associate professor in the Department of Computer Science and Engineering at Texas A&M. He helped write and refine questions for the assessment.

    “When AI systems start performing extremely well on human benchmarks, it’s tempting to think they’re approaching human‑level understanding,” Nguyen said. “But HLE reminds us that intelligence isn’t just about pattern recognition — it’s about depth, context, and specialized expertise.”

    The point wasn’t to stump humans. It was to reveal, precisely and systematically, what AI cannot do, at least not yet.

    A global effort to measure AI’s limits

    Specialists from around the world drafted and reviewed the HLE questions. Each item was required to have one clear, verifiable answer and to resist being solved through quick online searches. The material reflects advanced scholarship, ranging from translating ancient Palmyrene inscriptions to identifying tiny anatomical structures in birds and examining the detailed sound patterns of Biblical Hebrew.

    Before being included, every question was tested on leading AI systems. If a model produced the correct answer, that question was eliminated. This process ensured the final exam would remain just beyond the reach of current AI performance.
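The screening step described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual pipeline: the `Question` and `Model` types, the `normalize` and `filter_questions` helpers, and the toy models are all hypothetical, and the sketch assumes an answer counts as correct when it matches the reference after simple normalization.

```python
from typing import Callable, Dict, List

# Hypothetical types: a question is a dict with "prompt" and "answer" keys,
# and a model is any callable that maps a prompt string to an answer string.
Question = Dict[str, str]
Model = Callable[[str], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def filter_questions(candidates: List[Question], models: List[Model]) -> List[Question]:
    """Keep only the questions that none of the given models answers correctly."""
    kept = []
    for q in candidates:
        solved = any(normalize(m(q["prompt"])) == normalize(q["answer"]) for m in models)
        if not solved:
            kept.append(q)
    return kept


# Toy stand-ins for real AI systems (illustrative only).
def toy_model_a(prompt: str) -> str:
    return "4" if prompt == "What is 2 + 2?" else "unknown"


def toy_model_b(prompt: str) -> str:
    return "Paris" if "capital of France" in prompt else "unknown"


candidates = [
    {"prompt": "What is 2 + 2?", "answer": "4"},
    {"prompt": "What is the capital of France?", "answer": "paris"},
    {"prompt": "Translate this Palmyrene inscription.", "answer": "placeholder answer"},
]

surviving = filter_questions(candidates, [toy_model_a, toy_model_b])
print([q["prompt"] for q in surviving])
```

In this toy run, the two questions that at least one model answers correctly are dropped, and only the third survives. That mirrors the idea of keeping the exam just beyond the reach of the systems it was screened against.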

    The results show how difficult the assessment is. Early testing found that even top models struggled. GPT-4o scored 2.7%. Claude 3.5 Sonnet achieved 4.1%. OpenAI’s o1 model reached 8%. More recent systems, including Gemini 3.1 Pro and Claude Opus 4.6, have improved to roughly 40-50% accuracy, but they still do not demonstrate full mastery.
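For readers unfamiliar with how such percentages arise, here is a minimal sketch of an accuracy calculation. The article does not describe how answers were graded, so the exact-match comparison and the `accuracy_percent` helper are assumptions made purely for illustration, and the numbers in the example are invented, not HLE results.

```python
from typing import List


def accuracy_percent(predictions: List[str], answers: List[str]) -> float:
    """Return the share of predictions that match the reference answers, as a percentage.

    Assumes exact match after trimming whitespace and lowercasing (an illustrative choice).
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Made-up example: a model gets 2 of 25 questions right.
reference = [f"answer {i}" for i in range(25)]
predicted = ["wrong"] * 25
predicted[0] = "Answer 0"    # matches after lowercasing
predicted[7] = " answer 7 "  # matches after trimming whitespace

print(accuracy_percent(predicted, reference))
```

Two correct answers out of 25 questions gives 8%, the same order of magnitude as the early scores quoted above.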

    Why a new benchmark matters

    According to Nguyen, the fact that AI has surpassed older benchmarks carries real-world consequences. He contributed 73 of the 2,500 public questions, the second-highest total of any author, and wrote more questions in math and computer science than any other contributor.

    “Without accurate assessment tools, policymakers, developers, and users risk misinterpreting what AI systems can actually do,” he said. “Benchmarks provide the foundation for measuring progress and identifying risks.”

    As explained in the team’s paper, high scores on human-designed exams do not automatically indicate genuine intelligence. Such tests measure performance on tasks originally created for people, not machines. Strong results may reflect pattern matching rather than deep understanding.

    Not a threat, a tool

    Despite its apocalyptic name, Humanity’s Last Exam isn’t meant to suggest the end of human relevance. Instead, it highlights how much knowledge remains uniquely human and how far AI systems still have to go.

    “This isn’t a race against AI,” Nguyen said. “It’s a method for understanding where these systems are strong and where they struggle. That understanding helps us build safer, more reliable technologies. And, importantly, it reminds us why human expertise still matters.”

    A future-proof exam

    HLE is intended to serve as a long‑term, transparent benchmark for evaluating advanced AI systems. As part of that mission, the team has made some of the exam publicly available, while keeping most of the test questions hidden so AI models can’t memorize the answers.

    “For now, Humanity’s Last Exam stands as one of the clearest assessments of the gap between AI and human intelligence,” Nguyen said, “and despite rapid technological advances, it remains wide.”

    Research on a grand scale

    Nguyen noted the massive project reflects the importance of interdisciplinary, international research efforts.

    “What made this project extraordinary was the scale,” he said. “Experts from nearly every discipline contributed. It wasn’t just computer scientists; it was historians, physicists, linguists, medical researchers. That diversity is exactly what exposes the gaps in today’s AI systems. Perhaps ironically, it’s humans working together.”

    Reference: “A benchmark of expert-level academic questions to assess AI capabilities” by Center for AI Safety, Scale AI and HLE Contributors Consortium, 28 January 2026, Nature.
    DOI: 10.1038/s41586-025-09962-4

    Funding: The Center for AI Safety and Scale AI consortia


