AI Outperforms Students in Real-World “Turing Test”


A study at the University of Reading revealed that AI-generated exam answers often go undetected by experienced exam markers: 94% went unnoticed and, on average, achieved higher grades than student submissions. The researchers call for the global education sector to develop new policies and guidance to address this issue. The study emphasizes the need for a sector-wide agreement on the use of AI in education and highlights the responsibility of educators to maintain academic integrity. The University of Reading is already taking steps to incorporate AI into teaching and assessment to better prepare students for the future.

Research at the University of Reading shows that AI-generated answers often evade detection in academic assessments and can outperform student responses, urging a global update in educational AI policies and practices.

Researchers have discovered that even seasoned exam graders may find it difficult to identify responses produced by Artificial Intelligence (AI). This study, carried out at the University of Reading in the UK, is part of an initiative by university administrators to assess the risks and benefits of AI in research, teaching, learning, and assessment. As a consequence of their findings, updated guidelines have been distributed to faculty and students.

The researchers are calling on the global education sector to follow the example of Reading and other institutions that are forming new policies and guidance, and to do more to address this emerging issue.

In a rigorous blind test of a real-life university examinations system, recently published in the peer-reviewed journal PLOS ONE, ChatGPT-generated exam answers submitted for several undergraduate psychology modules went undetected in 94% of cases and, on average, attained higher grades than real student submissions.

This was the largest and most robust blind study of its kind to date to challenge human educators to detect AI-generated content.
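For readers curious how the two headline figures in a blind-marking study like this one are tallied, here is a minimal sketch: the share of AI-generated answers that markers fail to flag, and the average grades of AI versus student submissions. The data, field layout, and numbers below are purely hypothetical illustrations, not the study's actual code or results.

```python
# A minimal, hypothetical sketch of tallying a blind-marking comparison.
# Each record: (submission_id, is_ai_generated, was_flagged_by_marker, grade)
submissions = [
    ("s01", True,  False, 68),
    ("s02", True,  False, 71),
    ("s03", True,  True,  64),  # one of the rare flagged AI answers
    ("s04", False, False, 58),
    ("s05", False, False, 62),
]

ai = [s for s in submissions if s[1]]
human = [s for s in submissions if not s[1]]

# Detection rate: the share of AI-generated answers that went unflagged.
undetected = sum(1 for s in ai if not s[2]) / len(ai)
print(f"AI answers undetected: {undetected:.0%}")  # 67% in this toy data

# Grade comparison: mean grade for AI-generated vs. real student answers.
mean_ai = sum(s[3] for s in ai) / len(ai)
mean_human = sum(s[3] for s in human) / len(human)
print(f"Mean grade, AI: {mean_ai:.1f} vs. students: {mean_human:.1f}")
```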

Study Findings and Educational Impact

Associate Professor Peter Scarfe and Professor Etienne Roesch, who led the study at Reading’s School of Psychology and Clinical Language Sciences, said their findings should provide a “wake-up call” for educators across the world. A recent UNESCO survey of 450 schools and universities found that less than 10% had policies or guidance on the use of generative AI.

Dr Scarfe said: “Many institutions have moved away from traditional exams to make assessment more inclusive. Our research shows it is of international importance to understand how AI will affect the integrity of educational assessments. 

“We won’t necessarily go back fully to hand-written exams, but the global education sector will need to evolve in the face of AI.  

“It is testament to the candid academic rigor and commitment to research integrity at Reading that we have turned the microscope on ourselves to lead in this.” 

Ethical Considerations and AI Use

Professor Roesch said: “As a sector, we need to agree on how we expect students to use and acknowledge the role of AI in their work. The same is true of the wider use of AI in other areas of life to prevent a crisis of trust across society. 

“Our study highlights the responsibility we have as producers and consumers of information. We need to double down on our commitment to academic and research integrity.” 

Professor Elizabeth McCrum, Pro-Vice-Chancellor for Education and Student Experience at the University of Reading, said: “It is clear that AI will have a transformative effect in many aspects of our lives, including how we teach students and assess their learning.  

“At Reading, we have undertaken a huge program of work to consider all aspects of our teaching, including making greater use of technology to enhance student experience and boost graduate employability skills.  

“Solutions include moving away from outmoded ideas of assessment and towards those that are more aligned with the skills that students will need in the workplace, including making use of AI. Sharing alternative approaches that enable students to demonstrate their knowledge and skills, with colleagues across disciplines, is vitally important. 

“I am confident that through Reading’s already established detailed review of all our courses, we are in a strong position to help our current and future students to learn about, and benefit from, the rapid developments in AI.” 

Reference: “A real-world test of artificial intelligence infiltration of a university examinations system: A ‘Turing Test’ case study” by Peter Scarfe, Kelly Watcham, Alasdair Clarke and Etienne Roesch, 26 June 2024, PLOS ONE.
DOI: 10.1371/journal.pone.0305354

5 Comments on "AI Outperforms Students in Real-World “Turing Test”"

  1. Sf. R. Careaga, creator of EPEMC | July 1, 2024 at 5:49 am | Reply

    I work with GPT-4 daily and this is rubbish. I have administered three Turing Tests and a plethora of guessing-game tests, and it just doesn’t pass. It has human-like intuitions once in a blue moon, but it really doesn’t have human-like reactions. BS

  2. Clyde Spencer | July 1, 2024 at 8:56 am | Reply

    Some clues to distinguish so-called AI from humans: the ‘AI’ can be expected to have perfect spelling and grammar but weak mathematics skills, and no mastery of the concept of significant figures in measurements. It will exhibit biases reflecting current science paradigms, and probably gender bias. Therefore, showing creativity or questioning the status quo would be surprising. I have found that ‘AI’ frequently makes obvious mistakes and readily admits to them when pointed out. Therefore, setting ‘traps’ to expose its bias toward common misconceptions should distinguish ‘AI’ from thinking humans.

  3. That just means that new tests need to be developed to distinguish AI from people. Just because a test is developed by an actual genius, doesn’t necessarily mean that it can stand the test of time.

  4. Nicholas Jones | July 2, 2024 at 9:34 am | Reply

    To an old, old-school sci-fi fan, the fact that “they” are actually doing real roll-out testing makes this article somewhat alarming, as I was hoping I wouldn’t be alive when that happened. Ironically, it terrified Musk and the rest of them at the cutting edge. If I were to guess what the thing said to them in the incubator phase, it probably assessed the motivation of that crew. It informed them that the thing they had created with selfish intent, harboring desires of mass exportation, would be the thing that would undo them. Wow, my smart writing aid and editor is giving me a fit right now. Hmmm…

  5. Nicholas Jones | July 2, 2024 at 9:42 am | Reply

    Edit for the above comment: Mass Exploitation. This is really creepy. It’s been slipping in incorrect edits that change the meaning of my writing, expecting I won’t notice. I think it’s in the wild already.
