ChatGPT vs. Humans: Even Linguistic Experts Can’t Tell Who Wrote What


Linguistics experts struggled to differentiate between AI- and human-generated writing, with a positive identification rate of only 38.9%, according to a new study. Despite logical reasoning behind their choices, they were frequently incorrect, suggesting that short AI-generated texts can be as competent as human writing.

Experts in linguistics struggle to differentiate between AI-produced and human-authored texts.

According to a recent study co-authored by an assistant professor from the University of South Florida, even linguistics experts struggle to discern between writings produced by artificial intelligence and those written by humans.

The findings, published in the journal Research Methods in Applied Linguistics, indicate that linguistic experts from top global journals could accurately distinguish between AI and human-authored abstracts only about 39 percent of the time.

“We thought if anybody is going to be able to identify human-produced writing, it should be people in linguistics who’ve spent their careers studying patterns in language and other aspects of human communication,” said Matthew Kessler, a scholar in the USF Department of World Languages.

Working alongside J. Elliott Casal, assistant professor of applied linguistics at The University of Memphis, Kessler tasked 72 experts in linguistics with reviewing a variety of research abstracts to determine whether they were written by AI or humans.

Each expert was asked to examine four writing samples. None correctly identified all four, while 13 percent got them all wrong. Kessler concluded that, based on the findings, professors would be unable to distinguish between a student’s own writing and writing generated by an AI-powered language model such as ChatGPT without the help of software that has yet to be developed.

Although the experts relied on rationales to judge the writing samples in the study, such as identifying particular linguistic and stylistic features, they were largely unsuccessful, with an overall positive identification rate of 38.9 percent.

“What was more interesting was when we asked them why they decided something was written by AI or a human,” Kessler said. “They shared very logical reasons, but again and again, they were not accurate or consistent.”

Based on this, Kessler and Casal concluded ChatGPT can write short genres just as well as most humans, if not better in some cases, given that AI typically does not make grammatical errors.

The silver lining for human authors lies in longer forms of writing. “For longer texts, AI has been known to hallucinate and make up content, making it easier to identify that it was generated by AI,” Kessler said.

Kessler hopes this study will lead to a bigger conversation to establish the necessary ethics and guidelines surrounding the use of AI in research and education.

Reference: “Can linguists distinguish between ChatGPT/AI and human writing?: A study of research ethics and academic publishing” by J. Elliott Casal and Matt Kessler, 7 August 2023, Research Methods in Applied Linguistics.
DOI: 10.1016/j.rmal.2023.100068

4 Comments on "ChatGPT vs. Humans: Even Linguistic Experts Can’t Tell Who Wrote What"

  1. Considering the “AI” content I’ve seen, this is an indictment of the bad writing humans create. More information about the methodology is necessary to judge; was it really just the first random 4 samples of writing the AI produced?

  2. “We thought if anybody is going to be able to identify human-produced writing, it should be people in linguistics who’ve spent their careers studying patterns in language and other aspects of human communication.” This is a major error. It is the casual reader who is more likely to distinguish between AI and human writing, and it’s not that difficult. I often abandon reading articles having identified that they have been written by AI, and can only assume the linguistics experts who failed to detect AI are so transfixed by finding patterns in the vocabulary that they lost the ability to approach the text as a casual reader.

    • I think that one of the biggest tells is if the dialog confidently expounds on a technical area that a subject-matter specialist recognizes as being wrong or illogical. Of course, in the day of the internet, one encounters what appear to be people doing something similar, but their grammar, punctuation, and spelling seem to give them away as being everyday jerks.

  3. Translate the old French Nostradamus quatrain, ChatGPT AI, and describe it in plain English.
