The Limits of AlphaFold: High Schoolers Reveal AI’s Flaws in Bioinformatics Challenge

Scientists at Skoltech Bio have tested AlphaFold, the artificial intelligence program that solved the central problem of structural bioinformatics by predicting protein structures, on another challenge in the field. The team asked AlphaFold to predict the impact of single mutations on protein stability, and the results contradicted experimental findings, suggesting that the AI is not a cure-all for structural bioinformatics. The authors also refuted claims by some AlphaFold enthusiasts that the program had mastered the ultimate protein physics and should work beyond the task it was designed for. The findings were reported in a PLOS One study.

A bioinformatics boot camp for high schoolers at Skoltech turned into a venue for the latest chapter in the ongoing contest between humans and artificial intelligence in science. Having earlier solved a key 50-year-old problem of structural bioinformatics, the breakthrough AI program AlphaFold proved inapplicable to another challenge that researchers in this field face. This finding is reported in a PLOS One study whose authors refute claims by some AlphaFold enthusiasts that DeepMind’s AI has mastered the ultimate protein physics and is the be-all and end-all of structural bioinformatics.

Structural bioinformatics is a branch of science that explores the structures of proteins, RNA, and DNA, and their interactions with other molecules. Its findings provide a basis for drug discovery and for creating proteins with exciting properties, such as catalysts for reactions not seen in the natural world.

Historically, the central problem of structural bioinformatics was predicting protein structures. That is, given an arbitrary sequence of the amino acids that make up a protein, how can one reliably compute the 3D shape that protein will assume in the body, and therefore how it will function?

Playing With AlphaFold2

Poster for the project “Playing With AlphaFold2” at the School of Molecular and Theoretical Biology, held online by Skoltech in 2021. Credit: Dmitry Ivankov/Skoltech

After 50 years, the problem was solved by AlphaFold, an artificial intelligence program created by Google’s DeepMind, whose predecessors had earlier made headlines by achieving superhuman performance in chess, the game of Go, and the video game StarCraft II.

This milestone led to speculation that the neural network must have somehow internalized the underlying physics of proteins and should therefore work beyond the task it was designed for. Some people, even within the structural bioinformatics community, expected the AI to soon give definitive answers to the discipline’s remaining questions and consign the field to the history of science.

“We decided to settle this and put AlphaFold to work on another central task of structural bioinformatics: predicting the impact of single mutations on protein stability. That means you choose a certain known protein and introduce exactly one mutation, the smallest change possible. And you want to know whether the resulting mutant is more stable or less stable and to what extent. AlphaFold was clearly unable to do this, as evidenced by its predictions contradicting the known experimental findings,” the study’s principal investigator, Assistant Professor Dmitry Ivankov of Skoltech Bio, commented.
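In practice, “introducing exactly one mutation” means changing a single letter in the protein’s amino acid sequence. The minimal Python sketch below assumes the common shorthand of wild-type residue, 1-based position, and mutant residue (for example, L45P would mean leucine 45 replaced by proline); the function name and the toy sequence are hypothetical, chosen only for illustration.

```python
import re

# Assumed notation: one-letter wild-type residue, 1-based position,
# one-letter mutant residue, restricted to the 20 standard amino acids.
MUTATION_RE = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Return a copy of `sequence` carrying exactly one amino acid substitution.

    Hypothetical helper: it validates the code against the sequence so that a
    mismatched wild-type residue is caught instead of silently introduced.
    """
    match = MUTATION_RE.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation code: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside a sequence of length {len(sequence)}")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}"
        )
    if wild_type == mutant:
        raise ValueError("wild-type and mutant residues are identical")
    # Replace the single residue; everything else stays untouched.
    return sequence[: position - 1] + mutant + sequence[position:]
```

For example, `apply_point_mutation("MKTAYIAK", "T3A")` returns `"MKAAYIAK"`, while a code whose wild-type letter does not match the sequence raises a `ValueError`.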

Asked about the role of the high school students in the project, the researcher said they were involved in processing mutation data, writing scripts to handle prediction results, visualizing the structures predicted by AlphaFold, and basically fooling around with the online version of the AI.

Ivankov emphasized that AlphaFold’s creators never actually claimed that the AI was applicable to other tasks besides predicting protein structures based on their amino acid sequences. “But some machine learning enthusiasts were quick to prophesy the end of structural bioinformatics. So we thought it a good idea to go ahead and check, and we now know it cannot predict the effect of single mutations,” Ivankov added.

On a practical level, predicting how single mutations affect protein stability helps researchers sift through the many possible mutations to find the ones that might be useful. This comes in handy, for example, if you want to make a protein additive for laundry detergents resistant to higher temperatures, so that it can break down fats, starch, fibers, or other proteins in hotter water. Similarly, some known sweet-tasting proteins could someday replace sugar, provided they can withstand the heat of a cup of coffee or tea.

On a more fundamental level, the study shows that today’s artificial intelligence is no cure-all: while it may be wildly successful at solving one problem, others remain, including a dozen or so major challenges in structural bioinformatics. Among them are predicting the structures of complexes formed by proteins with small molecules, DNA, or RNA; determining how mutations affect the binding energy of proteins with other molecules; and designing amino acid sequences that give proteins desired properties, such as the ability to catalyze otherwise impossible reactions as part of a tiny “molecular factory.”

Besides reminding readers that, even in the wake of AlphaFold, scientists in the field still have a thing or two to do, the authors of the PLOS One study examine the contention that the AI program’s success stems from its “having learned physics,” as opposed to just internalizing the totality of protein structures known to humanity and cleverly manipulating them. Apparently, this is not the case: a program that knew the relevant physics should find it relatively easy to compare the stability of two very similar but not identical structures, yet this is precisely the task AlphaFold failed to accomplish.
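Comparing the stability of two nearly identical proteins, such as a wild type and one of its single mutants, is conventionally expressed as a difference of folding free energies. One common convention is written out below; sign conventions differ between papers and databases, so the opposite sign is also widely used.

```latex
\Delta G = G_{\text{unfolded}} - G_{\text{folded}}, \qquad
\Delta\Delta G = \Delta G_{\text{mutant}} - \Delta G_{\text{wild type}}
```

Under this convention, a larger ΔG means a more stable fold, so ΔΔG < 0 marks a destabilizing mutation and ΔΔG > 0 a stabilizing one; values are typically reported in kcal/mol. “More stable or less stable and to what extent” is then simply the sign and magnitude of ΔΔG.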

This point is supported by two previously voiced reservations about the AI’s “knowledge” of physics. First, AlphaFold predicts some structures with side chains positioned as if a zinc ion were bound to them. However, the program’s only input is the protein’s amino acid sequence, so the only reason the “invisible zinc” is there is that the AI was trained on analogous protein structures bound to this ion. Without the zinc, the predicted side chain orientation contradicts physics. Second, AlphaFold can accurately predict the structure of a single, spiral-like protein chain, but that structure is only correct when the chain is intertwined with two other such chains. On its own, the predicted structure is physically unsound. So rather than relying on physics, the program must simply be reproducing a shape it extracted from a multi-chain structure.

“Interestingly, this research grew out of a ‘playful’ project featuring the participants of the School of Molecular and Theoretical Biology. We called it ‘Games With AlphaFold.’ The moment AlphaFold became openly accessible, our lab installed it on the Zhores supercomputer. One of the games involved comparing the known mutation effects with what AlphaFold predicts for the original and the mutant proteins. This led to a study in which high schoolers got the chance to experience both a supercomputer and advanced artificial intelligence,” the study’s lead author, Skoltech PhD student Marina Pak, commented.
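The “game” Pak describes boils down to checking whether some number derived from AlphaFold’s models of the original and mutant proteins tracks the experimentally measured effect. Below is a minimal Python sketch of such a check. It assumes a rank correlation (Spearman) as the yardstick and a simple mutant-minus-wild-type difference of some per-structure score as the predictor; both choices, and all function names, are illustrative assumptions rather than the study’s actual protocol.

```python
from math import sqrt
from statistics import mean


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value equal to the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    rx, ry = rank_with_ties(x), rank_with_ties(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(sxx * syy)


def score_changes(wild_type_scores, mutant_scores):
    """Hypothetical per-mutation predictor: mutant score minus wild-type score."""
    return [m - w for w, m in zip(wild_type_scores, mutant_scores)]
```

A typical call would be `spearman(score_changes(wt_scores, mutant_scores), experimental_ddg)`, where all three lists are hypothetical inputs aligned by mutation. A rank correlation is a reasonable choice here because a model-derived score and a ΔΔG in kcal/mol live on different scales; Spearman only asks whether the two put the mutations in the same order, and a value near zero would mean the predictions carry little information about the measured effects.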

Reference: “Using AlphaFold to predict the impact of single mutations on protein stability and function” by Marina A. Pak, Karina A. Markhieva, Mariia S. Novikova, Dmitry S. Petrov, Ilya S. Vorobyev, Ekaterina S. Maksimova, Fyodor A. Kondrashov and Dmitry N. Ivankov, 16 March 2023, PLOS One.
DOI: 10.1371/journal.pone.0282689

The study reported in this story was co-authored by Skoltech scientists, their colleagues from the Institute of Science and Technology Austria and the Okinawa Institute of Science and Technology, Japan, and high school students who are currently studying at Ural Federal University, the Peoples’ Friendship University of Russia, and the Armand Hammer United World College of the American West.

3 Comments on "The Limits of AlphaFold: High Schoolers Reveal AI’s Flaws in Bioinformatics Challenge"

  1. Anyone can find faults in a system if the system wasn’t trained for the information you are asking for. Hey McDonald’s, can you make me a Whopper? Asking for a nondescript mutation is like looking for a made-up word in a dictionary. If they trained on the data, I’m sure it would work. There’s too much information missing. Did they wrangle their own data? Who went over the training? What system did they use? An antiquated GTX9600? This is a major problem. High school students messing with gene modification/predictions? I call 15 minutes of fame BS.

  2. I myself would never have said that AlphaFold “solved” the protein folding problem. What it did was take all the available information and codify it into a big “database” that can be searched for nearest matches (in some sense). But if there’s nothing “similar” in the database, it won’t find anything useful. And it’s no substitute for understanding the real physics, as shown by the mere fact that it’s still quite difficult to predict exactly what a protein will do even given its structure; AlphaFold doesn’t really help us do that. An AI-designed free energy function that made all the known structures have the lowest energy of all the possible structures for every known sequence might do some of that, but it’s a harder problem yet!

  3. I would like to see AI solve the He atom Schrödinger equation or the Sun-Earth-Moon Hamilton equations exactly.
