From AI Black Boxes to Physics: The New Frontier of Protein Folding Prediction


The University of Tokyo’s new protein folding model, WSME-L, offers better predictions than earlier models. The advance could inform medical research, including the study of Alzheimer’s and Parkinson’s, and help in designing functional proteins for medical and industrial uses.

New protein folding models could lead to new medicines and industrial processes.

Proteins are important molecules that perform a variety of functions essential to life. To function properly, many proteins must fold into specific structures. However, how they reach those structures is still largely unknown.

Researchers from the University of Tokyo developed a novel physical theory that can accurately predict how proteins fold. Their model handles cases that previous models cannot, such as larger proteins and proteins stabilized by disulfide bonds. Improved knowledge of protein folding could offer huge benefits to medical research, as well as to various industrial processes.

The Vital Role of Proteins

You are literally made of proteins. These chainlike molecules, made from tens to thousands of smaller molecules called amino acids, form things like hair, bones, muscles, enzymes for digestion, antibodies to fight diseases, and more. Proteins make these things by folding into various structures that in turn build up these larger tissues and biological components.


Four iterations of WSME, from the original to the new, and two specialized versions for more specific circumstances. Credit: ©2023 Ooka & Arai CC-BY

By knowing more about this folding process, researchers can better understand the processes that constitute life itself. Such knowledge is also essential to medicine, not only for developing new treatments and industrial processes to produce medicines, but also for understanding how certain diseases work, as some are examples of protein folding gone wrong. So, to say proteins are important is putting it mildly. Proteins are the stuff of life.

A New Approach to Prediction

Motivated by the importance of protein folding, Project Assistant Professor Koji Ooka from the College of Arts and Sciences and Professor Munehito Arai from the Department of Life Sciences and Department of Physics took on the difficult task of improving methods for predicting protein folding. This task is formidable for many reasons. In particular, simulating the dynamics of molecules is so computationally demanding that it requires a powerful supercomputer.


Example maps with protein folding pathways. Credit: ©2023 Ooka & Arai CC-BY

The artificial intelligence-based program AlphaFold 2 can now accurately predict the structure that results from a given amino acid sequence, but it cannot give details of the way proteins fold, making it a black box. This is problematic, as the forms and behaviors of proteins vary such that two similar ones may fold in radically different ways. So, instead of AI, the duo needed a different approach: statistical mechanics, a branch of physical theory.

Evolution of Existing Models

“For over 20 years, a theory called the Wako-Saitô-Muñoz-Eaton (WSME) model has successfully predicted the folding processes for proteins comprising around 100 amino acids or fewer, based on the native protein structures,” said Arai. “WSME can only evaluate small sections of proteins at a time, missing potential connections between sections farther apart. To overcome this issue, we produced a new model, WSME-L, where the L stands for ‘linker.’ Our linkers correspond to these nonlocal interactions and allow WSME-L to elucidate the folding process without the limitations of protein size and shape, which AlphaFold 2 cannot.”
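
To make the limitation Arai describes concrete, here is a minimal Python sketch of a toy model in the spirit of the original WSME model. It is not the authors' code and not WSME-L. It assumes the commonly described WSME rule that a contact between residues i and j counts only when every residue from i to j is in its native state; the function names, the contact map, the contact energies, and the per-residue entropy cost are all hypothetical values chosen for illustration.

```python
import math
from itertools import product


def wsme_energy(state, contacts):
    """Sum the energies of contacts whose whole segment i..j is native.

    state    -- tuple of 0/1 flags, 1 meaning the residue is native
    contacts -- dict mapping (i, j) index pairs to a contact energy
                (hypothetical values, negative = stabilizing)
    """
    return sum(eps for (i, j), eps in contacts.items()
               if all(state[i:j + 1]))


def free_energy_profile(n_res, contacts, entropy_cost, kT=1.0):
    """Return F(n) = -kT ln Z(n) for n = 0..n_res native residues.

    Brute-force enumeration of all 2**n_res states, so only
    suitable for very short toy chains.
    """
    z = [0.0] * (n_res + 1)
    for state in product((0, 1), repeat=n_res):
        n = sum(state)
        # Contact energy plus an assumed entropic penalty
        # for each residue fixed in its native conformation.
        g = wsme_energy(state, contacts) + entropy_cost * n
        z[n] += math.exp(-g / kT)
    return [-kT * math.log(v) for v in z]


# Hypothetical 6-residue chain: two local contacts and one long-range contact.
contacts = {(0, 2): -2.0, (3, 5): -2.0, (0, 5): -3.0}
profile = free_energy_profile(6, contacts, entropy_cost=1.0)
for n, f in enumerate(profile):
    print(f"native residues = {n}: F = {f:.3f}")
```

In this toy setup, the long-range contact between residues 0 and 5 contributes nothing unless the entire chain between them is already native, which is the kind of missed connection the quote refers to. The list returned by `free_energy_profile` is a one-dimensional analogue of the folding landscapes the researchers draw.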

But it doesn’t end there. There are other limitations of existing protein folding models that Ooka and Arai set their sights on. Proteins can exist inside or outside of living cells; those within are in some ways protected by the cell, but those outside cells, such as antibodies, require additional bonds during folding, called disulfide bonds, which help to stabilize them.

Conventional models cannot factor in these bonds, but an extension to WSME-L called WSME-L(SS), where each S stands for sulfide, can. To further complicate things, some proteins have disulfide bonds before folding starts, so the researchers made a further enhancement called WSME-L(SSintact), which factors in that situation at the expense of extra computation time.

“Our theory allows us to draw a kind of map of protein folding pathways in a relatively short time; mere seconds on a desktop computer for short proteins, and about an hour on a supercomputer for large proteins, assuming the native protein structure is available by experiments or AlphaFold 2 prediction,” said Arai.

“The resulting landscape allows a comprehensive understanding of multiple potential folding pathways a long protein might take. And crucially, we can scrutinize structures of transient states. This might be helpful for those researching diseases like Alzheimer’s and Parkinson’s — both are caused by proteins that fail to fold correctly. Also, our method may be useful for designing novel proteins and enzymes which can efficiently fold into stable functional structures, for medical and industrial use.”

While the models produced here accurately reflect experimental observations, Ooka and Arai hope they can be used to elucidate the folding processes of many proteins that have not yet been studied experimentally. Humans have about 20,000 different proteins, but only around 100 have had their folding processes thoroughly studied.

Reference: “Accurate prediction of protein folding mechanisms by simple structure-based statistical mechanical models” Koji Ooka and Munehito Arai, 19 October 2023, Nature Communications.
DOI: 10.1038/s41467-023-41664-1

This work was supported by JSPS KAKENHI Grant Numbers JP16H02217, JP19H02521, JP21K18841, and JP23H04545 (M.A.), Kayamori Foundation of Informational Science Advancement (M.A.), and a Grant-in-Aid for JSPS Fellows Grant Number JP20J11762 (K.O.).
