Researchers have innovatively merged protein structural data with genetic sequences to construct evolutionary trees, revealing deep-rooted relationships among species with enhanced accuracy.

This novel approach uses both experimentally determined and predicted protein structures. It could deepen our understanding of life’s history and aid health research, for example by clarifying the relationships among protein families targeted by cancer therapies.

Protein Structures in Evolutionary Studies

The three-dimensional shape of proteins is unlocking ancient evolutionary connections in the tree of life, according to a study published today (January 15) in Nature Communications.

For the first time, researchers have combined protein shape data with genomic sequences to build more reliable evolutionary trees. These trees are vital tools for scientists, helping them explore the history of life, track the spread of pathogens, and develop new treatments for diseases.

Overcoming Data Saturation with Protein Structures

Notably, this method works even with predicted protein structures that haven’t been experimentally verified. With tools like AlphaFold 2 generating vast amounts of structural data, this approach could provide new insights into the deep history of life on Earth.

There are about 210,000 experimentally determined protein structures but 250 million known protein sequences. Initiatives like the Earth BioGenome Project could generate billions more protein sequences in the next few years. The abundance of data opens the door to applying the approach on an unprecedented scale.

Traditional vs. Structural Phylogenetic Approaches

For many decades, biologists have been reconstructing evolution by tracing how species and genes diverge from common ancestors. These phylogenetic or evolutionary trees are traditionally built by comparing DNA or protein sequences and counting the similarities and differences to infer relationships.

However, researchers face a significant hurdle – a problem known as saturation. Over vast timescales, genomic sequences can change so much that they no longer resemble their ancestral forms, erasing signals of shared heritage.

“The issue of saturation dominates phylogeny and represents the main obstacle for the reconstruction of ancient relationships,” says Dr. Cedric Notredame, researcher at the Centre for Genomic Regulation (CRG) and lead author of the study. “It’s like the erosion of an ancient text. The letters become indistinct, and the message is lost.”
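
The saturation effect described above can be illustrated numerically. The sketch below assumes a simple Poisson (Jukes-Cantor-style) substitution model with 20 equally frequent amino acids. This is a textbook simplification chosen for illustration, not the model used in the study, and the function names are hypothetical.

```python
import math

N_STATES = 20  # amino-acid alphabet size; equal frequencies are an assumption


def observed_fraction(true_subs_per_site: float, n_states: int = N_STATES) -> float:
    """Expected fraction of sites that differ after a given number of
    substitutions per site, under the simplified equal-rates model."""
    ceiling = (n_states - 1) / n_states  # two random sequences differ at this rate
    return ceiling * (1.0 - math.exp(-true_subs_per_site / ceiling))


def corrected_distance(p_observed: float, n_states: int = N_STATES) -> float:
    """Invert observed_fraction: estimate substitutions per site from the
    observed fraction of differing sites."""
    ceiling = (n_states - 1) / n_states
    if p_observed >= ceiling:
        raise ValueError("sequences are saturated; the distance cannot be estimated")
    return -ceiling * math.log(1.0 - p_observed / ceiling)


if __name__ == "__main__":
    # The observed difference flattens out as the true distance keeps growing.
    for d in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
        print(f"true distance {d:5.1f} -> observed difference {observed_fraction(d):.3f}")
```

Under this model the observed difference can never exceed 95 percent, so very different true distances become nearly indistinguishable. That is the "erosion" the quote refers to.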

Advantages of Using Structural Data in Phylogenetics

To overcome this challenge, the research team turned to the physical structures of proteins. Proteins fold into complex shapes that determine their function in the cell. These shapes are more conserved over evolutionary time than the sequences themselves, meaning they change more slowly and retain ancestral features for longer.

The shape of a protein is dictated by its amino acid sequence. While sequences may mutate, the overall structure often remains similar to preserve function. The researchers hypothesized they could gauge how much the structures diverge over time by measuring the distances between pairs of amino acids within a protein, known as intra-molecular distances (IMDs).

Methodology and Impact of Structural Phylogenetics

The researchers compiled a massive dataset of proteins with known structures, covering a wide range of species. They calculated the IMDs for each protein and used these measurements to construct phylogenetic trees.
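
To make the idea of IMDs concrete, here is a minimal sketch of how they might be computed and compared. The article does not give the study's exact metric, so several things here are assumptions made for illustration: one coordinate per residue (for example, the C-alpha atom), residues already aligned one-to-one between the two proteins, and the mean absolute difference as the comparison score. The function names are hypothetical.

```python
import math
from itertools import combinations

Coord = tuple[float, float, float]  # x, y, z of one residue, in angstroms


def intramolecular_distances(coords: list[Coord]) -> dict[tuple[int, int], float]:
    """Distance between every pair of residues (i < j) within one structure."""
    return {
        (i, j): math.dist(coords[i], coords[j])
        for i, j in combinations(range(len(coords)), 2)
    }


def imd_dissimilarity(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """Mean absolute difference between the IMDs of two structures.

    Assumes residue k of A corresponds to residue k of B (an illustrative
    simplification; real proteins need an alignment step first).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of aligned residues")
    imd_a = intramolecular_distances(coords_a)
    imd_b = intramolecular_distances(coords_b)
    if not imd_a:
        return 0.0
    return sum(abs(imd_a[pair] - imd_b[pair]) for pair in imd_a) / len(imd_a)


# Toy three-residue "proteins": a straight chain and a bent one.
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
# The straight chain rotated by 90 degrees: same shape, different orientation.
rotated = [(0.0, 0.0, 0.0), (0.0, 3.8, 0.0), (0.0, 7.6, 0.0)]

if __name__ == "__main__":
    print(imd_dissimilarity(straight, bent))
    print(imd_dissimilarity(straight, rotated))
```

Because IMDs are measured inside each molecule, they do not change when a structure is rotated or moved, so no superposition of the two proteins is needed. A table of such pairwise scores could then be passed to a distance-based tree-building method such as neighbor joining.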

They found that trees built from structural data closely matched those derived from genetic sequences, but with a crucial advantage: the structural trees were less affected by saturation. This means they retained reliable signals even when genetic sequences had diverged significantly.

Practical Implications and Future Applications

Recognizing that both sequences and structures offer valuable insights, the team developed a combined approach that both improved the reliability of the tree branches and helped distinguish correct from incorrect relationships.
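
The article does not describe how the two signals are combined. One plausible scheme, assumed here purely for illustration and not taken from the study, is to score each branch by how often it appears in resampled trees built from each data type, then average the two scores. All names and numbers below are hypothetical.

```python
from collections import Counter

Split = frozenset  # a branch, written as the set of species on one side of it


def branch_support(replicates: list[set[Split]]) -> dict[Split, float]:
    """Fraction of replicate trees that contain each branch."""
    if not replicates:
        raise ValueError("at least one replicate tree is required")
    counts = Counter(split for tree in replicates for split in tree)
    return {split: n / len(replicates) for split, n in counts.items()}


def combined_support(
    seq_replicates: list[set[Split]], struct_replicates: list[set[Split]]
) -> dict[Split, float]:
    """Average the sequence-based and structure-based support of each branch.

    A branch missing from one source counts as zero support from that source.
    """
    seq = branch_support(seq_replicates)
    struct = branch_support(struct_replicates)
    return {
        split: (seq.get(split, 0.0) + struct.get(split, 0.0)) / 2
        for split in seq.keys() | struct.keys()
    }


# Toy replicates: each tree is reduced to the branches it contains.
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
seq_reps = [{AB}, {AB}, {AC}, {AB}]      # sequences favor A+B in 3 of 4 trees
struct_reps = [{AB}, {AB}, {AB}, {AB}]   # structures favor A+B every time

support = combined_support(seq_reps, struct_reps)
```

In this toy case the grouping that both data types agree on ends up with high combined support, while the grouping seen only occasionally in the sequence trees is pushed down. This mirrors the "two witnesses" idea of using agreement between independent sources to separate likely from unlikely relationships.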

“It’s akin to having two witnesses describe an event from different angles,” explains Dr. Leila Mansouri, coauthor of the study. “Each provides unique details, but together they give a fuller, more accurate account.”

One practical example where the combined approach could make a significant impact is in understanding the relationships among kinases in the human genome. Kinases are proteins involved in many important cellular functions.

“The genome of most mammals, including humans, contains about 500 protein kinases that regulate most aspects of our biology,” says Dr. Notredame. “These kinases are major targets for cancer therapy, for example drugs like imatinib for humans or toceranib for dogs.”

Human kinases have arisen through duplications occurring over the last billion years. “Within the human genome, the most distantly related kinases are about a billion years apart,” says Dr. Notredame. “They duplicated in the common ancestor of the common ancestor of our common ancestor.”

This vast timescale makes it incredibly difficult to build accurate gene trees that show how all these kinases are related. “Yet, as imperfect as it may be, the kinase evolutionary tree is widely used to understand how it interacts with other drugs. Improving this tree, or improving trees of other important protein families, would be an important advance for human health,” adds Dr. Notredame.

Expanding the Utility of Evolutionary Trees

The potential applications of the work go beyond cancer. Using the approach to create more accurate evolutionary trees could also improve our understanding of how diseases evolve more generally, aiding in the development of vaccines and treatments. Such trees could also help shed light on the origins of complex traits, guide the discovery of new enzymes for biotechnology, and even help trace the spread of species in response to climate change.

Reference: “multistrap: boosting phylogenetic analyses with structural information” by Athanasios Baltzis, Luisa Santus, Björn E. Langer, Cedrik Magis, Damien M. de Vienne, Olivier Gascuel, Leila Mansouri and Cedric Notredame, 15 January 2025, Nature Communications.

DOI: 10.1038/s41467-024-55264-0
