    Scientists Uncover Hidden Clues to the Origin of the Genetic Code

    By University of Illinois College of Agricultural, Consumer and Environmental Sciences, April 13, 2026
    Gustavo Caetano-Anollés. Credit: Fred Zwicky

    Clues to the genetic code’s origin may be hidden in tiny protein fragments, revealing a synchronized and highly structured path to life’s earliest molecular systems.

    Genes act as the instruction manual for life, encoding the information that allows cells to build, repair, and reproduce. But scientists still struggle to explain how this system first emerged.

    A new study from the University of Illinois Urbana-Champaign takes a different approach, suggesting that the answer may be hidden not in DNA itself, but in the simplest building blocks of proteins.

    “We find the origin of the genetic code mysteriously linked to the dipeptide composition of a proteome, the collective of proteins in an organism,” said corresponding author Gustavo Caetano-Anollés, professor in the Department of Crop Sciences, the Carl R. Woese Institute for Genomic Biology, and Biomedical and Translational Sciences of Carle Illinois College of Medicine at U. of I.

    Tracing Evolution Through Molecular History

    Caetano-Anollés specializes in phylogenomics, the study of how genomes evolve and relate to one another. His team previously created evolutionary maps of protein domains (structural units in proteins) and transfer RNA (tRNA), which carries amino acids to ribosomes during protein production.

    In this study, the researchers focused on dipeptides, simple units made of two amino acids linked by a peptide bond. Their analysis showed that the evolutionary patterns of protein domains, tRNA, and dipeptides closely align, suggesting a shared history.

    Life on Earth began about 3.8 billion years ago, but the genetic code likely appeared roughly 800 million years later. Scientists still debate how this transition occurred. Some propose that RNA-based enzymes came first, while others argue that proteins initially drove early biological activity.

    Caetano-Anollés and his colleagues support the protein-first perspective. Their earlier work indicates that interactions involving ribosomal proteins and tRNA developed later, not at the very beginning of life.

    Two Interconnected Biological Codes

    According to Caetano-Anollés, life depends on two tightly linked systems. The genetic code stores information in nucleic acids (DNA and RNA), while the protein code determines how molecules carry out cellular functions. The ribosome connects these systems by assembling proteins from amino acids delivered by tRNA.

    Aminoacyl tRNA synthetases are the enzymes responsible for attaching the correct amino acids to tRNA molecules. These enzymes help maintain accuracy during protein production and play a central role in preserving the integrity of the genetic code.

    “Why does life rely on two languages – one for genes and one for proteins?” Caetano-Anollés asked. “We still don’t know why this dual system exists or what drives the connection between the two. The drivers couldn’t be in RNA, which is functionally clumsy. Proteins, on the other hand, are experts in operating the sophisticated molecular machinery of the cell.”

    Dipeptides and the Earliest Protein Structures

    The researchers suggest that the proteome may hold important clues about the earliest stages of genetic code development. Dipeptides appear to have been especially important as early structural building blocks of proteins. There are 400 possible dipeptide combinations, and their frequency varies across organisms.

    To investigate this, the team analyzed 4.3 billion dipeptide sequences from 1,561 proteomes spanning the three major domains of life: Archaea, Bacteria, and Eukarya. They used these data to build an evolutionary timeline of dipeptides and compared it with patterns seen in protein structural domains.
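
    As a rough illustration of what a dipeptide census involves, the sketch below enumerates the 400 possible dipeptides (20 amino acids times 20) and tallies them across a set of protein sequences. It is a minimal sketch and not the authors' pipeline: the overlapping sliding window, the skipping of non-standard residue codes, and the names `count_dipeptides`, `dipeptide_frequencies`, and `toy_proteome` are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Every ordered pair of amino acids: 20 * 20 = 400 possible dipeptides.
ALL_DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def count_dipeptides(sequences):
    """Count overlapping dipeptides across an iterable of protein sequences.

    Assumption: adjacent residues are read with a sliding window of width 2,
    and any pair containing a non-standard code (for example 'X') is skipped.
    """
    valid = set(AMINO_ACIDS)
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for first, second in zip(seq, seq[1:]):
            if first in valid and second in valid:
                counts[first + second] += 1
    return counts


def dipeptide_frequencies(sequences):
    """Return the relative frequency of each of the 400 dipeptides."""
    counts = count_dipeptides(sequences)
    total = sum(counts.values())
    return {dp: (counts[dp] / total if total else 0.0) for dp in ALL_DIPEPTIDES}


# Hypothetical two-protein "proteome", used only to exercise the functions.
toy_proteome = ["MALALK", "ALXLA"]
counts = count_dipeptides(toy_proteome)
frequencies = dipeptide_frequencies(toy_proteome)
print(len(ALL_DIPEPTIDES))       # 400
print(counts["AL"], counts["LA"])  # 3 2
```

    Comparing such frequency tables across many proteomes is the kind of input from which an evolutionary timeline could then be inferred; the inference step itself is not shown here.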

    Previous research from the group had also mapped the evolution of tRNA, providing a timeline for when amino acids became part of the genetic code. These amino acids were grouped based on when they appeared. Group 1 included early amino acids such as tyrosine, serine, and leucine, while Group 2 added eight more.

    These early groups were linked to the emergence of error-correcting mechanisms in synthetase enzymes and to an early operational code that ensured each codon matched a specific amino acid. Group 3 consisted of amino acids that appeared later and contributed to more advanced functions in the modern genetic code.

    Converging Evidence Across Biological Systems

    The researchers had already shown that synthetases and tRNA evolved together as amino acids were incorporated into the genetic code. By adding dipeptides to the analysis, they tested whether this pattern held across another level of biological organization.

    “We found the results were congruent,” Caetano-Anollés explained. “Congruence is a key concept in phylogenetic analysis. It means that a statement of evolution obtained with one type of data is confirmed by another. In this case, we examined three sources of information: protein domains, tRNAs, and dipeptide sequences. All three reveal the same progression of amino acids being added to the genetic code in a specific order.”

    The study also identified a striking symmetry in dipeptide pairs. Each dipeptide consists of two amino acids, such as alanine-leucine (AL), while its counterpart, called an anti-dipeptide, reverses the order to leucine-alanine (LA). These pairs act as complementary, mirror-like structures.
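
    The pairing described above can be sketched in a few lines. This is an illustrative example only, assuming an anti-dipeptide is simply the reversed two-letter string; the function names are hypothetical. It also shows how the 400 dipeptides split up: 190 distinct dipeptide/anti-dipeptide pairs plus 20 dipeptides, such as AA, that are their own reverse.

```python
from itertools import product

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def anti_dipeptide(dipeptide):
    """Return the dipeptide with its two residues in reverse order."""
    return dipeptide[::-1]


def dipeptide_pairs():
    """Split the 400 dipeptides into reversed pairs and self-paired cases."""
    pairs = []        # (dipeptide, anti-dipeptide), each pair listed once
    self_paired = []  # dipeptides equal to their own reverse, e.g. "AA"
    for first, second in product(AMINO_ACIDS, repeat=2):
        dp = first + second
        anti = anti_dipeptide(dp)
        if dp == anti:
            self_paired.append(dp)
        elif dp < anti:  # keep only one orientation of each pair
            pairs.append((dp, anti))
    return pairs, self_paired


pairs, self_paired = dipeptide_pairs()
print(anti_dipeptide("AL"))          # LA
print(len(pairs), len(self_paired))  # 190 20
```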

    “We found something remarkable in the phylogenetic tree,” Caetano-Anollés said. “Most dipeptide and anti-dipeptide pairs appeared very close to each other on the evolutionary timeline. This synchronicity was unanticipated. The duality reveals something fundamental about the genetic code with potentially transformative implications for biology. It suggests dipeptides were arising encoded in complementary strands of nucleic acid genomes, likely minimalistic tRNAs that interacted with primordial synthetase enzymes.”

    Implications for Modern Science

    Understanding how the genetic code evolved provides insight into the origins of life and supports advances in fields such as synthetic biology, biomedical research, and genetic engineering.

    “Synthetic biology is recognizing the value of an evolutionary perspective. It strengthens genetic engineering by letting nature guide the design. Understanding the antiquity of biological components and processes is important because it highlights their resilience and resistance to change. To make meaningful modifications, it is essential to understand the constraints and underlying logic of the genetic code,” Caetano-Anollés said.

    Reference: “Tracing the Origin of the Genetic Code and Thermostability to Dipeptide Sequences in Proteomes” by Minglei Wang, M. Fayez Aziz and Gustavo Caetano-Anollés, 14 August 2025, Journal of Molecular Biology.
    DOI: 10.1016/j.jmb.2025.169396

    The study was supported by grants from the National Science Foundation (MCB-0749836 and OISE-1132791), the United States Department of Agriculture (ILLU-802-909 and ILLU-483-625), and Blue Waters supercomputer allocations from the National Center for Supercomputing Applications to Caetano-Anollés.

    2 Comments

    1. kamir bouchareb st on April 13, 2026 1:46 pm

      thanks

    2. Torbjörn Larsson on April 14, 2026 11:18 pm

      The paper is an eager sequence pattern search which substitutes for genetic phylogenies. It also uses late-evolving eukaryotes as the dominant data contributor to ~3/4 of the patterns (dipeptides).

      Since protein domains are better conserved than sequences over deep time, which essentially corrupts the sequence patterns, using domains has become the more widely used and phylogenetically better supported method. Speaking of congruences, a paper and a preprint that explore the phylogeny of the evolution of the genetic code [S. Wehbi, A. Wheeler, B. Morel, N. Manepalli, B.Q. Minh, D.S. Lauretta, & J. Masel, Order of amino acid recruitment into the genetic code resolved by last universal common ancestor’s protein domains, Proc. Natl. Acad. Sci. U.S.A. 121 (52) e2410311121] and of core metabolism [Gradual assembly of metabolism at a phosphorylating hydrothermal vent, Mrnjavac et al, q-bio arXiv:2510.08410] show a 2 sigma congruence, well above the usual 80% threshold. They agree on 7 of the first 10 of the 20 standard amino acid codons, which assuming a binomial distribution is a 95% likelihood, a significant congruence. Notably, the genetic code evolution paper shows an early quasispecies-like rampant horizontal gene transfer before the robust code was established, which is a positive sign that it is correct. The paper also shows other phylogenetic signals that are congruent with the metabolic paper: small molecules recruited first, before the cell membrane import/export properties can be compositionally controlled (by way of incorporating the carboxylation cofactor biotin), and metal-dependent catalysis before cofactor evolution is complete.

      The new paper agrees with the code evolution paper on a mere 5 of the first 11 amino acids, or a 50% likelihood, a random result (as I would expect).
