    SciTechDaily

    Going Beyond Human Brains: Deep Learning Takes On Synthetic Biology

    By Wyss Institute at Harvard University, October 7, 2020
    Ribocomputing
    Work by Wyss Core Faculty member Peng Yin in collaboration with Collins and others has demonstrated that different toehold switches can be combined to compute the presence of multiple “triggers,” similar to a computer’s logic board. Credit: Wyss Institute at Harvard University

    DNA and RNA have been compared to “instruction manuals” containing the information needed for living “machines” to operate. But while electronic machines like computers and robots are designed from the ground up to serve a specific purpose, biological organisms are governed by a much messier, more complex set of functions that lack the predictability of binary code. Inventing new solutions to biological problems requires teasing apart seemingly intractable variables — a task that is daunting to even the most intrepid human brains.

    Two teams of scientists from the Wyss Institute at Harvard University and the Massachusetts Institute of Technology have devised pathways around this roadblock by going beyond human brains; they developed a set of machine learning algorithms that can analyze reams of RNA-based “toehold” sequences and predict which ones will be most effective at sensing and responding to a desired target sequence. As reported in two papers published concurrently today (October 7, 2020) in Nature Communications, the algorithms could be generalizable to other problems in synthetic biology as well, and could accelerate the development of biotechnology tools to improve science and medicine and help save lives.

    “These achievements are exciting because they mark the starting point of our ability to ask better questions about the fundamental principles of RNA folding, which we need to know in order to achieve meaningful discoveries and build useful biological technologies,” said Luis Soenksen, Ph.D., a Postdoctoral Fellow at the Wyss Institute and Venture Builder at MIT’s Jameel Clinic who is a co-first author of the first of the two papers.


    In this animation, Wyss Institute Postdoctoral Fellow Alex Green, Ph.D., the lead author of “Toehold Switches: De-Novo-Designed Regulators of Gene Expression”, narrates a step-by-step guide to the mechanism of the synthetic toehold switch gene regulator. Credit: Wyss Institute at Harvard University

    Getting ahold of toehold switches

    The collaboration between data scientists from the Wyss Institute’s Predictive BioAnalytics Initiative and synthetic biologists in Wyss Core Faculty member Jim Collins’ lab at MIT was created to apply the computational power of machine learning, neural networks, and other algorithmic architectures to complex problems in biology that have so far defied resolution. As a proving ground for their approach, the two teams focused on a specific class of engineered RNA molecules: toehold switches, which are folded into a hairpin-like shape in their “off” state. When a complementary RNA strand binds to a “trigger” sequence trailing from one end of the hairpin, the toehold switch unfolds into its “on” state and exposes sequences that were previously hidden within the hairpin, allowing ribosomes to bind to and translate a downstream gene into protein molecules. This precise control over the expression of genes in response to the presence of a given molecule makes toehold switches very powerful components for sensing substances in the environment, detecting disease, and other purposes.
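The binding step described above rests on base complementarity. As a minimal sketch, assuming strict Watson-Crick pairing only (real switches also tolerate G-U wobble pairs and depend on folding energetics), the following checks whether an input RNA is the exact reverse complement of a switch's exposed region. The names `reverse_complement`, `can_open`, `switch_region`, and `input_rna` are illustrative and do not come from the papers.

```python
# Watson-Crick partners for RNA (assumption: wobble pairs are ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand that would pair base-for-base with `rna`."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def can_open(switch_region: str, input_rna: str) -> bool:
    """True when the input RNA is the exact reverse complement of the region."""
    return reverse_complement(input_rna) == switch_region.upper()


# Hypothetical 10-nucleotide example, far shorter than a real switch.
input_rna = "AUGGCUAACG"
switch_region = reverse_complement(input_rna)  # "CGUUAGCCAU"
print(can_open(switch_region, input_rna))      # True
print(can_open(switch_region, "AUGGCUAACC"))   # False: last base mismatches
```

An all-or-nothing match like this is exactly the kind of rule-based prediction that, according to the article, often fails experimentally, which is what motivated the data-driven models.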

    However, many toehold switches do not work very well when tested experimentally, even though they have been engineered to produce a desired output in response to a given input based on known RNA folding rules. Recognizing this problem, the teams decided to use machine learning to analyze a large volume of toehold switch sequences and use insights from that analysis to more accurately predict which toeholds reliably perform their intended tasks, which would allow researchers to quickly identify high-quality toeholds for various experiments.

    Deep Learning Framework RNA
    After generating a data set of thousands of toehold switches, one team used a computer vision-based algorithm to analyze the toehold sequences as two-dimensional images, while the other team used natural language processing to interpret the sequences as “words” written in the “language” of RNA. Credit: Wyss Institute at Harvard University

    The first hurdle they faced was that there was no dataset of toehold switch sequences large enough for deep learning techniques to analyze effectively. The authors took it upon themselves to generate a dataset that would be useful to train such models. “We designed and synthesized a massive library of toehold switches, nearly 100,000 in total, by systematically sampling short trigger regions along the entire genomes of 23 viruses and 906 human transcription factors,” said Alex Garruss, a Harvard graduate student working at the Wyss Institute who is a co-first author of the first paper. “The unprecedented scale of this dataset enables the use of advanced machine learning techniques for identifying and understanding useful switches for immediate downstream applications and future design.”
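Systematic sampling of short regions along a genome can be pictured as a sliding window. The sketch below is only an illustration of that idea: the 30-nucleotide window and the stride are assumptions (the article does not state the sampling parameters), and `tile_triggers` is a hypothetical helper name.

```python
from typing import Iterator, Tuple


def tile_triggers(genome: str, length: int = 30, stride: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs for every full-length window in `genome`.

    `length` and `stride` are assumed values, not the papers' settings.
    """
    for start in range(0, len(genome) - length + 1, stride):
        yield start, genome[start:start + length]


# Toy 40-nucleotide "genome"; real viral genomes are thousands of bases long.
toy_genome = "ACGU" * 10
windows = list(tile_triggers(toy_genome, length=30, stride=5))
print(len(windows))  # 3 windows, starting at positions 0, 5, and 10
```

A genome shorter than the window simply yields nothing, so the helper can be mapped over many sequences without special-casing.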

    Armed with enough data, the teams first employed tools traditionally used for analyzing synthetic RNA molecules to see if they could accurately predict the behavior of toehold switches now that many more examples were available. However, none of the methods they tried, including mechanistic modeling based on thermodynamics and physical features, was able to predict with sufficient accuracy which toeholds functioned better.

    A picture is worth a thousand base pairs

    The researchers then explored various machine learning techniques to see if they could create models with better predictive abilities. The authors of the first paper decided to analyze toehold switches not as sequences of bases, but rather as two-dimensional “images” of base-pair possibilities. “We know the baseline rules for how an RNA molecule’s base pairs bond with each other, but molecules are wiggly: they never have a single perfect shape, but rather a probability of different shapes they could be in,” said Nicolaas Angenent-Mari, an MIT graduate student working at the Wyss Institute and co-first author of the first paper. “Computer vision algorithms have become very good at analyzing images, so we created a picture-like representation of all the possible folding states of each toehold switch, and trained a machine learning algorithm on those pictures so it could recognize the subtle patterns indicating whether a given picture would be a good or a bad toehold.”
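One simple way to picture a two-dimensional "image" of base-pair possibilities is a square matrix whose entry (i, j) marks whether bases i and j could pair. The sketch below is an assumption-laden simplification: it uses binary values, allows Watson-Crick and G-U wobble pairs, and applies an assumed minimum loop size of 3. The representation used in the paper may differ, for example by weighting pairs by energy or probability. `pairing_image` is a hypothetical name.

```python
# Pairs allowed in this sketch: Watson-Crick plus G-U wobble (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_image(rna: str, min_loop: int = 3) -> list:
    """Return an n x n matrix with 1 where bases i and j could pair.

    Positions closer than `min_loop` bases apart are excluded because a
    hairpin cannot bend that sharply (min_loop=3 is an assumed value).
    """
    n = len(rna)
    image = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if abs(i - j) > min_loop and (rna[i], rna[j]) in PAIRS:
                image[i][j] = 1
    return image


# Toy hairpin: GGG stem, AAAU loop, CCC stem.
for row in pairing_image("GGGAAAUCCC"):
    print("".join(str(cell) for cell in row))
```

Because pairing is symmetric, the matrix is symmetric too, and diagonal stripes running perpendicular to the main diagonal correspond to potential stems, which is the kind of visual pattern an image model can pick up.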

    Deep Learning Framework Models
    By using both models sequentially, the researchers were able to predict which toehold sequences would produce high-quality sensors. Credit: Wyss Institute at Harvard University

    Another benefit of their visually-based approach is that the team was able to “see” which parts of a toehold switch sequence the algorithm “paid attention” to the most when determining whether a given sequence was “good” or “bad.” They named this interpretation approach Visualizing Secondary Structure Saliency Maps, or VIS4Map, and applied it to their entire toehold switch dataset. VIS4Map successfully identified physical elements of the toehold switches that influenced their performance, and allowed the researchers to conclude that toeholds with more potentially competing internal structures were “leakier” and thus of lower quality than those with fewer such structures, providing insight into RNA folding mechanisms that had not been discovered using traditional analysis techniques.
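The idea of asking which parts of an input a model "pays attention" to can be illustrated with a perturbation-based saliency score. Note that this swaps in a different, simpler technique than the image saliency maps behind VIS4Map: each position is mutated in turn and the largest change in a model's score is recorded. The scoring function `toy_score` is a made-up stand-in for a trained model, and every name below is hypothetical.

```python
from typing import Callable, List


def mutation_saliency(seq: str, score: Callable[[str], float], bases: str = "ACGU") -> List[float]:
    """For each position, return the largest score change caused by a mutation."""
    base_score = score(seq)
    saliency = []
    for i, original in enumerate(seq):
        deltas = [
            abs(score(seq[:i] + b + seq[i + 1:]) - base_score)
            for b in bases
            if b != original
        ]
        saliency.append(max(deltas))
    return saliency


def toy_score(seq: str) -> float:
    """Placeholder model (assumption): counts G bases in the first four positions."""
    return float(sum(1 for b in seq[:4] if b == "G"))


print(mutation_saliency("GGAUCCAU", toy_score))
# [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

The toy model only looks at the first four bases, and the saliency profile recovers exactly that, which is the sense in which such maps expose what a model relies on.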

    “Being able to understand and explain why certain tools work or don’t work has been a secondary goal within the artificial intelligence community for some time, but interpretability needs to be at the forefront of our concerns when studying biology because the underlying reasons for those systems’ behaviors often cannot be intuited,” said Jim Collins, Ph.D., the senior author of the first paper. “Meaningful discoveries and disruptions are the result of deep understanding of how nature works, and this project demonstrates that machine learning, when properly designed and applied, can greatly enhance our ability to gain important insights about biological systems.” Collins is also the Termeer Professor of Medical Engineering and Science at MIT.

    Now you’re speaking my language

    While the first team analyzed toehold switch sequences as 2D images to predict their quality, the second team created two different deep learning architectures that approached the challenge using orthogonal techniques. They then went beyond predicting toehold quality and used their models to optimize and redesign poorly performing toehold switches for different purposes, which they report in the second paper.

    The first model, based on a convolutional neural network (CNN) and multi-layer perceptron (MLP), treats toehold sequences as 1D images, or lines of nucleotide bases, and identifies patterns of bases and potential interactions between those bases to predict good and bad toeholds. The team used this model to create an optimization method called STORM (Sequence-based Toehold Optimization and Redesign Model), which allows for complete redesign of a toehold sequence from the ground up. This “blank slate” tool is optimal for generating novel toehold switches to perform a specific function as part of a synthetic genetic circuit, enabling the creation of complex biological tools.
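Treating a sequence as a "1D image" usually means one-hot encoding each base into four channels and sliding filters along the sequence. The following sketch shows that mechanism in plain Python with a single hand-made filter. A trained CNN would learn many filters from data, and nothing here reproduces the actual STORM architecture; `one_hot` and `conv1d` are illustrative names.

```python
BASES = "ACGU"


def one_hot(rna: str) -> list:
    """Encode each base as a 4-channel vector in the order A, C, G, U."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in rna.upper()]


def conv1d(encoded: list, kernel: list) -> list:
    """Slide `kernel` along `encoded` and return one activation per position."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(len(BASES)):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        activations.append(total)
    return activations


# Hand-made filter that responds to the motif "GGA" (an assumption for
# illustration; a real network learns its filters during training).
kernel = one_hot("GGA")
activations = conv1d(one_hot("AUGGAGGAC"), kernel)
print(activations)  # [0.0, 1.0, 3.0, 1.0, 1.0, 3.0, 1.0]
```

The two peaks of 3.0 mark the two places where the motif occurs, which is how a convolutional layer turns recurring base patterns into features for the downstream perceptron.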

    “The really cool part about STORM and the model underlying it is that after seeding it with input data from the first paper, we were able to fine-tune the model with only 168 samples and use the improved model to optimize toehold switches. That calls into question the prevailing assumption that you need to generate massive datasets every time you want to apply a machine learning algorithm to a new problem, and suggests that deep learning is potentially more applicable for synthetic biologists than we thought,” said co-first author Jackie Valeri, a graduate student at MIT and the Wyss Institute.

    The second model is based on natural language processing (NLP), and treats each toehold sequence as a “phrase” consisting of patterns of “words,” eventually learning how certain words are put together to make a coherent phrase. “I like to think of each toehold switch as a haiku poem: like a haiku, it’s a very specific arrangement of phrases within its parent language — in this case, RNA. We are essentially training this model to learn how to write a good haiku by feeding it lots and lots of examples,” said co-first author Pradeep Ramesh, Ph.D., a Visiting Postdoctoral Fellow at the Wyss Institute and Machine Learning Scientist at Sherlock Biosciences.
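Reading a sequence as "words" requires some way to split it into tokens. A common choice in sequence modeling is overlapping k-mers, shown below as a rough sketch. The article does not describe the tokenization actually used, so k=3 and the stride of 1 are assumptions, and `kmer_words` is a hypothetical helper.

```python
from collections import Counter


def kmer_words(rna: str, k: int = 3, stride: int = 1) -> list:
    """Split `rna` into overlapping k-mer "words" (k and stride are assumed)."""
    return [rna[i:i + k] for i in range(0, len(rna) - k + 1, stride)]


words = kmer_words("AUGGCUAUG")
print(words)
# ['AUG', 'UGG', 'GGC', 'GCU', 'CUA', 'UAU', 'AUG']

vocabulary = Counter(words)
print(vocabulary["AUG"])  # 2
```

Once sequences are tokenized this way, a language model can learn which words tend to appear together in well-performing switches, much as it would learn word order in text.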

    Ramesh and his co-authors integrated this NLP-based model with the CNN-based model to create NuSpeak (Nucleic Acid Speech), an optimization approach that allowed them to redesign the last 9 nucleotides of a given toehold switch while keeping the remaining 21 nucleotides intact. This technique allows for the creation of toeholds that are designed to detect the presence of specific pathogenic RNA sequences, and could be used to develop new diagnostic tests.
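The constraint described above, redesigning the last 9 nucleotides while leaving the first 21 untouched, can be sketched as a search over the tail only. The example below uses a single greedy pass over positions, which is a stand-in technique and not NuSpeak's actual optimization, and `toy_score` is a placeholder (GC count) for a trained model's predicted performance. All names are hypothetical.

```python
from typing import Callable


def redesign_tail(switch: str, score: Callable[[str], float], keep: int = 21, bases: str = "ACGU") -> str:
    """Greedily choose each tail base to maximize `score`; the prefix is fixed."""
    fixed, tail = switch[:keep], list(switch[keep:])
    for pos in range(len(tail)):
        # Try every base at this position and keep the best-scoring one.
        tail[pos] = max(
            bases,
            key=lambda b: score(fixed + "".join(tail[:pos] + [b] + tail[pos + 1:])),
        )
    return fixed + "".join(tail)


def toy_score(seq: str) -> float:
    """Placeholder objective (assumption): number of G and C bases."""
    return float(seq.count("G") + seq.count("C"))


original = "AUGGCUAACGUUAGCAUCGAU" + "AUAUAUAUA"  # 21 fixed + 9 redesignable
redesigned = redesign_tail(original, toy_score)
print(redesigned[:21] == original[:21])                 # True: prefix untouched
print(toy_score(redesigned) >= toy_score(original))     # True
```

Swapping `toy_score` for a real predictive model is what would make such a search meaningful. The fixed-prefix slicing is the only part taken directly from the article's description.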

    The team experimentally validated both of these platforms by optimizing toehold switches designed to sense fragments from the SARS-CoV-2 viral genome. NuSpeak improved the sensors’ performances by an average of 160%, while STORM created better versions of four “bad” SARS-CoV-2 viral RNA sensors whose performances improved by up to 28 times.

    “A real benefit of the STORM and NuSpeak platforms is that they enable you to rapidly design and optimize synthetic biology components, as we showed with the development of toehold sensors for a COVID-19 diagnostic,” said co-first author Katie Collins, an undergraduate MIT student at the Wyss Institute who worked with MIT Associate Professor Timothy Lu, M.D., Ph.D., a corresponding author of the second paper.

    “The data-driven approaches enabled by machine learning open the door to really valuable synergies between computer science and synthetic biology, and we’re just beginning to scratch the surface,” said Diogo Camacho, Ph.D., a corresponding author of the second paper who is a Senior Bioinformatics Scientist and co-lead of the Predictive BioAnalytics Initiative at the Wyss Institute. “Perhaps the most important aspect of the tools we developed in these papers is that they are generalizable to other types of RNA-based sequences such as inducible promoters and naturally occurring riboswitches, and therefore can be applied to a wide range of problems and opportunities in biotechnology and medicine.”

    Additional authors of the papers include Wyss Core Faculty member and Professor of Genetics at HMS George Church, Ph.D.; and Wyss and MIT Graduate Students Miguel Alcantar and Bianca Lepe.

    “Artificial intelligence is a wave that is just beginning to impact science and industry, and has incredible potential for helping to solve intractable problems. The breakthroughs described in these studies demonstrate the power of melding computation with synthetic biology at the bench to develop new and more powerful bioinspired technologies, in addition to leading to new insights into fundamental mechanisms of biological control,” said Don Ingber, M.D., Ph.D., the Wyss Institute’s Founding Director. Ingber is also the Judah Folkman Professor of Vascular Biology at Harvard Medical School and the Vascular Biology Program at Boston Children’s Hospital, as well as Professor of Bioengineering at Harvard’s John A. Paulson School of Engineering and Applied Sciences.

    References:

    “A deep learning approach to programmable RNA switches” by Nicolaas M. Angenent-Mari, Alexander S. Garruss, Luis R. Soenksen, George Church and James J. Collins, 7 October 2020, Nature Communications.
    DOI: 10.1038/s41467-020-18677-1

    “Sequence-to-function deep learning frameworks for engineered riboregulators” by Jacqueline A. Valeri, Katherine M. Collins, Pradeep Ramesh, Miguel A. Alcantar, Bianca A. Lepe, Timothy K. Lu and Diogo M. Camacho, 7 October 2020, Nature Communications.
    DOI: 10.1038/s41467-020-18676-2

    This work was supported by the DARPA Synergistic Discovery and Design program, the Defense Threat Reduction Agency, the Paul G. Allen Frontiers Group, the Wyss Institute for Biologically Inspired Engineering, Harvard University, the Institute for Medical Engineering and Science, the Massachusetts Institute of Technology, the National Science Foundation, the National Human Genome Research Institute, the Department of Energy, the National Institutes of Health, and a CONACyT grant.


