
    MIT Researchers Propose a New Way To Create Synthesizable Molecules

    By Lauren Hinkel, Massachusetts Institute of Technology, April 7, 2022
    MIT and IBM researchers have used a generative model with a graph grammar to create new molecules belonging to the same class of compound as the training set.

    An efficient machine-learning method uses chemical knowledge to create a learnable grammar with production rules to build synthesizable monomers and polymers.

    Chemical engineers and materials scientists are constantly looking for the next revolutionary material, chemical, and drug. The rise of machine-learning approaches is expediting the discovery process, which could otherwise take years. “Ideally, the goal is to train a machine-learning model on a few existing chemical samples and then allow it to produce as many manufacturable molecules of the same class as possible, with predictable physical properties,” says Wojciech Matusik, professor of electrical engineering and computer science at MIT. “If you have all these components, you can build new molecules with optimal properties, and you also know how to synthesize them. That’s the overall vision that people in that space want to achieve.”

    However, current techniques, mainly deep learning, require extensive datasets for training, while many class-specific chemical datasets contain only a handful of example compounds. This limits the models’ ability to generalize and to generate molecules that could actually be made in the real world.

    Now, a new paper from researchers at MIT and IBM tackles this problem using a generative graph model to build new synthesizable molecules within the same chemical class as their training data. To do this, they treat the arrangement of atoms and chemical bonds as a graph and develop a graph grammar (an analogue of the systems and structures that govern word ordering in linguistics) that contains a sequence of rules for building molecules, such as monomers and polymers. Using the grammar and production rules inferred from the training set, the model can not only reverse engineer its examples but also create new compounds in a systematic and data-efficient way. “We basically built a language for creating molecules,” says Matusik. “This grammar essentially is the generative model.”

    How Graph Grammar Simplifies Molecule Generation

    Matusik’s co-authors include MIT graduate students Minghao Guo, who is the lead author, and Beichen Li, as well as Veronika Thost, Payel Das, and Jie Chen, research staff members with IBM Research. Matusik, Thost, and Chen are affiliated with the MIT-IBM Watson AI Lab. Their method, which they’ve called data-efficient graph grammar (DEG), will be presented at the International Conference on Learning Representations.

    “We want to use this grammar representation for monomer and polymer generation, because this grammar is explainable and expressive,” says Guo. “With only a few number of the production rules, we can generate many kinds of structures.”

    A molecular structure can be thought of as a symbolic representation in a graph: a set of atoms (nodes) joined together by chemical bonds (edges). In this method, the researchers allow the model to take the chemical structure and collapse a substructure of the molecule down to one node; this may be two atoms connected by a bond, a short sequence of bonded atoms, or a ring of atoms. The model does this repeatedly, creating the production rules as it goes, until a single node remains. The rules and grammar can then be applied in reverse order to recreate the training set from scratch, or combined in different ways to produce new molecules of the same chemical class.
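The collapse-and-record loop described above can be sketched in a few lines of Python. This is a toy illustration and not the authors' implementation: the `MolGraph` class, the single nonterminal label `"X"`, and the rule format are all hypothetical, only two-atom substructures are collapsed, and the pair to collapse is picked arbitrarily (the lowest-numbered bond), whereas DEG learns which substructure to collapse. The graph is assumed to be connected.

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Toy molecular graph: node id -> label, plus undirected bonds."""
    labels: dict[int, str]
    bonds: set[frozenset[int]] = field(default_factory=set)


def contract_once(g: MolGraph, next_id: int) -> tuple[str, str]:
    """Collapse one bonded pair into a new nonterminal node 'X'.

    The lowest-numbered bond is chosen purely for determinism (an
    assumption of this sketch). Returns the recorded rule as
    (left-hand side, right-hand side).
    """
    a, b = min(tuple(sorted(bond)) for bond in g.bonds)
    rule = ("X", f"{g.labels[a]}-{g.labels[b]}")
    new_bonds = set()
    for bond in g.bonds:
        rest = bond - {a, b}
        if len(rest) == 2:
            new_bonds.add(bond)  # bond does not touch a or b
        elif len(rest) == 1:
            new_bonds.add(frozenset({next_id, *rest}))  # re-attach neighbor
        # len(rest) == 0 is the a-b bond itself, which disappears
    del g.labels[a], g.labels[b]
    g.labels[next_id] = "X"
    g.bonds = new_bonds
    return rule


def extract_rules(g: MolGraph) -> list[tuple[str, str]]:
    """Contract repeatedly until a single node remains, collecting rules."""
    rules = []
    next_id = max(g.labels) + 1
    while len(g.labels) > 1:
        rules.append(contract_once(g, next_id))
        next_id += 1
    return rules


# Heavy-atom skeleton of ethanol: C-C-O
ethanol = MolGraph({0: "C", 1: "C", 2: "O"},
                   {frozenset({0, 1}), frozenset({1, 2})})
print(extract_rules(ethanol))  # [('X', 'C-C'), ('X', 'O-X')]
```

Reading the recorded rules from last to first gives a recipe for rebuilding the original graph from a single `X` node, which is the sense in which the rules can be applied in reverse.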

    “Existing graph generation methods would produce one node or one edge sequentially at a time, but we are looking at higher-level structures and, specifically, exploiting chemistry knowledge, so that we don’t treat the individual atoms and bonds as the unit. This simplifies the generation process and also makes it more data-efficient to learn,” says Chen.

    Optimizing Grammar for Synthesizable Molecules

    Further, the researchers optimized the technique so that the bottom-up grammar stays relatively simple and straightforward, and so that the molecules it generates can actually be made.

    “If we switch the order of applying these production rules, we would get another molecule; what’s more, we can enumerate all the possibilities and generate tons of them,” says Chen. “Some of these molecules are valid and some of them not, so the learning of the grammar itself is actually to figure out a minimal collection of production rules, such that the percentage of molecules that can actually be synthesized is maximized.”

    While the researchers concentrated on three training sets of fewer than 33 samples each (acrylates, chain extenders, and isocyanates), they note that the process could be applied to any chemical class.
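The idea in Chen's quote can be illustrated with a toy string grammar. Everything here is an assumption made for illustration: rules are `(lhs, rhs)` string rewrites on a start symbol `"X"`, the `is_ok` predicate is a stand-in for a real synthesizability check, and the brute-force search over rule subsets is not the learning procedure used in the paper.

```python
from itertools import combinations, permutations


def generate(rules):
    """Apply the rules in every possible order; return the distinct results."""
    results = set()
    for order in permutations(rules):
        s = "X"  # start symbol
        for lhs, rhs in order:
            s = s.replace(lhs, rhs, 1)  # rewrite the first match, if any
        results.add(s)
    return results


def best_rule_set(candidates, is_ok):
    """Brute-force search for the rule subset with the highest fraction of
    acceptable outputs, preferring fewer rules when fractions tie."""
    best, best_key = None, None
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            mols = generate(subset)
            frac = sum(map(is_ok, mols)) / len(mols)
            key = (frac, -k)  # maximize fraction, then minimize rule count
            if best_key is None or key > best_key:
                best, best_key = subset, key
    return best, best_key[0]


rules = [("X", "CX"), ("X", "OX"), ("X", "N")]
print(sorted(generate(rules)))  # ['CN', 'CON', 'N', 'OCN', 'ON']

# Stand-in validity check: a finished string has no nonterminal left.
print(best_rule_set(rules, lambda s: "X" not in s))  # ((('X', 'N'),), 1.0)
```

In this toy setting the search settles on the trivial one-rule grammar, because nothing in the objective rewards variety in the output. That is a limitation of the sketch, not a statement about how DEG behaves.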

    To see how their method performed, the researchers tested DEG against other state-of-the-art models and techniques, looking at percentages of chemically valid and unique molecules, diversity of those created, success rate of retrosynthesis, and percentage of molecules belonging to the training data’s monomer class.
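Two of these metrics, the percentage of valid molecules and the percentage of unique ones, are simple ratios. The sketch below uses one common convention (validity over all samples, uniqueness over the valid ones), which is an assumption here since the article does not give the exact definitions. The `is_valid` argument is a hypothetical caller-supplied check; the example uses a toy parenthesis-balance test, not a chemical validity test.

```python
def validity_and_uniqueness(samples, is_valid):
    """Return (percent valid, percent unique among the valid samples)."""
    valid = [s for s in samples if is_valid(s)]
    validity = 100.0 * len(valid) / len(samples) if samples else 0.0
    uniqueness = 100.0 * len(set(valid)) / len(valid) if valid else 0.0
    return validity, uniqueness


# Toy stand-in for a validity check: parentheses must balance in count.
balanced = lambda s: s.count("(") == s.count(")")

generated = ["CCO", "CCO", "CN", "CN", "C("]
print(validity_and_uniqueness(generated, balanced))  # (80.0, 50.0)
```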

    “We clearly show that, for the synthesizability and membership, our algorithm outperforms all the existing methods by a very large margin, while it’s comparable for some other widely-used metrics,” says Guo. Further, “what is amazing about our algorithm is that we only need about 0.15 percent of the original dataset to achieve very similar results compared to state-of-the-art approaches that train on tens of thousands of samples. Our algorithm can specifically handle the problem of data sparsity.”

    Future Applications of Data-Efficient Graph Grammar

    In the immediate future, the team plans to scale up the grammar-learning process so that it can generate large graphs, and to produce and identify chemicals with desired properties.

    Down the road, the researchers see many applications for the DEG method beyond generating new chemical structures. A graph is a very flexible representation, and many entities can be symbolized in this form: robots, vehicles, buildings, and electronic circuits, for example. “Essentially, our goal is to build up our grammar, so that our graphic representation can be widely used across many different domains,” says Guo. “DEG can automate the design of novel entities and structures,” adds Chen.

    Reference: “Data-Efficient Graph Grammar Learning for Molecular Generation” by Minghao Guo, Veronika Thost, Beichen Li, Payel Das, Jie Chen and Wojciech Matusik, 28 September 2021, ICLR 2022 Conference.

    This research was supported, in part, by the MIT-IBM Watson AI Lab and Evonik.
