Sequencing Key Molecules in Just Minutes Instead of Years

Heparan Sulfate — A nanopore and image recognition software can sequence a sulfated glycosaminoglycan in real-time. Credit: Rensselaer Polytechnic Institute

Research Demonstrates the Potential for Rapid, Accurate Glycan Sequencing

Researchers have demonstrated the potential of using a nanopore to reduce the time required for sequencing a glycosaminoglycan from years to minutes. A glycosaminoglycan is a class of long chain-linked sugar molecules as important to our biology as DNA.

As published this week in the Proceedings of the National Academies of Sciences, a team from Rensselaer Polytechnic Institute showed that machine-learning and image recognition software could be used to quickly and accurately identify sugar chains — specifically, four synthetic heparan sulfates — based on the electrical signals generated as they passed through a tiny hole in a crystal wafer.

Teaching Machines to Read the Language of Sugars

“Glycosaminoglycans are a complex repertoire of sequences, as the work of Shakespeare or a poem by Yates is a complex collection of letters. It takes an expert to write them and an expert to read them,” said Robert Linhardt, lead researcher and a professor of chemistry and chemical biology at Rensselaer Polytechnic Institute. “We’ve trained a machine to quickly read the equivalent of words with four letters like ‘ababab’ or ‘bcbcbc.’ These are simple sequences that don’t have any meaning, but they show us that the machine can be taught to read. If we extend and develop this technology, it has the potential to sequence the glycans or even proteins in real-time, eliminating years of effort.”

Commercial nanopore sequencing devices are used to sequence DNA, which is composed of four nucleic acid units, known by the letters A, C, G, and T, strung together in an endless variety of configurations. The device relies on an ionic current running through a hole only a few billionths of a meter wide in a membrane. Strands of DNA are placed on one side of the hole, and drawn through with the flow of the current. Each nucleic acid blocks the hole somewhat as it passes through, disrupting the current and yielding a particular signal associated with that nucleic acid. The devices, currently used for fieldwork, are only one of several relatively rapid and automated techniques for sequencing DNA.

Glycosaminoglycans, or GAGs, are a structurally complex class of glycans — the essential sugars present in living organisms — found on the cell surfaces and extracellular matrix of all animals and perform many functions in cell growth and signaling, anticoagulation and wound repair, and maintaining cell adhesion. GAGs, currently extracted from slaughtered animals, are used as drugs and nutraceuticals.

Like DNA, GAGs can be subdivided into their constituent disaccharide sugar units. But whereas DNA is made of only four letters in a linear string, these glycans have dozens of basic units, some with attached sulfate groups, acid groups, and amide groups. For example, even a relatively small naturally occurring heparan sulfate molecule of six sugar units could have 32,768 possible sequences. Because of the challenge, glycan-sequencing remains onerous, relying on painstaking lab work and sophisticated analysis, involving techniques with names like liquid chromatography-tandem mass spectrometry and nuclear magnetic resonance spectroscopy.

As part of his work, Linhardt, a glycans expert who developed a synthetic variant of the common blood-thinner heparin, sequences GAGs to understand naturally occurring forms and develop synthetic variants.

“Using standard analytical methods, it took us two years to sequence the first simple GAG,” said Linhardt, a member of the Rensselaer Center for Biotechnology and Interdisciplinary Studies. “We have another one that we’ve worked most of the sequence out, and it’s taken us five-plus years — and it’s probably going to take us another five years to finish it,”

Building a Simplified Model for Complex Sugar Chains

Reasoning that nanopore sequencing could be used to identify the disaccharide units in a GAG, the research team built its own nanopore device and synthesized four heparan sulfate GAG chains using the chemoenzymatic process developed by the Linhardt Lab. Importantly, these four heparan sulfates were very simple — made with combinations of only four different types of sugar units, assembled in a chain about 40 units long, and with a carefully controlled composition and sequence.

The team passed each heparan sulfate through the nanopore and produced a graph depicting the voltage over time output of the device. Each of the four variants was run through the device more than 2,000 times, increasing the statistical likelihood of an accurate read given the rudimentary design of the experimental nanopore.

“The device sequenced the simplest heparan sulfate in real-time and produced a pattern that our eyes could easily recognize right away for each of the four samples,” Linhardt said. “You can immediately tell that they’re different.”

Machine Learning Adds Precision and Accuracy

To ensure an unbiased analysis, the team fed the results into free machine-learning and image recognition software using Google’s deep neural network, training the software to distinguish between the four different patterns and identify each variant of heparan sulfate. The most successful machine-learning model produced an analysis that was nearly 97% accurate.

“The information content in a GAG sequence can greatly surpass that of a similar quantity of DNA or RNA, meaning that the ability to rapidly read GAG sequences opens up a new window of understanding into the complex biochemistry of life,” said Curt Breneman, dean of the Rensselaer School of Science. “This proof of concept study links innovative nano-detection methods with state-of-the-art machine learning tools, and shows the power of interdisciplinary thinking to expand the frontiers of knowledge.”

Reducing the speed at which the GAGs pass through the nanopore could increase accuracy, and the device can be trained on additional sugar units, and more complex sequences, all of which are future research goals. Linhardt said the machine would have to learn somewhere from 10 to 20 sugar units to fully sequence a GAG.

“This is a proof of concept; we’ve made it read two-letter words,” Linhardt said. “Once we teach it the full alphabet, it will be able to read every different sequence. It’ll be able to read all the words.”

Reference: “Synthetic heparan sulfate standards and machine learning facilitate the development of solid-state nanopore analysis” by Ke Xia, James T. Hagan, Li Fu, Brian S. Sheetz, Somdatta Bhattacharya, Fuming Zhang, Jason R. Dwyer and Robert J. Linhardt, 9 March 2021, Proceedings of the National Academy of Sciences.
DOI: 10.1073/pnas.2022806118

This study was published with support from the National Science Foundation and the National Institutes of Health. At Rensselaer, Linhardt was joined in the research by Ke Xia, James T. Hagan, Li Fu, Brian S. Sheetz, Somdatta Bhattacharya, Fuming Zhang, and Jason R. Dwyer.

Never miss a breakthrough: Join the SciTechDaily newsletter.
Follow us on Google and Google News.

Sequencing Key Molecules in Just Minutes Instead of Years

MeniFluidics: New Technique for Engineering Living Materials

Tube Worm Slime Displays Self-Powered, Long-Lasting Bioluminescent Glow

A Small Answer to One of the Biggest Problems on the Planet

Stanford Scientists Genetically Reprogram Cells to Build Artificial Structures

Programmable DNA Technique ‘Prints’ Cells to Create Diverse Biological Environments

Artificial Intelligence Composes New Proteins – The Building Blocks of Life – as Musical Scores

New Protein Design Technique Could Streamline Drug Creation

3D-Printing Living Skin With Blood Vessels Included [Video]

Advancing Biomanufacturing by Uncovering the Mechanisms Behind Magnetogenetics

Two Drinks a Day May Be Riskier Than Many Americans Think

A Lost Human Lineage May Have Left a Genetic Legacy in People Today

Study Reveals a Surprising Link Between Birth Control Pills and Binge Eating

NASA’s HiRISE Captures Perseverance Rover Completing a Marathon on Mars

Ancient DNA Reveals the Hidden Origins of China’s Mysterious Shimao Civilization

Scientists Discover a Surprising Link Between Sleep, Genes, and Alzheimer’s

Popular Childhood Drinks Linked to Higher Blood Pressure Later in Life

Scientists Just Challenged a 70-Year-Old Myth About the Human Brain

Sequencing Key Molecules in Just Minutes Instead of Years

Research Demonstrates the Potential for Rapid, Accurate Glycan Sequencing

Teaching Machines to Read the Language of Sugars

Building a Simplified Model for Complex Sugar Chains

Machine Learning Adds Precision and Accuracy

Related Articles