In a new study, a group of scientists details how it fused the power of statistical physics and artificial intelligence into a mathematical toolkit that can turn cancer-mutation data into multidimensional models that show how specific mutations alter the social networks of proteins in cells.
While this may sound like the setup to some late-night nerd sketch, researchers have taken this premise and applied it to an increasingly cumbersome problem in modern biology, namely, finding meaning in the rising oceans of genomic data.
In this specific instance, the data comprises reams of cancer mutations that genome-wide studies are publishing at a dizzying rate. The challenge is finding new and efficient ways to separate the signal from the noise (and there is no shortage of noise).
As a new way to tackle this, the researchers fused statistical physics and artificial intelligence into a mathematical toolkit that turns cancer-mutation data into multidimensional models showing how specific mutations alter the social networks of proteins in cells. From these models they can deduce which of the myriad mutations present in cancer cells might actually play a role in driving disease.
At the core of this new approach is an algorithm based on statistical mechanics, a branch of theoretical physics that describes large-scale phenomena by predicting their macroscopic properties from the behavior of their microscopic components.
“Here we have found that a fundamental concept in statistical mechanics, which many of us learned as undergraduates in theoretical physics courses and then largely forgot because it didn’t apply to our everyday lives as biologists, can be relevant to one of the most difficult problems in cancer genetics,” said Peter Sorger, the HMS Otto Krayer Professor of Systems Pharmacology and senior author on the paper.
These findings, which are among the first to be produced from the new Laboratory of Systems Pharmacology (LSP), are published November 2 online in Nature Genetics.
Dark Matter Matters
Many of the most widely studied cancer genes, such as p53 and Ras, were discovered after decades of work by many groups. But today, in the era of high-throughput genomics, we have thousands of times more data from thousands of samples. As a result, the sheer volume of cataloged cancer mutations is vast. But not all mutations actually influence tumor behavior. Many appear to be along for the ride, so to speak, and are therefore called “passenger mutations.” To separate the drivers from the passengers, researchers typically use a kind of “polling” strategy in which they identify the most common mutations, reasoning that those are the significant ones. Only the most promising candidates are then subjected to the detailed and painstaking analysis that has been applied to p53 and Ras.
Mohammed AlQuraishi, an independent HMS Systems Biology fellow associated with the LSP and Sorger lab and lead author of the paper, reasoned that biologists were in dire need of much more biophysically rigorous tools for scouring this data. With a background in genetics, statistics, and physics, AlQuraishi realized that biologists can exploit the statistical power from live data sets and marry it to theoretical physics. “It’s the way that Silver and Feynman together would do it,” he joked.
Statistical mechanics is a precise physical description of how collections of individual molecules give rise to the macroscopic properties we perceive, such as temperature and pressure. AlQuraishi used its core principles as the basis for a platform that would analyze information housed in the Cancer Genome Atlas. As a result he was able to generate detailed schematics of how certain mutations altered the vast, complex cellular world of protein social networks—networks that largely determine a cell’s health, or lack thereof. In doing so, he stumbled upon a few unexpected findings.
Many cancer mutations are common, and many more are rare, some so rare that they occur in only a handful of patients. AlQuraishi found that common and rare mutations are equally likely to affect the protein network.
“Both kinds of mutations are equally strong,” he said. “In both cases, about one percent of the common and one percent of the rare mutations alter the tumor networks we studied. But rare mutations are being largely ignored. We need to start paying attention to them.”
For every common mutation, there are approximately four rare ones, so, based on numbers, rare mutations might be much more significant than previously suspected. “That’s where much of the action is, in the rare mutations. We’ve long considered this large universe of rare mutations to be dark matter, but here we have just found that all this dark matter actually matters.”
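The arithmetic behind that claim can be sketched in a few lines of Python. The absolute count of common mutations below is hypothetical and chosen only for illustration; the roughly one percent network-altering rate and the four-to-one ratio of rare to common mutations are the figures quoted above.

```python
# Hypothetical count for illustration; only the 4:1 ratio and the ~1% rate
# come from the article.
COMMON_MUTATIONS = 1_000
RARE_PER_COMMON = 4
ALTERING_PERCENT = 1  # about one percent, for common and rare alike


def altering_count(total: int, percent: int = ALTERING_PERCENT) -> int:
    """Expected number of mutations that alter the protein network."""
    return total * percent // 100


rare_mutations = COMMON_MUTATIONS * RARE_PER_COMMON
common_hits = altering_count(COMMON_MUTATIONS)
rare_hits = altering_count(rare_mutations)

# Share of all network-altering mutations that are rare.
rare_share = rare_hits / (rare_hits + common_hits)

print(common_hits, rare_hits, rare_share)
```

Under these assumptions, rare mutations account for about four out of every five network-altering mutations, whatever the starting count.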
The researchers also found that mutations are not the blunt force they had expected. Rather than knocking out an entire branch of a network, like a neighborhood power outage, or inserting an entirely new character (a new protein), mutations cause a subtle, almost surgically precise alteration of the communication pathway.
“From the perspective of the mutation, it is hard to be so precise,” said AlQuraishi. “But cancer can’t be too disruptive, or else it might die. It needs to fly under the radar. This subtle altering of networks achieves that objective. Drug companies can exploit this and possibly develop more targeted therapies.”
A final area that these findings address is the problem of reproducing published results in the scientific literature. Here, the researchers were able to use fundamental physical principles to process data sets from different laboratories (including their own) in a way that removes false positives and enriches for true positives. The resulting model is therefore more accurate and reproducible than any single data set.
“We can clean up the experiments by only using data that both the model and experiments agree on,” said AlQuraishi.
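One simple way to picture this kind of filtering is a set intersection: keep only the interactions that both the model and the experiments report. The sketch below uses invented protein pairs and is not the authors' actual procedure, which the article does not describe in detail.

```python
# Hypothetical protein-protein interactions; the names are invented.
model_predictions = {("A", "B"), ("A", "C"), ("B", "D")}
experimental_hits = {("A", "B"), ("B", "D"), ("C", "D")}


def consensus(model: set, experiment: set) -> set:
    """Keep only interactions supported by both the model and the experiment."""
    return model & experiment


agreed = consensus(model_predictions, experimental_hits)
print(sorted(agreed))  # [('A', 'B'), ('B', 'D')]
```

Interactions seen by only one source, such as `("A", "C")` here, are dropped as likely false positives.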
“In general, much of the problem with irreproducibility in science is a problem of poor statistics,” said Sorger. “We addressed that directly here.”
This work was supported by National Institutes of Health grants GM68762, GM107618, and GM072872.
Reference: “A multiscale statistical mechanical framework integrates biophysical and genomic data to assemble cancer networks” by Mohammed AlQuraishi, Grigoriy Koytiger, Anne Jenney, Gavin MacBeath and Peter K Sorger, 2 November 2014, Nature Genetics.