An international research team led by scientists at Georgetown University has demonstrated the power of artificial intelligence to predict which viruses could infect humans (like SARS-CoV-2, the virus that led to the COVID-19 pandemic), which animals host them, and where they could emerge.
Their ensemble of predictive models of likely reservoir hosts, published January 10 in The Lancet Microbe (“Optimising predictive models to prioritise viral discovery in zoonotic reservoirs”), was validated in an 18-month project to identify specific bat species likely to carry betacoronaviruses, the group that includes SARS-like viruses.
“If you want to find these viruses, you have to start by profiling their hosts — their ecology, their evolution, even the shape of their wings,” explains the study’s senior author, Colin Carlson, PhD, an assistant research professor in the Department of Microbiology & Immunology and a member of Georgetown’s Center for Global Health Science and Security at Georgetown University Medical Center. “Artificial intelligence lets us take data on bats and turn it into concrete predictions: where should we be looking for the next SARS?”
Despite global investments in disease surveillance, it remains difficult to identify and monitor wildlife reservoirs of viruses that could someday infect humans. Statistical models are increasingly being used to prioritize which wildlife species to sample in the field, but the predictions generated by any one model can be highly uncertain. Scientists also rarely track the success or failure of their predictions after they make them, making it hard to learn from them and build better models in the future. Together, these limitations mean that it is highly uncertain which models are best suited to the task.
This new study suggests that the search for closely related viruses could be a substantial undertaking, with more than 400 bat species around the world predicted to host betacoronaviruses, a large group of viruses that includes SARS-CoV (the virus that caused the 2002-2004 outbreak of SARS) and SARS-CoV-2 (the virus that causes COVID-19). Although the origin of SARS-CoV-2 remains uncertain, the spillover of other viruses from bats is a growing problem due to factors like agricultural expansion and climate change.
Greg Albery, PhD, a postdoctoral fellow in Georgetown’s Biology Department, says COVID-19 provided the impetus to expedite their research. “This is a really rare opportunity,” explains Albery. “Outside of a pandemic, we’d never learn this much about these viruses in this small a timeframe. A decade of research has been collapsed into about a year of publications, and it means we can actually show that these tools work.”
In the first quarter of 2020, the research team trained eight different statistical models that predicted which kinds of animals could host betacoronaviruses. Over more than a year, the team then tracked the discovery of 40 new bat hosts of betacoronaviruses to validate initial predictions and dynamically update their models. The researchers found that models harnessing data on bat ecology and evolution performed extremely well at predicting new hosts. In contrast, cutting-edge models from network science that used high-level mathematics – but less biological data – performed roughly as well as, or worse than, expected at random.
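For readers who want a concrete picture of this kind of validation, the following is a minimal Python sketch, not the study's actual code or method. It assumes one simple way to combine models (averaging each species' rank across models) and one common way to compare predictions against later discoveries (the area under the ROC curve, where 0.5 is what random guessing would give). The function names, model names, species labels, and scores are all hypothetical and made up for illustration.

```python
from itertools import product


def mean_rank_ensemble(model_scores):
    """Average each species' rank across models (rank 1 = most likely host)."""
    species = list(next(iter(model_scores.values())))
    total_rank = {s: 0.0 for s in species}
    for scores in model_scores.values():
        ordered = sorted(species, key=lambda s: scores[s], reverse=True)
        for rank, s in enumerate(ordered, start=1):
            total_rank[s] += rank
    n_models = len(model_scores)
    return {s: total_rank[s] / n_models for s in species}


def auc(scores, confirmed_hosts):
    """Probability that a confirmed host outscores an unconfirmed species.

    0.5 corresponds to random guessing; 1.0 is a perfect ranking.
    """
    positives = [s for s in scores if s in confirmed_hosts]
    negatives = [s for s in scores if s not in confirmed_hosts]
    wins = 0.0
    for p, n in product(positives, negatives):
        if scores[p] > scores[n]:
            wins += 1.0
        elif scores[p] == scores[n]:
            wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up scores for four placeholder species from two placeholder models.
toy_scores = {
    "trait_model": {"bat_a": 0.9, "bat_b": 0.7, "bat_c": 0.2, "bat_d": 0.1},
    "network_model": {"bat_a": 0.3, "bat_b": 0.1, "bat_c": 0.4, "bat_d": 0.2},
}
toy_new_hosts = {"bat_a", "bat_b"}  # pretend these hosts were confirmed later

ensemble_rank = mean_rank_ensemble(toy_scores)
# A lower mean rank means "more likely", so negate it before computing AUC.
ensemble_score = {s: -r for s, r in ensemble_rank.items()}

for name, scores in toy_scores.items():
    print(name, auc(scores, toy_new_hosts))
print("ensemble", auc(ensemble_score, toy_new_hosts))
```

In this toy setup, a model whose AUC sits near or below 0.5 is doing no better than chance at anticipating the newly confirmed hosts, which is the kind of comparison the paragraph above describes.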
“One of the most important things our study gives us is a data-driven shortlist of which bat species should be studied further,” says Daniel Becker, PhD, assistant professor of biology at the University of Oklahoma. “After identifying these likely hosts, the next step is then to invest in monitoring to understand where and when betacoronaviruses are likely to spill over.”
Carlson says that the team is now working with other scientists around the world to test bat samples for coronaviruses based on their predictions.
“If we spend less money, resources, and time looking for these viruses, we can put all of those resources into the things that actually save lives down the road. We can invest in building universal vaccines to target those viruses, or monitoring for spillover in people that live near bats,” says Carlson. “It’s a win-win for science and public health.”
Reference: “Optimising predictive models to prioritise viral discovery in zoonotic reservoirs” by Daniel J Becker, PhD; Gregory F Albery, PhD; Anna R Sjodin, PhD; Timothée Poisot, PhD; Laura M Bergner, PhD; Binqi Chen; Lily E Cohen, MPhil; Tad A Dallas, PhD; Evan A Eskew, PhD; Anna C Fagre, DVM; Maxwell J Farrell, PhD; Sarah Guth, BA; Barbara A Han, PhD; Nancy B Simmons, PhD; Michiel Stock, PhD; Emma C Teeling, PhD and Colin J Carlson, PhD, 10 January 2022, The Lancet Microbe.
Additional study authors included collaborators from the University of Idaho, Louisiana State University, University of California Berkeley, Colorado State University, Pacific Lutheran University, Icahn School of Medicine at Mount Sinai, University of Glasgow, Université de Montréal, University of Toronto, Ghent University, University College Dublin, Cary Institute of Ecosystem Studies, and the American Museum of Natural History.
The authors are a part of the Viral Emergence Research Initiative (VERENA) consortium, which curates the largest ecosystem of open data in viral ecology, and builds tools to help predict which viruses could infect humans, which animals host them, and where they could someday emerge. Carlson and Albery are co-founders.
The authors report having no personal financial interests related to the study. Support for VERENA is provided by L’Institut de Valorisation de Données through the Université de Montréal and by the US National Science Foundation (BII 2021909). Additional funding for the study was provided by the Wellcome Trust and the Research Foundation, the Flemish Government under the Onderzoeksprogramma Artificiële Intelligentie Vlaanderen program.