Viruses have an uncanny ability to rapidly evolve. Covid-19 is a stark example. As the virus mutated from beta to delta to omicron, the pandemic dragged on and the world shut down. Scientists scrambled to adapt vaccines and treatments to new variants. The virus had the upper hand; we were playing catch-up.
An AI developed by Harvard University could turn the tide by allowing us to predict new variants before they arrive. Called EVEscape, the AI is a kind of machine “oracle” for viral evolution.
Trained on data collected before the pandemic, the algorithm was able to predict frequent mutations and troubling variants for Covid-19 and generated a list of future concerning variants too. The heart of the tool is a generative AI model, like the ones powering DALL-E or ChatGPT, but it includes several carefully selected biological factors to better reflect viral mutations.
The tool wasn’t built for Covid-19 only: It also accurately predicts variants for flu viruses, HIV, and two understudied viruses that could spark future pandemics.
“We want to know if we can anticipate the variation in viruses and forecast new variants,” said Dr. Debora Marks, who led the study at the Blavatnik Institute at Harvard Medical School. “Because if we can, that’s going to be extremely important for designing vaccines and therapies.”
There was a strong push to use AI to predict viral mutations during the acute phases of the pandemic. While useful, most models relied on information about existing variants and could only produce short-term predictions.
EVEscape, in contrast, uses evolutionary genomics to peek into a virus’s ancestry, resulting in longer forecasts and, potentially, enough time to plan ahead and fight back.
“We want to figure out how we can actually design vaccines and therapies that are future-proof,” said study author Dr. Noor Youssef.
Evolved to Evolve
Though viruses are extremely adaptable to the pressures of natural selection, they still evolve like other living creatures. Their genetic material randomly mutates. Some mutations decrease their ability to infect hosts. Others kill their hosts before they can multiply. But sometimes, viruses stumble across a Goldilocks variant, one that keeps the host healthy enough for the bug to reproduce and spread like wildfire. While great for the survival of viruses, these variants spark global catastrophes for humanity, as in the case of Covid-19.
Scientists have long sought to predict viral mutations and their effects. Unfortunately, it’s impossible to predict all possible mutations. A typical coronavirus has roughly 30,000 genetic letters. The number of potential variants is greater than all the elementary particles—that is, electrons, quarks, and other fundamental particles—in the universe.
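To get a feel for that scale, a back-of-the-envelope calculation helps. The sketch below assumes four possible letters at each of the 30,000 positions and compares the result with a commonly cited rough estimate of about 10^80 particles in the observable universe. That particle figure is an assumption for illustration, not a number from the study, and the function names are made up for this example.

```python
import math

GENOME_LENGTH = 30_000   # approximate coronavirus genome length, from the text
ALPHABET_SIZE = 4        # four genetic letters per position
# Assumption: rough order-of-magnitude estimate of particles in the observable universe.
LOG10_PARTICLES = 80


def log10_sequence_count(length: int, alphabet: int = ALPHABET_SIZE) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet)


def min_length_exceeding(log10_limit: float, alphabet: int = ALPHABET_SIZE) -> int:
    """Smallest sequence length whose sequence count exceeds 10**log10_limit."""
    return math.floor(log10_limit / math.log10(alphabet)) + 1


if __name__ == "__main__":
    exponent = log10_sequence_count(GENOME_LENGTH)
    print(f"Possible genomes: about 10^{exponent:.0f}")
    print(f"Letters needed to pass 10^{LOG10_PARTICLES}: "
          f"{min_length_exceeding(LOG10_PARTICLES)}")
```

Under these assumptions, a stretch of just 133 letters already has more possible sequences than the particle estimate, and the full genome has roughly 10^18,062. Most of those sequences would never produce a working virus, which is why narrowing the search matters.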
The new study zoomed in on a more practical solution. Forget mapping each variant. With limited data, can we at least predict the dangerous ones?
Let’s Play Villain
The team turned to EVE, an AI previously developed to hunt down disease-causing genetic variants in humans. At the algorithm’s core is a deep generative model that can predict protein function without solely relying on human expertise.
The AI learned from evolution. Like archeologists comparing skeletons from hominin cousins to peek into the past, the AI screened DNA sequences encoding proteins across species. The strategy turned up genetic variants in humans critical for health—for example, those implicated in cancer or heart problems.
“You can use these generative models to learn amazing things from evolutionary information—the data have hidden secrets that you can reveal,” said Marks.
For the new study, the team retrained EVE to predict concerning genetic variants in viruses. They used SARS-CoV-2, the virus behind Covid-19, as a first proof of concept.
The key was integrating the virus’s biological needs into the AI’s data set.
A virus’s core drive is survival. Viruses mutate rapidly, which sometimes leads to genetic changes that can dodge vaccines or antibody treatments. However, the same mutation may damage a virus’s ability to grasp onto its host and reproduce, an obvious disadvantage.
To rule out these kinds of mutations, the AI compared protein sequences from a broad range of coronaviruses discovered before the pandemic—the original SARS virus, for example, and the “common cold” virus. This comparison revealed which parts of the viral genome are conserved. These genetic stewards are foundational to the virus’s survival. Because other coronaviruses and SARS-CoV-2 share a common genetic ancestry, mutations to these genes likely result in death rather than viable variants.
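The idea of finding conserved regions can be illustrated with a much simpler calculation than the deep generative model the team used. The sketch below is not EVE or EVEscape. It takes a made-up alignment of short protein fragments and scores each position by Shannon entropy, where zero means every sequence carries the same amino acid. The sequences and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy alignment: equal-length protein fragments from related viruses.
ALIGNMENT = [
    "MKTAYLC",
    "MKSAYLC",
    "MKTGYLC",
    "MRTAYLC",
]


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Entropy per position; 0.0 means the position is fully conserved."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")
    return [column_entropy("".join(col)) for col in zip(*alignment)]


def conserved_positions(alignment: list[str], max_entropy: float = 0.0) -> list[int]:
    """Zero-based indices of positions whose entropy is at most max_entropy."""
    profile = conservation_profile(alignment)
    return [i for i, h in enumerate(profile) if h <= max_entropy]


if __name__ == "__main__":
    for i, h in enumerate(conservation_profile(ALIGNMENT)):
        print(f"position {i}: entropy {h:.2f} bits")
    print("fully conserved:", conserved_positions(ALIGNMENT))
```

In this toy alignment, positions 0, 4, 5 and 6 never vary, so a crude rule like this one would flag mutations there as risky for the virus. The real model learns richer patterns from the data than a per-position count can capture.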
By contrast, the AI predicted spike proteins to be the flexible component of the virus most likely to evolve. Dotted along the virus’s surface, these proteins are already targets for vaccines and antibody therapies. Changes to these proteins could lower the efficacy of current therapies.
Back to the Future
Hindsight is 20/20 when analyzing a pandemic. But having a glimpse of what may come—rather than trying to play catch-up—is essential if we’re to nip the next pandemic in the bud.
To test the AI’s predictive powers, the team checked its predictions against the GISAID (Global Initiative on Sharing All Influenza Data) database. Despite its name, the database contains 750,000 unique coronavirus genetic sequences.
EVEscape identified variants most likely to spread—like delta and omicron, for instance—with 50 percent of its top predictions seen during the pandemic as of May 2023. When pitted against a previous machine learning method, EVEscape was twice as good at predicting mutations and forecasting which variants were most likely to escape from antibody treatments.
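That 50 percent figure is a kind of hit rate: of the top-ranked predictions, what fraction later showed up in real sequences? A minimal sketch of that bookkeeping follows. The mutation names, scores, and the `top_k_hit_rate` helper are invented for illustration and do not come from the study or from GISAID.

```python
def top_k_hit_rate(scores: dict[str, float], observed: set[str], k: int) -> float:
    """Fraction of the k highest-scoring predicted mutations that were later observed."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for mutation in ranked if mutation in observed)
    return hits / len(ranked)


# Hypothetical model scores (higher means more likely to escape antibodies).
PREDICTED = {
    "A10T": 0.91,
    "G25D": 0.85,
    "K40N": 0.80,
    "L55F": 0.72,
    "P60S": 0.41,
    "Q77H": 0.15,
}
# Hypothetical set of mutations later seen in sequenced samples.
OBSERVED = {"A10T", "K40N", "Q77H"}

if __name__ == "__main__":
    rate = top_k_hit_rate(PREDICTED, OBSERVED, k=4)
    print(f"{rate:.0%} of the top 4 predictions were observed")
```

With this made-up data, two of the four top-ranked mutations appear in the observed set. The same calculation run on two models' rankings gives a simple way to compare them.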
Remembering the Past
EVEscape’s superpower is that it can be used with other viruses. Covid has dominated our attention for the past three years. But lesser-known viruses lurk in silence. Lassa and Nipah viruses, for example, sporadically break out in West African and South and Southeast Asian countries, respectively, and have pandemic potential. The viruses can be treated with antibodies, but they rapidly mutate.
Using EVEscape, the team predicted escape mutations in these viruses, including those already known to evade antibodies.
Combining evolutionary genetics and AI, the work shows that “the key to future success relies on remembering the past,” said Drs. Nash D. Rochman and Eugene V. Koonin at the National Center for Biotechnology Information and National Library of Medicine in Maryland, who were not involved in the study.
EVEscape has the power to predict future variants of viruses, even those yet unknown. It could estimate the risk of a pandemic, potentially keeping us one step ahead of the next outbreak.
The team is now using the tool to predict the next SARS-CoV-2 variant. They track mutations biweekly and rank each variant’s potential for triggering another Covid wave. The data is shared with the World Health Organization and the code is openly available.
To Rochman and Koonin, the new AI toolkit could help thwart the next pandemic. We can now hope “COVID-19 will forever remain known as the most disruptive pandemic in human history,” they wrote.
Image Credit: A SARS-CoV-2 virus particle / National Institute of Allergy and Infectious Diseases, NIH