Machine Learning Spots Treasure Trove of Elusive Viruses

Artificial intelligence could speed up metagenomic studies that look for species unknown to science. From a report: Researchers have used artificial intelligence (AI) to discover nearly 6,000 previously unknown species of virus. The work illustrates an emerging tool for exploring the enormous, largely unknown diversity of viruses on Earth. Although viruses influence everything from human health to the degradation of trash, they are hard to study. Scientists cannot grow most viruses in the lab, and attempts to identify their genetic sequences are often thwarted because their genomes are tiny and evolve fast.

For the latest study, Simon Roux, a computational biologist at the DOE Joint Genome Institute (JGI) in Walnut Creek, California, trained computers to identify the genetic sequences of viruses from one unusual family, Inoviridae. These viruses live in bacteria and alter their hosts' behaviour: for instance, they make Vibrio cholerae, the bacterium that causes cholera, more toxic. Yet Roux, who presented his work at a JGI-organized meeting in San Francisco, California, estimates that fewer than 100 Inoviridae species had been identified before his research began. He gave a machine-learning algorithm two sets of data, one containing 805 genomic sequences from known Inoviridae and another with about 2,000 sequences from bacteria and other types of virus, so that the algorithm could learn to distinguish between them.
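Training an algorithm on one labelled set of target sequences and one labelled set of everything else is a standard binary classification setup. The Python sketch below illustrates the idea with a deliberately simple, hypothetical stand-in rather than the method used in the study. It represents each DNA sequence as a vector of 4-mer frequencies and labels a new sequence according to which class average (centroid) it is closer to by cosine similarity. The k-mer features, the nearest-centroid rule, the toy sequences, and every name in the code (kmer_profile, NearestCentroidClassifier) are assumptions for illustration; the report does not say which features or algorithm Roux used.

```python
from collections import Counter
from itertools import product
import math

# Hypothetical feature choice: frequencies of all 4-letter DNA "words" (k-mers).
# The study's actual features and algorithm are not described in the report.
K = 4
ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq):
    """Return a normalized 4-mer frequency vector for a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    # Only count k-mers made of A/C/G/T; avoid dividing by zero for short input.
    total = sum(counts[km] for km in ALL_KMERS) or 1
    return [counts[km] / total for km in ALL_KMERS]


def centroid(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NearestCentroidClassifier:
    """Hypothetical toy classifier: target class vs. everything else."""

    def fit(self, positives, negatives):
        # positives: known target sequences (e.g. known Inoviridae genomes)
        # negatives: sequences from bacteria and other types of virus
        self.pos_centroid = centroid([kmer_profile(s) for s in positives])
        self.neg_centroid = centroid([kmer_profile(s) for s in negatives])
        return self

    def predict(self, seq):
        """Return True if seq looks more like the positive class (ties count as positive)."""
        v = kmer_profile(seq)
        return cosine(v, self.pos_centroid) >= cosine(v, self.neg_centroid)


# Made-up toy data: the "positive" class is AT-rich, the "negative" class GC-rich.
positives = ["ATATATATATATATAT", "TTAATTAATTAATTAA"]
negatives = ["GCGCGCGCGCGCGCGC", "GGCCGGCCGGCCGGCC"]
clf = NearestCentroidClassifier().fit(positives, negatives)

print(clf.predict("ATATTATAATATTATA"))  # AT-rich query
print(clf.predict("GCGGCCGCGGCCGCGC"))  # GC-rich query
```

The toy sequences only show the mechanics. A realistic classifier would be trained on genome-scale data such as the hundreds to thousands of sequences described above, would likely use richer features and a stronger learning algorithm, and would be judged on sequences held out from training.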