In brief

The VarNet deep learning system transforms raw DNA data from tumour and matched normal samples into image-like representations, enabling easier associations between specific DNA data patterns and somatic mutations.

A deep dive into cancer’s big data

1 Dec 2023

A new deep learning method outperforms traditional methods for identifying genetic mutations from DNA sequences and can be a valuable tool for improved cancer diagnostics.

Deep within the genomes of cancer cells lie subtle clues to their malignant origins. Somatic variants are genetic alterations that point to DNA replication errors or exposure to carcinogens and are known to contribute to the development of tumours.

However, using automated platforms to find these genetic fingerprints of cancer in DNA sequences from patient samples has, until now, been exceedingly difficult, as tumours are often highly heterogeneous and DNA sequencing is prone to errors.

Anders Skanderup, a Group Leader from A*STAR’s Genome Institute of Singapore (GIS), said that breakthroughs in machine learning combined with the availability of large, multidimensional training datasets can help realise the full potential of diagnostic technologies powered by artificial intelligence.

“The ability to generate and use large-scale next-generation sequencing data of cancer genomes can enable the training of large deep learning models,” said Skanderup.

Using this approach, Skanderup worked with first author Kiran Krishnamachari and colleagues to develop VarNet, a deep learning system designed to detect somatic variants in tumours. The platform was trained using 4.6 million high-confidence somatic variants found in 356 tumour genomes spanning seven cancer types.

Rather than relying on ground-truth labels, the team trained VarNet on labels generated with an ensemble method, which enabled it to recognise genetic mutations in unlabelled genetic data. “While there are many cancer sequencing datasets available, they do not contain ground-truth mutation labels that can be used to train large models,” explained Skanderup, adding that they overcame the challenge using scale and weak supervision.
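One way to picture labelling by ensemble is a simple vote among existing variant callers. The sketch below is illustrative only: the article does not describe how VarNet's training labels were combined, so the majority-vote rule, the `consensus_labels` helper, the caller names and the example variants are all assumptions.

```python
from collections import Counter


def consensus_labels(caller_calls, min_votes=2):
    """Return variants reported by at least `min_votes` callers.

    `caller_calls` maps a (hypothetical) caller name to the variants it
    reported, each written as (chromosome, position, ref, alt).
    """
    votes = Counter()
    for calls in caller_calls.values():
        # set() ensures each caller votes at most once per variant
        votes.update(set(calls))
    return {variant for variant, n in votes.items() if n >= min_votes}


# Toy example: three hypothetical callers, one variant they all agree on
calls = {
    "caller_a": [("chr1", 100, "A", "T"), ("chr2", 50, "G", "C")],
    "caller_b": [("chr1", 100, "A", "T")],
    "caller_c": [("chr1", 100, "A", "T"), ("chr3", 7, "C", "G")],
}
labels = consensus_labels(calls, min_votes=2)
```

Variants that only one caller reports are left out of the training labels here, trading some missed mutations for fewer false positives.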

They also devised two distinct deep learning models: one to identify single-letter DNA changes (single nucleotide variants) and another to identify insertions or deletions in the DNA code (indels). Finally, the system was engineered to generate image-like representations of mutation sites, which allowed VarNet to better ‘see’ mutations and make mutation probability predictions at each site.
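To make the idea of an image-like representation concrete, the toy sketch below turns aligned reads from a normal and a tumour sample into stacked grids of per-position base fractions. It is not VarNet's actual encoding, which the article does not detail; the `encode_pileup` and `encode_site` helpers, the four-row A/C/G/T layout and the example reads are assumptions made for illustration.

```python
BASES = "ACGT"


def encode_pileup(reads, width):
    """Encode aligned reads as a 4 x width grid of base fractions.

    Row order follows BASES (A, C, G, T); each column is one position.
    """
    counts = [[0] * width for _ in BASES]
    for read in reads:
        for pos, base in enumerate(read[:width]):
            if base in BASES:
                counts[BASES.index(base)][pos] += 1
    # Read depth at each position, used to turn counts into fractions
    depth = [sum(row[pos] for row in counts) for pos in range(width)]
    return [
        [row[pos] / depth[pos] if depth[pos] else 0.0 for pos in range(width)]
        for row in counts
    ]


def encode_site(tumour_reads, normal_reads, width):
    """Stack normal and tumour grids, like two channels of an image."""
    return [encode_pileup(normal_reads, width),
            encode_pileup(tumour_reads, width)]


# Toy site: half of the tumour reads carry a G>T change at position 2
normal = ["ACGTA", "ACGTA", "ACGTA", "ACGTA"]
tumour = ["ACGTA", "ACTTA", "ACTTA", "ACGTA"]
site = encode_site(tumour, normal, width=5)
```

In this toy example the T row of the tumour grid has a value of 0.5 at position 2 while the normal grid has 0.0 there, the kind of side-by-side contrast a model could learn to associate with a somatic mutation.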

Prior machine learning platforms tended to struggle with ‘low purity’ tumour samples, in which healthy tissue mixed in with the tumour makes somatic variants harder to distinguish. However, validation tests showed that VarNet’s performance often exceeded current state-of-the-art methods in these challenging scenarios.

“VarNet was shown to be more accurate than existing systems in benchmarks of low-tumour-purity settings, which improves its potential for practical use,” Skanderup remarked, adding that the platform was specifically designed to mimic human experts who would use visualisations of sequencing data to make side-by-side comparisons of normal and tumour samples.

VarNet’s unprecedented accuracy can be a game-changer both in research and commercial settings, said Skanderup, who suggested that it can enhance specialised mutation detection technologies often used by medical diagnostic companies.

The A*STAR-affiliated researchers contributing to this research are from the Genome Institute of Singapore (GIS).

References

Krishnamachari, K., Lu, D., Swift-Scott, A., Yeraliyev, A., Lee, K., et al. Accurate somatic variant detection using weakly supervised deep learning. Nature Communications 13, 4248 (2022).

About the Researcher

Anders Jacobsen Skanderup

Principal Investigator and Group Leader

Genome Institute of Singapore (GIS)

Anders Jacobsen Skanderup is a Group Leader at A*STAR’s Genome Institute of Singapore. He holds adjunct positions at the Department of Computer Science at National University of Singapore as well as the National Cancer Centre Singapore. His group is interested in computational and data-driven approaches to decipher the molecular basis of cancer and improve treatments.

This article was made for A*STAR Research by Wildtype Media Group