In brief

Using a computational pangenome analysis, researchers identified a softcore genome of around 3,000 gene families shared by at least 95 percent of E. coli strains, as well as antibacterial tailocin-encoding genes unique to the pandemic strain ST131.

Bacterial superweapons exposed

27 Jul 2023

By exploring the genetic diversity within Escherichia coli bacteria, researchers uncover distinct gene families and a molecular weapon tied to disease-causing strains.

Few microbes are as well known to science as Escherichia coli. Once considered a harmless bacterial resident of animal and human guts, E. coli is now known to be a vast collection of over 700 strains, some of which pose threats to human health. If you’ve ever had a severe bout of food poisoning, chances are you’ve met a pathogenic member of the species.

According to Frank Eisenhaber, a Senior Fellow at A*STAR’s Genome Institute of Singapore (GIS), the secrets behind an E. coli strain’s disease-causing potential lie hidden in its genome, which holds 4,000 to 5,500 genes. While researchers have historically relied on the genome sequence of a ‘typical’ E. coli to represent the species, this can mask subtle differences between benign and disease-causing strains.

"E. coli strains have enormous genomic and mutational diversity; only a few hundred gene families are shared among all of them,” said Eisenhaber. “A single reference genome can’t completely represent that diversity.”

To create a more comprehensive reference, Eisenhaber and colleagues from GIS and A*STAR’s Bioinformatics Institute (BII) built an E. coli pangenome using computational tools and the publicly available sequences of 1,324 complete strain genomes. The team created a systematic map of over 25,000 E. coli gene families, unlocking new insights on the evolutionary history, adaptability and functional diversity of the species.

“To date, our E. coli pangenome study is by far the largest in terms of the number of complete E. coli genomes included,” said Eisenhaber.

The distribution of 1,324 sequenced E. coli genomes across eight E. coli phylogroups, showing the proportion of genomes in each phylogroup that fell into one of four virulence categories. Based on the total number of virulence factors (VFs) identified in each genome, the team classified the genomes as non-pathogenic (<6 VFs); likely virulent (6–14 VFs); highly virulent (14–22 VFs); or very highly virulent (>22 VFs).

©️ A*STAR Research
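
The classification rule described in the caption can be written as a short function. The Python sketch below is illustrative only and is not the team's code: the function names are hypothetical, the VF counts are made up, and because the caption's ranges meet at 14 and 22, the sketch assumes those boundary values fall into the lower category.

```python
from collections import Counter


def classify_virulence(vf_count: int) -> str:
    """Map a genome's virulence factor (VF) count to a caption category."""
    if vf_count < 0:
        raise ValueError("VF count cannot be negative")
    if vf_count < 6:
        return "non-pathogenic"
    if vf_count <= 14:  # assumption: a count of 14 is "likely virulent"
        return "likely virulent"
    if vf_count <= 22:  # assumption: a count of 22 is "highly virulent"
        return "highly virulent"
    return "very highly virulent"


def category_proportions(vf_counts: list[int]) -> dict[str, float]:
    """Fraction of genomes in each virulence category for one phylogroup."""
    if not vf_counts:
        return {}
    tally = Counter(classify_virulence(n) for n in vf_counts)
    total = len(vf_counts)
    return {category: count / total for category, count in tally.items()}


# Made-up VF counts for the genomes of one phylogroup, for illustration only.
example_counts = [3, 8, 15, 30]
print(category_proportions(example_counts))
```

Running the same tally once per phylogroup would give the kind of per-phylogroup proportions the figure summarises.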

The team found that a set of around 3,000 gene families made up a stable ‘softcore’ genome: one shared by at least 95 percent of E. coli strains. There were also three divergent groups of strains (phylogroups) with distinct genetic profiles—B1, B2 and E—which had acquired specialised functions. For example, phylogroup B2 had multiple genes to efficiently acquire iron, an important nutrient for survival.
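
To make the 95 percent cut-off concrete, here is a minimal Python sketch of how a softcore set could be picked out of a gene family presence table. It is not the pipeline the team used, which the article does not describe; the function name, the strain and family labels, and the toy data are all hypothetical.

```python
def shared_families(
    presence: dict[str, set[str]],
    n_strains: int,
    min_percent: int = 95,
) -> set[str]:
    """Return gene families found in at least min_percent of the strains.

    presence maps each gene family to the set of strains that carry it.
    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    if n_strains <= 0:
        raise ValueError("n_strains must be positive")
    return {
        family
        for family, strains in presence.items()
        if len(strains) * 100 >= min_percent * n_strains
    }


# Toy data: 20 hypothetical strains and three hypothetical gene families.
strains = [f"strain_{i}" for i in range(20)]
presence = {
    "family_A": set(strains),       # present in 20 of 20 strains
    "family_B": set(strains[:19]),  # present in 19 of 20 strains (95 percent)
    "family_C": set(strains[:5]),   # present in 5 of 20 strains
}

softcore = shared_families(presence, n_strains=20)
strict_core = shared_families(presence, n_strains=20, min_percent=100)
print(sorted(softcore), sorted(strict_core))
```

In this toy example, relaxing the requirement from all strains to 95 percent lets family_B into the shared set. The same effect, at scale, is consistent with Eisenhaber's remark that only a few hundred gene families are shared by every strain while around 3,000 meet the softcore threshold.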

Curiously, the team also noticed that the ST131 strain from phylogroup B2 had viral DNA integrated into its genome, allowing it to produce tailocin: a distinctive protein structure used by bacteriophages to ‘pop’ bacterial membranes. This suggests that ST131’s rising dominance in global disease outbreaks may be partly due to this uniquely lethal weapon, which can kill closely related bacterial neighbours in a host.

These results present exciting new angles that challenge long-held beliefs about bacterial virulence. “So far, virulence factors were almost exclusively seen as tools for undermining host defences. The interbacterial competition for access to the host was never in the spotlight,” said Eisenhaber. He added that these findings can open up a new possibility: the engineering of ‘good’ bacteria to safely destroy ‘bad’ bacteria as an alternative to antibiotics.

With help from Lars Jensen of the University of Copenhagen, Denmark, Eisenhaber’s team published a follow-up paper that mapped the existing literature on E. coli gene families and biomolecular functions to their pangenome, revealing that many of its genetic secrets remain unexplored. The team noted that it may take up to 30 years for the scientific community to fully characterise the gene functions of the E. coli softcore genome, a painstaking but necessary effort to shed light on an iconic species.

The A*STAR-affiliated researchers contributing to this research are from the Genome Institute of Singapore (GIS) and the Bioinformatics Institute (BII).


Tantoso, E., Eisenhaber, B., Kirsch, M., Shitov, V., Zhao, Z., et al. To kill or to be killed: pangenome analysis of Escherichia coli strains reveals a tailocin specific for pandemic ST131. BMC Biology 20, 146 (2022).

Tantoso, E., Eisenhaber, B., Sinha, S., Jensen, L.J. and Eisenhaber, F. About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature. Biology Direct 18, 7 (2023).

About the Researcher

Frank Eisenhaber

Executive Director (BII) and A*STAR Senior Fellow (GIS)

Bioinformatics Institute
Frank Eisenhaber holds an MD and a degree in biophysics from the Pirogov Russian National Research Medical University, Moscow (1985) and a PhD from the Engelhardt Institute of Molecular Biology, Moscow. After working as a Principal Investigator at the Institute of Molecular Pathology (IMP), Vienna (1999–2007), he joined A*STAR’s Bioinformatics Institute (BII) in August 2007 as Executive Director. He is also an A*STAR Senior Fellow in the Laboratory of Gene Function Discovery at the Genome Institute of Singapore (GIS). Eisenhaber’s research interests focus on discovering new biomolecular mechanisms with theoretical and biochemical approaches, and on functionally characterising as-yet-uncharacterised genes and pathways. He is one of several scientists credited with the discovery of the SET domain methyltransferases, ATGL, kleisins and many new protein domain functions (e.g., in the GPI lipid anchor biosynthesis pathway), and with the development of accurate prediction tools for post-translational modifications and subcellular localisations.

This article was made for A*STAR Research by Wildtype Media Group