
A decline in gene discoveries

21 Feb 2019

The rate of discovery of human gene functions has fallen significantly

The number of papers reporting new protein-function discoveries in 2017 declined by two-thirds compared with 2000 output, according to research led by A*STAR.

The outer ring of the diagram visualizes the level of darkness in the human proteome, measured in full publication equivalents (FPEs). About 1600 proteins belong to the intensively studied proteome (>500 FPEs), and another ~3200 are also well analyzed (>100 FPEs). At the other end of the spectrum, ~4000 proteins are not mentioned in any article (super-dark proteome) and ~6500 proteins have <10 FPEs (dark proteome). About 6500 proteins have between 10 and 100 FPEs (illuminated proteome). The middle inset shows how many proteins cross the threshold of 0 (T0), 10 (T10), 20 (T20), 50 (T50), 100 (T100) or 500 (T500) FPEs in a given year.

© 2019 A*STAR Bioinformatics Institute

While the Human Genome Project has made the entire human genetic code available to researchers, making sense of this vast trove of data is challenging.

“For many biologists, discovery of a gene function completely changes their lives — it is their main scientific achievement,” says Frank Eisenhaber, director of A*STAR’s Bioinformatics Institute (BII), who led the study.

The BII team, together with Lars Juhl Jensen from the University of Copenhagen, wanted to explore how the rate of new gene structure and function discoveries changed between 1901 and 2017. They did so by counting the papers and patents in the biomedical literature that described previously unknown gene and protein functions.

To do this, they came up with a score, called a ‘full publication equivalent’ or FPE, representing the published equivalent of one whole paper dedicated solely to a single genomic entity, whether a gene, a protein, or a non-coding RNA.
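As a rough illustration of how such a score could be tallied, the sketch below gives each paper a total weight of one and splits it among the entities the paper mentions, in proportion to how often each is mentioned. The article does not spell out the weighting rule, so this proportional split, the function name `fpe_scores` and the toy corpus are all assumptions made for illustration, not the study's actual method.

```python
from collections import defaultdict


def fpe_scores(papers):
    """Sum fractional paper credit per genomic entity.

    `papers` is a list of dicts, each mapping an entity name to the number
    of times that entity is mentioned in one paper.
    ASSUMPTION: every paper carries a total weight of 1.0, divided among
    its entities in proportion to their mention counts.
    """
    scores = defaultdict(float)
    for mentions in papers:
        total = sum(mentions.values())
        if total == 0:
            continue  # paper mentions no tracked entity
        for entity, count in mentions.items():
            scores[entity] += count / total
    return dict(scores)


# Hypothetical toy corpus: three papers with made-up mention counts.
papers = [
    {"TP53": 4},            # a whole paper on one protein counts as 1 FPE
    {"TP53": 3, "INS": 1},  # credit split 0.75 / 0.25
    {"INS": 2, "ALB": 2},   # credit split 0.5 / 0.5
]
scores = fpe_scores(papers)
```

Under this assumption the scores across all entities add up to the number of papers, which matches the idea of a score expressed in units of whole papers.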

Overall, they found references to 17,824 human proteins and 2,641 human noncoding RNAs in the literature over that period. Of these proteins, 1,610 (9 per cent) scored more than 500 FPEs and accounted for 78 per cent of all relevant papers published. Some of the most frequently mentioned proteins included insulin, serum albumin, tumor necrosis factor and p53.

A further 16 per cent of the literature was dedicated to another 3,207 proteins (18 per cent of the total), which scored between 100 and 500 FPEs. Just over one-third of all proteins mentioned in the literature (6,439 genomic entities) had 10 to 100 FPEs. In all, only 6 per cent of the literature was left to cover the more than 13,000 genomic entities that scored fewer than 100 FPEs.
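The shares quoted above can be checked with a few lines of arithmetic. In the sketch below, the protein counts and literature percentages come from the article, while the number of proteins with fewer than 10 FPEs is derived by subtraction and is an assumption about how the tiers partition the 17,824 proteins.

```python
# Figures quoted in the article.
total_proteins = 17_824
tiers = {
    "more than 500 FPEs": 1_610,
    "100 to 500 FPEs": 3_207,
    "10 to 100 FPEs": 6_439,
}

# Derived here, not quoted: proteins left with fewer than 10 FPEs,
# assuming the tiers partition all proteins found in the literature.
tiers["fewer than 10 FPEs"] = total_proteins - sum(tiers.values())

# Share of all proteins in each tier, in per cent.
shares = {name: round(100 * n / total_proteins, 1) for name, n in tiers.items()}

# Literature share left once the two best-studied tiers are subtracted.
remaining_literature = 100 - 78 - 16

# Proteins that have to share that remaining literature.
low_coverage_entities = tiers["10 to 100 FPEs"] + tiers["fewer than 10 FPEs"]
```

The two best-studied tiers come out at roughly 9 and 18 per cent of proteins, while the 13,007 proteins below 100 FPEs share the remaining 6 per cent of the literature.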

The rate of protein function discoveries increased steadily from 1980 to 2000, such that by the year 2000 around 500 new protein names were being reported in the literature each year.

“The appearance of a new gene name in the literature means that there is a new opening and people seriously start thinking what this gene might mean in terms of physiology and biomedical application,” Frank Eisenhaber says.

Then, in 2000, the trend reversed. Although the draft human genome sequence became available in 2001, which should have made genomic discoveries easier, the publication rate began a sustained decline. In 2017, the number of genes appearing in the literature for the first time was one-third of the number that first appeared in 2000.

“That’s a huge drop,” Frank Eisenhaber says. “And since function discoveries mainly come from elite institutions, it means they are also affected on a great scale, and that this is a worldwide phenomenon.”

He suggests that the decline in new gene and protein publications may be the result of a diversion in research funding from core budgets towards more short-term, grant-based funding, as well as shorter contracts for academic and research staff.

“For a well-characterized gene, plasmids and antibodies and everything is available, whereas for new genes, you don’t have an antibody or plasmid, you need to produce them yourself,” he says. “It can easily take another year to technically prepare the research besides the scientific challenges when nothing is known, but if you have only two years of a post-doc, can you afford the time to do that?”

The concern is that focusing so much research effort and funding on known genes and their function will leave large areas of the human genome in darkness, and reduce scientists’ ability to explore the full function and structure of our genetic material and apply these results for biomedical benefits.

The A*STAR-affiliated researchers contributing to this research are from the Bioinformatics Institute.



Sinha, S., Eisenhaber, B., Jensen, L. J., Kalbuaji, B. & Eisenhaber, F. Darkness in the human gene and protein function space: Widely modest or absent illumination by the life science literature and the trend for fewer protein function discoveries since 2000. Proteomics 18, 1800093 (2018).

About the Researcher

Frank Eisenhaber

Bioinformatics Institute
Frank Eisenhaber holds an MD and a degree in biophysics from the Pirogov Medical University in Moscow (1985) and a PhD from the Engelhardt Institute of Molecular Biology in Moscow. After working as PI at the Institute of Molecular Pathology (IMP) in Vienna (1999-2007), he joined the Bioinformatics Institute A*STAR Singapore in August 2007 as Executive Director. Frank Eisenhaber's research interest is focused on the discovery of new biomolecular mechanisms with theoretical and biochemical approaches and the functional characterization of as yet uncharacterized genes and pathways. As mechanistic insight is the driver for biotechnology, biomedical and clinical applications, this work has catalyzed various lines of applied research. Frank Eisenhaber is one of the scientists credited with the discovery of the SET domain methyltransferases, ATGL, kleisins, many new protein domain functions (for example in the GPI lipid anchor biosynthesis pathway) and with the development of accurate prediction tools for posttranslational modifications and subcellular localizations.

This article was made for A*STAR Research by Nature Research Custom Media, part of Springer Nature