The number of papers reporting new protein-function discoveries in 2017 was two-thirds lower than in 2000, according to research led by A*STAR.
While the Human Genome Project has made the entire human genetic code available to researchers, making sense of this vast trove of data is challenging.
“For many biologists, discovery of a gene function completely changes their lives — it is their main scientific achievement,” says Frank Eisenhaber, director of A*STAR’s Bioinformatics Institute (BII), who led the study.
The BII team, together with Lars Juhl Jensen from the University of Copenhagen, wanted to explore how the rate of new gene structure and function discoveries changed between 1901 and 2017. They did so by counting the papers and patents in the biomedical literature that described previously unknown gene and protein functions.
To do this, they came up with a score, called a ‘full publication equivalent’ or FPE, representing the published equivalent of one whole paper dedicated solely to a single genomic entity, whether a gene, a protein, or a non-coding RNA.
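One way to picture such a score is as fractional counting: a paper about a single entity contributes one full unit to it, while a paper that mentions several entities is divided among them. The sketch below assumes an equal split per paper, which is an illustrative simplification and not necessarily the weighting the team used. The function name `fpe_scores` and the toy paper list are hypothetical.

```python
from collections import defaultdict


def fpe_scores(papers):
    """Return an FPE-like score per entity.

    `papers` is an iterable of lists, each holding the entities
    (genes, proteins, non-coding RNAs) mentioned in one paper.
    ASSUMPTION: each paper's single unit of credit is split equally
    among the distinct entities it mentions.
    """
    scores = defaultdict(float)
    for entities in papers:
        unique = set(entities)
        if not unique:
            continue  # a paper with no recognised entity adds nothing
        share = 1.0 / len(unique)
        for entity in unique:
            scores[entity] += share
    return dict(scores)


# Toy data (invented for illustration, not from the study).
papers = [
    ["TP53"],                       # one whole paper on a single protein
    ["TP53", "INS"],                # credit split two ways
    ["INS", "ALB", "TNF", "TP53"],  # credit split four ways
]
scores = fpe_scores(papers)
```

Under this assumption the scores across all entities always sum to the number of papers counted, so a protein with 500 FPEs has attracted the equivalent of 500 papers devoted to it alone.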
Overall, they found references to 17,824 human proteins and 2,641 human noncoding RNAs in the literature over that period. Of these proteins, 1,610 (9 per cent) scored more than 500 FPEs and accounted for 78 per cent of all relevant papers published. The most frequently mentioned proteins included insulin, serum albumin, tumor necrosis factor and p53.
A further 16 per cent of the literature was dedicated to another 3,207 proteins (18 per cent of the total), which scored between 100 and 500 FPEs. Just over one-third of all proteins mentioned in the literature, 6,439 genomic entities, had between 10 and 100 FPEs. That left only 6 per cent of the literature to cover more than 13,000 genomic entities.
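The proportions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text. Treating the leftover entities as a share of the 17,824 proteins is an assumption made for the check, and the variable names are illustrative.

```python
# Figures quoted in the article.
total_proteins = 17_824
tiers = {
    "more than 500 FPEs": 1_610,
    "100 to 500 FPEs": 3_207,
    "10 to 100 FPEs": 6_439,
}

# Share of all proteins in each tier, as a percentage.
protein_share = {
    label: round(100 * count / total_proteins, 1)
    for label, count in tiers.items()
}

# Share of the literature left after the two most-studied tiers.
literature_left = 100 - 78 - 16

# ASSUMPTION: the "more than 13,000 genomic entities" are the proteins
# outside the two most-studied tiers.
proteins_left = (
    total_proteins - tiers["more than 500 FPEs"] - tiers["100 to 500 FPEs"]
)

print(protein_share)
print(literature_left, proteins_left)
```

The tier shares come out at 9.0, 18.0 and 36.1 per cent, matching the rounded figures in the text, and the remainder is 6 per cent of the literature for 13,007 proteins.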
The rate of protein function discoveries increased steadily from 1980 to 2000, so that by 2000 around 500 new protein names were being reported in the literature each year.
“The appearance of a new gene name in the literature means that there is a new opening and people seriously start thinking what this gene might mean in terms of physiology and biomedical application,” Eisenhaber says.
Then, in 2000, the trend reversed. Although the draft human genome sequence became available in 2001, which should have made genomic discoveries easier, the publication rate began a sustained decline. In 2017, the number of genes appearing in the literature for the first time was one-third of the number that appeared in 2000.
“That’s a huge drop,” Eisenhaber says. “And since function discoveries mainly come from elite institutions, it means they are also affected on a great scale, and that this is a worldwide phenomenon.”
He suggests that the decline in new gene and protein publications may be the result of a diversion in research funding from core budgets towards more short-term, grant-based funding, as well as shorter contracts for academic and research staff.
“For a well-characterized gene, plasmids and antibodies and everything is available, whereas for new genes, you don’t have an antibody or plasmid, you need to produce them yourself,” he says. “It can easily take another year to technically prepare the research besides the scientific challenges when nothing is known, but if you have only two years of a post-doc, can you afford the time to do that?”
The concern is that focusing so much research effort and funding on known genes and their functions will leave large areas of the human genome in darkness, reducing scientists’ ability to explore the full function and structure of our genetic material and to apply the results for biomedical benefit.
The A*STAR-affiliated researchers contributing to this research are from the Bioinformatics Institute.