Asians are underrepresented in population genetics studies, which typically enrol individuals of European descent.

© 2020 A*STAR

Asian genomes in the spotlight

24 Feb 2020

A*STAR scientists have created the world’s largest multi-ethnic Asian genetic database to further biomedical research and shed light on the origins of three major ethnic groups in the region.

In 1948, a group of 5,209 adult subjects from Framingham, Massachusetts in the United States was recruited for a population health study to understand the common factors or characteristics contributing to cardiovascular disease. The study, known as the Framingham Heart Study, enrolled men and women between 30 and 62 years of age who had no prior history of heart attack or stroke at the time of recruitment.

The findings from the study paved the way for the development of a gender-specific algorithm known as the Framingham Risk Score (FRS), which allows clinicians to estimate the ten-year cardiovascular risk of an individual. Widely used in clinics and hospitals across the globe, the FRS has been useful in shaping policy to prevent the development of cardiovascular disease.

However, the metric is not without its limitations—since the Framingham Heart Study was carried out on Americans, many of whom were of European descent, the FRS may underestimate or overestimate the risk of cardiovascular disease in non-US populations. As nations awaken to the ideal of precision medicine, wherein treatments are tailored to the individual, the issue of diversity in clinical and biomedical research has been thrust into the glare of the scientific spotlight.

Finding the Asian connection

Asian populations, in particular, are severely underrepresented in population health analyses. Only five percent of the 15,496 genomes in the Genome Aggregation Database are from East Asians. Similarly, in the Trans-Omics for Precision Medicine Program, only ten percent of the 145,000 samples are from Asians.

Spearheading efforts to boost Asian representation in population health and genome-wide association studies are researchers led by Jianjun Liu, Deputy Executive Director and Senior Group Leader, Human Genetics, at A*STAR’s Genome Institute of Singapore (GIS). In the journal Cell, they reported the creation of the world’s largest genetic databank of Asian populations.

The team, which also included scientists from the National University of Singapore (NUS), Duke-NUS Medical School and public hospitals in Singapore, performed whole-genome sequencing (WGS) on 4,810 Singaporeans, capturing 80 percent of Asia’s diversity in three main ethnic groups—Chinese, Malay and Indian. This cohort represents only the first batch of Asian genomes to be sequenced as part of the SG10K project, which was conceived back in 2015. As its name implies, the project’s goal is to have the genomes of 10,000 Singaporeans sequenced and analyzed.

“The objectives are to characterize the genetic structure and variation of the Singapore population, generate a large control dataset for future WGS-based genetic association studies of disease, and create a WGS reference panel for accurate genotype imputation in the Singapore population,” said Liu.

“Upon completion, this study will provide valuable genetic information to facilitate clinical and pharmaceutical research in Singapore populations and empower genetic studies of Singapore and Asian-centric diseases,” he added.

Of snips and natural selection

Even with less than half of the targeted number of genomes sequenced, Liu’s team noted several unique characteristics of Asian genomes. For example, the researchers uncovered 89,160,286 single nucleotide polymorphisms (SNPs), which are specific points in the genome where a DNA base differs among members of a species. They also found 9,113,420 small insertions and deletions (indels). Of these, 51 percent of the SNPs and 70 percent of the indels were novel, not having been reported before in public databases.

Probing deeper, the researchers identified seven genomic regions (also known as loci) that are more commonly altered in Asians, suggesting that these stretches of DNA play roles in survival and adaptation to the environment.

“For example, the EDAR and PRSS53 loci showed strong signals of positive selection and are associated with hair morphology in East Asians. The EDAR locus, notably, has been shown to be associated with increased scalp hair thickness and tooth morphology in humans,” said Liu. The remaining five loci are known to influence skin color, alcohol metabolism, cellular response to ultraviolet light exposure and immune response to disease-causing organisms.

These findings thus reaffirm that Asian-specific genomic variation exists, and that these variations have implications for health and disease. With a better understanding of which SNPs and indels are important, researchers and clinicians may be able to develop algorithms to predict the risk of a variety of health conditions in the Asian context. This, in turn, would allow for interventions to prevent disease onset and progression, auguring better health for individuals in the region.

Peeking through a window into history

Beyond the biomedical applications of their work, the analyses by Liu’s team revealed details about the origins of the three major ethnic groups in the region. The researchers report that Chinese, Malays and Indians shared a common ancestral population around 80,000 years ago. Their findings also indicate that about 45,000 years ago, Indians had already split from Chinese and Malays, while the split between Chinese and Malays occurred more recently, some 24,800 years ago.

By examining the overlap among genome sequences of the study cohort, the researchers further noted that Singaporean Chinese are related to the Southern Han Chinese in Beijing and north China. Meanwhile, Singaporean Indians are closely related to the Sri Lankan Tamil, Telugu and Bengali from South India, with a small proportion genealogically affiliated with the Gujarati from west India and Punjabi from Pakistan. The data also showed that Singapore’s Chinese and Indian populations likely intermarried with indigenous people on the Malay Archipelago after migrating from their countries of origin.

Since the publication of their findings, the team has deposited its dataset into the European Genome-phenome Archive, a global database accessible to researchers pursuing genetics research. Combined with other databases such as the one developed under the 1000 Genomes Project, wherein 10 of the 26 populations studied are Asian, the study by Liu’s team could pave the way for more accurate ancestry tracking and population genetics studies.

“We envisage that scientists studying the genetics of human disease in populations originating from or involving participants from the traditionally underrepresented regions of Asia will witness the greatest impact from the SG10K project,” said Liu.

Wu, D., Dou, J., Chai, X., Bellis, C., Wilm, A. et al. Large-Scale Whole Genome Sequencing of Three Diverse Asian Populations in Singapore. Cell 179, 736-749 (2019).

This article was made for A*STAR Research by Wildtype Media Group