
Small changes in the human genome can be as telling about historical events as large human constructions.
istockphoto.com/PEDRE
The subtle sequence variations that naturally emerge in a genome over time, even at the single-nucleotide level, provide invaluable landmarks for geneticists. With a sufficiently large and detailed dataset, it becomes possible to identify useful indicators for complex traits and heritable disorders, and even to infer evolutionary histories that mark migration and intermingling of distinct human populations.
The Han Chinese represent 90% of the population of the most populous nation on Earth, and yet genomic analysis of this ethnic group has been seriously impaired by the limited genomic data currently available. “It has been suggested that the Han Chinese are not as homogeneous as people have thought, and that some sub-population structure exists,” says Jianjun Liu of the A*STAR Genome Institute of Singapore. “However, because previous studies were done using a very limited number of DNA markers and a small number of subjects, their findings are only suggestive.”
Since 2007, Liu and collaborators across China have been conducting a series of disease-related genomic studies. By drawing on the data collected in these studies, his team was able to carry out a ground-breaking analysis of more than 350,000 genomic markers from 6,580 Han Chinese subjects living in ten Chinese provinces.
Their investigation of single nucleotide polymorphisms in the subjects revealed a clear gradient: the relative distribution of sequence variations shifts steadily along a distinct geographic axis running from north to south, with the greatest genetic differences observed between the northernmost and southernmost provinces. The authors point out that this striking pattern, together with the absence of an additional east–west axis, reflects prevailing theories of an early Chinese history that involved a steady process of north-to-south migration and military expansion.
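To make the idea of a north-to-south shift in sequence variation concrete, the following toy sketch in Python compares allele frequencies between simulated provinces. It is not the method used in the study, which the article does not describe in detail. The genotype data are randomly generated, and the function names, sample sizes and the assumed frequency range (0.3 in the north to 0.5 in the south) are hypothetical choices made purely for illustration.

```python
import random

def allele_frequencies(genotypes):
    """Per-SNP frequency of the counted allele.

    genotypes: list of individuals, each a list of 0/1/2 allele counts.
    """
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(person[j] for person in genotypes) / (2 * n_individuals)
        for j in range(n_snps)
    ]

def mean_frequency_difference(freqs_a, freqs_b):
    """Mean absolute allele-frequency difference across SNPs."""
    return sum(abs(a - b) for a, b in zip(freqs_a, freqs_b)) / len(freqs_a)

def simulate_province(position, n_individuals, n_snps, rng):
    """Toy genotypes whose allele frequencies shift with north-south position.

    position: 0.0 (northernmost) to 1.0 (southernmost); purely illustrative.
    """
    # Hypothetical cline: frequency moves from 0.3 in the north to 0.5 in the south.
    freq = 0.3 + 0.2 * position
    # Each genotype is the sum of two independent allele draws (0, 1 or 2).
    return [
        [(rng.random() < freq) + (rng.random() < freq) for _ in range(n_snps)]
        for _ in range(n_individuals)
    ]

rng = random.Random(0)  # fixed seed so the simulation is repeatable
provinces = {
    name: simulate_province(pos, 200, 500, rng)
    for name, pos in [("north", 0.0), ("central", 0.5), ("south", 1.0)]
}
freqs = {name: allele_frequencies(g) for name, g in provinces.items()}

north_central = mean_frequency_difference(freqs["north"], freqs["central"])
north_south = mean_frequency_difference(freqs["north"], freqs["south"])
print(f"north vs central: {north_central:.3f}")
print(f"north vs south:   {north_south:.3f}")
```

In this simulation the north-versus-south difference comes out larger than the north-versus-central one, mirroring in miniature the pattern of greater divergence between more distant provinces. Real analyses of this kind work with hundreds of thousands of markers and typically rely on more sophisticated statistics than a mean frequency difference.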
The investigators also compared these provincial samples against more than 1,000 urban Han from Beijing and Shanghai and 570 Han Chinese living abroad in Singapore. These cities exhibited a greater mixture of individuals from across the north–south axis, although southern provinces such as Guangdong were markedly overrepresented among Singaporeans. Intriguingly, a closer examination of single nucleotide polymorphism data from Guangdong also enabled the researchers to identify discrete subpopulations that roughly mirrored the distribution of particular Chinese dialects within the province.
Although these variations are subtle, their effects can add up, and Liu’s team is already applying lessons from this study to their ongoing research. “We’re starting to analyze samples according to geographic origin along the north–south axis to minimize the impact of sub-population structure on our genetic association studies,” he says.
The A*STAR-affiliated authors in this highlight are from the Genome Institute of Singapore.