A pangenome reference of 36 Chinese populations
Yang Gao,
Xiaofei Yang,
Hao Chen,
Xinjiang Tan,
Zhaoqing Yang,
Lian Deng,
Baonan Wang,
Shuang Kong,
Songyang Li,
Yuhang Cui,
Chang Lei,
Yimin Wang,
Yuwen Pan,
Sen Ma,
Hao Sun,
Xiaohan Zhao,
Yingbing Shi,
Ziyi Yang,
Dongdong Wu,
Shaoyuan Wu,
Xingming Zhao,
Binyin Shi,
Li Jin,
Zhibin Hu,
Yan Lu,
Jiayou Chu,
Kai Ye and
Shuhua Xu
Additional contact information
Yang Gao: Fudan University
Xiaofei Yang: Xi’an Jiaotong University
Hao Chen: Chinese Academy of Sciences
Xinjiang Tan: Chinese Academy of Sciences
Zhaoqing Yang: Chinese Academy of Medical Sciences
Lian Deng: Fudan University
Baonan Wang: Fudan University
Shuang Kong: Fudan University
Songyang Li: Fudan University
Yuhang Cui: Fudan University
Chang Lei: Fudan University
Yimin Wang: Chinese Academy of Sciences
Yuwen Pan: Chinese Academy of Sciences
Sen Ma: Chinese Academy of Sciences
Hao Sun: Chinese Academy of Medical Sciences
Xiaohan Zhao: Fudan University
Yingbing Shi: Fudan University
Ziyi Yang: Fudan University
Dongdong Wu: Chinese Academy of Sciences
Shaoyuan Wu: Jiangsu Normal University
Xingming Zhao: MOE Frontiers Center for Brain Science, Fudan University
Binyin Shi: The First Affiliated Hospital of Xi’an Jiaotong University
Li Jin: Fudan University
Zhibin Hu: Nanjing Medical University
Yan Lu: Fudan University
Jiayou Chu: Chinese Academy of Medical Sciences
Kai Ye: Xi’an Jiaotong University
Shuhua Xu: Fudan University
Nature, 2023, vol. 619, issue 7968, 112-121
Abstract:
Human genomics is witnessing an ongoing paradigm shift from a single reference sequence to a pangenome form, but populations of Asian ancestry are underrepresented. Here we present data from the first phase of the Chinese Pangenome Consortium (CPC), including a collection of 116 high-quality and haplotype-phased de novo assemblies based on 58 core samples representing 36 minority Chinese ethnic groups. With an average 30.65× high-fidelity long-read sequence coverage, an average contiguity N50 of more than 35.63 megabases and an average total size of 3.01 gigabases, the CPC core assemblies add 189 million base pairs of euchromatic polymorphic sequences and 1,367 protein-coding gene duplications to GRCh38. We identified 15.9 million small variants and 78,072 structural variants, of which 5.9 million small variants and 34,223 structural variants were not reported in a recently released pangenome reference (ref. 1). The Chinese Pangenome Consortium data demonstrate a remarkable increase in the discovery of novel and missing sequences when individuals are included from underrepresented minority ethnic groups. The missing reference sequences were enriched with archaic-derived alleles and genes that confer essential functions related to keratinization, response to ultraviolet radiation, DNA repair, immunological responses and lifespan, implying great potential for shedding new light on human evolution and recovering missing heritability in complex disease mapping.
Date: 2023
Downloads: (external link)
https://www.nature.com/articles/s41586-023-06173-7 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:619:y:2023:i:7968:d:10.1038_s41586-023-06173-7
DOI: 10.1038/s41586-023-06173-7
Nature is currently edited by Magdalena Skipper