The Chinese National Twin Registry (CNTR) currently includes data from 61 566 twin pairs from 11 provinces or cities in China. Of these, 31 705, 15 060 and 13 531 pairs are monozygotic, same‐sex dizygotic and opposite‐sex dizygotic pairs, respectively, determined by opposite sex or intrapair similarity. Since its establishment in 2001, the CNTR has provided an important resource for analysing genetic and environmental influences on chronic diseases, especially cardiovascular diseases. Recently, the CNTR has focused on collecting biologic specimens from disease‐concordant or disease‐discordant twin pairs or from twin pairs reared apart. More than 8000 pairs of these twins have been registered, and blood samples have been collected from more than 1500 pairs. In this review, we summarize the main findings from univariate and multivariate genetic effects analyses, gene–environment interaction studies, and omics studies exploring DNA methylation and metabolomic markers associated with phenotypes. There remains further scope for CNTR research and data mining. The plan for future development of the CNTR is described. The CNTR welcomes worldwide collaboration.

Introduction
It has been nearly 150 years since Francis Galton noted in 1875: ‘The history of twins, as a criterion of the relative powers of nature and nurture’ 1, although at that time he did not realize that there are two different types of twins, monozygotic (MZ) and dizygotic (DZ). MZ twins develop from a single fertilized egg and consequently share the same genes. DZ twins develop from two fertilized eggs and, like nontwin siblings, share on average half of their segregating genes. Both MZ and DZ twins share prenatal and early environments. Thus, twins provide invaluable opportunities for controlling for genetic and early environmental confounding factors when exploring the causes of diseases. Twin registries are considered to be an effective way to recruit twins for research. During the past decades, at least 71 twin registries in 28 countries have been established and, globally, over 1.5 million twins and their family members participate in twin studies 2. The birth rate of twins in China was approximately 0.78% in 1989 3, and increased to 1.88% during the period 2007–2014 4. The rising rate of twin births over recent years may be due to more advanced maternal age and increased use of assisted reproductive technology, according to three studies conducted in Southwest China, Zhengzhou (in central China) and Qingdao (in eastern China) 5-7. China has the largest population of any country worldwide. According to the Chinese National Bureau of Statistics, 17.23 million babies were born in 2017. Therefore, it is estimated that there are at least 130 000 twin or multiple births each year in China, providing an enormous potential resource for research.
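The estimate of at least 130 000 twin or multiple births per year can be checked with simple arithmetic. The sketch below multiplies the 2017 birth count by the two twin birth rates quoted in this section. It assumes, for illustration only, that each rate can be applied directly to the number of babies born, which the text does not state explicitly; the function name is hypothetical.

```python
def estimated_twin_births(total_births: int, twin_rate: float) -> int:
    """Rough count obtained by applying a twin birth rate to total births."""
    return round(total_births * twin_rate)


BIRTHS_2017 = 17_230_000   # babies born in 2017 (figure quoted in the text)
RATE_1989 = 0.0078         # twin birth rate reported for 1989
RATE_2007_2014 = 0.0188    # twin birth rate reported for 2007-2014

# Assumption: the rates apply directly to the 2017 birth count.
lower = estimated_twin_births(BIRTHS_2017, RATE_1989)
upper = estimated_twin_births(BIRTHS_2017, RATE_2007_2014)

print(f"Using the 1989 rate: about {lower:,}")
print(f"Using the 2007-2014 rate: about {upper:,}")
```

Even the older, lower rate gives a figure above 130 000, which is consistent with the ‘at least’ wording of the estimate.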

History of the Chinese National Twin Registry
The Chinese National Twin Registry (CNTR) was established in 2001, after a pilot study initiated in 1999. Financial support for the CNTR was first received from an independent American foundation, the Rockefeller Foundation's China Medical Board, from 2001 to 2005. In 2002, the CNTR was first reported in the journal Twin Research 8. By the end of 2005, the CNTR had recruited twins from four locations (Beijing, Shanghai, Qingdao and Lishui), chosen to cover the north and south, and urban and rural areas, of China (Figure 1a). In 2006, the status of the CNTR was reported in a special review issue of Twin Research and Human Genetics for twin registries 9. In 2010, the CNTR obtained a grant from the Chinese government (the special fund for health scientific research in public welfare, 201002007), which expanded the twin registry from four areas to nine provinces or cities (Figure 1b). In 2013, the status of the CNTR was updated 10. From 2015 to 2018, the CNTR continued to receive funding from the Chinese government (201502006) and had recruited 61 566 twin pairs (including multiple sets) in 11 provinces or cities by 15 February 2019 (Figure 1c). The CNTR is the first population‐based twin registry in China and is now the largest twin registry in China and in Asia.

Figure 1. Distribution of the Chinese National Twin Registry in 2005 (a), 2013 (b) and 2019 (c).

Aims of the CNTR
Unlike many other twin registries focusing on behavioural genetics, the CNTR initially aimed to study the genetic and environmental contributions to complex diseases, with particular emphasis on cardiovascular diseases, via the establishment of a population‐based twin registry in several selected areas to represent northern and southern, as well as urban and rural, areas in China. During the past 20 years of growth, however, the CNTR has not only provided important insights into cardiovascular diseases but has also served as a valuable resource within a broad range of study areas. The protocol of the CNTR was reviewed and approved by the Ethics Committee for Human Subject Studies of the Peking University Health Science Center in 2001 (no approval number), 2011 (IRB00001052‐11029), 2013 (IRB00001052‐13022) and 2014 (IRB00001052‐14021). Over the past 4 years, the CNTR has been reviewed annually by this ethics committee.

Twin recruitment
The CNTR is administered by the Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, China. The CNTR does not currently have access to an effective data collection system, such as that used by Nordic countries including Sweden to obtain twin data. Instead, the CNTR collects twin data through face‐to‐face interviews; this is time‐consuming, but it yields accurate and detailed data. Twins are recruited mainly through the Centers for Disease Control and Prevention (CDC) system at all levels (province, city and county) in China. As early as the 1930s, China built a three‐level healthcare service system in rural areas; that is, there were preventive health workers at the village/community, town and county levels. The CDC system includes health workers at the village/community level in China, who are usually familiar with the distribution of twins in their villages or communities. In addition to the CDC system, the CNTR also recruits twins through the ‘Hukou system’ in China, which is administered by the public security bureau. Several CDCs exchange birth data with the public security bureau. The CDCs match the birthdays of members of the same family to find ‘suspected’ twins; authentic twins are identified after verification by health workers. Advertisements in newspapers and online media are also used to recruit twins. Figure 2 shows the current logo of the CNTR. The two dancing figures resemble the Chinese character ‘双’, which is pronounced ‘shuang’ and means ‘double’. The green colour of the logo implies health and cheerfulness. Building a traditional Chinese character into the logo image is uncommon amongst twin registries in other countries.

Figure 2. Logo of the Chinese National Twin Registry.
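The birthday‐matching step described above can be illustrated with a small sketch. It groups people by household and date of birth and flags any group of two or more as ‘suspected’ twins or multiples for later verification. The record layout, identifiers and function name are hypothetical and do not describe the CDCs' actual data systems.

```python
from collections import defaultdict


def find_suspected_twins(records):
    """Group people sharing a household and a birth date.

    records: iterable of (person_id, household_id, birth_date) tuples.
    Returns a list of person_id lists, one per suspected twin/multiple set.
    """
    groups = defaultdict(list)
    for person_id, household_id, birth_date in records:
        groups[(household_id, birth_date)].append(person_id)
    # Only groups with at least two members are candidates; these still
    # need verification by a health worker, as the text notes.
    return [sorted(ids) for ids in groups.values() if len(ids) >= 2]


# Hypothetical example records: (person_id, household_id, birth_date)
example_records = [
    ("p1", "h1", "2010-05-01"),
    ("p2", "h1", "2010-05-01"),
    ("p3", "h1", "2012-03-09"),
    ("p4", "h2", "2010-05-01"),
    ("p5", "h3", "1998-11-20"),
    ("p6", "h3", "1998-11-20"),
    ("p7", "h3", "1998-11-20"),
]

suspected = find_suspected_twins(example_records)
print(suspected)
```

Matching on household and birth date alone cannot distinguish twins from data entry errors or unrelated household members born on the same day, which is why the verification step matters.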

Questionnaire
The CNTR mainly uses two types of questionnaire: a simple questionnaire for twins under 18 years of age and a more detailed questionnaire for adults (18 years and above). The simple questionnaire collects demographic information, parents’ contact details, self‐reported height and weight, medical history and zygosity information. Twins <18 years are also asked about their birthweight, whether they were breastfed, and whether they were conceived with the aid of assisted reproductive technology. Adult twins ≥18 years are asked about their contact details, socio‐economic status, birthweight, height, weight, waist circumference, medical history (their own and their family's), zygosity, whether they were reared apart, and their lifestyles, including smoking, alcohol consumption, physical activity and diet. Because tea drinking is a traditional part of Chinese culture, the questionnaire for twins ≥18 years also collects information about the types, quantities and frequency of tea drinking.

Zygosity determination
There are many different methods for determination of twin zygosity. Genotype data are the gold standard for determining zygosity, but genotyping is costly and time‐consuming in large‐scale research studies. The most popular method is the ‘Peas in the Pod Questionnaire (PPQ)’, which asks about the degree of similarity shared by twins when they were of school age 11. The accuracy of the PPQ in Western countries is usually as high as 95% 11, 12. In China, however, according to our own comparison studies, the accuracy of this method is 86%–90% 13, 14. We use the PPQ for twin zygosity assessment when genotyping information is not available. For twins for whom blood samples are available, we use several methods, including blood group typing, analysis of four or nine short tandem repeat (STR) genetic markers, single‐nucleotide polymorphisms (SNPs) from the Illumina HumanOmniZhongHua‐8 BeadChip, and SNP information from the Illumina Infinium HumanMethylation450 BeadChip and the MethylationEPIC ‘850K’ BeadChip array.

Distribution of twins
Table 1 shows demographic information collected for twins in the CNTR. Because the initial aim of the CNTR was cardiovascular disease research, the registry preferred twins ≥18 years, but younger twins proved easier to recruit. Thus, Table 1 does not represent the true distribution of local twins by zygosity, gender and age group. Almost half of the registered twins are <20 years of age. Same‐sex DZ twins are slightly more common than DZ twins of opposite sex. This is not consistent with Weinberg's differential rule, which states that DZ twins are equally likely to be of the same or opposite sex 15. Possible explanations for this discrepancy are that the CNTR is not a birth registry or that real‐world data may deviate from the assumed independence of sexes within DZ twin pairs 16.

Table 1. Distribution of twins by zygosity, gender and age

          Twin pairs                               Triplet/quadruplet sets
Age       MZM      MZF      DZM    DZF    DZO      MZ   Same‐sex DZ   DZO   Others   Total
0–4       2336     2283     1496   1260   2763     21   11            5     312      10 487
5–9       1764     1722     912    762    1677     26   19            1     243      7126
10–14     1363     1272     703    657    1201     17   8             0     161      5382
15–19     1423     1502     648    523    1168     6    3             5     119      5397
20–29     3662     3766     1856   1144   3055     10   9             14    143      13 659
30–39     2582     1723     1458   945    1796     1    6             3     122      8636
40–49     2368     1220     1073   471    1122     2    0             3     81       6340
50–59     1196     504      528    214    516      0    1             0     45       3004
≥60       705      228      272    78     198      0    0             0     41       1522
Missing   1        2        2      1      4        0    0             0     3        13
Total     17 400   14 222   8948   6055   13 500   83   57            31    1270     61 566
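Weinberg's differential rule can be made concrete with the twin‐pair totals in Table 1. Under the rule, the number of same‐sex DZ pairs is expected to equal the number of opposite‐sex pairs, so the DZ total is estimated as twice the opposite‐sex count and the MZ total as the remaining same‐sex pairs. The following is a minimal sketch with an illustrative function name; it uses only the twin‐pair columns and ignores triplet/quadruplet sets and ‘Others’.

```python
def weinberg_estimates(same_sex_pairs: int, opposite_sex_pairs: int) -> tuple[int, int]:
    """Return (estimated MZ pairs, estimated DZ pairs) under Weinberg's rule."""
    est_dz = 2 * opposite_sex_pairs            # same-sex DZ assumed equal to opposite-sex DZ
    est_mz = same_sex_pairs - opposite_sex_pairs
    return est_mz, est_dz


# Twin-pair totals from Table 1
mzm, mzf, dzm, dzf, dzo = 17_400, 14_222, 8_948, 6_055, 13_500

same_sex = mzm + mzf + dzm + dzf
est_mz, est_dz = weinberg_estimates(same_sex, dzo)

observed_same_sex_dz = dzm + dzf
expected_same_sex_dz = dzo

print(f"Observed same-sex DZ pairs: {observed_same_sex_dz}")
print(f"Expected under Weinberg's rule: {expected_same_sex_dz}")
print(f"Weinberg estimates: MZ = {est_mz}, DZ = {est_dz}")
```

With these counts, the registry contains more same‐sex DZ pairs than the rule would predict from the opposite‐sex count, which is the discrepancy discussed in the text.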

Discordant and concordant twin pairs
Disease‐discordant twins, particularly MZ twins, are excellent subjects for matched case–control studies in which confounding effects of age, sex, genetic background, and intrauterine and early environments are perfectly controlled. In a computer simulation study, Tan et al. found that disease‐concordant twins can increase the power of genetic association studies. Compared with cases recruited from unrelated patients, the sample sizes generally needed for disease‐concordant twin cases are only half (DZ) and a quarter (MZ) of those in ordinary case–control designs 17. Thus, both disease‐discordant and disease‐concordant twin pairs provide invaluable data. In the CNTR, we focused on obesity, hypertension, diabetes, asthma and genetic diseases in twins <18 years of age, and on obesity, hypertension, hyperlipidaemia, diabetes, coronary heart disease, stroke, chronic bronchitis/emphysema and cancers in twins ≥18 years. All diseases are self‐reported at baseline and follow‐up investigations; however, obesity is calculated based on self‐reported height and weight. The distribution of disease‐discordant and disease‐concordant twin pairs is shown in Table 2; obesity, hypertension and diabetes are the three most common conditions. Lifestyle factors including smoking, alcohol consumption, fruit and vegetable consumption, and physical activity are shown for discordant and concordant twin pairs in Table 3. These twin data provide opportunities for prospective cohort studies.

Table 2. Disease‐discordant and disease‐concordant twin pairs in the Chinese National Twin Registry
Values are numbers of twin pairs (%). Y/Y, both twins affected; Y/N, one twin affected; N/N, neither twin affected.

Disease                       Y/Y MZ          Y/Y DZ          Y/N MZ          Y/N DZ          N/N MZ          N/N DZ          Total
Obesity                       653 (1.2)       297 (0.5)       748 (1.3)       1202 (2.2)      27 777 (49.9)   24 938 (44.8)   55 615
Hypertension                  487 (0.8)       210 (0.4)       610 (1.0)       584 (1.0)       29 580 (50.7)   26 824 (46.0)   58 295
Diabetes                      175 (0.3)       60 (0.1)        256 (0.4)       290 (0.5)       30 910 (51.8)   27 960 (46.9)   59 651
Hyperlipidaemia               134 (0.3)       64 (0.2)        299 (0.7)       266 (0.7)       21 839 (53.5)   18 214 (44.6)   40 816
Coronary heart disease        75 (0.2)        24 (0.1)        185 (0.5)       148 (0.4)       21 372 (54.1)   17 710 (44.8)   39 514
Chronic bronchitis/emphysema  68 (0.2)        33 (0.1)        160 (0.4)       149 (0.4)       22 050 (54.0)   18 368 (45.0)   40 828
Cancer                        45 (0.1)        22 (0.1)        149 (0.4)       126 (0.3)       22 090 (54.1)   18 407 (45.1)   40 839
Genetic disease               94 (0.2)        44 (0.1)        66 (0.1)        53 (0.1)        28 186 (51.7)   26 097 (47.8)   54 540
Stroke                        38 (0.1)        20 (0.0)        109 (0.3)       75 (0.2)        22 121 (54.2)   18 452 (45.2)   40 815
Asthma                        42 (0.2)        23 (0.1)        50 (0.2)        94 (0.4)        11 309 (48.2)   11 940 (50.9)   23 458

Table 3. Lifestyle‐discordant and concordant twin pairs in the Chinese National Twin Registry
Values are numbers of twin pairs (%). Y/Y, both twins exposed; Y/N, one twin exposed; N/N, neither twin exposed.

Exposure                              Y/Y MZ          Y/Y DZ          Y/N MZ          Y/N DZ          N/N MZ          N/N DZ          Total
Smoking (a)                           3267 (10.1)     1430 (4.4)      1751 (5.4)      3432 (10.6)     12 519 (38.7)   9917 (30.7)     32 316
Alcohol consumption                   2712 (8.4)      1367 (4.2)      1471 (4.6)      2671 (8.3)      13 349 (41.3)   10 743 (33.2)   32 313
Fruit and vegetable consumption (b)   5249 (18.8)     3913 (14.0)     623 (2.2)       711 (2.5)       9295 (33.3)     8111 (29.1)     27 902
Physical activity (c)                 5740 (19.4)     4278 (14.4)     1906 (6.4)      2086 (7.0)      8487 (28.6)     7154 (24.1)     29 651
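One common way to summarize counts such as those in Table 2 is the probandwise concordance rate, 2C / (2C + D), where C is the number of concordant affected pairs (Y/Y) and D the number of discordant pairs (Y/N). The sketch below applies this formula to the obesity row as an illustration only. The registry report does not present these rates, and the figures rest on self‐reported data without any adjustment for age or ascertainment; the function name is hypothetical.

```python
def probandwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Probandwise concordance: 2C / (2C + D)."""
    affected_in_concordant = 2 * concordant_pairs   # both twins affected in each Y/Y pair
    return affected_in_concordant / (affected_in_concordant + discordant_pairs)


# Obesity row of Table 2: Y/Y and Y/N pair counts by zygosity
obesity_mz = probandwise_concordance(concordant_pairs=653, discordant_pairs=748)
obesity_dz = probandwise_concordance(concordant_pairs=297, discordant_pairs=1202)

print(f"Obesity, MZ: {obesity_mz:.3f}")
print(f"Obesity, DZ: {obesity_dz:.3f}")
```

A higher concordance in MZ than in DZ pairs is the pattern expected when genetic factors contribute to a trait, although formal inference requires the model‐based analyses cited elsewhere in this review.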

Blood sampling
At the initiation of the CNTR, we collected fasting blood samples from 1008 twin pairs, regardless of disease status, in 2001–2002 and followed up 579 twin pairs in 2004–2005. We performed biochemical tests and collected DNA from these blood samples. The zygosity of these twins was determined by gender, blood group typing and STR marker analysis. From 2010, we mainly collected blood samples from targeted twin pairs in which at least one of the twins suffered from the above‐mentioned chronic diseases. In addition, twins reared apart provide an opportunity for research into the influence of the early family environment 18. Fasting blood samples from such twins were also collected. Table 4 shows the zygosity of targeted twin pairs with available blood samples. Levels of total cholesterol, triglycerides, low‐density lipoprotein cholesterol, high‐density lipoprotein cholesterol, apolipoprotein (Apo) AI, ApoB, glucose, glycosylated haemoglobin, insulin, high‐sensitivity C‐reactive protein and creatinine were determined in these serum samples. Physical examinations, including height, weight, waist and hip circumference, blood pressure, pulse rate and body fat percentage, were also conducted in these twins.

Table 4. Disease distribution (twin pairs with blood sample available)

                                MZ    DZ    Zygosity unknown   Total
Disease‐discordant
  Obesity                       123   168   1                  292
  Hypertension                  189   140   0                  329
  Diabetes                      100   65    0                  165
  Hyperlipidaemia               106   51    1                  158
  Coronary heart disease        53    24    1                  78
  Chronic bronchitis/emphysema  51    29    0                  80
  Cancer                        30    21    0                  51
  Stroke                        29    14    0                  43
Disease‐concordant
  Obesity                       88    40    0                  128
  Hypertension                  212   67    1                  280
  Diabetes                      73    22    0                  95
  Hyperlipidaemia               20    16    1                  37
  Coronary heart disease        14    3     0                  17
  Chronic bronchitis/emphysema  12    3     0                  15
  Cancer                        0     1     0                  1
  Stroke                        3     1     0                  4
Twins reared apart              88    52    1                  141

Scientific research
Based on the twin registry, we conducted two types of scientific research projects. The first involved descriptive analysis of twin births and of twins reared apart, and validation of zygosity determination methods 13, 14, 19. Our main findings were that the birth rate of twins in particular areas increased from 1987 to 2002 7, that raising twins separately would become less common in the future 18, and that the accuracy of the PPQ in China was slightly lower than in Western countries 13, 14. We also recognized the need to find similarity features better suited to Eastern populations than hair colour, hair type or eye colour, which are typically used in Western countries 13. The second set of research projects involved classical twin studies combined with complex molecular genetic analyses 20. The classical twin studies compared the phenotypic resemblance of MZ and DZ twins and estimated the heritability of phenotypes. So far, the heritability of 29 phenotypes, mainly risk factors for chronic diseases, has been calculated for the CNTR participants (Table 5).

Table 5. Heritability of 29 phenotypes based on subjects in the Chinese National Twin Registry

Phenotype                             Heritability                                                      References
BMI (≥18 years)                       0.88                                                              31–35
BMI (<18 years)                       0.07–0.30 for girls, 0.15–0.58 for boys in different age groups   36
Height (<18 years)                    0.07–0.32 for girls, 0.13–0.72 for boys in different age groups   36
Weight (<18 years)                    0.13–0.35 for girls, 0.28–0.63 for boys in different age groups   36
Waist circumference                   0.75                                                              32, 34
Waist–hip ratio                       0.61                                                              32
Waist–height ratio                    0.48 (0.45, 0.51)                                                 34
Glucose                               0.47                                                              31
Total cholesterol                     0.34                                                              31
Triglycerides                         0.17                                                              31
HDL cholesterol                       0.26                                                              31
Systolic blood pressure               0.78                                                              31, 33
Diastolic blood pressure              0.67                                                              31, 33
Homoeostasis model assessment         0.35 for men, 0.46 for women                                      37
Diabetes                              0.41 (0.15, 0.75)
Apolipoprotein AI                     0.60                                                              38
Apolipoprotein B100                   0.69                                                              38
Smoking behaviour                     0.69 (0.65, 0.73)                                                 39, 40
Age at starting smoking               0.00                                                              39, 40
Smoking cessation                     0.27 (0.19, 0.37)                                                 40
Physical activity                     0.78 (0.35, 0.96), 0.59 (0.00, 0.94)                              41
Sedentary behaviour                   0.68 (0.59, 0.75), 0.32 (0.07, 0.62)                              41
Personality disorder                  0.68 (0.60, 0.75)                                                 42
Social support (objective support)    0.00                                                              43
Social support (subjective support)   0.30 (0.00, 0.64)                                                 43
Social support (utilizing support)    0.28 (0.00, 0.53)                                                 43
Left‐handedness                       0.46                                                              44
Hand‐clasping                         0.14                                                              44
Arm‐folding                           0.06                                                              44

Multivariate designs, in which more than one trait per person is analysed, can be used to investigate the causes of association and comorbidity between phenotypes 20. The CNTR has conducted several multivariate analyses (Table 6), which focused on smoking and alcohol consumption 21, body composition measurements and serum lipid, glucose and insulin profiles 22, obesity indicators and blood pressure 23, and education level and marital status with obesity 24. Gene–environment interaction studies demonstrated that alcohol consumption and physical activity could attenuate the genetic effects on BMI 25, 26.

Table 6. Genetic correlation amongst phenotypes based on the Chinese National Twin Registry
BMI, body mass index; HOMA‐IR, homoeostasis model assessment of insulin resistance; WC, waist circumference; TG, triglycerides; GLU, glucose; LDL‐C, low‐density lipoprotein cholesterol; PBF, percentage body fat; TC, total cholesterol; SBP, systolic blood pressure; DBP, diastolic blood pressure; WHtR, waist–height ratio.

Phenotypes                        Genetic correlation (95% CI)   Reference
Current tobacco use–alcohol use   0.32 (0.17, 0.46)              21
BMI–insulin                       0.685 (0.472, 0.999)           22
BMI–HOMA‐IR                       0.682 (0.461, 0.999)           22
WC–TG                             0.552 (0.187, 0.999)           22
WC–GLU                            0.530 (0.163, 0.999)           22
WC–LDL‐C                          0.303 (0.061, 0.600)           22
PBF–TG                            0.795 (0.427, 0.999)           22
PBF–TC                            0.442 (0.032, 0.801)           22
PBF–LDL‐C                         0.414 (0.042, 0.725)           22
SBP–DBP                           0.764 (0.578, 0.827)           23
BMI–SBP                           0.310 (0.071, 0.640)           23
WC–SBP                            0.309 (0.046, 0.656)           23
BMI–DBP                           0.369 (0.145, 0.678)           23
WC–DBP                            0.374 (0.128, 0.722)           23
WHtR–DBP                          0.392 (0.079, 0.780)           23
WHtR–DBP                          0.457 (0.031, 1.000)           23
BMI–educational level (male)      0.015 (0.006, 0.952)           24
BMI–educational level (female)    −0.369 (−0.585, −0.148)        24

In the current omics era, twin studies are considered to have continuing value: ‘The comparison of molecular profiles of phenotypically discordant monozygotic twin pairs is a powerful method to identify molecular characteristics associated with complex traits, including differentially methylated genes and metabolic profiles’ 27. The CNTR has conducted several studies to explore differentially methylated sites that correlated with childhood obesity 28, childhood blood lipids, adult obesity 29 and adult hypertension 30. Table 7 shows significant CpG markers correlated with the target phenotypes. Overall, 118 MZ and 97 DZ twin pairs were tested using the Illumina 450K methylation array, and 87 MZ and 51 DZ twin pairs using the Illumina 850K methylation array in the CNTR. The methylation study was also combined with a metabolomics study in obesity‐discordant twin pairs (data not shown).

Table 7. The main findings from the DNA methylation study

Phenotypes                                   Suspected correlated DNA methylation markers   Reference
BMI (7–16 years)                             cg05684382, cg26188191                         28
BMI (adults)                                 cg15053022 in ATP4A gene                       29
SBP, DBP, mean arterial pressure (adults)    cg07761116                                     30
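The logic of the classical twin design, which compares the resemblance of MZ and DZ pairs, can be illustrated with Falconer's formulas. This is a simplified sketch rather than the method used in the cited CNTR studies, which may rely on structural equation modelling instead. The function name and the example correlations are hypothetical and are not CNTR results.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict[str, float]:
    """Approximate variance components from twin correlations.

    A: additive genetic (heritability), C: shared environment,
    E: non-shared environment (including measurement error).
    """
    return {
        "A": 2 * (r_mz - r_dz),   # MZ pairs share twice the segregating genes of DZ pairs
        "C": 2 * r_dz - r_mz,     # resemblance not explained by genes
        "E": 1 - r_mz,            # whatever makes MZ twins differ
    }


# Hypothetical intrapair correlations for some trait
components = falconer_ace(r_mz=0.80, r_dz=0.45)
for name, value in components.items():
    print(f"{name} = {value:.2f}")
```

The three components sum to 1 by construction. Estimates outside the range 0 to 1 signal that the simple additive model does not fit, which is one reason model‐based approaches are generally preferred.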

Future of the CNTR
Despite the progress made in twin studies, there remains considerable scope for future CNTR research and data mining. In the next stage, the CNTR will continue to follow the twins who have been recruited and to design matched case–control and cohort studies in disease/lifestyle‐discordant twin pairs. Most importantly, the CNTR is planning to establish a twin health information platform (contact for further information: ChineseNTR@126.com) (Figure 3). First, researchers are encouraged to submit proposals to apply for access to the CNTR data of interest for the purpose of collaboration. Next, a committee of experts at the CNTR will review the proposal. Non‐Chinese researchers who receive approval need to obtain further permission from the China Human Genetic Resources Administration Office before gaining access to the data. At the end of the project, principal investigators should submit a research report and, if applicable, share their data, which will eventually contribute to the development of the twin health information platform. Such a virtuous circle will attract more and more researchers, and the database will grow. Based on the platform, the CNTR welcomes worldwide collaboration. The CNTR has developed international cooperation with researchers from the Swedish Twin Registry, Harvard University and the University of Helsinki, including jointly applying for international cooperative projects and joining consortia such as CODATwins. International collaborators are encouraged to compare or interpret the results of data analysis without access to the raw data.

Figure 3. Twin health information platform.

Acknowledgements
The CNTR is supported by the special fund for health scientific research in public welfare, China (201002007, 201502006), Key Project of Chinese Ministry of Education (310006), National Natural Science Foundation of China (81573223, 81473041, 81202264, 81711530051) and China Medical Board (01‐746). We gratefully acknowledge support from the Centers for Disease Control and Prevention in Qingdao, Dezhou, Zhejiang, Jiangsu, Sichuan, Beijing, Shanghai, Tianjin, Qinghai, Heilongjiang agricultural area, Handan and Yunnan, and from the School of Public Health, Harbin Medical University. We appreciate the collaboration with the Swedish Twin Registry. We thank Dr Xia Li for her English editing and Mr Songjian Chen for his contribution to data checking of this review.

Conflict of interest
All authors declare that they have no conflicts of interest.