In this study, we investigated the association between CNVs and longevity in Han Chinese by genotyping 4007 individuals from the Chinese Longitudinal Healthy Longevity Survey (CLHLS) database. We identified a small number of CNVs, most of which were new. Some of them encode known genes and pathways whose association with longevity is reported only in this study. The deletions of the cancer-causing and aging-related genes encoded in these CNVRs may extend healthy lifespan in long-lived individuals. Future research on these CNVRs will shed light on the regulatory mechanisms controlling human longevity.

A few studies have investigated the association of CNVs with human lifespan using genome-wide approaches [6–10]. Kuningas [7] first reported this association in 11442 human samples representing two cohorts with ages ranging from 62.0 to 75.3 and 34.0 to 69.8. They found large deletions in 11p15.5 (p = 2.8×10⁻⁶, hazard ratio (HR) = 1.59) and 14q21.3 (p = 1.5×10⁻³, HR = 1.57) among the oldest people. Another study uncovered a deletion in the CNTNAP4 gene in a female group of 80 years of age (odds ratio (OR) = 0.41, p = 0.007), but not in the male group (OR = 0.97, p = 1) [9]. Recently, a study in Caucasians (n = 388; cases: 81–90 year-olds; controls: 65–75 year-olds) revealed an insertion allele of the CNTNAP2 gene in the esv11910 CNV in males (OR = 0.29, 95% CI: 0.14–0.59, p = 4×10⁻⁴), but not in females (OR = 0.82, 95% CI: 0.42–1.57, p = 0.625) [10].
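The odds ratios and 95% confidence intervals quoted above follow the usual 2×2 case-control layout. As a minimal sketch, assuming a Wald (Woolf) interval on the log odds ratio, the calculation can be written as below. The function name and the carrier counts are hypothetical and are not taken from any of the cited studies, which may have used a different interval method.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald (Woolf) confidence interval for a 2x2 table.

    a: carriers among cases,     b: non-carriers among cases
    c: carriers among controls,  d: non-carriers among controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not from the cited studies).
or_value, ci_low, ci_high = odds_ratio_ci(10, 90, 20, 80)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

An OR below 1 with an interval that excludes 1 indicates that the variant is less frequent in cases than in controls; an interval that spans 1, as in this toy table, does not support an association.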

Human lifespan has long been regarded as a complex trait with a genetic contribution of approximately 25% [1]. To date, only very few genes have been shown to be consistently associated with it [2–5]. Recent studies reported that copy number variation (CNV) may directly contribute to human lifespan [6–10]. CNV is a general term for chromosomal rearrangements such as deletions and duplications [11]. CNVs can change gene structures, thus affecting gene expression and phenotypes. In humans, CNVs have been implicated in numerous diseases, such as autism and diabetes [12, 13]. CNVs also contribute significantly to the genome instability of cancer cells [14–16]. For example, Pelttari [14] recently reported a unique duplication encompassing most of the RAD51 homolog C gene in breast and ovarian cancers. Habibi [15] showed that copy number changes in the gene loci of UDP Glucuronosyltransferase Family 2 members B28 and B17 were associated with prostate cancer. In addition, Zhou and colleagues linked 93 CNVs to hepatocellular carcinoma [16].

Three of our other CNVRs were comparable with CNVRs identified in the U.S. or Danish study (Table 4). The deletion in 20q13.33 (Chr20: 60872280–60909518) partially overlapped with the Danish CNVR (Chr20: 60872280–60909518). Its population frequencies were 1.07%, 0.33%, and 0 in Han Chinese, Danish, and the 1000 Genomes database, respectively. The other two CNVRs consisted of one duplication in 19p12 (Chr19: 22145147–22231026) and one deletion in 8p23 (Chr8: 1789006–1796964). They shared 9% and 12% of their regions, respectively, with a similar duplication and deletion in the ‘wellderly’. The population frequencies in Han Chinese, Admixed American, and the 1000 Genomes database were 2.36%, 0.1%, and 0 for the duplication CNVR, and 1.02%, 0.59%, and 0 for the deletion CNVR, respectively. These results also indicated that the CNV frequencies were higher in long-lived populations than in the 1000 Genomes Project, which is composed of randomly picked adults.

Among the 11 CNVRs, the duplication in 7p11.2 (Chr7: 57787663–57839005) was of particular interest. The Danish study identified a similar duplication in 7p11.2 (Chr7: 57208666–57882950) with a population frequency of 1%. Another similar duplication was also found in the U.S. ‘wellderly’ with a population frequency of 0.2% and a 19% overlap (Table 4). To further determine the distribution of this duplication in different ethnicities, we examined the genomes in the 1000 Genomes Project, which contains randomly picked adults older than 18 years from healthy populations of multiple ethnicities. A similar duplication (Chr7: 57729553–57807369) was found in the phase 3 database with low or zero frequencies across all populations: 0.1% in European, 0.8% in African, and none in East Asian, South Asian, and Admixed American populations. In contrast, the frequency of this duplication was higher in long-lived individuals: 15.89%, 1%, and 0.2% in the Han Chinese, Danish, and American populations, respectively. Thus, these studies support an association between the duplication in 7p11.2 and longevity.
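The overlap between two CNVRs can be checked directly from their coordinates. The following is a minimal sketch that uses the 7p11.2 coordinates quoted in the text; it assumes that intervals are compared as simple start and end positions on the same chromosome and genome build, and that overlap is expressed as a fraction of the query region. The source does not state how the overlap percentages in Table 4 were defined, so the function names and this definition are assumptions.

```python
def overlap_bp(region_a, region_b):
    """Number of base pairs shared by two (start, end) intervals."""
    start = max(region_a[0], region_b[0])
    end = min(region_a[1], region_b[1])
    return max(0, end - start)


def overlap_fraction(query, other):
    """Shared length as a fraction of the query interval's length."""
    return overlap_bp(query, other) / (query[1] - query[0])


# Chr7 coordinates as quoted in the text.
han_chinese = (57787663, 57839005)
danish = (57208666, 57882950)
thousand_genomes = (57729553, 57807369)

print(overlap_fraction(han_chinese, danish))            # contained in the Danish region
print(overlap_fraction(han_chinese, thousand_genomes))  # partial overlap
```

Under this definition the Han Chinese duplication lies entirely within the Danish region, while about 38% of it is shared with the 1000 Genomes duplication. A reciprocal-overlap definition (the minimum of the two fractions) would give smaller values.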

To determine whether the CNVRs we identified in Han Chinese were also present in other ethnicities, we compared our data with a Danish and an American study. The Danish study investigated 603 nonagenarians or long-lived individuals aged 90.0–102.5 years (96.9 on average) [6], while the U.S. ‘wellderly’ healthy aging cohort included 1,354 individuals aged 80 to 105 years (84.2 on average) [26].

To determine whether SNPs could alter gene expression in CNVRs, we investigated our 11 CNVRs using the Haploreg database [25]. Four of them were found to contain multiple SNPs defining ≥10 eQTLs in the tissue database that might affect gene function (Supplementary Table 4). These CNVRs are the deletions in 21q22.13, 20q13.33, and 11p15.4, and a duplication in 19p12. The eQTLs in the 21q22.13 deletion region could alter the expression of the PIGP and TTC3 genes, primarily in brain tissue. According to the annotation database Genecards (http://www.genecards.org/), PIGP resides in the protein complex that catalyzes the transfer of N-acetylglucosamine from UDP-GlcNAc to phosphatidylinositol, the first step of glycosylphosphatidylinositol (GPI) biosynthesis. TTC3 is a ubiquitin-protein ligase that mediates the ubiquitination and subsequent degradation of phosphorylated AKT proteins. Several SNPs in the 20q13.33 deletion might affect LAMA5 and CABLES2 gene expression. LAMA5 has been shown previously to affect cellular adhesion, migration, and organization. Two SNPs in the 11p15.4 deletion region, rs1661052 and rs450244, showed strong cis-eQTL characteristics and might affect SLC22A18 expression and its function in transporting organic cations. Finally, the SNPs located in the 19p12 duplication region may affect the expression of ZNF gene family members such as ZNF208 and ZNF257.

To determine whether CNVRs differed between the nonagenarians (90–99) and centenarians (>100), we conducted the CNVR analyses in these two groups separately. We detected four new CNVRs that were unique to the nonagenarians, and six new CNVRs found only in the centenarians (Supplementary Table 2).

To further understand whether these CNVRs affect specific pathways regulating the aging process, the Database for Annotation, Visualization and Integrated Discovery (DAVID), Gene Ontology (GO), and the Functional Enrichment analysis tool (FunRich) [22–24] were used for pathway enrichment analyses (Table 3). Six pathways were found to be enriched using FunRich, including the FOXA1 and FOXA transcription factor networks, which were the most directly related to the regulation of longevity.

Some of these genes have previously been linked to longevity. For example, LAMA5 in 20q13.33 belongs to the laminin alpha family, and age-stratified analyses showed that the rs4925386-T allele of this gene was positively associated with longevity (p = 0.001) [18]. The CNVR in 9q34.1 encodes the genes SH2D3C and TOR2A. SH2D3C is a signaling adapter protein required for T cell activation, and a decrease in SH2D3C might cause B cell dysfunction and impaired immune function, which could affect lifespan [19]. In addition, the CNVR in 14q21.1 encodes the gene FOXA1; upregulation of FOXA1 has been shown to decrease diet-restriction-induced longevity in C. elegans [20]. Additionally, NR2F2, a gene about 14 kb away from 15q26.2 (p = 8.43×10⁻⁵, OR = 4.57), showed higher expression in older samples and was related to vascular development during oxidative stress-induced cellular senescence; vascular-related disease could be used as a marker of aging [21].

To determine whether our identified CNVRs can affect biological function, we analyzed these loci for encoded genes. We found several genes within or near the 11 CNVRs, including Zinc Finger Protein 716 (ZNF716), Zinc Finger Protein 208 (ZNF208), Zinc Finger Protein 257 (ZNF257), Zic Family Member 5 (ZIC5), Zic Family Member 2 (ZIC2), Pleckstrin Homology Like Domain Family A Member 2 (PHLDA2), Solute Carrier Family 22 Member 18 (SLC22A18), Family With Sequence Similarity 53 Member A (FAM53A), Peptidyl-TRNA Hydrolase 1 Homolog (PTRH1), SH2 Domain Containing 3C (SH2D3C), Torsin Family 2 Member A (TOR2A), Tetratricopeptide Repeat Domain 16 (TTC16), Laminin Subunit Alpha 5 (LAMA5), Rho Guanine Nucleotide Exchange Factor 10 (ARHGEF10), Phosphatidylinositol Glycan Anchor Biosynthesis Class P (PIGP), Tetratricopeptide Repeat Domain 3 (TTC3), Tetratricopeptide Repeat Domain 6 (TTC6), Forkhead Box A1 (FOXA1), and Nuclear Receptor Subfamily 2 Group F Member 2 (NR2F2) (Table 2).

Figure 2. The distribution of CNVs among 4007 individuals. The first circle indicates the positions of chromosomal bands; the second is a histogram representing the frequencies of CNVs in long-lived individuals (red: deletions; blue: duplications; height: CNV frequencies); the third is also a histogram, showing the CNV frequencies in the long-lived (orange) and middle-aged (green) individuals. Frequency: orange and green, >1%; grey, <1%; red, the CNVRs validated by qPCR. The heatmap shows the p-values of CNVRs: orange for the long-lived, green for the middle-aged, with darker shades indicating smaller p-values. The text labels give the names of the CNVRs identified in this study.

We identified the CNVRs associated with longevity according to a previously published method [17]. These regions included 62 deletions and 5 duplications identified in the Northern cohorts, 21 deletions and 6 duplications in the Southern cohorts, and 99 deletions and 27 duplications in the combined samples (Figure 2). Among them, we identified eleven CNVRs (2 duplications and 9 deletions) with case frequency > 1%, p value < 3.97×10⁻⁴, and length > 10 kb in long-lived individuals from the north as well as in the combined northern and southern samples (Table 2, Supplementary Table 1). These CNVRs have previously been deposited in the Database of Genomic Variants (DGV, http://dgv.tcag.ca/dgv/app/home) through other studies unrelated to longevity (Supplementary Figure 2). To confirm the accuracy of the genotyping and CNV detection methods, the top six CNVRs (p value < 10⁻⁵) were chosen for quantitative PCR (Supplementary Figure 3), and all six were validated.
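The three selection criteria above (case frequency > 1%, p value < 3.97×10⁻⁴, length > 10 kb) amount to a simple filter over candidate regions. The sketch below illustrates this; the record layout, the region names, and all values in the candidate list are hypothetical and do not come from Table 2. As an aside, 3.97×10⁻⁴ is numerically equal to 0.05/126, which would be consistent with a Bonferroni correction over the 126 CNVRs found in the combined samples; the source does not state how the threshold was derived, so this is only an observation.

```python
from dataclasses import dataclass


@dataclass
class CNVR:
    name: str          # hypothetical label
    kind: str          # "deletion" or "duplication"
    case_freq: float   # frequency among long-lived cases (0-1)
    p_value: float     # association p value
    length_bp: int     # region length in base pairs


# Thresholds as stated in the text.
MIN_CASE_FREQ = 0.01
MAX_P_VALUE = 3.97e-4
MIN_LENGTH_BP = 10_000


def passes_filters(region: CNVR) -> bool:
    """True if the region meets all three selection criteria."""
    return (
        region.case_freq > MIN_CASE_FREQ
        and region.p_value < MAX_P_VALUE
        and region.length_bp > MIN_LENGTH_BP
    )


# Hypothetical candidates for illustration only.
candidates = [
    CNVR("region_A", "duplication", 0.150, 1e-6, 50_000),  # passes
    CNVR("region_B", "deletion", 0.005, 1e-6, 50_000),     # too rare
    CNVR("region_C", "deletion", 0.020, 1e-2, 50_000),     # not significant
    CNVR("region_D", "deletion", 0.020, 1e-5, 4_000),      # too short
]

selected = [r.name for r in candidates if passes_filters(r)]
bonferroni_threshold = 0.05 / 126  # assumption: 126 tests in the combined samples
print(selected, f"{bonferroni_threshold:.3g}")
```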

Figure 1. CNV burden in different age groups. ( a ) The numbers of CNVs in different age groups. ( b ) The added lengths of CNVs in different age groups. Triangles represent the numbers of the CNVs. The areas in each histogram represent the percentage of different lengths. * p<0.05 by t-test.

Across all participating subjects, including both cases and controls, we identified 10046 deletions and 6932 duplications in total. We found that the number of CNVs increased significantly with age (Spearman rho = 0.386, p = 0.002; Figure 1); in particular, deletions increased much more than duplications (Spearman rho = 0.356, p = 0.004). On average, the centenarians (101.51 ± 0.07 years of age) carried 4.15 ± 0.20 deletions and 2.30 ± 0.10 duplications, a significant increase (p = 0.001) compared with the middle-aged (48.22 ± 0.16 years of age), who had 3.25 ± 0.16 deletions and 2.14 ± 0.07 duplications. The Spearman correlation between age and the total added length of CNVs was also significant (Spearman rho = 0.31, p = 0.017); long-lived people had 508.54 ± 19.8 kb of total CNVs, which was significantly longer than the 453.69 ± 17.25 kb in middle-aged controls (p = 0.024). In summary, the number of CNVs, especially deletions, increased significantly in the genomes of long-lived people.
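Spearman's rho, used above to relate age to CNV burden, is the Pearson correlation of the ranks of the two variables. A minimal pure-Python sketch follows, assuming ties are handled by average ranks; the function names and the toy data are hypothetical, and the p value calculation is omitted.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data for illustration only: age group index vs. mean CNV count.
age_group = [1, 2, 3, 4, 5]
cnv_count = [2, 2, 3, 5, 4]
print(round(spearman_rho(age_group, cnv_count), 3))
```

Because only ranks enter the calculation, the statistic captures any monotonic trend between age and CNV burden without assuming linearity.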

Principal component analysis (PCA) revealed no additional sub-clusters between cases and controls (Supplementary Figure 1). Our data from the Northern cohorts revealed a significantly increased number of CNVs among long-lived individuals compared with the controls (6.94 ± 0.35 vs 5.96 ± 0.27, p = 0.027). Similar results were found in the Southern replicate cohorts (5.96 ± 0.27 vs 4.60 ± 0.17; p = 0.001). All identified CNVs are summarized in detail in Table 1.
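A group comparison reported as mean ± error can be approximated from the summary statistics alone. The sketch below assumes that the ± values are standard errors of the mean and uses a two-sided normal approximation; neither assumption is stated in the source, the function name is hypothetical, and the authors' actual test may differ (Figure 1 mentions a t-test). Under these assumptions the Northern cohort figures give a p value of roughly 0.027.

```python
import math


def two_sided_p_from_summary(mean_1, se_1, mean_2, se_2):
    """z statistic and two-sided p value for a difference of two means.

    Assumes se_1 and se_2 are standard errors of the mean and that the
    sampling distribution of the difference is approximately normal.
    """
    z = (mean_1 - mean_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Northern cohorts, values as reported in the text.
z_north, p_north = two_sided_p_from_summary(6.94, 0.35, 5.96, 0.27)
print(f"z = {z_north:.2f}, p = {p_north:.3f}")
```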

To identify the CNVs associated with longevity in the Chinese population, we recruited 1950 long-lived individuals as “cases” and 2057 middle-aged Chinese as “controls”. These individuals were further separated into four cohorts: 1000 long-lived and 1215 middle-aged individuals who lived in the north of China, and the remaining 950 long-lived and 842 middle-aged individuals who resided in the south. The two Northern cohorts were called “discovery cohorts”, and the two Southern cohorts were referred to as “replicate cohorts”.

Discussion

In a genome-wide association study, we investigated the genomes of 1950 long-lived and 2057 middle-aged Han Chinese people and identified 11 CNVRs that were associated with longevity. Four of them partially overlapped with CNVRs uncovered in the long-lived Danish or U.S. populations, while the remaining seven were first reported in this study. Our statistical analyses indicated that the four overlapping CNVRs in the 7p11.2, 20q13.33, 19p12, and 8p23.3 bands were the strongest candidates, with p values of 8.68×10⁻⁷, 4.42×10⁻⁵, 1.89×10⁻⁶, and 4.45×10⁻⁵, respectively.

It is well accepted that an increasing number or total length of deletions or duplications indicates genomic instability. Forsberg et al. have shown that genome instability increases with longevity [27]. In this study, we observed a significant increase in CNV numbers in long-lived compared with middle-aged individuals (Figure 1). The added length of total CNVs also reflected this increase. Nygaard et al. have shown that mortality increased significantly for every 10 kb increase in CNV length in long-lived individuals [6]. We hypothesize that the CNVs found in long-lived individuals are not hazardous; for example, the deletion of carcinogenesis-related regions may help to extend lifespan.

In this study, we identified 11 CNVRs associated with longevity in Han Chinese, including 2 duplications and 9 deletions. These CNVRs might affect the expression of 19 known genes (Table 2), some of which are known to regulate the cellular aging process.

For example, several studies have shown that the variant rs8105767 (ZNF208), which is located 20 kb away from our 19p12 CNV region, is linked to several diseases through shortened telomeres. A population study [28] demonstrated that mutation of this gene could cause neuroblastoma. A genome-wide meta-analysis identified seven loci affecting telomere length, one of which was rs8105767. This variant was also shown to be involved in shortened telomeres in leucocytes and an increased risk of coronary artery disease in a population of European descent [29]. Thus, these results suggest that this CNV may affect telomere length and, in turn, human longevity.

ARHGEF10 in 8p23.3 has been shown to play a key role in RhoA signaling; an animal study demonstrated that the inactivation of ARHGEF10 could inhibit platelet aggregation and protect mice from thrombus formation [30]. A genome-wide association study indicated that the variant rs7862362 A>T significantly decreased cutaneous melanoma-specific survival [31]. Interestingly, one study showed that the deletion of 8p23.3 could induce intellectual disability and delay developmental processes [32], specifically in the Chinese population. These results suggest that the deletion of ARHGEF10 may promote healthy aging by improving vascular function and suppressing cancer occurrence.

The LAMA5 gene is located in 20q13.33. An Italian GWAS has associated its rs4925386 T allele with longevity (p = 0.001) and shorter stature (p = 0.01) [18]. LAMA5 was reported to mediate cellular adhesion, migration, and organization [18]. Overexpression of LAMA5 can induce colorectal cancer through KRAS, while the genetic inactivation of LAMA5 impairs the adhesion of KRAS-mutant colorectal cancer cells [33]. It was shown in a Chinese population that alterations of the duplication CNVR in 20q13.33 could increase the risk of ovarian endometriosis and glioma [34,35]. Thus, the deletion of this region may reduce the risk of carcinogenesis and extend lifespan in these long-lived individuals.

Among the 19 CNVR-related genes, FOXA1 might have the strongest effect on longevity. The FOXA1 transcription factor network plays a key role in DNA repair and therefore in maintaining the integrity of the genome. It has been shown that inhibiting the expression of FOXA1 can extend lifespan in C. elegans [20]. Some studies suggested that FOXA1 is potentially an oncogene because overexpression or increased copy numbers of FOXA1 were found in a variety of cancers [36,37]. In our samples, 1.53% of cases had deletions in FOXA1, which may reduce the risk of cancer and lead to a healthier lifespan. It has been hypothesized that some people live longer because of reduced copy numbers in certain oncogene regions.

Besides the direct effects of CNVs on gene structures, SNPs within CNVs can also change gene expression. We showed, in three of our identified CNVRs, that eQTLs resulting from multiple SNPs could affect the expression of nearby genes. For example, the SNPs in the 21q22.13 CNVR correlated significantly with the expression of the PIGP and TTC3 genes in brain tissue. PIGP regulates glycosylphosphatidylinositol-anchor biosynthesis in animals and has also been related to Down syndrome [38]. Variants of this gene could lead to age-dependent Alzheimer’s disease [39]. As another example, the 20q13.33 deletion region also contains multiple SNPs that could affect the expression of the LAMA5 and CABLES2 genes. A study involving a SNP from the 11p15.4 deletion region led to the discovery that SLC22A18 in this region may function as a tumor suppressor [40]. Our analyses showed that most of our identified CNVRs contain multiple SNPs, which may affect gene expression.

In conclusion, we analyzed 4007 samples in a genome-wide association study and identified 11 CNVRs, including 9 deletions and 2 duplications, that were strongly associated with longevity in Han Chinese. Four of them were also found in similar chromosomal regions in Danish or U.S. long-lived populations, suggesting that they might be commonly shared among long-lived humans. The other seven were first identified in this study. We also found that the number of deletions increased significantly with longevity. Based on our gene and pathway analysis results, we conclude that some of our identified CNVRs encode cancer-causing or aging-related genes, and that the deletion of these regions may extend healthy lifespan in long-lived individuals. Future research on these CNVRs will shed light on the regulatory mechanisms controlling human longevity.