A total of 3,416 normal healthy male subjects between 20–30 years of age were recruited by the Institute of Ayurveda and Integrative Medicine (IAIM), Bangalore, Karnataka (‘B’ in tables); Sinhgad College of Engineering (SCE) Pune, Maharashtra (‘P’ in tables); and Shri Dharmasthala Manjunatheshwara College of Ayurveda (SDMCA), Udupi, Karnataka (‘U’ in tables). Since the hormonal fluctuations during premenstrual and menstrual phases result in numerous physical and psychological disturbances, which may have confounding effect at the time of Prakriti assessment, we have excluded females from this study (detailed justification on inclusion of only males is given in the Methods section). However, several studies have included of both male and female subjects for Ayurveda-based studies7,8,15,16. The subjects belonged to diverse ethnic and linguistic groups and inhabited different geographical regions. The health status of every individual was ascertained by modern as well as Ayurvedic methods (details given in the Methods). The composition of Prakriti was determined by senior Ayurvedic physicians and confirmed independently by ‘AyuSoft’ (http://ayusoft.cdac.in), a software developed based on information from classical Ayurvedic literature. The subjects, whose Prakriti was in concordance between the assessment by Ayurvedic physicians and by AyuSoft were only selected for this study. Of the total 3,416 individuals evaluated, 971 had 60%–93% dominance of one Prakriti (Table S1), of which 262 individuals (94 Vata-dominant, 75 Pitta-dominant and 93 Kapha-dominant) with the highest proportion of one predominant Prakriti were randomly selected and subjected to genome-wide SNP analysis (Affymetrix array, 6.0) and genotypes were fetched using Birdsuite software17. The proportions of each dominant and co-dominant Prakritis are given in Fig. 1; Figure S1.

Figure 1 Box-plot representing the Prakriti proportion of subjects with Vata (94), Pitta (75) and Kapha (93) dominant characteristics. (A) Average percentage of Vata is 67%, while Pita and Kapha are 12% and 18.5%, respectively. (B) Average percentage of Pita is 65%, while Vata and Kapha are 12% and 17%, respectively. (C) Average percentage of Kapha is 70%, while Vata and Pita are 12% and 17%, respectively. Full size image

Out of 262 individuals analyzed, 245 passed the quality controls (QC) with the call rate 0.966 ± 0.0162 (Table S2). In order to validate the high-throughput data set, we randomly selected 48 markers from Affymetrix array and genotyped 48 individuals using custom-designed VeraCode GoldenGate Genotyping Assay System (Illumina, San Diego, USA). The call rate of VeraCode analysis was 99.61% and the genotype matched with Affymetrix data set (Table S3), suggesting that the genotypes obtained from Affymetrix array was genuine with minimum error (0.39%). Further, to increase the statistical power, we used Indian population data set as reference and imputation analysis was performed using Beagle (v3.3.1) software18 (Figure S1). As we had demonstrated earlier that Indian population has unique genetic architecture, we were skeptical of using non-Indian samples as a reference for imputation19. To evaluate our assumption, we masked 2%, 5% and 10% genotype of 207 unrelated Dravidian and Indo-European population samples and performed 110 simulations on chromosome 22 with four-reference populations i.e. Indian population (28 trios of Dravidian and Indo-Europeans; IN), different HapMap populations (CEU, YRI, CHB, CHS and JPT; HM), different South-Asian populations of 1000 genome project (BEB, GIH, ITU, PJL and STU; SA) and Indian along with HapMap populations (IH). As expected, imputed genotypes were more accurate with Indian samples (IN) [2% (0.9518 ± 0.0012); 5% (0.95045 ± 0.00109); 10% (0.9476 ± 0.0005)] compared to HM [2% (0.9462 ± 0.0013); 5% (0.9436 ± 0.0017); 10% (0.9396 ± 0.0005)], IH [2% (0.9463 ± 0.0014); 5% (0.9448 ± 0.0016); 10% (0.9417 ± 0.00066)] and SA [2% (0.9481 ± 0.0013); 5% (0.9471 ± 0.00098); 10% (0.9441 ± 0.00061)] samples (Table S4; Figure S2). In all the three masked data (2%, 5% and 10%), IN showed high imputation performance compared to HM, SA and IH. Even with ~10% masked data, the imputed genotypes were more accurate with IN than other references, suggesting that it is appropriate to use Indian data set for imputation. The data set of Gujarati Indians in Houston (GIH) is the only one available in the public domain, which was admixed recently and hence does not truly represent the ANI-ASI ancestry of Indian population19,20. As the data were not suitable reference for imputation, we prepared our own reference panel of Indian population (http://www.ccmb.res.in/bic/database_pagelink.php?page=snpdata). To achieve this, we followed two steps (i) imputation of 15 trios of Indo-European and 15 trios of Dravidian and (ii) imputation of 229 unrelated individuals imputed with the reference genotype obtained from step-I. Further, we used this reference for imputing the Prakriti individuals. In the first step, we found 10.5% and 17.8% Mendelian inconsistency in two trios, (Kashmiri Pandit) (Table S5), which were removed from the analysis. Finally, we obtained 791186 SNP markers with 0.95 ≤ R2 ≤ 1, for further analysis.

To make sure that the Prakriti samples were collected randomly and there was no major ancestral bias while collecting samples, we performed the principal component analysis (PCA)21 of 245 Prakriti samples (Figure S3). PCA analysis revealed no significant overall differences among the Prakritis (ANOVA p-value on eigenvector 1 V vs. K-0.434; V vs. P-0.89; P vs. K-0.51; and eigenvector 2 V vs. K-0.09; V vs. P-0.06; P vs. K-0.02). In order to check the ancestry of Prakriti individuals, we used our published data set of 297 Indian population samples with known ancestry19,20. These 297 samples include; 150 Dravidians, 80 Indo-European, 35 Austro-Asiatic, 27 Tibeto-Burman and 5 Great Andamanese (Table S6). We found 7,89,309 SNPs were common between Prakriti and Indian ancestral samples. In order to remove the differentiation on spurious axes21, we pruned 3,76,138 SNPs, which were in strong linkage disequilibrium (LD) (r2 > 0.75) and performed PCA with 4,13,171 SNPs. Our analysis showed that most of the Prakriti samples clustered with Dravidian and Indo-European (the two major ancestral population of India) and only 3 samples seemed to be Tibeto-Burman and admixed recently (Figure S4). Previous studies have shown that stratification could cause spurious association22,23,24,25, hence, PCA was performed21 using 4,05,782 SNPs (3,85,404 SNPs were pruned with r2 > 0.75) for 245 Prakriti samples, of which 40 were outliers and have been removed in 10 iterations with σ ≥ 6 on eigenvector 1 to 10 (Table S7; Figure S3 and S5). ANOVA analysis revealed that the Prakriti groups were not significantly different (p-value: V vs. P - 0.40 ± 0.28; V vs. K - 0.51 ± 0.32 and P vs. K - 0.48 ± 0.29) (Table S8); and 205 Prakriti samples were used for further analysis (Figure S1).

Association analysis was performed using plink software26. Since the present study has no cases and controls (patients and healthy), we considered one Prakriti as case and the remaining two Prakritis as controls and performed association analyses in three combinations: Vata vs. Kapha and Pitta (V vs. PK); Pitta vs. Kapha and Vata (P vs. VK); Kapha vs. Pitta and Vata (K vs. VP). Prior to association analysis, 3,890; 4,153 and 4,124, respectively, markers were removed from 791186 markers, which were not in Hardy-Weinberg equilibrium (HWE) i.e. p-value < 0.001 in controls of V vs. PK, P vs. VK and K vs. VP; respectively. The three combination association results were further used to identify the SNPs that were significant. Considering the fact that none of the samples represents 100% single Prakriti, we did not expect very low p-value in the association analysis. In this scenario, truly associated loci may co-exist with false positive markers and can be identified by permutation analysis. As expected, we observed that SNPs having approximately same p-value in the extreme tail of theoretical distribution failed to achieve 106 permutations (Table 1). For example, rs2939743 having p-value 7.61 × 10−5 dropped at 142717th permutation while rs10197747 having p-value 2.50 × 10−5 achieved 106 permutations, which of course revealed that rs2939743 is false positive. Similarly, we found 52 true positive SNPs achieved 1 million simulations with theoretical p-value ≤ 1 × 10−5 (details are given in Table 1; Figure S6).

Table 1 Empirical and theoretical distribution of the associated markers (p-value ≤1 × 10−5) of Vvs. PK, Pvs.VK and Kvs.VP. Full size table

It is well known that some markers differ in allele frequency more across ancestral population, compared to other set of markers. Moreover, natural selection might be the reason for this phenomenon because it acts locus-specific manner21. We speculate that the above so-called true positive loci might be artifacts of population stratification because of high probability of false positive results at the p-value, which observed in association analysis. Hence, we performed extensive statistical analyses to control these confounding factors and/or population stratification. Prevailing methods include genomic control and EIGENSTRAT to find such confounding effect of stratification. Genomic control uses uniform inflation factor to correct stratification, which is not sufficient for those SNPs having high frequency differences between ancestors21. Hence, we proceeded with EIGENSTRAT and found p-value did not change drastically (Table S9). To further confirm, we used variance component model (implemented in EMMAX)27 and mixed-linear model of association analysis (implemented in GCTA)28, which can correct sample structure in association, but have different statistics comparative to eigenstrat. Intriguingly, even with this analysis, we did not observe any drastic change in the p-value (Table S9). This has proved that these 52 SNPs were genuine characteristics of Prakriti and not derived from ancestry. Moreover, we also explored the allele frequency differences between centers; however, we did not find any significant difference for these 52 SNPs (Table S10). We further explored the power of 52 SNPs in Prakritis genetic differentiation (Figure S1). In principal component analysis, 19 SNPs were excluded with r2 > 0.75 and, as expected, we found striking separation of subjects according to their Prakriti (Fig. 2A). On eigenvector-1 (eigenvalue = 18.168248) Pitta significantly differentiated against Vata and Kapha (p-value = 1.11022 × 10−16, 4.44089 × 10−16, respectively); while on eigenvector 2 (eigenvalue = 15.890861) Kapha was significantly different compared to Vata and Pitta (p-value = 3.33067 × 10−16 and ~0 respectively).

Figure 2 Principal component analysis (PCA) with 52 SNPs that showed p-value of <1 × 10-5 (A) PCA of Prakriti individuals showing three clusters (Vata, Pitta, Kapha), despite their linguistic, ethnic and geographical diversity. (B) PCA projection of Indian population samples with Prakriti individuals. Full size image

To examine the statistical power of these 52 markers for categorizing the samples with unknown Prakriti, we generated a statistical model (see methods). Initially, we applied it on 205 samples and found 23.9% (49 out of 205) were explained by the proposed model (Table S11). Further, we applied it on 297 Indian (population) samples and found 37 individuals (5 Austro-Asiatic; 22 Dravidian; 8 Indo-European and 2 Tibeto-Burman) satisfying the model. According to the model, 7 individuals were Vata, 20 were Pitta and 11 were Kapha. Interestingly, Indian population samples, which belong to one Prakriti were from different ancestry (Table S12), suggesting that these makers could separate the Prakritis, irrespective of their ancestry. To confirm the proposed model, we projected these 37 individuals on eigenvector of Prakriti samples and found that these individuals clustered with Prakriti as predicted in the model (Fig. 2B). It suggests that the cluster is based on Prakriti and is not due to the ancestry of samples. That would also suggest that the phenotypic variations have a genetic basis, which would be shared by Prakritis of Ayurveda.

Further, we used these 52 markers to find the genotype-phenotype correlations. We observed that 2 markers (rs10518915 and rs986846) were associated with two different Prakriti; rs10518915 with Vata and Pitta, while rs986846 with Kapha and Vata. This observation prompted us to believe that different alleles of the same locus might be influencing different Prakriti (Table 1). In order to correlate the functional relevance of these SNPs, we divided them into genic and non-genic. The SNPs, which are within 10 kb of gene, were considered genic; while others as non-genic29,30. We found 28 were genic SNPs, of which 12 were in Vata (7 genes), 11 in Pitta (7 genes) and 6 in Kapha (7 genes) (Table 1). To correlate the function of these genes with respect to the characteristics of Prakritis, we searched in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Reactome event and found PGM1 gene associated with the Pitta phenotype. In Ayurveda, characteristics of Pitta include digestion, metabolism and energy production. Interestingly, we found PGM1 gene is in the center of many metabolic pathways i.e. glycolysis or gluconeogenesis (hsa00010); pentose phosphate pathway (hsa00030); galactose metabolism (hsa00052); purine metabolism (hsa00230) and; starch and sucrose metabolism (hsa00500) (Figure S7). Our finding suggests that the function of the gene directly correlates with the role of Pitta in metabolism as described in Ayurvedic literature.

In addition, we have checked the PGM1 gene markers in Affymetrix data set and found 4 markers (rs2269241, rs2269240, rs2269239,and rs2269238) were associated with Pitta Prakriti and all are in strong Linkage Disequilibrium (LD) (Figure S8). Therefore, to find the functionally relevant variants, we sequenced the whole exons and UTRs of the PGM1 gene in 78 individuals using Ion Torrent PGM (Life Technologies, USA). We found 23 variations in the gene, of which 8 were novel (Table S13). Interestingly, one non-synonymous; c.1258T > C (p.Tyr420His) (rs11208257) variant was present in the LD block and found in association with Pitta Prakriti (p-value–7.049 × 10−3). The frequency of the mutant allele “C” was 5.8% in Pitta and 20% in Kapha Prakriti (Table S14). This result prompted us to replicate the marker (rs11208257) in additional samples. We genotyped this marker (rs11208257) for 665 Prakriti individuals (299 Vata, 164 Pitta and 202 Kapha) using Sanger sequencing method. Initially, we analyzed the distribution of the genotype among participating centres and found “U” samples (collected from Udupi centre) were not in HWE (p-value - 0.04) (Table S15). Hence, we excluded 169 “U” samples from the analysis. Association analysis revealed that allelic and genotype distribution of the marker rs11208257 is significantly different in Pitta Prakriti against Vata and Kapha with p-value- 2.06 × 10−2; p-value- 6.16 × 10−3, respectively. Further, we explored the association between P vs. V and P vs. K; and found significant p-value - 7.61 × 10−3 and 2.35 × 10−2, respectively. The results would therefore suggest that Vata differs more from the Pitta Prakriti than Kapha (Table S16). We further screened 1108 randomly selected Indians and 992 HapMap samples and found that the frequency of mutant allele “C” was 17.9% among Indians, 15.5–17.6% in the Europeans, 14.5–18.8% in East Asians, 42% in Mexican, 15.3% in admixed Indians (GIH) and 12.8–28.3% in Africans. Indians have comparable frequency with Europeans and GIH (Table S17). Interestingly, we found Pitta has less frequency of mutant “C” allele and Vata and Kapha have comparable frequency with overall Indian population. To explore the functional relevance of the variant, we used SIFT software and found that the mutation is damaging with 0.01 score and thus substitution at this position may affect the protein function. Our data suggest that the SNP (rs11208257) in PGM1 gene is linked with one of the main features (energy production), which is more homogenous and constant in Pitta than with Vata and Kapha and a genotype correlation exists for the characteristics of Prakriti classification.

In conclusion, our preliminary study suggests that the Prakriti classification, as a foundation for the practice of Ayurveda, has a genetic basis and does provide clues for further studies.