Targeted Metabolomics of Cannabinoids

Two cannabinoids for which standards were obtained, CBDV and CBL, were not detected in any strain. The 11 remaining cannabinoids with available chemical reference standards were identified and quantified. THCA content ranged from 0.76 to 20.71% w/w and increased almost linearly from the lowest to the highest strain (r² = 0.97), while CBDA content ranged from <MDL to 18.11% w/w, with the highest-CBDA strains having the lowest THCA contents (Fig. 1). In THC-abundant strains the CBDA levels were less than 0.15%, while in CBD-abundant strains they were greater than 5%. THC, the decarboxylated form of THCA, was present at levels from <LOQ up to 2% by weight, while CBD contents ranged from <MDL to 0.8%. CBD was most prevalent in high-CBDA strains. In addition, 7 cannabinoids present at lower levels were quantified using individual calibration standards: THCV, CBG, CBN, CBC, CBDVA, CBGA and Δ8-THC.

Figure 1 Biosynthetic pathway of cannabinoids originating from olivetolic acid and geranyl pyrophosphate. Graphs describe the cannabinoid contents within the 33 strains obtained arranged from lowest to highest total THC.
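The ordering by "total THC" used in the figure can be illustrated with a short sketch. It assumes the common convention of converting THCA to its THC equivalent with the molar-mass ratio of THC to THCA (about 0.877); the text does not state which formula was actually applied, and the strain names and values below are hypothetical, not data from this study.

```python
# Mass-conversion factor for decarboxylation (THCA -> THC + CO2):
# molar mass of THC (314.46 g/mol) divided by molar mass of THCA (358.47 g/mol).
# Assumption: the source does not say which factor, if any, was used.
DECARB_FACTOR = 314.46 / 358.47  # about 0.877


def total_neutral(acid_pct, neutral_pct, factor=DECARB_FACTOR):
    """Return the total neutral-equivalent content in % w/w."""
    return neutral_pct + factor * acid_pct


# Hypothetical strains: (name, THCA % w/w, THC % w/w).
strains = [("A", 20.0, 1.0), ("B", 1.0, 0.1), ("C", 10.0, 0.5)]

# Arrange strains from lowest to highest total THC, as on the figure axis.
ranked = sorted(strains, key=lambda s: total_neutral(s[1], s[2]))
print([name for name, _, _ in ranked])
```

Because CBDA and CBD have the same molar masses as THCA and THC, the same helper could be reused for a total CBD value under the same assumption.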

Classification of Strains

We hypothesized that individual plant breeders selected for cannabis strains by up-regulating and down-regulating specific enzymes within the biosynthetic pathways, resulting in a redirection of metabolites between THCA and CBDA. Our data analysis identified 5 clusters of strains that fall within a narrow range of total CBD/THC values, consistent with this hypothesis (Table 1). The branch of the biosynthetic pathway with olivetolic acid and geranyl pyrophosphate as precursors produces CBGA, CBG, CBCA, CBC, THCA, THC, CBDA and CBD (Fig. 1). Strains from all clusters contained measurable amounts of CBGA, CBG, THCA and THC (Fig. 1). Nine strains from the clusters with higher concentrations of THCA (blue and purple) did not contain detectable levels of CBC (Fig. 1). Two of the clusters were not found to contain significant quantities of CBDA and CBD (Fig. 1; blue and purple). One strain differed from all others: it had a greater CBDA content and detectable levels of CBGA, CBG, CBC and CBD, with minimal THCA and THC (Fig. 1; red).

Table 1 Strains of cannabis were clustered into 5 distinct groups that could be separated by the flow of metabolites through the CBD and THC pathways.

Compounds produced from the precursors divarinolic acid and geranyl pyrophosphate via CBGVA were also found to differ by strain cluster (Fig. 2). CBGVA appears to be a branch point for the allocation of resources in cannabis between THCV and CBDVA, indicating that the enzyme activity or the resource allocation mechanism for production of THCV was lost in the breeding process of strains clustered in the red, orange and green groups (Fig. 2).

Figure 2 Biosynthetic pathway of cannabinoids originating from divarinolic acid and geranyl pyrophosphate. Graphs describe the cannabinoid contents within the 33 strains obtained arranged from lowest to highest total THC.

Untargeted Metabolomics Analysis

In addition to the 11 cannabinoids that corresponded with authentic standards, 21 peaks with UV spectra characteristic of cannabinoids were identified in the chromatograms. Estimated by comparison to THC, their contents ranged from <MDL up to 0.34% by weight. Two unknown cannabinoids (CMPD7 and CMPD11) were detected in all strains, while CMPD3 and CMPD20 were each detected in only a single strain.

Relationships Between Known and Unknown Cannabinoids

A principal component analysis (PCA) of the autoscaled cannabinoid data was plotted to show the clustering of the samples in an unsupervised fashion (Fig. 3). In the PCA plot, the first two principal components (PCs) captured 36.6% of the variance in the data. Based on the loadings plot, the first PC was most highly influenced by the THCA and CBDA contents of the strains, which are negatively correlated. Two high-THC strains (CAN17 and CAN21) and one CBD strain (CAN34) were separated from the samples clustered within the 95% confidence limit of the total data variance. Based on the loadings plot (Fig. 3b), the positions of CAN17 and CAN21 may be influenced by a number of low-abundance cannabinoids, including CBGA, CMPD12 and CMPD11. The separation of CAN34 is likely due to its significantly higher CBDA content relative to the other strains and to its total THC content of less than 1%.

Figure 3 Principal Component Analysis (PCA) of cannabinoid profiles classified according to THC/CBD contents (a) scores plot (b) loadings plot.
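For readers unfamiliar with the procedure, autoscaling followed by PCA can be sketched as below. This is a minimal illustration, not the software used in the study: it mean-centers each cannabinoid and divides by its standard deviation, then estimates the fraction of variance explained by the first two components using power iteration with deflation. All function names and the `profiles` values are hypothetical.

```python
import math


def autoscale(rows):
    """Mean-center each column and divide by its sample standard deviation.

    Assumes every column varies; a constant column (for example a compound
    below the detection limit in every strain) would give a zero deviation.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def covariance(z):
    """Sample covariance of centered data (the correlation matrix after autoscaling)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(mat, iters=1000):
    """Power iteration for the largest eigenvalue of a symmetric matrix."""
    p = len(mat)
    # Non-uniform start vector, so it is unlikely to be orthogonal to the
    # leading eigenvector (a uniform start fails for two anti-correlated columns).
    vec = [float(i + 1) for i in range(p)]
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue estimate.
    num = sum(vec[i] * mat[i][j] * vec[j] for i in range(p) for j in range(p))
    den = sum(v * v for v in vec)
    return num / den, vec


def explained_variance(rows, n_components=2):
    """Fraction of total variance captured by each of the first components."""
    mat = covariance(autoscale(rows))
    p = len(mat)
    total = sum(mat[i][i] for i in range(p))
    fractions = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(mat)
        fractions.append(value / total)
        # Deflate: remove the component just found before searching for the next.
        mat = [[mat[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return fractions


# Hypothetical profiles: rows are strains, columns are THCA, CBDA, CBGA (% w/w).
profiles = [
    [20.0, 0.1, 0.8],
    [15.0, 0.1, 0.5],
    [8.0, 6.0, 0.4],
    [1.0, 15.0, 0.3],
    [0.8, 18.0, 0.6],
]
print(explained_variance(profiles))
```

Autoscaling gives every cannabinoid unit variance, so low-abundance compounds weigh as much as THCA or CBDA in the model; this is consistent with minor compounds appearing prominently in the loadings.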

While the first two principal components of the PCA describe 36% of the variance, the remaining 64% of the variance in the cannabinoid data is not described by this model. Therefore, additional models were employed to understand the relationships between cannabinoids and to identify additional strain classes based on the content of these 32 different cannabinoids. Multiple linear regression (MLR) analysis showed that a subset of 14 cannabinoids was better suited than the full set of cannabinoids for predicting THCA content, with the validation r² improving from 0.02 to 0.88. Likewise, for predicting CBDA content, using 14 cannabinoids improved the validation r² from 0.49 to 0.95 when compared with using the entire data set.
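The comparison of validation r² values can be made concrete with a small sketch. The text does not state how the validation was performed or how the 14 predictors were selected, so the example assumes leave-one-out cross-validation of an ordinary least-squares model and defines validation r² as 1 − PRESS/SS_total; the function names and the tiny data set are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (predictors are not collinear).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(xs, ys):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in xs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))


def loo_r2(xs, ys):
    """Leave-one-out validation r2: each sample is predicted by a model fitted without it."""
    preds = []
    for i in range(len(ys)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(predict(fit_mlr(train_x, train_y), xs[i]))
    mean_y = sum(ys) / len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical data: two predictor cannabinoids and one response per strain.
predictors = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
response = [1.1, 2.9, -2.2, 0.3, 1.8, -2.9]

print(loo_r2(predictors, response))                      # all predictors
print(loo_r2([[row[0]] for row in predictors], response))  # first predictor only
```

Defined this way, validation r² can fall near or below zero when a model fits the training strains but predicts held-out strains poorly. With 33 strains, a model using nearly as many predictors as samples is prone to this, which is one plausible reading of the low values reported for the full cannabinoid set.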

Pearson correlations were used to determine whether any of the unidentified cannabinoids could be associated with the major cannabinoids THCA, THC, CBDA and CBD (Table 2). There was no significant correlation between THCA or THC and any of the unknown compounds (Table 2). The CBDA content was positively correlated with CMPD1, CBDVA, CMPD5, CMPD6, CMPD16 and CMPD18 (Table 2). CBD was potentially weakly correlated with CMPD1, CMPD6 and CBDA (Table 2).

Table 2 Pearson correlation coefficients of all cannabinoids relative to the four major cannabinoids (THCA, CBDA, THC and CBD) in addition to UV spectral analysis describing cannabinoids as acidic or neutral.
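As an illustration of the statistic reported in Table 2, the Pearson coefficient between two cannabinoids measured across strains can be computed as sketched below. The accompanying t statistic is the standard test of a nonzero correlation; whether the authors used this test or another threshold for significance is not stated, so it is included only as an assumption. The contents listed are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r != 0 with n - 2 degrees of freedom.

    Undefined for |r| = 1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical contents (% w/w) of CBDA and one unknown compound in six strains.
cbda = [0.1, 0.1, 6.0, 9.5, 15.0, 18.0]
unknown = [0.00, 0.01, 0.05, 0.09, 0.12, 0.20]

r = pearson_r(cbda, unknown)
print(round(r, 3), round(t_statistic(r, len(cbda)), 2))
```

With the 33 strains of this study, the test would have 31 degrees of freedom.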

Putative Identifications and Pathways

Ten of the unknowns were found across multiple strains from all of the clusters (Fig. 4). CMPD1 was strongly correlated with CBDA according to Pearson’s correlation (Table 2), and although it was found in many of the strains classified as blue or purple, it was present at much higher concentrations in the red, green and orange clusters (Fig. 5a). Compounds 3, 5, 6, 15 and 18 were found only in the CBD-rich clusters red, green and orange (Fig. 5b–f). Compounds 2, 12 and 20 were found only in THC-dominant strains (Fig. 6a–c).

Figure 4 Unknown cannabinoids determined by untargeted metabolomics analysis to be common to all clusters of strains. (a) CMPD4, (b) CMPD7, (c) CMPD8, (d) CMPD9, (e) CMPD10, (f) CMPD11, (g) CMPD14, (h) CMPD16, (i) CMPD19, (j) CMPD21.

Figure 5 Unidentified cannabinoids determined by untargeted metabolomics analysis to be unique to CBD-rich strains. (a) CMPD1, (b) CMPD3, (c) CMPD5, (d) CMPD6, (e) CMPD15, (f) CMPD18.