For millennia, humans have grown Cannabis for fiber, food, oil and, yes, for that telltale buzz. And today, Cannabis is a booming business. Marijuana is legal in ten states and Washington, D.C. An additional 33 states and four territories sport medical marijuana programs. Last year, Canada legalized recreational marijuana outright, and the U.S. Food and Drug Administration approved the first drug containing a Cannabis-derived compound.

THC, or not THC

Despite the upsurge, Cannabis cultivators have little firm knowledge about the genetic innards of the devil’s lettuce — particularly about the genetic variations responsible for the differences between Cannabis strains. Cannabis comes in many varieties — from Skunk to Carmen to Acapulco Gold. Today’s pot aficionados are generally on the prowl for strains that produce high levels of cannabinoid compounds — especially the infamous tetrahydrocannabinol, or THC. Other Cannabis varieties, such as hemp strains, produce almost no THC, but are valued for their fiber.

Knots to you

Studies show that gene copy number could explain some of the differences between Cannabis strains, such as levels of THC production. But fully unlocking the genetic secrets of Cannabis traits like cannabinoid production ultimately requires wading headlong into a mire more unpleasant than a dirty bong: the Cannabis genome itself. By all accounts, it is a sticky, complex realm of repeats, duplications and AT-rich mazes spread among 10 chromosome pairs. The half-dozen or so Cannabis genome assemblies that have come out since 2011 are fragmented collections of 2,000 to 300,000 contigs, depending on the assembly.

But advancements like Proximo Hi-C have revived hope that it’s possible to build increasingly long and complete assemblies of the Cannabis genome, moving the field away from fragmented assemblies and toward the highly complete chromosome-length scaffolds that are key to building a better bud.

“A refined genome assembly will enable molecular breeding programs to deploy marker-assisted selection for yield, flowering time, pest resistance and rare cannabinoid expression,” said Kevin McKernan of Medicinal Genomics.

Dash of progress

McKernan heads one of two teams that recently partnered with Phase Genomics to construct new Cannabis genome assemblies. His group used PacBio sequencing and Proximo Hi-C from Phase Genomics to assemble the genome of the Jamaican Lion strain of Cannabis. The team reports that its assembly consists of a record-low number of assembled sequences — 10 nearly complete chromosome-scale scaffolds, with only 25 additional unscaffolded sequences — which is a four-fold improvement in contiguity compared to all other Cannabis assemblies, and a 230-fold improvement over the inaugural Cannabis genome.

“We need to understand order and orientation in Cannabis genomes so that we can better predict the utility of genetic markers for breeding and to help us locate genes that might have variation leading to particular phenotypes,” said Dr. Alisha Holloway of Phylos Bioscience, which has worked with Phase Genomics. “Hi-C helped us get those long-range associations between genomic regions.”

Paying for Cannabis studies is never easy, given Cannabis’ mixed legal status between states and at the federal level. The federal government classifies marijuana as a Schedule I substance, putting it in the same group as heroin. That erects substantial barriers for researchers seeking traditional funding sources. To fund this particular endeavor, McKernan and his colleagues turned to a less traditional source: the open-source cryptocurrency Dash, an offshoot of Bitcoin, which they used to help crowdfund the project.

“We applied for funds in May of 2018 and had the first assembly public on August 2,” said McKernan.

The cannabinoid landscape revealed

Their assembly revealed that Jamaican Lion is quite a hoarder when it comes to cannabinoid synthesis. It harbors multiple copies of THCAS, which encodes the primary enzyme for producing the THC precursor, as well as multiple copies of synthesis genes for precursors of two related cannabinoids: cannabichromene and cannabidiol, commonly known as CBD. The copies, often arrayed in tandem, lie within messy regions of transposable elements and other repeats.

“This region has been an assembly knot for over seven years and I think the only reason it is visible to us today is due to novel sequencing tools we didn’t have in 2011,” said McKernan.

Jamaican Lion also sports about a dozen copies of cannabinoid synthesis genes that, at least by sequence, show high similarity to THCAS. Future experiments will have to divine what roles all of these loci play in cannabinoid production.

Another perspective

In parallel, Phase Genomics also worked with a Cannabis research team from Harvard University, the University of Minnesota and the J. Craig Venter Institute. This team used Oxford Nanopore sequencing to create a genome assembly for CBDRx, a high-CBD, low-THC Cannabis cultivar. They discovered that CBDRx harbors at least seven copies of THCAS clustered in a region that, like the corresponding region in Jamaican Lion, is rich in repeats and transposable elements. The genome also harbors six copies of CBDAS, which encodes the CBD precursor enzyme; five of them are clustered in a separate repeat-riddled region.

New answers and more questions

These studies show that at least two, and likely many, Cannabis strains house their cannabinoid synthesis genes in a complex and sticky genetic architecture, which raises questions about how Cannabis strains regulate production of THC, CBD and related compounds. Both studies found mutations that would cause amino acid substitutions in some gene copies, which suggests those copies may be pseudogenes. Alternatively, they may regulate the expression of other gene copies.

Regulation of cannabinoids may also occur elsewhere in the genome. The second study also sequenced the genomes of the marijuana strain Skunk, the hemp strain Carmen, a Skunk-Carmen F1 hybrid and 96 F2 plants for a quantitative trait locus mapping project to identify genetic loci involved in cannabinoid potency. The THCAS and CBDAS clusters lit up as important regions, but so did loci on other chromosomes, indicating that the genes for synthesis enzymes aren’t the only genetic regions affecting cannabinoid levels.

Searching for that future buzz

Understanding cannabinoid synthesis is critical for the Cannabis field. Cultivators face pressure, both commercial and legal, to craft strains with consistently high levels of certain cannabinoids and low levels of others. For example, laws restrict THC content for Cannabis varieties like hemp. And as pharmaceutical uses for CBD attract more attention (a CBD-based drug was approved last year to treat a form of epilepsy), demand is growing for high-CBD cultivars.

“Plants today produce a vast array of chemical profiles, but are not yet bred to produce reliable seed lines that are easy to cultivate,” said Dr. Holloway. “Paired with appropriate clinical trials and focused breeding programs, we’re going to create new, consistent medicines and recreational experiences that we might not have even dreamed of yet!”

As the Cannabis field shifts with these trends and sharpens its techniques, expect more eyes peering at those genome assemblies.

“We are still discovering gems!” said Dr. Holloway.

Kaylee Mueller is an employee at Phase Genomics who often writes about genomics and metagenomics. Follow her on Twitter @Kayleezyme