The marijuana analytics company Steep Hill doesn’t smell dank, or skunky, or “loud”—unless you happen to arrive when a client is dropping off a sample. No seven-pointed-leaf logos ornament the walls; no Tibetan prayer flags flutter from the doorframe. Inside, a half-dozen young scientists work in a glass-walled lab to the sounds of whirring ventilation and soft jazz. The effect is one of professionalism and scientific objectivity.

Still, this place is all about weed. And Reggie Gaudino, Steep Hill’s burly and dreadlocked 53-year-old vice president of scientific operations, does look the part. Steep Hill is headquartered in famously 420-friendly Berkeley, California, after all. “I’ve been smoking since I was 13 years old,” he says, looking down over a railing at the lab. It’s a world he has long appreciated. Now he’d like to give a little back. “There’s so much good that can be done with cannabis, and so little of it is being done.”


As more and more states (23 so far) are finding legal ways for people to consume cannabis, Steep Hill and labs like it are becoming more important. Steep Hill quantifies the numbers you see on labels in dispensaries: how much tetrahydrocannabinol (THC, the molecule that gets you high) and cannabidiol (CBD, the component of weed thought to alleviate seizures) are in a given strain of pot. But any remotely dedicated smoker will tell you that a strain is more than its potency. Purple Kush and Sour Diesel have different characters, different smells and tastes and feels. Those are the result of the interactions of hundreds of molecules—cannabinoids, yes, but also another class called terpenoids. Myrcene, for example, smells like hops and mango (and some fans claim it increases the potency of THC). Beta-caryophyllene has the scent of pepper. There’s also ocimene, nerolidol, pinene—the interaction of all these chemicals creates whatever distinction exists between ’78 LA OG Affie and, say, Green Crack.

So when someone drops off one of those samples at Steep Hill’s reception, the lab swoops in to quantify 27 of the most prominent of these flavorful, experience-defining molecules. After eight years in business, the company has accumulated and tested thousands of samples; it has stacks and stacks of plant tissue in test tubes in a giant freezer. It has chemical analyses of those samples, and thanks to a deal with the marijuana review site Leafly, the company also has thousands of crowdsourced reviews. When it comes to data on weed, Steep Hill is, well, the bomb.

It’s one thing, though, to know what molecules are found in different weed strains. It’s another to know what those chemicals actually do—scientifically speaking. Their aromas certainly affect the experience of consumption, somehow. They might even underpin cannabis’s putative medicinal effects—fighting nausea, stimulating appetite, easing seizures, and perhaps even more.

And it’s yet another thing to understand the genetic basis for those differences. That’s the key. It’s what you need if you plan to breed scientifically, to enhance the qualities the market might pay for. Even more than legalization, that’s how you transform marijuana from an illicit pleasure to a licit business. “Every other commercially important agricultural plant in the world has had a ton of research done on it,” Gaudino says. “But here is this commercially important crop that has so much variation, and nobody knows what that variation’s all about.”

Plant biologists would love to understand cannabis better. But marijuana is a Schedule I drug in the United States, as illegal as heroin. Most academic researchers working with it are limited to (pathetic) weed grown at the University of Mississippi. Much of the research funding comes from the National Institute on Drug Abuse, which prioritizes studying ill effects over any potential good.

But Steep Hill has all those samples and all those chemical profiles. Now it just needs the genetics. And Gaudino, a geneticist and former patent agent, has a plan to get that. The problem is, deciphering the pot genome is, like, way harder than it sounds.

In 1993, the average THC content in weed was about 3 percent by weight. Over the next 15 years, breeders tripled the potency. Today, not even a decade later, the most potent strains top out at a whopping 37 percent. Thank the war on drugs: As growers moved indoors and out of sight, they drove up THC levels so they could charge more to cover the costs of climate control and artificial lighting.

Smokers have gotten savvier, too. Increasing THC gets you higher but lessens the plant’s ability to make other, arguably more interesting, cannabinoids and terpenoids. So growers also set out to create new breeds that would be as different from one another as a chardonnay and a pinot noir. And it sort of worked: Just like a vintner will rattle off a bottle’s tasting notes and terroir, a Denver budtender can sell a smoker on a plant’s piney nose and its concentration of crystallized trichomes, hairlike protrusions that contain high levels of psychoactive cannabinoids. These kinds of characteristics, the ones you can see (or smell), are a plant’s phenotype.

If you know your plant’s genotype, though (the genes behind those traits), you can breed for the traits you want much faster and with extreme precision. That technique, called marker-assisted selection, is the key to modern agriculture.

When Gaudino joined Steep Hill in 2014, he looked at the company’s vast trove of data and asked CEO David Lampach what kind of research their competitors were doing. Lampach’s response: “What do you mean, what are people doing? There are only three testing labs worth anything in the entire US.”

Gaudino was shocked. “I asked, ‘Have you guys ever considered genetic analysis?’”

Specifically, Gaudino wanted to build a full assembly of marijuana’s 800 million base pairs and 10 chromosomes to help breeders discover more markers for specific traits. Then, ideally, they’d be able to turn up the expression of any of the hundreds of chemicals in weed—some that smell great, some that get you high, and some that might ease pain or maybe even treat a disease. “My mad-scientist dream is a database where you can type in what you’re looking for,” Gaudino says. “You’ll either get out the strain that exists that does that or if it doesn’t exist, it’ll tell you what strains you could begin breeding.”


Other people had already tried it. In 2011, Kevin McKernan, chief scientific officer of a firm called Medicinal Genomics, made public the sequences for strains called Chemdawg and LA Confidential. And Jonathan Page, a biochemist with Canada’s National Research Council, had results for the Purple Kush genome. But these weren’t the kind of sequences anyone could use.

The problem is, geneticists don’t simply unspool all the DNA in a cell and then run it through a scanner, like the roll on an old-time player piano. They break those miles of code into teeny pieces, read those, and then use the overlaps to put them all back together like a jigsaw puzzle. The standard sequencing machine, built by a company called Illumina, reads pieces of DNA from 100 to 350 base pairs long. (A single gene might comprise more than 2,000 base pairs.)
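To make the jigsaw analogy concrete, here is a toy greedy assembler. It is a minimal sketch under strong assumptions (error-free reads, no repeats, a made-up 19-letter genome), not how Illumina’s software or any production assembler works; real assemblers rely on more sophisticated overlap-graph or de Bruijn-graph methods. The function names `overlap` and `greedy_assemble` are hypothetical. The idea: keep merging the two fragments with the longest end-to-start overlap until nothing overlaps.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if under min_len)."""
    start = 0
    while True:
        # Find the next place the first min_len letters of b occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If the rest of a from here is also the start of b, that's the overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembly: repeatedly glue together the best-overlapping pair."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left to exploit
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Four overlapping 10-letter "reads" taken from the made-up genome ATTAGACCTGCCGGAATAC.
reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

With unique sequence, the longest overlap is always the right one. Once the same stretch of DNA shows up in several places, that greedy choice can glue the wrong pieces together.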

This method isn’t great for plants. Their genomes are naturally full of repeating sequences, which makes it almost impossible to tell which fragments overlap—they all look the same, so you can’t line them up. Worse, plants tend to maintain multiple copies of their useful, core genes as backups in case something goes awry in their environment. (Unlike animals, which can run away from their problems, plants have had to adapt to their protean surroundings.)
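A toy count shows why repeats defeat short reads. The sketch below assumes a made-up 21-letter genome in which the segment `GATTACA` appears twice, takes an error-free read starting at every position, and reports the fraction of reads whose sequence also occurs somewhere else, meaning they can’t be placed unambiguously. The genome and the function name `ambiguous_fraction` are illustrative assumptions.

```python
from collections import Counter


def ambiguous_fraction(genome: str, read_len: int) -> float:
    """Fraction of read positions whose sequence also occurs at another position."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    ambiguous = sum(1 for r in reads if counts[r] > 1)
    return ambiguous / len(reads)


# Made-up genome: unique flanking letters around two copies of the repeat GATTACA.
genome = "CCG" + "GATTACA" + "TT" + "GATTACA" + "GC"

for read_len in (4, 10):
    print(read_len, round(ambiguous_fraction(genome, read_len), 2))
# 4 0.44   (8 of the 18 four-letter reads lie inside a repeat copy)
# 10 0.0   (all 12 ten-letter reads are unique)
```

Reads longer than the repeat reach into unique flanking sequence and can be placed, which is the basic argument for long-read sequencers.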

Cannabis breeders have made the problem even worse. They’ve been crossbreeding for so long to pump up pot’s psychoactivity that modern strains can have as many as 11 copies of the gene for the enzyme that synthesizes THC. If the crossbred genome were a jigsaw puzzle, most of the picture would be blue sky.

In the end, those first attempts to sequence the cannabis genome yielded hundreds of thousands of tiny fragments, so many that nobody could stitch them together. But Gaudino thought he could do better. “I’m not a gambling man, but this was one of the times that I gambled,” he says. “And I went long.” In 2014, Steep Hill spent $1.1 million on a PacBio RS II sequencer, one of fewer than 200 in the country. It’s a giant white box sitting next to the freezer full of frozen buds, adorned with 8-inch-tall Cheech and Chong dolls that Gaudino got when he was a kid. Unlike the much cheaper Illumina sequencers, the PacBio reads fragments of DNA as long as 53,000 base pairs.

Then Gaudino went to a Berkeley dispensary, bought a citrusy-smelling Kush strain called Pineapple Bubba, and spent $20,000 on reagents and data-crunching to sequence it. It wasn’t a genome yet: 583 million base pairs shattered into 18,000 puzzle pieces. Still, those pieces were longer than anyone else’s, and so easier to reassemble. Gaudino just needed more data to string them together.


The Emerald Cup happens every December at the Sonoma County Fairgrounds in Santa Rosa, California, south of the cannabis-growing heartland of Mendocino and Humboldt counties. In a sawdust-strewn enclosure built for prize livestock, the toasts of the Northern California weed-growing community set up booths to advertise their wares, compete to see whose is best, and distribute samples to anyone with a medical marijuana card. They stack their entries on panes of glass in an LED-lined case, bud upon perfectly pruned bud.

Everybody is here to smoke and share, maybe catch a band. Collie Buddz and the Expanders are playing. But outside an exhibit hall at the other end of the grounds, a trio of geneticists has just presented a panel on weed DNA. These are Gaudino’s competitors, each working on their own sequences and genetic products. And they’re all having trouble pulling it off.


The variety in the hundreds of Emerald entries has set the scientists’ heads spinning, and not in a good way. “The plant is amazing because of this diversity,” says Mowgli Holmes, chief scientific officer of a Portland, Oregon, cannabis research lab called Phylos Bioscience. “But all that variation makes genomic assembly a nightmare.”

Holmes is taking a crack at sequencing the high-CBD strain Cannatonic, sending it to genome pioneer Craig Venter’s company Synthetic Genomics in the hope that its PacBio could make sense of it. So far it hasn’t. “I never want to see that plant again,” Holmes says to McKernan, the geneticist who sequenced Chemdawg and LA Confidential.

Cannatonic, like many of the strains at Emerald, is a hybrid, crossbred between different strains to get new traits. These modern plants are more likely to be heterozygous, with two different versions of a given gene, and crossing strains this way tends to yield stronger offspring. But to assemble a good reference genome, you need an exemplar that’s homozygous, with two matched sets of chromosomes. That’s what a consortium of federally funded researchers did with corn, for example: It sequenced a highly inbred strain.

Without a solid, inbred strain, it’s unlikely that any of the weed scientists can assemble a reference genome. Page, the Canadian biochemist behind that first, piecemeal Purple Kush sequence, says they should try anyway. “A Kush group, a Haze group—we should get references going,” he says. McKernan seems to agree. He has extracted DNA from the winning plants so he can sequence them when he gets home.

But even though no one says it out loud, they’re all thinking the same thing. Trying to build a reference genome is a sucker’s bet. Yes, if you had a ready map for which genes were on which chromosomes, each of their fragmentary sequences would suddenly get that much easier to assemble. Directed breeding would be within reach. “Once we have the real reference,” McKernan says, “they all become much more valuable.”

That’s the trick. If these were academics, they’d work together. Sequencing the maize genome took 33 labs, 157 researchers, $32 million, and four years. But these people are in it for profit. If any one of them invests in a solid reference genome—not just the $20,000 to run the machine but the time and terabytes it takes to assemble the data—everybody else’s crappy sequences increase in value for free. If Gaudino keeps pumping money into his PacBio sequencer to come up with a better and better sequence, all it does is make his competitors more powerful.


So Gaudino doesn’t turn that machine back on. Steep Hill can’t really afford another $20,000 run right now, anyway. Money is yet another advantage that big academic collaborations have over a private lab. Gaudino is also working with a geneticist at the University of Colorado Boulder to meld the PacBio sequence with an Illumina-based genome. The work continues.

None of this has been a waste of time. You don’t actually need a genome to find genetic markers. Geneticists can lump all the unassembled sequences for, say, lemony-smelling plants into one group and search them for a bit of DNA they have in common. That could be a marker for lemony smell. Steep Hill has already found a marker that can tell male and female plants apart, so growers don’t waste time with male plants, which won’t produce buds.
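The lumping approach can be sketched as a presence/absence search over short DNA words, or k-mers, pooled from each plant’s unassembled reads. This is a simplified illustration that assumes error-free reads and complete coverage of every plant: it keeps only k-mers found in every plant with the trait and in no plant without it. The function names and the tiny read sets are hypothetical; a real marker study would use many more plants and statistical tests rather than a strict all-or-nothing rule.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k in one read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def plant_kmers(reads: list[str], k: int) -> set[str]:
    """Pool the k-mers from one plant's unassembled reads; no assembly needed."""
    pooled: set[str] = set()
    for read in reads:
        pooled |= kmers(read, k)
    return pooled


def candidate_markers(with_trait: list[list[str]], without_trait: list[list[str]], k: int) -> set[str]:
    """k-mers shared by every plant with the trait and absent from every plant without it."""
    shared = set.intersection(*(plant_kmers(p, k) for p in with_trait))
    seen_elsewhere = set().union(*(plant_kmers(p, k) for p in without_trait))
    return shared - seen_elsewhere


# Hypothetical read sets: two "lemony" plants and two others.
lemony = [["CCGATTACAGG", "TTTTAAAA"], ["AGATTACAC", "CCCCGGGG"]]
others = [["CCGATTTCAGG", "TTTTAAAA"], ["ACGTACGT"]]
print(candidate_markers(lemony, others, k=7))  # {'GATTACA'}
```

The same comparison works for any trait that splits plants into two groups, such as male versus female.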

Gaudino, Page, and McKernan have begun constructing a crude evolutionary tree for cannabis using a different genetic technique: comparing points of mutation called single nucleotide polymorphisms, or SNPs. The more SNPs two strains share, the more closely related they are. More practically, SNPs can distinguish one strain from another. Medicinal Genomics, Steep Hill, and Phylos Bioscience all have strain identification products on the way to fight counterfeits. You can’t patent an illegal product, so holding onto IP in the weed business is tough enough without knockoffs.
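The SNP comparison can be sketched in a few lines. Each strain is represented as a hypothetical mapping from SNP site to the DNA letter it carries (a simplification; a real plant carries two copies of each site). Similarity is the share of commonly typed sites where two strains agree, and identification returns the closest reference. The strain names, sites, and letters below are invented for illustration, and real analyses use far more sites and more careful methods, such as neighbor-joining, to build trees.

```python
def snp_similarity(a: dict[int, str], b: dict[int, str]) -> float:
    """Share of SNP sites typed in both profiles where the letters agree."""
    sites = a.keys() & b.keys()
    if not sites:
        return 0.0
    return sum(a[s] == b[s] for s in sites) / len(sites)


def identify_strain(sample: dict[int, str], references: dict[str, dict[int, str]]) -> tuple[str, float]:
    """Return the reference strain most similar to the sample, with its score."""
    best = max(references, key=lambda name: snp_similarity(sample, references[name]))
    return best, snp_similarity(sample, references[best])


# Invented profiles: SNP site -> letter (one letter per site for simplicity).
references = {
    "strain_A": {101: "A", 202: "G", 303: "T", 404: "C"},
    "strain_B": {101: "G", 202: "G", 303: "C", 404: "C"},
    "strain_C": {101: "G", 202: "A", 303: "C", 404: "T"},
}
sample = {101: "A", 202: "G", 303: "T", 404: "T"}

print(snp_similarity(references["strain_A"], references["strain_B"]))  # 0.5
print(identify_strain(sample, references))  # ('strain_A', 0.75)
```

A table of pairwise scores like these is the raw material for a tree: the strains that score highest against each other end up on neighboring branches.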

But the world is changing. Cannabis is becoming an economic force, and more legal. Someone, somewhere, is going to do this work—to figure out how to modify weed with the same ease that Monsanto tweaks corn. And if Steep Hill can be there helping crack the code, it stands to fundamentally change how the $40 billion pot industry works.

That’s great for marketability and for tasting notes, but the community has even higher hopes. “In the old days, you’d smoke what you could get,” Gaudino says. “Now, there’ll be so much diversity in strains that you’ll be able to pick the exact high you want.” Some biologists think terpenoids and cannabinoids work together to activate the brain’s cannabinoid receptors. Changing their balance, then, could alter pot’s effects, calibrating the high or even its medicinal properties. But research into the neurochemistry of weed is just as far behind as the genetics.

A few weeks after the Emerald Cup, in search of at least a hint of what those chemicals might do, Gaudino and Lampach drive up to the Santa Cruz Mountains, through redwood groves, along switchbacks cut deep into rock. At the top they pull into the driveway of a house surrounded by gardens. Kymron deCesare comes out to meet them—he is Steep Hill’s herbalist and the company’s second-most stereotypical stoner, with a long, white beard and hair braided down to his waist. He’s wearing bell-bottoms.

After lunch, deCesare decamps to his garage lab, a small bench littered with dropper bottles and disposable plastic pipettes. The bottles are all filled with concentrated terpenoids. He suctions a half milliliter of alpha-pinene out of a brown bottle and into a rocket-shaped, inch-long microcentrifuge tube and adds a half milliliter of limonene drawn from a huge plastic container. Then even less than that of beta-caryophyllene.

DeCesare brings the tube to a small outdoor shed next to a roaring fire, where Gaudino is preparing a fat joint. Gaudino sprinkles the contents of the tube onto the weed, rolling quickly before the liquid wets the paper. He lights up, takes a seat on the swing set, and passes to the left.

This isn’t the first time they’ve experimented with terpenoid enhancement. Everyone here has good memories of beta-caryophyllene: Lampach remembers it as a “crown chakra” kind of effect. “We got this really short, intense head rush,” Gaudino says. “I got this white noise in my head, and then I started to get this visual response. All the lights got kind of fractal, and then it was like 30 seconds and it was gone.”

Is it true? Replicable? Salable? Maybe. Are the effects of the extra chemicals real or placebo? It’s hard to tell. This approach to research has some distinct limitations. Soon, Gaudino is pontificating, though about what, even he probably isn’t sure. “That’s one of the coolest things about the world, because—”

A flutter of wings stops him. A flock of pigeons rises from the top of a redwood. The group stares up, marveling in unison: “Whoa.” Regardless of its relative terpenoid concentrations, this is some potent weed.

Senior associate editor Katie M. Palmer (@katiempalmer) covers science at WIRED.

This article appears in the April 2016 issue.

Photographs by Preston Gannaway