Because roughly 98% of the human genome doesn’t code for proteins, many biologists have long thought of it as nothing but “junk DNA.” But might it hold the key to helping stem the formation of deadly cancers? In episode 34, Mike Feigin from Roswell Park Comprehensive Cancer Center talks with us about his discovery of mutations in a part of the human genome that most people have so far tended to ignore, but which appears to regulate the expression of genes that drive the formation of cancers. His article “Recurrent noncoding regulatory mutations in pancreatic ductal adenocarcinoma” (public-access PDF) was published with multiple co-authors on May 8, 2017 in the journal Nature Genetics.

Websites and other resources

Press coverage

Cold Spring Harbor Laboratory | eCancer | MedicalXpress | EurekAlert

Bonus Clips

Patrons of Parsing Science gain exclusive access to bonus clips from all our episodes and can also download mp3s of every individual episode.

Support us for as little as $1 per month at Patreon. Cancel anytime.



Clips available to patrons include …

Full episode with available download

On whole gene sequencing

Origins of the study

First algorithm used in project

“Does GECCO require massive computational power?”

Why the algorithm is named GECCO

Why GECCO was developed

Collaboration with computational scientists

Future applications of GECCO

Use of short hairpin RNAs (shRNAs) vs. CRISPR

Present and future applications of CRISPR

Examining mutations in high-risk patients

About Cold Spring Harbor Laboratory

About Roswell Park Comprehensive Cancer Center



Mike Feigin: It’s this behind-the-scenes DNA that no one really thinks about, but that really controls when genes are turned on and turned off. And, no one really thinks about kind of the effects of mutations in those regions.

Doug Leigh: This is Parsing Science. The unpublished stories behind the world’s most compelling science, as told by the researchers themselves. I’m Doug Leigh…

Ryan Watkins: And I’m Ryan Watkins. The South African biologist and Nobel Prize recipient, Sydney Brenner, predicted that, “getting the sequence [of the human genome] [would] be the easy part, [as] only technical issues are involved. The hard part will be finding out what it means, because [it] poses intellectual problems [in understanding how] … genes participate in the functions of living cells.” Today, we’ll talk with Mike Feigin, from Roswell Park Comprehensive Cancer Center in Buffalo, New York, about his discovery of mutations in part of the human genome that most people have so far tended to ignore, but which regulates the expression of genes that drive the formation of deadly cancers.

Feigin: Hi, I’m Mike Feigin. I’m an assistant professor at Roswell Park Comprehensive Cancer Center. I’ve been interested in science my whole life. I don’t really know why. It’s just, I guess, one thing that I was good at in school, and I always enjoyed it. And, so one day in high school, I decided that I wanted to study pharmacology, and that’s what I did. So, I don’t know why that was my choice, but I’ve always been interested in diseases. I thought for a while I wanted to be an MD, but decided I liked research more, based on some research experiences I had. And, all the work I do is really trying to understand cancer, with the goal of trying to find new drug targets to hopefully, one day, help somebody out there.

Leigh: Even when pancreatic cancer is diagnosed early, its five-year survival rate is only 20%, and when it isn’t caught early, this figure drops to just 5%. Ryan and I started our conversation by asking Mike what makes this form of cancer so deadly.

Feigin: It’s lethal for a few reasons. One of the reasons is because it’s usually diagnosed at a very, very late stage. The pancreas is deep inside you; it’s not one of these things where you can kind of feel a tumor, or see it like you can with skin cancer, or feel it like you can with breast cancer, and so it’s hard to diagnose. And when patients finally get diagnosed with pancreatic cancer, it’s generally highly advanced at that stage, because the tumor is very large, or it’s already metastasized and has already started to impact global function of the body or some other organ that’s critical. But one of the other reasons it’s incredibly hard to treat is because, you know, when we think of a tumor, I used to just think of it as a ball of cells, but it’s not that at all. Tumor cells make up a certain percentage of the tumor, but then there are lots of other cells, like cells in the immune system, which might be pro-tumorigenic or anti-tumorigenic. So, in the pancreas, it turns out that the stroma, all this other stuff, can make up like 90% of the tumor. And the tumors also don’t have a lot of blood vessels that go through them, and so getting a drug to the tumor cells through all that stuff is extremely difficult. So, even when you can treat it with chemotherapy, it’s very, very hard to treat, and it comes back pretty quickly. And immunotherapies right now don’t work very well; they’re working well in a lot of other cancers, but they don’t work well in pancreatic cancer. It turns out that the microenvironment of pancreatic cancer is extremely immunosuppressive: the immune cells that should be going in there to do their job can’t get access to the tumor cells, and so that’s another reason why, you know, the current really great therapies are not working. But there’s a lot of research going on to try to make immunotherapy work much better in the pancreas.

Watkins: While pancreatic cancer may not cause recognizable symptoms in its early stages, many patients experience substantial pain once it spreads beyond the pancreas itself. Doug and I were interested in learning why this is, as well as whether the management of this pain might improve in the future.

Feigin: There are nerves that can actually infiltrate a bunch of different cancers, including pancreatic cancer, and the tumor can actually start growing inside of these nerves, and so that can cause pain. And it turns out that there are lots of attractant signals between the tumor cells and the nerve cells. The nerve cells send out signals that kind of attract the tumor cells to them, and the tumor cells send out signals that attract the nerves in there, because the nerves are what’s helping to support the growth of the tumor cells. And so, because you have these nerves infiltrating the tumor, you can have lots of pain sensation with pancreatic cancer and a bunch of other cancer types. But one of the other huge issues with pancreatic cancer, again why it’s so deadly, is that only about 10 to 20 percent of patients are actually eligible for surgery, because if the tumor has spread, it’s not eligible for surgery. And so, some of those options are just not available. My research has mainly dealt with the genetics of cancer, but a lot of what I do is also looking at cancer therapeutics using different models. And it turns out that one of the drugs we found that seems to have some activity against pancreatic cancer cells works on some of these pain receptors. And so, there might be some kind of direct connection between pain receptor signaling and tumor cell growth.

Leigh: Permanent changes to the DNA sequence are called genetic mutations. These changes occur in the building blocks of DNA, called nucleotides, and some of them go on to alter the amino acids that make up the vast array of proteins in our bodies. Ryan and I were curious to learn how such mutations can impact our health.

Feigin: Tumors are driven by genetic mutations. That’s what people have studied for a long time. Some cell somewhere in the body (some people think it’s a stem cell, some people think it doesn’t have to be a stem cell) undergoes some kind of mutation, and that confers a selective advantage to that cell, which then eventually builds up more mutations as it divides over time. And then you can have these evolutionary trees that branch off as the cells kind of go and develop their own path: some cells become more migratory, and may be the ones that go and invade other organs, and other ones might proliferate faster. This buildup of mutations is kind of what starts the process. For some cancers, like pancreatic cancer, we know that a gene called KRAS is mutated in almost every patient who has pancreatic cancer, along with a few other, more minor genes. But for other cancers, there are just large collections of genes that are mutated. We’re bombarded with, you know, things that cause mutations in our bodies all the time. And so, our body has these repair mechanisms that constantly repair our DNA and make sure that it’s faithfully replicated for new cell division. But there are tons of genetic changes that don’t get repaired, where mistakes are made but they have no impact on the amino acid that’s made: a mutation can change the nucleotide without necessarily changing the amino acid. Then, even if an amino acid changes, it can be an amino acid that’s somewhere in the protein where it has no impact on function whatsoever. So, I’d say only a small minority of mutations actually have any impact at all.
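Mike’s point that a nucleotide change need not change the amino acid can be made concrete. The sketch below is a minimal illustration, not anything from the study: it classifies a single-base change inside one codon using a small, hand-picked subset of the standard genetic code. The table and function names are hypothetical.

```python
# Hand-picked subset of the standard genetic code (DNA codons -> one-letter
# amino acids). Only enough codons for the illustration are included.
CODON_TABLE = {
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "GAT": "D", "GAC": "D",                          # aspartate
    "TGG": "W",                                      # tryptophan
    "TAA": "*", "TAG": "*", "TGA": "*",              # stop codons
}


def classify_point_mutation(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution at 0-based `position` in `codon`.

    Raises KeyError if either codon is missing from the partial table.
    """
    if not 0 <= position < 3:
        raise ValueError("position must be 0, 1 or 2")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"  # nucleotide changed, amino acid did not
    if after == "*":
        return "nonsense"    # premature stop codon
    return "missense"        # a different amino acid


if __name__ == "__main__":
    print(classify_point_mutation("GGT", 2, "C"))  # GGT -> GGC, still glycine
    print(classify_point_mutation("GGT", 1, "A"))  # GGT -> GAT, glycine -> aspartate
```

Changes in the third position of a codon are the most likely to be synonymous, which is one reason many mutations leave the protein untouched.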

Watkins: The process by which the genetic makeup of a cell, called its genotype, gets interpreted into an observable trait, or phenotype, is known as “gene expression.” Given that Mike and his team identified mutations at sites related to the expression of neighboring genes, we asked him to describe how this process works, particularly in the epithelium, the sheets of tightly connected cells that line the body’s organs and surfaces.

Feigin: Gene expression is just the amount of protein that is produced by a cell. So, you have your DNA, which codes for RNA, and the RNA is then turned into protein; this process of DNA being transcribed to make RNA, and then making a certain amount of protein, is what we call gene expression. Tumor cells can have drastically different gene expression profiles in terms of the specific genes that are turned on or turned off in a tumor cell, and the levels of those genes can be drastically different between normal and tumor cells. You can also see these changes in gene expression between different normal cell types in the body. So gene expression, you know, just has to do with the amount of mRNA that’s made and the amount of protein that’s available in the cell to do any kind of function. And cell adhesion is the property that keeps cells attached to each other, which is critically important for epithelial cells that need to maintain their barrier function, and it also keeps them attached to the basement membrane, so they don’t just kind of go crawling off wherever they want. So, cells really need to stick to each other, and a lot of tumor cells kind of lose or alter their cell adhesive properties so they can break out of the epithelium, invade, and metastasize. A lot of what cancer cells do is hijack normal cell biological processes for kind of these nefarious ends, I guess. They use the normal things that are there. For example, there’s this normal pathway of axon guidance, which is how axons move through the body during development to get to the places they need to get to. And cancer cells can use these same pathways to invade and migrate through the microenvironment.

Leigh: “Transcription” is the first step of gene expression. Through transcription, segments of DNA are copied into RNA, with each new RNA nucleotide pairing with its complementary DNA nucleotide. Mike likens this process to a Broadway production, in which genes and the proteins they encode are the actors, and the non-coding, regulatory elements of the DNA are the backstage crew.
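The complementary pairing described above can be sketched in a few lines. This minimal example assumes the template strand is written 5' to 3' and ignores everything else RNA polymerase does (promoter recognition, splicing, and so on); it builds the mRNA by reading the template in reverse and pairing each base, with DNA’s A pairing with RNA’s U. The sequence is hypothetical.

```python
# Template-strand DNA base -> complementary RNA base.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the mRNA (written 5'->3') for a DNA template strand written 5'->3'.

    RNA polymerase reads the template 3'->5', so we walk the template in
    reverse and pair each base with its RNA complement.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_5_to_3.upper()))


if __name__ == "__main__":
    template = "TACGGC"          # hypothetical template strand, 5'->3'
    print(transcribe(template))  # prints GCCGUA
```

The resulting mRNA matches the opposite (coding) DNA strand, except that U takes the place of T.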

Feigin: If the actors are the proteins that are kind of going around and doing the function, the stagehands are the non-coding elements that are telling them when to go on stage and come off stage, how many actors should be on stage at any given time, and things like that. The non-coding elements are just pieces of DNA that determine how much of a specific protein is going to be made at any given time. This happens because lots of different proteins, called transcription factors, can bind to certain DNA elements that then turn on the transcription of the genes near them. This is really simplified, because you can have lots of other DNA elements that are kind of far away from the genes that they regulate, but for our paper we were really focused on the ones that were nearby. So, the way we thought of it was that it’s this behind-the-scenes DNA that no one really thinks about, but that really controls when genes are turned on and turned off, and no one really thinks about the effects of mutations in those regions. What happens if, you know, a stagehand breaks their arm and can’t go on that day or something? How does that affect the ability of the actors to get on stage and do what they’re supposed to do properly? So, essentially, the difference between the coding and the non-coding space is that the coding regions of the DNA are the specific regions that actually make up the proteins: the exons, the things that make up the proteins that we know of. And that only makes up 2% of the genome. The non-coding space is the other 98% of the genome, and we really don’t know what most of it does. There are lots of repetitive elements, but a lot of what it does is kind of control when genes are turned on and turned off.

Watkins: Since the 1970s, some scientists have speculated that non-coding DNA serves no genetic function, and so have dubbed it “junk DNA.” In fact, though, the purpose of many of these non-coding elements has yet to be examined at all. So, Doug and I wondered what led Mike and his team to explore these non-coding regions of the genome, as well as what is known about their potential roles in gene expression. We’ll hear what Mike had to say about this question after this short break.

Ad: We Share Science

Watkins: When we left off, Mike was about to explain why he and his team chose to examine so-called “junk DNA,” as well as what’s known about its role in gene expression.

Feigin: When I was putting this paper together and giving a presentation on it, I was looking at my cell biology textbook from 15 years ago, and it still had the statement, from some pretty impressive biologists, that most of the DNA in our genome is junk, and we don’t know what it does. But what we think is that these non-coding mutations are things that help the cancer along, but are probably not the things that start the cancer down its path. DNA is not just a long string; it’s condensed and wrapped very, very tightly, and lots of people are trying to understand how this folding happens on lots of different levels, because that wrapping of DNA is not random. It turns out that you have these promoter elements, generally considered to span hundreds to a couple of thousand base pairs, very, very close to a gene. So, if you are within, you know, two thousand base pairs of a gene, we consider that to be about the size of a promoter element. And that’s where, you know, a lot of these transcription factors bind, along with a lot of the other things that turn genes on and off. But DNA can also make these huge loops, and you can have enhancer elements, thousands of base pairs away, that are brought into close proximity because of the way that DNA loops. We focused on those promoter regions, but other people have focused on the enhancer regions. There are also other DNA elements, called insulators, which insulate one region of the genome from another, even if they’re very, very close, and they can stop interactions between two regions that might be close by each other.
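Mike’s rule of thumb, that a mutation within about two thousand base pairs of a gene sits in its promoter, can be sketched as a simple coordinate check. This is an illustration under stated assumptions, not the study’s definition: the window is measured from a hypothetical transcription start site (TSS), it counts only upstream bases by default, it accounts for which DNA strand the gene sits on, and the `Gene` record and gene names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record: only what the window check needs."""
    name: str
    chrom: str
    tss: int     # transcription start site, 1-based coordinate
    strand: str  # "+" or "-"


def in_promoter_window(gene: Gene, chrom: str, pos: int,
                       upstream: int = 2000, downstream: int = 0) -> bool:
    """True if (chrom, pos) lies within `upstream` bp 5' of the TSS
    (or within `downstream` bp 3' of it). Window sizes are assumptions."""
    if chrom != gene.chrom:
        return False
    if gene.strand == "+":
        start, end = gene.tss - upstream, gene.tss + downstream
    elif gene.strand == "-":
        # For a minus-strand gene, "upstream" means higher coordinates.
        start, end = gene.tss - downstream, gene.tss + upstream
    else:
        raise ValueError(f"unknown strand: {gene.strand!r}")
    return start <= pos <= end


if __name__ == "__main__":
    plus_gene = Gene("GENE_A", "chr1", 10_000, "+")
    minus_gene = Gene("GENE_B", "chr1", 50_000, "-")
    print(in_promoter_window(plus_gene, "chr1", 8_500))    # 1,500 bp upstream
    print(in_promoter_window(minus_gene, "chr1", 48_500))  # downstream of a minus-strand gene
```

Enhancers, which Mike notes can act from thousands of base pairs away through DNA looping, would fall outside a window like this one.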

Leigh: Promoters are regions of DNA, usually located just upstream of a gene, where transcription of that gene begins. They help ensure that genes are expressed correctly by providing binding sites for proteins known as “transcription factors,” which control the rate at which genetic information from DNA is copied to RNA. Here, Mike describes how he and his team came to analyze whole genome sequences from over 300 patients with pancreatic cancer, in order to identify mutations in promoters and other non-coding components of DNA.

Feigin: So, one of the things that I thought about was: “Well, how are genes turned on?” You have these promoter regions and these enhancers that control when genes are turned on and off, and I said: “Well, maybe there are mutations in these promoter elements that are going to disrupt transcription factor binding, or things like that, and that’s going to alter the transcription, and thus the regulation, of some gene.” At the time, there had been two papers published back to back in Science showing that, in melanoma, there are two non-coding mutations in the promoter of a gene called TERT, which is incredibly important for melanoma and a few other cancers. These non-coding mutations create a new transcription factor binding site, which allows this TERT gene to be more highly active, and this promotes tumorigenesis. And so, I said: “Huh, you know, I wonder if non-coding mutations are regulating lots of other genes besides TERT. How can we go about thinking about this?” So I started looking at a bunch of different algorithms out there that people had used to look at non-coding mutations in different contexts, and a few papers had come out right around that time showing that if you group all cancers together and look for these non-coding mutations, you find TERT over and over again, and you find a couple of other things that kind of pop out.
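The TERT example, where a single-base change creates a new transcription factor binding site, can be illustrated by comparing motif matches before and after a substitution. In this sketch the sequence and the four-base motif are hypothetical placeholders, only exact matches on one strand are counted, and real analyses would typically score position weight matrices on both strands instead.

```python
def motif_starts(seq: str, motif: str) -> set[int]:
    """0-based start positions of exact, possibly overlapping, matches of motif."""
    k = len(motif)
    return {i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif}


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute one base at 0-based `pos`, checking the reference base first."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


def created_sites(wild_type: str, pos: int, ref: str, alt: str, motif: str) -> list[int]:
    """Motif matches present in the mutant sequence but absent from the wild type."""
    mutant = apply_snv(wild_type, pos, ref, alt)
    return sorted(motif_starts(mutant, motif) - motif_starts(wild_type, motif))


if __name__ == "__main__":
    wild_type = "TTCCGGCATT"  # hypothetical promoter fragment
    motif = "GGAA"            # hypothetical binding motif
    # A C>A change at position 6 turns ...GGCA... into ...GGAA...
    print(created_sites(wild_type, 6, "C", "A", motif))  # prints [4]
```

Swapping the two sequences in the set difference would instead list binding sites that a mutation destroys, the other way a promoter mutation can change transcription.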
And so, on a large scale, we knew that TERT, or the TERT promoter, was important, and that a few other promoters were important, but no one had ever done an in-depth look at one cancer type. You really needed lots and lots of patients to look in depth at one cancer type, and at the time there weren’t a lot of tumors with whole genome sequence data available, which is really what you need to do this, because you need the whole genome sequence to look at the promoters and any other non-coding space.

Watkins: To analyze genetic commonalities among these patients, Mike and his team developed computational tools to identify and prioritize mutations. The algorithm they programmed, called GECCO, looks through the genome to identify mutations in the non-coding, regulatory elements of genes, as Mike explains next.

Feigin: We eventually developed this algorithm, or pipeline, called GECCO. All it does is look through your whole genome sequence data for a huge collection of patients, and then try to prioritize non-coding mutations that might be really important for driving gene expression. We do this essentially by looking for the mutations that are in that promoter region; we want to find those mutations. It then looks for correlations between mutations in those regions and changes in expression of the gene that they’re linked to. Once it finds mutations that are correlated with a change in gene expression, it does all kinds of statistics, like calculating false discovery rates, and then pathway analysis to try to understand the patterns of these things across the genome. No one had ever looked at whether non-coding mutations were happening near certain subsets of genes. Are they happening near the genes that we know are mutated in the coding regions? No one really knew anything about this, and so we wanted to address all those issues. The second thing was, instead of looking at an individual gene, we wanted to look at groups of transcription factor binding sites. We had about 121 of these transcription factor binding site classes; we knew which ones were repressive elements and which ones were most associated with increases in gene transcription, and we wanted to see if we found increased mutation rates in any specific class of transcription factor binding sites. We did that to look for evidence that any of these mutations were actually being selected for in cancer. And then we also used information from the pathway analysis to do patient survival analysis, to try to understand if any of this could tell us something prognostic that might help cancer patients someday.
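The core step Mike describes, testing whether promoter mutations go along with changes in the linked gene’s expression and then controlling the false discovery rate, can be sketched as follows. This is not GECCO’s code or its actual statistics: it swaps in a simple two-sided permutation test on the difference in mean expression, followed by the standard Benjamini-Hochberg adjustment. The gene names and expression values are made up.

```python
import random
from statistics import mean


def permutation_pvalue(mutated, unmutated, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in mean expression
    between patients with and without a promoter mutation."""
    rng = random.Random(seed)
    observed = abs(mean(mutated) - mean(unmutated))
    pooled = list(mutated) + list(unmutated)
    k = len(mutated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly relabel which patients are "mutated"
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def screen_genes(expression_by_gene):
    """Map gene -> (p, q), given gene -> (mutated values, unmutated values)."""
    genes = list(expression_by_gene)
    pvals = [permutation_pvalue(*expression_by_gene[g]) for g in genes]
    qvals = benjamini_hochberg(pvals)
    return {g: (p, q) for g, p, q in zip(genes, pvals, qvals)}


if __name__ == "__main__":
    data = {  # hypothetical expression values
        "GENE_A": ([1.0, 1.1, 0.9], [5.0, 5.2, 4.8, 5.1, 4.9, 5.0]),
        "GENE_B": ([5.0, 4.9, 5.1], [5.0, 5.1, 4.9, 5.0, 4.95, 5.05]),
    }
    for gene, (p, q) in screen_genes(data).items():
        print(f"{gene}: p={p:.4f} q={q:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and no third-party libraries; with thousands of genes, the FDR step matters because some large correlations will appear by chance.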

Leigh: The genetic data that Mike and his team used to examine these mutations were obtained from the Pancreatic Cancer Genome Project, which is coordinated by the International Cancer Genome Consortium, or ICGC. We asked Mike to explain how researchers make use of such atlases, as well as how he and his team engaged with international collaborators to check that the mutations they studied were somatic, meaning acquired by the body’s cells during a person’s lifetime, rather than germline mutations, which are inherited through reproductive cells and present in every cell of the body.

Feigin: What each of these has is a huge collection of tumor mutational data, sequencing data, and other kinds of data on patient outcomes, covering all different tumor types collected at many different locations, all put in a central repository, and this data is then released to the public at large. For ICGC, it was relatively straightforward to get access to the RNA sequencing, or gene expression data, and the whole genome sequencing data, with just a few emails. And it’s great, because it’s a resource we’d never otherwise have access to, and the people who generated it can’t possibly do every analysis under the sun. So they share the data with people who are interested in hundreds of different things, and we can learn so much more than if it were just kept in one place. What we did really early on was talk to two of the people who were in charge of the pancreatic cancer projects within the ICGC, one in Australia and one in Canada, Andrew Biankin and Lincoln Stein. We got them on board with the project and told them what we were doing, and they really helped us understand the data and make sure that we were analyzing data from the different projects together properly, so that, if things were processed separately, we were not seeing changes due to differential processing between the samples of the two groups. One of the things we wanted to make sure of is that the mutations we were seeing weren’t germline mutations; we wanted somatic mutations. The germline mutation data is not something that they would release, and so we sent our data to one of the collaborators and said: “Here are all our mutations. Can you tell us if they are somatic or germline?” And she looked through them and told us, you know, what was germline and what wasn’t.
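The check Mike’s collaborator ran can be pictured, in greatly simplified form, as setting aside any tumor variant that also appears in the same patient’s normal (germline) DNA. In practice, somatic calling relies on matched normal samples and dedicated statistical variant callers; the `Variant` record, the function name, and the coordinates below are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record."""
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


def split_somatic(tumor_calls, germline_calls):
    """Split tumor variant calls into (somatic, germline) lists.

    A tumor call counts as germline if the identical variant was also found
    in the patient's normal DNA; everything else is treated as somatic.
    """
    germline = set(germline_calls)
    somatic = [v for v in tumor_calls if v not in germline]
    inherited = [v for v in tumor_calls if v in germline]
    return somatic, inherited


if __name__ == "__main__":
    tumor = [Variant("chr1", 1_000, "C", "T"), Variant("chr5", 2_000, "G", "A")]
    normal = [Variant("chr5", 2_000, "G", "A")]
    somatic, inherited = split_somatic(tumor, normal)
    print(somatic)  # [Variant(chrom='chr1', pos=1000, ref='C', alt='T')]
```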

Watkins: Mike and his team not only verified that mutations in regulatory regions exist, but also found patterns of non-coding mutations that appear to contribute to the progression of tumors. Doug and I asked Mike to provide more details on the specifics of these findings.

Feigin: What we found in our study was that, when we looked at the collection of genes near these mutations and did pathway analysis, we found some pathways that were known to be associated with pancreatic cancer. I think that’s really interesting, because we already knew the cell has a way of messing with certain pathways by creating mutations in genes that control those pathways. But now we find out that the cell has another mechanism of altering those same pathways. One of the things that we’re really interested in looking at is whether our non-coding mutations happen in a specific pathway in the presence or absence of coding mutations in that same pathway, and we haven’t even begun to look at that yet. So I think this just tells us, you know, about another mechanism that the cell can use to alter these pathways. One of the first things we looked at, when we found all these mutations, was to say: “Well, let’s go to all of the known coding mutations in pancreatic cancer, look at those genes, and see if we can find any non-coding mutations near them.” And we almost never did. There were no non-coding mutations in KRAS or any of the other top genes (I think it was the top 24 or 26 genes); we never saw a non-coding mutation near them. So I think the cell is really using this different mechanism for turning on those pathways, because, you know, the cancer wants as many options as it can get, essentially. So there’s that. Then there are also these new pathways that we found, where maybe just mutating genes is not a great way to turn them on. We know that changing gene expression is incredibly important, because it changes how much of that gene, with its particular activity, is made. And when you mutate a gene, you generally change the function of that gene. So these are two very different things, expression and function.
And so, we think that it’s a possibility that some of these new pathways that we found, might be more readily activated by changes in the expression of genes, and not necessarily change in function of those genes.

Leigh: Mike and his team also validated these mutations in the lab experimentally. In doing so, they identified prognostic markers in the promoter region of DNA which might suggest the outcomes that are likely when a patient is diagnosed with cancer. So, Ryan and I asked Mike to discuss what this suggests about the role of mutation in gene expression.

Feigin: Essentially, what GECCO did was give us, from these millions of possible mutations, a set of high-confidence calls that we thought might really be impacting gene expression. But again, these are all correlations, and so we wanted to actually test this directly. One of the standards in the field is doing a luciferase assay. Luciferase is an enzyme that glows. What you do is take the gene that encodes luciferase and, right in front of it, put a sequence of nucleotides that is your promoter. So what I did was clone out the promoters of all of these regions that had the mutations, and put either the wild-type sequence without the mutation, or the mutated sequence, in front of luciferase. The amount of luciferase that’s produced depends on the strength of the promoter. So, imagine you make a mutation that disrupts the binding of a transcription factor to this promoter: you won’t get as much transcription, you won’t have as much luciferase being produced, and so the signal will be lower. So I took all those constructs and put them into different cell lines to see what impact the mutations would have on transcription of luciferase. And for many of them, almost half the ones we tested that we predicted to decrease expression, we did in fact see a decrease in expression of luciferase when we made that mutation. That was the best biological evidence we could show that these mutations were actually causing a decrease in gene expression.
It’s not just a correlation anymore; it was actually proven experimentally. And it was really great, because with a lot of the previous algorithms that people had developed, when they tried to validate with luciferase, sometimes it wouldn’t validate, sometimes you’d see an extremely small change in the luciferase values, and sometimes you would see an increase when you expected to see a decrease. But in all cases, we saw a decrease where we expected to see a decrease. And the only time we saw an increase in luciferase upon a mutation was the one time that we expected to see an increase in expression. This really told us that we were looking at something real and not just some kind of correlation between the mutations and gene expression.
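The comparison at the heart of the luciferase assay, mutant promoter versus wild-type promoter, reduces to a fold change in signal that is then checked against the predicted direction. This sketch assumes replicate luminescence readings per construct that have already been normalized (for example, to a co-transfected control reporter, which is an assumption here rather than a detail from the episode), and the 0.5 log2 cutoff is an arbitrary, hypothetical threshold.

```python
from math import log2
from statistics import mean


def log2_fold_change(wild_type, mutant):
    """log2 of mean mutant signal over mean wild-type signal."""
    return log2(mean(mutant) / mean(wild_type))


def matches_prediction(wild_type, mutant, predicted, min_abs_log2=0.5):
    """True if the mutant construct moves luciferase signal in the predicted
    direction ("decrease" or "increase") by at least `min_abs_log2`."""
    lfc = log2_fold_change(wild_type, mutant)
    if abs(lfc) < min_abs_log2:
        return False  # change too small to count as validation
    observed = "decrease" if lfc < 0 else "increase"
    return observed == predicted


if __name__ == "__main__":
    wt_readings = [1000, 1100, 900]   # hypothetical normalized replicates
    mut_readings = [500, 450, 550]
    print(log2_fold_change(wt_readings, mut_readings))                  # -1.0, i.e. half the signal
    print(matches_prediction(wt_readings, mut_readings, "decrease"))    # True
```

Requiring both the right direction and a minimum effect size mirrors the failure modes Mike lists: no change, a tiny change, or a change in the wrong direction.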

Watkins: Lastly, we asked Mike what he hopes that his study will contribute to cancer research in the future.

Feigin: One of the things that I think is kind of tied to my research now is the importance of gene expression in cancer. I think it’s incredibly important, but if you talk to anybody about, you know, what’s important in cancer, they say “mutations, mutations, mutations.” That’s all anyone cares about, and it’s true. You know, without mutations you’re probably not going to have cancer. I mean, there are some cancers that are caused by translocations, but fine, those are like mutations. But when you think about things like breast cancer, we characterize breast cancer based on the expression of the estrogen receptor and the expression of the progesterone receptor. And the drugs that we give these patients are against these proteins, not against a mutant form of them, but against these genes that are critically important. So, I think that gene expression is much more important than anyone really ever talks about. I know there’s lots of research looking at gene expression, but I think the field as a whole cares about mutations, and that’s it. The other thing, which I’m always talking about, is G protein-coupled receptors …

Watkins: Just a quick interruption to explain that G protein-coupled receptors, or GPCRs, are proteins on the cell surface that detect molecules outside of cells and relay those signals into the cell, allowing cells to respond to a wide variety of hormones, neurotransmitters, and other signals.

Feigin: … which I think are incredibly important for cancer, despite the fact that there’s almost no evidence yet that drugs against them are effective in cancer. So, I have this belief that we can find the right GPCR to target in cancer, and that’s really what half of the lab works on. Half the lab works on gene expression, and half the lab works on GPCRs, based on my, hopefully, not irrational beliefs in those things.

Leigh: That was Mike Feigin discussing the article “Recurrent noncoding regulatory mutations in pancreatic ductal adenocarcinoma,” which he published with David Tuveson and thirteen other researchers on May 8, 2017 in the journal Nature Genetics. You’ll find a link to their paper at: https://www.parsingscience.org/2018/10/16/mike-feigin/, along with bonus content and other material that he discussed during the episode.

Leigh: Next time on Parsing Science, we’ll be joined by Jean-Francois Gauvin, from Laval University in Quebec City. He’ll talk with us about a set of toys developed to teach quantum mechanics in the 1960s which arrived, without explanation, on his desk at Harvard University’s Collection of Historical Scientific Instruments in 2014.

Jean-Francois Gauvin: That thing was so strange. Those toys and those cubes were so strange, so different from anything else we had in the collection, that I just said, “I have to understand how they work.”

Leigh: We hope that you’ll join us again.