I recently read “The Gene: An Intimate History” by Siddhartha Mukherjee. It’s a 600-page deep dive into genetics. I hadn’t looked at genetics since high school biology, so I had a lot of questions as I read. Anything that wasn’t answered by the book, I answered for myself by digging around online. Here are my questions and their answers:

What is a gene?

You can think of your DNA as a huge string of 3 billion base pairs (bps). There are stretches of DNA that encode a specific function, usually by describing how to create one or more proteins. That stretch is called a gene. Genes usually start and end with specific sequences of base pairs that signal the start and end of the gene. There are huge sections of “junk” DNA between genes, and smaller sections of “junk” DNA between functional parts of a gene.

What is a protein?

Proteins are the main mechanism by which DNA encodes behavior. A gene can encode how to make a protein. Each set of 3 base pairs encodes one amino acid. A chain of amino acids makes a protein. Proteins are molecules that perform all kinds of functions in your body. Insulin is a protein that allows cells to absorb glucose from the blood. An antibody is a protein that binds to viruses and bacteria. Some proteins act as signals from one cell to another.
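Here is a minimal Python sketch of the “three base pairs per amino acid” idea. It reads a stretch of DNA three bases at a time and looks each triplet (codon) up in a table. The table is deliberately partial: it has only the codons this example uses plus the three stop codons, while a real translator needs all 64. The sketch also skips a real step. Cells first copy the gene into RNA, and ribosomes read that copy.

```python
# Partial codon table: only a few of the 64 codons, enough for the example.
# One-letter amino acid codes: M = methionine, K = lysine, W = tryptophan, G = glycine.
CODON_TABLE = {
    "ATG": "M",  # methionine, also the usual "start" codon
    "AAA": "K",
    "TGG": "W",
    "GGC": "G",
    "TAA": "*",  # stop
    "TAG": "*",  # stop
    "TGA": "*",  # stop
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into a string of one-letter amino acids.

    Reads codons (3 bases) from the start and stops at the first stop codon.
    Raises KeyError for codons missing from the partial table above.
    """
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGAAATGGGGCTAA"))  # MKWG
```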

Actin and myosin are proteins that form the contractile structure of muscles. This is why you see such a high protein content on the nutrition labels for meat. Your body breaks those muscle proteins down into their component amino acids. Your cells can then reuse those amino acids to build new proteins, creating new structures and performing all sorts of other functions.

What is a chromosome?

Your 3 billion base pairs are not in one unbroken strand. Most of your cells contain 46 separate molecules of DNA in their nucleus. These molecules are called chromosomes. Sex cells (sperm and eggs) contain only 23 chromosomes. 46 is our “diploid” number of chromosomes; 23 is our “haploid” number.

23 homologous pairs of chromosomes (https://en.wikipedia.org/wiki/Chromosome)

Meiosis is the process by which your diploid cells produce haploid sex cells. During meiosis, each chromosome pairs up with its partner, and then the two members of each pair end up in different sex cells. These pairs are called “homologous”, from the Greek words for “same” (homos) and “proportion” (logos). Each sex cell contains one chromosome from each homologous pair, while each diploid cell contains the full set of 46 chromosomes.

Are homologous chromosomes connected?

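No. They are often depicted next to each other in diagrams, but they are separate molecules that aren’t attached to each other. They only pair up during meiosis. The two partners line up side by side in the middle of the cell (this is when crossing over, described below, happens). They are then pulled to opposite sides, so each ends up in a different daughter cell.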

Where do chromosomes get their name?

Chromosome comes from the Greek words for “color” (chroma) and “body” (soma). Chromosomes were first studied by staining them with dyes and looking at them under a microscope.

Each pair of non-sex chromosomes is numbered roughly according to its size, from chromosome 1, the largest, to chromosome 22. (The ordering isn’t perfect: chromosome 21 is actually the smallest.) Additionally, there are chromosomes X and Y, which are the only pair whose two members can differ in size.

Do the X and Y chromosomes contain the same genes?

No. Chromosome X contains about 800 genes and chromosome Y contains about 50. Females have two X chromosomes; males have one X and one Y chromosome. The Y chromosome encodes instructions related to “maleness”. Since it contains so few genes, its genes probably work by triggering genes on other chromosomes, which perform the actual work.

Where did chromosomes X and Y get their names?

It’s NOT because chromosome X looks like an X and chromosome Y looks like a Y. Since Y is much smaller than X, at first we didn’t notice the Y chromosome, so we thought X was sometimes unpaired. We called it the “X element” because it was different from all the other chromosomes. Eventually we discovered its partner and called it “Y” because that comes next in the alphabet.

Which DNA comes from which parent?

The two chromosomes in a homologous pair contain the same set of genes. That is, you have two copies of each piece of your DNA. You inherit one member of each pair from each parent.

Each of the two chromosomes may contain slightly different variations (alleles) of those genes. For example, one copy of the gene that controls eyelash length may contain the “short eyelash” allele and the other may contain the “long eyelash” allele. Since the “long eyelash” allele is dominant, you will have long eyelashes.

When a parent produces a sex cell (sperm or egg cell) during meiosis, one chromosome from each pair is randomly chosen. Each sex cell only contains 23 chromosomes instead of the usual 46. When two sex cells (one egg and one sperm) merge to form a zygote, the zygote now has 23 homologous pairs of chromosomes, and each pair contains one chromosome from each parent.
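Here is a toy Python simulation of that process, ignoring crossing over (which comes up further down). Chromosomes are just text labels like `mom7b` (mother, pair 7, copy b). That naming scheme is made up for this example.

```python
import random

N_PAIRS = 23


def make_gamete(parent, rng):
    """Meiosis, ignoring crossing over: keep one chromosome from each pair, chosen at random."""
    return [rng.choice(pair) for pair in parent]


def fertilize(egg, sperm):
    """Egg and sperm fuse: chromosome i from the mother pairs with chromosome i from the father."""
    return list(zip(egg, sperm))


# Each parent has 23 pairs; chromosomes are labels like "mom7b" (pair 7, copy b).
mother = [(f"mom{i}a", f"mom{i}b") for i in range(1, N_PAIRS + 1)]
father = [(f"dad{i}a", f"dad{i}b") for i in range(1, N_PAIRS + 1)]

rng = random.Random(42)  # fixed seed so the run is repeatable
egg = make_gamete(mother, rng)
sperm = make_gamete(father, rng)
zygote = fertilize(egg, sperm)

print(len(egg), len(sperm), len(zygote))  # 23 23 23
print(zygote[:3])  # which copy of each pair shows up depends on the random draw

# Each pair is chosen independently, so one parent can make 2**23 different gametes.
print(2 ** N_PAIRS)  # 8388608
```

Because each pair is chosen independently, a single parent can produce 2²³ (about 8.4 million) different combinations of chromosomes. Crossing over then shuffles things even further.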

How does Mendel’s Punnett square relate to DNA?

The yellow allele is dominant over the green allele.

You inherit two copies of every chromosome, one from your father and one from your mother. That means you have two copies of every gene. Those two copies may be slightly different, which is another way of saying you have two alleles (there may be more than two possible alleles in the population).

One allele may be “dominant”, meaning you only need one copy for it to take effect. An allele is “recessive” if you need two copies to see its effect. This most often happens because the dominant form of the gene encodes the production of a protein and the recessive form does not. All you need is one dominant form of the gene to produce the protein.
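Mendel’s pea colors make a handy example. This small Python sketch builds a Punnett square for a cross between two parents who each carry one yellow allele (`Y`) and one green allele (`y`). The letters and function names are my own choices for illustration.

```python
from collections import Counter
from itertools import product


def punnett(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for one gene; each parent is a 2-letter genotype like 'Yy'."""
    # Each cell of the square pairs one allele from each parent.
    # Sorting puts the uppercase letter first, so 'yY' and 'Yy' count as the same genotype.
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


def phenotype(genotype: str) -> str:
    """'Y' (yellow) is dominant: a single copy is enough to make the pea yellow."""
    return "yellow" if "Y" in genotype else "green"


square = punnett("Yy", "Yy")
print(square)  # Counter({'Yy': 2, 'YY': 1, 'yy': 1})
print(Counter(phenotype(g) for g in square.elements()))  # Counter({'yellow': 3, 'green': 1})
```

This gives the classic 3:1 ratio. Only one of the four equally likely combinations has no dominant allele.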

Alleles may also be “co-dominant”, where each has an effect on your phenotype.

The gene for blood type has at least 3 alleles: A, B and O. A and B are co-dominant. O is recessive.

If you have at least one “A” allele, you have “A” in your blood type. If you have at least one “B” allele, you have “B” in your blood type. If you have two “O” alleles, you have blood type “O”.

Additionally, the presence or absence of a separate protein, the Rh factor, appends a “positive” or “negative” to your blood type.
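Here is a small Python sketch of those blood type rules. It models the Rh factor as one gene with a dominant “positive” allele (written `D`) and a recessive “negative” allele (`d`). That is a simplification, and the function names are made up for illustration.

```python
def abo_type(allele1: str, allele2: str) -> str:
    """Blood group from two ABO alleles ('A', 'B' or 'O').

    A and B are co-dominant; O is recessive and only shows with two copies.
    """
    alleles = {allele1, allele2}
    letters = "".join(a for a in ("A", "B") if a in alleles)
    return letters or "O"  # no A and no B means two O alleles


def blood_type(abo_alleles, rh_alleles) -> str:
    """Append the Rh sign: '+' if at least one Rh-positive allele ('D') is present."""
    sign = "+" if "D" in rh_alleles else "-"
    return abo_type(*abo_alleles) + sign


print(blood_type(("A", "O"), ("D", "d")))  # A+
print(blood_type(("A", "B"), ("d", "d")))  # AB-
print(blood_type(("O", "O"), ("D", "D")))  # O+
```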

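Traits can also be polygenic, meaning multiple genes contribute to the phenotype. This helps explain how genes can influence continuous traits like height. Suppose 5 genes affect height, each with 2 alleles. Each gene can come in 3 combinations (two copies of either allele, or one of each), which gives 3⁵, or 243, possible genotypes. If each “tall” allele adds a little height, those genotypes produce 11 different heights (anywhere from 0 to 10 “tall” alleles), and the middle values are far more common than the extremes. If you include environmental factors and chance, you get something close to a continuous curve of possible heights.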
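You can check this kind of counting by listing every genotype. This Python sketch assumes 5 hypothetical height genes. It records each gene as the number of “tall” alleles carried (0, 1 or 2) and uses the simplest additive model, where every “tall” allele adds the same amount of height.

```python
from collections import Counter
from itertools import product

N_GENES = 5
# Per gene, how many copies of a hypothetical "tall" allele you carry.
COPIES_PER_GENE = (0, 1, 2)

genotypes = list(product(COPIES_PER_GENE, repeat=N_GENES))
# Additive model (an assumption): height score = total number of "tall" alleles.
scores = Counter(sum(g) for g in genotypes)

print(len(genotypes))                   # 243
print(min(scores), max(scores))         # 0 10
print([scores[k] for k in range(11)])   # [1, 5, 15, 30, 45, 51, 45, 30, 15, 5, 1]
```

Only one genotype gives the shortest score and 51 give the middle score. The distribution is already bell-shaped before environment and chance smooth it out.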

Are genes on the same chromosome always inherited together?

No. There is an extra step in meiosis called “crossing over”. In simplified form: for each pair of chromosomes, pick a random point, cut both chromosomes at that point and swap the pieces. (In reality, a pair can cross over at more than one point.)

Genes that are physically closer to each other are less likely to be separated, so they are inherited together more often. Genes at opposite ends of a long chromosome, or on different chromosomes entirely, are inherited essentially independently of each other. You can work out the order of genes on a chromosome by measuring how often they are inherited together across many offspring.
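The link between distance and being inherited together shows up in a small simulation. This Python sketch treats a chromosome as a line of a given length and places crossover points at random along it. It uses a Poisson process, so a long chromosome can have several crossovers. This is the assumption behind Haldane’s mapping function, and it ignores real effects such as crossover interference. The sketch then measures how often two genes come from different parental copies. Positions and lengths are in arbitrary units chosen for illustration.

```python
import random


def gamete_sources(loci, length, rng):
    """Which parental copy (0 or 1) each locus comes from in one gamete.

    Assumption: crossovers occur as a Poisson process along the chromosome,
    1 per unit length on average, with random gaps between them.
    """
    crossovers = []
    pos = rng.expovariate(1.0)
    while pos < length:
        crossovers.append(pos)
        pos += rng.expovariate(1.0)
    start = rng.randrange(2)  # which parental copy the gamete starts on
    # A locus switches copy once for every crossover to its left.
    return [(start + sum(c < x for c in crossovers)) % 2 for x in loci]


def split_rate(a, b, length=2.0, trials=20_000, seed=1):
    """Fraction of gametes in which loci a and b come from different parental copies."""
    rng = random.Random(seed)
    split = 0
    for _ in range(trials):
        src_a, src_b = gamete_sources([a, b], length, rng)
        split += src_a != src_b
    return split / trials


print(split_rate(0.50, 0.55))  # close together: split roughly 5% of the time
print(split_rate(0.05, 1.95))  # far apart: split roughly half the time
```

The split rate levels off around one half, the same as for genes on different chromosomes. That coin flip is what “not correlated” looks like.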

Why are some genetic diseases more common in either men or women?

Since men and women inherit X and Y chromosomes asymmetrically, there is asymmetric behavior in traits controlled by genes on those chromosomes.

Females can inherit one recessive allele without presenting the trait

Since men have only one X chromosome, it is easier for them to inherit a recessive trait carried on the X chromosome. Women need to inherit two recessive alleles in order to show the recessive trait.

If the allele is dominant, a father passes it to his daughters far more often than to his sons. A father’s X chromosome is inherited only by his daughters, so if he carries a dominant allele on his X chromosome, 100% of his daughters will have it. But his Y chromosome is inherited only by his sons, so a son’s fate is determined entirely by the allele he inherits from his mother.
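Both patterns can be checked by listing the four equally likely children of a cross. In this Python sketch, `X*` (an X carrying the trait allele) is my own notation, and the function name is made up for illustration.

```python
from itertools import product


def fraction_affected(mother, father, dominant):
    """Fraction of daughters and of sons who show an X-linked trait.

    Each parent is a pair of sex chromosomes: "X*" carries the trait allele,
    "X" carries the normal allele and "Y" carries neither.
    """
    counts = {"daughter": [0, 0], "son": [0, 0]}  # [affected, total]
    for child in product(mother, father):  # one chromosome from each parent
        kind = "son" if "Y" in child else "daughter"
        trait_copies = child.count("X*")
        x_copies = sum(chrom.startswith("X") for chrom in child)
        # Dominant: one copy is enough. Recessive: every X the child has must carry it.
        affected = trait_copies >= 1 if dominant else trait_copies == x_copies
        counts[kind][0] += affected
        counts[kind][1] += 1
    return {kind: hit / total for kind, (hit, total) in counts.items()}


# Carrier mother, unaffected father, recessive trait: only sons are affected.
print(fraction_affected(("X*", "X"), ("X", "Y"), dominant=False))  # {'daughter': 0.0, 'son': 0.5}
# Father with a dominant allele: every daughter inherits it, no son does.
print(fraction_affected(("X", "X"), ("X*", "Y"), dominant=True))   # {'daughter': 1.0, 'son': 0.0}
```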

If every cell contains all your DNA, why do you have different types of cells?

You have many different types of cells in your body (skin cells, white and red blood cells, fat cells, neurons, etc.). In fully specialized cells, most genes are suppressed: only 10–20% of genes are active in a given cell.

Some proteins have the effect of making certain genes inaccessible. This means those genes cannot produce the proteins they encode, so they cannot affect that cell.

Suppression of genes can even be inherited, through a phenomenon called epigenetics. DNA accumulates annotations based on environmental factors, and those annotations can be passed on to daughter cells when a cell divides, or even to sex cells and the next generation.

What physical form do epigenetic markers take?

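DNA can accumulate methyl tags, which are small molecular groups that attach directly to the DNA. The histone proteins that DNA is wrapped around can also carry chemical tags. Both kinds of tags can make certain regions of DNA inaccessible, thereby suppressing the genes there. When a cell copies its DNA, enzymes also copy many of these annotations onto the new copy.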

Eventually, some of your cells will produce sex cells. Those sex cells still contain these epigenetic markers. When two sex cells combine and form a zygote, most markers are erased, allowing the zygote to produce stem cells and start the specialization process over again.

However, around 1% of these epigenetic markers are not erased, so they can be inherited. One example of inheritable epigenetics comes from rats. Rat mothers who lick their pups trigger a chemical process that changes the epigenetic markers on the pups’ DNA. These changes help the pups calm down more easily by altering the activity of genes involved in the stress response. When these pups grow up, they pass on the markers to their own offspring. There’s a good Radiolab episode about it.

How do the first two cells of an embryo start specializing?

A zygote (the first cell of an organism) is the ancestor of all the other cells in the organism. You could say it has the potential to become any type of cell in the body. As it divides, some of its descendants have to become the top half of the body and others the bottom half. But how does one cell know to become the top and another know to become the bottom? At first, there’s nothing different between them!

The answer lies in chemicals the mother provides, which give orientation to the embryo. In fruit flies, the best-studied example, the mother deposits certain molecules unevenly in the egg, so one end contains more of a certain protein than the other. The cells that form at that end get more of that protein, and the protein controls which set of genes switches on in them. So the cells start becoming specialized.
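The idea that a protein gradient can tell cells where they are is often described as the “French flag” model. Here is a minimal Python sketch of it with made-up numbers. The protein level falls off exponentially from one end of the egg, and each cell picks a program by comparing the level it sees with fixed thresholds. Real embryos use several interacting proteins, so treat this as an illustration of the principle only.

```python
import math

# Hypothetical numbers, chosen only for illustration.
SOURCE_LEVEL = 100.0   # protein level at the "head" end of the egg
DECAY_LENGTH = 0.2     # how quickly the level falls off (position 0 = head, 1 = tail)
THRESHOLDS = [(30.0, "head"), (5.0, "middle")]  # checked from highest to lowest


def level_at(x: float) -> float:
    """Protein level at position x, falling off exponentially from the head end."""
    return SOURCE_LEVEL * math.exp(-x / DECAY_LENGTH)


def fate(x: float) -> str:
    """A cell picks a program depending on how much of the protein it sees."""
    level = level_at(x)
    for threshold, program in THRESHOLDS:
        if level >= threshold:
            return program
    return "tail"


print([fate(i / 10) for i in range(11)])
```

Cells near the source see a high level and take the “head” program, while cells far away fall below every threshold.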

If radiation mutates one of my cells, will the mutation spread to the rest of my cells? Will I pass on the mutation?

The mutation is passed on to all descendants of the affected cell. So in order to affect many of your adult cells, it needs to happen early on in your development as an embryo. If the mutation occurs when the embryo is 2-celled, it will end up in about half of the cells of the adult organism. This can result in “genetic mosaicism”, the term for multiple genomes in the same organism.
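The “half of the cells” figure follows from repeated doubling. This Python sketch assumes every cell divides in lockstep and contributes equally to the adult, which real embryos don’t quite do. It shows that a mutation arising when the embryo has 2ⁿ cells ends up in 1/2ⁿ of the final cells.

```python
def grow(divisions: int, mutate_at: int) -> list[bool]:
    """Simulate an embryo in which every cell divides in lockstep.

    Just before division number `mutate_at` (0 = the zygote itself), one cell
    picks up a mutation. Each division copies every cell's genome, mutated or
    not, into both daughter cells. Returns True for each mutated cell.
    """
    cells = [False]  # start from a single, unmutated zygote
    for round_number in range(divisions):
        if round_number == mutate_at:
            cells[0] = True  # one cell is hit by the mutation
        cells = [cell for cell in cells for _ in range(2)]  # every cell splits in two
    return cells


for mutate_at in range(4):
    cells = grow(divisions=10, mutate_at=mutate_at)
    print(f"mutation at the {2 ** mutate_at}-cell stage: {sum(cells) / len(cells):.1%} of cells")
```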

To be passed on to your children, the mutation has to affect the cells responsible for producing your sex cells.

If the mutation affects how the cell controls its growth or programmed cell death, the cell may start dividing uncontrollably, resulting in a tumor. This is the general idea behind cancer.

How do we sequence DNA?

Sequencing DNA means determining the sequence of base pairs that make it up. If we know all 3 billion base pairs of the human genome, we can better understand and manipulate human genetics.

We have a method of sequencing stretches of DNA that are about 500 base pairs long. So we usually take a sample of DNA and chop it into pieces of about that length. We chop multiple copies of the DNA into overlapping pieces so that we can statistically guess how the pieces fit together.
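The “fit the pieces together” step can be sketched with a toy greedy assembler in Python. It repeatedly merges the two pieces whose end and start overlap the most. The short reads below stand in for the ~500-base pieces. Greedy merging is just one simple strategy, and it can be fooled by repeated sequences, so this is not necessarily how real assemblers work.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str]) -> list[str]:
    """Toy greedy assembler: keep merging the two reads that overlap the most.

    Returns the assembled pieces ("contigs"); more than one means some gaps
    could not be bridged by any overlap.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads contained inside another read: they add no new information.
    reads = [r for r in reads if not any(r != other and r in other for other in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j and (olen := overlap(a, b)) > best_len:
                    best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps any more
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


# Overlapping pieces of a made-up sequence, given in scrambled order.
print(greedy_assemble(["ACGTTA", "ATGCGT", "TTAGC", "CGTACG"]))  # ['ATGCGTACGTTAGC']
```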

Here are the steps in sequencing a small strand of DNA (e.g. “GATTACA”):

1. Replicate it many times using DNA polymerase (the same way your cells replicate DNA).
2. Divide the DNA into four batches.
3. In each batch, perform the replication again, but with one of four chemicals present, each of which stops the replication process at one of the four bases. Batch A now contains copies of “GA”, “GATTA” and “GATTACA”. Batch T contains copies of “GAT” and “GATT”. Batch C contains only “GATTAC”. Batch G contains only “G”.
4. Add a different colored fluorescent dye to each batch.
5. Mix all four batches together in a long tube.
6. Apply an electric field to the mixture (electrophoresis). All the pieces of DNA will move toward a color sensor, but longer pieces move more slowly than shorter pieces, so the pieces group together by size. All the “G” strands will pass the sensor first, then all the “GA” strands, then “GAT”, and so on.
7. Measure the color of each group as it passes the sensor. Each color corresponds to the next base in the sequence.
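These steps can be imitated in a few lines of Python. This is a toy model. It reads the template strand directly, whereas real chain-termination sequencing builds the complementary strand. It also assumes every possible stopping point shows up in each batch.

```python
def terminated_fragments(template: str) -> dict[str, list[str]]:
    """For each base, the fragments whose replication stopped right after that base."""
    batches = {base: [] for base in "ACGT"}
    for end in range(1, len(template) + 1):
        fragment = template[:end]
        batches[fragment[-1]].append(fragment)  # lands in the batch for its last base
    return batches


def read_sequence(batches: dict[str, list[str]]) -> str:
    """Electrophoresis: shortest fragments reach the sensor first; each batch's dye names its base."""
    # Tag every fragment with its batch's "color" (its base), then sort by length.
    tagged = [(len(frag), base) for base, frags in batches.items() for frag in frags]
    return "".join(base for _, base in sorted(tagged))


batches = terminated_fragments("GATTACA")
print(batches["A"])            # ['GA', 'GATTA', 'GATTACA']
print(batches["T"])            # ['GAT', 'GATT']
print(read_sequence(batches))  # GATTACA
```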

The price of sequencing DNA has dropped drastically over time. The first human genome to be sequenced cost over a billion dollars. By 2017, sequencing a whole human genome cost on the order of a thousand dollars. The price has fallen even faster than the cost of computing under Moore’s law, the famous observation that the number of transistors on a chip doubles roughly every two years.

When we sequenced the human genome, whose DNA was it?

The DNA came from multiple anonymous donors. There were 5–10 times as many donors as were needed, so not even the donors know whose DNA was sequenced.

How did we use genetics to estimate the path of human migration?

We’ve used genetics to determine that humans originated in Africa and spread to Europe and Asia and eventually to the Americas.

Older populations have more genetic diversity than younger populations. Consider a small group of humans that migrated away from Africa. Eventually that group would grow to populate Europe, but they would all trace their ancestry back to that small group, so they would have less genetic diversity than the original population in Africa.

We didn’t use the DNA in our chromosomes to measure this genetic diversity. We used mitochondrial DNA, a small ring of DNA inside our mitochondria. This DNA is not recombined: it is inherited entirely from the mother through the egg cell. Therefore, the only changes in a population’s mitochondrial DNA happen due to random mutations. We can estimate the rate at which these mutations happen, and measure how many variants exist in a given population.
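Here is a Python sketch that measures diversity as the average number of differences between every pair of sequences. The two “populations” are tiny invented sequences, not real data. The molecular-clock helper assumes a constant, hypothetical mutation rate (mutations per sequence per year). If two sequences last shared an ancestor T years ago, each lineage has had T years to mutate, so they should differ at about 2 × rate × T positions.

```python
from itertools import combinations


def differences(a: str, b: str) -> int:
    """Number of positions where two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_differences(seqs: list[str]) -> float:
    """Average number of differences over every pair of sequences (a simple diversity measure)."""
    pairs = list(combinations(seqs, 2))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def years_since_common_ancestor(diffs: float, rate_per_year: float) -> float:
    """Molecular clock: both lineages gather mutations, so diffs ≈ 2 * rate * years."""
    return diffs / (2 * rate_per_year)


# Invented toy sequences, only to show the calculation.
older_population = ["ACGTACGTAC", "ACGTTCGTAA", "TCGAACGTAC", "ACGTACCTGC"]
founder_population = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC", "ACGTACGTAC"]

print(mean_pairwise_differences(older_population))    # 3.0
print(mean_pairwise_differences(founder_population))  # 0.5
```

The founder-derived group is far less diverse because most of its members still carry the same few founding sequences.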

Using this technique, we’ve estimated that modern humans migrated out of Africa somewhere around 50,000 to 100,000 years ago.

How does cloning work?

It’s surprisingly simple. To clone an organism:

1. Take a somatic cell (a specialized, non-sex cell) from the organism to be cloned.
2. Take an egg cell from a donor.
3. Remove and discard the nucleus from the donor egg.
4. Extract the nucleus from the somatic cell and insert it into the egg cell.
5. Wait several hours to let the DNA shed its epigenetic annotations and become unspecialized.
6. Add chemicals that are normally released when sperm fuses with egg, triggering cell division.
7. Implant the embryo in a surrogate mother.
8. Let the embryo grow and deliver the baby as normal. The offspring will have the same genome as the original organism (apart from its mitochondrial DNA, which comes from the egg donor).

I found a really great interactive cloning tutorial that helped me internalize the process.

What is gene cloning?

The goal of gene cloning is to extract a particular gene from an organism and replicate it millions of times. The steps are:

1. Extract some DNA from the organism. This contains the entire genome of the organism.
2. Mix the DNA with restriction enzymes. These enzymes come from bacteria, which use them to defend themselves against viruses: each enzyme looks for a specific short pattern in DNA and cuts the DNA wherever that pattern occurs. This chops the genome into fragments, and with a well-chosen enzyme, one fragment contains the whole target gene.
3. Mix bacterial plasmids (small rings of DNA) with the same restriction enzyme. The plasmid rings are cut open.
4. Combine the DNA fragments and the cut plasmids in a test tube. Since they were cut with the same enzyme, their ends match and can join together, forming recombinant plasmids.
5. Put the recombinant plasmids into bacteria, using heat or electric shocks to open small holes in the bacterial cell walls.
6. Allow the recombinant bacteria to grow into colonies.
7. Detect which colonies contain the target gene, either by looking for the protein it encodes or for genetic markers known to be near the target gene.
8. Let that bacterial colony replicate until you have millions of copies of the target gene.
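The cutting step can be sketched in Python for one well-known restriction enzyme, EcoRI. It recognizes the pattern GAATTC and cuts between the G and the AATTC. The sketch shows only one strand of the DNA (the real cut leaves short single-stranded “sticky” ends), and the sequence is made up for illustration.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `dna` wherever `site` occurs, `cut_offset` bases into the site.

    Defaults model EcoRI (G^AATTC) on a single strand.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


genome = "TTGAATTCATGCCCGGGTAAGAATTCGG"
print(digest(genome))  # ['TTG', 'AATTCATGCCCGGGTAAG', 'AATTCGG']
```

Every fragment after a cut starts with the same AATTC end. That is why a fragment and a plasmid cut with the same enzyme can be joined.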

What is CRISPR?

CRISPR is a recent advance in gene editing. It uses the same principle as gene cloning, where enzymes cut specific sequences of DNA. But these enzymes are “programmable”: you can load them with a short guide RNA that matches a sequence of your choice, and they will search out and cut that specific sequence of DNA.

Once that sequence is cut, the cell’s natural DNA-repair processes can patch the gap using a piece of DNA lying nearby, as long as its ends match the DNA on either side of the cut. So to modify a specific gene in a target organism:

1. Load CRISPR proteins with a guide RNA that matches the gene you want to change.
2. Deliver the proteins into target cells, either via a virus or by opening holes in the cell membranes.
3. Flush the area with the desired version of the gene.
4. The CRISPR proteins cut the target gene, and the cells repair their DNA using the desired version of the gene.
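Here is a Python sketch of the “search out and cut” step. Its rules are based on SpCas9, the most widely used CRISPR enzyme. The enzyme carries a 20-base guide, the DNA must match it and be followed by an “NGG” motif (called the PAM), and the cut lands 3 bases before the PAM. The sketch searches only one strand and tolerates no mismatches, both simplifications. The guide sequence is made up.

```python
def find_cut_sites(dna: str, guide: str) -> list[int]:
    """Indexes where a Cas9-like enzyme carrying `guide` would cut `dna`.

    Simplified SpCas9-style rules: exact match to the guide, followed by an
    "NGG" PAM (any base, then two Gs); the cut lands 3 bases before the PAM.
    """
    n = len(guide)
    sites = []
    for i in range(len(dna) - n - 2):
        target, pam = dna[i:i + n], dna[i + n:i + n + 3]
        if target == guide and pam[1:] == "GG":
            sites.append(i + n - 3)  # the cut falls just before this index
    return sites


# A made-up 20-base guide, written as DNA letters (the real guide is RNA, with U for T).
guide = "GACGTTAGCATCGATCCAGT"
dna = "TTT" + guide + "TGG" + "AAAA"  # target followed by the PAM "TGG"

cuts = find_cut_sites(dna, guide)
print(cuts)                                          # [20]
print(dna[:cuts[0]], "|", dna[cuts[0]:])             # shows where the DNA is cut
print(find_cut_sites("TTT" + guide + "TCA", guide))  # [] (no NGG PAM, so no cut)
```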

There is a Radiolab episode about CRISPR.