The Mathematics of DNA

Imagine that someone gives you a mystery novel with an entire page ripped out.

And let’s suppose someone else comes up with a computer program that reconstructs the missing page, by assembling sentences and paragraphs lifted from other places in the book.

Imagine that this computer program does such a beautiful job that most people can’t tell the page was ever missing.

DNA does that.

In the 1940s, the eminent scientist Barbara McClintock damaged parts of the DNA in maize (corn). To her amazement, the plants could reconstruct the damaged section. They did so by copying other parts of the DNA strand, then pasting them into the damaged area.

This discovery was so radical at the time, hardly anyone believed her reports. (40 years later she won the Nobel Prize for this work.)

And we still wonder: How does a tiny cell possibly know how to do…. that???

A French HIV researcher and computer scientist has now found part of the answer. Hint: The instructions in DNA are not only linguistic, they’re beautifully mathematical. There is an Evolutionary Matrix that governs the structure of DNA.

Computers use something called a “checksum” to detect data errors. It turns out DNA uses checksums too. But DNA’s checksum is not only able to detect missing data; sometimes it can even calculate what’s missing. Here’s how it works.

In English, the letter E appears about 12.7% of the time. The letter Z appears about 0.07% of the time. The other letters fall somewhere in between. So it’s possible to detect data errors in English just by counting letters.
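To make the letter-counting idea concrete, here is a minimal Python sketch. The function name `letter_frequencies` and the sample sentence are my own illustrations, and a sentence this short will not match the percentages for English as a whole; you need a long text for that.

```python
from collections import Counter

def letter_frequencies(text):
    """Return each letter's share of all letters in `text`, as a percentage."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {letter: 100.0 * n / total for letter, n in counts.items()}

# Illustrative sample only; real frequency tables come from large bodies of text.
sample = "The quick brown fox jumps over the lazy dog and then sleeps"
freqs = letter_frequencies(sample)

# Show the five most common letters in the sample.
for letter, pct in sorted(freqs.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{letter}: {pct:.1f}%")
```

If a long passage that is supposed to be English came back with Z more common than E, you would know something had been garbled along the way.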

In DNA, some letters also appear a lot more often (like E in English) and some much less often. But… unlike English, how often each letter appears in DNA is controlled by an exact mathematical formula that is hidden within the genetic code table.

When cells replicate, they count the total number of letters in the DNA strand of the daughter cell. If the letter counts don’t match certain exact ratios, the cell knows that an error has been made. So it abandons the operation and kills the new cell.

Failure of this checksum mechanism causes birth defects and cancer.

Dr. Jean-Claude Perez started counting letters in DNA. He discovered that these ratios are highly mathematical and based on “Phi”, the Golden Ratio 1.618. This is a very special number, sort of like Pi. Perez’s discovery was published in the scientific journal Interdisciplinary Sciences: Computational Life Sciences in September 2010.

Before I tell you about it, allow me to explain just a little bit about the genetic code.

DNA has four symbols, T, C, A and G. These symbols are grouped into letters made from combinations of 3 symbols, called triplets. There are 4x4x4=64 possible combinations.

So the genetic alphabet has 64 letters. The 64 letters are used to write the instructions that make amino acids and proteins.
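If you want to see all 64 letters of that alphabet, a few lines of Python will list them. This is only a sketch; the names `SYMBOLS` and `TRIPLETS` are mine, and the T-C-A-G ordering is simply the order in which the four symbols are listed in this article.

```python
from itertools import product

SYMBOLS = "TCAG"  # the four DNA symbols, in the order used in this article

# Every ordered combination of three symbols: 4 x 4 x 4 = 64 triplets.
TRIPLETS = ["".join(combo) for combo in product(SYMBOLS, repeat=3)]

print(len(TRIPLETS))   # 64
print(TRIPLETS[:4])    # ['TTT', 'TTC', 'TTA', 'TTG']
```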

Perez somehow figured out that if he arranged the letters in DNA according to a T-C-A-G table, an interesting pattern appeared when he counted the letters.

He divided the table in half as you see below. He took single stranded DNA of the human genome, which has 1 billion triplets. He counted the population of each triplet in the DNA and put the total in each slot:



When he added up the letters, the ratio of total white letters to black letters was 1:1. And this turned out to not just be roughly true. It was exactly true, to better than one part in one thousand, i.e. 1.000:1.000.

Then Perez divided the table this way:



Perez discovered that the ratio of white letters to black letters is exactly 0.690983, which is (3-Phi)/2. Phi is the number 1.618, the “Golden Ratio.”
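You can check that arithmetic yourself. The short sketch below computes Phi from its standard definition, (1 + √5)/2, and then evaluates (3-Phi)/2; the variable names are mine.

```python
import math

phi = (1 + math.sqrt(5)) / 2   # the Golden Ratio, about 1.618034
ratio = (3 - phi) / 2          # the white:black ratio quoted above

print(f"phi         = {phi:.6f}")    # phi         = 1.618034
print(f"(3 - phi)/2 = {ratio:.6f}")  # (3 - phi)/2 = 0.690983
```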

He also discovered the exact same ratio, 0.690983, when he divided the table the following two alternative ways:





Again, the total number of white letters divided by the total number of black letters is 0.6909, to a precision of better than one part in 1,000.

Perez discovered two more symmetries:

Above: Total ratio of white:black letters = 1:1

Again, total ratio of white:black letters = 1:1

So for three ways of dividing the table, the ratio of white to black is 1.000:1.000.

And for the other three ways of dividing it, the ratio is 0.690983 or (3-Phi)/2.

When you overlay these 6 symmetries on top of each other, you get a set of mathematical stairs with 32 golden steps. Then an absolutely fascinating geometrical pattern emerges: The “Dragon Curve” which is well known in fractal geometry. Here it is, labeled with DNA letters in descending frequency:

You can see other non-DNA, computer-generated versions of this same curve here.

Other interesting facts:

Similar patterns with variations on these same rules are seen across a range of 20 different species, from the AIDS virus to bacteria, primates and humans.

Each character in DNA occurs a precise number of times, and each has a twin. TTT and AAA are twins and appear the most often; they’re the DNA equivalent of the letter E.

This pattern creates a stair step of 32 frequencies, a specific frequency for each pair.

The number of triplets that begin with a T is precisely the same as the number of triplets that begin with A (to within 0.1%).

The number of triplets that begin with a C is precisely the same as the number of triplets that begin with G.

The genetic code table is fractal – the same pattern repeats itself at every level. The micro scale controls conversion of triplets to amino acids, and it’s in every biology book. The macro scale, newly discovered by Dr. Perez, checks the integrity of the entire organism.

Perez is also discovering additional patterns within the pattern.
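Here is a rough sketch of how you could test the first-letter facts above on a DNA sequence of your own. Everything in it is an assumption on my part, not Dr. Perez's actual procedure: the function names are hypothetical, the triplets are read without overlap starting from the first symbol, and the toy sequence is made up so the numbers are easy to follow.

```python
from collections import Counter

def count_triplets(sequence):
    """Count non-overlapping triplets, reading from the first symbol (an assumption)."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3   # ignore a trailing partial triplet
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))

def first_symbol_totals(triplet_counts):
    """Total the triplet counts by the symbol each triplet begins with."""
    totals = Counter()
    for triplet, n in triplet_counts.items():
        totals[triplet[0]] += n
    return totals

def within_tolerance(a, b, tolerance=0.001):
    """True if a and b differ by at most `tolerance` (0.1%) relative to the larger."""
    largest = max(a, b)
    return largest == 0 or abs(a - b) / largest <= tolerance

# Made-up toy sequence: TTT AAA CCC GGG TAT ATA
toy = "TTTAAACCCGGGTATATA"
counts = count_triplets(toy)
totals = first_symbol_totals(counts)

print(totals["T"], totals["A"])                      # 2 2
print(within_tolerance(totals["T"], totals["A"]))    # True
print(within_tolerance(totals["C"], totals["G"]))    # True
```

On a real genome you would feed in the full sequence instead of the toy string. Eighteen symbols prove nothing, of course; the claim is about what happens across a billion triplets.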

I am only giving you the tip of the iceberg. There are other rules and layers of detail that I’m omitting for simplicity. Perez presses forward with his research; more papers are in the works, and if you’re able to read French, I recommend his book “Codex Biogenesis” and his French website. Here is an English translation.



(By the way, he found some of his most interesting data in what used to be called “Junk DNA.” It turns out to not be junk at all.)

OK, so what does all this mean?

Copying errors cannot be the source of evolutionary progress, because if that were true, eventually all the letters would be equally probable.

This proves that useful evolutionary mutations are not random. Instead, they are controlled by a precise Evolutionary Matrix to within 0.1%.

When organisms exchange DNA with each other through Horizontal Gene Transfer, the end result still obeys specific mathematical patterns.

DNA is able to re-create destroyed data by computing checksums in reverse – like calculating the missing contents of a page ripped out of a novel.

No man-made language has this kind of precise mathematical structure. DNA is a tightly woven, highly efficient language that follows extremely specific rules. Its alphabet, grammar and overall structure are ordered by a beautiful set of mathematical functions.
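The idea of running a checksum in reverse to rebuild missing data is familiar on the computer side. The sketch below shows one well-known way computers do it, XOR parity, where an extra block lets you recompute any single lost block. To be clear, this illustrates the computing analogy only; I am not claiming it is the mechanism a cell uses, and the function names and "pages" are invented for the example.

```python
from functools import reduce

def xor_bytes(a, b):
    """XOR two equal-length byte strings together, byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(blocks):
    """XOR all equal-length blocks together to form one parity block."""
    return reduce(xor_bytes, blocks)

def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus the parity block."""
    return reduce(xor_bytes, surviving_blocks, parity)

# Three equal-length "pages" of a book, plus one parity page computed from them.
pages = [b"page one", b"page two", b"page 3!!"]
parity = make_parity(pages)

# Pretend the second page was ripped out.
survivors = [pages[0], pages[2]]
print(recover_missing(survivors, parity))   # b'page two'
```

This trick only works when exactly one block is missing and all blocks are the same length. Lose two pages and the parity page can no longer tell you what was on either one.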

More interesting factoids:

The most common pair of letters (TTT and AAA) appears exactly 1/13 as often as all the letters combined, and it does so consistently in the genomes of both humans and chimpanzees.

If you put the 32 most common triplets in Group 1 and the 32 least common triplets in Group 2, the ratio of letters in Group 1 to Group 2 is exactly 2:1. And since triplet counts occur in symmetrical pairs (TTT-AAA, TAT-ATA, etc.), you can group them into four groups of 16.

When you put those four triplet populations on a graph, you get the peace symbol:

Does this precise set of rules and symmetries appear random or accidental to you?

My friend, this is how it is possible for DNA to be a code that is self-repairing, self-correcting, self-re-writing and self-evolving. It reveals a level of engineering and sophistication that human engineers could only dream of. Most of all, it’s elegant.

Cancer has sometimes been described as “evolution run amok.” Dr. Perez has noted interesting distortions of this matrix in cancer cells. I strongly suspect that new breakthroughs in cancer research are hidden in this matrix.

I submit to you that the most productive research that can possibly be conducted in medicine and computer science is intensive study of the DNA Evolution Matrix. Like I said, this is just the tip of the iceberg.

There is so much more here to discover!

When we develop computer languages based on DNA language, they will be capable of extreme data compression, error correction, and yes, self-evolution. Imagine: Computer programs that add features and improve with time. All by themselves.

What would that be like?

Perry Marshall

P.S.: Dr. Perez and I are friends. Perez worked on HIV research with the man who originally discovered HIV, Luc Montagnier. Perez also worked in biomathematics and Artificial Intelligence at IBM. I’m familiar with this work because last spring I had the privilege of helping him translate his groundbreaking research paper about this into English.

You can read it here: “Codon Populations in Single-stranded Whole Human Genome DNA Are Fractal and Fine-tuned by the Golden Ratio 1.618”

Click here for a more in-depth PDF version of this report.
