For over a decade, the volume and variety of available genomic data have been growing exponentially. This growth has been matched by advances in the technologies used to decode, analyse and store genomic data.

The completion of the Human Genome Project marked the start of a new era in the genetic revolution for humankind. In the years that followed, the project provided a scientific guide to the structure, function and organisation of the human genetic makeup.

With great knowledge comes great power, and with great power comes great responsibility. In the age of big data, knowledge does mean power. Genetic data, particularly when accompanied by medical information, is worth hundreds of millions of dollars. Until now, this data has been gathered and stored in a centralized manner in two senses: (1) it was kept on centralized servers, and (2) it was usually concentrated in the hands of commercial entities that used it for R&D purposes. Blockchain now appears to be challenging these old centralization paradigms. As the worlds of blockchain technology and genetics join forces, data is being distributed on decentralized ledgers, which should eventually lead to genetic data being handled and managed by end users rather than by commercial parties.

To understand this shift we first need to clarify some basic concepts.

Blockchain

Put very simply, a blockchain is a spreadsheet that is duplicated across a network of computers and updated regularly. More technically, a blockchain (a form of distributed ledger technology, or DLT) is a digitized, decentralized ledger that can be public or private and holds records of transactions. The blockchain keeps growing as new ‘completed’ blocks (the most recent transactions) are recorded and added to it in chronological order. Each added block is linked to the one before it, much like the links of a chain, hence the name blockchain. Each new block contains a timestamp and data on the transactions. A blockchain can be private, handled by a company or organization with clear rules as to who within the organization has the authority to add new blocks, or public, with access open to all users.
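The chaining described above can be illustrated with a minimal sketch in Python. It is only a toy model under stated assumptions: the function names (`block_hash`, `make_block`, `append_block`, `is_valid`) are hypothetical, SHA-256 is assumed as the hash function, and there is no network, consensus or permission layer as in a real blockchain. Each block stores a timestamp, its transactions and the hash of the previous block, so altering an old block breaks the link to the block after it.

```python
import hashlib
import json
import time


def block_hash(block):
    """Return the SHA-256 hex digest of a block's contents."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_block(transactions, previous_hash, timestamp=None):
    """Build a block with a timestamp, transaction data and a link to its predecessor."""
    return {
        "timestamp": time.time() if timestamp is None else timestamp,
        "transactions": transactions,
        "previous_hash": previous_hash,
    }


def append_block(chain, transactions, timestamp=None):
    """Append a new block that points at the hash of the current last block."""
    # The first block has no predecessor, so it uses a placeholder hash (an assumption).
    previous_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append(make_block(transactions, previous_hash, timestamp))


def is_valid(chain):
    """Check that every block still points at the hash of the block before it."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["first block"])
append_block(chain, ["Alice pays Bob 5"])
append_block(chain, ["Bob pays Carol 2"])
print(is_valid(chain))  # True

chain[1]["transactions"] = ["Alice pays Bob 500"]  # tamper with an old block
print(is_valid(chain))  # False
```

The check fails after tampering because the third block still holds the hash of the original second block. This is the property that makes recorded transactions hard to rewrite unnoticed.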

Genome

The entire genetic information of an organism is known as its genome (this usually also includes, for example, mitochondrial genetic data). The building blocks of the genetic material are called nucleotides (A, C, G and T for DNA genomes), and together they make up all the chromosomes of an individual or a species. In almost all human cells, for example, the number of these building blocks reaches 6 × 10^9, and together they contain the instructions required to make the full range of the organism’s cell types, its tissues and the organism in its entirety.

Genome Sequencing

This is the process used to determine the DNA sequence of an organism’s genome. When the entire genome is sequenced, the process is called full or whole genome sequencing (FGS and WGS respectively). Partial genome sequencing that focuses only on the regions of the genome that code for proteins (the exons of genes) is called full exome sequencing, more commonly known as whole exome sequencing. Genomic data has revolutionised the fields of medicine and genetic research, and as the cost of sequencing drops, a significant proportion of the population is expected to have their genome sequenced.

How do blockchain and genetics connect?

The first step in bringing genetics and blockchain together is digitizing the genetic information (DNA). DNA is found in almost all of our body cells, enclosed in the cell’s nucleus, and must first be extracted. The easiest (and least painful) way to get DNA from a person is to collect a saliva sample (DNA can also be extracted from blood). The sample is collected in a special saliva collection kit and then sent to a DNA sequencing service provider. There the DNA molecule is read and converted into a digital file that can be sent back to the user.

This is only the beginning of the genetic journey a person can take to understand their genetic makeup. The next step is to find the hidden meaning in the billions of nucleotides of the genome. The digital file with the genetic information, which is huge (1.5 gigabytes after some “cleaning”), now has to be shared with different genetic service providers that can decipher the genetic code and enrich it with data related to your health care and wellbeing. This data may include, for example, information on your tendency to develop metabolic diseases, the way you metabolize drugs and your predisposition to various types of cancer. Storing this highly sensitive genetic information and then sharing it with third parties raises many data security challenges. Properly connecting the possibilities offered by blockchain infrastructure with genetic and genomic data opens up new opportunities for data sharing and management.

Imagine the blockchain as a highway system that connects the different participants in the genetic ecosystem: consumers, genetic labs, pharma companies, genetic counsellors, and health care and research institutes. On this imaginary highway, genetic distributed applications manage transactions controlled by genetic smart contracts, which enable the participants to share genetic information, provide or consume genetic services, and get paid or rewarded for participating in the ecosystem. Now add a few more important details: you can travel on these highways with a high level of anonymity, feel very secure, and control the manner in which your data is transmitted and shared. This is what the new world of genetic blockchain will look like.
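To make the idea of a genetic smart contract more concrete, here is a toy simulation in plain Python. It is a sketch under stated assumptions, not how any real platform works: actual smart contracts run on a blockchain rather than in a local Python process, and the class name `GeneticDataContract`, its methods, the pseudonymous identifiers and the fee amounts are all hypothetical. It only models the three behaviours described above: the data owner controls who may access the data, permitted parties can consume it, and the owner is rewarded when they do.

```python
class GeneticDataContract:
    """Toy in-memory stand-in for a genetic smart contract (hypothetical design)."""

    def __init__(self, owner):
        self.owner = owner      # pseudonymous ID of the person the data belongs to
        self.granted = set()    # participants the owner has allowed to access the data
        self.balances = {}      # rewards earned per participant
        self.log = []           # ordered record of every transaction

    def grant_access(self, caller, party):
        """Only the data owner may grant another participant access."""
        if caller != self.owner:
            raise PermissionError("only the owner can grant access")
        self.granted.add(party)
        self.log.append(("grant", party))

    def revoke_access(self, caller, party):
        """Only the data owner may withdraw a previously granted access."""
        if caller != self.owner:
            raise PermissionError("only the owner can revoke access")
        self.granted.discard(party)
        self.log.append(("revoke", party))

    def request_data(self, party, fee):
        """A permitted party pays a fee to use the data; the owner is rewarded."""
        if party not in self.granted:
            raise PermissionError(f"{party} has not been granted access")
        self.balances[self.owner] = self.balances.get(self.owner, 0) + fee
        self.log.append(("access", party, fee))
        return True


contract = GeneticDataContract(owner="user-7f3a")
contract.grant_access("user-7f3a", "lab-01")
contract.request_data("lab-01", fee=10)
print(contract.balances["user-7f3a"])  # 10
```

A party that was never granted access, or whose access was revoked, gets a `PermissionError` from `request_data`. That is the sense in which the end user, rather than a commercial party, stays in control of how the data is shared.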

What are the challenges for Genomic data?

The vision of the genetic blockchain is exciting, but before it turns into reality a few questions should be addressed. The first is size: the digital human genome is very large. One copy of the genome (our cells hold two such copies) takes up about 750 MB. Compared with the average size of the transactions occurring today on the Ethereum blockchain, genetic data is orders of magnitude larger.
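The 750 MB figure can be checked with a short back-of-the-envelope calculation. The sketch below assumes, as the article does not state it, that each nucleotide is stored in 2 bits (enough to distinguish A, C, G and T) with no compression or metadata, and that 1 MB means 10^6 bytes. The constant and function names are hypothetical.

```python
# Assumption (not stated in the article): each nucleotide (A, C, G or T)
# is stored in 2 bits, with no compression and no metadata.
BITS_PER_NUCLEOTIDE = 2

# About 6 x 10^9 nucleotides per cell, covering both copies of the genome.
NUCLEOTIDES_PER_CELL = 6 * 10**9
NUCLEOTIDES_PER_COPY = NUCLEOTIDES_PER_CELL // 2


def size_in_megabytes(nucleotides, bits_per_nucleotide=BITS_PER_NUCLEOTIDE):
    """Raw storage size in megabytes, taking 1 MB as 10**6 bytes."""
    return nucleotides * bits_per_nucleotide / 8 / 10**6


one_copy_mb = size_in_megabytes(NUCLEOTIDES_PER_COPY)
both_copies_mb = size_in_megabytes(NUCLEOTIDES_PER_CELL)
print(one_copy_mb)     # 750.0
print(both_copies_mb)  # 1500.0
```

Under these assumptions one copy of the genome comes to 750 MB and both copies to 1,500 MB, close to the 1.5 gigabyte file size quoted earlier, although the article does not say how that figure was derived.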

Building a state-of-the-art system of highways is useful only if there are enough cars to use it. To have more cars (that is, transactions), more people need to consume genetic services, and this can happen only when the prices of genetic services such as full genome sequencing and genetic testing decline. As the price of full genome sequencing plummets ($400 for non-clinical sequencing) and as genetic testing moves from wet tests to dry tests (based on algorithms that operate on the digital genetic code), the need for the solutions that blockchain can bring will increase significantly.

By Ofer A. Lidsky, Co-Founder of DNAtix