Show Me The Code: https://github.com/CovenantSQL/CovenantSQL

Author: Auxten Wang, Co-Founder of CovenantSQL

The idea of CovenantSQL arose in December 2017, on a cold, windy day when Jing Mi came to have dinner with me at a BBQ restaurant. He brought me an interesting idea: build a SQL database on Blockchain. I was excited about the idea and decided immediately to quit my job and start this project.

Past Experiences with Database

I first got to know databases when I joined Baidu, one of the biggest internet companies in China. The work I was proudest of at Baidu was applying ideas from the Dynamo distributed database to build a P2P transmission system, which looks like a toy now: auxten/Gingko @ GitHub. At that time, Jing Mi was working on a database called Doris for Baidu statistics, so I often asked him questions about databases, and we talked about MongoDB and MySQL. The Doris project developed very well at Baidu and entered the Apache Incubator this year.

Bitcoin

In 2012, I joined China’s top security company, 360.com, to develop security-related products, and I researched encryption algorithms and SETI@home. That is how I got to know “Bitcoin”. By chance, I read Friedrich Hayek’s book Denationalization of Money. Later, I was surprised to find that some of my friends who had invested in Bitcoin in the early days were also influenced by this book. At the time, I felt the idea was pure genius and immediately bought some bitcoin when the price was only ¥8.

I learned more and more about Bitcoin from then on: I tried automated trading and assembled Bitcoin mining machines (today, there are still a bunch of AMD graphics cards in my home). I even built a Bitcoin trading market with one of my friends (called 42BTC, hosted on GAE). I have been involved in basically the entire Bitcoin industry chain. Having lived through Bitcoin’s various collapses and its bear and bull markets over the years, I never seriously calculated how much money I made or lost.

Last year, the People’s Bank of China issued a variety of red-headed documents (official directives) prohibiting Chinese commercial banks from investing funds from Bitcoin exchanges. I was so tired of it all that I sold most of my BTC. The remaining hundreds of BTC were lost when Mtgox.com went bankrupt. Losses aside, I feel lucky to have been an early investor in Bitcoin. My biggest takeaway from that experience is that I no longer care about financial gains and losses as much as I did several years ago.

Thunder Crystal

I forget which year Jing Mi moved to Shenzhen (he used to live in Beijing, the city where I live). We were in touch less often after that, as we were both busy with work. I only knew that he was in charge of a project called “Thunder Crystal” at Xunlei, the biggest P2P tech company in China. Put simply, this is what the Thunder Crystal project is for:

The project enables you to share your idle broadband resources to cache popular TV shows for Netflix and YouTube, and make money.

One day in 2014, someone sent a photo to the group chat, and I recognized Jing Mi at a glance: he had taken part in Xunlei’s IPO and gone to NASDAQ.

jingmi@NASDAQ

My feelings were mixed:

1. Wow, I finally know someone who went to NASDAQ to ring the bell!

2. Well, Jing Mi must be financially free. Can we still happily discuss technical issues in the future?

3. Sigh, there is one more person with financial freedom in the world, but there may be one less great programmer…

Papers on Thunder Crystal published through ACM and IEEE can be found here:

https://ieeexplore.ieee.org/abstract/document/7762143/

https://dl.acm.org/author_page.cfm?id=99658692448

Something Remarkable

Jeff Hammerbacher said:

The best minds of my generation are thinking about how to make people click ads.

As one of the best minds, when you no longer need to work hard for a living, what do you want to do most?

The biggest risk is not taking any risk. The only strategy that is guaranteed to fail is not taking risks. — Mark Zuckerberg

I had become very excited about the idea of Bitcoin, bought a lot of bitcoins, lived through the price fluctuations, and finally ended up disappointed.

When the concept of Blockchain was abstracted from Bitcoin, I even thought about rewriting a version of BOINC with Blockchain’s ideas (BOINC is Berkeley’s volunteer computing framework, which lets users donate idle computing time to various scientific calculations, including “SETI@Home”). But in the end, I still couldn’t make up my mind…

In the winter of 2017, Jing Mi came to Beijing, and we met in a BBQ restaurant near Zhong Guan Cun. As in the old times, Jing Mi and I had a hot pot and scorned MongoDB for using BSON for storage, but this time he also came with an exciting idea. His idea reminded me of the 8th-grade syndrome signature that he put on our university BBS many years ago:

Pioneers have changed the world. Things are silently evolving. Let us change the world again.

Generally speaking, this is what the idea looks like:

Connect the idle resources and devices to form a distributed database that supports SQL queries through a set of Code Law. Users and database miners will be matched, and value will be exchanged, under the restrictions of Code Law.

With the birth of Ethereum, people became more excited and devout about blockchain, but this summer it ran into a big crisis. In an era when technology has a huge impact on human life, a lot of people wish to get rich overnight, but the collision of ideas takes time, the development of theory takes time, and coding and launching a project take time as well.

In the late 90s, the Internet boom led to bubbles and speculation. Now history is repeating itself, and the blockchain bubble might be even bigger. However, just as the internet bubble didn’t destroy companies such as Google, Amazon, PayPal and Netflix, the blockchain bubble won’t destroy companies that focus on developing their intrinsic value.

This time, I decided to make an all-out effort for an idea. Even if I fail, I hope we can leave behind something useful that the next generation of engineers can use to change the world. After the dinner with Jing Mi, I decided to quit my job and do something interesting with him.

Looking back, the reason we chose to build infrastructure comes partly from our little 8th-grade syndrome obsession with databases, but more from the hope of changing the current state of data & privacy protection:

On the next generation of the Internet, everyone should have complete Data Rights.

Data Rights

It’s safer to store your data on an offline computer, but it’s also easy to lose it by accident, and it is hard to look things up. Whether it is Facebook, WeChat, or one of the various cloud disks, the user’s data is almost always stored in a database controlled by a large company. The data belongs to you, but in the end it remains under the control of the big Internet companies, and you have to accept their terms of use.

The data on the Internet can be roughly divided into two categories:

Personal Data

Examples of personal data: personal identity & account information, private property, personally published content, and historical data on applications and websites.

Present situation: Privacy breaches, data mining abuse, and digital copyright infringement.

Our goal: Everyone should have control over reading and modifying personal data, as well as economic rights and authorization.

Public Data

Examples of public data: Wikipedia and other co-created works, the knowledge of human civilization, and various data shared with everyone.

Present situation: Public wikis are sometimes maliciously tampered with and lose their credibility; papers and documents are used improperly to earn money; producers of valuable data receive no benefits, and sometimes lose control of their copyright.

Our goal: The process of knowledge production often involves more than one individual, and each person’s contribution should be recorded.

Richard in “Silicon Valley” described a “decentralized Internet”. To build a decentralized internet, the critical step is to build a decentralized database. When the traditional database meets blockchain, the Insert, Update, and Delete operations on data all become Append. “Append instead of Overwrite” allows the history of the data to be fully recorded.
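To make “Append instead of Overwrite” concrete, here is a minimal Go sketch of an append-only store. It is an illustration only, not CovenantSQL’s storage engine; the `AppendLog` and `Record` types and their methods are names assumed for this example. Every insert, update, and delete is appended as a new record, and the current value is found by replaying the log.

```go
package main

import "fmt"

// Op is the kind of change a record describes.
type Op string

const (
	OpInsert Op = "INSERT"
	OpUpdate Op = "UPDATE"
	OpDelete Op = "DELETE"
)

// Record is one immutable entry in the log (hypothetical layout).
type Record struct {
	Seq   int
	Op    Op
	Key   string
	Value string
}

// AppendLog stores every change as a new record; nothing is overwritten.
type AppendLog struct {
	records []Record
}

// Append adds a change to the end of the log and returns its sequence number.
func (l *AppendLog) Append(op Op, key, value string) int {
	seq := len(l.records) + 1
	l.records = append(l.records, Record{Seq: seq, Op: op, Key: key, Value: value})
	return seq
}

// Get replays the log to find the current value of key.
func (l *AppendLog) Get(key string) (string, bool) {
	value, ok := "", false
	for _, r := range l.records {
		if r.Key != key {
			continue
		}
		switch r.Op {
		case OpInsert, OpUpdate:
			value, ok = r.Value, true
		case OpDelete:
			value, ok = "", false
		}
	}
	return value, ok
}

// History returns every record that ever touched key, oldest first.
func (l *AppendLog) History(key string) []Record {
	var out []Record
	for _, r := range l.records {
		if r.Key == key {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	store := &AppendLog{}
	store.Append(OpInsert, "name", "Alice")
	store.Append(OpUpdate, "name", "Alicia")
	store.Append(OpDelete, "name", "")

	_, ok := store.Get("name")
	fmt.Println("exists after delete:", ok)
	fmt.Println("history length:", len(store.History("name")))
}
```

Even after the delete, all three records remain in the history, which is the property the paragraph above describes. A real system would index the log rather than replay it on every read.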

Read-Only

To change the present situation, there is a long way to go. A decentralized database provides at least some possibilities for users to control their own data.

To give a simple example: in the future, our personal data can be stored in a decentralized cloud database. As with Bitcoin, we can completely control our data with a single key. We can develop a standard similar to the PCI DSS from the credit card industry. For now, we call it the GDSS (General Data Security Standard). The core is to require companies to strictly limit their use of the user’s data and delete it after use. For example, suppose Facebook is our trusted vendor. Following GDSS, we can give Facebook an authorized key that can only read our name, age, and friends list. At the same time, Facebook should make a record every time our data is read. If we find that Facebook has used our data to do something that we don’t like, we can revoke this key, make a complaint, or take legal action.
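The GDSS example above can be sketched as a small in-memory model in Go: a scoped read grant, an audit record for every read, and revocation. This is only an illustration under stated assumptions. The `Vault` type and its methods are hypothetical names, and a real system would enforce the grant cryptographically rather than with a map lookup.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDenied is returned when a holder has no valid grant for a field.
var ErrDenied = errors.New("access denied")

// AccessEntry is one line of the audit trail: who tried to read what.
type AccessEntry struct {
	Holder  string
	Field   string
	Allowed bool
}

// Vault is a hypothetical per-user data store with read-only grants.
type Vault struct {
	data   map[string]string
	grants map[string]map[string]bool // holder -> set of readable fields
	audit  []AccessEntry
}

// NewVault wraps the owner's data with an empty set of grants.
func NewVault(data map[string]string) *Vault {
	return &Vault{data: data, grants: make(map[string]map[string]bool)}
}

// Authorize gives holder read access to the listed fields only.
func (v *Vault) Authorize(holder string, fields ...string) {
	set := make(map[string]bool, len(fields))
	for _, f := range fields {
		set[f] = true
	}
	v.grants[holder] = set
}

// Revoke withdraws every permission previously given to holder.
func (v *Vault) Revoke(holder string) {
	delete(v.grants, holder)
}

// Read returns a field if holder is authorized, and logs the attempt either way.
func (v *Vault) Read(holder, field string) (string, error) {
	allowed := v.grants[holder][field]
	v.audit = append(v.audit, AccessEntry{Holder: holder, Field: field, Allowed: allowed})
	if !allowed {
		return "", ErrDenied
	}
	return v.data[field], nil
}

// Audit returns a copy of the access log for the data owner to inspect.
func (v *Vault) Audit() []AccessEntry {
	return append([]AccessEntry(nil), v.audit...)
}

func main() {
	v := NewVault(map[string]string{"name": "Alice", "age": "30", "email": "alice@example.com"})
	v.Authorize("facebook", "name", "age")

	name, _ := v.Read("facebook", "name")
	_, err := v.Read("facebook", "email") // outside the granted scope
	fmt.Println(name, err)

	v.Revoke("facebook")
	_, err = v.Read("facebook", "name")
	fmt.Println(err, len(v.Audit()))
}
```

The audit log records denied attempts as well as successful reads, so the owner can see both what was read and what was attempted.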

Another example: the biggest headache for start-ups and research organizations in the big data industry is that there is no data available. Users can authorize these institutions to use their data and receive compensation in return. This creates a win-win situation and avoids monopoly control of data by the companies that store it.

The European Union is leading the way in data protection regulation. The EU’s GDPR took effect this year and sets out clear standards to regulate the misuse of personal data.

Team Up

Before starting CovenantSQL, we had done a lot of projects in this field and understood that software development is a very complicated job. From our similar experiences, we reached a consensus:

1. Project development must have clear, quantifiable, fine-grained goals.

2. Do not perform a BIG rewrite. Refactoring must be done in modules and must ensure that the interfaces are consistent.

3. In a project’s early stage, write less code and finish the feature at hand rather than writing extra code now for possible future use.

In April 2018, the CovenantSQL team wrote the first line of code. We named the project CovenantSQL because the word covenant implies our vision: records shall not be tampered with without the owner’s agreement. The Chinese translation is “契约”. The English word “covenant” is not common among Chinese people. The last time it was widely mentioned was because of the movie Alien: Covenant (2017).

Architecture

Programmers who are familiar with the principles of distributed systems should know that, from the perspective of the CAP theorem, Blockchain is an eventually consistent algorithm. The PoW used by Bitcoin, the typical Blockchain 1.0 system, is mainly for dealing with untrusted networks and nodes. A lot of people have had a similar experience: “The fewer people involved, the more efficient the decision-making.”

Blockchain developers seem to be aware of the same problem. The new generation of Blockchain systems led by EOS uses DPoS, which works like a “cabinet”. We considered efficiency from the beginning when we designed CovenantSQL and used the layered architecture shown below:

CovenantSQL Three-Layer Design

The architecture of CovenantSQL is designed with three layers:

Global consensus layer (the main chain, the middle circle in the architecture diagram):

There will be only one main chain throughout the network.

Main Role: organizing database miners and matching them to users according to the smart contract, settling transactions, anti-cheating, recording the shard chain block hashes on the main chain, and other global consensus matters.

SQL consensus layer (shard chain, the circles on both sides):

Each database will have its own separate shard chain.

Main Role: signing, transmitting, and keeping consistent the various transactions of the database. The traceable data history is implemented here, and the hash lock is saved on the main chain.

Datastore layer:

Each Database has its own independent distributed engine.

Main Role: database storage & encryption, querying & signature, efficient indexing.

The three layers are hash-locked to each other to ensure that the data cannot be tampered with. From top to bottom, according to different requirements, different consensus algorithms and smaller consensus scopes are adopted to achieve higher consensus efficiency and performance.
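A minimal Go sketch of the hash-lock idea follows, assuming a much simpler block layout than the real one. Each shard block commits to the previous block and to the queries it carries, and the block hash is what would be recorded on the main chain. `ShardBlock`, `BuildChain`, and `Verify` are names invented for this illustration, not CovenantSQL APIs.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// ShardBlock is a simplified block on one database's shard chain.
type ShardBlock struct {
	PrevHash [32]byte // hash of the previous shard block
	Queries  []string // SQL statements carried by this block
}

// Hash covers the previous hash and every query, so changing either changes the result.
func (b ShardBlock) Hash() [32]byte {
	h := sha256.New()
	h.Write(b.PrevHash[:])
	var n [8]byte
	for _, q := range b.Queries {
		// Length-prefix each query so ["ab","c"] and ["a","bc"] hash differently.
		binary.BigEndian.PutUint64(n[:], uint64(len(q)))
		h.Write(n[:])
		h.Write([]byte(q))
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// BuildChain links batches of queries into shard blocks and returns the
// block hashes that would be recorded (anchored) on the main chain.
func BuildChain(batches [][]string) ([]ShardBlock, [][32]byte) {
	var blocks []ShardBlock
	var anchors [][32]byte
	var prev [32]byte
	for _, qs := range batches {
		b := ShardBlock{PrevHash: prev, Queries: qs}
		prev = b.Hash()
		blocks = append(blocks, b)
		anchors = append(anchors, prev)
	}
	return blocks, anchors
}

// Verify returns the index of the first block that breaks a hash lock, or -1.
func Verify(blocks []ShardBlock, anchors [][32]byte) int {
	var prev [32]byte
	for i, b := range blocks {
		if b.PrevHash != prev {
			return i
		}
		prev = b.Hash()
		if i >= len(anchors) || prev != anchors[i] {
			return i
		}
	}
	return -1
}

func main() {
	blocks, anchors := BuildChain([][]string{
		{"INSERT INTO t VALUES (1)"},
		{"UPDATE t SET v = 2"},
	})
	fmt.Println("intact:", Verify(blocks, anchors))

	blocks[0].Queries[0] = "INSERT INTO t VALUES (999)" // tamper with history
	fmt.Println("first bad block:", Verify(blocks, anchors))
}
```

Because the anchors live on a different chain with a wider consensus scope, rewriting a shard block also requires rewriting the main chain, which is what makes the lock useful.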

The so-called non-tamperable blockchain is not 100% non-tamperable in theory; it is considered non-tamperable because the cost of tampering increases drastically as the number of nodes increases. Different circumstances need different levels of data security, so in our design the user decides the number of replicas of the database. Of course, the more copies, the higher the cost. For a single database, assuming that the number of instances required by the database creator is N, then no fewer than N nodes will run the database, and only those N nodes need to maintain the full data of the database. The correspondence between databases and miners is anonymous to reduce the possibility of being attacked.

How We Work

If you look at the commit history of CovenantSQL, you will find that its construction was a bottom-up process. First, we developed independent modules and test tools. After more than four months of intensive development, we finally built the main program. The benefits of doing this are obvious: the unit test coverage of CovenantSQL’s modules is mostly above 75%, and many CovenantSQL modules can be used separately.

CovenantSQL test coverage

To simulate a network environment where CovenantSQL nodes are distributed all over the world, we also made a gadget called GNTE (Global Network Topology Emulator). By writing a YAML configuration and running a command, we can simulate a complicated network between containers for testing, for example a link from China to the United States: 1Mbps bandwidth, delay 1200ms, 1% probability ±10ms.

In the two months since its release, this project has gained 171 stars on GitHub.

GitHub: CovenantSQL/GNTE

Going forward, every new release of CovenantSQL will go through:

Automated unit testing with close to 80% line coverage

Various integration tests, including linear consistency testing

Integration testing in a global network environment simulated by GNTE

HSP is another gadget we built. It automatically generates a serialization method `func(v *Type) MarshalHash() ([]byte, error)` for the Go types we define. No matter how complex the struct is, as long as the stored content is the same, the resulting `[]byte` is the same.

The main principle is to parse the Go code into an AST (abstract syntax tree) and generate a different `MarshalHash` function for each type found in the AST. This avoids the slowdown of approximately two orders of magnitude caused by runtime reflection with `reflect`. CovenantSQL mainly uses this tool to generate code that computes block hashes.

Of course, even the unit test code is automatically generated. For more detailed usage, see CovenantSQL/HashStablePack
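To show what a stable `MarshalHash` method means in practice, here is a hand-written Go sketch for a hypothetical `BlockHeader` type. It is not HSP’s generated output, and the byte layout is an assumption made for this example; the point is only that a fixed field order, fixed-width integers, and length-prefixed strings give identical bytes for identical content, without using `reflect`.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// BlockHeader is a hypothetical type used only for this illustration.
type BlockHeader struct {
	Version   int32
	Producer  string
	Timestamp int64
}

// MarshalHash writes the fields in a fixed order with fixed-width integers
// and a length-prefixed string, so equal values always yield equal bytes.
func (h *BlockHeader) MarshalHash() ([]byte, error) {
	out := make([]byte, 0, 4+4+len(h.Producer)+8)
	var tmp [8]byte

	binary.BigEndian.PutUint32(tmp[:4], uint32(h.Version))
	out = append(out, tmp[:4]...)

	binary.BigEndian.PutUint32(tmp[:4], uint32(len(h.Producer)))
	out = append(out, tmp[:4]...)
	out = append(out, h.Producer...)

	binary.BigEndian.PutUint64(tmp[:], uint64(h.Timestamp))
	out = append(out, tmp[:]...)
	return out, nil
}

// HashOf returns the SHA-256 digest of the stable encoding.
func HashOf(h *BlockHeader) ([32]byte, error) {
	b, err := h.MarshalHash()
	if err != nil {
		return [32]byte{}, err
	}
	return sha256.Sum256(b), nil
}

func main() {
	a := &BlockHeader{Version: 1, Producer: "node-1", Timestamp: 1530000000}
	b := &BlockHeader{Version: 1, Producer: "node-1", Timestamp: 1530000000}
	ha, _ := HashOf(a)
	hb, _ := HashOf(b)
	fmt.Println("same content, same hash:", ha == hb)

	b.Timestamp++
	hb, _ = HashOf(b)
	fmt.Println("changed content, same hash:", ha == hb)
}
```

Writing such methods by hand for every type is tedious and error-prone, which is the motivation for generating them from the AST.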

DH-RPC is a secp256k1-ECDH-AES encrypted P2P RPC framework for decentralized applications written in golang.

CovenantSQL is built on DH-RPC, including:

Byzantine Fault Tolerance consensus protocol Kayak

Consistent Secure DHT

DB API

Metric Collect

Blocks sync

For demo code, see Example

Features

75%+ code testing coverage.

100% compatible with Go net/rpc standard.

ID-based routing and Key exchange built on Secure Enhanced DHT.

Use MessagePack for serialization, which supports most types without writing `Marshal` and `Unmarshal`.

Crypto Schema

Use Elliptic Curve Secp256k1 for Asymmetric Encryption

ECDH for Key Exchange

PKCS#7 for padding

AES-256-CBC for Symmetric Encryption

Private key protected by the master key

Anonymous connections are also supported

DHT persistence layer has 2 implementations:

BoltDB based simple traditional DHT

Kayak based 2PC strong consistent DHT

Connection pool based on Yamux, which multiplexes thousands of connections over one TCP connection.

see: CovenantSQL/DH-RPC
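The crypto schema listed above (ECDH key exchange, PKCS#7 padding, AES-256-CBC) can be sketched with the Go standard library alone. Two things here are assumptions, not DH-RPC’s actual code. First, the standard library has no secp256k1 curve, so this sketch swaps in P-256 from `crypto/ecdh` (Go 1.20 or later). Second, deriving the AES key by hashing the shared secret with SHA-256 is a choice made for this illustration. All function names are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/ecdh"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

// pkcs7Pad appends n bytes of value n so the length is a multiple of blockSize.
func pkcs7Pad(data []byte, blockSize int) []byte {
	n := blockSize - len(data)%blockSize
	padded := append([]byte{}, data...)
	return append(padded, bytes.Repeat([]byte{byte(n)}, n)...)
}

// pkcs7Unpad validates and strips PKCS#7 padding.
func pkcs7Unpad(data []byte, blockSize int) ([]byte, error) {
	if len(data) == 0 || len(data)%blockSize != 0 {
		return nil, errors.New("invalid padded length")
	}
	n := int(data[len(data)-1])
	if n == 0 || n > blockSize || !bytes.HasSuffix(data, bytes.Repeat([]byte{byte(n)}, n)) {
		return nil, errors.New("invalid padding")
	}
	return data[:len(data)-n], nil
}

// sharedKey runs ECDH and hashes the shared secret into a 32-byte AES-256 key.
func sharedKey(priv *ecdh.PrivateKey, peer *ecdh.PublicKey) ([]byte, error) {
	secret, err := priv.ECDH(peer)
	if err != nil {
		return nil, err
	}
	key := sha256.Sum256(secret)
	return key[:], nil
}

// encrypt returns a random IV followed by the AES-256-CBC ciphertext.
func encrypt(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	padded := pkcs7Pad(plaintext, aes.BlockSize)
	out := make([]byte, aes.BlockSize+len(padded))
	if _, err := rand.Read(out[:aes.BlockSize]); err != nil {
		return nil, err
	}
	cipher.NewCBCEncrypter(block, out[:aes.BlockSize]).CryptBlocks(out[aes.BlockSize:], padded)
	return out, nil
}

// decrypt reverses encrypt: split off the IV, decrypt, then remove the padding.
func decrypt(key, msg []byte) ([]byte, error) {
	if len(msg) < 2*aes.BlockSize || len(msg)%aes.BlockSize != 0 {
		return nil, errors.New("malformed ciphertext")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	plain := make([]byte, len(msg)-aes.BlockSize)
	cipher.NewCBCDecrypter(block, msg[:aes.BlockSize]).CryptBlocks(plain, msg[aes.BlockSize:])
	return pkcs7Unpad(plain, aes.BlockSize)
}

// roundTrip lets two fresh key pairs agree on a key and exchange one message.
func roundTrip(text string) (string, error) {
	alice, errA := ecdh.P256().GenerateKey(rand.Reader)
	bob, errB := ecdh.P256().GenerateKey(rand.Reader)
	if errA != nil || errB != nil {
		return "", errors.New("key generation failed")
	}
	kA, errA := sharedKey(alice, bob.PublicKey())
	kB, errB := sharedKey(bob, alice.PublicKey())
	if errA != nil || errB != nil {
		return "", errors.New("key agreement failed")
	}
	sealed, err := encrypt(kA, []byte(text))
	if err != nil {
		return "", err
	}
	opened, err := decrypt(kB, sealed)
	return string(opened), err
}

func main() {
	got, err := roundTrip("SELECT 1")
	fmt.Println(got, err)
}
```

Both sides derive the same key without ever sending it over the wire. Note that CBC on its own does not authenticate the ciphertext, so a production protocol would normally add a MAC or signature on top.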

So far, we have completed about 90% of the main functions. The test network is also under construction. For more updates, please follow us on Github: