Blockchain also creates an opportunity to reach consensus on a global truth: everybody agrees on the account balance of a certain address, or that something put on the blockchain will stay there forever (you can point to the timestamp of that particular element). Git doesn’t really do that. Git is, in fact, a radically decentralized vision of a protocol.

When Linus Torvalds first created Git, there was no GitHub and no centralized authority for managing things. The idea of Git, therefore, is that there is no central authority. On the blockchain, by contrast, a global truth exists: once everybody agrees on a piece of content, it is updated and synchronized accordingly. What we aim to do is combine the radical decentralization and flexibility of Git with the kind of consensus we get from the blockchain.

Let’s use an example that has nothing to do with Git or blockchain. Think about word processing:

When you write an article about the “Ultimate Snowboard Experience”, you use a word processor like Microsoft Word. With that tool, you create content, which is encoded in a Word document (.docx). Distribution consists of adding that document to a shared folder, such as a Dropbox folder; once it is synced or uploaded, it is considered distributed. This could happen through a mailing list too. That’s the basic idea of content creation and distribution: you use a tool to create an artifact (some sort of bundle of text, images, etc.); then you move that bundle to a place where it becomes accessible to everyone you want to share it with, or people get notified that they can download the document to their computers.

To get closer to modern Web development, imagine this: instead of Microsoft Word, you use a Web page editor, where you can type, drag and drop, or write code to insert your text. But a saved Web page is not as atomic as a .docx file. It is a folder of HTML, CSS, and JavaScript files. Together, these interlinked resources render the page (this special fare to visit Norway for $250, for example).

When you’re done, you upload that folder to a Web server for a site like visitnorway.deals, run by a Web host you pay $6 a month. There’s an FTP account: you type in your username and password, and that’s how you distribute your work.

However, this last step, the way content is delivered for distribution, has changed. The Web page editor (which could be a Web app or a desktop app) still creates the same HTML and CSS files; but instead of pushing them directly to the website by uploading them to an FTP server, you now check in your code (your HTML and CSS) to a Git repository hosted on GitHub, a shared folder system that tracks version history. Once you have checked in your code, a CI (Continuous Integration) service automatically detects the new check-in on the official master branch, bundles everything together, and deploys it to the cloud or to a server.

Unlike an FTP server, which simply stores whatever files were uploaded last, GitHub is built on a version control system. Anything you work on locally is managed by Git. Every time you add a file, you can keep track of which files you’re changing, which items you’re adding, and which images you’re editing. Each commit records one such set of changes as an atomic unit, and when you push, Git transfers those commits together. Git thus acts like a chain: every commit is identified by a cryptographic hash that also covers the hash of its parent, so the entire history of your files is tamper-evident. That integrity is preserved when you push to GitHub.com (a great website where developers put their code; it’s the social network for open- and even closed-source development). GitHub becomes a clearinghouse for the things people create on their own computers in version-controlled folders managed by Git. Then, as a single source of truth, it helps you deploy your content to a website.
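The “chain” in Git can be illustrated with a small hash chain. This is a minimal sketch under a simplified, hypothetical commit format (`commit_id` and `build_history` are illustrative names, not Git APIs): each id is a SHA-1 over the snapshot plus the parent’s id, so changing an early version changes every id after it.

```python
import hashlib
from typing import List, Optional

def commit_id(parent: Optional[str], snapshot: str) -> str:
    # Simplified, hypothetical commit format: real Git commits also hash a tree id,
    # author, committer, and message, but the parent link works the same way.
    text = f"parent {parent or ''}\n{snapshot}"
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def build_history(snapshots: List[str]) -> List[str]:
    ids: List[str] = []
    parent: Optional[str] = None
    for snapshot in snapshots:
        parent = commit_id(parent, snapshot)  # each id covers its parent's id
        ids.append(parent)
    return ids

original = build_history(["v1: draft", "v2: add photos", "v3: fix typo"])
tampered = build_history(["v1: DRAFT", "v2: add photos", "v3: fix typo"])
# Editing only the first snapshot changes every id after it, too.
print([a == b for a, b in zip(original, tampered)])  # [False, False, False]
```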

GitHub vs. Gitchain

Structurally, GitHub sits at the center, and creators (makers or coders) are just updating the Git Repository. They either update it on their computers first and then push it, or they update it directly on GitHub.com. That information gets sucked into GitHub, and that’s where the workflow begins.

GitHub is a very powerful network of people. They come to the centralized infrastructure of GitHub to do not only coding, but increasingly collaborating on content, data sets, specifications, documents, rules, regulations, and legal clauses. From a decentralization/blockchain-oriented perspective, this seems a bit uncomfortable. Does it make sense for everything to be routed through GitHub? The original version of Git, as Linus intended it, was very decentralized. So, the question we ask is: Could GitHub — this important position in the center — be replaced with something that is more like a protocol and less like a service?

That’s where Gitchain comes in: it allows creators to connect with each other, using a combination of the Git protocol and a blockchain system, so that they don’t need a centralized entity to coordinate what they do.

Let’s simplify this by looking at two different repositories. One is owned by one person; the other one could be a company, a partner, or another person.

Almost anything that is a file can be managed in a Git Repository: content in HTML, data in CSV, content or data in JSON, as well as code (such as JavaScript), markup, and configuration files. All of these can be checked into a Git Repository the same way they can be copied into a folder, except that Git keeps track of every change to every file, building up a content-addressable store that lets you go back to any previous version.
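“Content-addressable” means each stored version is named by a hash of its contents. As a sketch, assuming a repository using Git’s default SHA-1 object format, this reproduces the id Git assigns to a file’s contents (a “blob”): the SHA-1 of a short header plus the raw bytes. The function name `git_blob_id` is our own.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the object id Git assigns to a file's contents (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, the same id `git hash-object` reports
```

Because the id depends only on the bytes, identical files get identical ids, and any edit produces a new one.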

Let’s say you want to make this set of things (your website, proposal, or other data set) available to another person. Right now, you have to upload it to GitHub. Instead, you could push it to Gitchain. Gitchain carries the whole bundle to the other side and allows another Git Repository, client, computer, or server to download all the information needed to recreate the content, data, and code. This is what we call synchronization: you’re basically making a copy of the whole thing. Gitchain has a hollow middle (meaning: no centralized entity!); instead, data travels between the participants at the edges over a distributed network.

Moving parts of Gitchain: Distributed Storage & Distributed Ledger

Two basic sets of infrastructure underlie what we call Gitchain: Distributed Storage & Distributed Ledger. Essentially, it’s all about pushing and syncing between two repos.

If you have a 400-megabyte file, you can’t put that on the blockchain; it’s too big. So, you put it (along with all the changes, metadata, captions, and descriptions) into something called a “packfile”. By default, Git stores every version of every file as a separate object in its folder structure, which becomes very inefficient for large amounts of data. So, when you have to copy 4,000 files, you can create a packfile, which is like a zip file, except that it uses Git’s own format and can compress similar versions against each other. Once you have a packfile, you put it in Distributed Storage, which could be a shared drive in the cloud, a distributed file system like IPFS, or another type of blockchain-driven storage system.
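A packfile is a single container in a documented Git format. It starts with a 12-byte header: the signature `PACK`, a version number (2 or 3), and the number of objects it holds, all big-endian. The sketch below only parses that header; the object count (4,000 here) is illustrative, and it counts Git objects (blobs, trees, commits), not files. A real packfile continues with the compressed objects and ends with a SHA-1 checksum.

```python
import struct
from typing import Tuple

def read_pack_header(data: bytes) -> Tuple[int, int]:
    """Parse the 12-byte header at the start of a Git packfile."""
    if len(data) < 12:
        raise ValueError("too short to be a packfile")
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("missing PACK signature")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return version, count

# A header-only example; a real packfile would continue with the objects themselves.
header = struct.pack(">4sII", b"PACK", 2, 4000)
print(read_pack_header(header))  # (2, 4000)
```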

Now, you have to tell someone where this packfile is, and that’s where the reference comes in. You put a record on the Distributed Ledger, which is limited in size, saying, “Whoever finds this link in the Distributed Ledger can find the packfile; and with the packfile, they can recreate the original content, data, and/or code, or a mixture.” By pushing it, you create the reference on the ledger. When the other person sees the reference, they can look up the packfile, download it through a Distributed Storage mechanism, and unpack it; then, they can sync and recreate the Git Repository.
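The consumer side of this exchange can be sketched with plain dictionaries standing in for the Distributed Ledger and Distributed Storage. Everything here is hypothetical (`LedgerRecord`, `fetch_verified`, the `store/abc` key); the point is that the on-chain record carries a hash, so the downloader can check that the packfile it received is the one that was referenced before handing it to Git.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class LedgerRecord:
    """Hypothetical on-chain reference: where the packfile lives and its expected hash."""
    storage_key: str
    sha256: str

def fetch_verified(ref: str, ledger: Dict[str, LedgerRecord],
                   storage: Dict[str, bytes]) -> bytes:
    record = ledger[ref]                    # see the reference on the ledger
    packfile = storage[record.storage_key]  # download it from Distributed Storage
    if hashlib.sha256(packfile).hexdigest() != record.sha256:
        raise ValueError("packfile does not match the hash recorded on the ledger")
    return packfile                         # ready to hand to Git for unpacking

# Producer: store the packfile off-chain, then publish a small reference on-chain.
pack = b"PACK stand-in bytes"
storage = {"store/abc": pack}
ledger = {"refs/heads/main": LedgerRecord("store/abc", hashlib.sha256(pack).hexdigest())}
print(fetch_verified("refs/heads/main", ledger, storage) == pack)  # True
```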

The process of figuring out how to pack the packfile, decode it, and reconstruct its history is done by Git. This protocol just works. Developers rely on it to make sure every single line of code that exists in a repository shows up on the website exactly. So, instead of creating your own protocol to synchronize Distributed Ledger and Storage, why not use one that has already been proven? Let Git figure out how to unwrap packfiles for you.

The key is that we store only tiny bits of data on-chain. More importantly, we store a constant amount of data for each packfile, because we don’t need to record the number of files involved, the number of changes that have been made, or the number of people who made those changes. Git condenses all of that information into the packfile. All you have to do is tell the other person where the packfile is, plus perhaps some logic about who can access it. The majority of the data stays off-chain, meaning not on the Distributed Ledger.
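The “constant amount of data” claim is easy to see with a fixed-width record. This sketch assumes a hypothetical layout of our own: a 32-byte SHA-256 of the packfile (usable as its address in content-addressed storage) followed by the 20-byte id of the newest commit. Access-control logic, which the text mentions as optional, is left out.

```python
import hashlib
import struct

# Hypothetical fixed layout: 32-byte SHA-256 of the packfile + 20-byte tip commit id.
RECORD_FORMAT = ">32s20s"

def make_record(packfile: bytes, tip_commit_hex: str) -> bytes:
    digest = hashlib.sha256(packfile).digest()
    tip = bytes.fromhex(tip_commit_hex)
    return struct.pack(RECORD_FORMAT, digest, tip)

tip = "0123456789abcdef0123456789abcdef01234567"  # placeholder commit id
small = make_record(b"x" * 1_000, tip)
large = make_record(b"x" * 5_000_000, tip)
print(len(small), len(large))  # 52 52
```

However large the packfile grows, the on-chain part stays at 52 bytes in this layout.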

It can live in the Distributed Storage, which can handle megabytes, gigabytes, and in some cases terabytes of data. If you split the packfile into further chunks, you can store any amount of data off-chain and use tiny bits of on-chain data as instructions for the Git repositories to recreate that content, data, and code.
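Chopping up a packfile can be sketched as fixed-size chunking with content-addressed keys. The helper names and the 4,096-byte chunk size are illustrative assumptions; real storage networks such as IPFS use their own chunking schemes. Only the short list of chunk hashes (the manifest), or a hash of it, would need to go on-chain.

```python
import hashlib
from typing import Dict, List

def chunk_and_store(packfile: bytes, storage: Dict[str, bytes], chunk_size: int) -> List[str]:
    """Split a packfile into chunks, store each under its SHA-256, return the manifest."""
    manifest = []
    for offset in range(0, len(packfile), chunk_size):
        chunk = packfile[offset:offset + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        storage[key] = chunk  # content-addressed: the key is the chunk's hash
        manifest.append(key)
    return manifest

def reassemble(manifest: List[str], storage: Dict[str, bytes]) -> bytes:
    return b"".join(storage[key] for key in manifest)

storage: Dict[str, bytes] = {}
pack = bytes(range(256)) * 40  # 10,240 stand-in bytes
manifest = chunk_and_store(pack, storage, chunk_size=4096)
print(len(manifest), reassemble(manifest, storage) == pack)  # 3 True
```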

Technical layers

Looking at this from a technical perspective, we have a Distributed Ledger (or Blockchain) layer and a Distributed Storage layer, both of which are Layer 1 protocols. They sit at the bottom, providing flexible, content-addressable ledger and storage services. Your Git Repository is an abstraction sitting on top of them: it lets you open a folder and see things in clear text instead of inspecting the blockchain or looking inside a packfile.

Layer 2 is the Distributed Sync and Syndication system, which uses Git as its synchronization protocol. When you write a bunch of files, you pack them together and connect the versions via hashes (as the Wikipedia article suggests). Then you upload the packfile into Distributed Storage. Once you know where it is stored, you record the hash and the URL of that packfile on-chain, using the Distributed Ledger. That is how a consumer can find the packfile, download it, and unpack it. All it takes are four actions: pack & upload the packfile, record it on-chain, decode the on-chain record, then download & unpack the packfile.
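The four actions can be strung together in a toy end-to-end sketch. It assumes in-memory dictionaries in place of the Layer 1 services, a hypothetical `storage://` URL scheme, and compressed JSON as a stand-in for a real Git packfile; in Gitchain proper, Git itself would build and unpack the packfile.

```python
import hashlib
import json
import zlib
from typing import Dict

storage: Dict[str, bytes] = {}          # stand-in for Distributed Storage
ledger: Dict[str, Dict[str, str]] = {}  # stand-in for the Distributed Ledger
PREFIX = "storage://"                   # hypothetical URL scheme

def push(repo: str, files: Dict[str, str]) -> None:
    # 1. Pack & upload: compressed JSON stands in for a real Git packfile.
    packfile = zlib.compress(json.dumps(files, sort_keys=True).encode("utf-8"))
    digest = hashlib.sha256(packfile).hexdigest()
    storage[digest] = packfile
    # 2. Record it on-chain: only the hash and the URL, never the packfile itself.
    ledger[repo] = {"sha256": digest, "url": PREFIX + digest}

def pull(repo: str) -> Dict[str, str]:
    # 3. Decode the on-chain record to learn where the packfile lives.
    record = ledger[repo]
    key = record["url"][len(PREFIX):]
    # 4. Download & unpack the packfile to recreate the files.
    packfile = storage[key]
    return json.loads(zlib.decompress(packfile).decode("utf-8"))

push("visitnorway.deals", {"index.html": "<h1>Norway for $250</h1>", "style.css": "h1 { color: navy; }"})
print(pull("visitnorway.deals")["index.html"])  # <h1>Norway for $250</h1>
```

Note that the ledger entry holds only two short strings, while the file contents live entirely in storage.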

On Layer 3, the Distributed User Hubs, users see what’s in the Git Repository. Most developers will just send you to their GitHub to check out what they do; Git, text files, and code make sense to them. But if you are not a developer, you have no idea what that means, even though it could be something as simple as a list of websites, articles, or books. Most people don’t know how to read markup, HTML, or JSON.