In this guide we’ll create a smart contract on Ethereum that notarizes git commits only if the commit date falls within an allowed window of time. The contract also lets anyone verify that a commit was published, using merkle proofs built from the commit hash log. We’ll then use a git pre-push hook to publish the commit on-chain on every tagged release.

Here are some schematics to help visualize the processes:

i. Tagging release

ii. Publishing to Ethereum via git pre-push hook

iii. Verifying git commit via merkle proofs

The problem

When looking at the commit history of a git repository, you can never be certain that the commits were actually made on the dates they claim. Let’s go through a quick example to demonstrate. First we’ll initialize a new git repository and commit as usual:

$ git init
Initialized empty Git repository in /tmp/example-repo/.git/
$ touch README.md
$ git add .
$ git commit -m "init"
[master (root-commit) 98f2e97] init
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 README.md
$ git log
commit 98f2e9701776e4a861a0d5eff2404ebe5db2633b (HEAD -> master)
Author: Miguel Mota <hello@miguelmota.com>
Date:   Sun Jul 28 15:58:34 2019 -0700

    init

Everything looks as expected. Now let’s backtrack a little and use the --date flag when committing to set a custom commit date:

$ git commit -m 'init' --date='10 days ago'
[master (root-commit) fd6b209] init
 Date: Sun Jun 16 16:00:50 2019 -0700
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 README.md
$ git log
commit fd6b209b8bb4ff6e1383ef068c444bfc295a09c2 (HEAD -> master)
Author: Miguel Mota <hello@miguelmota.com>
Date:   Sat Jun 06 16:00:50 2019 -0700

    init

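Now it appears as if the commit happened 10 days ago! Note that --date only rewrites the author date, which is what git log shows by default; the committer date can be forged just as easily by setting the GIT_COMMITTER_DATE environment variable.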

Anyone can make up the dates so that a commit appears to have been created a while back, so you can never really be sure they’re telling the truth. But if commits are notarized on-chain, then you don’t have to trust; just verify!

One example of why checkpointing matters is being able to know that a commit was signed with a GPG key before that key expired or was revoked. Say someone signs their commits with a GPG key that is set to expire in a month, and after that month the key gets compromised. The thief can still produce signatures with the expired key by backdating their hardware or system clock (for example with libfaketime), making the backdated commits look like legitimate commits signed while the key was valid. Because the entire commit object, including the date, is published, the smart contract can accept only commits dated within a day, or even a few hours, of the current block time, so backdated commits cannot be published on-chain. This checkpointing helps prove the authenticity of commits if the need ever arises, since each checkpointed commit is pegged to the date the committer claims, as enforced by the smart contract.

Although it could, the contract won’t store the commit data and will simply emit events. This keeps the state trie light, and the data is still available off-chain via the event logs. Cost stays low since it’s suggested to publish only on major/minor tagged releases.

Smart contract

There needs to be a way to notarize commits in a way that the commit date can also be verified to be current during notarization, meaning that the notary shouldn’t allow commits older than a small window from the current time. This is actually pretty trivial to implement in a smart contract as we’ll see.

Before proceeding let’s look at what a commit consists of. The commit object field components are:

tree hash

parent hash, or hashes

author meta string

committer meta string

gpgsig optional signature

message of commit

The tree is a merkle root of the committed file objects.

The parent references the parent commit, and sometimes there are multiple parents, for example when merging without fast-forwarding.

The author contains meta data about the author such as name, email, timestamp and timestamp timezone.

The committer contains meta data about the committer such as name, email, commit timestamp and timestamp timezone.

The gpgsig is the signature of the signed commit by the committer if gpg signing is enabled.

The last portion of the commit is the actual commit message which may span multiple lines.

An actual example of the commit will look like this:

$ git cat-file -p HEAD
tree 00dd089c310aea2b821d23ea0f1a6a6235ad165c
parent 32f04c7f572bf75a266268c6f4d8c92731dc3b7f
author Miguel Mota <hello@miguelmota.com> 1560727622 -0700
committer Miguel Mota <hello@miguelmota.com> 1560727622 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkA8ilwQdtHsQgZEbZ+wRYViKAPkFAl0G0EYACgkQZ+wRYViK
 APmWdxAAgWKOQpz1/QbzYxOXQZ5uT8lTVnxw8HN4KZaZ36ehFTxzRVO0IEniJGr4
 5+sVskMDbkP/aKQyS/UUmeXKGeYQT4Kpwvtih5CZHSLNO2LQ8pz5o0wjWK8OHmx7
 pGuAd83gMqfnQF7+KDqpxqHR63NDmuRo4QQ18rolga16Md4wIRzFNU+JsX7WVIcD
 zPd6PzotAkGD+suqMiYt6ka6cqQT9lB8WN5L88Kdyy8mFsEu7YZBVkWQqB4YCLgu
 7s3vaSuJ9NIGtT3C1Kd2lmEsrDZj84bmEHaP8aOdLAstucNrl8/wSo3NQFeydALQ
 WBpiDhFY2jOYSxcuwI+ZYfeizztr4qGXUaI2VM7/HYSChmWzyvghmP/XmZJDwCKk
 dDgyNSxEjWfM6GD1fmPulvU2MZKabqv6juHpETsPNPdpw7u+Z7om8s2G66erMliU
 WQOfwE4lFcBF0oVoJp2FQQYcme4BERDsqUWJ8C60PW0FELuZlAWRRcUIl49M6gXa
 sSTNfIXubA2LxjHQFS7hy+9+N1Dl1AFcQZP+Md8ai8B4JfDgswf+m1OVuOihDECa
 bMotWVZ+qHeycly9RihDkCas8ICPlCIGZ6PmAPnsMr5Ruzt9oaKuZ5UInB6IRx2k
 H7510dWvJLLZ7w1r78UWdyiT4DH5xRuqQJ8F7erOmtPw5lCmKto=
 =OZk+
 -----END PGP SIGNATURE-----

add license
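To make the field layout concrete, here is a minimal Node.js sketch that splits a raw commit object into the fields listed above. The helper names `parseRawCommit` and `parseIdentity` are hypothetical and exist only for this illustration (the hook later in this guide uses the git-parse-commit package instead). It assumes the standard layout: header lines, an empty line, then the message, with continuation lines of multi-line headers starting with a single space.

```javascript
// Sketch: split a raw commit object (as printed by `git cat-file -p`) into fields.
// parseRawCommit and parseIdentity are hypothetical helpers for illustration.
function parseRawCommit (raw) {
  // Headers and message are separated by the first empty line.
  const sep = raw.indexOf('\n\n')
  const headerText = sep === -1 ? raw : raw.slice(0, sep)
  const message = sep === -1 ? '' : raw.slice(sep + 2)

  const fields = { tree: '', parents: [], author: '', committer: '', gpgsig: '', message }
  let lastKey = null
  for (const line of headerText.split('\n')) {
    if (line.startsWith(' ')) {
      // Continuation of a multi-line header such as gpgsig.
      if (lastKey) fields[lastKey] += '\n' + line.slice(1)
      continue
    }
    const space = line.indexOf(' ')
    if (space === -1) continue
    const key = line.slice(0, space)
    const value = line.slice(space + 1)
    if (key === 'parent') {
      fields.parents.push(value) // a merge commit has several parent lines
      lastKey = null
    } else {
      fields[key] = value
      lastKey = key
    }
  }
  return fields
}

// Split "Name <email> timestamp timezone" into its parts.
function parseIdentity (meta) {
  const match = /^(.*) <(.*)> (\d+) ([+-]\d{4})$/.exec(meta)
  if (!match) throw new Error(`unexpected identity format: ${meta}`)
  const [, name, email, timestamp, timezone] = match
  return { name, email, timestamp: Number(timestamp), timezone }
}

// Unsigned variant of the commit shown above, used as sample input.
const sample = [
  'tree 00dd089c310aea2b821d23ea0f1a6a6235ad165c',
  'parent 32f04c7f572bf75a266268c6f4d8c92731dc3b7f',
  'author Miguel Mota <hello@miguelmota.com> 1560727622 -0700',
  'committer Miguel Mota <hello@miguelmota.com> 1560727622 -0700',
  '',
  'add license',
  ''
].join('\n')

const parsed = parseRawCommit(sample)
console.log(parsed.tree, parsed.parents, parseIdentity(parsed.committer))
```

The committer timestamp extracted this way is the value the contract later compares against the block time.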

The commit hash is a SHA-1 of the commit object contents. In pseudocode, it’s computed like this:

SHA1(tree, parent, author, committer, signature, message)

An example of the commit hash for the shown commit object:

$ git rev-parse HEAD
d89f84d948796605a413e196f40bce1d6294175d

In the smart contract we need to have the committer publish the entire contents of the commit object instead of just the hash in order to be able to verify the commit date they’re claiming.

First we can represent the commit object in solidity as a struct:

struct Commit {
  string tree;
  string[] parents;
  string author;
  uint256 authorDate;
  string authorDateTzOffset;
  string committer;
  uint256 commitDate;
  string commitDateTzOffset;
  string message;
  string signature;
}

And specify a mapping to store the commit checkpoints:

mapping(bytes20 => uint256) public checkpoints;

The meat of the contract is the checkpointing functionality. It should accept the commit object and verify that the commit date is within a 24 hour window from the current block time.

function checkpoint(Commit calldata _commit) external returns (bytes20 commitHash) {
  require(_commit.commitDate <= now + 24 hours);
  require(_commit.commitDate > now - 24 hours);
  // ...
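The same window check can be mirrored off-chain, for example to skip a transaction that is bound to revert. The following is a small sketch under that assumption; `isWithinWindow` and `DAY_SECONDS` are hypothetical names, and the local clock only approximates the block timestamp that the contract actually uses.

```javascript
const DAY_SECONDS = 24 * 60 * 60

// Mirrors the two require() checks in checkpoint():
//   commitDate <= now + window  and  commitDate > now - window
function isWithinWindow (commitDate, now, windowSeconds = DAY_SECONDS) {
  return commitDate <= now + windowSeconds && commitDate > now - windowSeconds
}

// Local clock as a stand-in for the block timestamp (an approximation).
const now = Math.floor(Date.now() / 1000)

console.log(isWithinWindow(now, now)) // true
console.log(isWithinWindow(now - 10 * DAY_SECONDS, now)) // false: backdated by 10 days
```

Note that the lower bound is strict: a commit dated exactly 24 hours before the block time is rejected, while one dated exactly 24 hours ahead is accepted.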

Next it should construct the commit hash from the commit data by concatenating the fields into their proper format:

  // ...
  string memory treeStr = concat("tree ", _commit.tree, "\n", "", "", "", "");
  string memory parentsStr;
  for (uint256 i = 0; i < _commit.parents.length; i++) {
    parentsStr = concat(parentsStr, "parent ", _commit.parents[i], "\n", "", "", "");
  }
  string memory authorStr = concat("author ", _commit.author, " ", uint2str(_commit.authorDate), " ", _commit.authorDateTzOffset, "\n");
  string memory committerStr = concat("committer ", _commit.committer, " ", uint2str(_commit.commitDate), " ", _commit.commitDateTzOffset, "\n");
  string memory signatureStr = "";
  if (bytes(_commit.signature).length > 0) {
    signatureStr = concat("gpgsig ", _commit.signature, "", "", "", "", "");
  }
  string memory messageStr = concat("\n", _commit.message, "", "", "", "", "");
  string memory data = concat(treeStr, parentsStr, authorStr, committerStr, signatureStr, messageStr, "");
  // ...

After concatenating the fields into the proper format, the data must be prefixed with the commit label, followed by the length of the data and a null byte, to generate the commit id hash. We’ll be using the SHA1.sol library, which implements SHA-1 in Solidity:

  // ...
  commitHash = SHA1.sha1(abi.encodePacked("commit ", uint2str(strsize(data)), byte(0), data));
  // ...
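To sanity-check the format off-chain, the serialization and hashing steps can be reproduced in Node.js with the built-in crypto module. This is only a sketch: `serializeCommit` and `gitObjectHash` are hypothetical helper names, the field names follow the Commit struct, and the signature is assumed to arrive already formatted (continuation lines indented, trailing newline) or empty. Git's header uses the byte length of the content, which is what `Buffer` provides here.

```javascript
const crypto = require('crypto')

// Rebuild the commit body from its fields, in the order the contract concatenates them.
function serializeCommit (c) {
  const parents = c.parents.map(p => `parent ${p}\n`).join('')
  const gpgsig = c.signature ? `gpgsig ${c.signature}` : ''
  return `tree ${c.tree}\n` +
    parents +
    `author ${c.author} ${c.authorDate} ${c.authorDateTzOffset}\n` +
    `committer ${c.committer} ${c.commitDate} ${c.commitDateTzOffset}\n` +
    gpgsig +
    `\n${c.message}`
}

// A git object id is the SHA-1 of "<type> <byte length>\0" followed by the content.
function gitObjectHash (type, content) {
  const body = Buffer.from(content, 'utf8')
  const header = Buffer.from(`${type} ${body.length}\0`, 'utf8')
  return crypto.createHash('sha1').update(Buffer.concat([header, body])).digest('hex')
}

// Sample fields (unsigned commit); values are for illustration only.
const fields = {
  tree: '00dd089c310aea2b821d23ea0f1a6a6235ad165c',
  parents: ['32f04c7f572bf75a266268c6f4d8c92731dc3b7f'],
  author: 'Miguel Mota <hello@miguelmota.com>',
  authorDate: 1560727622,
  authorDateTzOffset: '-0700',
  committer: 'Miguel Mota <hello@miguelmota.com>',
  commitDate: 1560727622,
  commitDateTzOffset: '-0700',
  message: 'add license\n',
  signature: ''
}

console.log(gitObjectHash('commit', serializeCommit(fields)))
```

For a real commit, the printed value should match the output of git rev-parse for that commit when the fields are reproduced byte for byte; any difference in whitespace or newlines changes the hash.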

Lastly, it stores the commit id as the key and the commit date as the value in the storage mapping, after requiring that the commit hasn’t been checkpointed already:

  // ...
  require(checkpoints[commitHash] == 0);
  checkpoints[commitHash] = _commit.commitDate;
}

The contract also contains methods for verifying merkle proofs which we’ll see later. If you’re not familiar with merkle proofs in solidity, check out this example.

Here’s the full solidity contract:

pragma solidity ^0.5.2;
pragma experimental ABIEncoderV2;

import "./SHA1.sol";

contract Commits {
  mapping(bytes20 => uint256) public checkpoints;

  event Checkpointed(address indexed sender, bytes20 indexed commit);

  struct Commit {
    string tree;
    string[] parents;
    string author;
    uint256 authorDate;
    string authorDateTzOffset;
    string committer;
    uint256 commitDate;
    string commitDateTzOffset;
    string message;
    string signature;
  }

  function checkpoint(Commit calldata _commit) external returns (bytes20 commitHash) {
    require(_commit.commitDate <= now + 24 hours);
    require(_commit.commitDate > now - 24 hours);

    string memory treeStr = concat("tree ", _commit.tree, "\n", "", "", "", "");
    string memory parentsStr;
    for (uint256 i = 0; i < _commit.parents.length; i++) {
      parentsStr = concat(parentsStr, "parent ", _commit.parents[i], "\n", "", "", "");
    }
    string memory authorStr = concat("author ", _commit.author, " ", uint2str(_commit.authorDate), " ", _commit.authorDateTzOffset, "\n");
    string memory committerStr = concat("committer ", _commit.committer, " ", uint2str(_commit.commitDate), " ", _commit.commitDateTzOffset, "\n");
    string memory signatureStr = "";
    if (bytes(_commit.signature).length > 0) {
      signatureStr = concat("gpgsig ", _commit.signature, "", "", "", "", "");
    }
    string memory messageStr = concat("\n", _commit.message, "", "", "", "", "");
    string memory data = concat(treeStr, parentsStr, authorStr, committerStr, signatureStr, messageStr, "");

    commitHash = SHA1.sha1(abi.encodePacked("commit ", uint2str(strsize(data)), byte(0), data));
    require(checkpoints[commitHash] == 0);
    checkpoints[commitHash] = _commit.commitDate;
    emit Checkpointed(msg.sender, commitHash);
  }

  function checkpointed(bytes20 commit) public view returns (bool) {
    return checkpoints[commit] != 0;
  }

  function checkpointVerify(bytes20 commit, bytes20 root, bytes20 leaf, bytes20[] memory proof) public view returns (bool) {
    require(checkpoints[commit] != 0);
    return verify(root, leaf, proof);
  }

  function verify(bytes20 root, bytes20 leaf, bytes20[] memory proof) public pure returns (bool) {
    bytes20 computedHash = leaf;

    for (uint256 i = 0; i < proof.length; i++) {
      bytes20 proofElement = proof[i];

      if (computedHash < proofElement) {
        // Hash(current computed hash + current element of the proof)
        computedHash = SHA1.sha1(abi.encodePacked(computedHash, proofElement));
      } else {
        // Hash(current element of the proof + current computed hash)
        computedHash = SHA1.sha1(abi.encodePacked(proofElement, computedHash));
      }
    }

    // Check if the computed hash (root) is equal to the provided root
    return computedHash == root;
  }

  function concat(string memory _a, string memory _b, string memory _c, string memory _d, string memory _e, string memory _f, string memory _g) internal returns (string memory) {
    return string(abi.encodePacked(_a, _b, _c, _d, _e, _f, _g));
  }

  function uint2str(uint v) internal view returns (string memory str) {
    uint256 maxlength = 100;
    bytes memory reversed = new bytes(maxlength);
    uint i = 0;
    while (v != 0) {
      uint remainder = v % 10;
      v = v / 10;
      reversed[i++] = byte(uint8(48 + remainder));
    }
    bytes memory s = new bytes(i);
    for (uint j = 0; j < i; j++) {
      s[j] = reversed[i - j - 1];
    }
    str = string(s);
  }

  function strsize(string memory str) internal view returns (uint length) {
    uint256 i = 0;
    bytes memory strbytes = bytes(str);
    while (i < strbytes.length) {
      if (strbytes[i] >> 7 == 0) {
        i += 1;
      } else if (strbytes[i] >> 5 == 0x06) {
        i += 2;
      } else if (strbytes[i] >> 4 == 0x0E) {
        i += 3;
      } else if (strbytes[i] >> 3 == 0x1E) {
        i += 4;
      } else {
        // For safety
        i += 1;
      }
      length++;
    }
  }
}

We’ll go ahead and deploy it to the Kovan testnet; you can see it here on etherscan.

Git hook

We should create a git hook to submit a transaction that performs the checkpoint.

The way this works is that we only publish to the smart contract if it’s a tagged release, meaning that we check whether the current git commit has a git tag associated with it. For simplicity’s sake, let’s use node.js and the child_process library to execute git commands:

const { execSync } = require('child_process')

const commit = execSync('git cat-file -p HEAD').toString().trim()
const commitHash = execSync('git rev-parse HEAD').toString().trim()
const tag = execSync('git describe --tags `git rev-list --tags --max-count=1`').toString().trim()
const tagCommit = execSync(`git rev-list -n 1 "${tag}"`).toString().trim()

if (tagCommit !== commitHash) {
  console.log('Tag not found for commit, skipping checkpoint.')
  process.exit(0)
}

If a tag was found matching the commit we’ll proceed with parsing the commit string and prepping the values for the transaction data:

const parseCommit = require('git-parse-commit')

console.log(`Tag ${tag} found, checkpointing commit ${commitHash}`)

const {
  tree,
  parents,
  author: {
    name: authorName,
    email: authorEmail,
    timestamp: authorDate,
    timezone: authorDateTzOffset
  },
  committer: {
    name: committerName,
    email: committerEmail,
    timestamp: commitDate,
    timezone: commitDateTzOffset
  },
  pgp,
  title,
  description
} = parseCommit(`${commitHash}\n${commit}`)

const author = `${authorName} <${authorEmail}>`
const committer = `${committerName} <${committerEmail}>`
const message = `${title}${description}\n`

// NOTE: newlines are necessary here
const signature = `-----BEGIN PGP SIGNATURE-----

${pgp}
-----END PGP SIGNATURE-----`.split('\n').join('\n ') + '\n'
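The note about newlines comes from how git stores multi-line header values such as gpgsig: every line after the first is prefixed with a single space, and the header ends with a newline before the blank line that precedes the message. Here is a small sketch of that transformation in isolation; `formatGpgsig` is a hypothetical helper name and the payload is a placeholder, not a real signature.

```javascript
// Indent every line after the first with one space, as git does for
// multi-line headers, and terminate the header with a newline.
function formatGpgsig (armored) {
  return armored.split('\n').join('\n ') + '\n'
}

const armored = [
  '-----BEGIN PGP SIGNATURE-----',
  '',
  'aGVsbG8=', // placeholder payload, not a real signature
  '-----END PGP SIGNATURE-----'
].join('\n')

// JSON.stringify makes the spaces and newlines visible.
console.log(JSON.stringify(formatGpgsig(armored)))
```

If the indentation or the trailing newline is off by even one character, the hash recomputed by the contract will no longer match the commit id.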

We have all the data formatted and queued and now just need to set up the web3 provider and load the contract. First let’s create custom git config attributes for setting the private key and provider uri (example private key was derived from ganache-cli --deterministic ):

$ git config ethereumcheckpoint.privatekey 4f3edf983ac636a65a842ce7c78d9aa706d3b113bce9c46f30d7d21715b23b1d
$ git config ethereumcheckpoint.provideruri https://kovan.infura.io/

These custom values live in .git/config, but you may also set them globally by passing the --global flag to git config.

In the code we simply query for the custom config values:

const privateKey = execSync('git config ethereumcheckpoint.privatekey').toString().trim()
const providerUri = execSync('git config ethereumcheckpoint.provideruri').toString().trim()

Next up is setting up the web3 provider with the private key:

const Web3 = require('web3')
const PrivateKeyProvider = require('truffle-privatekey-provider')

// The provider must be created before it is passed to Web3
const provider = new PrivateKeyProvider(privateKey, providerUri)
const web3 = new Web3(provider)

We’ll read the ABI and contract address directly from the generated contract JSON file from truffle migrate when it was deployed and then initialize the contract instance:

const fs = require('fs')
const path = require('path')

const { address: sender } = web3.eth.accounts.privateKeyToAccount(`0x${privateKey}`)
const contractJSON = JSON.parse(fs.readFileSync(path.resolve(__dirname, '../build/contracts/Commits.json')))
const { abi } = contractJSON
const networkId = 42 // kovan
const { address: contractAddress } = contractJSON.networks[networkId]
const contract = new web3.eth.Contract(abi, contractAddress)

Finally we send a signed transaction on-chain and assert the status to be successful:

const data = {
  tree,
  parents,
  author,
  authorDate,
  authorDateTzOffset,
  committer,
  commitDate,
  commitDateTzOffset,
  message,
  signature
}

console.log('Checkpointing commit to Ethereum...')
const { status, transactionHash } = await contract.methods.checkpoint(data).send({ from: sender })
console.log(`Transaction hash: ${transactionHash}`)
assert.ok(status)

Here’s the full git pre-push hook code:

const assert = require('assert')
const fs = require('fs')
const path = require('path')
const { execSync } = require('child_process')
const Web3 = require('web3')
const PrivateKeyProvider = require('truffle-privatekey-provider')
const parseCommit = require('git-parse-commit')

const contractJSON = JSON.parse(fs.readFileSync(path.resolve(__dirname, '../build/contracts/Commits.json')))
const { abi } = contractJSON
const networkId = 42 // kovan
const { address: contractAddress } = contractJSON.networks[networkId]

const privateKey = execSync('git config ethereumcheckpoint.privatekey').toString().trim()
const providerUri = execSync('git config ethereumcheckpoint.provideruri').toString().trim()
const provider = new PrivateKeyProvider(privateKey, providerUri)
const web3 = new Web3(provider)
const { address: sender } = web3.eth.accounts.privateKeyToAccount(`0x${privateKey}`)

const commit = execSync('git cat-file -p HEAD').toString().trim()
const commitHash = execSync('git rev-parse HEAD').toString().trim()
const tag = execSync('git describe --tags `git rev-list --tags --max-count=1`').toString().trim()
const tagCommit = execSync(`git rev-list -n 1 "${tag}"`).toString().trim()

if (tagCommit !== commitHash) {
  console.log('Tag not found for commit, skipping checkpoint.')
  process.exit(0)
}

console.log(`Tag ${tag} found, checkpointing commit ${commitHash}`)

const {
  tree,
  parents,
  author: {
    name: authorName,
    email: authorEmail,
    timestamp: authorDate,
    timezone: authorDateTzOffset
  },
  committer: {
    name: committerName,
    email: committerEmail,
    timestamp: commitDate,
    timezone: commitDateTzOffset
  },
  pgp,
  title,
  description
} = parseCommit(`${commitHash}\n${commit}`)

const author = `${authorName} <${authorEmail}>`
const committer = `${committerName} <${committerEmail}>`
const message = `${title}${description}`

let signature = ''
if (pgp) {
  // NOTE: newlines are necessary here
  signature = `-----BEGIN PGP SIGNATURE-----

${pgp}
-----END PGP SIGNATURE-----`.split('\n').join('\n ') + '\n'
}

const data = {
  tree,
  parents,
  author,
  authorDate,
  authorDateTzOffset,
  committer,
  commitDate,
  commitDateTzOffset,
  message,
  signature
}

const contract = new web3.eth.Contract(abi, contractAddress)

// await is only valid inside an async function in a CommonJS script
;(async () => {
  try {
    console.log('Checkpointing commit to Ethereum...')
    const { status, transactionHash } = await contract.methods.checkpoint(data).send({ from: sender })
    console.log(`Transaction hash: ${transactionHash}`)
    assert.ok(status)

    const _commitDate = await contract.methods.checkpoints(`0x${commitHash}`).call()
    assert.equal(_commitDate.toString(), commitDate.toString())

    console.log('Successfully checkpointed commit to Ethereum.')
    process.exit(0)
  } catch (err) {
    console.error(err.message)
  }
})()

Trying it out

In a git repository, create a pre-push git hook by creating the file .git/hooks/pre-push.

For convenience, I created a gist with the required files to create an NPM module which you can install with:

npm install gist:8d2d0d1c010137eca271985a1cfaa67d

Now in .git/hooks/pre-push add the following content:

#!/usr/bin/env node
require('ethereum-checkpoint-git-commit')

And make sure it’s executable:

chmod +x .git/hooks/pre-push

Assuming you have set the required custom git config attributes already, you can tag the commit and push which invokes the pre-push git hook and publishes the commit on-chain:

$ git tag v0.0.1
$ git push origin master
Tag v0.0.1 found, checkpointing commit da1597df5884f651acfa2d3f50bb37d320fb2a20
Checkpointing commit to Ethereum...
Transaction hash: 0x0f64c42d47e7c92d5aa0da5eca0a6d3a33aed2695e46b70e3e850ce4694c525f
Successfully checkpointed commit to Ethereum.
// ...

See the transaction on etherscan.

Verification

Since the parent commit hash lives on-chain now, it’s very easy to check if any previous commits are children of the parent commit.

Run git log to get all commit hashes:

$ git --no-pager log --pretty=oneline | awk '{print $1}'
07fe57235fc504613879f11107a5d81ffaaf1d40
d89f84d948796605a413e196f40bce1d6294175d
32f04c7f572bf75a266268c6f4d8c92731dc3b7f
b80b52d80f5fe940ac2c987044bc439e4218ac94
1553c75a1d637961827f4904a0955e57915d8310

Now we’ll construct a merkle tree using the commit log hashes as the leaves:

const { MerkleTree } = require('merkletreejs')
const sha1 = require('sha1')

const leaves = execSync(`git --no-pager log --pretty=oneline | awk '{print $1}'`).toString().trim().split('\n').map(x => Buffer.from(x, 'hex'))
const tree = new MerkleTree(leaves, sha1, { sort: true })

We specify the leaf node that we want to generate a proof for, meaning that we’ll get the minimum set of node hashes required to prove that the leaf exists in the merkle tree:

const root = tree.getHexRoot()
const leaf = Buffer.from('32f04c7f572bf75a266268c6f4d8c92731dc3b7f', 'hex')
const proof = tree.getHexProof(leaf)

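Now that we have the proof, we make a read-only call to the smart contract to verify that the proof is valid. Note that verify only checks the merkle proof against the given root; checkpointVerify additionally requires that the given commit has been checkpointed on-chain.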

const verified = await contract.methods.verify(root, leaf, proof).call({ from: '0x90f8bf6a479f320ead074411a4b0e7944ea8c9c1' })
console.log(`verified: ${verified}`) // true

It returns true, and if we pass it an invalid commit hash as the leaf, or an invalid proof, then it returns false.
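To see what the contract’s verify function computes, the sorted-pair scheme can be reproduced locally with Node’s built-in crypto module. This is a simplified sketch, not merkletreejs’s exact algorithm: the helper names are hypothetical, an unpaired node is simply promoted to the next level, and the resulting root is therefore not guaranteed to equal the one produced by merkletreejs. The proof check itself follows the same rule as the contract: hash the smaller value first.

```javascript
const crypto = require('crypto')

const sha1 = data => crypto.createHash('sha1').update(data).digest()

// Hash a pair with the smaller value first, like the contract's verify().
function hashPair (a, b) {
  return Buffer.compare(a, b) < 0
    ? sha1(Buffer.concat([a, b]))
    : sha1(Buffer.concat([b, a]))
}

// Build every level of the tree from sorted leaves.
// Simplification: an unpaired node is promoted unchanged.
function buildLevels (leaves) {
  const levels = [[...leaves].sort(Buffer.compare)]
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1]
    const next = []
    for (let i = 0; i < prev.length; i += 2) {
      next.push(i + 1 < prev.length ? hashPair(prev[i], prev[i + 1]) : prev[i])
    }
    levels.push(next)
  }
  return levels
}

// Collect the sibling hashes on the path from the leaf up to the root.
function getProof (levels, leaf) {
  let index = levels[0].findIndex(x => x.equals(leaf))
  if (index === -1) return []
  const proof = []
  for (let l = 0; l < levels.length - 1; l++) {
    const sibling = index % 2 === 0 ? index + 1 : index - 1
    if (sibling < levels[l].length) proof.push(levels[l][sibling])
    index = Math.floor(index / 2)
  }
  return proof
}

// Same folding logic as the Solidity verify() function.
function verify (root, leaf, proof) {
  let computed = leaf
  for (const element of proof) computed = hashPair(computed, element)
  return computed.equals(root)
}

const leaves = [
  '07fe57235fc504613879f11107a5d81ffaaf1d40',
  'd89f84d948796605a413e196f40bce1d6294175d',
  '32f04c7f572bf75a266268c6f4d8c92731dc3b7f',
  'b80b52d80f5fe940ac2c987044bc439e4218ac94',
  '1553c75a1d637961827f4904a0955e57915d8310'
].map(x => Buffer.from(x, 'hex'))

const levels = buildLevels(leaves)
const root = levels[levels.length - 1][0]
const leaf = Buffer.from('32f04c7f572bf75a266268c6f4d8c92731dc3b7f', 'hex')
const proof = getProof(levels, leaf)

console.log(verify(root, leaf, proof)) // true
console.log(verify(root, sha1(Buffer.from('not a commit')), proof)) // false
```

Because each pair is ordered before hashing, the proof doesn’t need to record whether a sibling sits on the left or the right, which is why the contract only needs a flat array of hashes.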

Here’s the full code of the verification call:

const fs = require('fs')
const path = require('path')
const { execSync } = require('child_process')
const Web3 = require('web3')
const { MerkleTree } = require('merkletreejs')
const sha1 = require('sha1')

const contractJSON = JSON.parse(fs.readFileSync(path.resolve(__dirname, '../build/contracts/Commits.json')))
const { abi } = contractJSON
const networkId = 42 // kovan
const { address: contractAddress } = contractJSON.networks[networkId]

const providerUri = execSync('git config ethereumcheckpoint.provideruri').toString().trim()
const web3 = new Web3(new Web3.providers.HttpProvider(providerUri))

;(async () => {
  const contract = new web3.eth.Contract(abi, contractAddress)

  const leaves = execSync(`git --no-pager log --pretty=oneline | awk '{print $1}'`).toString().trim().split('\n').map(x => Buffer.from(x, 'hex'))
  const tree = new MerkleTree(leaves, sha1, { sort: true })
  const root = tree.getHexRoot()
  const leaf = Buffer.from('32f04c7f572bf75a266268c6f4d8c92731dc3b7f', 'hex')
  const proof = tree.getHexProof(leaf)

  const verified = await contract.methods.verify(root, leaf, proof).call({ from: '0x90f8bf6a479f320ead074411a4b0e7944ea8c9c1' })
  console.log(`verified: ${verified}`)
})()

All of the code is available on github.