Factom is the data layer of the blockchain space: any arbitrary data can be published to Factom at a low and fixed cost ($0.001/KB). That opens up a lot of possibilities. Over time several design patterns have emerged for applications to efficiently and safely leverage that powerful capability. This article is directed towards developers who are already familiar with the Factom concepts of chains and entries and who are looking for a list of good practices for building a production-ready application on Factom.

In this article we’ll illustrate a Factom entry using a table of this form:

+--------+--------------+
| Ext[0] | <0xab45baab> |
| Ext[1] | <0x676c67a4> |
+--------+--------------+
| This is the content.  |
+-----------------------+

Data on Factom

Factom entries can store any data. You may have heard that Factom was meant to store hashes of documents, but that is really only one use case. Whilst there are many good reasons to exclusively publish hashes of off-chain data, there are also occasions where applications may want to publish more expressive data.

Public, immutable data, forever

Just because you have the ability to store anything on Factom, it does not mean that you should. When you are designing your application and you are thinking of storing a piece of data you should ask yourself:

Is it ok for this data to be public? Absolutely anybody can see it; the public protocol has no privacy features.

Is it ok for this data to be immutable? You cannot edit a mistake.

Is it ok for this data to possibly be accessible forever? Eternity is a long time; project yourself into future situations.



The consequences of not seriously evaluating those properties can be dramatic:

Legal issues. Think “GDPR” here. Putting personal data on a blockchain is most likely a terrible idea.

Exposing any sensitive/private data could jeopardize your competitive advantage.

A natural response might be to think that you can simply encrypt the data before putting it on the blockchain. I must give you a word of caution, and repeat that eternity is a long time. Encryption algorithms have a limited lifespan: they do get broken after a certain time (due to technological advancement), which means that at some point in the future your data may be deciphered. If your data has an “expiration date” (i.e. it becomes completely worthless after that date) that is prior to the expected end of life of your encryption algorithm, then you may be fine. To sum up: be aware that encryption of data on a blockchain only gives you temporary privacy, and you must evaluate the impact of your data getting revealed at some point in the future.

I’d extend the above by stating that it is especially important on blockchain that you don’t rely on any security through obscurity. Do not assume you’re safe because people don’t know which chain you are using or that they won’t comprehend the data in your entries; all the data is public and they have a lot of time to figure it out (eternity is a long… ok ok enough).

But what happens if someone stores something illegal on Factom (e.g. child pornography), wouldn’t all the people running nodes of the Factom network now be operating illegally as they are effectively maintaining a database with illegal content? That would be an existential threat waiting to happen. The Factom network has a mechanism to protect itself against that: local erasure. While it is impossible to censor the content of the Factom network in its entirety, individual nodes of the network have the choice to remove an entry from their database without altering the integrity of the blockchain. If an entry has been locally erased from a node that means that particular entry cannot be queried on that particular node, but it’s possible that the content is still available on another node.

Hashes

A cryptographic hash function is a mathematical algorithm that maps data of arbitrary size to a bit string of a fixed size (a hash). It is designed to be a one-way function, that is, a function which is infeasible to invert. You provide a hash function (you may have heard of SHA or BLAKE functions) with a data input (a file for instance) and you get a hash as an output that will look something like e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855. That string can be used as a fingerprint of the input file: it is impractical to find/create another file that would yield the same hash, and it also means that if the original file is modified, even by a single byte or character, the hash will be completely different. It is impossible to reconstruct the file just from its hash.
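To make this concrete, here is a minimal Python sketch using the standard library’s hashlib (SHA-256, one of the SHA-2 functions mentioned above):

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

# The hash of an empty input is the well-known value quoted above.
print(fingerprint(b""))
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# Changing the input by a single character yields a completely different hash.
h1 = fingerprint(b"hello world")
h2 = fingerprint(b"hello world!")
```

The output is always 64 hex characters (256 bits), whether the input is one byte or one terabyte.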

Hash functions have been around for decades and have found numerous applications. Using them in the context of blockchain brings a new dimension: storing hashes in Factom allows you to “secure” any file against modifications (or removal) and show the proof to the world, making it possible for any third party to audit the integrity (non-alteration) of said files. The typical configuration would be to store some files in any storage service (it can be private or public, centralized or not, it is up to you) and store the hashes of those files in a chain in Factom. The idea is that you can share one of those files with another party and prove undeniably that the file hasn’t been altered since its fingerprint was recorded on Factom. Or the other party can look at the hashes recorded on chain and ask you to provide the corresponding files, so it can verify that you’re not hiding any files and that they are unmodified.

You’ll notice that hash function properties conveniently remove the biggest concerns expressed in the previous section: hashes just by themselves don’t convey sensitive information. Exposing them publicly, with no possibility of editing them, and forever, shouldn’t have you too worried if your application was properly designed (hash functions don’t hide the input if the input can be easily guessed though! You’d need to add a secret to the recipe in this case, see HMAC).
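For the guessable-input case, a keyed hash such as HMAC mixes a secret into the computation so an attacker cannot confirm guesses. A minimal sketch with the standard library (the secret value is of course hypothetical, and must be kept off-chain):

```python
import hashlib
import hmac

SECRET = b"application-secret"  # hypothetical key, never published

def keyed_fingerprint(data: bytes) -> str:
    """HMAC-SHA256 of `data`: useless to an attacker without the secret."""
    return hmac.new(SECRET, data, hashlib.sha256).hexdigest()

# A plain hash of a guessable input ("yes"/"no") is trivially brute-forced...
plain = hashlib.sha256(b"yes").hexdigest()
# ...while the keyed version cannot be checked against guesses.
keyed = keyed_fingerprint(b"yes")
```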

It’s worth noting that hashes are a pseudonymisation technique under EU law, not an anonymisation technique. That means hashes are still considered personal data for the purposes of GDPR.

Data authentication

Factom can store any data, and anyone owning Entry Credits can insert data into the blockchain. It is important to keep in mind that chains are not permissioned, anyone can write entries in “your chain”, whether it’s intentional or not. Two direct consequences of this situation:

1. A malicious actor can try to alter the state of your application by inserting unwanted data in chains used by the app.

2. Your chain can be spammed with garbage data, potentially slowing down your application depending on how it was designed. There is no real way to prevent that, so your application should be designed to efficiently filter out that spam.

Example of a counter app

We’ll use the example of a simple counter application in the rest of this section to illustrate the patterns we are presenting. That application will keep track of a single value (number) and be able to update it over time. Even for such a trivial application you should still evaluate whether it is acceptable to store that piece of information on a public blockchain. If the counter keeps track of the number of guards escorting a valuable asset at all times, that may not be a good idea!

The initial and naive implementation of this counter would be to create a chain for the counter, and every time the counter is updated insert an entry with the new value in that chain. An entry to update the value of the counter to 42 would look like:

+---------------------+
|  (no external IDs)  |
+---------------------+
|         42          |
+---------------------+

Immediately you realize that in this simple form the counter can be set to any value by anyone, making it completely unreliable.

Cryptographic signature

One solution is to authenticate the source of the data in the entry using public-key cryptography: the trusted source of the information should sign the data and put the signature alongside the data so anyone can verify it.

A common pattern with Factom is to use the first entry of a chain as a “registration” or “header” entry that contains information to help with the interpretation and validation of subsequent entries. Using that idea, the first entry of our counter chain will contain the public key(s) of the source(s) that will sign the counter updates. Then the application reading the values of the counter can verify that each update is properly signed, and if not, ignore the entry, effectively preventing anyone not in possession of the private key(s) from altering the counter value.

The first entry of the counter chain would look like:

+--------+---------------------------------------------------------+
| Ext[0] | factom-counter                                          |
| Ext[1] | ed25519                                                 |
| Ext[2] | <0x591b76f87bbea9eab70603c0a728999a25b2afac936fbeb3...> |
+--------+---------------------------------------------------------+

Ext[0] is a human readable marker. Not strictly necessary, but you’ll often find it useful to use such a marker in your parser code to quickly identify the nature of the chain.

Ext[1] indicates the type of asymmetric key used, here ed25519.

Ext[2] contains the 32-byte ed25519 public key to be used to verify the signatures of the following entries.

The following entries update the value of the counter and take the form of:

+--------+---------------------------------------------------------+
| Ext[0] | counter-update                                          |
| Ext[1] | <0x8dbcf489023412b59d84d2675991c82d27b7e40541c34f6f...> |
+--------+---------------------------------------------------------+
| 1989                                                             |
+------------------------------------------------------------------+

Ext[0] contains a human readable marker indicating the nature of the entry. Again, not strictly necessary, but it can be very handy if you have multiple types of entries in the same chain (we could imagine a ‘reset-counter’ type of entry for instance).

Ext[1] contains a 64-byte signature of the content of the entry (“1989”). The signature must have been generated by the secret key corresponding to the public key present in the first entry of the chain.

The content contains the new value of the counter.
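To sketch how an application would read such a chain, here is a hypothetical validation loop in Python. To keep the sketch dependency-free, the ed25519 verification is replaced by an HMAC stand-in; a real implementation would verify Ext[1] with the ed25519 public key published in the first entry of the chain:

```python
import hashlib
import hmac

# Stand-in for Ed25519: `sign`/`verify` mimic the signature scheme using
# HMAC-SHA256 so this sketch runs with the standard library only.
KEY = b"demo-key"

def sign(message: bytes) -> bytes:
    return hmac.new(KEY, message, hashlib.sha256).digest()

def verify(signature: bytes, message: bytes) -> bool:
    return hmac.compare_digest(signature, sign(message))

def read_counter(entries):
    """Replay a counter chain, ignoring any entry that fails validation."""
    value = None
    for entry in entries:
        ext_ids, content = entry["ext_ids"], entry["content"]
        if ext_ids[0] != b"counter-update":   # unknown marker: skip
            continue
        if not verify(ext_ids[1], content):   # bad signature: spam, skip
            continue
        value = int(content)
    return value

chain = [
    {"ext_ids": [b"counter-update", sign(b"42")], "content": b"42"},
    {"ext_ids": [b"counter-update", b"garbage"], "content": b"9999"},  # spam
]
```

Running `read_counter(chain)` yields 42: the unsigned spam entry is silently filtered out, exactly the behavior described above.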

Anti pattern: using Entry Credit address for authentication

You may be tempted to use the paying Entry Credit address for the purposes of authentication. It is indeed possible to retrieve that information from the Entry Credit block that contains the commit of the entry. While it does allow matching entries to an entity (the one controlling the EC private address), we would generally recommend against this approach.

The reasons are:

This violates an important principle in Computer Science called the separation of concerns. Entry Credit addresses are part of the Factom protocol layer while data authentication is part of your application layer; it is strongly recommended not to mix different layers.

Tight coupling of the payment component with the data authentication component brings some inflexibility/usability issues:

1. A user cannot delegate the payment of the insertion to another party while still authenticating the data as his.

2. If that address is also used for other applications, it increases the possible attack surface of your application. Your application’s authentication keys should be exposed as infrequently as possible, which is difficult when you also have to use those keys to pay for entries.

An Entry Credit address is nothing more than a cryptographic key pair that can be lost or compromised. If you lost your key you would permanently lose the ability to authenticate content against it. This is the same problem as directly using a key pair, which brings us to the next section about identities.

Identity

Using an asymmetric key pair to sign and verify the data on our chain is a major improvement in the reliability of our counter application. But it is also relatively risky and inflexible to bind our chain data directly to a single key pair. Plan for the future: having a key lost or compromised is fairly common:

If you lose your private key you become unable to sign new data, effectively preventing any new updates, as nobody will trust data that is not properly signed.

If your private key is stolen, the thief can now pretend to be the source of truth, and any data they insert will be trusted and considered valid.

That’s why we strongly recommend integration with a digital identity system instead. The digital identity system should be able to bind multiple keys to an identity and allow management of a key’s life cycle. You should be able to rotate the keys in case of loss or hack.

Factom Inc. developed one such on-chain digital identity system for this purpose.

Using this identity system the first entry of our chain would become:

+--------+---------------------------------------------------------+

| Ext[0] | factom-counter |

| Ext[1] | <0xe0beb5b6c6632abcc812e0d05626292095ed0eb602a04095...> |

+--------+---------------------------------------------------------+

Ext[1] is now simply a reference to the chain of a digital identity. The counter update entries are now signed by a valid key attached to that digital identity. Your application would need to integrate with this identity system to be able to validate the entries, but you benefit from greater flexibility and safety.

Replay attacks

Replay attacks are a family of attacks that consist of reusing data from an existing valid entry to forge and insert a new entry in a chain, so that it appears valid to the application reading the data and alters the internal state of the application. These attacks are fairly easy to overlook when you’re new to development on Factom.

Inter-chain

Inter-chain replay attacks consist of replaying an entry from one chain to another.

Let’s imagine we have 2 counters C1 and C2. Each has its own chain, and in both cases the data is authenticated by the same identity (i.e. same cryptographic keys). If a new value is published on counter C1, it is extremely easy for an attacker to copy that entry and insert it in the chain of counter C2. The update to counter C2 will be accepted because the signature of the data is valid. The attacker didn’t have to come up with a signature, it was given to him by the counter C1 entry. It is true though that the attacker couldn’t use any arbitrary value for the counter (otherwise the signature wouldn’t be valid anymore) but it still allows him to pick any value that was published for counter C1.

Luckily inter-chain replay attacks are easily defeated by “binding” the signature of the content to a specific chain: instead of signing only the content of the entry (which allows the signature to be “portable”), you should sign [chain ID + content] (+ being a concatenation operator). Such a signature cannot be valid if inserted in any other chain.
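A sketch of this chain binding (again with HMAC standing in for the ed25519 signature, and made-up chain IDs):

```python
import hashlib
import hmac

KEY = b"demo-key"  # stand-in for the signer's private key

def sign_update(chain_id: bytes, content: bytes) -> bytes:
    # Sign the concatenation chain_id + content, not content alone,
    # so the signature is only valid on that specific chain.
    return hmac.new(KEY, chain_id + content, hashlib.sha256).digest()

def valid_on(chain_id: bytes, content: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(signature, sign_update(chain_id, content))

c1 = b"chain-of-counter-C1"
c2 = b"chain-of-counter-C2"
sig = sign_update(c1, b"42")
```

The signature `sig` verifies on C1 but fails on C2: copying the entry to the other counter’s chain no longer works.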

Intra-chain

While reading about the inter-chain replay attack you may have realized that it is also possible to just replay an entry within the same chain, with the same devastating effect. It’s worth noting that Factom will not allow you to commit two identical entries to the same chain. But don’t be fooled (like I was): this is only true for about an hour, after which it is indeed possible to re-insert the same exact entry. This is a mechanism to prevent applications inadvertently recording duplicate data (because no regular application should have to publish duplicate entries within such a short amount of time).

There are two potential solutions to that problem.

Global uniqueness

One possible solution is for your application to disallow an entry from ever appearing twice. Your application should only consider valid the first insertion of a given entry, by verifying the uniqueness of its entry hash. If somebody tries to replay the entry as is, your application will discard the duplicate and it won’t affect the state of your application. With this approach your application has to keep track of all the entry hashes of valid entries it has processed.
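A sketch of this bookkeeping in Python. The entry hash follows the Factom formula (SHA-256 of the SHA-512 of the serialized entry concatenated with the serialized entry itself); the serialization here is a placeholder byte string:

```python
import hashlib

def entry_hash(serialized: bytes) -> bytes:
    """Factom entry hash: sha256(sha512(data) + data) over the serialized entry."""
    return hashlib.sha256(hashlib.sha512(serialized).digest() + serialized).digest()

seen = set()  # all entry hashes ever accepted by the application

def accept(serialized: bytes) -> bool:
    """Accept an entry only the first time its hash is seen."""
    h = entry_hash(serialized)
    if h in seen:
        return False  # replay: discard
    seen.add(h)
    return True
```

The `seen` set grows forever, which is the main cost of the global uniqueness approach (the local uniqueness design below bounds it).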

Entry malleability

You may have heard of transaction malleability issues that have happened in the past with Bitcoin. Similarly, depending on what parts of your entries are covered by the signature, you can have entry malleability: a malicious actor could modify an entry without invalidating the signature. This can be leveraged in replay attacks.

Let’s take an example:

+--------+---------------------------------------------------------+
| Ext[0] | Signer_A                                                |
| Ext[1] | <0x8dbcf489023412b59d84d2d9625ed26f57b7e40541c34f6f...> |
| Ext[2] | Signer_B                                                |
| Ext[3] | <0xd218bc38f70d347f291700a25d2c2fe1f433a87c693551a5...> |
+--------+---------------------------------------------------------+
| A and B agreed to give C $1,000,000                              |
+------------------------------------------------------------------+

Your application records in an entry the agreement that A and B will give C one million dollars. The statement is written in the content of the entry.

A and B, using keys associated with their digital identities (Signer_A, Signer_B) would sign that statement concatenated with the chain id (to prevent inter-chain replay) and the signatures are put in the external ids together with a reference to their respective identities. Without additional constraints the entry is malleable and an attacker can work around the global uniqueness of the entry:

+--------+---------------------------------------------------------+
| Ext[0] | Signer_B                                                |
| Ext[1] | <0xd218bc38f70d347f291700a25d2c2fe1f433a87c693551a5...> |
| Ext[2] | Signer_A                                                |
| Ext[3] | <0x8dbcf489023412b59d84d2d9625ed26f57b7e40541c34f6f...> |
+--------+---------------------------------------------------------+
| A and B agreed to give C $1,000,000                              |
+------------------------------------------------------------------+

Identities and signatures are still valid, the content is valid, but the hash of this entry is different from the previous one, effectively bypassing the global uniqueness check supposed to protect you against intra-chain replay attacks. As a result this entry would be read as perfectly valid, and A and B may have to give away another million to C…

There are several ways to fix this malleability issue:

Constrain the ordering of the external IDs in the design of your application (e.g. signatures should follow the lexicographic order of the identities’ names)

Bind the signature to its external id index. That would mean adding the external id index as part of the data signed.

Now you may wonder: what if A and B legitimately want to give another million to C? They would have to re-emit the same entry, which would not be considered valid by the application. The solution is to add a nonce, some random bytes that make the entry hash different. For instance we can now put a random number in the first external ID, followed as before by the identities and signatures.

+--------+---------------------------------------------------------+
| Ext[0] | <0x625ed26f57b7>                                        |
| Ext[1] | Signer_A                                                |
| Ext[2] | <0x8dbcf489023412b59d84d2d9625ed26f57b7e40541c34f6f...> |
| Ext[3] | Signer_B                                                |
| Ext[4] | <0xd218bc38f70d347f291700a25d2c2fe1f433a87c693551a5...> |
+--------+---------------------------------------------------------+
| A and B agreed to give C $1,000,000                              |
+------------------------------------------------------------------+

What did we just do? We just re-opened a malleability issue: an attacker can now put any random number in the first external ID to change the entry hash and submit his forged entry… You may already know how to fix that: the signatures should also cover the nonce, which prevents an attacker from freely changing it. The signature of A and B now covers: (chain ID + nonce + entry content)

+--------+---------------------------------------------------------+
| Ext[0] | <0x625ed26f57b7>                                        |
| Ext[1] | Signer_A                                                |
| Ext[2] | <0x7a2e4ee6dda05a7c473afa08bb8a38809b1a202485277909...> |
| Ext[3] | Signer_B                                                |
| Ext[4] | <0x39ddd603568428688c0ab1cdae72eb038fbcbcbc371df1e4...> |
+--------+---------------------------------------------------------+
| A and B agreed to give C $1,000,000                              |
+------------------------------------------------------------------+

With this solution we are protected against inter-chain and intra-chain replay attacks while still allowing legitimate duplicate agreements, bingo!
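The final scheme can be sketched as follows (HMAC stands in for the ed25519 signatures; the chain ID and nonce sizes are illustrative):

```python
import hashlib
import hmac
import os

KEY = b"demo-key"  # stand-in for the signers' private keys

def sign_agreement(chain_id: bytes, nonce: bytes, content: bytes) -> bytes:
    # The signature covers chain ID + nonce + content, so neither the
    # nonce nor the content can be altered without invalidating it.
    return hmac.new(KEY, chain_id + nonce + content, hashlib.sha256).digest()

chain_id = b"agreements-chain"
content = b"A and B agreed to give C $1,000,000"
nonce = os.urandom(6)
sig = sign_agreement(chain_id, nonce, content)

# An attacker swapping in a different nonce changes the entry hash,
# but the original signature no longer verifies:
forged_nonce = os.urandom(6)
forged_ok = hmac.compare_digest(sig, sign_agreement(chain_id, forged_nonce, content))
```

A legitimate duplicate agreement simply uses a fresh nonce and a fresh signature, giving a new, valid entry with a distinct entry hash.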

Local uniqueness

An alternative to global uniqueness is local uniqueness. With global uniqueness your application needs to retain all the valid entry hashes that have ever been processed. With the local uniqueness design, entries are valid only for a given time window, so the application only has to check for replay attacks within that limited time frame.

The idea is to add a block height (or timestamp) as part of the entry and require that it is close enough to the block height (or timestamp) of the block the entry will be part of. That block height or timestamp in the entry needs to be covered by a signature so that it cannot later be maliciously modified, otherwise malleability would once again allow a replay attack. With this constraint the application only has to verify uniqueness within an acceptable time window defined by the application, typically a few blocks (or a few hours if using a timestamp). The acceptable time range accounts for potential technical difficulties that would delay the entry insertion into the blockchain.

Let’s take an example of local uniqueness using block height with an acceptable height difference of [-3, +3]. Put another way, the constraint we impose for an entry to be valid is:

|<block height written in the entry> - <block height the entry is part of>| <= 3

If an entry contains a reference to block height 9 and is actually part of the block of height 10, it passes the test. Because the blockchain is currently at height 10, the application only has to check duplicates in the entries of blocks in the range [10 - 6, 10] = [4, 10] (6 being the size of the interval [-3, +3]), because we know that the entry we are currently processing cannot possibly be a replay of a valid entry from a block with a height strictly below 4.
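The height check from this example can be expressed as a small helper (the tolerance of 3 blocks is the application-chosen window):

```python
def in_window(embedded_height: int, block_height: int, tolerance: int = 3) -> bool:
    """An entry is acceptable only if the height it carries is within
    `tolerance` blocks of the block it actually landed in."""
    return abs(embedded_height - block_height) <= tolerance

# Entry carrying height 9, included in block 10: accepted.
# The same entry replayed much later, say in block 20: rejected,
# without the application having to remember its hash forever.
```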



Few notes:

It is a more complicated scheme than the global uniqueness one.

Because the entries are now timestamped with a limited lifespan, it is slightly more inconvenient to create those entries in advance (offline for instance), which may be a hindrance for certain applications.

That approach is less “stateless” in the sense that the user creating the entry needs to either know the current block height or the current time.

Using block height vs timestamp: there is a small and subtle difference. In the rare case of a global Factom network stall (Factom prevents accidental forks by stalling the blockchain), the block height won’t continue to increase but the time will. It means that, when using a timestamp, the entries could expire if the network stall is too long and the entries only make it to the next block, which wouldn’t be the case when using a block height.

Conclusion

Here’s a recap list we recommend you go through every time you are designing an application on Factom to avoid all the pitfalls described in this article:

Evaluate if the “data integrity pattern” (using hash of data) could be suitable for your use case. It is a powerful, secure and well proven approach.

Always ask yourself if it is ok for the data you are storing on Factom to be public, immutable and available forever (and be cautious when relying on encryption for obfuscation).

In most cases you should use cryptographic signatures to authenticate the origin of the data in a chain.

Use a digital identity (with multiple keys that can be rotated) to sign data instead of directly using a key pair.

Check that your design doesn’t allow for harmful entry malleability. Consider what could happen if data not covered by a signature is modified.

Check that your design doesn’t allow inter-chain and intra-chain replay attacks.

Examples

Here are some real-world application specifications that put into practice the patterns described in this article: