Binary-to-text encoding represents binary data as plain text. More precisely, it encodes binary data as a sequence of printable characters. Such encodings are needed to transmit data when the channel does not allow binary data (as with email or NNTP) or is not 8-bit clean. This article explains one specific encoding of this kind, called ‘Base58Check’.

What is it?

‘Base58Check’ is the method Bitcoin uses to convert 160-bit hashes into P2PKH and P2SH addresses. It is also used to encode private keys for backup in WIF (wallet import format).

It is a modified form of Base58 binary-to-text encoding, used to turn byte arrays in Bitcoin into strings that a human can type.

Breaking it down

To provide further context, the root of this method, Base58, should be explained. It is a group of binary-to-text encoding schemes that represent large integers as alphanumeric text. Since its introduction by Satoshi Nakamoto, other cryptocurrencies and applications have adopted it.

Its design is similar to Base64, which represents binary data in an ASCII (American Standard Code for Information Interchange) string format by translating the data into a radix-64 representation. Base58, however, is modified to avoid both non-alphanumeric characters and letters that look ambiguous when printed.

It is therefore designed for human users who enter the data manually, copying it from some visual source. It also allows easy copy-and-paste, because double-clicking typically selects the whole string.

Base58: what’s the difference?

Compared with Base64, the following similar-looking characters are omitted:

0 (zero)

O (capital o)

I (capital i)

l (lowercase L)

Other missing symbols include the non-alphanumeric characters + (plus) and / (slash).

Contrary to Base64, the digits of the encoding don’t line up with byte boundaries of the original data. Because of this, the method is useful for encoding large integers, but it’s not suitable for encoding longer portions of binary data. The order of letters in the alphabet depends entirely on the application. This is why the term “Base58” alone isn’t enough to fully illustrate the format. A variation, Base56, excludes 1 (one) and o (lowercase o) in comparison to Base58.
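As a rough illustration of the two points above, the following Python sketch derives a 58-symbol alphabet by removing the six excluded characters from the Base64 alphabet, and shows why Base58 digits cannot line up with byte boundaries. The variable names are hypothetical, and sorting the remaining characters in ASCII order is simply one way to arrive at the ordering Bitcoin uses; other applications may order the symbols differently.

```python
import math
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

# The four look-alike characters plus the two non-alphanumeric ones.
EXCLUDED = set("0OIl+/")

# ASCII order puts digits first, then uppercase, then lowercase letters,
# which happens to match the ordering of Bitcoin's Base58 symbol chart.
BASE58_ALPHABET = "".join(sorted(c for c in BASE64_ALPHABET if c not in EXCLUDED))

# Bits of information carried by one symbol in each encoding.
BITS_PER_BASE64_SYMBOL = math.log2(64)  # exactly 6.0, so 4 symbols = 3 bytes
BITS_PER_BASE58_SYMBOL = math.log2(58)  # about 5.86, never a whole number of bits

print(BASE58_ALPHABET)
print(len(BASE58_ALPHABET), BITS_PER_BASE64_SYMBOL, round(BITS_PER_BASE58_SYMBOL, 3))
```

Because a Base58 symbol does not carry a whole number of bits, the input has to be treated as one large integer, which is why the scheme suits short values such as hashes and keys rather than long binary streams.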

With this in mind, the definition of Base58Check should be clearer. It is an encoding format that unambiguously encodes the type of data in the first few characters and includes an error-detection code in the last few characters.

Background of encoding

A comment in the original Bitcoin client source code explains the reasoning behind Base58 encoding:

// Why base-58 instead of standard base-64 encoding?

// – Don’t want 0OIl characters that look the same in some fonts and could be used to create visually identical looking account numbers.

// – A string with non-alphanumeric characters is not as easily accepted as an account number.

// – E-mail usually won’t line-break if there’s no punctuation to break at.

// – Double-clicking selects the whole number as one word if it’s all alphanumeric.

Features

Base58Check has the following features:

A payload of arbitrary size.

A set of 58 alphanumeric symbols, consisting of easily distinguished uppercase and lowercase letters and digits. The four previously mentioned symbols (0, O, I, l) are not used.

One byte of version/application information. Bitcoin P2PKH addresses use 0x00 for this byte; P2SH addresses use 0x05.

Four bytes (32 bits) of error-checking code derived from a double SHA-256 hash. This code can be used to detect, and possibly correct, typographical errors automatically.

An additional step for the preservation of leading zeros in the data.

Creating a string

A Base58Check string is created from a version/application byte and a payload. The steps are as follows:

1. Take the version byte and the payload bytes, and concatenate them (bytewise).

2. Take the first four bytes of SHA-256(SHA-256(result of step #1)).

3. Concatenate the result of step #1 and the result of step #2 (bytewise).

4. Treat the result of step #3, a series of bytes, as a single big-endian bignumber. Convert it to Base58 using normal mathematical steps (bignumber division) and the Base58 alphabet listed in the Symbols section below. The result should be normalized so that it has no leading Base58 zeros (character ‘1’).

5. The leading character ‘1’, which has a value of zero in Base58, is reserved for representing an entire leading zero byte, because in a leading position it carries no value as a Base58 digit. There can be one or more leading ‘1’s when one or more leading zero bytes need to be represented. Count the number of leading zero bytes in the result of step #3. For older Bitcoin addresses (version byte 0x00) there is always at least one, contributed by the version/application byte; for newer addresses with a nonzero version byte there are none. Each leading zero byte is represented by its own character ‘1’ in the final result.

6. Concatenate the ‘1’s from step #5 with the result of step #4. This is the Base58Check result.
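The procedure above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names ALPHABET, double_sha256_checksum and base58check_encode are hypothetical, the checksum is assumed to be the first four bytes of a double SHA-256 hash, and only the standard hashlib module is used.

```python
import hashlib

# Bitcoin's Base58 symbol chart (value 0 is the character '1').
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def double_sha256_checksum(data: bytes) -> bytes:
    """First four bytes of SHA-256 applied twice (assumed checksum rule)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def base58check_encode(version: int, payload: bytes) -> str:
    # Step 1: version byte followed by the payload.
    data = bytes([version]) + payload
    # Steps 2 and 3: append the four checksum bytes.
    data += double_sha256_checksum(data)

    # Step 4: interpret the bytes as one big-endian integer and
    # repeatedly divide by 58, collecting remainders as symbols.
    number = int.from_bytes(data, "big")
    symbols = []
    while number > 0:
        number, remainder = divmod(number, 58)
        symbols.append(ALPHABET[remainder])

    # Step 5: count leading zero bytes; the integer conversion dropped them.
    leading_zero_bytes = len(data) - len(data.lstrip(b"\x00"))

    # Step 6: one '1' per leading zero byte, then the digits in reading order.
    return "1" * leading_zero_bytes + "".join(reversed(symbols))


# Example: version 0x00 with a 20-byte all-zero payload.
print(base58check_encode(0x00, bytes(20)))
```

In the example call, the version byte and the 20 zero payload bytes are all leading zero bytes, so the output begins with 21 ‘1’ characters, followed by the Base58 digits of the checksum.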

Address encoding

Bitcoin addresses are implemented using the Base58Check encoding of the hash of either a script or a public key. Below are the details:

Pay-to-script-hash (P2SH): the payload is RIPEMD160(SHA256(redeemScript)), where “redeemScript” is a script the wallet knows how to spend; the version is 0x05 (these addresses begin with the digit ‘3’).

Pay-to-pubkey-hash (P2PKH): the payload is RIPEMD160(SHA256(ECDSA_publicKey)), where “ECDSA_publicKey” is a public key the wallet knows the private key for; the version is 0x00 (these addresses begin with the digit ‘1’).

In both cases the resulting hash is always exactly 20 bytes long. The bytes are big-endian, meaning the most significant byte comes first.

Beware of bignumber implementations that clip leading 0x00 bytes, or that prepend extra 0x00 bytes to indicate sign. Your code must handle both cases properly, or you may generate valid-looking addresses that can be sent to but not spent from, which would lead to the permanent loss of coins.
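To make the leading-zero pitfall concrete, here is a hedged Python sketch of the reverse direction: decoding a Base58Check string, restoring the zero bytes that integer arithmetic drops, and verifying the checksum. The function name base58check_decode is hypothetical, and the checksum is assumed to be the first four bytes of a double SHA-256 hash.

```python
import hashlib
from typing import Tuple

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_decode(text: str) -> Tuple[int, bytes]:
    """Return (version, payload); raise ValueError on any inconsistency."""
    # Rebuild the big integer; str.index raises ValueError for symbols
    # such as '0', 'O', 'I' or 'l' that are not in the alphabet.
    number = 0
    for symbol in text:
        number = number * 58 + ALPHABET.index(symbol)

    # Converting to bytes with the minimal length yields no leading zeros,
    # so they must be restored explicitly: one 0x00 per leading '1'.
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    leading_ones = len(text) - len(text.lstrip("1"))
    data = b"\x00" * leading_ones + body

    if len(data) < 5:  # needs at least a version byte and four checksum bytes
        raise ValueError("string too short")

    content, checksum = data[:-4], data[-4:]
    expected = hashlib.sha256(hashlib.sha256(content).digest()).digest()[:4]
    if checksum != expected:
        raise ValueError("checksum mismatch")
    return content[0], content[1:]


version, payload = base58check_decode("1" * 21 + "4oLvT2")
print(version, payload.hex())
```

If the leading ‘1’ characters were ignored here, the decoded payload would come out shorter than 20 bytes, which is exactly the kind of silent truncation the warning above describes.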

Private key encoding

Base58Check encoding is also used to encode ECDSA private keys in the wallet import format. The string is formed exactly like a Bitcoin address, except that 0x80 is used for the version/application byte and the payload is 32 bytes instead of 20 (a private key in Bitcoin is a single 32-byte unsigned big-endian integer). For private keys associated with an uncompressed public key, such encodings always yield a 51-character string that starts with ‘5’ or, more specifically, ‘5H’, ‘5J’, or ‘5K’.
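The private-key case can be sketched in Python as shown below. The helper names are hypothetical, the checksum is assumed to be the first four bytes of a double SHA-256 hash, and the key is an arbitrary demonstration value that must never be used to hold funds.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_encode(version: int, payload: bytes) -> str:
    """Compact Base58Check encoder (version byte + payload + checksum)."""
    data = bytes([version]) + payload
    data += hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    number = int.from_bytes(data, "big")
    symbols = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        symbols = ALPHABET[remainder] + symbols
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + symbols


def private_key_to_wif(private_key: bytes) -> str:
    """Encode a 32-byte private key (uncompressed public key case)."""
    if len(private_key) != 32:
        raise ValueError("a private key must be exactly 32 bytes")
    return base58check_encode(0x80, private_key)


# Arbitrary demonstration key: the bytes 0x01 through 0x20.
demo_key = bytes(range(1, 33))
wif = private_key_to_wif(demo_key)
print(wif, len(wif))
```

Because the version byte 0x80 fixes the most significant byte of the 37-byte value, every such string has the same length and the same first character regardless of the key.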

Symbols

The Base58 symbol chart that Bitcoin uses is specific to the Bitcoin project. It is not intended to be the same as any other Base58 implementation used outside the context of Bitcoin.

The characters omitted are 0, O, I, and l.

Value 0 = Character 1

Value 1 = Character 2

Value 2 = Character 3

Value 3 = Character 4

Value 4 = Character 5

Value 5 = Character 6

Value 6 = Character 7

Value 7 = Character 8

Value 8 = Character 9

Value 9 = Character A

Value 10 = Character B

Value 11 = Character C

Value 12 = Character D

Value 13 = Character E

Value 14 = Character F

Value 15 = Character G

Value 16 = Character H

Value 17 = Character J

Value 18 = Character K

Value 19 = Character L

Value 20 = Character M

Value 21 = Character N

Value 22 = Character P

Value 23 = Character Q

Value 24 = Character R

Value 25 = Character S

Value 26 = Character T

Value 27 = Character U

Value 28 = Character V

Value 29 = Character W

Value 30 = Character X

Value 31 = Character Y

Value 32 = Character Z

Value 33 = Character a

Value 34 = Character b

Value 35 = Character c

Value 36 = Character d

Value 37 = Character e

Value 38 = Character f

Value 39 = Character g

Value 40 = Character h

Value 41 = Character i

Value 42 = Character j

Value 43 = Character k

Value 44 = Character m

Value 45 = Character n

Value 46 = Character o

Value 47 = Character p

Value 48 = Character q

Value 49 = Character r

Value 50 = Character s

Value 51 = Character t

Value 52 = Character u

Value 53 = Character v

Value 54 = Character w

Value 55 = Character x

Value 56 = Character y

Value 57 = Character z



Algorithm

The algorithm for encoding address_byte_string (consisting of 1-byte_version + hash_or_other_data + 4-byte_check_code) is the following:

code_string = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
x = convert_bytes_to_big_integer(address_byte_string)

output_string = ""

while(x > 0) {
    (x, remainder) = divide(x, 58)
    output_string.append(code_string[remainder])
}

// Append one '1' for every leading zero byte of the input
repeat(number_of_leading_zero_bytes_in_address_byte_string) {
    output_string.append(code_string[0]);
}

output_string.reverse();

Version bytes

Here is a list of just some of the most common version bytes:

Decimal version: 0 – Leading symbol: 1 – Use: Bitcoin public key hash

Decimal version: 5 – Leading symbol: 3 – Use: Bitcoin script hash

Decimal version: 21 – Leading symbol: 4 – Use: Bitcoin (compact) public key (proposed)

Decimal version: 52 – Leading symbol: M or N – Use: Namecoin public key hash

Decimal version: 128 – Leading symbol: 5 – Use: Private key

Decimal version: 111 – Leading symbol: m or n – Use: Bitcoin testnet public key hash

Decimal version: 196 – Leading symbol: 2 – Use: Bitcoin testnet script hash
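The link between a version byte and its leading symbol can be checked empirically. The Python sketch below encodes the smallest and largest possible payload for a given version byte and reports the first character of each result. The helper names are hypothetical, the checksum is assumed to be the first four bytes of a double SHA-256 hash, and payload lengths of 20 bytes for hashes and 32 bytes for private keys are assumed; the proposed version 21 is left out because its payload length is not given here.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_encode(version: int, payload: bytes) -> str:
    """Compact Base58Check encoder (version byte + payload + checksum)."""
    data = bytes([version]) + payload
    data += hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    number = int.from_bytes(data, "big")
    symbols = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        symbols = ALPHABET[remainder] + symbols
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + symbols


def leading_symbols(version: int, payload_length: int = 20) -> set:
    """First characters produced by the all-zero and all-ones payloads."""
    lowest = base58check_encode(version, b"\x00" * payload_length)
    highest = base58check_encode(version, b"\xff" * payload_length)
    return {lowest[0], highest[0]}


for version, length in [(0, 20), (5, 20), (52, 20), (111, 20), (196, 20), (128, 32)]:
    print(version, sorted(leading_symbols(version, length)))
```

Since the leading symbol grows monotonically with the encoded value, checking the two extreme payloads is enough to bound every payload in between.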

Conclusion

Base58Check involves a fair number of technical details, but together they produce strings that are easy to type, that identify their own data type, and that reveal typing errors.