Math in Solidity (Part 1: Numbers)

This article opens a series of articles about doing math in Solidity. The first topic to discuss is: numbers.

Introduction

Ethereum is a programmable blockchain, whose functionality could be extended by publishing pieces of executable code, known as smart contracts, into the blockchain itself. This distinguishes Ethereum from the first generation of blockchains, where new functionality requires client software to be modified, nodes to be upgraded, and the whole blockchain to be forked.

A smart contract is a piece of executable code published on-chain, that has a unique blockchain address assigned to it. Smart contract controls all the assets belonging to its address and may act on behalf of this address when interacting with other smart contracts. Each smart contract has persistent storage that is used to preserve the smart contract state between invocations.

Solidity is the primary programming language for smart contract development on Ethereum, as well as on several other blockchain platforms that use Ethereum Virtual Machine (EVM).

Programming was always about math, blockchain was always about finance, and finance was about math since ancient times (or maybe math was about finance). Being the primary programming language for Ethereum blockchain, Solidity has to do math well.

In this series, we discuss various aspects of how Solidity does the math, and how developers do math in Solidity. The first topic to discuss is: numbers.

Numeric Types in Solidity

In comparison to mainstream programming languages, Solidity has quite many numeric types: namely 5,248. Yes, according to the documentation, there are 32 signed integer, 32 unsigned integer, 2592 signed fixed-point, and 2592 unsigned fixed-point types. JavaScript has only two numeric types. Python 2 used to have four, but in Python 3 type “long” was dropped, so now there are only three. Java has seven and C++ has something about fourteen.

With so many numeric types, Solidity should have proper type for everybody, right? Not so fast. Let’s look at these numeric types a bit closer.

We will start with the following question:

Why Do We Need Multiple Numeric Types?

Spoiler: we don’t.

There are no numeric types in pure math. A number may be integer or non-integer, rational or irrational, positive or negative, real or imaginary etc, but these are just properties, the number may or may not have, and single number may have several such properties at once.

Many high-level programming languages have single numeric type. JavaScript had only “number” until “BigInt” was introduced in 2019.

Unless doing hardcore low-level stuff, developers don’t really need multiple numeric types, they just need pure numbers with arbitrary range and precision. However, such numbers are not natively supported by hardware, and are somewhat expensive to emulate in software.

That’s why low-level programming languages and languages aimed at high performance usually have multiple numeric types, such as signed/unsigned, 8/16/32/64/128 bits wide, integer/floating-point etc. These types are natively supported by hardware and are widely used in file formats, network protocols etc, thus low-level code benefits from them.

However, for performance reasons, these types usually inherit all the weird semantics of underlying CPU instructions, such as silent over- and underflow, asymmetric range, binary fractions, byte ordering issues etc. This makes them painful in high-level business logic code. Straightforward usage often appears insecure, and secure usage often becomes cumbersome and unreadable.

So, the next question is:

Why Does Solidity Has So Many Numeric Types?

Spoiler: it doesn’t.

EVM natively supports two data types: 256-bit word and 8-bit byte. Stack elements, storage keys and values, instruction and memory pointers, timestamps, balances, transaction and block hashes, addresses etc are 256-bit words. Memory, byte code, call data, and return data consist of bytes. Most of the the EVM opcodes deal with words, including all math operations. Some of the math operations treat words as signed integers, some as unsigned integers, while other operations just work the same way regardless of whether arguments are signed on unsigned.

So EVM natively supports two numeric types: signed 256-bit integer and unsigned 256-bit integer. These types are known in Solidity as int and uint respectively.

Apart from these two types (and their aliases int256 and uint256 ) Solidity has 62 integer types int<N> , and uint<N> , where <N> could be any multiple of 8 from 8 to 248, i.e. 8, 16, …, 248. On EVM levels, all these types are backed by the same 256-bit words, but result of every operation is truncated to N bits. They could be useful for specific cases, when particular bit width is needed, but for general calculations these types are just less powerful and less efficient (truncating after every operation is not free) versions of int and uint .

Finally, Solidity has 5184 fixed-point types fixedNxM and ufixedNxM where N is multiple of 8 from 8 to 256 and N is an integer number from 0 to 80 inclusive. These types are supposed to implement decimal fixed-point arithmetic of various range and precision, but as of now (Solidity 0.6.2) the documentation says that:

Fixed point numbers are not fully supported by Solidity yet. They can be declared, but cannot be assigned to or from.

So fixed-point numbers as well as fractions numbers in general are not currently supported.

Then, the next question is:

What If We Need Fraction Numbers or Integers Bigger Than 256 Bit?

Spoiler: you have to emulate them.

One would say, that 256 bit ought to be enough for anybody. However, once most of the numbers in Ethereum are 256 bit wide, even simple sum of two numbers may be as wide as 257 bit, and product of two numbers may be up to 512 bit wide.

Common way to emulate fixed or variable width integers numbers, that are wider, than types natively supported by programming language, is to represent them as fixed or variable length sequences of shorter, natively supported integer numbers. So bit image of wide integer is the concatenation of bit images of shorter integers.

In Solidity, wide integers may be represented as fixed or dynamic arrays whose elements are either bytes or uint vales.

For fractions situation is a bit more complicated, as as there are different flavors of them, each having its own advantages and drawback.

The most basic are simple fractions: just one integer, called “numerator”, divided by another integer, called “denominator”. In Solidity simple fraction could be represented as a pair of two integers, or as a single integer, whose bit image is the concatenation of bit images of numerator and denominator. In the latter case, numerator and denominator has to be of the same width.

Another popular format for fractions is fixed-point numbers. Fixed-point number is basically a simple fraction whose denominator is a predefined constant, usually power of 2 or 10. The former case is known as “binary” fixed-point, while the latter is known as “decimal” fixed-point. As long as denominator is predefined, there is no need to specify is explicitly, so only the numerator need to be specified. In Solidity fixed-point numbers are usually represented as a single integer numerator, while commonly used denominators are 10¹⁸, 10²⁷, 2⁶⁴, and 2¹²⁸.

Yet another well-known format for fraction numbers is floating-point. Basically, floating point number could be described as following:

where m (mantissa) and e (exponent) are integers, and B (base) is a predefined integer constant, usually 2 or 10. The case when B=2 is known as “binary” floating-point, and the case when B=10 is known as “decimal” floating-point.

IEEE-754 standardizes several common floating-point formats, including five binary formats known as “half”, “single”, “double”, “quadruple”, and “octuple” precision. Each of these formats packs both, mantissa and exponent, into single sequence of 16, 32, 64, 128, or 256 bits respectively. In Solidity, these standard formats could be represented by binary types bytes2 , bytes4 , bytes8 , bytes16 , and bytes32 . Alternatively, mantissa and exponent could be represented separately as a pair of integers.

And the file question for this section:

Do We Have to Implement All This by Ourselves?

Spoiler: not necessary.

The good news is that there are Solidity libraries for various number formats, such as: fixidity (decimal fixed-point with arbitrary number of decimals), DSMath (decimal fixed-point with 18 or 27 decimals), BANKEX Library (IEEE-754 octuple precision floating-point), ABDK Libraries (binary fixed-point and quadruple precision floating-point) etc.

The bad news is that different libraries use different formats, so it is really hard to combine them. The roots of this problem will be discussed in the next section.

Numeric Literals in Solidity

In the previous section we discussed how numbers are represented at run time. Here we will look at how they are represented at the development time, i.e. in the code itself.

Compared to mainstream languages, Solidity has quite a rich syntax for numeric literals. First of all, good old decimal integers are supported, such as 42 . As in other C-like languages, there are hexadecimal integer literals, like 0xDeedBeef . So far so good.

In Solidity, literals may have unit suffix, such as 6 ether , or 3 days . A unit, is basically a factor, the literal is multiplied by. Here ether is 10¹⁸ and days is 86,400 (24 hours × 60 minutes × 60 seconds).

Apart from this, Solidity supports scientific notation for integer literals, such as 2.99792458e8 . This is quite unusual, as mainstream languages support scientific notation for fractional literals only.

But probably the most unique feature of the whole Solidity language, is its support for rational literal expressions. Virtually every mature compiler is able to evaluate constant expressions at compile time, so x = 2 + 2 does not generate add opcode, but is rather equivalent to x = 4 . Solidity is able to do this as well, but actually, it goes far beyond that.

In mainstream languages, compile-time evaluation of constant expression is just an optimization, so constant expression is evaluated at compile time exactly the same way as it would be evaluated at run time. This makes it possible to replace any part of such expression with named constant or variable holding the same value, and get exactly the same result. However, for Solidity this is not the case.

At run time, division in Solidity rounds result towards zero, and other arithmetic operations wraps on overflow, while at compile time, expressions are evaluated using simple fractions with arbitrary large numerator and denominator. So, at run time, expression ((7 / 11 + 3 / 13) * 22 + 1) * 39 would be evaluated to 39, while at compile time the very same expression is evaluated to 705. The difference is because at run time, 7 / 11 and 3 / 13 are rounded to zero, but at compile time, the whole expression is evaluated in simple fractions without any rounding at all.

Even more interesting, the following expression is is valid in Solidity: 7523 /48124631 * 6397 , while this is not valid: 7523 / 48125631 * 6397 . The difference is that the former evaluates to integer number, while the latter evaluates to non-integer. Remember, that Solidity do not support fractions at run time, so all literals have to be integer.

While fractional numbers and big integers may be represented in Solidity at run time, as described in the previous sections, there is no convenient way to represent them in the code. This makes any code, that performs operations with such numbers, rather cryptic.

As long as Solidity does not have a standard fixed-point nor floating-point format, every library uses its own, which makes libraries incompatible with each other.

Conclusion

Smart contracts did math since the very beginning of the Ethereum blockchain. From simple percents to complicated derivative valuation. However, Solidity, the primary language for smart contract development, does not go much further, in terms of math, then just exposing what EVM opcodes are able to do.

Libraries try to cover what basic language misses, but suffer from the absence of standardized number formats.

The core language has some unmatched features, but at the same time lacks support for basic, must-have things.

In the subsequent articles, we will discuss how to deal with all this, and the next topic to discuss will be: overflow.