Introduction

Protocols for the exchange of financial information form the foundation of modern financial markets. Although it is often called “the TCP/IP of financial markets,” the FIX protocol relies on English-prose specifications even for its session-level behaviour (e.g., see this link). The only way to ensure that the rules described therein are clear, complete and logically consistent is to create a formalised model of the protocol logic and mathematically verify its key properties.

At Aesthetic Integration we’ve created Imandra — an automated reasoning engine allowing one to formally (mathematically) verify statements about programs. At its lowest level, Imandra analyses programs written in the Imandra Modeling Language (IML), a subset of OCaml for which we’ve built a mechanised formal semantics. IML defines a functionally pure part of OCaml that can be treated as formal mathematical logic. Imandra is able to reason about code written in IML and, being valid OCaml, the same code can be compiled and executed with standard OCaml.

In this post we’ll describe how we took the English-prose specification of the FIX 4.4 protocol, created a formal model of its logic in IML, and verified the key claims the FIX specification makes (with nuances, of course). What’s special is the ease with which such deep work can be done with Imandra. Without Imandra, work of this nature is typically only accessible to teams of PhDs (similar to those that work at NASA verifying the safety of autopilots). Imandra breaks down many of the barriers to this approach, democratising formal verification and clearing the way to further engineering rigour within financial markets.

Financial Information eXchange (FIX) Protocol

The Financial Information eXchange (FIX) protocol is the most widely used protocol for trading within financial markets. The most popular syntax for sending FIX messages is a sequence of tag=value pairs, each terminated by an ASCII \001 (SOH) character. For example, the following snippet contains a simple NewOrderSingle message (typically used to ask the service provider to create a new order). For readability, we’ve replaced the \001 character with a vertical bar | :

8=FIX.4.4|9=110|35=D|34=3|49=BANZAI|52=20180531-11:39:25.885|56=IMANDRA|11=ID|21=3|40=1|54=1|55=INTC|60=20180531-11:39:25.885|10=054|

Here’s what it says: the 35=D field tells us that this is a NewOrderSingle message, as 35 is the tag for the “message type” and D is the value that encodes NewOrderSingle. Other fields of the message are decoded in a similar manner: 60=20180531-11:39:25.885 tells us the transaction time (tag 60) of the order, and 54=1 refers to the side (tag 54) of our order, which is “Buy” (value code 1).
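The decoding described above can be sketched in a few lines of plain OCaml. This is only an illustration of the tag=value syntax, not the parser used in our model: the function name parse_fields and its sep parameter are our own, hypothetical choices, and the sketch silently skips anything that is not a well-formed pair instead of reporting it as garbled.

```ocaml
(* Split a FIX message into (tag, value) pairs.
   [sep] is the field delimiter: '\001' (SOH) on the wire,
   '|' in the readable form used in this post. *)
let parse_fields ?(sep = '\001') (raw : string) : (int * string) list =
  String.split_on_char sep raw
  |> List.filter_map (fun field ->
         match String.index_opt field '=' with
         | None -> None (* empty piece or malformed field: skipped in this sketch *)
         | Some i ->
           let tag = String.sub field 0 i in
           let value = String.sub field (i + 1) (String.length field - i - 1) in
           (* keep the pair only if the tag is a number *)
           Option.map (fun t -> (t, value)) (int_of_string_opt tag))

(* The NewOrderSingle message from the text, with '|' as the delimiter. *)
let msg =
  "8=FIX.4.4|9=110|35=D|34=3|49=BANZAI|52=20180531-11:39:25.885|56=IMANDRA|\
   11=ID|21=3|40=1|54=1|55=INTC|60=20180531-11:39:25.885|10=054|"

let fields = parse_fields ~sep:'|' msg

let msg_type = List.assoc_opt 35 fields (* tag 35: message type *)
let side = List.assoc_opt 54 fields     (* tag 54: side *)
```

Looking up tag 35 in the resulting list yields the value D, and tag 54 yields 1, matching the reading given above.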

The very first field of this message, 8=FIX.4.4 , specifies the version of the protocol used in this transmission. Version 4.4 of the FIX protocol is perhaps the most popular version, so we’ll cover it here.

Over time, to keep up with the expanding electronification of financial services, the FIX community has introduced more and more message types, field tags and ways to encode field values. For example, FIX.4.4 describes 84 message types with 956 field tags. These rules are not typically followed precisely – virtually all firms have their own ‘views’ on what the FIX protocol should be. This manifests itself in two ways:

They add custom fields and values, sometimes contradicting the original specifications;

Quite often, they mix message types from different protocol versions. For example, FIX 4.4 has an OrderMassCancelRequest message (4.2 does not), yet this message is used within specs that are supposed to adhere to version 4.2.

Model architecture

All FIX messages are split into two categories: Application-level and Administrative (or session)-level.

Application-level messages

These are used to carry actual business-related information. Most commonly these are order creation and cancellation messages, various information requests and execution reports, etc. These are the ones that exhibit the most variation between systems implementing FIX.

Dealing with business-related messages requires quite a lot of bookkeeping. To make it simple for our users to encode their business rules quickly, yet precisely, we’ve created a domain specific language — the Imandra Protocol Language (IPL). IPL consolidates the description of all the business-dependent parts of the system in a compact and comprehensible way. From the IPL model, the IPL compiler automatically generates the OCaml/IML code for the parsers-encoders, type-converters and the fix-venue-model , while all the session-level logic is covered by the fix-engine model we’ll cover below.

OCaml (and hence IML) is a statically typed language, and even before our work on formalising the FIX specifications we discovered a number of inconsistencies in the FIX XML data dictionaries at the type-checking level. For example, the MiscFeeType field, introduced in FIX.4.0, is defined to have a “char” type:

But in later versions new values are added to the specification, that seem to assume that the value encoding is integer:

The IPL compiler automatically finds and corrects such typing errors in the FIX protocol data dictionaries.
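To see why such a mismatch surfaces at the type level, consider a minimal sketch of two decoders for the same field, one under the “char” reading and one under an integer reading. The function names and the sample values "7" and "10" are illustrative assumptions of ours; they are not quoted from the data dictionaries.

```ocaml
(* Decode a field value under the "char" reading: exactly one character. *)
let decode_char (v : string) : char option =
  if String.length v = 1 then Some v.[0] else None

(* Decode the same value under an integer reading. *)
let decode_int (v : string) : int option = int_of_string_opt v

(* A one-character code fits both readings ... *)
let one_char_as_char = decode_char "7"
let one_char_as_int = decode_int "7"

(* ... but a two-character code only fits the integer reading. *)
let two_char_as_char = decode_char "10"
let two_char_as_int = decode_int "10"
```

A dictionary that declares the field as “char” while listing two-character values cannot be given a consistent OCaml type, which is the kind of error a type-checker flags immediately.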

You can read more about the IPL language here and here. We will now continue examining the fix-engine session-level model and the ways we can use Imandra to reason about it.

Administrative or session-level messages

These handle the low-level details of session control – ensuring that the business (application) messages arrive in the correct order and are recovered if lost.

There are Logon and Logout messages for starting and terminating the session, Heartbeat and TestRequest for monitoring the connection, and ResendRequest and SequenceReset (GapFill) messages for retrieving missing messages if a sequence gap is detected. Representing the session-level logic formally is a complex task, which is explained in detail in the discussion that follows.
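As a rough illustration of what this session layer has to track, here is a small OCaml sketch of the administrative message types, together with a much simplified sequence-number check. The MsgType codes are the standard FIX ones; the type and function names (admin_msg, seq_check, check_seq_num) are hypothetical, and the real rules also involve details such as the PossDupFlag that this sketch ignores.

```ocaml
(* Session-level (administrative) message types. *)
type admin_msg =
  | Heartbeat
  | TestRequest
  | ResendRequest
  | Reject
  | SequenceReset (* in gap-fill mode, this is the "GapFill" message *)
  | Logout
  | Logon

(* Value of tag 35 (MsgType) for each administrative message. *)
let msg_type_code = function
  | Heartbeat -> "0"
  | TestRequest -> "1"
  | ResendRequest -> "2"
  | Reject -> "3"
  | SequenceReset -> "4"
  | Logout -> "5"
  | Logon -> "A"

(* Outcome of comparing an incoming sequence number with the expected one. *)
type seq_check =
  | In_order
  | Gap of int * int (* inclusive range of missing sequence numbers *)
  | Too_low          (* lower than expected *)

let check_seq_num ~(expected : int) ~(received : int) : seq_check =
  if received = expected then In_order
  else if received > expected then Gap (expected, received - 1)
  else Too_low
```

In this simplified picture, a Gap (5, 7) result is what would prompt the engine to send a ResendRequest for messages 5 through 7.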

Creating a formal model of the FIX protocol

Analogously to the separation between application- and session-level logic, we’re going to create two models. The fix-engine model, which contains the session-level logic, will be covered in this post. The second model, fix-venue-model, is for the application logic and is generated from IPL models as described above. The two models are loosely coupled and can be formally verified separately (or analysed modularly in tandem).

In order to be able to compile and execute the code we also need two communication layers:

At the front-end of the system, we have FIX parsers and encoders that read/write the raw messages on the wire and convert them to a strictly-typed IML representation. This layer is also responsible for detecting garbled/malformed messages and informing fix-engine about them.

Between the fix-engine and the fix-venue-model, a type converter is needed to translate between the datatypes used to represent message data in the two models: in the fix-venue-model only business-logic relevant information is kept in message datatypes, while in fix-engine we keep all fields and messages of the FIX specification.

The fix-engine model is almost completely independent of the business logic of the modelled system. It only requires the administrative information in the headers of application messages and passes the business payload to the fix-venue-model. By contrast, the fix-venue-model, the parsers-encoders and the type-converters are all highly dependent on the business logic.
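To make the role of the type converter concrete, here is a minimal sketch with two hypothetical record types: a “full” message as the session layer might keep it, and a slimmed-down record carrying only what the business logic needs. All type and field names are our own illustrative assumptions; the generated code differs.

```ocaml
(* Administrative header data, needed by the session layer only. *)
type header = {
  msg_seq_num : int;
  sender_comp_id : string;
  target_comp_id : string;
}

type side = Buy | Sell

(* Full representation: header plus every body field of the message. *)
type full_new_order_single = {
  header : header;
  cl_ord_id : string;
  symbol : string;
  side : side;
  ord_type : char;
  handl_inst : char option;
  transact_time : string;
}

(* Slim representation: only what this (hypothetical) venue model uses. *)
type venue_new_order = {
  order_id : string;
  instrument : string;
  order_side : side;
}

(* The converter drops the header and the fields the venue model ignores. *)
let to_venue (m : full_new_order_single) : venue_new_order =
  { order_id = m.cl_ord_id; instrument = m.symbol; order_side = m.side }

let sample : full_new_order_single = {
  header = { msg_seq_num = 3; sender_comp_id = "BANZAI"; target_comp_id = "IMANDRA" };
  cl_ord_id = "ID";
  symbol = "INTC";
  side = Buy;
  ord_type = '1';
  handl_inst = Some '3';
  transact_time = "20180531-11:39:25.885";
}

let sample_venue_order = to_venue sample
```

Which fields survive the conversion is exactly the business-dependent part, which is why the converters are generated from the IPL model and not written once by hand.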

Modeling the fix-engine

At Aesthetic Integration we’ve adopted a standard way to construct formal models of message-driven systems. At the top of the model we have a single datatype that holds all the data needed to describe the system at a given moment, including incoming and outgoing messages. We call this record type state. Together with this state type we define a transition function one_step of type state -> state, which performs a single logically isolated step of the simulation and returns the new state after the transition.
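The pattern is easiest to see on a toy system. The sketch below is not the fix-engine: the message type, the record fields and the run helper are all hypothetical, chosen only to show the shape of a state record with a pure one_step function.

```ocaml
(* Messages understood by the toy system. *)
type msg =
  | Ping of int
  | Reset

(* Everything needed to describe the system at a given moment. *)
type state = {
  counter : int;         (* number of pings seen since the last reset *)
  incoming : msg option; (* message waiting to be processed, if any *)
  outgoing : msg option; (* message produced by the last step, if any *)
}

(* One logically isolated step: consume the incoming message, if present. *)
let one_step (s : state) : state =
  match s.incoming with
  | None -> s
  | Some (Ping n) ->
    { counter = s.counter + 1; incoming = None; outgoing = Some (Ping n) }
  | Some Reset -> { counter = 0; incoming = None; outgoing = None }

(* Drive the model: feed messages one at a time and step after each. *)
let run (s : state) (msgs : msg list) : state =
  List.fold_left (fun st m -> one_step { st with incoming = Some m }) s msgs

let initial_state = { counter = 0; incoming = None; outgoing = None }
```

Because one_step is a pure function from state to state, a property of the system becomes a statement about that function, which is the form a reasoning engine can work with.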

Following this pattern, for the fix-engine we introduce a global record type fix_engine_state which contains all the necessary information to describe the state of the FIX session. The fix_engine_state includes the fields for incoming and outgoing FIX messages and another pair of fields for the so-called “internal” messages:

Internal messages are necessary to inform the fix-engine of other non-FIX events that might occur. Most notable is the IncIntMsg_TimeChange internal message: it is the “clock tick” event that allows the model to deal with various time delays. Communication of the application data with the business-level fix-venue-model also happens via internal messages: if a FIX application message is received normally, then its payload is wrapped into an OutIntMsg_ApplicationData internal message and placed into outgoing_int_msg. And vice versa, the model can communicate outgoing business data using the IncIntMsg_ApplicationData internal message.
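The flow of internal messages can be sketched as follows. Only the names fix_engine_state, outgoing_int_msg, IncIntMsg_TimeChange, IncIntMsg_ApplicationData and OutIntMsg_ApplicationData come from the description above; every other type, field and payload representation here is a simplifying assumption, and the real record holds considerably more.

```ocaml
type app_data = string (* placeholder for the business payload *)

(* Internal messages entering and leaving the engine. *)
type inc_int_msg =
  | IncIntMsg_TimeChange of int (* "clock tick"; time as an int is an assumption *)
  | IncIntMsg_ApplicationData of app_data

type out_int_msg =
  | OutIntMsg_ApplicationData of app_data

(* Hypothetical, heavily reduced FIX application message. *)
type fix_msg = { seq_num : int; payload : app_data }

type fix_engine_state = {
  incoming_fix_msg : fix_msg option;
  outgoing_fix_msg : fix_msg option;
  incoming_int_msg : inc_int_msg option;
  outgoing_int_msg : out_int_msg option;
  current_time : int;
  next_out_seq : int;
}

let one_step (s : fix_engine_state) : fix_engine_state =
  match s.incoming_int_msg, s.incoming_fix_msg with
  | Some (IncIntMsg_TimeChange t), _ ->
    (* clock tick: record the new time *)
    { s with incoming_int_msg = None; current_time = t }
  | Some (IncIntMsg_ApplicationData d), _ ->
    (* business data on its way out: wrap it into a FIX message *)
    { s with
      incoming_int_msg = None;
      outgoing_fix_msg = Some { seq_num = s.next_out_seq; payload = d };
      next_out_seq = s.next_out_seq + 1 }
  | None, Some m ->
    (* FIX application message received: hand its payload to the venue model *)
    { s with
      incoming_fix_msg = None;
      outgoing_int_msg = Some (OutIntMsg_ApplicationData m.payload) }
  | None, None -> s

let initial_state = {
  incoming_fix_msg = None;
  outgoing_fix_msg = None;
  incoming_int_msg = None;
  outgoing_int_msg = None;
  current_time = 0;
  next_out_seq = 1;
}
```

The sketch processes internal messages before FIX messages; that ordering is an arbitrary choice made here for brevity.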

The state record also contains a “mode” entry of fix_engine_mode type:

It describes the current “mode” of operation of the fix-engine model at a given moment. The behaviour of the model can then be roughly described as a state machine transitioning between the possible modes.
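A state machine over modes can be sketched as a variant type plus a transition function. Only the type name fix_engine_mode is taken from the text; the constructors, the event type and the transitions below are hypothetical and deliberately coarser than the actual model.

```ocaml
(* Illustrative modes of a FIX session; names are assumptions. *)
type fix_engine_mode =
  | NoActiveSession
  | LogonInitiated
  | ActiveSession
  | GapDetected
  | ShutdownInitiated

(* Coarse events that can move the engine between modes. *)
type event =
  | Logon_sent
  | Logon_acknowledged
  | Sequence_gap_seen
  | Gap_filled
  | Logout_sent
  | Logout_acknowledged

let next_mode (mode : fix_engine_mode) (ev : event) : fix_engine_mode =
  match mode, ev with
  | NoActiveSession, Logon_sent -> LogonInitiated
  | LogonInitiated, Logon_acknowledged -> ActiveSession
  | ActiveSession, Sequence_gap_seen -> GapDetected
  | GapDetected, Gap_filled -> ActiveSession
  | ActiveSession, Logout_sent -> ShutdownInitiated
  | ShutdownInitiated, Logout_acknowledged -> NoActiveSession
  | m, _ -> m (* any other event leaves the mode unchanged in this sketch *)

(* Replay a list of events from a starting mode. *)
let run_events (start : fix_engine_mode) (evs : event list) : fix_engine_mode =
  List.fold_left next_mode start evs
```

Writing the modes as a variant type means every match over them is checked for exhaustiveness by the compiler, so a forgotten mode shows up as a warning and not as a silent gap in the logic.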