TL;DR:

The IOTA Foundation has decided to shift our focus to a new Smart Contract approach derived from our previous work. IOTA Smart Contracts is a layer 2 solution building on Coordicide’s development in GoShimmer. A brief introduction can be found here. Qubic’s further development requires significant resources and time, and our new approach will allow for broader and faster adoption. We have open-sourced a working Proof of Concept of the Qupla programming language, which can be run on FPGA hardware. We invite the open source community to explore it.

The state of Qubic

It has been a while since we had a comprehensive look at the current state of the Qubic project. This article will remedy that by going in depth into where we currently see the project, how far we have come, the lessons we have learned and how the IOTA Foundation will continue development of Smart Contracts going forwards.

Please note: We will assume the reader has a basic understanding of the Qubic system as well as the Qubic Computation Model article series. Some of the terms used in those resources may have changed a bit over time for clarity.

Qubic: a layered system

When you look deeper at Qubic it actually consists of two very loosely coupled layers that don’t seem to need each other all that much. Those two layers are:

The Smart Contract (SC) messaging layer, that runs as a level-2 protocol on top of the IOTA Tangle. This layer governs the execution of SCs by a committee of processing nodes. The nodes generate quorum consensus about processing results, and provide an audit trail on the Tangle. This layer does not care about how processing happens, just that the results are provably correct and verifiable, through the audit trail, SC code, input data and output data. The Abra Virtual Machine (VM) layer. This layer governs the actual execution of the SC code. It has a special focus on being able to execute on low-resource IoT devices. The layer takes input data, executes the (possibly hardware-accelerated) code, and outputs results. This layer does not care where the input data comes from, nor where the results are being sent.

These two separate layers are glued together by the Qubic Computation Model (QCM), an event-processing system. The QCM governs the dispatching of input data from the SC layer to the VM layer and the returning of the results from the VM layer to the SC layer. The QCM is a very generic model. It does not assume anything about event processing, where the event data comes from or where the processing results go. It’s just a simple dispatching mechanism that ensures a complete decoupling between layers that need to talk to each other.

Connecting the layers

So where then is the coupling of these two systems happening if they are decoupled almost completely? First there is the necessity to be able to verify computation results independently. This means that in addition to an audit trail for the input data and results of SC invocations you also need the corresponding SC code that was invoked to be a part of this audit trail. That way you know exactly what code to run with the input data if you want to verify the result.

Note that this would still not mandate the use of the Qubic-specific Qupla code and corresponding Abra VM to execute the SC. You could essentially use any VM and any programming language as long as the SC code is compatible with those. Our initial Proof of Concept implementation of the QCM actually mixes Qubic-specific Abra code with standard Java UI code in the Game of Life demo. But there is a very specific reason to be careful when choosing a VM to execute your SC code on. Because nodes run this code whenever data for the SC arrives automatically and unsupervised, you want to prevent runaway SC programs in the VM layer (unbounded loops/recursions). You need a mechanism to keep the SC program in check so that it runs only for a reasonable, predictable amount of time.

In Ethereum this runaway problem is solved by introducing gas fees. The amount of fees provided corresponds to a certain maximum amount of SC processing. Once that maximum is exceeded the SC execution will be aborted, thereby preventing runaway SC code from continuing indefinitely. The drawback of this method is that it can be hard to predict the amount of gas fees you will need for your SC up front if the SC does any kind of non-trivial processing. And if you provide too little gas fees this could result in your SC execution being aborted before it reaches completion. You have lost the gas fees in this case without ending up with an execution result.

In Qubic we solve this problem deterministically by having a functional data flow programming model in the Abra VM that does not allow unbounded loops/recursion and therefore is not in and by itself Turing-complete. Any function clearly defines how much processing it will do. There is no direct branching or looping in the Abra VM, which means any function will always execute the exact same number of instructions. Even when a function is called recursively it is limited to a certain maximum number of invocations. Unbounded loops can be enabled at a higher level, through the QCM, but the unit of execution is absolutely clear and guaranteed to be bounded at the function level. If you expect to ‘run out’ of invocations for a function you can design your code in such a way that the continuation state of the SC is part of the resulting output. And that in turn allows you to resume execution from that point in a next round of bounded invocations. We therefore say that execution through the QCM is Quasi-Turing-complete.

Note that even though the option of providing rewards exists as a potential incentive for processing, it is not a requirement in Qubic to provide the equivalent of Ethereum fees for processing. There could be other incentives that work just as well or even better. Like when an assembly is specifically designed to aggregate certain sensor data. Having an audit trail or even having redundant calculations could be incentive enough in such a case.

Abra Virtual Machine

The Abra VM programming model is very ambitious in that it does not only provide us with a programming model that can be made Quasi-Turing-complete, but it also moves away from standard opcode-based sequential instructions that are being processed by a complex CPU. Instead it provides us with maximum parallelization of operations through a data flow model that ultimately has only two types of operations that can be used to implement any function: lookup tables (LUTs) and mergers. These operations have been specifically selected for their ability to be easily instantiated directly in circuitry. The maximum parallelization of operations is only achieved when instantiating in circuitry. The only sequential data flow paths are formed by operations that depend directly on outputs of previous operations. Everything else can be executed in parallel. So you get maximal parallel sequential data flow paths.

The simplicity of the Abra VM programming model also means that it is very easy to create a software emulator (interpreter) that can run this VM on a standard processor. You will lose a lot of parallel processing opportunities in that case, but it will still come up with correct processing results. So for simple SCs that are invoked sparsely you could resort to a software emulated Abra VM. But if you need hardware acceleration you need to actually implement the VM in circuitry. This is where we envision three levels of hardware implementation.

Use existing FPGAs to implement the Abra VM. This has the advantage that you only need to create the VM circuitry once for each different type of FPGA and you can subsequently run arbitrary Abra VM code on such a device. This ability alone is ground-breaking. Currently you need a very capable PC that runs for a long time to layout circuitry for a program that runs on a specific type of FPGA. And that is something you need to do every time you change something to your program or when you target a different type of FPGA. Instead you now layout the circuitry for our open source Abra VM only once for each type of FPGA. Once you have that you can simply change the program that you want to run on the VM, recompile it, then load and run it on any FPGA that has implemented the Abra VM without further modification. Use an ASIC to implement the Abra VM directly. This means that we will be creating an open source ASIC design for an Abra VM. The advantage is that we avoid creating a programmable device that runs on another (proprietary) programmable device (FPGA). Instead of programming the building blocks of the FPGA we can directly create ASIC circuitry that implements the Abra VM. That not only means a speed increase, but also a huge decrease in the amount of necessary circuitry. Such an Abra VM ASIC could easily be used as a co-processor for current hardware or as the main processor for certain IoT applications. Once you have programmed specific code for the Abra VM and have verified that it works correctly you could also create an ASIC that implements the operations for that specific code as dedicated circuitry. You would lose the general programmability aspect of the VM implementation, but there are lots of use cases where once you deploy some hardware you will never change it any more. Most notably in sensors and other devices that you deploy in areas where you cannot easily access them after deployment. This is the IoT vision that we are ultimately working towards. The simplicity of the operations that make up the specific program code makes creating an ASIC relatively easy. And the generated circuitry actually allows for a number of crazy optimizations at the logic gate level because it no longer uses general-purpose programmable circuitry.

The real improvements of the Abra VM programming model, such as the reduced energy requirements, the data flow aspects, and the possibility of wave-pipelining will need this last level of implementation to really shine. The more general programmable levels come with certain design restrictions that will cause extra overhead.

Ternary encoding

You may have noticed that in all the above there has been no mention of the word ternary yet. And sure enough, it would be possible to implement this entire vision in binary. But part of the Qubic vision has always been to be able to go beyond the limitations of current execution models and extend hardware beyond Moore’s Law, which has arguably come to an end. To that end we have always had the vision of a ternary execution model. Even while there are no native ternary processors yet this model can still provide some advantages. The most notable immediate advantages are: data density and computation speed.

Our programming model works with circuitry and therefore it works with the lowest level unit of information. On binary systems that would be a bit. But the Abra data flow model also requires a representation of ‘absence of data’ (null), which is crucial to its decision logic mechanism. The normal way to represent that in binary systems would be to have an extra bit to indicate this state. That means that every bit of information needs to be represented by two bits to be able to encode all 3 possible mutually exclusive states (0, 1, and null). But since 2 bits can represent 4 states it only seems natural to use this to the fullest. By using ternary encoding we still need only 2 bits to represent the 4 mutually exclusive states (-, 0, 1, and null). That means that we can encode values more efficiently compared to binary encoding, which would otherwise waste one of these 4 possible states. When compared to ternary encoding, binary encoding needs about 50% more 2-bit information units to be able to represent the same value range. For example, 16 bits can encode values from -32768 to +32767, whereas only 11 trits can already encode values from -88573 to +88573

Once you need less information units to represent the same value range, certain calculations will also be that much faster. Take for example a simple ripple carry adder. To add two values it adds two corresponding information units, but that can result in a carry-over to the next unit’s addition. That means that if you need 50% more units to represent a value range, it will also take 50% longer to execute such an addition. The bottleneck here is the fact that each addition depends on the previous addition, because it needs to factor in the carry. So the next addition can only happen when the previous one is complete and the value of the carry is known. Now take this to another level: multiplication. When using the straight-forward way of multiplying each unit by each unit you get a 2-dimensional dependency, which means that the processing time using the same algorithm with binary encoding takes 150% x 150% or 225% of the time that it would take the algorithm when using ternary encoding.

The ternary encoding we decided to use for our prototype FPGA implementation is what we call 2b encoding. This encodes the 4 states (-, 0, 1, and null) as (10, 11, 01, and 00), respectively. We call this 2b encoding because, as discussed above, it uses 2 bits per information unit. We also have devised an alternative, what we call 3b encoding, which uses 3 bits per information unit, and encodes the 4 states as (100, 010, 001, and 000), respectively. The 3b encoding might seem more wasteful but it has a number of interesting properties that are more relevant to the third level of hardware implementation (direct code to circuitry), and will be explored more in the future. The Abra code is completely independent from the encoding used. The actual encoding used is ultimately dictated by the specific implementation of the Abra VM.

Both 2b and 3b encoding use all zeroes to encode the null state. This should help reduce the energy requirements of the hardware because the null state is the default state for all circuitry and usually most energy is spent changing states. Abra is designed to force decision paths to use null as early as possible, so that no data will propagate through most of the circuitry and that unused circuitry can stay at the default null state without switching.

‘Real’ ternary hardware

What the encoding will look like on actual native ternary hardware is currently hard to envision, because we don’t have any working ternary system to work with yet. One interesting idea that came out of the Qubic research is that, since ternary processors are supposedly working with 3 states, you could encode a binary bit and the required null state by using a single ternary trit. This would mean that ternary hardware can support our data flow model directly with binary information encoding. So instead of needing 2 wires and all associated circuitry per wire to represent a bit and a null flag you would only need a single wire and associated circuitry, thereby maybe even halving the circuitry/energy otherwise needed to do the same binary-encoded processing in binary hardware. Ternary hardware is currently primarily a mental exercise, but it is definitely an exciting pursuit for the future.

Hardware operations

The Abra programming model originally has 5 different explicit operations. 3 of those operations reduce to simple wiring once you implement them at the circuitry level. They are only necessary to group and pass information as trit vectors. But because at the circuitry level everything operates on individual trits those trit vectors are mostly conceptual. Therefore the concatenation and slicing operations are merely enforcing the correct grouping of information. The call operation is only present to be able to efficiently reduce code into a call tree, but once it gets implemented in circuitry every call operation will force the instantiation of the entire called function tree in circuitry and link up the input and output trit vectors to their source and destination vectors.

That leaves only 2 operations that will actually be implemented as logic circuitry. The first is the ternary lookup operation, which takes 3 input trits and outputs a single trit according to the corresponding lookup table. The second is the merge operation, which combines separate data flow inputs into a single output and can be implemented either as a very special lookup operation or by using simple OR gates. Only one input is allowed to be non-null and will become the output. The Qupla language that generates the Abra code ensures that only one path can be non-null. Any other tools that would directly generate Abrac code will be required to ensure the same.

The FPGA proof-of-concept

To facilitate the programming model on FPGA we programmed the FPGA to provide a bank of what we call QLUTs (Qubic Look-Up Tables). Each QLUT takes three 2b-encoded trits as input and then outputs yet another 2b-encoded trit through a configurable lookup table. Mergers are implemented within QLUTs by setting a special configuration bit which causes them to use a predefined lookup operation with slightly different semantics.

Of course the FPGA also contains the additional configuration logic that is necessary to be able to connect any input operation to any output operation. This logic can be configured by a special binary configuration file that can easily be generated directly from the VM code that needs to be run. The code will currently be converted on the fly when it gets sent to the FPGA. It contains the configuration data for each individual QLUT and the configuration data for the connections between QLUTs.

The QLUTs have already been tested by running the entire Qupla test suite in the emulator while hardware-accelerating a specific function. We allow the user to mark one function as hardware-accelerated and the Qupla environment will automatically load the configuration data for that function into the FPGA. Then when it runs the emulator, every time it encounters that specific function, instead of having the emulator calculate the result for that function, it passes the function input data to the FPGA and uses the resulting output data. We have tested several representative functions that way and are happy to report that this works perfectly.

The next thing that we will need is an FPGA version of the Qubic Supervisor, which is the entity that dispatches input data events to the correct VM code and passes on the calculated results. To avoid having to write a miniature operating system we will initially use a simple embedded Linux version that can be run on a CPU that is embedded on the FPGA. That way we can use certain functionality out of the box without having to develop it ourselves. We have managed to boot this Linux on the FPGA already and use it to handle the communication between the emulator and the FPGA using a simple socket protocol. The next step will be to implement the Supervisor and some additional glue code to allow communication with the outside world over HTTP instead.

The next stage would be to implement an HTTP listener on the FPGA’s embedded Linux that can respond to two types of requests. The first request type can be used to transfer the configuration file onto the FPGA and that will then be used to configure the programming model, set up an HTTP callback address, and initialize the Supervisor. The second request type is an input data request that can be used to trigger execution of a function with the provided input data by the Supervisor. Once the result has been calculated the Supervisor will send the resulting output data asynchronously to the HTTP callback address.

Note that this means that the initial end-to-end proof of concept (Qupla-to-FPGA, which was originally scheduled for end January 2020) will not yet have full Supervisor functionality but will only be able to set up and call a single arbitrary function on the FPGA and return its result. Loading and calling multiple functions and more complex scheduling through EEE will become the next step for implementation in the Supervisor code.

On the software side the Qupla compiler is now able to generate a specific version of the Abra code that more easily enables 1-to-1 conversion to the binary configuration file by eliminating all concatenate, slice, and call operations. We have verified that the resulting transformed code still works correctly by running the test suite with this code in the emulator. The emulator itself is now able to interface to the FPGA through the socket interface. It will send the generated configuration file to the FPGA and call this function when necessary. In the future the Qupla supervisor will be able to dispatch events to and from the FPGA through the HTTP interface.

Early lessons from the PoC

So far, everything works as intended. We created a completely new processing model that can do any kind of general processing. The model handles RAM storage just like any other kind of I/O, which means it is outside of the scope of the Abra code and passed off to entities that can store/retrieve data for us by using EEE. This requires a different way of looking at problems and it will be important to evaluate the resulting complexity and/or see if there is a possibility to hide the complexity in Qupla language constructs or if an even higher level computer language is necessary. Already the need for some kind of looping construct was felt, for example.

Two immediate hurdles stand out at the moment and will need a solution.

The current resource requirements of the QLEs on the selected FPGA (DE10-Nano) limit us to only 512 QLEs in total. This may be optimized a little but we don’t expect this to be increased to more than about double. Of course we could select a larger, more expensive FPGA but for the prototyping we want to keep it simple and affordable for people who would like to duplicate our efforts and play around with it. A direct result is that we can only load some smaller functions on the FPGA and the communication overhead when calling these functions mostly outweighs the processing gains. This was to be expected and we hope that once we move to the next step, an ASIC Abra VM, we can go orders of magnitude beyond this. The instantiation of function calls becomes prohibitive very rapidly. Due to the call tree nature of code, expansion of such a call tree has exponential growth. If X calls Y 3 times which in turn calls Z 4 times you will get 3 full Y instantiations, which cause 3x4 or 12 full Z instantiations. You can see how this quickly escalates. Solving this problem will have to be our main focus. There are currently a few ideas floating around but we haven’t looked at all the consequences for the data flow model yet.

By the way, the Abra emulator does not suffer from these problems because it emulates function instantiation by using an ordinary stack-based function call mechanism and stores the lookup-tables in separate memory from the lookup instructions. Current-day gigabyte RAM sizes would allow for millions of Abra VM instructions with no problems.

Conclusion

After a long trajectory with many unforeseen hurdles, we have proven that the Abra data-flow programming model works end-to-end, from the Qupla programming language to the execution of generated code on dedicated FPGA hardware. We invite everyone to play around with the Qupla repository and see what Qupla can do and how it runs in the emulator. For those that are more hardware-minded, you can play around with the Qubic HDL repository and actually run code on a DE10-Nano the way we do, or even adapt the code to your own preferred FPGA system.

A new direction for smart contracts on IOTA

While this work proves the vision behind Qubic and is a great step forward, there are still more questions to be resolved and significant development to be invested in order to make Qubic production-ready. Our extensive work on Qubic has yielded a great deal of valuable research and development, and forms the foundation for our approach going forwards. Accordingly, the IOTA Foundation has made the decision to stop development of Qubic and pivot our focus specifically to the Smart Contract layer.

The main reason for this decision is to allow IOTA to go to market faster, with a novel smart contract solution that works in conjunction with the Coordicide node software. We end up with a complete system — ready for broader and faster adoption.

IOTA Smart Contracts is what we call our Layer-2 Smart Contract solution. It is born out of the work described in this post and has been redesigned from the ground up. With IOTA Smart Contracts we intend to pursue a vision for smart contracts that are scalable and have low transaction costs.

We are currently working towards an alpha version of IOTA Smart Contracts through the GoShimmer implementation, and we will release more information on this new path soon. When the alpha is ready, it will be accompanied by a set of simple smart contract PoCs to get community developers started and demonstrate what is possible to kickstart a new era of Applications powered by IOTA.

Feel free to reach out to our engineering team directly in the Smart Contract channels on the IOTA Discord and ask questions related to this new path we are taking.