Introduction

Neo SPCC is working on a performance test bench for Neo blockchain nodes. The project is a spin-off of the development of neo-go (a Neo node written in Go), started to answer questions the team had about the node's potential and its ability to sustain high load in networked and isolated environments.

However, the project is designed to be agnostic to any particular node implementation (and can potentially support heterogeneous setups), so it can also be used to compare different implementations and detect bottlenecks. Several metrics are measured. The most important one is the number of transactions per second (TPS) that the network can process in various scenarios, but we also measure resource consumption and count other things, such as missed transactions.

Today we present some initial results from this testing where we compare the original Neo2 C# node implementation with the one written in Go.

Testing setup and methods

The setup consists of a Neo private network in two variants: single-node consensus and four-nodes consensus, each with one optional Neo RPC node.

The transaction generator can function in two different modes:

1) Using one sending thread with a fixed requests-per-second (RPS) limit.

2) Using multiple sending threads working at their maximum possible rate (limited by the RPC node's ability to accept requests).

These transactions are then redistributed to the consensus nodes via the P2P protocol.
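The two generator modes can be illustrated with a minimal Go sketch. It is not the benchmark's actual code: the `Sender` type and the `RunRateLimited` and `RunWorkers` functions are hypothetical names, and `Sender` merely stands in for "submit one transaction to the RPC node".

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Sender is a hypothetical stand-in for submitting one transaction
// to the RPC node. It is an assumption, not the benchmark's API.
type Sender func() error

// RunRateLimited models mode 1: a single sending goroutine that
// issues at most rps requests per second. It returns the number of
// successful sends.
func RunRateLimited(send Sender, rps, total int) int {
	if rps <= 0 {
		return 0
	}
	ticker := time.NewTicker(time.Second / time.Duration(rps))
	defer ticker.Stop()
	sent := 0
	for i := 0; i < total; i++ {
		<-ticker.C // wait for the next send slot
		if send() == nil {
			sent++
		}
	}
	return sent
}

// RunWorkers models mode 2: several goroutines that send as fast as
// the receiver accepts requests, sharing a fixed total of transactions.
func RunWorkers(send Sender, workers, total int) int {
	jobs := make(chan struct{}, total)
	for i := 0; i < total; i++ {
		jobs <- struct{}{}
	}
	close(jobs)

	var sent int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs { // each job is one transaction to send
				if send() == nil {
					atomic.AddInt64(&sent, 1)
				}
			}
		}()
	}
	wg.Wait()
	return int(sent)
}

func main() {
	noop := func() error { return nil } // a sender that always succeeds
	fmt.Println(RunRateLimited(noop, 1000, 20))
	fmt.Println(RunWorkers(noop, 10, 1000))
}
```

In the first mode the ticker sets the pace, so the offered load is known in advance. In the second mode the only limit is how quickly `send` returns, which is why that mode probes the RPC node's capacity.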

Each node (including the transaction generator) runs in a separate Docker container on the same machine (2.9 GHz Quad-Core Intel Core i7, 16 GB 2133 MHz LPDDR3; Docker 19.03.5, Golang 1.13.5 alpine, Dotnet 3.0-runtime-stretch-slim).

Transactions are invocation transactions without any UTXO inputs or outputs that contain a simple “PUSH1” script.

We measure TPS, CPU usage, and memory usage.

We monitor the network via RPC polling to get and analyze new blocks. This monitoring continues until either we see all transactions packed into blocks or a timeout occurs.

To compare the two implementations, we use C# node version 2.10.3 with the Neo plugin ImportBlocks installed and neo-go node version 0.71.1-pre (to be released as 0.72). The same node versions are used both for consensus and for RPC.

Runs:

Single-node consensus: 10 workers (Case 2, C# and Golang nodes)

Single-node consensus: 30 workers (Case 2, C# and Golang nodes)

Single-node consensus: 100 workers (Case 2, Golang node)

Single-node consensus: 25 Ops/s (Case 2, C# and Golang nodes)

Single-node consensus: 1000 Ops/s (Case 2, Golang nodes)

Four-nodes consensus: 10 workers (Case 1, C# and Golang nodes)

Four-nodes consensus: 30 workers (Case 1, C# and Golang nodes)

Results for single-node consensus test: 10 workers

In this case, we observe that the Golang node shows a higher TPS (Golang node: ≈1320 TPS; C# node: ≈968 TPS).

At the same time, the Golang node has noticeably lower memory and CPU consumption. The load is easily maintained by both the C# and Golang Neo nodes.

Results for single-node consensus test: 30 workers

In this case, we observe that the Golang node shows a noticeably higher TPS than the C# node (Golang node: ≈2046 TPS; C# node: ≈934 TPS). Since the total number of transactions was fixed for the test, the test for the Go node completed earlier (in proportion to its TPS).

At the same time, the Golang node has higher CPU consumption because it is able to distribute calculations efficiently across all cores. The resulting memory consumption at the end of the test is almost equal.

The load is easily handled by the Golang Neo node. A test run with more workers against the current C# Neo node failed: the node cannot handle the load.

Results for single-node consensus test: 100 workers

The C# node’s RPC subsystem can’t handle more than 30 worker threads simultaneously pushing requests to it. The Golang node can process the load generated by more than 100 workers and shows an average TPS of 1931 Tx/s.

Results for single-node consensus test: 25 Ops/s

In this case, we observe that the TPS for both nodes is equal (≈25 Tx/s).

At the same time, the Golang node has slightly lower CPU consumption. It also has slightly higher memory consumption, but this may be a result of the very low load (a Go node without load consumes a bit more memory than a C# node without load).

The load is sustained by both the C# and Golang Neo nodes. However, an attempt to run tests with more than 50 Ops/s on the C# Neo node failed: the node cannot handle the load.

Results for single-node consensus test: 1000 Ops/s

The Golang node can process a load of more than 1000 Ops/s and shows an average TPS of 1038 Tx/s.

Results for four-nodes consensus test: 10 workers

In this run, separate workers (threads) each send at the maximum possible rate.

In this case, with a fixed total number of transactions, the TPS of the Golang node is much higher than that of the C# node. At the same time, empty blocks (near the end of the test) are observed in the C# case, even though some transactions should still be present in the mempool. This C# node behavior is reproducible. Also, 3672 of 25888 txs were not processed at all during the test on the C# node; this behavior needs additional research.

The average TPS is around 344 tx/s on the Golang node and around 44 tx/s on the C# node on the current test system.

The nodes have similar CPU consumption.

The Golang node has lower memory consumption.

Results for four-nodes consensus test: 30 workers

As in the previous run, the TPS of the Golang node is much higher. On the C# node, 563 of 26556 txs were not processed at all during the test; this behavior needs additional research.

The average TPS is around 343 tx/s on the Golang node and around 35 tx/s on the C# node on the current test system.
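To put the four-nodes figures in perspective, the share of unprocessed transactions and the TPS ratio can be derived from the numbers reported for the two runs. The Go sketch below only does that arithmetic. The helper names `MissedShare` and `Speedup` are hypothetical, and the inputs are the values quoted in the text.

```go
package main

import "fmt"

// MissedShare returns the percentage of sent transactions that were
// never included in a block.
func MissedShare(missed, sent int) float64 {
	return 100 * float64(missed) / float64(sent)
}

// Speedup returns how many times higher tpsA is than tpsB.
func Speedup(tpsA, tpsB float64) float64 {
	return tpsA / tpsB
}

func main() {
	// Four-nodes consensus, 10 workers: 3672 of 25888 txs missed on C#.
	fmt.Printf("10 workers: C# missed %.1f%%, Go/C# TPS ratio %.1fx\n",
		MissedShare(3672, 25888), Speedup(344, 44))
	// Four-nodes consensus, 30 workers: 563 of 26556 txs missed on C#.
	fmt.Printf("30 workers: C# missed %.1f%%, Go/C# TPS ratio %.1fx\n",
		MissedShare(563, 26556), Speedup(343, 35))
}
```

With the reported numbers this comes to roughly 14.2% and 2.1% of transactions unprocessed on the C# node, and a Golang-to-C# TPS ratio of about 7.8x and 9.8x for the 10-worker and 30-worker runs respectively.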

The nodes have similar CPU consumption.

The Golang node has lower memory consumption.

Summary

This is the first of the planned comparison scenarios. These test runs confirm that the Golang node can work on a par with the reference C# implementation. At present, we observe that the Golang node shows higher TPS than the C# node under the same load.

We observe that the C# node has a limited ability to handle a high RPC request load, most probably due to issues in the RPC component. The C# node could gain a significant TPS increase (up to 2x) and better stability under high load if this RPC problem were solved. On the other hand, the maximum possible load has to be limited in order to bound the resources consumed (https://github.com/neo-project/neo/issues/1393).

This issue has been identified for Neo 2.0 and is a good point for improvements to the C# node in Neo 3.0. Neo 3.0 already contains a large number of improvements that will show significantly better results. We hope to show this on comparative test runs soon.

In the future, we plan to allocate a separate test bench for runs on more efficient hardware and to adapt the benchmark to work with Neo 3.

We hope that this new test environment will simplify the work of the Neo community and all implementations of Neo 3.0 will move together to new performance achievements, surpassing all other existing blockchain solutions.

The benchmark code will be published on our GitHub soon and can be used to guide improvements during the development of the main Neo 3 C# node.