Conclusion

Tech journalist Igor Oskolkov of 3DNews.ru recently tested the publicly available version of the evaluation board (or, as the vendor calls it, the developer software-hardware complex) code-named BFK 3.1, built around the Russian Baikal-T1 SoC based on the MIPS P5600 Warrior architecture. What follows is an English translation of his text, first published in Russian by servernews.ru.

To begin with, we should emphasize that BFK 3.1 is a developer software-hardware complex (we will keep calling it a board) and not a basis for building final products. No one sane will use this board to build a system. First, such a system would be unreasonably expensive. Second, the idea is rather pointless. The board is needed to develop and debug software, and to evaluate the processor's performance and its compatibility with other equipment. It is incorrect to compare BFK 3.1 with single-board microcomputers like the Raspberry Pi or Cubieboard, although formally it is very close to them.

Similar development boards are offered by other processor vendors. Depending on their type and on-board equipment, prices for such boards range from hundreds to thousands of US dollars. What matters about BFK 3.1 is that it is the first board with the Baikal-T1 SoC that is available to practically everyone rather than to a narrow circle of individuals and organizations, as used to be the case with Russian processors. At 650 USD it is expensive, but the vendor cannot offer a lower price given the relatively small production volume of these boards.

After registering the product, the customer gets access to the restricted engineering documentation library and to the printed circuit board design in Altium Designer format.
This will significantly accelerate and simplify the process of creating one's own hardware and software solutions based on BFK 3.1 boards and Baikal-T1 SoCs.

The board carries the processor with its basic supporting circuitry. Almost all of the processor's interfaces are routed out to the board; only the 10GbE port is not. The board measures 229 × 191 mm (FlexATX). It has two SATA-3 ports (controller version 3.1), one SO-DIMM slot for a DDR3-1600 memory module, two Gigabit Ethernet RJ-45 ports, one USB 2.0 Type A connector, two Mini-B USB ports needed for debugging, and one PCI-E 3.0 x4 connector. There is also a 40-pin GPIO header (its main controller is 32-bit).

For power, any ATX 2.0 power supply rated at 200 watts or higher will do. This figure clearly includes a huge margin, even taking into account the consumption of PCI-E and SATA devices. The board has separate "on/off" and "power reset" buttons. Starting the board up is extremely simple: install the memory module, connect the PSU, connect a PC to the upper mini-USB port, and run your favorite terminal emulator with support for COM ports (you may need a driver for the bridge itself; use this link). That's it: press the "on" button and select the desired item in the boot menu.

The board has two NOR memory modules: 16 MB and 32 MB. The first one is bootable and contains the firmware itself. Everything here is standard: U-Boot + Linux kernel + a minimal BusyBox image. Booting from the network via an NFS or TFTP server is possible. For embedded systems this is enough. In such cases, the final product is a relatively compact board with RAM and ROM of the required capacity already soldered on and a pre-programmed software environment optimized for a specific range of tasks. SOHO routers are one example.

The second way to use this board is to run a full-fledged OS. That is what we will do for the tests. The vendor offers a slightly modified version of Debian 9 with a kernel from the SDK.
Note that the vendor does not rebuild all the software itself. The stock repositories of the Debian mipsel port are used, so there are no optimizations for this particular processor. There are also Astra Linux Special Edition builds for Tavolga Terminal 2BT1 devices with the same Baikal-T1 SoC but, alas, they are not openly available. Support for Alt Linux and Buildroot is expected as well, and it is possible to run OpenWRT / LEDE.

To start Debian, one needs to take the kernel images, firmware and ramdisk from the SDK. The SDK also includes auxiliary tools for cross-compilation, scripts for building a ROM image, and a prepared VM for QEMU in which you can pre-debug your programs. The experience with Debian 9 on BFK 3.1 is not yet perfectly smooth: after installation one has to dig into the settings and install some software, though none of this causes particular problems. It is a pity that there is no complete documentation for the board yet: one has to find things out by trial and error or ask the vendor directly.

To run the tests, we connected a Kingston SSDNow V drive (ancient by today's standards) for the OS and installed a 4 GB Samsung DDR3L-1600 memory module. This is enough to get acquainted with the SoC's capabilities. There is one more nuance: because of the way the memory controller works, not all of the memory on the SO-DIMM module is visible. Another important point concerns the baseline builds of the test programs from source code: all of them were done directly on BFK 3.1. Where necessary, the compiler options are specified.

The build process, we must say, is not always painless. In some cases we had to dig into the optimization parameters to achieve a better result. In others the build went fine, but the program then crashed or misbehaved when run. At times, we had the general feeling that some developers do not suspect that platforms other than x86 exist. And this applies not only to software.
In particular, modern GPUs will most likely not work in the PCI-E slot because, according to the vendor, almost all of them require an x86 UEFI / BIOS. There may also be problems with devices that use, for example, a PCI-to-PCI-E bridge.

Let's start with a brief look at the processor itself. Baikal-T1 has two 32-bit cores based on the P5600 Warrior (MIPS32 Release 5) architecture with hardware support for virtualization. Each core has a 64 KB L1 cache for data and instructions, and the two cores share a 1 MB L2 cache. Each core also has its own FPU with 128-bit SIMD support. The cores, the L2 cache and the FPUs all run at the same 1.2 GHz frequency. The processor is capable of performing up to four integer operations, and up to two double-precision or four single-precision floating-point operations, per clock cycle. That gives a theoretical peak performance of 4.8 GFlops FP64 (2 cores × 1.2 GHz × 2 FP64 operations per core per cycle) or 9.6 GFlops FP32. In practice, however, to unlock the SoC's potential (as people like to write in the comments), manual optimization of the code and a compiler that "knows" about the FPU / SIMD features are necessary.

In reality, an unoptimized version of Linpack compiled with the open-source GCC, for example, produces far lower results than expected. This is quite normal for new or specialized (like Elbrus) architectures, and it should be kept in mind when evaluating the results given below. Another point of concern is the notorious Meltdown and Spectre vulnerabilities. The computational blocks in MIPS32r5 are superscalar and capable of out-of-order instruction execution, but there is clearly no talk of deep speculation. The core vendor issued a warning about the possible presence of Spectre (not Meltdown) in the "clean" P5600 / P6600 cores.
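
As a quick sanity check of the peak figures quoted in the processor overview, here is a minimal Python sketch of the arithmetic. The helper name `peak_gflops` is ours and purely illustrative; the inputs (2 cores, 1.2 GHz, 2 FP64 or 4 FP32 operations per core per cycle) are taken from the processor description, and the result is a theoretical ceiling, not a measurement.

```python
def peak_gflops(cores: int, freq_ghz: float, flops_per_core_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * freq_ghz * flops_per_core_per_cycle


# Values quoted in the article for Baikal-T1.
CORES = 2
FREQ_GHZ = 1.2

fp64_peak = peak_gflops(CORES, FREQ_GHZ, 2)  # two FP64 operations per cycle
fp32_peak = peak_gflops(CORES, FREQ_GHZ, 4)  # four FP32 operations per cycle

print(f"FP64 peak: {fp64_peak:.1f} GFLOPS")
print(f"FP32 peak: {fp32_peak:.1f} GFLOPS")
```

Any real workload, such as the unoptimized Linpack build mentioned here, should be expected to land well below these ceilings.
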
According to the vendor, in the case of Baikal-T1 the official vulnerability verification code does not produce a result, but it is too early to state with absolute certainty that the SoC is not affected. The vendor plans to organize a hackathon to double-check the SoC's security.

The SoC's cores communicate with the rest of the components via an AXI bus. All high-speed interfaces support DMA. The single-channel memory controller supports DDR3-1600 with ECC. The maximum amount of RAM the CPU supports is 8 GB. There is another nuance: the memory controller has a 32-bit data bus plus 8 bits of ECC and works with memory chips 8 to 32 bits wide. For finished products with suitable chips already soldered on, this presents no problem, but with conventional SO-DIMMs the board will see only half of the stated capacity, since they usually present a 64-bit interface. The memory speed will obviously be lower as well: up to 6.4 GB/s.

The insides of the SoC can be admired at this link. The SoC requires a supply voltage of 0.95 V, and the claimed power consumption is no more than 5 watts. During the tests, the CPU warmed up to slightly over 60 degrees Celsius. Active cooling is not required, although in a closed case a heatsink will not be superfluous. The core frequency can be adjusted dynamically in the range from 200 to 1500 MHz, but this requires OS support; so far, in the current Debian build, the frequency can only be set at system startup. In any case, under low load one core can shut down automatically. Priced at 65 USD, Baikal-T1 is manufactured on TSMC's 28 nm process.

Now let's go directly to the benchmarks. The first in line, CoreMark, is a specialized benchmark used to evaluate the performance of processors and SoCs for embedded systems.
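
Before the benchmark numbers, the bandwidth and capacity limits of the 32-bit memory interface described in the processor overview can be checked with simple arithmetic. The sketch below is illustrative only: the function names are ours, and it assumes that DDR3-1600 means 1600 million transfers per second and that a conventional SO-DIMM exposes a 64-bit data interface, as the text states. The capacity calculation is a simplified model of the "only half is visible" effect, not a description of the controller's actual addressing logic.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a given data bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * bytes_per_transfer / 1000


def visible_capacity_gb(module_gb: float, controller_bits: int, module_bits: int = 64) -> float:
    """Simplified model: a narrower controller sees a proportional share of the module."""
    return module_gb * controller_bits / module_bits


DDR3_1600_MT_S = 1600  # assumption: DDR3-1600 = 1600 MT/s

baikal_bw = peak_bandwidth_gb_s(DDR3_1600_MT_S, 32)   # Baikal-T1: 32-bit data bus
sodimm_bw = peak_bandwidth_gb_s(DDR3_1600_MT_S, 64)   # full 64-bit SO-DIMM interface
visible_gb = visible_capacity_gb(4, 32)               # the 4 GB module used in the tests

print(f"32-bit bus peak: {baikal_bw:.1f} GB/s")
print(f"64-bit bus peak: {sodimm_bw:.1f} GB/s")
print(f"Visible capacity of a 4 GB SO-DIMM: {visible_gb:.0f} GB")
```
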
To be frank, it was with the announcement of a new CoreMark record that Imagination Technologies began telling the story of the MIPS P5600 Warrior core's advantages. True, at that time it was a single core simulated on an FPGA and running at 20 MHz. The record in question was the CoreMark per megahertz per core value of 5.61, but in reality it is worth counting on a value of about 5. The core vendor even pointed out the greater efficiency of the P5600 in comparison with desktop Intel CPUs. Formally, Baikal-T1 is the leader in terms of score per megahertz and per megahertz per core. In practice, to achieve performance in absolute terms, vendors use extensive methods: raising frequencies and increasing the number of cores.

Alas, the CoreMark results database is not filled in very carefully, so we had to manually select tests for dual-core chips with frequencies close to those of Baikal-T1 and with an explicit indication that the test used two threads. For comparison, we intentionally included one four-core specimen. In general, benchmark results can be ranked by several criteria. However, this immediately raises a lot of nuances: first, ARM and MIPS IP cores are licensed to third-party companies, so implementations of the very same design can vary significantly; second, a lot depends on the optimization of the software code, on how it is built and on its runtime environment.

For our baseline test, we used GCC 6.3 with the following options: -O3 -DMULTITHREAD=2 -DUSE_PTHREAD -funroll-all-loops -fgcse-sm -fgcse-las -finline-limit=1000 -mhard-float -mtune=p5600. The vendor's tests also used the commercial Sourcery CodeBench environment.
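
To make the "per megahertz per core" metric discussed here concrete, the following Python sketch normalizes a raw CoreMark score and, in the other direction, projects a raw score from a normalized one. The function names are ours, and the projections assume perfect scaling with clock frequency and core count, which real systems rarely achieve. The projected numbers are illustrations derived from the 5.61 and "about 5" values quoted in the text, not results from the tables.

```python
def coremark_per_mhz_per_core(score: float, freq_mhz: float, cores: int) -> float:
    """Normalize a raw CoreMark score (iterations/s) by clock frequency and core count."""
    return score / freq_mhz / cores


def projected_score(per_mhz_per_core: float, freq_mhz: float, cores: int) -> float:
    """Project a raw score, assuming ideal linear scaling with frequency and cores."""
    return per_mhz_per_core * freq_mhz * cores


# Baikal-T1 parameters from the article.
FREQ_MHZ = 1200
CORES = 2

record_based = projected_score(5.61, FREQ_MHZ, CORES)  # from the announced record value
realistic = projected_score(5.0, FREQ_MHZ, CORES)      # from the "about 5" estimate

print(f"Projection from 5.61 CoreMark/MHz/core: {record_based:.0f}")
print(f"Projection from 5.0 CoreMark/MHz/core:  {realistic:.0f}")

# Round trip: normalizing the projection recovers the per-MHz-per-core value.
print(f"Normalized back: {coremark_per_mhz_per_core(realistic, FREQ_MHZ, CORES):.2f}")
```

The same normalization is what makes a 1.2 GHz dual-core chip comparable with faster or wider competitors, at the cost of hiding the absolute gap.
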
From here on, the tables use the following designations: "official test" for results posted on the vendor's web site; "precomp" for runs of the binary benchmark files provided by the CPU vendor; "w/opt." for our own builds from open source code with the options indicated; "opt." for cross-compilation with the SDK and commercial tools following the vendor's "recipes". Manual optimization yields better performance, which is very clearly visible in the results table. We, however, do not need to sort through options and dig into the code, whereas Baikal-T1 software developers will definitely have to deal with this on a regular basis.

From the same document one can pick out the results of the classic "old school" benchmarks. The Stream memory bandwidth test was compiled for one thread with the following options: -mtune=p5600 -O2 -funroll-all-loops. Its result amounts to about half of the theoretical RAM speed.

Everything said above about CoreMark also applies to Dhrystone 2 (which deals with integer computing), whose baseline version was compiled with a minimal set of options: -O3 -funroll-all-loops -mtune=p5600. Alas, as in the examples above, the results database is no model of accuracy. For comparison, some results were taken for 32-bit computations with an explicit indication of optimizations. Unfortunately, the entries do not specify CPU models or even generations. In addition, the matter is complicated by the presence of TurboBoost or a similar short-term increase in the processor's base frequency (and this test is a short one), which blurs the overall picture. Again, the situation from the CoreMark test repeats itself: per megahertz, the performance of the P5600 is not bad.

But other modern CPUs increase performance by raising the frequency, by supporting 64-bit instructions and by adding cores, all at the same time.
The Whetstone results are mostly the same, only the gain from increasing the number of threads and from using vector instructions is even more striking. Oh yes, to build all of this we had to adjust the code slightly, removing non-essential calls to x86 assembly and checks for x86 extensions that are needed only for CPU identification.

For a quick check of the Gigabit Ethernet adapters we used the iperf 3.1.3 utility, which showed that for one-way connections the speed is the expected 940 Mb/s, but in duplex, alas, the speed was 1.2 Gb/s. Explaining this, the vendor noted that a little tuning at the software level is required to get full performance.

We also ran the Phoronix Test Suite (PTS, https://www.phoronix-test-suite.com/). We know this idea smells like madness, as PTS is not generally designed for such systems. The build takes place directly on the test machine, so in the case of Baikal-T1 it is excruciatingly long, as is the run time of most of the tests. We excluded some tests from the suite: those that fail to compile, and those that take an indecently long time to run even on "adult" PCs. The first problem can, in theory, be handled manually by customizing the build parameters. But again, that was not our task, and either way the test results are unlikely to reach the maximum possible values.

All test results are available at this link. Strictly speaking, they are rather a reserve for the future, as right now we have nothing to compare them with. Later we will be able to see how much better (or not) the results become once the build and optimization are fixed. For the curious, we can only mention a few benchmark configurations that happen to coincide with those of the Chinese Loongson Godson 3A3000 processor (4 cores @ 1.5 GHz, 1 MB L2 cache, 8 MB L3 cache, 28 nm, 30 W). Both CPUs are alike in having a new architecture and in the software optimization problems that come with it.
So far, the Chinese are ahead by a large margin in absolute terms, but per core, per megahertz and per watt the results are somewhat less clear-cut.

Baikal-T1 is notable as a modern product that Russian developers were able to deliver with a relatively small team and within a time frame accepted in the industry. It is reasonably priced (compared to other Russian processors) and available on the market. But its success (or failure) can really be evaluated only in a year or two: it all depends on who will use this SoC in their products and in what volumes. Right now, only a few end products have been publicly announced: https://baik.al/sdelano. All of them are typical examples of Baikal-T1's application areas; we would also like to see more NAS/SAN, IoT and SDR solutions. However, this is no longer about hardware. The CPU itself is really good, especially when judged by its relative rather than absolute performance. But there are still a lot of software problems and rough edges. As for the BFK 3.1 board specifically, its documentation is rather weak. Globally, the question is this: who will tune the software for this architecture? Who will develop the tools that make this process as easy as possible? Will there be a strong enough community of software developers? For example, support for hardware virtualization, which was mentioned in the CPU description, appeared only in the summer of last year, with the release of Linux kernel 4.12. And in general, we wonder what will happen next to the MIPS architecture. Five years ago, choosing this particular architecture for new products was very reasonable. And now? That is the question…