Contributed by weerd on 2010-02-25 from the here-be-dragons dept.

Some time ago, we featured an article with a request for hardware. Specifically, Jasper Lievisse Adriaanse (jasper@) was looking for a Lemote Yeeloong and Otto Moerbeek (otto@) had recently received one from a donor to work on the Loongson port. Jasper had his Yeeloong sponsored by two donors and received his machine less than two weeks after the article was posted.

Quite a few commits have hit the tree (e.g. here, here, and here) since then, mostly from otto and miod for src/ and jasper for ports/, and it looks like the 4.7 release will feature an OpenBSD/Loongson port that should work on the Lemote Yeeloong, the Lemote Fuloong and the EMTEC Gdium.

Undeadly followed up on the donations and asked Miod Vallat (miod@), Otto and Jasper about the porting efforts. Please read on for their story:

Otto got his Yeeloong late January, as a donation. He's been working on getting OpenBSD running properly on the Loongson machines, recently focussing on the installation process.

I received my donated Yeeloong in the last week of January. At that point in time the port wasn't stable enough to be self-hosting, so I set up a cross-building environment first. Apart from the processor bugs Miod can tell horror stories about, the GNU C compiler had problems building some of the sources. I started investigating this problem and it turned out to be a bug in the ProPolice code. The fix I applied also fixed similar problems on the sgi platform, so that shows nicely how a new port can cause other platforms to progress. After this fix and Miod's work on avoiding the processor bugs, we had a self-hosting environment. There were some rough edges, most of which had to do with the PMON (boot environment) shipped with the Lemote machines. This particular version of PMON is broken in so many ways it's hard to get started telling you about it. But most importantly, it can only load a kernel or a second stage bootloader via netboot or from an ext2 filesystem. I spent quite some time trying to make PMON load from a FAT or ISO 9660 filesystem, but without success. After creating a working RAMDISK configuration used to create bsd.rd, I decided that to be able to hack and build snapshots at the same time, I needed another machine, so I ordered a Fuloong 2F. Getting this one to work was tricky, mostly because its framebuffer is not yet supported and the serial console code was missing; the Yeeloong machine used to do the initial port does not have a serial port. After Miod helped to get the interrupt routing OK, my code to set up the serial console started working and the Fuloong 2F turned into a supported machine. I then concentrated on the installation procedure, which needed a way to create an ext2 filesystem. So I ported newfs_ext2fs from NetBSD, and wrote the machine-specific parts of the install script. This script will automatically either use an existing ext2 partition to place the bootloader on, or create a small ext2 partition to hold it. 
The bootloader then accesses the ffs root filesystem to load the kernel. It was good fun working on all this, committing to all corners of the OpenBSD tree.

Jasper, a longtime ports developer, has been working on Loongson too. Here's what he has to say:

Right after I got my Yeeloong I installed OpenBSD, and the installation procedure was not nearly as smooth then as it is now. As soon as OpenBSD had been installed I checked out the ports tree and assembled a list of the ports I felt were needed most, which resulted in the first 100 or so packages for mips64el. (mips64el is the application architecture of the Loongson processors.) After a bit more than a week I copied out the first full ports build for mips64el. As on our other mips64 platform, sgi, we suffer from some rather big fallout, due to the fact we don't have Python and GTK+2. These issues are not trivial as they require fixes for binutils and low-level floating point emulation. So that's a "to be continued" part. On the other hand, quite a few ports have already been fixed, and even more ports will be fixed after the ports tree unlocks again. Along the way I've fixed an issue where a machine with 2GB of RAM would report having 4096TB, which made the kernel rather upset.

Another developer working on the Loongson, doing much of the low-level grunt work, is Miod Vallat. He found some nice undocumented bugs (turns out these were documented, but in Chinese) during the initial phase of the porting effort:

I had been bribed with a Lemote Yeeloong machine last spring; but as usual, my spare time is close to zilch, so I did not really start working on it until July, and even then, I kept being distracted with other duties. Eventually the hardware hackathon in November allowed me to settle down and spend quality time with this machine. At the end of the hackathon I had the kernel booting up to the point it was asking for a filesystem to mount as root and an init(8) binary to run. Then regular life resumed, but gentle pressure from other developers eventually caused me to commit this work and slowly start working on the userland bits. Matthieu Herrb (matthieu@) started working on userland with the aim of getting X running as soon as possible, and Otto Moerbeek joined the party a few days later. But our systems would not run stably - after a few hours, or sometimes only a few minutes, they would freeze solid. So I put my debug hat on and started to research this. It didn't take long to figure out a reproducible way to trigger the freezes in less than a minute; then I started adding extra sanity checks and guard code to the kernel, to try and gather as much information as possible about the problem. At first I suspected a subtle race condition in my interrupt handling code, which would cause interrupts not to be re-enabled after being serviced; but after carefully reading this code many times, I couldn't find such a bug (and there wasn't any, really). I ended up losing the few hairs I had, adding code deep down in the exception handling code, to figure out in which state the kernel would hang. This allowed me to figure out that the freezes were always happening while servicing a clock interrupt, while a disk controller interrupt was pending or had just been serviced (one more reason to suspect a race). These are the times where you'd give everything you have to get a logic analyzer for two minutes. 
Unfortunately, I no longer work in a place which has logic analyzers, and even then, had I had access to an analyzer, there is no analyzer probe connector on the Yeeloong laptop. I had no such luxury, and this machine has no serial port either. At some point my debugging code was drawing small bars of different colours in the margins of the screen, and when the kernel would freeze I'd gather state information from this meager display. It was sort of a Morse code, but with colours! At this point, I started to get desperate and try anything to get a kernel to survive my test. I tried running with the cache disabled; it was slow, but it didn't help. I tried running diskless; it didn't help either. I tried disabling all interrupt-capable devices but the Ethernet interface, and guess what? It did not help. I went further and tried to disable Loongson-specific functionality, and to my surprise, although I did not get a reliable kernel, it would take much more time for it to freeze. But then I was also changing timing, so the "subtle race in the interrupt codepath" theory would still stand. Fortunately, at this point, I stumbled upon an archived message from the binutils mailing list, where a Loongson engineer was discussing changes to the assembler to work around processor misfeatures... the description of the errata was quite vague, but matched exactly the symptoms I was seeing. It turns out that this processor has a so-called "Branch Target Buffer", which is a cache of the last few recently executed branches through registers (i.e. where the address to branch to is not set in stone in the code, but is held in a register, for example when invoking a function pointer... such as an interrupt handling routine). Since branch misprediction causes a 20 pipeline cycle penalty on this particular processor, it is important to try to avoid such penalties. 
To do so, this processor has a cache of the last 16 branch addresses, and it will use it to fetch and decode the instructions at the branch address in advance. If the branch is not taken, these instructions will be canceled in the processor pipeline. So far, so good, every modern RISC processor has something similar to this. Now, the Loongson designers decided that, if the instructions to be canceled are loads from memory, it will not hurt to let the memory loads complete, in order to fill a cache line from memory; the rationale behind this being that the odds of this particular memory being used soon are high, even if the branch was not taken yet. Unfortunately, this load is not always correctly ignored, and the processor can end up keeping the memory bus locked (according to the Loongson information). This sounded too horrible to be true. Yet it was worth a try. The suggested workaround was to add extra code around branches to confuse the BTB matching logic, but this looked fishy to me. I decided to go with something guaranteed to work: forcing a BTB clear before every branch through a register. And, as you might have guessed, since then, our kernels have been rock solid, and developers have been able to work on fixing userland bugs, getting X to run, and building and fixing ports for this machine. In retrospect, a logic analyzer would have exposed this bug in no time. And although my workarounds were going in the right direction, I would never have suspected such a horrible erratum. I am glad this problem is over, but I still want my hair back.
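
For readers who want a feel for why Miod's fix works, here is a small toy model in Python. It is only a sketch of the idea, not Loongson hardware behaviour and not OpenBSD kernel code: the class name ToyBTB, the function count_stale_predictions, the lookup by branch address and the oldest-first eviction are all assumptions made for illustration. The only detail taken from the story above is the 16-entry size. The model counts how often a branch through a register would start speculating from a remembered target that turns out to be wrong, which is the situation in which the erratum could strike.

```python
from collections import OrderedDict


class ToyBTB:
    """Hypothetical toy branch target buffer (not the real hardware design)."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        # Maps the address of a branch instruction to its last seen target.
        self.entries = OrderedDict()

    def predict(self, site):
        """Return the remembered target for this branch, or None."""
        return self.entries.get(site)

    def record(self, site, target):
        """Remember the target just taken, evicting the oldest entry if full."""
        self.entries.pop(site, None)
        self.entries[site] = target
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

    def clear(self):
        """Forget everything, as a forced BTB clear would."""
        self.entries.clear()


def count_stale_predictions(branches, clear_before_branch):
    """Count branches that would speculate from a wrong remembered target.

    branches is a sequence of (site, actual_target) pairs.
    """
    btb = ToyBTB()
    stale = 0
    for site, target in branches:
        if clear_before_branch:
            btb.clear()
        guess = btb.predict(site)
        if guess is not None and guess != target:
            # Speculative fetch down the wrong path: in this toy model,
            # this is where a stray memory load could be issued.
            stale += 1
        btb.record(site, target)
    return stale


# One function-pointer call site that alternates between two handlers.
calls = [(0x100, 0xA), (0x100, 0xB), (0x100, 0xA)]
print(count_stale_predictions(calls, clear_before_branch=False))  # 2
print(count_stale_predictions(calls, clear_before_branch=True))   # 0
```

With the clear in place the buffer never holds a target when a branch executes, so no wrong-path speculation starts at all. The price is that correct predictions are thrown away too, which fits Miod's description of the fix as something guaranteed to work rather than something clever.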

As you can see, your donation can go a long way toward getting new hardware support into OpenBSD. Thanks go to Jasper, Otto and Miod for taking the time to tell us about their work and of course for working on the Loongson. Note that Paul Irofti (pirofti@) very recently also added his request for a Loongson machine to want.html to work on suspend / resume on these machines, so if you missed the chance to send hardware to The Netherlands, perhaps you can send some to Romania.