Why would you do that to yourself?

A few weeks ago Gynvael Coldwind announced a contest (I’m sorry, the link is in Polish) related to his excellent OS dev streams (again, in Polish, but if you do understand it, definitely consider watching them). The task was simple: make a BIOS-bootable diskette image with the prettiest graphical effect, all in 16-bit text mode, with a binary size limit of 512 bytes.

It’s as simple as providing the right target to clang, right?

Aesthetics isn’t exactly my thing (as one could conclude from perusing this blog), but I decided to try and see if I could write an entry in C++, using newer standards freely. As it turns out, both gcc and clang claim to be capable of generating 16-bit code with the -m16 switch.

Let’s give it a try:

    void foo() {
        // text mode text buffer begins at 0xB8000
        char* textBuffer = reinterpret_cast<char*>(0xB8000);
        textBuffer[0] = '+';
    }

Compiled with the following (actually, I tried g++ and clang++ with -Os and -Oz and chose the smallest binary every time):

    clang++ -Wl,--oformat=binary -nostdlib -fomit-frame-pointer -fno-builtin \
        -nostartfiles -nodefaultlibs -Wl,-e,0x7c00 -Wl,-Tbss,0x7c00 \
        -Wl,-Tdata,0x7c00 -Wl,-Ttext,0x7c00 -Oz -std=c++1z -m16 main.cpp -o kq.bin

Produces binary:

    00000000  67C60500800B002B  mov byte [dword 0xb8000],0x2b
    00000008  66C3              o32 ret

The 8-byte mov instruction looks really suspicious (and is terrible for the contest, taking 1/64th of the available space). When tested with BOCHS, it simply doesn’t work — at least not while still booting. Moreover, it doesn’t look at all like the Segment:Offset addressing the 16-bit code should be full of. A quick look into the documentation solves this particular mystery quite easily, though.

Clang:

The generated code and the ABI remains 32-bit but the assembler emits instructions appropriate for a CPU running in 16-bit mode, with address-size and operand-size prefixes to enable 32-bit addressing and operations.

GCC:

The -m16 option is the same as -m32, except for that it outputs the .code16gcc assembly directive at the beginning of the assembly output so that the binary can run in 16-bit mode.
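For reference, the Segment:Offset addressing mentioned earlier forms a 20-bit physical address as segment * 16 + offset. The sketch below is a host-side illustration only; `physical_address` is a name made up for this example and is not part of the contest entry. It shows why the text buffer at 0xB8000 is normally reached through segment 0xB800.

```cpp
#include <cstdint>

// Real-mode address translation: physical = segment * 16 + offset.
// `physical_address` is a hypothetical helper used for illustration only.
constexpr std::uint32_t physical_address(std::uint16_t segment,
                                         std::uint16_t offset) noexcept {
    return (static_cast<std::uint32_t>(segment) << 4) + offset;
}

// The text buffer used in this post: segment 0xB800, offset 0.
static_assert(physical_address(0xB800, 0x0000) == 0xB8000, "text buffer base");

// Different segment:offset pairs can name the same byte, e.g. the boot
// sector load address 0x7C00.
static_assert(physical_address(0x0000, 0x7C00) == physical_address(0x07C0, 0x0000),
              "segment:offset pairs alias");
```

A flat 32-bit pointer such as `reinterpret_cast<char*>(0xB8000)` has no 16-bit equivalent, which is why the compilers fall back to address-size prefixes.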

Working around the compilers

Okay, so it turns out it’s not possible (or I don’t know the magic switches) to have either of those compilers generate truly 16-bit code without a medium-to-major time investment of writing the backend myself. I don’t claim to know nearly enough about this topic to discern the cause of this peculiar behaviour, but for my purposes, knowing that another way had to be found was sufficient. What about the asm blocks?

Let’s check:

    void foo() {
        asm("mov 0B800h, %ax;"
            "mov %ax, %es;");
    }

Bingo!

    00000000  A100B8  mov ax,[0xb800]
    00000003  8EC0    mov es,ax
    00000005  66C3    o32 ret

Unfortunately, this could hardly be called a C++ solution, not when it’s a thinly veiled assembly implementation and any normal pointer data access is impossible, because it would generate 32-bit instructions. What is more, attempting to use different segment registers would require rewriting the code or applying an ugly macro (or, possibly in C++20, using something akin to string mixins from the D language). I went with a macro.
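To give an idea of what such a macro can look like: inline asm templates have traditionally had to be string literals, so a register name cannot simply arrive as a template parameter, but the preprocessor can paste it into the literal. The following is only a sketch of the technique. `KQ_STORE_AX` and `same_text` are hypothetical names, and the real macro in the source may look quite different.

```cpp
// Hypothetical macro: builds the AT&T-syntax template text for storing AX
// through a chosen segment register. "%%" is how extended asm spells a
// literal '%', and adjacent string literals are joined by the compiler.
#define KQ_STORE_AX(seg) "movw %%ax, %%" #seg ":(%%di);"

// Compile-time string comparison, used only to check the expansions below.
constexpr bool same_text(const char* a, const char* b) noexcept {
    return *a == *b && (*a == '\0' || same_text(a + 1, b + 1));
}

static_assert(same_text(KQ_STORE_AX(gs), "movw %%ax, %%gs:(%%di);"),
              "gs variant");
static_assert(same_text(KQ_STORE_AX(es), "movw %%ax, %%es:(%%di);"),
              "es variant");
```

In the 16-bit build such a literal would be handed to `asm volatile(...)`, with one expansion per segment register selected by a template specialisation.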

Creating the building blocks — output

It’s far from ideal, but it was a workable start:

    void foo() {
        SegmentedAddress<0xB800, SegmentRegister::gs> video_buffer;
        video_buffer.raw_write<16>(0x1234, 16);
    }

produces:

    00000000  6657    push edi
    00000002  B800B8  mov ax,0xb800
    00000005  89C0    mov ax,ax
    00000007  8EE8    mov gs,ax
    00000009  B83412  mov ax,0x1234
    0000000C  B91000  mov cx,0x10
    0000000F  89CF    mov di,cx
    00000011  658905  mov [gs:di],ax
    00000014  665F    pop edi
    00000016  66C3    o32 ret

After special-casing the text mode video buffer a readable hello world can be created:

    void foo() {
        VideoBuffer buf;
        buf.writeLine("Hello, World!", 20, VideoBuffer::Colour::Red);
    }

Picture 1. Hello, World!
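For readers unfamiliar with the text buffer layout: in the 80x25 colour text mode every cell takes two bytes, the character code followed by an attribute byte (foreground colour in the low nibble). The sketch below simulates that layout on the host with an ordinary array. `FakeTextBuffer` and `write_line` are names invented for this example, and treating the second argument as a cell index is an assumption, not a description of the real `VideoBuffer::writeLine`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side stand-in for the 80x25 colour text buffer at 0xB800:0000.
// Each cell is 16 bits: low byte = character code, high byte = attribute.
struct FakeTextBuffer {
    static constexpr std::size_t columns = 80;
    static constexpr std::size_t rows = 25;
    std::array<std::uint16_t, columns * rows> cells{};

    // Writes a NUL-terminated string starting at cell index `start`,
    // stopping at the end of the buffer. Returns the number of cells written.
    std::size_t write_line(const char* text, std::size_t start,
                           std::uint8_t attribute) noexcept {
        assert(text != nullptr);
        std::size_t written = 0;
        for (std::size_t i = start; text[written] != '\0' && i < cells.size();
             ++i, ++written) {
            cells[i] = static_cast<std::uint16_t>(
                (static_cast<std::uint16_t>(attribute) << 8) |
                static_cast<std::uint8_t>(text[written]));
        }
        return written;
    }
};

// Attribute byte: foreground colour 4 (red) on background colour 0 (black).
constexpr std::uint8_t red_on_black = 0x04;
```

On the real hardware the same 16-bit values would be stored through the segment register instead of into `cells`.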

Creating the building blocks — keyboard input

With help in the form of Ralf Brown’s Interrupt List, creating an abstraction over keyboard input was simple:

    struct Keyboard {
        static inline bool keyAvailable() noexcept {
            u8 ret = 1;
            asm volatile(
                "movb $1, %%ah;\n\t"
                "int $22;\n\t"
                "jnz 1f;\n\t"
                "movb $0, %0;\n\t"
                "1:"
                : "=q" (ret)
                :
                : "ax"
            );
            return !ret;
        }

        static inline u8 getKey() noexcept {
            u8 k;
            asm volatile(
                "xor %%ax, %%ax;\n\t"
                "int $22;\n\t"
                "movb %%ah, %0;\n\t"
                : "=q" (k)
                :
                : "ax"
            );
            return k;
        }
    };
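A note on reading the code above: `int $22` is decimal for interrupt 0x16, the BIOS keyboard service, and `getKey` returns AH, which holds the scan code of the key rather than its ASCII value. As a sketch of how such a scan code might be consumed, the helper below maps the arrow keys to movement steps. `Step` and `step_for_scan_code` are hypothetical names for this example and are not taken from the game's source.

```cpp
#include <cstdint>

// Movement on the text grid; y grows downwards, as on the screen.
struct Step {
    int dx;
    int dy;
};

constexpr bool operator==(Step a, Step b) noexcept {
    return a.dx == b.dx && a.dy == b.dy;
}

// Hypothetical helper: translates a BIOS scan code into a step.
// Any other key leaves the current direction unchanged.
constexpr Step step_for_scan_code(std::uint8_t scan_code, Step current) noexcept {
    return scan_code == 0x48 ? Step{0, -1}   // Up arrow
         : scan_code == 0x50 ? Step{0, 1}    // Down arrow
         : scan_code == 0x4B ? Step{-1, 0}   // Left arrow
         : scan_code == 0x4D ? Step{1, 0}    // Right arrow
         : current;
}
```

Because the function is `constexpr` and branch-only, it can be checked on the host without any BIOS present.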

Creating the building blocks — random memory access

Since, as noted above, standard C/C++ pointers were not an option, I built an abstraction on top of the SegmentedAddress class template.

    template<u16 Addr, u8 Size, typename Elem = kq::sized_type<Size>, u16 Elements = 1>
    struct MemoryEntity {
        constexpr static u16 addr = Addr;
        constexpr static u8 size = Size;
        constexpr static u16 elements = Elements;
        using type = Elem;
        using storage_type = kq::sized_type<Size>;

        template<typename T>
        static inline void set(T&& val, u16 n) noexcept {
            data.raw_write<Size>(nasty_cast<storage_type>(val), Addr + n * Size);
        }

        static inline type get(u16 n) noexcept {
            return nasty_cast<type>(data.raw_read<Size>(Addr + n * Size));
        }
    };

It worked, and it worked well — the class compiled down to nothing and the resultant abstraction was fairly readable (but it could be better):

    constexpr static auto Blocks = MemoryEntity<0x200, 16, Point2D, 256>{};
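The idea behind MemoryEntity can be tried out on an ordinary host. In the sketch below a byte array stands in for the real-mode segment and `std::memcpy` stands in for `nasty_cast` plus the raw read/write. Everything here (`fake_segment`, `TypedSlot`, the two-byte `Point2D`) is an assumption made for illustration, and element sizes are counted in bytes via `sizeof`, which is not how the original template takes its `Size` argument.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Stand-in for a 64 KiB real-mode segment; the real code uses inline asm.
inline std::array<std::uint8_t, 0x10000>& fake_segment() noexcept {
    static std::array<std::uint8_t, 0x10000> memory{};
    return memory;
}

// Assumed layout for this sketch: two one-byte coordinates.
struct Point2D {
    std::uint8_t x;
    std::uint8_t y;
};

// Typed view of `Elements` values of type `Elem` starting at byte `Addr`.
template <std::uint16_t Addr, typename Elem, std::uint16_t Elements = 1>
struct TypedSlot {
    static_assert(std::is_trivially_copyable<Elem>::value,
                  "Elem is copied byte-wise");
    static_assert(Addr + Elements * sizeof(Elem) <= 0x10000u,
                  "slot must fit in the segment");

    static void set(const Elem& value, std::uint16_t n) noexcept {
        assert(n < Elements);
        std::memcpy(fake_segment().data() + Addr + n * sizeof(Elem),
                    &value, sizeof(Elem));
    }

    static Elem get(std::uint16_t n) noexcept {
        assert(n < Elements);
        Elem value;
        std::memcpy(&value,
                    fake_segment().data() + Addr + n * sizeof(Elem),
                    sizeof(Elem));
        return value;
    }
};

using Blocks = TypedSlot<0x200, Point2D, 256>;
```

With this in place, `Blocks::set(Point2D{3, 7}, 5)` followed by `Blocks::get(5)` round-trips the value through plain bytes, which is the same shape of API the post describes.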

The Result

The source and the binary are available here; the mechanics of the snake game are trivial and I’ll skip them. With clang++ and -Oz the resultant binary was exactly 512 bytes. That means it met the contest criteria, if only just, although I had to forgo adding the proper boot signature bytes at the end.
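For context on the signature: a conventional boot sector ends with the bytes 0x55 and 0xAA at offsets 510 and 511, which leaves 510 bytes for code and data. The host-side sketch below checks for and stamps that signature. The function names are made up for this example, and whether a given BIOS or emulator insists on the signature for a diskette image is not something this post tested.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t sector_size = 512;
constexpr std::size_t signature_offset = 510;
static_assert(signature_offset + 2 == sector_size,
              "the signature occupies the last two bytes");

using BootSector = std::array<std::uint8_t, sector_size>;

// True when the sector ends with the conventional 0x55, 0xAA signature.
inline bool has_boot_signature(const BootSector& sector) noexcept {
    return sector[signature_offset] == 0x55 &&
           sector[signature_offset + 1] == 0xAA;
}

// Stamps the signature. This overwrites bytes 510 and 511, so a payload
// that already uses all 512 bytes would lose its last two bytes.
inline void stamp_boot_signature(BootSector& sector) noexcept {
    sector[signature_offset] = 0x55;
    sector[signature_offset + 1] = 0xAA;
    assert(has_boot_signature(sector));
}
```

A build script could run such a check on the output file before writing it to an image.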



Conclusions / lessons learned

First of all, it shows that I started this project without any kind of plan. Basic components are tacked onto others and do not complement one another. Even for a toy project, it is jarring.

Secondly, when they say readability is important, they are right. Using similarly-sized integers for data values and offsets is, simply put, dumb, especially when I could have trivially boxed the offset in its own type.
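Boxing the offset could look roughly like the sketch below. `Offset`, `WriteRequest` and this `raw_write` are hypothetical stand-ins written for the example, not the code from the entry. The point is only that a bare integer no longer converts to an offset, so swapping the two arguments stops compiling.

```cpp
#include <cstdint>
#include <type_traits>

using u16 = std::uint16_t;

// Hypothetical strong type: an offset can only be constructed explicitly.
struct Offset {
    u16 value;
    constexpr explicit Offset(u16 v) noexcept : value(v) {}
};

// Stand-in for a raw write: it merely records what it was asked to do.
struct WriteRequest {
    u16 data;
    u16 offset;
};

constexpr WriteRequest raw_write(u16 data, Offset where) noexcept {
    return WriteRequest{data, where.value};
}

// Neither direction converts implicitly, so raw_write(16, 0x1234) and
// raw_write(Offset{16}, 0x1234) are both rejected by the compiler.
static_assert(!std::is_convertible<u16, Offset>::value,
              "offsets must be spelled out");
static_assert(!std::is_convertible<Offset, u16>::value,
              "offsets do not decay to data");
```

A call then reads `raw_write(0x1234, Offset{16})`, which documents at the call site which number is which. The wrapper holds a single `u16`, so it is the kind of type an optimizer can be expected to see through.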

‘Zero-cost abstractions’ is, I believe, a term coined by Bjarne Stroustrup. While doing this project I inspected the resultant binary after each compilation and I can say with 100% certainty that the abstractions I used cost me nothing in terms of binary size or speed, since my results were identical to those I hand-crafted in the C language.

Optimizers in modern compilers are truly great. I did not go out of my way to help them, yet they performed their job admirably.

Using modern compilers to target old and semi-forgotten platforms wasn’t the best choice. I had to work around the compiler and I still ended up with a bulky binary because the compiler used 32-bit versions of instructions.

Side notes

Since the contest was about creating a pretty graphical effect, my work didn’t win. In fact, it was the ugliest one, although it may have had a chance to win in the “most interesting” category. At least in my opinion.

On the other hand, the quality of the works sent in by others was just jaw-dropping, even for someone passably familiar with the scene. You can see all of them, with sources, here. The two best entries can be seen in action here and here. The reader is reminded that these works have not left the text mode, nor have they gone over the decreed 512 bytes.

Gynvael will be testing the waters with an English stream about “a CTF challenge or two, probably exploitation or reverse engineering” on July 15th at 19:00 UTC+2; more info is on his blog. Given the quality of his Polish streams, I urge you to check it out.