In this article, we discuss how to write our own "hello, world" program into the boot sector. At the time of this writing, most such code examples available on the web were meant for the Netwide Assembler (NASM). Very little material was available that could be tried with the readily available GNU tools like the GNU assembler (as) and the GNU linker (ld). This article is an effort to fill this gap.

When the computer starts, the processor begins executing instructions at the memory address 0xffff0. This is usually a location in the BIOS ROM, so the processor ends up executing the BIOS code. The BIOS performs several checks and tests, including the POST (power-on self test), and then finds the boot device. It loads the code from the boot sector of that device into memory and executes it. From here, the code in the boot sector takes control. In IBM-compatible PCs, the boot sector is the first sector of a data storage device and is 512 bytes long. The following table shows what the boot sector contains.

Address       Description                                    Size in bytes
Hex    Dec
000      0    Code                                           440
1b8    440    Optional disk signature                          4
1bc    444    0x0000                                           2
1be    446    Four 16-byte entries for primary partitions     64
1fe    510    0xaa55                                           2
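The layout in the table above can be sketched in Python. This is only an illustration and not part of the toolchain used in this article; the function name build_boot_sector and its parameters are hypothetical. It places the code at offset 0, leaves the partition entries zeroed, and stores 0xaa55 as a little-endian 16-bit word, so the last two bytes of the sector are 0x55 followed by 0xaa.

```python
import struct

SECTOR_SIZE = 512
CODE_SIZE = 440               # offsets 0x000 to 0x1b7
DISK_SIGNATURE_AT = 0x1b8     # 4 bytes
ZERO_WORD_AT = 0x1bc          # 2 bytes, 0x0000
PARTITION_TABLE_AT = 0x1be    # four 16-byte entries, 64 bytes
BOOT_SIGNATURE_AT = 0x1fe     # 2 bytes, 0xaa55


def build_boot_sector(code: bytes, disk_signature: int = 0) -> bytes:
    """Return a 512-byte sector laid out as in the table (hypothetical helper)."""
    if len(code) > CODE_SIZE:
        raise ValueError("code does not fit in the first 440 bytes")
    sector = bytearray(SECTOR_SIZE)                  # starts as all zeros
    sector[:len(code)] = code                        # code at offset 0
    struct.pack_into("<I", sector, DISK_SIGNATURE_AT, disk_signature)
    struct.pack_into("<H", sector, ZERO_WORD_AT, 0x0000)
    # The four partition entries at PARTITION_TABLE_AT are left zeroed here.
    struct.pack_into("<H", sector, BOOT_SIGNATURE_AT, 0xaa55)
    return bytes(sector)


# Example with placeholder code: the two bytes eb fe encode a jump to itself.
sector = build_boot_sector(b"\xeb\xfe")
print(len(sector), sector[-2:].hex())
```

Because the word is stored in little-endian order, a byte-by-byte view of the sector shows 55 aa at offsets 510 and 511.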

This type of boot sector found in IBM-compatible PCs is also known as the master boot record (MBR). The next two sections explain how to write executable code into the boot sector. Two programs are discussed in these sections: one that merely prints a character and another that prints a string.

The reader is expected to have a working knowledge of x86 assembly language programming using the GNU assembler. The details of assembly language won't be discussed here; only how to write code for the boot sector will be discussed.

The code examples were verified by using the following tools while writing this article:

    GNU assembler (GNU Binutils for Debian) 2.18
    GNU ld (GNU Binutils for Debian) 2.18
    dd (coreutils) 5.97
    DOSBox 0.72

The following code prints a single character in yellow color on a blue background:

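    .code16
    .section .text
    .globl _start
    _start:
      mov $0xb800, %ax    # segment of the color text mode video memory
      mov %ax, %ds        # make it the data segment
      movb $'A', 0        # character code at offset 0
      movb $0x1e, 1       # attribute (yellow on blue) at offset 1
    idle:
      jmp idle            # loop forever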

We save the above code in a file, say char.s, then assemble and link it with the following commands:

    as -o char.o char.s
    ld --oformat binary -o char.com char.o

The .code16 directive tells the assembler that this code is meant for 16-bit mode. The _start label is meant to tell the linker that this is the entry point in the program.

The video memory of the VGA is mapped to various segments between 0xa000 and 0xc000 in the main memory. The color text mode is mapped to the segment 0xb800. The first two instructions move 0xb800 into the data segment register, so that every data offset specified afterwards is an offset in this segment. Then the code for the character 'A' (0x41 or 65 in ASCII) is moved into the first location in this segment and the attribute of this character (0x1e) into the second location. The higher nibble (0x1) is the attribute for the background color and the lower nibble (0xe) is that of the foreground color. The highest bit of each nibble is the intensifier bit. The other three bits represent red, green, and blue. This is shown in tabular form below.

Attribute
Background      Foreground
I  R  G  B      I  R  G  B
0  0  0  1      1  1  1  0
0x1             0xe

We can see from the table that the background color is dark blue and the foreground color is bright yellow. We assemble and link the code with the as and ld commands mentioned earlier and generate an executable binary consisting of machine code.
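The bit layout of the attribute byte can be checked with a few lines of Python. This is a minimal sketch that follows the interpretation given above (intensity, red, green, blue in each nibble); the function name decode_attribute is hypothetical.

```python
def decode_attribute(attribute: int) -> dict:
    """Split a text mode attribute byte into background and foreground bits."""

    def irgb(nibble: int) -> dict:
        # Highest bit is the intensifier, followed by red, green, and blue.
        return {
            "intensity": (nibble >> 3) & 1,
            "red": (nibble >> 2) & 1,
            "green": (nibble >> 1) & 1,
            "blue": nibble & 1,
        }

    return {
        "background": irgb((attribute >> 4) & 0xf),  # higher nibble
        "foreground": irgb(attribute & 0xf),         # lower nibble
    }


colors = decode_attribute(0x1e)
print(colors["background"])
print(colors["foreground"])
```

For 0x1e, only the blue bit is set in the background nibble, while intensity, red, and green are set in the foreground nibble, which matches the table.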

Before writing the executable binary into the boot sector, we might want to verify whether the code works correctly with an emulator. DOSBox is a pretty good emulator for this purpose. It is available as the dosbox package in Debian. The ld command above already named the binary char.com, so we can run it directly with DOSBox using the following command:

    dosbox -c cls char.com

The letter A, printed in yellow on a blue background, should appear in the first column of the first row of the screen.

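In the ld command used earlier, we gave the output file the extension com so that DOSBox treats it as a DOS COM file, that is, plain machine code and data with no headers. The --oformat binary option makes ld produce exactly such a headerless binary, which is why the file can be run with DOSBox for testing.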

Once we are satisfied with the output of char.com running in DOSBox, we write the binary and the MBR signature into the boot sector with these commands:

    dd if=char.com of=/dev/sdb
    printf '\x55\xaa' | dd seek=510 bs=1 of=/dev/sdb

Caution: One needs to be absolutely sure of the device path of the device being written to. The device path /dev/sdb is only an example here. If the dd command is used to write to the wrong device, access to the data on it would be lost.
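To rehearse these two steps without touching a real device, the same writes can be tried on a regular file first. The following Python sketch is only an illustration and does not replace dd; the function names write_boot_image and has_boot_signature, and the file name disk.img, are hypothetical.

```python
import os
import tempfile

BOOT_SIGNATURE = b"\x55\xaa"
SIGNATURE_OFFSET = 510
SECTOR_SIZE = 512


def write_boot_image(code: bytes, image_path: str) -> None:
    """Write code at offset 0 and the signature at offset 510 of a regular file."""
    if len(code) > SIGNATURE_OFFSET:
        raise ValueError("code would overlap the boot signature")
    with open(image_path, "wb") as image:
        image.write(code)                # like: dd if=char.com of=...
        image.seek(SIGNATURE_OFFSET)     # like: dd seek=510 bs=1 of=...
        image.write(BOOT_SIGNATURE)      # the gap before it is zero-filled


def has_boot_signature(image_path: str) -> bool:
    """Check that the first sector is complete and ends with 0x55 0xaa."""
    with open(image_path, "rb") as image:
        sector = image.read(SECTOR_SIZE)
    return len(sector) == SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


# Example with placeholder code bytes, written to a temporary file.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "disk.img")
    write_boot_image(b"\xeb\xfe", path)
    print(os.path.getsize(path), has_boot_signature(path))
```

A check like has_boot_signature can also be pointed at an image of the device afterwards to confirm that the signature landed at the expected offset.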

Now booting the computer with this device should display the letter A in yellow on a blue background.

The following code prints a string in yellow color on a blue background:

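    .code16
    .section .data
    message:
      .asciz "hello, world"
    .section .text
    .globl _start
    _start:
      nop
      xor %di, %di          # DI = offset into the video memory
      mov $0xb800, %ax
      mov %ax, %ds          # DS = color text mode segment
      mov $message, %si     # SI = offset of the string
    move:
      xor %dx, %dx
      mov %cs:(%si), %dl    # read the next character via the code segment
      cmp $0, %dl           # a null byte marks the end of the string
    idle:
      jz idle               # at the end of the string, loop here forever
      mov %dl, (%di)        # write the character
      inc %di
      movb $0x1e, (%di)     # write its attribute (yellow on blue)
      inc %di
      inc %si
      jmp move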

There are two sections in this code. The data section holds the null-terminated string to be displayed, and the text section holds the code. The code moves the first byte of the string to the location 0xb800:0x0000, its attribute to 0xb800:0x0001, the second byte of the string to 0xb800:0x0002, its attribute to 0xb800:0x0003, and so on until it reaches the null byte that terminates the string. The statement mov %cs:(%si), %dl moves one character of the string, indexed by the SI register in the code segment, into the DL register. The reason for reading the characters from the code segment will become clear after understanding the linker commands discussed below.
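The copy loop described above can be mirrored in Python to see where each byte ends up. This is a sketch under the assumption of an 80x25 text mode with two bytes per cell (4000 bytes in total); the function name render_to_text_buffer is hypothetical, and a bytearray stands in for the video segment.

```python
ATTRIBUTE = 0x1e  # yellow on blue, as in the assembly code


def render_to_text_buffer(message: bytes, buffer: bytearray) -> int:
    """Copy a null-terminated string into a text buffer, one cell per character."""
    si = 0  # index into the string, like the SI register
    di = 0  # offset into the video segment, like the DI register
    while message[si] != 0:          # stop at the terminating null byte
        buffer[di] = message[si]     # character at an even offset
        di += 1
        buffer[di] = ATTRIBUTE       # attribute at the following odd offset
        di += 1
        si += 1
    return di                        # offset just past the last cell written


video = bytearray(80 * 25 * 2)       # assumed 80x25 text mode
end = render_to_text_buffer(b"hello, world\0", video)
print(end, video[:4].hex())
```

The 12 characters of the string occupy 24 bytes: the characters sit at the even offsets and the attribute 0x1e at the odd ones.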

While booting, the BIOS reads the code from the first sector of the boot device into memory at the physical address 0x7c00 and jumps to that address. However, while testing with DOSBox, things are a little different. In DOS, the text section is loaded at offset 0x100 in the code segment. This must be specified to the linker so that it can correctly resolve the value of the label named message. Therefore the object file has to be linked twice: once for testing it with DOSBox and once again before writing it into the boot sector.
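The addresses mentioned so far mix physical addresses and segment:offset pairs. In real mode, the physical address is the segment value shifted left by four bits plus the offset. The following Python sketch shows this arithmetic; the function name physical_address is hypothetical, and the code segment value in the last example is an arbitrary one chosen for illustration.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real mode address arithmetic: segment * 16 + offset."""
    return (segment << 4) + offset


# The first cell of the color text mode memory, 0xb800:0x0000.
print(hex(physical_address(0xb800, 0x0000)))

# Two different segment:offset pairs that both name physical address 0x7c00.
print(hex(physical_address(0x0000, 0x7c00)))
print(hex(physical_address(0x07c0, 0x0000)))

# A label at offset 0x100 in a code segment 0x1000 (arbitrary example value).
print(hex(physical_address(0x1000, 0x0100)))
```

Since a label such as message resolves to an offset within a segment, the linker needs to know the offset at which the code will sit in its segment, which is why the DOSBox and boot sector cases are linked differently.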

To understand the offset at which the data section can be placed, it is worth looking at what the binary looks like after a trial link with the following commands:

    as -o string.o string.s
    ld --oformat binary -Ttext 0 -Tdata 100 -o string.com string.o
    objdump -bbinary -mi8086 -D string.com
    xxd -g1 string.com

The -Ttext 0 option tells the linker to assume that the text section should be loaded at offset 0x0 in the code segment. Similarly, the -Tdata 100 option tells the linker to assume that the data section is at offset 0x100. The linker reads these values as hexadecimal numbers, so 100 here means 0x100.
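The effect of these two options on the output file can be modeled roughly in Python. This is a simplified sketch of a flat binary layout, not a description of how ld is implemented; the function name flat_binary is hypothetical, and the text section is represented by placeholder zero bytes because the real machine code comes from the assembler.

```python
def flat_binary(sections: dict) -> bytes:
    """Place each section at its address and zero-fill the gaps between them."""
    base = min(sections)
    end = max(address + len(data) for address, data in sections.items())
    image = bytearray(end - base)
    for address, data in sections.items():
        start = address - base
        image[start:start + len(data)] = data
    return bytes(image)


text = bytes(0x1f)            # placeholder for the 0x1f bytes of code
data = b"hello, world\0"      # contents of the data section
image = flat_binary({0x0: text, 0x100: data})

print(hex(len(image)))
print(image[0x100:0x104])
```

In this model the string starts at offset 0x100 of the file, with zero bytes filling the space between the end of the code and the start of the data.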

The objdump command is used to disassemble the file. This shows where the text section and data section are placed. Let us take a close look at this portion of the output:

      1b:   47                      inc    %di
      1c:   46                      inc    %si
      1d:   eb ec                   jmp    0xb
            ...
      ff:   00 68 65                add    %ch,0x65(%bx,%si)
     102:   6c                      insb   (%dx),%es:(%di)
     103:   6c                      insb   (%dx),%es:(%di)