September 2010

I've been interested in writing an OS for a long time now. An OS consists of many components with one of the most fundamental being its booting mechanism. Had I been writing a production OS, I would have made use of a package such as GNU GRUB or LILO. However, as a hobbyist I was interested to know exactly what my PC was doing during the boot process. I decided that a good way to start would be to study a simple operating system -- MS-DOS. An MS-DOS boot sector has a very simple job: load the first 3 sectors of IO.SYS into memory and execute it.

After your BIOS completes its POST, an IBM PC compatible computer will read the first 512 B block from disk into memory at location 0x07C00 and begin executing it. The last 2 B of the boot sector must have the value 0xAA55; this value is known as the boot signature. This leaves 510 B for code.

MS-DOS expects the disk to be formatted with the FAT file system and will populate the boot sector with an 8 B OEM name and a 51 B data structure known as the BIOS parameter block. The first 3 B are expected to contain a jump instruction. This finally leaves us with 448 B for code. Had I been writing a production DOS boot sector, I would have written the code in an assembly language under such extreme constraints. However, as a philocalist and masochist I felt compelled to write legible code and decided to use C.

Free and reserved bytes in an MS-DOS boot sector (1 B per square)

The BIOS parameter block contains important information about the layout of the filesystem. Here is a table describing its layout:

Length Name 2 Bytes per sector 1 Sectors per cluster 2 Number of reserved sectors 1 Number of file allocation tables 2 Number of root entries 2 Number of sectors (if < 65 536) 1 Media descriptor 2 Sectors per file allocation table 2 Sectors per track 2 Number of heads 4 Number of hidden sectors 4 Number of sectors (if ≥ 65 536) 1 Disk drive index 1 Reserved 1 Volume signature 4 Volume ID 11 Volume label 8 Volume type

The CPU will be in real mode when the boot sector is loaded. This means we can only use 16-bit opcodes and address up to 1 MiB of memory. The first 640 KiB are available to our program while the remaining 384 KiB are used for assorted system-specific purposes. These memory areas are known as conventional memory and the upper memory area, respectively.

Some parts of conventional memory are reserved by the system. The first 1 024 B are used for the interrupt vector table and the next 256 B are used for the BIOS data area. Also, recall that the boot sector is loaded in 512 B in [0x07C00, 0x07E00). We can safely use 29.75 KiB B in [0x00500, 0x07C00) and 480.5 KiB in [0x07E00, 0x80000) for a total of 510.25 KiB. There are also 128 KiB in [0x80000, 0xA0000), but some systems consume part of this region for the extended BIOS data area.

Free, partial, and reserved bytes in conventional memory (1 KiB per square)

In my boot sector implementation, I use 5 B in [0x07E00, 0x07E05) to store the number of sectors on the disk and the logical block address of the root directory and IO.SYS. I use 29.75 KiB in [0x00500, 0x07C00) for the root directory index. Each root directory entry is 32 B, meaning that IO.SYS must be one of the first 952 entries. (MS-DOS 4.0 expects IO.SYS to be the first record in the root directory.) Here is a table describing the layout of each root directory entry:

Length Name 8 Filename 3 Extension 1 Attributes 1 Reserved 1 Creation time (microseconds portion) 2 Creation time 2 Creation date 2 Last access date 2 Reserved 2 Last modified time 2 Last modified date 2 Cluster offset 4 File size in bytes

Dates are 16-bit, little-endian values stored in the following format: YYYYYYYMMMMDDDDD. Timestamps are 16-bit, little-endian values stored in the following format: HHHHHMMMMMMSSSSS.

Once IO.SYS is found, I store its first 3 sectors at 0x00700. I expect these 3 sectors to be unfragmented. This leaves 512 B in [0x00500, 0x00700) free for IO.SYS to store a copy of the boot sector later on.

Compiling the code into a raw binary with 16-bit opcodes became my next challenge. I was pleased to find that this is possible with GCC and binutils with a little bit of magic. First, I had to add the .code16gcc assembler directive to my C code. I also had to create a custom linker script to create a raw binary with a boot signature. The script instructs ld to construct a binary with a code segment, read-only data segment, and a boot signature. It also sets the instruction pointer to the correct memory offset.

The source code is released under the MIT license and is also available on GitHub at github.com/kjiwa/x86-boot-sector-c.