Itty Bitty Stack Machine

A Virtual Computer for Operating System Development



This document describes the 32-bit Itty Bitty Stack Machine (IBSM), the virtual computer we use for operating system research and development, hopefully in sufficient detail to enable a competent programmer to implement it. The design serves two objectives:

1. The virtual computer must be fast, that is, it should be capable of emulation at a raw speed not slower than 10% of the host native hardware.

2. It must be simple, so that not very much host code is needed to implement it.

The initial IBSM implementation, written in HyperTalk compiled to 68000 code emulated in the MacOS PowerPC emulator, is neither fast nor simple. However, it does provide a lot of debugging facilities that would not be present in the production implementation.

The first implementation had a very clumsy tagged address mechanism to support multiple address segments. That mechanism has been removed, so all memory addresses are now offset from a single Base Register. There is now only one load and one store instruction, for local frame-based access, and a couple of conversion operators for converting a frame-based address to Base-Register form and back.



Architectural Overview

The peripheral hardware is idealized to simplify the system management of it:

The Keyboard provides keypress interrupts with ASCII data and the real-time state of modifier keys. The Mouse provides button interrupts and real-time position information, relative to the virtual screen. The Screen maintains its own buffer and supports four high-level graphical operations, plus the ability to set rectangle-based clipping regions. The virtual Disk drive is sector-mapped direct-memory transfer, with interrupt on completion. There is a Host interface for sending a variety of operations out to the host platform, and for receiving results (and interrupts) back. There is a place for Serial interfaces, not presently implemented (these are all handled by the Host).

The IBSM packs up to six (or sometimes seven) 5-bit instructions into a 32-bit instruction word. This allows for 31 frequently used 5-bit instructions and 32 less frequently used 10-bit instructions, whose first 5-bit nibble is an escape from the first set. These are described below in detail. There is no requirement to fill the instruction word; unused bits filled with zero are no-operations, skipped by the emulator. However, denser code puts less of a burden on the memory bandwidth, resulting in faster execution. Furthermore, because the multiple-op instruction word is not normally interruptible in the middle, it runs faster in emulation due to fewer tests for interrupts.
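As a rough illustration of this packing, the sketch below splits a 32-bit word into six 5-bit slots plus the two leftover bits. The slot order (first instruction in the most significant bits) and the names `decode_word` and `pack_word` are assumptions made for this example; the text does not fix the bit order.

```python
def decode_word(word):
    """Split a 32-bit instruction word into six 5-bit opcodes and a 2-bit tail.

    Assumption: the first instruction occupies the most significant bits.
    """
    word &= 0xFFFFFFFF
    # Shifts 27, 22, 17, 12, 7, 2 select the six 5-bit slots.
    slots = [(word >> shift) & 0x1F for shift in range(27, -1, -5)]
    # Only 2 bits remain, so a seventh opcode presumably fits only if it is 0..3.
    tail = word & 0x3
    return slots, tail


def pack_word(slots, tail=0):
    """Inverse of decode_word (hypothetical helper for building test words)."""
    word = 0
    for op in slots:
        word = (word << 5) | (op & 0x1F)
    return (word << 2) | (tail & 0x3)
```

A word of all zero bits decodes to six NOPs, which matches the rule that unused zero bits are skipped.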

Addresses stored in memory are relative to a base register, so that both program code and data can be dynamically relocated by the operating system while preserving the addresses of data stored there. The base register is not normally accessible to programmers in application programs, but is maintained by the system software for memory management.



Registers

BR is the Base Register, which relocates everything. Code is offset negative from BR, and the stack is offset positive. HL is the Heap Lower bound register, typically below the code. SX is the Stack eXtent, the upper limit of memory allocated for function stack growth.

There are three registers which control program operation:

FP is the current local variable frame set on entering a function, and restored to its previous value on exit. It is always GL-relative.

PC is the program counter, the address of the next instruction fetch. It is always CB-relative.

SP is the current stack pointer, the place in memory where data is pushed and popped. The SP always points to the word most recently pushed, the top of the stack. The stack grows upward. When pushed onto the stack, SP is GL-relative; when stored in connection with an interrupt, it is always absolute.



During interrupts, the registers are popped off the stack in the order given above (BR,SX,FP,PC) and pushed in the reverse order. The HL register is reloaded from memory offset from BR. The SP stored (and retrieved) in an interrupt context switch always points to the pushed BR value thus:



SP → BR
     SX - BR
     FP - BR
     PC - BR
     other data
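The save and restore order can be sketched as follows. This is only an illustration under stated assumptions: `mem` is a flat list indexed by absolute address, and the function names are invented for the example. Handling of HL and of where the SP itself is stored is not modeled.

```python
def push_context(mem, sp, br, sx, fp, pc):
    """Push PC, FP, SX, BR (the reverse of the pop order); return the new SP.

    Assumptions: the stack grows upward, SP points at the word most
    recently pushed, and SX, FP and PC are stored relative to BR.
    """
    for word in (pc - br, fp - br, sx - br, br):
        sp += 1
        mem[sp] = word
    return sp  # now points at the saved BR


def pop_context(mem, sp):
    """Pop BR, SX, FP, PC in that order; return (sp, br, sx, fp, pc)."""
    br = mem[sp]
    sx = mem[sp - 1] + br
    fp = mem[sp - 2] + br
    pc = mem[sp - 3] + br
    return sp - 4, br, sx, fp, pc
```

Because the code lies below BR, the saved PC - BR value is negative in this sketch.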

Semaphores

There are two instructions to operate on semaphores: Signal and Wait . If the semaphore has a (negative) count, Wait increases the value and continues executing; otherwise it triggers the interrupt. If a semaphore value is positive, Signal triggers its interrupt; otherwise it counts down (in the negative direction) and continues execution.

The operating system is expected to handle the interrupts and effect appropriate process management for the processes that block (Wait) on a semaphore with no (negative) counts, or conversely to prepare for continued execution those processes already blocked on a semaphore that is Signalled.

A reader-writer pair of processes on a circular buffer can set up a pair of semaphores, Fill and Take , with Fill pre-counted with the buffer size. The reader process Waits on the Take semaphore, and when unblocked, advances its own buffer pointer and takes the data, and finally Signals the Fill semaphore. Conversely, the writer process Waits on the Fill semaphore, then advances its own buffer pointer and inserts a new datum, and finally Signals the Take semaphore. No deadlock, buffer overrun, nor race condition is possible in this simple example.
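The Signal and Wait rules can be sketched as two small functions. The function names and the `(new_value, interrupt)` return convention are assumptions for this illustration; the vector numbers 4 and 5 are the Signal and Wait entries from the interrupt vector assignments in this document. What a positive semaphore value encodes is not stated here, so the sketch leaves the value unchanged when an interrupt is raised.

```python
SIGNAL_VECTOR = 4  # interrupt vector for Signal
WAIT_VECTOR = 5    # interrupt vector for Wait


def wait(value):
    """Return (new_value, interrupt); interrupt is None if execution continues."""
    if value < 0:
        # A negative value is an available count: use one up and continue.
        return value + 1, None
    return value, WAIT_VECTOR


def signal(value):
    """Return (new_value, interrupt); interrupt is None if execution continues."""
    if value > 0:
        return value, SIGNAL_VECTOR
    # Count down in the negative direction and continue.
    return value - 1, None
```

For a four-word buffer, Fill would start at -4 and Take at 0: the writer's Wait on Fill moves it to -3 and its Signal on Take moves that to -1, after which the reader's Wait on Take succeeds without an interrupt.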



Process Context Switching

A. Process timeout (timer interrupt).

B. I/O pending or completion (I/O interrupt).

C. Semaphore Signal/Wait (semaphore interrupt).

D. System Call request by the software (SysCall interrupt).

Interrupts

The following are the interrupt vector assignments; each number is the absolute address of that vector entry:

0 Illegal operation

1 PC=0 (program exit)

2 Stack overflow

3 System Call

4 Signal

5 Wait

6 Timer

7 Disk I/O complete

8 Mouse click

9 Keypress

10 Host event

11 Serial I/O

15 Debugger trap

Input/Output Space

All I/O space addresses are in low memory (the addresses below are hexadecimal):

10 Interrupt pending bits

11 Video command; a store initiates the action.

12 Video memory address

13 Video datum/color

14 Video top bound

15 Video left bound

16 Video bottom bound

17 Video right bound

18 Disk datum/command; a store here initiates the action.

19 Disk word count

1A Disk sector address

1B Disk memory address

1C Serial datum/command

1D Serial port address

1E Serial word count

1F Serial memory address

20 Keypress datum, modifiers in high half

21 Mouse button(s) in high bits

22 Mouse vertical coordinate

23 Mouse horizontal coordinate

24 Current time in seconds from 12:00am 2000 Jan 1.

25 Current time in milliseconds from sometime recent (last boot? midnight?)

26 Host Command; a store here initiates the action.

27-2F additional Host parameters



Video Commands

0. Screen Rectangle. Returns the boundary rectangle of the screen hardware or host window.

1. Text bits. Beginning at the specified address, each word of data represents one pixel column, initially aligned with the top/left coordinates supplied, and continuing one word per pixel column until the right bound is reached. The least significant bit is the top pixel.

2. Fill rectangle. The boundary rectangle is filled with the specified color; the memory address is ignored. Horizontal and vertical lines can be programmed from 1-pixel-wide rectangles.

4. Blit image. Consecutive words are copied to the screen, four pixels per word. The image is defined by the boundary rectangle, but the source data at the memory address can be arranged in lines of a different length than the specified width. The source line width (in 4-pixel words) is given in the datum.

5. Capture image. This is the same as Blit, but the pixels go the other direction.

8. Clip rectangle. This sets a clipping rectangle in screen coordinates, outside which drawing will not happen. It is initially by default defined to be the whole screen.

9. Hide rectangle. This removes the specified rectangle from the clipping area. There is an implementation-defined limit to the number of Hides that can take chunks out of a single Clip, but it is at least 20. For example, a region can be programmed by one Clip (10,20,50,90) and two Hides (10,65,25,90) and (35,20,50,70).

The normal use for Clip/Hide is to Clip an application window for drawing, then remove from it by Hide all the rectangles of windows partially overlapping it. The current implementation has enhanced the clipping region handling somewhat.
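The effect of one Clip and several Hides on a single pixel can be sketched as below. The rectangle order (top, left, bottom, right) is assumed from the order of the video bound registers, and the half-open bounds (bottom and right excluded) are a guess for illustration; `drawable` is a hypothetical name.

```python
def drawable(v, h, clip, hides):
    """Return True if pixel (v, h) lies in the clip and in none of the hides.

    Each rectangle is (top, left, bottom, right); bottom and right are
    treated as exclusive, which is an assumption of this sketch.
    """
    def inside(rect):
        top, left, bottom, right = rect
        return top <= v < bottom and left <= h < right

    return inside(clip) and not any(inside(rect) for rect in hides)
```

With Clip (10,20,50,90) and Hides (10,65,25,90) and (35,20,50,70), the pixel at vertical 30, horizontal 40 is drawable, while the pixel at vertical 15, horizontal 70 falls inside the first Hide and is not.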

Colors

Blue  Green  Red
0     0      0
36    6      1
72    12     2
108   18     3
144   24     4
180   30     5

Summing the color values for each of the primaries gives a reasonable distribution of colors. Black is 0, white is 215, and gray is any multiple of 43. Using color values greater than 215 may have unpredictable results. This will not produce beautiful JPEG photographs, but it works for software development. Future implementations will probably define a larger color space.
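Since the listed values step by 36 for blue, 6 for green and 1 for red, a color value is effectively a three-digit base-6 number. A minimal sketch, assuming each primary is given as a level from 0 to 5 (the function name is made up for this example):

```python
def color_value(red, green, blue):
    """Combine three primary levels (each 0..5) into one IBSM color value."""
    for level in (red, green, blue):
        if not 0 <= level <= 5:
            raise ValueError("each primary level must be in 0..5")
    return 36 * blue + 6 * green + red
```

Equal levels on all three primaries give the grays: level 1 is 43, level 2 is 86, and level 5 is white at 215.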



Instruction Codes

One of the primary instructions captures the next 10 bits as a signed constant in the range [-512,+511] and pushes it onto the stack. The constant must be in the same instruction word, but if it extends off the end it is merely limited to [0,+3] or [0,+127], depending on how many bits are available.
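The ranges above suggest that a full 10-bit field is sign-extended while a truncated 2-bit or 7-bit field is taken as unsigned. A small sketch under that reading (the function name and the `width` parameter are assumptions for illustration):

```python
def psh_constant(bits, width=10):
    """Interpret the field that follows a PSH opcode.

    A full 10-bit field is signed, [-512, +511]; a field cut short by the
    end of the instruction word is treated as unsigned, per the ranges
    [0, +3] and [0, +127] given in the text.
    """
    bits &= (1 << width) - 1
    if width == 10 and bits & 0x200:
        return bits - 1024  # sign bit set: extend to a negative value
    return bits
```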

Another primary instruction pushes a whole word constant onto the stack. The word comes from the next instruction word pointed to by the PC, which is then incremented over it. You can push up to six consecutive constants this way in one instruction word, followed by the six constant values. The next instruction is taken from the word following the sixth constant.

There are no addressing modes in the instructions themselves; all memory addresses are pushed onto the stack as constants (or calculated), and then used by one- or two-nibble opcodes.

0 NOP -- no operation, also word filler

1 BZ -- branch if false: pop 2 words, add the top word to PC if the next =0

2 CALL -- push PC; load PC from subroutine in popped word

3 SYS -- software interrupt

4 ZERO -- push 0

5 ONE -- push 1

6 TWO -- push 2

7 THRE -- push 3

8 MTWO -- push -2

9 MONE -- push -1

10 PSH -- 10-bit constant in instruction word

11 PUSH -- 32-bit constant (next word)

12 LDF -- replace top of stack from FP-based word it addresses

13 STF -- pop 2 words, top is FP-based address, next is word to store

14 GFR -- globalize frame ref, then tag as GL-relative

15 NEG -- negate top of stack

16 AND -- push the bitwise logical AND of the top two words

17 ADD -- push the sum of the top two words

18 MPY -- push the product of the top two words

19 DIV -- pop 2 words, divide top into next, push quotient then remainder

20 EQU -- push a boolean result, =1 if the top two words are equal

21 LSS -- push a boolean result, =1 if the top word is greater than the next

22 ROT3 -- remove the top word and insert it into the stack under the next two

23 SWAP -- exchange the top two words

24 POP -- remove the top word

25 DUPE -- push a copy of the top word

26 RNG -- replace top word on the stack with a boolean, =1 if next is in-range

27 GLOB -- tag address as GL-rel

28 MZERO -- push 80000000

29 SOS -- pop top word, then use it as an index into the stack, to swap with the new top

30 SHFT -- pop top word, shift next left or (negative) right

31 ESC -- enables the secondary set...

0' ERRZ -- pop top word, take illegal interrupt if =0

1' EXIT -- function exit

2' COCALL -- cocall/interrupt exit

3' ENTER -- function entry

4' SIGNAL -- signal

5' WAIT -- wait

6' -- (reserved for future use)

7' -- (reserved for future use)

8' DEBUG -- enter debugger (not yet fully defined)

9' COPY -- pop 3 words: word count, destination and source addresses; copy words

10' BYCPY -- same, but each address is 2 words, including a byte offset, for a byte copy

11' -- (reserved for future use)

12' LDD -- based off FP, float-double load

13' STD -- based off FP, float-double store

14' -- (reserved for future use)

15' FNEG -- floating negative

16' -- (reserved for future use)

17' FADD -- floating add

18' FMPY -- floating multiply

19' FDIV -- floating divide

20' FEQU -- push a boolean result, =1 if the top two float doubles are equal

21' FLSS -- push a boolean result, =1 if the top float double is greater than the next

22' FLEQ -- push a boolean result, =1 if the top float double is greater than or equal to the next

23' BES -- swap top two words if Big-Endian, else do nothing

24' -- (reserved for future use)

25' -- (reserved for future use)

26' -- (reserved for future use)

27' -- (reserved for future use)

28' pSP -- push SP-BR, from the value SP was before push

29' pPC -- push PC-BR

30' pFP -- push FP-BR

31' pBR -- push BR

BZ There is only one branch instruction. It pops two words, an offset to be added to the PC if the next word is zero. To get an unconditional branch, just push a zero before the jump offset. To get an indexed jump, calculate the offset instead of pushing a constant. To jump to an absolute address, push the address, then push the PC, convert it to absolute, and subtract it (depending on whether you get this all into a single word, you may need an adjustment).
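The branch rule can be sketched with a Python list standing in for the stack (top of stack at the end of the list). The function names are hypothetical, and the sketch takes the PC to be the address of the next instruction fetch; it ignores any adjustment needed when the branch sequence spans more than one instruction word.

```python
def bz(stack, pc):
    """Pop the offset, then the test word; return the new PC."""
    offset = stack.pop()
    test = stack.pop()
    return pc + offset if test == 0 else pc


def absolute_jump_offset(target, pc):
    """Offset that makes an unconditional BZ land on an absolute target."""
    return target - pc
```

An unconditional jump to address 40 from PC 100 then amounts to pushing 0, pushing `absolute_jump_offset(40, 100)`, and executing BZ.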

CALL All subroutine calls are indexed negatively in a BR-based table by number. The subroutine number is popped off the stack, the return address is pushed, then the PC is replaced by the offset found in the indexed table (BR-index). Any instructions remaining in the instruction word are discarded.

ENTER This pops one word off the stack, which it takes to be the number of local variable words to allocate, added to the SP after pushing the FP and setting the new FP to point to the old one on the stack. Thus the first variable is at offset +1, and the last argument pushed before the CALL is at offset -2.

EXIT This pops one word off the stack, which it takes to be the number of function parameter words to discard, after restoring the FP that was saved by ENTER and reloading the PC from where it was saved by CALL.
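The frame layout produced by CALL, ENTER and EXIT can be traced with a small model. This is a sketch under several assumptions: BR relocation is ignored, `call` takes the target address directly instead of looking it up in the subroutine table, and return values are not modeled. The class and method names are invented for the example.

```python
class FrameSketch:
    """Toy model of the IBSM stack frame; not a full emulator."""

    def __init__(self, size=64):
        self.mem = [0] * size
        self.sp = 0    # points at the word most recently pushed
        self.fp = 0
        self.pc = 100

    def push(self, word):
        self.sp += 1   # the stack grows upward
        self.mem[self.sp] = word

    def pop(self):
        word = self.mem[self.sp]
        self.sp -= 1
        return word

    def call(self, target):
        self.push(self.pc)      # return address
        self.pc = target

    def enter(self):
        n_locals = self.pop()
        self.push(self.fp)      # save the old FP
        self.fp = self.sp       # new FP points at the saved FP
        self.sp += n_locals     # allocate local variable words

    def exit(self):
        n_params = self.pop()
        self.sp = self.fp       # drop the locals
        self.fp = self.pop()    # restore the FP saved by ENTER
        self.pc = self.pop()    # reload the PC saved by CALL
        self.sp -= n_params     # discard the parameter words
```

After pushing two arguments, calling, and entering with three locals, the last argument sits at `fp - 2` and the first local at `fp + 1`, as described above.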

STF pops an address off the top of the stack, then the data to store.

COPY This is either a multi-word copy, or a multi-word constant fill. Push three values: a source value or address, then a destination address, then the number of words. If the word count is positive, that number of words is copied from source to destination; if negative, then the source value is replicated to fill that many words. Overlapping operands are copied properly with no collision. If the opcode is alone in its instruction word, then if an interrupt occurs during a long copy, the stack is restored to reflect the work already done, the interrupt is taken, and the copy resumes upon return. If there are other instructions in the instruction word, then it cannot be interrupted. BYCPY allows for character string copy.
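The two modes of COPY can be sketched as one function over a flat word array. The name `copy_words` and the use of a Python list for memory are assumptions for illustration; interrupt handling during long copies is not modeled.

```python
def copy_words(mem, source, dest, count):
    """Copy `count` words, or fill `-count` words when count is negative.

    For a copy, `source` is an address; for a fill, it is the value to
    replicate. The right-hand slice is built before assignment, so
    overlapping ranges are copied without collision.
    """
    if count >= 0:
        mem[dest:dest + count] = mem[source:source + count]
    else:
        mem[dest:dest - count] = [source] * (-count)
```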

RNG This is a generalized array bounds test. The top number is taken as the array upper bound, and the next word as the index. The index is left on the stack and the top word is replaced by a boolean; 1 means that the index is less than the upper bound and not negative. A special case occurs if the upper bound is negative: the next word below the index is taken as a pointer to an array with the length stored at that address -1; that is the upper bound used instead of the given -1. This assumes that all index calculation has already been done.
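A sketch of the bounds test, using a Python list as the stack (top at the end) and another as memory. It reads "that address -1" as the word just before the array, which is an interpretation rather than something the text confirms; the function name is hypothetical.

```python
def rng(stack, mem):
    """Replace the bound on top of the stack with 1 if the index is in range."""
    bound = stack.pop()
    index = stack[-1]              # the index stays on the stack
    if bound < 0:
        pointer = stack[-2]        # array pointer just below the index
        bound = mem[pointer - 1]   # assumed: length is stored before the array
    stack.append(1 if 0 <= index < bound else 0)
```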

GLOB There is no GL-based load or store; to access (global) variables at a fixed offset from the GL register, push the offset constant, then apply the GLOB instruction to subtract the FP from that offset. Subsequent LDF or STF instructions will get the right address.

GFR To pass a local variable address as an address, it needs to be tagged in such a way that the subroutine using it will not be confused into using its own FP as a base register. GFR adds the FP to an offset and tags it as GL-relative, which will work anywhere in this program, even if the entire data frame is subsequently relocated. To pass the address to another program requires absolutizing it (see UAA ).

NEG, ADD There is no subtract instruction. Just negate the top word and add.

DIV The integer divide operation produces both a quotient and a remainder. POP the one you don't want.

SHFT The top word on the stack is a shift count. For simple shifts up to 31 bits in either direction (negative shifts right), just push the bit count. If the count is in the range -64 to -68, magic happens: the low 8 bits of the word to be shifted are shifted into the position where they belong for the proper endianness of the underlying hardware, or conversely the byte is shifted down to the low 8-bit position. There are built-in compiler functions to access these operations properly; they are too hard to describe. They make strings work right. Aren't virtual computers wonderful? You can do anything you want.

