BOX-256 Threads

The key to creating fast solutions to the BOX-256 challenges is to have a number of threads working in parallel. New threads can be created with the THR instruction. Every cycle each thread executes one instruction.

Before each cycle, memory is copied into a buffer. Memory reads come from the buffer, but the instruction each thread executes is loaded from current memory. The program counter (PC) for the first thread is stored in the last byte of memory (0xFFh), the PC for the next thread at 0xFEh, and so on.

There are two techniques for launching multiple threads. Their names are borrowed from Core War.

Vector Launched Threads

To create a number of parallel threads, subtract one from the number required and convert the result to binary. Then, working from the most significant bit down, code a THR 004 for every 1-bit and a MOV @00 @XX 004 for every 0-bit, where XX is the address of the current instruction.

For example, to create 23 threads, subtract one to get 22, which is 10110 in binary:

    00 - THR 004         ;
    04 - MOV @00 @04 004 ;
    08 - THR 004         ; create 23 parallel threads
    0C - THR 004         ;
    10 - MOV @00 @10 004 ;
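The recipe above can be sketched in Python. This is an illustration only, not part of BOX-256: the function names `launch_sequence` and `threads_after` are made up here, and the thread-count model rests on an assumption about the emulator, namely that THR 004 starts a new thread at the following instruction and that, at a MOV, the first thread executes the MOV (rewriting it into THR 004) while every other thread arriving in the same cycle executes the rewritten THR.

```python
def launch_sequence(threads):
    """Return the launcher as a list of (address, instruction) pairs."""
    if threads < 2:
        raise ValueError("a launcher is only needed for 2 or more threads")
    bits = bin(threads - 1)[2:]              # binary digits, most significant first
    program = []
    for i, bit in enumerate(bits):
        addr = 4 * i                         # each instruction occupies 4 bytes
        if bit == "1":
            program.append((addr, "THR 004"))
        else:
            program.append((addr, f"MOV @00 @{addr:02X} 004"))
    return program


def threads_after(program):
    """Count threads under the simplified model described in the lead-in."""
    count = 1
    for _, instruction in program:
        if instruction.startswith("THR"):
            count = 2 * count                # every thread spawns one more
        else:
            count = 2 * count - 1            # one thread runs the MOV, the rest run THR
    return count


for addr, instruction in launch_sequence(23):
    print(f"{addr:02X} - {instruction}")
```

Under this model the thread count after each successive instruction is 2, 3, 6, 12 and finally 23, which is why the bits of (threads - 1) select between THR and MOV.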

At this point the parallel threads are all executing at the same address. To dispatch the threads to different addresses, use a MOV to copy an array of locations over the program counters:

    14 - MOV @40 @E9 017 ; copy an array of 23 (0x17h) addresses over the PCs
    ...
    40 - 0?? 0?? 0?? 0??
    44 - 0?? 0?? 0?? 0??
    48 - 0?? 0?? 0?? 0??
    4C - 0?? 0?? 0?? 0??
    50 - 0?? 0?? 0?? 0??
    54 - 0?? 0?? 0?? 000

The program counters are stored at the end of memory. Remember that the thread whose PC is at 0xFFh executes first, the thread whose PC is at 0xFEh next, and so on. Threads can be pinned in place, so that they execute the same instruction over and over, by sending one thread back to execute the MOV again:

    14 - MOV @40 @E9 017 ; copy an array of 23 (0x17h) addresses over the PCs
    ...
    40 - 0?? 0?? 0?? 0??
    44 - 0?? 0?? 0?? 0??
    48 - 0?? 0?? 0?? 0??
    4C - 0?? 0?? 0?? 0??
    50 - 0?? 0?? 0?? 0??
    54 - 0?? 0?? 014 000 ; 1st thread jumps back to execute the MOV at 0x14h
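The layout of the address array follows from where the PCs live: the last byte copied lands on 0xFF (the 1st thread), the one before it on 0xFE (the 2nd thread), and so on. A minimal Python sketch of that bookkeeping is shown here. The helper name `dispatch_table` is hypothetical, and the target addresses for threads 2 to 23 are placeholders invented for the example, since the listing above leaves them as ??.

```python
def dispatch_table(targets):
    """Build the MOV destination and address array for a list of targets.

    targets[0] is where the 1st thread should go, targets[1] the 2nd, etc.
    """
    count = len(targets)
    if not 1 <= count <= 0xFF:
        raise ValueError("thread count out of range")
    dest = 0x100 - count                  # lowest PC address that gets overwritten
    table = list(reversed(targets))       # final byte lands on 0xFF, the 1st thread
    return dest, table


# 1st thread is pinned to the MOV at 0x14; the other 22 targets are placeholders.
targets = [0x14] + [0x58 + 4 * i for i in range(22)]
dest, table = dispatch_table(targets)

print(f"MOV @40 @{dest:02X} {len(table):03X}")    # source address 0x40 as in the listing
print(" ".join(f"{value:03X}" for value in table))
```

For 23 threads the destination works out to 0xE9 and the count to 0x17, matching the MOV in the listing, and the pinned address 014 is the last entry of the array.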

Remember that memory is buffered at the start of each cycle. Memory reads come from the buffer, but the instructions executed are loaded from current memory, so it is possible for one thread to modify the instruction another thread will execute.

A practical example - 5 pinned threads display a Sierpinski triangle:

    00 - THR 004         ;
    04 - MOV @00 @04 004 ; create 5 parallel threads
    08 - MOV @04 @08 004 ;
    0C - MOV @10 @FB 005 ; copy an array of 5 addresses over the PCs
    10 - 024 020 01C 018
    14 - 00C 000 000 000 ; 1st thread jumps back to execute the MOV at 0x0Ch
    18 - PIX 000 @29 000 ; 2nd thread
    1C - MOV @29 @28 010 ; 3rd thread
    20 - ADD @29 @28 @38 ; 4th thread
    24 - ADD 001 @19 @19 ; 5th thread
    28 - 000 008 000 000 ; seed data for Sierpinski triangle

Binary Launched Threads

Another technique for launching multiple threads is a binary tree. Each THR is a branch node and each leaf node contains 2–3 instructions. Binary launched threads can display any image of x pixels, for x up to roughly 45, in ⌈1 + log₂ x⌉ cycles.
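To get a feel for the cycle count, the bound can be evaluated with a few lines of Python. This sketch assumes the bound is read as the ceiling of 1 + log2(x), and the function name `launch_cycles` is made up for illustration. Integer arithmetic is used instead of floating-point logarithms to avoid rounding problems at exact powers of two.

```python
def launch_cycles(pixels):
    """Cycles needed under the bound ceil(1 + log2(pixels))."""
    if pixels < 1:
        raise ValueError("pixels must be at least 1")
    # For integers x >= 1, ceil(log2(x)) equals (x - 1).bit_length().
    return 1 + (pixels - 1).bit_length()


for pixels in (8, 16, 45):
    print(pixels, "pixels:", launch_cycles(pixels), "cycles")
```

With this reading, an 8-pixel image takes 4 cycles and a 45-pixel image takes 7.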

A practical example - 8 binary launched threads display a square: