Contents show]

Optimisation Edit

WARNING: SPOILERS! While no particular solution to any particular puzzle is given here, no guarantee is made that examples aren't very relevant to certain puzzles. You may find it is more fun to discover these strategies for yourself. You are recommended to try optimising as best you can without this guide, and then - once you are stuck for ways to find improvements - come back and see what other tricks you can try.

Introduction Edit

Optimisation is trying to improve one (or more) of the three level metrics (cycles, space, activity). The level score is given for the worst-case performance. The metrics are:

Cycles - the time it takes (in cycles) to complete the task. Space - the number of lines of instructions it takes to complete the task. Activity - the number of successful LINKs performed while completing a task.

General tricks Edit

Watch performance closely, and try to identify wasted cycles / instructions / links.

It may be necessary to sacrifice one metric to help another. Improved cycles performance can often be found using methods that use more space (paradoxically), or more LINKs

Activity optimisation Edit

Any time you use a link twice, consider whether it is necessary.

Save two jumps through a link by jumping once, and then using a REPL to create your second worker.

Save a return trip by sending information required back via GLOBAL M, and then dying.

Space optimisation Edit

If you have repeated sections, consider whether you can create a loop, or a unified section to JUMP to.

Avoid doing things like DROP ; HALT at the end of your file

DROP ; HALT does the same thing as simply HALT.

This also applies in places where the EXA would crash trying to go into the next section.

Re-order code so your EXA can just drop into the next section without having to JUMP there. (You may be able to avoid the MARK too).

A pattern like COPY M X ; ADDI X val X can be replaced with ADDI M val X.

A pattern like FJMP END ... MARK END ; DROP ; HALT is very similar to DIVI X T X

MARK takes up space, which may be counter intuitive for some people coming from other languages.

Using Canaries and Terminators can avoid writing clean-up code.

For example:

NOTE 19 INSTRUCTIONS REPL WRITER REPL READER HALT MARK WRITER COPY M X GRAB 300 MARK WLOOP TEST X = F FJMP WLOOP MODE COPY F X DROP HALT MARK READER GRAB 301 MODE COPY M F DROP HALT

can be simplified to:

NOTE 10 INSTRUCTIONS @REP 0 SWITCHING MODES HERE AVOIDS DUPLICATING MODE CALL LATER @END COPY M X MODE REPL READER NOTE NO NEED FOR REPL WRITER NOTE WRITER MARK UNNECCESSARY MARK WLOOP TEST X = F FJMP WLOOP COPY F M NOTE FALLING THROUGH TO READER NOTE CODE WILL CRASH ANYWAY MARK READER GRAB 301 COPY M F NOTE DROP AND HALT UNNEEDED

Cycle optimisation Edit

This is, to my mind, the most in depth and satisfying area to optimise. (Opinions may vary.)

Choose the right algorithm Edit

For each challenge, split it up into tasks, and handle each task individually. Some tasks may require others to be completed first.

Some consideration should be given to how tasks are split up. For example: a task like:

"Find out which values of A between 10 and 29 can have some operation performed on them" could be split into "Can 10 have this operation performed on it?"; "Can 11 have this operation performed on it?" ... and so on.

Potential problems Edit

Execution paths: if you have a task that can be done, but no EXA working on it, you could try adding more EXAs.

Communications: time spent sitting on the M channel to read/write is time spent not doing anything. This can be true even when your EXAs are well synchronised, since a send down M always takes at least two cycles.

Blocking conditions generally: if you find that any of these come up, see if you can figure a way to avoid waiting for stuff to clear. Sometimes this involves carefully synchronising your EXAs to dance a very precise dance. When this is done correctly, the low cycle count is is the second best feeling in the world (right behind seeing the dance you choreographed go correctly).

Excessive bookkeeping: mostly this affects loops and branches.

Too many or unnecessary instructions. Fewer instructions (normally) take less time to execute.

Strategies Edit

Micro-optimisations

Many of the tricks to reduce size in the Space Optimisation section are useful here. Reducing instruction count as well as saving cycles (e.g. ADDI F 10 X over COPY F X ; ADD X 10 X) will also free up space for when you want to trade space for cycles with other optimisations.

It's always worth examining your COPY commands, to see if the copied data has any work done it. You may be able to combine some of the work with the copy.

Re-ordering instructions

It is worth examining instruction orders, particularly around M read/writes, and REPL instructions.

For example, the earlier an EXA starts work, the earlier it finishes, so if there are instructions before a REPL that could be done after the REPL, it might be worth moving the REPL forward. (Of course, if the new EXA then shows up too early and messes with orderings, or blocks another EXA that's not so good.)

When doing an M read or write, you want to do this as close to the other EXA doing the write or read. If you find an EXA is waiting for a cycle or two before reading M, consider moving the read so it is an instruction or two later. If there are useful things to do in this time even better. It might also be worth looking to see if the write can be done earlier.

Clamping in two instructions

The naive method of clamping is something like:

NOTE X CANNOT BE HIGHER THAN 100 TEST X > 100 FJMP CONTINUE COPY 100 X MARK CONTINUE

This is 3 cycles and 4 instructions. It can be done with two cycles, however the EXA only stores values between -9999 and 9999. Trying to make it larger than 9999 will clamp it at 9999 (and making it smaller than -9999 will clamp at the other end). So you can limit X to 100 by going ADDI X (9999-100) X ; SUBI X (9999-100) X.

If you're passing this value to another EXA, you can even do the clamp for free:

NOTE SENDER ADDI X (9999 - max_value) M NOTE RECEIVER SUBI M (9999 - mav_value) X

Get magnitude in 1 instruction

This trick is one I haven't actually found use for in the game yet:

TEST X < 0 FJMP XPOSITIVE MULI X -1 X MARK XPOSITIVE

instead you can just say `SWIZ X 4321 X`.

You can also combine this with using SWIZ to multiply, divide, or modulo by 1000, 100 or 10, and so get both in a single instruction.

SWIZ can also be used to pack data, if values are between 0 and 99 it can be used to retrieve two values in one register.

There are lots of tricks with SWIZ.

Loop unrolling

When to use it: when you have loops - particularly small loops - and lots of spare space.

Consider this code, which writes the numbers 0 to 9 into a file:

COPY 10 T MARK WRITELOOP SUBI T 1 T SUBI 9 T F TJMP WRITELOOP

This takes 31 cycles to run, and spends about 2/3rds of the time doing bookkeeping instead of writing to the file.

We can do this in 10 cycles (the theoretical minimum - the file cannot receive values faster than that), and (as a bonus) this doesn't even clobber T:

@REP 10 COPY @{0,1} F @END

The main disadvantage here is that the first option is 5 instructions, and the second is 9. In this case it's not such a big difference, but if you were writing 0 to 99 to a file this would probably use too much space (100 instructions). In that case, though, you can always adopt a hybrid solution:

COPY 100 T MARK WRITELOOP SUBI 100 T X SUBI T 10 T @REP 10 ADDI X @{0,9} F @END TJMP WRITELOOP

This is 15 instructions altogether, but runs in 141 cycles, as opposed to the 301 it would take using a simple loop (and compared to the 100 cycles of the theoretical best case).

Loop unrolling can also be used when you don't know the full size of the loop, by testing if the loop size is greater than a certain threshold and then using an unrolled loop, and then having a standard loop for the last few items.

Parallelisation

Doing lots of things at once is often better than doing one thing at a time. The main way to parallelise in these puzzles is to form a pipeline. Messengers are good for this as you can have one EXA at every step of transferring the message. It means the message takes longer to get there, but you no longer stall reading from M, so the actual number of digits processed per second increases. If you had a loop where you did this: COPY #INPT M, you might stall waiting for M to be read. However, if instead you did a COPY #INPT X ; REPL MESSAGE, and have MESSAGE transfer the data directly, this would be less of a problem. Of course, it's two cycles to do this, but the other parallelisation strategy is "going wide", where two EXAs do the same thing in sync with each other. This means you could have one EXA reading from #INPT while the other REPLs, and then they switch. (This doesn't work if #INPT is a file.)

When things need to be done in a particular order, parallelisation can be tricky since you need to introduce synchronisation, but they are difficult fun problems to solve. While M channels can be used to synchronise EXAs, it's more fun to make it work without them.

One trick that can work is to save yourself a TEST and TJMP/FJMP by assuming either case could be true, and REPLing an EXA to follow the FALSE path, while following the TRUE path locally. This works if the wrong EXA will crash without causing any damage.