There’s one more topic to discuss first: what bytecode actually is and what it looks like.

Instructions, whether in machine code, assembly language, or bytecode, tell the machine what to do. They’re made up of an opcode (short for operation code) and optionally some operands (parameters for the opcode). The opcode tells the machine what to do, and the operands tell it what to do it with. I was going to use some x86 as an example here, but x86 encoding is very complex: opcodes are 1 to 3 bytes long, there are optional prefix bytes and about four different types of operands (also of different lengths), and instructions are anywhere from 1 to 15 bytes long. Sometimes operands are implied by the opcode, and other times part of an operand (like a high bit) is given in a prefix (for the extra registers in x86_64). The point I was going to make, with a nice tidy example, was that real machines and virtual machines use the same paradigm: an opcode followed by operands.

In some CPU architectures, and many bytecode formats, the opcode is a fixed number of bytes (usually 1), followed by the operands. This is true in the JVM, where an increment instruction looks like this (bytes written in hexadecimal):

84 02 01

This increments local 2 by 1. 84 is the opcode meaning 'increment', and 02 and 01 are its two operands: 02 selects the local variable and 01 is how much to increment it by. Here 02 can be considered a selector: it tells the machine what to perform the operation on (local variable 2). In a register machine it might refer to one of the registers. 01 can be considered an immediate value, a value used by the program that is stored in the instruction stream. The JVM, PZ, and many other virtual machines are stack-based machines, meaning they have no general-purpose registers and operations are almost always performed on the values at the top of the stack.
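To make the opcode/selector/immediate split concrete, here is a small Python sketch that pulls the three bytes apart. The function name `decode_iinc` is made up for this post. The only facts it relies on are that 0x84 is the JVM's increment opcode, that the local index is an unsigned byte, and that the increment is a signed byte.

```python
import struct

IINC = 0x84  # JVM opcode for incrementing a local variable by a constant

def decode_iinc(code: bytes):
    """Split a 3-byte increment instruction into (local index, increment)."""
    # B = unsigned byte (opcode, local index), b = signed byte (increment)
    opcode, index, const = struct.unpack("<BBb", code[:3])
    if opcode != IINC:
        raise ValueError(f"not an increment instruction: opcode {opcode:#04x}")
    return index, const

index, const = decode_iinc(bytes.fromhex("84 02 01"))
print(f"local {index} += {const}")  # prints "local 2 += 1"
```

Because the increment is a signed byte, the same instruction with FF as its last byte would decrement the local by 1.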

Plasma currently has two different bytecode formats: one on disk and one in memory. On disk, a PZ (that’s the Plasma abstract machine) bytecode instruction has a one-byte opcode, up to two one-byte operand sizes, and an optional immediate value (see pz_format.h). The opcode determines which of the other bytes are present. An instruction like 'add' has a single operand size byte, which tells the interpreter what size of data 'add' should add, e.g. 8-bit, 32-bit, or pointer-sized data. Most instructions require one of these bytes, some require none (like 'roll'), and some require two (like 'sign-extend'). 'Load' and 'call' instructions require an immediate value. For 'load' this is the value to place on the stack; for 'call' it is the number of the procedure to call.

Because the format is not fixed while Plasma is under development, it doesn’t make sense to talk about the constants for particular opcodes and operand sizes, so there is no example here.
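The general shape of the on-disk layout can still be sketched with invented numbers. In the Python below, every opcode value, every operand size code, and the fixed 32-bit little-endian immediate is an assumption made up for illustration. None of it comes from pz_format.h. Whether 'load' carries a size byte and 'call' carries none is also a guess. Only the layout (an opcode, then zero to two size bytes, then an optional immediate) follows the description above.

```python
import struct

# Hypothetical table: opcode -> (name, number of operand size bytes, has immediate).
# These values are invented for illustration; they are NOT PZ's real constants.
OPCODES = {
    0x01: ("load", 1, True),
    0x02: ("add", 1, False),
    0x03: ("roll", 0, False),
    0x04: ("sign-extend", 2, False),
    0x05: ("call", 0, True),
}

IMMEDIATE_BYTES = 4  # assumption: every immediate is 32-bit little-endian

def decode(stream: bytes, offset: int):
    """Decode one instruction; return (name, sizes, immediate, next offset)."""
    name, num_sizes, has_immediate = OPCODES[stream[offset]]
    offset += 1
    # The opcode alone decides how many operand size bytes follow it.
    sizes = list(stream[offset:offset + num_sizes])
    offset += num_sizes
    immediate = None
    if has_immediate:
        (immediate,) = struct.unpack_from("<I", stream, offset)
        offset += IMMEDIATE_BYTES
    return name, sizes, immediate, offset

# load 7, load 5, add (the size code 32 is also made up)
stream = (bytes([0x01, 32]) + struct.pack("<I", 7)
          + bytes([0x01, 32]) + struct.pack("<I", 5)
          + bytes([0x02, 32]))

offset = 0
while offset < len(stream):
    name, sizes, immediate, offset = decode(stream, offset)
    print(name, sizes, immediate)
```

The thing to notice is that the decoder never needs a length field: once it has read the opcode, it knows exactly how many more bytes belong to the instruction.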

These instructions follow one after another to form an instruction stream. The program executes each instruction in the stream in order unless it encounters something like a 'call', 'jump' or 'return' instruction. For example, if in the JVM we incremented local 2 by 1 and then local 3 by 2, we would have an instruction stream like:

84 02 01 84 03 02