When bin-filling the binary instruction encoding, does your software take into account expected popularity of instructions to prefer common instructions for tighter encoding and uncommon instructions as an otherwise needlessly larger encoding to make room for the common ones?

Not really. What you describe is a simplified Gray coding, and we don’t use frequency in determining the encoding of the ops proper. We just determine the natural minimal encoding of each op, and pack for minimal total length. If there’s one op with so many arguments or information to represent that it takes several bits more than any other op then the algorithm will likely use half of the overall entropy just on that one op, and it will encode as the natural length plus one bit to distinguish between that one and all the rest. And so on, for minimal total length.

That said, there are cases in the encoding in which we are in effect encoding by frequency. For example, excess entropy in the exu block (the computational 2-in-on-out ops like ADD) is soaked up by immediate forms, such as add immediate and equal immediate. The range of immediate values supported in any given slot is determined by how much entropy is left over; we add immediate values until we almost but don’t actually go over a bit boundary. The added supported values are small integers, in order, so one slot may support an add-immediate with values 1-17 and the next slot with values 1-243, depending on the rest of the encoding.

This algorithm is in a sense a Gray code, because immediate value usage in actual programs drops in frequency with increasing value – add(3) is more common than add(27). So the algorithm is choosing more frequent operations to encode specially, even though the encoded bit length in any given slot is fixed.

There are other places where the encoding is frequency-sensitive, but in all of them the situation is like with the immediates: we don’t pack single operations by frequency, but we do pack multi-operation sequences by frequency. If there is not an immediate form with the right immediate value, then an expression such as X+243 must be scheduled and encoded as two ops, an con to get the constant on the belt, and then the add. Doing it in one op has better entropy, and also improves schedule latency and slot and belt pressure, so there really is frequency consideration in the algorithm, just not in the encoding of single ops.