Recent performance improvements in Parrot to avoid aggressive buffer copying and to avoid unnecessary buffer reallocations demonstrate how bugs, mistakes, and design infelicities at the lowest levels of your program stack can have dramatic negative effects on the whole program.

In the case of Parrot, they also demonstrate a worthwhile experiment currently underway.

What if we forbade modifying strings in place?

It sounds crazy, but consider the evidence. Most of the strings in Parrot and Rakudo get read many more times than written. Yet because strings can change in place, many parts of the system which return strings must return copy-on-write string headers, to avoid those modifications.

For example, the Class PMC in Parrot contains a string which, if present, represents the name of that class. Given a class object, you can ask for that string. The Class must make a copy-on-write header for this operation. If it didn't, any modification performed on the string it returned would change the name of the class in place, even unintentionally.

Almost none of the uses of this introspection interface on classes throughout Parrot and Rakudo ever perform any modifications on that string. In other words, the interface prevents something rare and catastrophic from happening while penalizing the common behavior.

Certainly creating a new copy-on-write string should be cheap, and most of these strings become collectable garbage very quickly, but there's no garbage collection mechanism cheaper than not creating garbage at all.

The important change is to forbid all string modification functions from operating on buffers in place. Instead, they create new string headers return them directly. The caller has the responsibility of storing the modified string as appropriate. This pushes the allocation to the point of modification.

(Some might wonder "Why not modify the string in place and copy the old string?" That requires you to root around in memory to find all references to the existing string, and you're not going to do that quickly or safely without changing the way memory works throughout Parrot, or at least building a huge data structure to keep up to date about what's present where.)

There are two caveats to this system. First, there are legitimate performance reasons to allow in-place modification. Within a loop, appending to a single string can be much cheaper if you do allow modification: the rule about not creating and throwing away immediate garbage applies there. The right solution is some kind of StringBuilder container which allows in-place modification with immutable strings. (The JVM did, eventually, get this right.)

The second caveat is that high-level languages may support mutable string semantics. Yet a similar approach works for this as well; a high-level language should use a PMC as a container for primitive strings. The contents of that PMC can change, but if everything refers to that PMC, it effectively appears to change in place.

I expect measurable performance improvements from this experiment, somewhere between 5-10% on Rakudo benchmarks. Not only does this create far fewer garbage headers to collect (which decreases the amount of time spent in garbage collection), but it allows more pervasive sharing of constant strings. We have another experiment in progress to coalesce all identical strings into one representation, and that has great memory savings as well.

This experiment won't make it into Parrot 2.3, due on 20 April 2010, but we should have performance numbers by then. If it's suitable for merging, it should be available for Parrot 2.4 as well as Rakudo Star.