I think the time has come for a standard programming language safety score. I want to use this model to show that the concept of safety is far more nuanced than a single binary bit of “has strong static types”.

When someone says “programming language safety”, it typically evokes thoughts of unit tests, long build times, and red squiggles in an IDE. But in day-to-day development, we are so often bitten by things that somehow slip through the cracks.

I put together this scoring model to get a sense of how safe a language is at the primitive level, and, if it isn’t safe by default, how much it costs to make it safe manually. Since all abstractions eventually reduce to a series of primitive operations, focusing only on primitives still yields a valuable (if incomplete) data point. While any good library will handle all primitive checks and present the consumer with a well-designed abstraction, in the end the consumer is still left wiring libraries together and building their own primitive abstractions for integration. Because measuring the quality of abstractions across every library in a language is impossible, library quality is entirely out of scope for this model, unless a library is designed as a primitive check.

By focusing on only the primitive operations (making and calling functions, naming data, working with sequences, and dealing with the language’s primitive data types), I slimmed the large range of possible error vectors down to a small handful. While in some languages it is common to wrap a set of primitives in user-defined classes, those classes are still doing the same primitive work, just hidden behind a user-created abstraction. The more ways there are to make a “mistake” with a primitive, the harder it is to build such good abstractions.

This model is not about language “power”.

This model is not about ranking the “power”, “expressiveness”, or “abstract-ability” of a language. I am convinced that, given enough code, any Turing-complete language that supports abstractions (functions, classes, modules, naming data) can do the same work. This model is only about the cost of preventing unexpected “confusion” between the programmer and the machine at the primitive level.

Rather than focus on what is possible with a language, I will focus on what is typically idiomatic in that community. For example, if a level of safety is achievable in a language only through uncommon means, it should not be counted.

To score a language, figure out how many characters it costs to “prevent” a given type of error, and add that to the total. Newlines, spaces, and tabs do not count; all other characters, including punctuation, do. If a check is enforced by the language itself, like F#’s Option or C#’s parameter type enforcement, it is given -30 (by default) to account for the unit tests and code exercising no longer needed to run that “path”. Do not count import lines for libraries, as importing a module has a negligible effect on code size and complexity.
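As a hypothetical illustration of the counting rule (the snippet and its tally are my own, not taken from any official scoring table), consider manually guarding a division against a zero denominator in Python:

```python
def ratio(num, den):
    # Manual safety check: prevents a ZeroDivisionError at runtime.
    # Counted characters, ignoring whitespace: "ifden==0:returnNone"
    # is 19 characters, so this check would add +19 to the score.
    if den == 0:
        return None
    return num / den
```

A language that enforced an equivalent check at compile time would instead receive the -30 credit, since no hand-written guard (or test exercising it) is needed.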

If a safety feature cannot be achieved programmatically at all, add +30 (by default) for an “every change, run and debug to fix” cost, such as Java having no way to prevent stack overflow exceptions caused by recursion.
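To make that “run and debug” cost concrete, here is a sketch in Python (chosen only for brevity; the same limitation the text ascribes to Java applies): no construct in the language can flag the unbounded recursion before the program runs, so the failure is only discoverable at runtime.

```python
import sys

# Keep the failure quick and predictable for demonstration purposes.
sys.setrecursionlimit(10_000)

def count_up(n=0):
    # No base case: this is guaranteed to exhaust the call stack,
    # and nothing in the language catches it before execution.
    return count_up(n + 1)

try:
    count_up()
except RecursionError:
    print("stack exhausted")  # only observable by actually running the code
```

Every change to such code carries the cost of re-running it to find out whether the stack still survives, which is what the flat +30 is meant to stand in for.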

A lower score is "safer", needing less (or no) code to achieve the same level of safety.

Rather than impose my own hard-coded weightings (or survey for them), all checks are weighted equally by default. Feel free to apply your own weightings to better match your or your team’s specific needs and preferences. The languages are masked by default to protect the innocent. You can unmask the names and see the code used below the table.