How Would the CLR Be Different?

Tuesday, January 13, 2009

UPDATE: Added a section on improved generics with higher-kinded polymorphism

There was a good discussion on Twitter a couple of nights ago about expressions that might return a value, or might not (void), and how you handle them. From those questions, Ted Neward posed an interesting question: “Knowing what we know now, how would you change the CLR?” Note that this isn’t necessarily a language discussion, but one about how the underlying framework actually works. It’s a good question that I’ll only lightly dive into, but what I really want to know is, where are the pain points?

If I Had Only Known…

A few things came to mind immediately when I thought about how to answer this. I’ve been bitten by a few items that I see as limitations imposed on me, and after my time in Haskell, F# and other languages I’ve thought about them enough to come up with a nice list. Some thoughts from Michael Feathers on his ideal language also solidified my thinking. Let’s go through just a few of them.

Void not treated as a generic argument type

Non-null references

Make immutability easier

Sheer complexity of Code Access Security

Pluggable JIT

Improved generics with higher-kinded polymorphism

What do I mean by each of these? First is the infamous System.Void, which is not treated properly as a type. I’ve covered this in the past in my functional C# posts here. As noted there, ECMA Standard 335, Partition II, Section 9.4 "Instantiating generic types" states:

The following kinds of type cannot be used as arguments in instantiations (of generic types or methods):

Byref types (e.g., System.Generic.Collection.List`1<string&> is invalid)

Value types that contain fields that can point into the CIL evaluation stack (e.g., List<System.RuntimeArgumentHandle>)

void (e.g., List<System.Void> is invalid)

This means that I cannot fully generalize functions, and instead have to differentiate between the Func<TResult> and Action delegates. F# gets around this issue by exposing another kind of void, the unit type, otherwise known as the empty tuple, so that you can handle both cases uniformly. Ultimately, it’s then up to the compiler to decide whether a given return gets compiled to void or to unit. I think the CLR should have allowed for this behavior, leaving it up to each language implementation to allow or disallow it.
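For comparison, here is a minimal Haskell sketch of what a real unit type buys. It is not CLR code. Because () is an ordinary type in Haskell, one helper serves both the value-returning case and the "void" case. The names counted, answer and greet are hypothetical and are used only for illustration. Load the file in GHCi to try it.

```haskell
import Data.IORef

-- Hypothetical helper: bump an invocation counter, then run the action.
-- The result type r is an ordinary type variable, so it can be Int,
-- String, or the unit type () alike; no separate "void" overload is needed.
counted :: IORef Int -> IO r -> IO r
counted counter action = do
  modifyIORef counter (+ 1)
  action

-- Func<int>-shaped use: the wrapped action produces a value.
answer :: IORef Int -> IO Int
answer counter = counted counter (return 42)

-- Action-shaped use: the wrapped action produces (), the empty tuple.
greet :: IORef Int -> IO ()
greet counter = counted counter (putStrLn "hello")
```

In C#, the equivalent helper would need two overloads, one taking a Func<TResult> and one taking an Action. That is exactly the duplication that the restriction on System.Void forces.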

The second item is non-null references. One QCon London 2009 presentation on this very topic caught my eye recently: Tony Hoare’s "Null References: The Billion Dollar Mistake". The session is described as follows:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be non-null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.

I think the abstract alone describes the problem quite well. Indeed, Spec# introduced features that allow for non-null references, and it is a great piece of technology. It even has a switch that makes non-null the default, with an opt-out for those variables that do allow null references. But there are some issues, of course. Let’s look at a quick example: an ArrayList constructor that takes an existing non-null ICollection.

public ArrayList (ICollection! c)
  modifies c.*;
  ensures _size /*Count*/ == c.Count;
{
  _items = new object[c.Count];
  base();
  InsertRangeWorker(0, c);
}

The bang (!) notation for specifying non-null behavior looks rather straightforward. Unfortunately, when compiled down to IL, it is handled in a rather ugly way through a modopt (an optional custom modifier), such as the following:

public ArrayList(ICollection modopt(NonNullType) c) { ...

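My CodeBetter colleague Greg Young has noted his objections to the modopt in the past, such as here. Because a modopt is, by definition, an optional modifier that other compilers are free to ignore, a caller written in another .NET language can simply disregard the annotation and pass null anyway. So there are issues in the CLR that prevent us from having this rich behavior at this time.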
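Haskell takes Hoare’s point to its conclusion: there is no null reference at all. Possible absence must be opted into explicitly with Maybe, much like a non-null default with an opt-out. Below is a minimal sketch; the Customer type and the findCustomer and greeting functions are hypothetical.

```haskell
import qualified Data.Map as Map

-- Hypothetical example type. A value of type Customer can never be null:
-- the language has no null reference to pass in its place.
data Customer = Customer { customerName :: String }
  deriving (Show, Eq)

-- Possible absence is opted into explicitly, in the type, via Maybe.
findCustomer :: Map.Map Int Customer -> Int -> Maybe Customer
findCustomer db cid = Map.lookup cid db

-- A Maybe Customer cannot be used where a Customer is expected (that is a
-- type error), so callers must unwrap it and decide what Nothing means.
greeting :: Map.Map Int Customer -> Int -> String
greeting db cid =
  case findCustomer db cid of
    Just c  -> "Hello, " ++ customerName c
    Nothing -> "No such customer"
```

For example, with db = Map.fromList [(1, Customer "Ada")], greeting db 1 gives "Hello, Ada" and greeting db 2 gives "No such customer".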

The third item is making immutability easier. We should be able to specify that certain classes, fields, parameters and so on cannot change once assigned, and the JIT could then use that metadata to optimize further. Some of the information is already there (a C# readonly field, for instance, is marked initonly in IL), but it isn’t used the way I think it should be.
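Haskell, by contrast, makes immutability the default rather than something you opt into. The minimal sketch below shows that "changing" a field means building a new value, while the original stays untouched. The Point type and moveRight are hypothetical names. A guarantee like this is the kind of thing the JIT could rely on if the CLR made immutability explicit.

```haskell
-- Hypothetical immutable record: once a Point is constructed, its fields
-- cannot be reassigned.
data Point = Point { px :: Double, py :: Double }
  deriving (Show, Eq)

-- Record update syntax builds a new Point with px changed and py copied;
-- the Point passed in is not modified.
moveRight :: Double -> Point -> Point
moveRight dx p = p { px = px p + dx }
```

Here moveRight 2.5 (Point 0 0) evaluates to a new Point with px = 2.5 and py = 0, and the original Point 0 0 is unchanged.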

The fourth item is the sheer complexity of Code Access Security (CAS). Does anyone really understand it, let alone use it? Anyone? * crickets * The ideas seem noble, but I cannot honestly say I’ve seen this used in practice.

The fifth item on the list is a more pluggable JIT, one that opens up a pipeline for us to do further refining. For example, on constrained systems, we may want to optimize the IL further.

Another item, which Lennart touched upon in his comments below and which I in turn touched upon in my last post on monadic substitution, is higher-kinded polymorphism in CLR generics. Type classes in Haskell, for example, don’t need to take a type variable of kind *; they can take one of any kind. An example is the Haskell Monad class, whose type variable m has kind * -> *, together with its Maybe instance:

class Monad m where
    (>>=)  :: m a -> (a -> m b) -> m b
    (>>)   :: m a -> m b -> m b
    return :: a -> m a
    fail   :: String -> m a

instance Monad Maybe where
    (Just x) >>= k = k x
    Nothing  >>= _ = Nothing

    (Just _) >> k  = k
    Nothing  >> _  = Nothing

    return = Just
    fail _ = Nothing
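To make the point concrete, here is a small function written once against any monad. It uses the standard Prelude Monad class rather than the redefinition above. The type variable m ranges over type constructors such as Maybe and lists, which is precisely the abstraction CLR generics cannot express. The name pairM is hypothetical.

```haskell
-- Hypothetical helper: sequence two computations and pair their results.
-- The constraint Monad m quantifies over m, a type constructor of kind
-- * -> * (Maybe, [], IO, ...), not over a complete type like Maybe Int.
pairM :: Monad m => m a -> m b -> m (a, b)
pairM ma mb = ma >>= \a -> mb >>= \b -> return (a, b)
```

In GHCi, pairM (Just 1) (Just 'x') gives Just (1,'x'), and pairM [1,2] "ab" gives [(1,'a'),(1,'b'),(2,'a'),(2,'b')]. Both results come from the same definition.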

In the previous post, I wanted to accomplish something like the following in F#. It would allow me to build a generic monad builder once and then plug the option type into it:

type MonadBuilder<'M> =
    abstract member Bind : 'M<'a> * ('a -> 'M<'b>) -> 'M<'b>
    abstract member Return : 'a -> 'M<'a>
    abstract member Delay : (unit -> 'a) -> 'a

let m =
    { new MonadBuilder<option> with
        member this.Bind (x : 'a option, k : 'a -> 'b option) : 'b option =
            match x, k with
            | Some x, k -> k x
            | None, _ -> None
        member this.Return (x) = Some x
        member this.Delay (f) = f ()
    }

let res = m { let! x = Some 42
              return x }

Unfortunately, something like this is impossible given the state of our generics implementation. A generic type parameter such as 'M can only stand for a complete type such as int option, not for a type constructor such as option itself. That’s not to say that we can’t do type classes: we can, in a very limited way, and I’ll cover that in another post on type classes for QuickCheck. Hopefully higher-kinded polymorphism is on the table for a future version of F#, but even if F# fixes this issue, it will still be impossible at the CLR level without some sort of hackery.
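For contrast, here is roughly what the desired MonadBuilder<'M> looks like in a type system that does support higher-kinded type parameters. This is a Haskell sketch using the RankNTypes extension. Delay is omitted because Haskell is lazy. The names MonadBuilder, bindB, returnB, maybeBind, maybeBuilder and addB are hypothetical, not from any library.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Hypothetical record-of-functions counterpart to the F# MonadBuilder<'M>.
-- The parameter m is a type constructor of kind * -> *, which is exactly
-- what a CLR generic type parameter cannot be.
data MonadBuilder m = MonadBuilder
  { bindB   :: forall a b. m a -> (a -> m b) -> m b
  , returnB :: forall a. a -> m a
  }

-- Bind for Maybe, mirroring the option match in the F# sketch.
maybeBind :: Maybe a -> (a -> Maybe b) -> Maybe b
maybeBind (Just x) k = k x
maybeBind Nothing  _ = Nothing

-- The option "instance" of the builder.
maybeBuilder :: MonadBuilder Maybe
maybeBuilder = MonadBuilder { bindB = maybeBind, returnB = Just }

-- Written once against any builder; works for every m that has one.
addB :: MonadBuilder m -> m Int -> m Int -> m Int
addB b mx my = bindB b mx (\x -> bindB b my (\y -> returnB b (x + y)))
```

Here addB maybeBuilder (Just 40) (Just 2) evaluates to Just 42, while passing Nothing for either argument yields Nothing.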

But Is That All?

There are other issues, such as generic constraints, but I haven’t fully thought out what they should be just yet. So I’ll open it up to you, keeping in mind that we’re talking about the CLR and not the BCL nor any language implementation. Knowing then what you know now, how would the CLR be different?