Would You Bet $100,000,000 on Your Pet Programming Language?

Here's my proposition: I need an application developed and if you can deliver it on time I'll pay you $100,000,000 (USD). It doesn't involve solving impossible problems, but difficult and messy problems: yes.

What language can you use to write it? Doesn't matter to me. It's perfectly fine to use multiple languages; I've got no hangups about that. All that matters is that it gets done and works.

As with any big project, the specs will undoubtedly change along the way. I promise not to completely confuse the direction of things with random requests. Could you add an image editor with all the features of Photoshop, plus a couple of enhancements? What about automatic translation between Korean and Polish? 3D fuzzy llamas you can ride around if network transfers take a long while? None of that. But there are some more realistic things I could see happening:

You need to handle data sets five times larger than anticipated.

I also want this to run on some custom ARM-based hardware, so be sure you can port to it.

Intel announced a 20 core chip, so the code needs to scale up to that level of processing power.

And also...hang on, phone call.

Sadly, I just found out that Google is no longer interested in buying my weblog, so I'm rescinding my $100,000,000 offer. Sigh.

But imagine if the offer were true? Would you bet a hundred million dollars on your pet language being up to the task? And how would it change your criteria for judging programming languages? Here's my view:

Libraries are much more important than core language features. Cayenne may have dependent types (cool!), but are there bindings for Flash file creation and a native-look GUI? Is there a Rich Text Format parsing library for D? What about fetching files via ftp from Mercury? Do you really want to write an SVG decoder for Clean?

Reliability and proven tools are even more important than libraries. Has anyone ever attempted a similar problem in Dolphin Smalltalk or Chicken Scheme or Wallaby Haskell...er, I mean Haskell? Has anyone ever attempted a problem of this scope at all in that language? Do you know that the compiler won't get exponentially slower when fed large programs? Can the profiler handle such large programs? Do you know how to track down why small variations in how a function is written result in bizarre spikes in memory usage? Have some of the useful but still experimental features been banged on by people working in a production environment? Are the Windows versions of the tools actually used by some of the core developers or is it viewed as a second rate platform? Will native compilation of a big project result in so much code that there's a global slowdown (something actually true of mid-1990s Erlang to C translators)?

You're more dependent on the decisions made by the language implementers than you think. Sure, toy textbook problems and tutorial examples always seem to work out beautifully. But at some point you'll find yourself dependent on some of the obscure corners of the compiler or run-time system, some odd case that didn't matter at all for the problem domain the language was created for, but has a very large impact on what you're trying to do.

Say you've got a program that operates on a large set of floating point values. Hundreds of megabytes of floating point values. And then one day, your Objective Caml program runs out of memory and dies. You were smart of course, and knew that floating point numbers are boxed most of the time in OCaml, causing them to be larger than necessary. But arrays of floats are always unboxed, so that's what you used for the big data structures. And you're still out of memory. The problem is that "float" in OCaml means "double." In C it would be a snap to switch from the 64-bit double type to single precision 32-bit floats, instantly saving hundreds of megabytes. Unfortunately, this is something that was never considered important by the OCaml implementers, so you've got to go in and mess with the compiler to change it. I'm not picking on OCaml here; the same issue applies to many languages with floating point types.

A similar, but harder to fix, example is if you discover that at a certain data set size, garbage collection crosses the line from "only noticeable if you're paying attention" to "bug report of the program going catatonic for a several seconds." The garbage collector has already been carefully optimized, and it uses multiple generations, but there's always that point when the oldest generation needs to be scavenged and you sit helplessly while half a gigabyte of complex structures are traversed and copied. Can you fix this? That a theoretically better garbage collection methodology exists on paper somewhere isn't going to make this problem vanish.

By now fans of all sorts of underdog programming languages are lighting torches and collecting rotten fruit. And, really, I'm not trying to put down specific languages. When I'm on trial I can easily be accused of showing some favor to Forth and having spent time tinkering with J (which honestly does look like line noise in a way that would blow the minds of critics who level that charge against Perl). Yes, I'm a recovering language dilettante.

Real projects with tangible rewards do change my perceptions, however. With a $100,000,000 carrot hanging in front of me, I'd be looking solely at the real issues involved with the problem. Purely academic research projects immediately look ridiculous and scary. I'd become very open to writing key parts of an application in C, because that puts the final say on overall data sizes back in my control, instead finding out much later that the language system designer made choices about tagging and alignment and garbage collection that are at odds with my end goals. Python and Erlang get immediate boosts for having been used in large commercial projects, though each clearly has different strengths and weaknesses; I'd be worried about both of them if I needed to support some odd, non-UNIXy embedded hardware.

What would you do? And if a hundred million dollars changes your approach to getting things done in a quick and reliable fashion, then why isn't it your standard approach?

permalink December 23, 2007

previously