

"After memory safety, what do you think is the next big step for compiled languages to take?"



Broadly applicable problem areas

Modules

Errors

Coroutines, async/await, "user-visible" asynchronicity

Effect systems, more generally

Extended static checking (ESC), refinement types, general dependent-typed languages

Actual Formalization / Metatheory

Grab bag of potential extras for mainstream languages

Session types, behavioural types, choreographies

Parametric mutability, permission / state polymorphism, ???

Richer patterns

Unit-of-measure types

Cost-model type systems

Heterogeneous memory and parallelism

Open implementations

The rest...

Warning: this has turned out to be a .. long post.

Recently, on the twitters, Stephanie Hurlburt suggested that it'd be healthy for people who have been around the computering industry for a while (*cough cough*) to take some "audience questions" from strangers. I obliged, and someone asked me the interesting one quoted above.

Setting aside the fact that "compiled" languages have had various more-or-less credible forms of "memory safety" for quite a long time, I agree (obviously!) that cementing memory safety as table stakes in all niches of language design -- especially systems languages -- continues to be an important goal; but also that there's lots more to do! So I figured I'd take a moment to elaborate on some areas where we're still well short of ideal; maybe some future language engineers can find inspiration in some of these notes.

Before proceeding, I should emphasize: these are personal and subjective beliefs, about which I'm not especially interested in arguing (so I will not entertain debate in comments unless you have something actually-constructive to add); people on the internet are Very Passionate about these topics and I am frankly a bit tired of the level of Passion that often accompanies the matter. Furthermore these opinions do not in any way represent the opinions of my employer. This is a personal blog I write in my off-hours. Apple has a nice, solid language that I'm very happy to be working on, and this musing doesn't relate to that. I believe Swift represents significant progress in the mainstream state of the art, as I said back when it was released.

That all said, what might the future hold in other languages?

Broadly applicable problem areas

These are either ubiquitous abstractions or problem areas that still need work -- and that will need work in basically any mainstream language that comes next, as far as I can tell.

Modules

This might come as a surprise to hear, but most languages have module systems with serious weaknesses. And I agree with Bob Harper in his assessment that Modules Matter Most.
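To fix a baseline (and since later sketches here will also use Rust): a trait with an associated type can fake a small ML-style signature, and a generic function can fake a functor over it. This is a sketch of my own, not anyone's library; notice that the harder design constraints -- generativity, opacity, first-class modules -- have no direct expression in it.

```rust
// A "signature": one abstract type plus operations over it.
trait Queue {
    type Elem;
    fn empty() -> Self;
    fn push(self, e: Self::Elem) -> Self;
    fn pop(self) -> (Option<Self::Elem>, Self);
}

// One "structure" implementing the signature.
struct VecQueue(Vec<i32>);

impl Queue for VecQueue {
    type Elem = i32;
    fn empty() -> Self { VecQueue(Vec::new()) }
    fn push(mut self, e: i32) -> Self { self.0.push(e); self }
    fn pop(mut self) -> (Option<i32>, Self) {
        let e = if self.0.is_empty() { None } else { Some(self.0.remove(0)) };
        (e, self)
    }
}

// A poor man's "functor": code parameterized over any implementation.
fn drain<Q: Queue>(mut q: Q) -> Vec<Q::Elem> {
    let mut out = Vec::new();
    loop {
        let (e, rest) = q.pop();
        match e { Some(v) => out.push(v), None => return out }
        q = rest;
    }
}

fn main() {
    let q = VecQueue::empty().push(1).push(2).push(3);
    assert_eq!(drain(q), vec![1, 2, 3]);
}
```

This much is table stakes; a real module system also has to answer questions this encoding can't even ask, like whether two applications of a functor yield the same abstract type.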
Many languages have no module system at all, and many that have one have it only as a way of (say) managing namespaces or compilation order. More-powerful module systems exist -- you'll have run into some of the components if you've worked with dependent types, type classes, traits, signatures, functors -- but there's a bewildering array of design constraints to navigate (generativity, opacity, stratification, coherence, subtyping, higher-order-ness, first-class-ness, separate compilation, extensibility, recursion) when arriving at a practical, usable module system. Few if any languages have "done this right" in a convincing enough way that I'd say the problem is solved. The leading research in the field, at the moment, is probably Andreas Rossberg's work on 1ML. But there are decades of ground-work that you should really, really read basically all of if you're going to explore this space.

(Writing this makes me think it deserves a footnote / warning: if, while reading these remarks, you feel that modules -- or anything else I'm going to mention here -- are a "simple thing" that's easy to get right, with obvious right answers, I'm going to suggest you're likely suffering some mixture of Stockholm syndrome induced by your current favourite language, Engineer syndrome, and/or the Dunning–Kruger effect. Literally thousands of extremely skilled people have spent their lives banging their heads against these problems, and every shipping system has Serious Issues it simply doesn't deal with right.)

Errors

This too might feel like a surprise, but I'm not convinced that we've "solved" error management, in general, in any language.
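For a concrete taste of why: here's the result-sum-type regime in Rust, idiomatic and shippable, yet the interface cost between two error "vocabularies" already shows up as hand-written glue. A sketch of my own; the config-store names are hypothetical.

```rust
use std::num::ParseIntError;

// Two "libraries", two error vocabularies; composing them takes glue.
#[derive(Debug)]
enum ConfigError {
    Parse(ParseIntError),
    Missing(&'static str),
}

// Hand-written lifting between error regimes -- the interface cost
// that recurs at every module boundary under this scheme.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::Parse(e)
    }
}

// A toy config store, purely for illustration.
fn lookup(key: &str) -> Result<&'static str, ConfigError> {
    match key {
        "port" => Ok("8080"),
        _ => Err(ConfigError::Missing("no such key")),
    }
}

fn port() -> Result<u16, ConfigError> {
    let raw = lookup("port")?; // same-type error: propagates directly
    let n: u16 = raw.parse()?; // foreign error: `?` leans on the From impl
    Ok(n)
}

fn main() {
    assert_eq!(port().unwrap(), 8080);
    assert!(lookup("host").is_err());
}
```

Every new error type joining the program multiplies these conversions; that's the "modularity and compositionality" leak in miniature.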
We have a few credible schools of design philosophy that mostly hold together well enough to ship a language: algebraic effects and handlers, checked and unchecked exceptions, crash-failure with partitions and supervision trees, monads, result sum-types, condition / restart systems, transactions with rollbacks; but none of them covers the design space well enough that the problem feels "done". Even committing to one or another such regime -- and it really does involve not just a few mechanisms, but a whole set of interlocking protocols that support error management -- there are serious abstraction leakages and design tradeoffs in nearly every known approach (modularity, compositionality, locality, synchronicity, soundness, cognitive load, implementation costs, interface costs with different regimes). Errors are absurdly hard to get right.

Daira Hopwood has some notes and design material in hir work on the Noether language, and there's (again) a lot to read before deciding how to proceed. Doing it well takes you on a long journey through a lot of fussy technical material, and through a lot of highly subjective, downright philosophical topics too, like "how to defend yourself against your own mistakes" and "what does it mean for a value to be right or wrong".

Coroutines, async/await, "user-visible" asynchronicity

It's in vogue at the moment for new languages to have something like async/await. This does not mean it's a done deal: lots has been done, but lots is still messy. The boundary between the synchronous world and the asynchronous world -- in terms of types, control flow, correctness, errors, modularity, composition -- is still very awkward. Whether and how to mediate between different synchronicity regimes, especially across FFIs or differing runtimes, is hard. Integration with effects is hard. Integration with parallelism is hard. Deciding which parts need to be supported by the language and which parts surface to the user is hard.
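To make that boundary concrete in Rust terms: an `async fn` can't simply be called from synchronous code, so every crossing needs an executor. The `block_on` below is a deliberately minimal busy-polling sketch of my own (real code reaches for a runtime like tokio), just to show how much machinery one crossing demands.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The "colored functions" problem in miniature: this async fn cannot
// be called directly from synchronous code...
async fn fetch_len(s: &str) -> usize {
    s.len()
}

// ...so every sync->async crossing needs an executor. This is about
// the smallest possible one: a busy-polling loop with a no-op waker.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    fn noop(_: *const ()) {}
    fn clone(p: *const ()) -> RawWaker {
        RawWaker::new(p, &VTABLE)
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    // Safety: the future is never moved again after this pin.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    // The crossing is loud: you can't just write fetch_len("hello").
    let n = block_on(fetch_len("hello"));
    assert_eq!(n, 5);
}
```

Pinning, wakers, polling, a second "color" of function type: all of that is the visible residue of the sync/async boundary the paragraph above complains about.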
Cognitive load is still very high.

Effect systems, more generally

There's still not a strong consensus on where and how to integrate effects into mainstream compiled languages. The type-systems research world seems to have blown past this point, and now speaks breezily of "type and effect systems" as though they're a solved problem; but most working programmers in most languages have no access to any meaningful effect system, much less a state-of-the-art, extensible, inference-heavy one. Languages like Eff or Koka are leading in promising directions, but the area is still far from mainstream or solved: modularity, polymorphism and encoding issues abound, as does the general cognitive load of the feature.

Extended static checking (ESC), refinement types, general dependent-typed languages

This has been revisited over the decades of language design more often than Godzilla movies have been remade, because it's a fundamentally good idea; it's just very hard, in terms of the design space.

The idea is to embed a "logic" in your type system (the boundary is formally not really there, but notationally and cognitively it sure is!) such that users regularly, freely mix their use of types that assert the in-memory shapes of data and functions with other "logical types" that assert some classes of more-general, (semi-)computable predicates about those data and functions. In other words: let the user write "general" correctness conditions about their programs in a full-blown (but, say, primitive-recursive) expression language, and have the "type system" statically check (prove) that those conditions always hold (or emit an error message showing a counterexample, just like a type error).

These systems are usually a few steps back from "full blown" higher-order dependent type theory proof assistants, a la Isabelle, Coq, Lean or such; though in some cases the type systems inch into the same territory.
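For a taste of the goal: where a refinement type like `{ n: i32 | n > 0 }` would prove the predicate statically, the nearest mainstream idiom is a smart-constructor newtype that merely checks it at runtime. A sketch of my own invention:

```rust
// A runtime stand-in for the refinement type { n: i32 | n > 0 }.
// A real refinement system would discharge the check at compile time
// and reject Positive::new(-1) as a type error, with a counterexample.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Positive(i32);

impl Positive {
    fn new(n: i32) -> Option<Positive> {
        if n > 0 { Some(Positive(n)) } else { None }
    }
    fn get(self) -> i32 {
        self.0
    }
}

// Downstream code gets to *assume* the predicate...
fn harmonic(p: Positive) -> f64 {
    1.0 / p.get() as f64 // no division-by-zero case to handle
}

fn main() {
    let p = Positive::new(4).expect("positive");
    assert_eq!(harmonic(p), 0.25);
    // ...but violations surface only at runtime, not at the call site:
    assert!(Positive::new(0).is_none());
}
```

The whole ESC / refinement-types project is, roughly, moving that `if n > 0` check from the constructor into the typechecker, for arbitrary user-written predicates.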
The design problem hinges on the annotation burden being low enough that the user doesn't give up in frustration: like most type systems, the user always has to provide some information for the system to infer the rest, but the balance can tilt pretty dramatically towards "way too much annotation" when more-expressive logics enter the game. In many cases, the scale of the annotations overwhelms the program being annotated, and the maintenance burden of those annotations (as the user edits the program) is greater than the coding burden. Which, as a language designer, is Really Not A Good User Experience.

So far, most exercises in this space have ended in frustration, or have had limited application, in areas with much higher costs for failure (safety-critical embedded systems, etc.). Most mainstream languages have decidedly more .. uh, decidable type systems than ESC, refinement typing or dependent-typed language projects have proposed. But this does not mean it's a dead end. It means that (if you're among the faithful, which I am) it's a design space that's not yet had its breakthrough product! Cell phones weren't a thing either, until they were. Maybe in another decade or two, we'll be looking back on these as the dark ages of boring types.

Projects in this space run a bit of a spectrum of levels of expressivity in their type systems (and degree to which they unify "logic" and "types"), as well as the spectrum from research testbed to attempt at mainstream viability; I don't feel qualified (and this isn't the place) to do a detailed compare-and-contrast job, so I'll just dump some links. If you want to play in this space, you ought to study at least Sage, Dependent Haskell, and Liquid Haskell.

Be prepared if you venture into this area: the complexity wall here can make other areas of computering look .. a bit like child's play. It gets very dense, very fast. The comic about how computer people would write math books?
This stuff all reads like that.

Actual Formalization / Metatheory

Speaking of formal logic: an unfortunate fact about most languages -- even simple ones with very pedestrian type systems -- is that they usually ship with next-to-no formalization of their semantics, nor proofs that any such formalizations have any interesting metatheoretic properties (eg. soundness). We tend to just make do with testing and informal, whiteboard-and-email level reasoning about the systems, hoping that'll get us close enough to correct that it'll be ok.

This approach is, to say the least, wearing thin. Compilers still have serious bugs decades after they ship, failing to implement the languages they're supposed to. Worse, languages in the field still have serious design flaws decades after they ship, failing to uphold safety properties when subject to formal analysis. Long term, we have to get to the point where we ship languages -- and implementations -- with strong, proven foundations. There are promising moves in this direction, both in designed-for-language-designer tools like the K framework or Redex, and in the general set of libraries and projects being undertaken in general proof assistants like Isabelle and Coq.

Grab bag of potential extras for mainstream languages

These are issues with (IMO) potential applicability to mainstream languages, but with a little less clear of a foundational role in structuring the language; a little more like "features it'd be nice to include in the design".
Some have come and gone before, others are still in research form.

Session types, behavioural types, choreographies

There's a family of work called Behavioural Types, Session Types or Choreographies, that involves describing the possible interaction traces of both sides, or all sides, of a multi-party interaction, in a single "global type" -- the type of the interaction itself -- and then systematically refining / decomposing that type into a set of separate but dual endpoint-types, each of which enforces only-legal-transitions (in sending, receiving and state-changing senses) on its participant.

There are various ways to encode this idea in a language, and lots of components; people have done encodings and experiments, even some prototype languages (eg. Links and Scribble), but nothing that's broken through to the mainstream yet. But I think it's a very promising area that's not too hard to understand. Lots of minable papers, and good potential return on investment for taking the research mainstream.

Parametric mutability, permission / state polymorphism, ???

Here's a small but tasty question that's unfortunately awkward to answer in most languages with mutability control: how often do you find yourself having to write two copies of a function, differing only in the mutability qualifiers associated with its parameters or return values? Or, further: does the system of function types and effects scale to expressing polymorphism over the mutable-ness of a type, such that you can fold (say) a higher-order mutating function over a mutable value with the same higher-order function that you'd use to fold a non-mutating function over an immutable value?

(Or, I guess, if you're a functional nerd: how often do you find yourself writing lifts between monads?)

Newer languages try to enforce fancier reader/writer regimes (fractional permissions, borrow checkers, etc.) but often at some cost to parametricity / polymorphism. And this is not new!
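In Rust terms the question looks like this: the two functions below differ only in mut-ness, and there's no way to be generic over that difference, so libraries ship both (cf. the `get` / `get_mut` and `iter` / `iter_mut` pairs throughout the standard library). A small sketch:

```rust
// Two copies of the same function, differing only in mutability.
// Nothing in the generics system lets you write this once.
fn largest(v: &[i32]) -> &i32 {
    v.iter().max().expect("non-empty")
}

fn largest_mut(v: &mut [i32]) -> &mut i32 {
    v.iter_mut().max().expect("non-empty")
}

fn main() {
    let mut v = vec![3, 9, 4];
    assert_eq!(*largest(&v), 9);
    *largest_mut(&mut v) += 1; // bump the max in place
    assert_eq!(v, vec![3, 10, 4]);
}
```

"Parametric mutability" would let `largest` be written once, polymorphic over the reference's permission.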
Open your average C++ file and look at the number of different ways you have to const-qualify a library's methods to make it compose correctly with all of its users. There's a form of abstraction and polymorphism seemingly missing here.

Richer patterns

Another small, tasty question: what's the most complicated pattern you can write in a match expression in your language? We have plenty of automata theory for compiling down all sorts of complicated patterns to efficient matchers, and we have tons of evidence that users enjoy writing pattern-based programs, that users write shorter, simpler and less failure-prone programs when they can express a task in pattern/action form, and that pattern-sets are amenable to extremely helpful static analysis / diagnosis along the lines of exhaustiveness checking, equivalence checking, emptiness checking and so forth. So why not push the patterns in your language to the limit?

I think this is a very fruitful area to explore, especially building out from tree-patterns and visibly-pushdown patterns. Some languages make the pattern system extensible (eg. F# active patterns), at the cost of formal reasoning about it at compile time; others push a specific extended pattern-automaton theory into the type system. Many languages in this space, eg. CDuce or FAST, are (unfortunately) "XML oriented", which turns people off the technology lately, since XML is out of fashion, but they're worth looking into!

Unit-of-measure types

Physical "units" attached to scalars, with arithmetic operators distributing through to the units: a little language support supposedly goes a long way. Frink pioneered some techniques here, F# picked it up (as with many language experiments!), and I do not know if it's "done" yet, or even if users consider it a universally welcome idea, but it seems to me that it has potential.
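A minimal sketch of the idea using phantom types -- roughly the encoding that F#'s units-of-measure feature automates. Note that here the quotient unit for division has to be declared by hand, one impl per rule, which is exactly the drudgery real language support removes. All names are mine.

```rust
use std::marker::PhantomData;
use std::ops::{Add, Div};

// Unit tags carried at the type level; erased at runtime.
struct Metres;
struct Seconds;
struct MetresPerSecond;

struct Qty<U>(f64, PhantomData<U>);

fn qty<U>(x: f64) -> Qty<U> {
    Qty(x, PhantomData)
}

// Addition only within one unit: metres + seconds won't compile.
impl<U> Add for Qty<U> {
    type Output = Qty<U>;
    fn add(self, rhs: Self) -> Self {
        Qty(self.0 + rhs.0, PhantomData)
    }
}

// One hand-written rule: m / s = m/s. F# derives such rules for free.
impl Div<Qty<Seconds>> for Qty<Metres> {
    type Output = Qty<MetresPerSecond>;
    fn div(self, rhs: Qty<Seconds>) -> Qty<MetresPerSecond> {
        qty(self.0 / rhs.0)
    }
}

fn main() {
    let d: Qty<Metres> = qty(100.0) + qty(20.0);
    let t: Qty<Seconds> = qty(8.0);
    let v: Qty<MetresPerSecond> = d / t;
    assert_eq!(v.0, 15.0);
    // qty::<Metres>(1.0) + qty::<Seconds>(1.0)  // does not compile: unit mismatch
}
```

The arithmetic is ordinary `f64` at runtime; only the typechecker sees the units, which is the whole point.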
It's "small" but rather important not to mix units!

Cost-model type systems

This is a family of features for a type system wherein a complexity bound (a la Big-O notation) is calculated on the resources consumed by performing a computation, letting you "typecheck" the cost-model of a program. The language RAML is one of the more promising general-purpose works in this space, though there's also a long line of work in WCET analysis of embedded systems, timed automata, and so forth.

Heterogeneous memory and parallelism

These are languages that try to provide abstract "levels" of control flow and data batching/locality, into which a program can cast itself, to permit exploitation of heterogeneous computers (systems with multiple CPUs, or mixed CPU/GPUs, or coprocessors, clusters, etc.)

Languages in this space -- Chapel, Legion -- haven't caught on much yet, and seem to be largely overshadowed by manual, not-as-abstract or not-as-language-integrated systems: either cluster-specific tech (like MPI) or GPU-specific tech like OpenCL/CUDA. But these still feel clunky, and I think there's a potential for the language-supported approaches to come out ahead in the long run.

Open implementations

Many older and more "dynamic" high-level languages (Lisps, Smalltalks, Forths) were designed around a kind of uniform programs-as-data model, and the presence / presupposition that the compiler would always be in-process with your program: programs thus more commonly invoked (and extended) the implementation significantly at runtime, did a lot of dynamic metaprogramming, reflection, and so forth.
This was maybe a kinder, gentler time, when "arbitrary code execution" wasn't quite so synonymous with "security nightmare"; but it also had a sort of internal logic, and represents a design aesthetic that puts pressure on the language machinery itself to be programmable: pressure to keep the language, its syntax, its type system, its compilation model and so forth all simple, uniform, programmable.

We've since been through a few different eras of language sensibilities around this sort of thing, including some imaginary mobile-code stories like Telescript and Obliq, and eventually the JVM/CLR. These latter were weird since they tried to be mobile (which rarely worked), and tried to have semi-open implementations (at least compilers-as-libraries and some access to the bytecode loaders) but didn't quite make it to the point where it was easy or obvious to do source-level metaprogramming (with the notable exceptions of F# quotations and F# type providers).

But through all this, in the background there's been this somewhat unfortunate, competing "grown-up" model that tends to dominate mainstream languages (everything from FORTRAN to C++): a pretty complex grammar and AST, a pretty hairy compilation model, a very heavy batch-compiler that's definitely not part of the normal process runtime, and programs that seldom do any metaprogramming, even in cases where it'd be appropriate. Recent "compiled" languages have adopted this style, I suspect in part because LLVM is simply shaped that way, and I suspect also in part as a response to negative experiences with both JVM/CLR environments and overzealous use of metaprogramming in scripting languages.

I don't think, however, that the baby ought to be thrown out with the bathwater. I don't think a few bad open implementations invalidate the idea, any more than a few bad static type systems invalidate that idea. They can be done well.
Julia, for example, has quite a nice static type system and compiler, but also a uniform syntax that's friendly to dynamic metaprogramming and JIT'ing. There are also several static metaprogramming and staged-programming systems: MetaOCaml, ScalaMeta and so forth. So .. there's a spectrum, a design space.

I'm not sure exactly where to go with this topic, except to say I'm a bit dissatisfied with how hard it is to do tooling for current languages, how long the feedback cycle is between a language and its own (meta)programming tools, how distant the tools are from the users, and perhaps to point out that dynamic compilation is not entirely dead: we appear to be entering an era with a new high-integrity universal bytecode sandbox, designed for mobile code and dynamic JIT'ing, and with a lot of industrial support. It might be an interesting time to consider projects (even "static" ones) that take a slightly more nuanced view of the code/data relationship and the program/metaprogram/compiler relationship, and make the whole compilation model a little more .. pliant (yes, that was a Pliant reference, and if you remember what that was, congratulations, you've been on the internet too long; here's your TUNES badge of merit).

The rest...

I had some extended notes here about "less-mainstream paradigms" and/or "things I wouldn't even recommend pursuing", but on reflection, I think it's kind of a bummer to draw too much attention to them. So I'll just leave it at a short list: actors, software transactional memory, lazy evaluation, backtracking, memoizing, "graphical" and/or two-dimensional languages, and user-extensible syntax. If someone's considering basing a language on those, I'd .. somewhat warn against it. Not because I don't want them to work -- heck, I've tried quite hard to make a few of them work! -- but in practice, the cost:benefit ratio doesn't seem to turn out really well. Or hasn't when I've tried, or in (most) languages I've seen. But who knows?
There are always exceptions, and I'm wrong far more often than I'm right, so please don't let me stop you if your heart is set on it!