



Batch processing

Since the very beginning of computing, the amount of precious hardware resources spent on adapting to the seeming frivolity of human-operator interests has been a point of contention. Computers are often bought to do a particular job, and often that job is replacing humans. In which case: who cares if they're easy to "program"? In many cases the point is to program them once (or "rarely") and then run them non-stop with humans out of the loop. This attitude is not rare. Well into the mid-20th century IBM was still selling unit record machines with fixed functions for bulk data processing: machines that could be altered at best via swapping plugboards. Despite knowing perfectly well how to alter machine function by card-coded instructions since early 19th century loom-control systems, it just wasn't important to change programs often. For more sophisticated calculations, analog computers held out for quite a long time also. The devices we now think of as "computers" are the offspring only of post-war research devices called "stored-program digital computers"; by the time they came into general use there had been nearly 60 years of industrial, government and military use of various electromechanical computing machines. The stored-program digital computer just got cheaper faster than its competitors.





This "expensive hardware, rarely reprogrammed" culture of computing persisted in many environments. It is typically called "batch processing", and it's still alive and well today. To some extent the lower (stable) levels of all computing systems are like this. Early use of the term refers to mainframe computing, running assembly code transcribed into punched cards by keypunch operators; but "batch processing" can just as easily be used to describe the lower levels of all our operating systems, or the fleets of unattended servers humming away in AWS here in 2014. In this culture of computing, machine costs are measured and paying for abstractions requires justification, because programs run for a long time and attempt to use the machine's resources to the greatest extent possible.

Minor inefficiencies can be a big deal: consider the difference between a program that runs for a month rather than a week, or one that drains your battery in one hour rather than six. So early high level languages -- so-called autocodes -- were viewed with skepticism for a long time; they introduce abstraction penalties, so they run slower than assembly. This skepticism persists in the culture of "systems languages" (the space, incidentally, that Rust targets), and what it really comes down to is "take computing hardware seriously". Count bytes. Count cycles. Count cache misses. Count context switches. Count I/O bandwidth. And so forth.

It's important to realize that this culture is not wrong. The existence and ubiquity of computers is, at some level, all about costs: if it were cheap enough to do all these calculations ourselves, we'd use humans. We use machines because they're faster, cheaper. Accounting for costs matters.





Interactive computing

At the opposite end of the cultural spectrum we have "interactive computing", which took shape out of research scientists' experiences "figuring" on computers, interactively, and to some extent out of the imagination of people directing research programs such as Bush, Licklider and Engelbart. This culture of computing values human time over computer time (the latter gets cheaper much faster than the former anyways), and focuses its attention on those computing tasks that intrinsically have a human in the loop, because the human is doing exploratory programming: using a computer as a thinking-tool, like a notebook or a whiteboard, to figure out what they want to calculate in the first place. Insofar as interactive computing involves writing linear programs, they are often written in very "human-friendly" languages, with latent types and reflective, dynamic, interpreted semantics. These languages often run quite slowly compared to batch processing languages. Nonetheless, human productivity can be quite high in them.





The form "interaction" takes in this culture has varied somewhat over time; it was a novel and in many cases unaffordable luxury to leave a computer waiting on a human's input, at least until viable timesharing systems were invented. The most long-lived, venerated "interactive computing" mechanisms take the form of command-line interfaces (CLIs) or read-eval-print loops (REPLs), operated via teletype (TTY); lots of these still exist, and much of this post is about REPLs. Later on, "interactive computing" culture gave rise to interactive video displays, hypertext systems, WIMP GUIs, jazz hands multitouch and all the modern delights we now take for granted.

This culture throws around science-fiction terms like "human augmentation" or "intelligence amplification", but what it comes down to is "take human cognitive needs seriously". If humans forget important details you can infer, fill them in. If humans need diagrams to understand things, draw diagrams. If humans are bad at keeping facts consistent, correct them. If humans draw a blank and need prompting, prompt them with lists of choices. If humans want to not pay attention to some detail, automate it. And so forth.

It's important to realize that this culture is also not wrong. Human time is valuable. Idle computers are pointless, as are computers busy doing the wrong thing. Keeping humans in (some) processing loops can be very important.
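Since REPLs come up so often in what follows, here is roughly what the read-eval-print loop amounts to, as a minimal sketch in Python. Everything in it is made up for illustration: the names evaluate and repl are hypothetical, and the "language" is just four-function arithmetic so the evaluator stays tiny. Real REPLs add history, completion, documentation lookup and so on, but the skeleton is the same. Note the bit that forgives a typo and keeps going, which is the "take human cognitive needs seriously" part.

```python
import ast
import operator

# The only operators this toy evaluator understands (an assumption made
# to keep the sketch small; a real REPL hands off to a full language).
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(source):
    """Evaluate a small arithmetic expression such as '2 * (3 + 4)'."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(source, mode="eval"))


def repl(lines, write=print):
    """Read each line, evaluate it, print the result, and loop."""
    for line in lines:                      # read
        line = line.strip()
        if not line:
            continue
        try:
            result = evaluate(line)         # eval
        except (ValueError, SyntaxError, ZeroDivisionError) as exc:
            write("error: %s" % exc)        # report the mistake and carry on
        else:
            write(repr(result))             # print
```

Calling repl(sys.stdin) would hook it up to a terminal; passing a list of strings makes it easy to poke at without one.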





Hybridity: Ousterhout's Dichotomy

The cultures of batch and interactive computing are the ends of a spectrum, and most real computing systems embody a mixture of the two. Most batch systems require some degree of interactive reconfiguration; and most interactive systems have some very stable parts of their logic that benefit from going as fast as possible.

The mixing of such concerns is often accomplished by a particular non-linguistic or extra-linguistic pattern which, for the sake of this writing, I'll call Ousterhout's Dichotomy: systems composed of exactly two languages, one "systems" language and one "script" language, with the former coding the inner loops and the latter coding the outer loops, and hooked up to the REPL to talk to the human operator.

That is, the tension between human- and machine-oriented computing is often not mediated by a single language, but two specialized languages with some (kinda gruesome) bridging device between the two that nobody much wants to think about, and everyone wishes would go away.

Thus we have JCL wrapped around assembler or COBOL, Bourne shell wrapped around C, Matlab wrapped around Fortran, or (as you read this) JavaScript wrapped around C++. The outer language is slow, forgiving, easy to change, human-interaction friendly (often ships with a REPL) and often a bit under-powered; the inner language is powerful, fast, fussy, precise, machine-oriented and a pain to make changes to.





This pattern emerges over and over and over again in computing. It's ubiquitous. Sometimes it's relatively harmless, a reflection of a language designer's area of expertise or interest. Many well-engineered systems and script languages get designed in conscious awareness and anticipation of Ousterhout's Dichotomy: they're designed to be attached to a "twin", on the other side of a glue layer.

Other times, sadly, Ousterhout's Dichotomy is a reflection of the absence of language design from a system; and in its place a sort of accretion of awkward hacks and idiosyncratic language behaviour by developers who simply don't know or care about language engineering as such, who have more important things to do. This is quite understandable: after all, most people buy computers to do non-language-engineering things, so if they happen to put together a low-quality script language to automate whatever task they had in mind -- science experiments, business accounting, teaching, engineering, communication, etc. -- one can hardly blame them.

It's just a bit sad for language nerds, when this happens, because such systems would often be better off just reusing someone else's better-engineered language. There are lots.





Scientific computing

For the remainder of this post I'm going to leave all those other things people might want to do with computers behind, and just zoom in on scientific computing.

Scientific computing, like any part of computing, contains many instances of Ousterhout's Dichotomy. Fire up any scientific package from R to IDL and you'll find yourself writing script code at a REPL, with certain amounts of tab completion, interactive documentation, perhaps a GUI for data entry and another for graphing.

The "interactive" language you're typing in will probably be a bit underpowered, possibly a little "unique", but it will have certain "curiously fast" core functions implemented by calling into code written in a corresponding systems or numerical language: usually C, C++, Fortran or assembler. Mastery of the system involves, at least partly, learning to arrange your code to delegate heavy lifting to the underlying systems code.
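To make "delegate heavy lifting" a bit more concrete, here is a small sketch in Python using nothing but the standard library. The function names are hypothetical, and the claim that sum() and map() run their loops in compiled code is an assumption about CPython, the usual implementation. Both functions compute the same dot product; the only difference is who runs the inner loop.

```python
import operator


def dot_loop(xs, ys):
    """Dot product with the inner loop spelled out in interpreted Python."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_delegated(xs, ys):
    """Same result, with the looping handed to the built-ins sum() and map().

    Assumption: on CPython these built-ins iterate in C, so the interpreter
    does much less per-element work than in dot_loop.
    """
    return sum(map(operator.mul, xs, ys))
```

If you want to see whether the difference matters on your machine, the standard timeit module will time each version on the same inputs. Scientific packages apply the same idea on a larger scale, handing whole array operations to C or Fortran.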





There is a further split in scientific computing worth noting, though I won't delve too deep into it here; I'll return to it in the second post on Julia. There is a division between "numerical" and "symbolic" scientific systems. The difference has to do with whether the tool is specialized to working with definite (numerical) data, or indefinite (symbolic) expressions, and it turns out that this split has given rise to quite radically different programming languages at the interaction layer of such tools, over the course of computing history. The symbolic systems typically (though not always) have much better-engineered languages, for reasons we'll get to in the next post.

For now it suffices to mention the split in passing, and note that if some scientific tool calls itself a "computer algebra system" (CAS) it is at least partly symbolic; whereas if a tool calls itself a "numerical analysis" system it is probably mostly non-symbolic, dealing with large volumes of concrete values. Mathematica for example is a CAS (with numerical capabilities), as are Maple, Axiom, and Maxima. R, Matlab, Octave, IDL and such are mostly numerical.
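A toy illustration of the numerical/symbolic split, again in plain Python and with hypothetical names throughout. Polynomials here are just coefficient lists, which is a drastic simplification of what a real CAS does with expressions, but it shows the difference in kind: the symbolic routine takes an expression and returns another expression, while the numerical routine takes a function and a definite point and returns an approximate number.

```python
def derivative_symbolic(coeffs):
    """Differentiate the polynomial c0 + c1*x + c2*x**2 + ...

    coeffs is the list [c0, c1, c2, ...]. The result is another
    coefficient list: an expression, not a number.
    """
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate_poly(coeffs, x):
    """Evaluate the polynomial at a definite value of x (Horner's rule)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def derivative_numerical(f, x, h=1e-6):
    """Approximate f'(x) at one definite point with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)
```

For 1 + 3x^2, stored as [1, 0, 3], the symbolic routine gives [0, 6], that is 6x, which can then be evaluated exactly at any point. The numerical routine only ever answers for the one x you give it, and only up to floating-point error.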





Scientific computing also has somewhat unique interaction needs, partly because the subject is often quite intellectually challenging, and partly because science is as much about communicating research to other scientists as it is determining answers for oneself. Many interactive scientific systems put a great emphasis on high-quality typesetting (usually in LaTeX), graphing, data presentation and so forth, both for viewing and reviewing one's own work while one thinks, and for conveying that work to others in convincing publication / presentation forms.

Thus a particular variant of the REPL has emerged in scientific computing called the notebook interface, popularized (I think?) by Mathematica and Matlab, in which typeset-quality symbolic mathematics, programming language expressions, natural-language narrative and graphs are intermixed in a single continuous, editable, "live" document that can have any portion recalculated on demand. A sort of "scientific word processor".





The Modern Situation #1: Python

We now come to the modern situation, and why at least the first few tools I mentioned are of interest: SciPy/SymPy/NumPy, IPython, and Sage. You'll notice these are all Python packages. Much of this post talked about the background to Ousterhout's Dichotomy, and Python is a prime example of a language designed around full acceptance of Ousterhout's Dichotomy. It's a (relatively) well-engineered, very human-friendly language with a great community, zillions of packages and a stable implementation (if you can ignore the 2-to-3 split). It started as a teaching language and evolved with a very clear emphasis on comprehensibility and human factors over performance.

The plain fact is: Python is kinda slow. It's a single-threaded, globally-locking bytecode interpreter that is still tinkering around with its first, seldom-used JIT. But the Python community largely doesn't mind this, because they emphasize practical hybridization with C, C++, Fortran and assembler code whenever necessary for performance. This avenue is getting increasingly easy with inner-loop solutions such as Cython and Numba.
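For a feel of what that hybridization looks like at its most bare-bones, here is a sketch using ctypes, the foreign-function module in Python's standard library, to call strlen from the system C library. It assumes a POSIX system where the C library can be located or is already loaded into the process; c_strlen is a hypothetical wrapper name. strlen is obviously not a performance win, and the point is only the shape of the glue layer: load the library, declare the C signature, wrap it in something friendly.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("c") returns the C library's
# name; if it returns None, CDLL(None) falls back to the symbols already
# loaded into this process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the UTF-8 byte length of text by calling C's strlen."""
    return libc.strlen(text.encode("utf-8"))
```

Tools like Cython and Numba exist largely so that you don't have to write this kind of declaration by hand for every inner loop.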





The SciPy/SymPy/NumPy ecosystem is a set of libraries that straddles Ousterhout's Dichotomy relatively well and plays to Python's strengths; it appears to be gaining ground in a lot of scientific computing spaces. Compared to the "accreted" languages inside many science systems, Python performs "about as well" but is a much better-engineered language. It's also free software and universally available, so it plays well to the internet's strength of "promoting free stuff". Roughly, NumPy and SciPy are the numerical-analysis portion of the package, and SymPy is an accompanying CAS for symbolic values. They are quite full-featured and can be used online. SymPy for example has a handy instance running on live.sympy.org.





IPython is a scientific REPL, Python-flavoured but relatively language-agnostic, that supports a very pleasant and high quality scientific notebook interface: many desktop GUI toolkits are supported, along with a loopback HTML interface you can point a local web browser at. This notebook interface supports display of inline LaTeX, command and calculation output, code, prose, plotting, data tables, hyperlinks, embedded images and so forth, all in multiple languages and with calculations run in the background, in a separate server process. Very simple escape mechanisms exist for embedding non-Python code such as R or Octave. Notebooks can be shared and edited online in browsers, versioned and published like normal source code artifacts.





Finally, Sage is a ... different, somewhat strange Python package that functions as a sort of meta-CAS: it has an internal representation of mathematical expressions, which it knows how to translate to and from the representations of many other mathematics packages -- some in Python, but many not -- and so compensates, to some extent, for the phenomenon I mentioned above, wherein not-very-good script languages accrete around core numerical libraries in systems languages. In Sage, the not-very-good REPL languages of dozens of CASs and numerical analysis systems are "upgraded" to modern Python and made interoperable with one another, their expressions translated on the fly to one another's idiosyncrasies. Sage also provides a notebook interface, runs a hosted cloud service, and ships with SciPy/NumPy/SymPy as sub-libraries.

I think these systems are a big deal because, at least in the category of tools that accept Ousterhout's Dichotomy, they seem to be about as good a set of hybrid systems as we've managed to get so far. The Python language is very human-friendly, the systems-level languages and libraries that it binds to are well enough supported to provide adequate speed for many tasks, the environments seem as rich as any interactive scientific computing systems to date, and (crucially) they're free, open source, universally available, easily shared and publication-friendly. So I'm enjoying them, and somewhat hopeful that they take over this space.

But, But, I Hate Python!

This is the first half of a two-part blog post that is motivated by and mostly "about" two software ecosystems I've been poking around at recently. The first (and subject of this post) is what I'll roughly call "interactive scientific Python", which includes primarily SciPy/SymPy/NumPy, IPython, and Sage; the second (subject of the next post) is a new interactive scientific language, Julia, which has a lot of complicated and subtle relationships with the former. If you hate Python, skip this post because it is really just background for why I care about the fate of a few Python packages.

I've been trying to write something coherent and useful and not infinitely-long about these systems for over a month, and failing every time I get started writing because there's just too much history and too many angles to approach them from. So taking a cue from @flipzagging I will attempt to start "in the midst of things" and work my way out. More than the past few posts, this one is super long and full of dull backstory, so I will again cut for brevity.

The midst of things is, I suppose, the page referred to as the IJulia Preview Notebook; though since this post's going to omit Julia for now we'll go with the stock Matplotlib IPython Notebook. If you look at that page, it's not really clear what's interesting about it; there are a bunch of code examples and a few plots and basically it looks like another programming tutorial. Dime a dozen. Big deal!

To understand why I think it is actually a big deal, I'm going to start by referring back to a previous post in which I mentioned my belief that programming languages are mediating devices, interfaces that try to strike a balance between human needs and computer needs. Implicit in that is the assumption that human and computer needs are equally important, or need mediating. It turns out many people don't think human and computer needs are equally important, and in many cases they have reasonable arguments with which to disagree. In fact there is a sort of spectrum of opinion, the extreme ends of which we can call "batch processing" and "interactive computing". A lot of language work, and a lot of ... how should I put this ... "inter-language" work, has to do with attempting to bridge this gap.

Stay tuned! The Modern Situation #2, Julia, is coming up in the next post.

Unfortunately if you really hate Python the next post won't satisfy because Julia keeps pretty close relations with the Python community, including IPython; but I will at least give a little time to the general family of languages that reject (or otherwise ignore) Ousterhout's Dichotomy. Because it's not the only way to reconcile tensions!

If you are a fan of some other Ousterhout-Dichotomy-friendly language, sorry! This pair of posts is not about your language. Perhaps another time. I'm tired of writing, and anyone who got this far is surely tired of reading.