Two researchers at UC Berkeley have investigated programming language adoption from a sociological perspective. This article summarizes their research and includes an interview with the authors.

Leo Meyerovich and Ari Rabkin, two researchers at UC Berkeley, have tried to answer the question “Why do some programming languages succeed and others fail?”, arguing for a sociologically based programming language theory dubbed Socio-PLT. With their paper Socio-PLT: Principles for Programming Language Adoption (PDF), the authors want to raise awareness in the software community that programming language adoption should not be seen simply as a marketing hurdle but should instead be grounded in scientific social theories, and they present an agenda for further investigation into how languages are adopted.

Meyerovich and Rabkin base their study on three sources: a survey of thousands of BS in Computer Science participants in an online Berkeley course; another online survey on programming languages, spanning two years, with over 13,000 respondents from Slashdot, Hacker News and Reddit; and SourceForge repository data covering over 300,000 projects. The researchers do not yet claim to have a scientific answer to the original question on language success, since they are still sifting through the large amount of collected data and trying to make sense of it. They do, however, have a number of questions and hypotheses meant to open the way to further research on language design and adoption, especially from a sociological perspective. A selection of the questions follows:

Question 3. What actually convinces programmers to adopt a language?

Question 5. What data should language designers track?

Question 9. How can we exploit social networks to persuade language implementers and programmers to adopt best practices?

Question 11. How many languages do programmers strongly and weakly know? Is there a notion of linguistic saturation that limits reasonable expectations of programmers?

Question 16. How adverse are programmers to longer compilation or interpreter startup times? How willing are they to trade time for improved error checking?

Question 21. How do the values of a language community change over time? For instance, do designers become more or less performance-focused as languages become popular? More or less focused on ease of implementation?

Also, some of the hypotheses:

Hypothesis 2. Both programming language designers and programmers incorrectly perceive the performance of languages and features in practice.

Hypothesis 3. Developer demographics influence technical analysis.

Hypothesis 4. Programmers will abandon a language if it is not updated to address use cases that are facilitated by its competitors.

Hypothesis 8. A significant percentage of professional programmers are aware of functional and parallel programming but do not use them. Knowledge is not the adoption barrier.

Hypothesis 10. Users are more likely to adopt an embedded DSL than a non-embedded DSL and the harmony of the DSL with the embedding environment further increases the likelihood of adoption.

Hypothesis 12. Implementing an input-output library is a good way to test the expressive power and functionality of a language.

Hypothesis 13. Open-source code bases are often representative of the complete universe of users.

Hypothesis 15. Many programming languages features such as modularity mechanisms are tied to their social use.

InfoQ contacted Rabkin and Meyerovich to find out more about what they think about language adoption.

InfoQ: What are some of the main points you have taken from this study? What does it tell us?

AR: So the first thing I should say is that this study isn't finished, and indeed is barely started. We've collected a lot of data but are only at the beginning of analyzing it.

I can give you some conventional wisdom to answer your questions, but this is based primarily on my personal unscientific understanding, not on hard data.

The big thing that Leo (my colleague) and I are interested in is whether language designers and the programming language research community have an accurate understanding of what users want. For instance, it's conventional wisdom in many quarters that static types are good for catching mistakes and that they improve modularity.

When you survey programmers, they don't seem to agree. The ones who like static types more often say they like them for the way they document code and the way they allow refactoring. People require unit tests for debugging and that obviates types in many cases.

We think there will be a bunch more cases like that, where the programmers often care about slightly different things than the designers. Finding and documenting these I think is likely to be our big contribution.

LM: Maybe at an even higher level: I want to "socially optimize" programming language research and design.

A lot of our work has been finding social theories, polling programmers, and interviewing language designers to see how we should think about language use and where there are disconnects. We've seen evidence that suggests basic PL concerns such as DSLs, program modularity, type safety, and language/API/program evolution have (socially) broken foundations.

Second, I want to create a new class of languages that exploit social phenomena. For example, can programs automatically get faster and safer as more people use them? If code and execution traces are shared across the internet, can we turn 2-20 minutes of Googling about API usage into 1-2 minutes of smarter code completion? Before blindly going off, we've been examining how sociologists think about technology. The more we look, the more we get excited :)

InfoQ: What is the social relationship between programmers and programming languages?

AR: Part of what's neat about the topic is that the social relationship between users and languages isn't one-way. Programmers often build libraries and even create special-purpose tools for modifying the language they're given. For most popular languages, there's a whole ecosystem and this can really modify what the language feels like. For instance, the numpy library makes Python a reasonable language choice for high-performance scientific codes, which the language would otherwise be completely unsuitable for. Ruby wouldn't be nearly as interesting a language if it weren't for all the tools that have been developed for managing big complicated applications -- especially web applications.

What's particularly interesting is that the language design can shape the libraries and tools in subtle ways. You can easily do things like mock out code for testing in a dynamic language, when it would be much harder, say, in Java. So the relationship between the language-as-designed and its programmers is an interesting one.

LM: That's an open-ended question as there are many people and processes involved.

For example, we should also consider language designers and who funds them. Government research funding (NSF/DOD/DOE) has become weak, so we're seeing more industrial influence (Microsoft/Intel/...). This, in turn, influences the new language features being designed.

The relationships within an individual language are also funny. The web shrank communication boundaries between language designers and programmers. This is good, but (non-representative) early adopters and vocal minorities can easily misrepresent what's really going on. Luckily, such misrepresentation is currently tempered by the likelihood of the language designer being an expert in the niche being targeted. A practicing language designer, for the same reason, is probably untrained: they have to reinvent the wheel for basic design and performance considerations, and invariably get important things wrong. As seen with languages like JavaScript, security / semantics / performance experts have to perform expensive surgery after-the-fact, and with only limited success. It's a wacky world :)

As hinted above, I expect the relationships to change. For example, as programmers share more with each other, I expect languages and tools will be able to do more. In 10 years, hopefully we'll look back at ideas like searching Google for open source code or posting a question on StackOverflow as primitive. This will involve both social and technical changes to programming.

InfoQ: Why are some languages a lot more popular than others?

AR: Here I really don't have data, so this is more opinion than fact.

Languages seem to catch on when either there's some killer app or else when there's heavy industrial backing. For examples of the first case: Ruby took off because of Rails. C took off because for a long time it was the easiest high-level language to use for modifying Unix or writing Unix tools.

For examples of the second: Java and C#. There's nothing particularly novel about either language -- they're about halfway between Smalltalk and C/C++. The big thing they had was really good documentation, comprehensive libraries, and cross-platform implementation. All that stuff is expensive to develop and wouldn't have happened without strategic decisions by Sun and MSFT to build up a "modern" language platform.

LM: Yes, though there is no guarantee of success. Basic ecological reasoning tells you it shouldn't be that easy: each new language is a drain on society's resources (e.g., programmers take time to become fluent and build libraries).

Scala is a great example of a language designed with adoption in mind. The political landscape of Java is uncertain, so Scala is pitched as a Java replacement (historical linguists would approve!). While Scala is really a general language, its team paid a lot of attention to early adopters in the finance and startup worlds. Both finance and startups are niches and bring prestige. Furthermore, I suspect these programmers will not struggle to learn the language, so switching costs are low. Likewise, supporting old Java codes -- both APIs and legacy code -- is really compelling to them. While the language has academically advanced features, it is not pedantic about them: it picks its ideological battles where most programmers wouldn't even notice. I can go on :)

InfoQ: Why do some languages last a lot longer than others?

AR: There's an old joke that "I don't know what language engineers will be using in 15 years, but they'll call it FORTRAN." Old languages like Basic, Fortran, etc. have evolved almost beyond recognition since being created, so it's not always easy to say "how long the language lasts." Even C has been regularly improved since K+R 1.0.

Maybe a way to look at it is to ask when people stop using a language for new projects. Sadly there isn't yet a lot of data on this – I don't know how to draw a curve for "when did Modula-2 usage peak". So I don't yet have a very good answer here.