BLOG@CACM Python Is Now the Most Popular Introductory Teaching Language at Top U.S. Universities

Summary

At the time of writing (July 2014), Python is the most popular language for teaching introductory computer science courses at top-ranked U.S. departments.

Specifically, eight of the top 10 CS departments (80%), and 27 of the top 39 (69%), teach Python in introductory CS0 or CS1 courses.
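As a quick check on the arithmetic above, here is a minimal Python sketch that recomputes both percentages. The helper name `share` is made up for illustration; the counts are the ones quoted in this summary.

```python
def share(count, total):
    """Return count/total as a percentage, rounded to the nearest whole number."""
    return round(100 * count / total)

# Counts quoted in the summary: departments teaching Python in CS0 or CS1.
top10 = share(8, 10)
top39 = share(27, 39)

print(f"Top 10: {top10}%")
print(f"Top 39: {top39}%")
```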

Motivation

Python has been getting more popular as the first language to teach novices. Three years ago, Mark Guzdial blogged about the rise of Python as a teaching language and predictions for future teaching languages. Top-ranked CS departments at MIT and UC Berkeley recently switched their introductory courses to Python. The largest three MOOC providers (edX, Coursera, and Udacity) all offer introductory programming courses in Python. And professors in diverse subfields (e.g., Lorena Barba) are now advocating for teaching Python to novices.

On a personal note, the educational tool I've been developing for the past four years, Online Python Tutor (www.pythontutor.com), has seen its usage skyrocket due to the growing popularity of the language.

However, despite numerous anecdotes, I haven't seen any recent numbers quantifying how popular Python is for teaching programming. So I tried to find out by documenting how many of the top-ranked U.S. computer science departments use it to teach their introductory courses. Along the way, I also ended up cataloging occurrences of other popular teaching languages.

Why do this analysis? Because the choice of what language to teach first reflects the pedagogical philosophy of each department and influences many students' first impressions of computer science. The languages chosen by top U.S. departments could indicate broader trends in computer science education, since those departments are often trendsetters for the rest of the educational community.

Methodology

I considered the top 39 departments, as ranked by U.S. News in 2014. (This link may become outdated as U.S. News issues new rankings.) Why did I stop at 39? Because there was an 8-way tie for 40, so the differentiating signal is weak by that point. No ranking scheme is perfect, and rankings change over time. In particular, this ranking considers only American Ph.D.-granting universities. And my cut-off of 39 excludes many universities that I respect, including my own. We could debate endlessly about which schools to include, and how to rank them. But I had to choose some ranking scheme and cut-off, and this felt reasonable.

For each university, I looked for CS0 and CS1 courses in the CS, CSE, or EECS department, whichever is the home to computer science. I define a CS1 course as the first required course for CS majors, and a CS0 course as introductory programming (not just basic computer literacy) for either non-majors or those who need extra preparation before CS1. Neither CS0 nor CS1 should have any CS prerequisites, since otherwise they can't be the first CS class that students take. (Specifically, CS0 should not be a prerequisite for CS1.) I didn't count mini-courses, special accelerated CS1+ courses, higher-numbered electives, or programming courses offered by other departments. Note that there isn't a strict official definition for CS0 or CS1, so I had to exercise some personal judgment. I tried to abide as much as possible by each department's official course descriptions and prerequisite tables.

Although some instructors consider introductory programming to mean CS1, for this analysis I purposely grouped CS0 and CS1 together since those courses are where many students, regardless of major or eventual career, study their first programming language in college. Given the unfortunate state of K-12 computer science education in the U.S., these college courses are where many students, especially women and underrepresented minorities, likely see their first programming language. And even if students arrive with prior programming experience, their first officially-sanctioned exposure in college is still influential. I want to give CS0 equal standing with CS1 because as programming becomes more pervasive across science, engineering, and even non-STEM disciplines, more and more non-majors are taking CS0. Also, a non-trivial minority of CS majors take CS0 before CS1, especially students without much prior programming exposure. Thus, CS0 courses serve as important gateways into computing for students who might otherwise never know that they had such passions.

Results

The chart below shows how many of the top 39 departments teach either CS0 or CS1 using the seven most common languages. The bar heights add up to more than 39 since many schools offer both CS0 and CS1.
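For readers who want to redraw the chart, here is a rough Python sketch that prints a text version of it. It assumes the per-language counts in the "Total" row of the appendix table; the function name `text_chart` is hypothetical and the layout is only an approximation of a bar chart.

```python
# Per-language counts copied from the "Total" row of the appendix table.
totals = {
    "Python": 27, "Java": 22, "MATLAB": 8, "C": 7,
    "C++": 6, "Scheme": 5, "Scratch": 3,
}

def text_chart(counts):
    """Build one line per language, longest bar first."""
    lines = []
    for lang, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        lines.append(f"{lang:<8}{'#' * n} {n}")
    return "\n".join(lines)

chart = text_chart(totals)
print(chart)

# The bars sum to more than the 39 departments because many schools
# offer several courses or teach with several languages.
total_bars = sum(totals.values())
print("Sum of bar heights:", total_bars)
```

The sum of the bar heights comes to 78, which is consistent with the note that the bars add up to more than 39.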

Discussion

Even though I'm a big Python proponent, I'll try my best to present just the facts without editorializing.

Python is the most popular language in this list. It narrowly surpassed Java, which has been the dominant introductory teaching language over the past decade. Some schools have fully switched over to Python, while others take a hybrid approach, offering Python in CS0 and keeping Java in CS1. However, at the high school level, Java is still used in the AP (Advanced Placement) curriculum.

The next most popular language is MATLAB, which is often used in CS0 courses to introduce scientists and engineers to programming. C and C++ are next on the list, but they've been firmly supplanted by Java over the past decade. The high school AP curriculum even replaced C++ with Java in 2003. Also, some introductory courses that use C (such as Harvard's CS50) teach it alongside other languages rather than having it be the sole language.

Scheme-based languages are popular amongst a devoted subset of educators and programming language researchers. Most notably, two (somewhat rival) philosophical camps -- SICP and HtDP -- have created acclaimed textbooks and courses around the Scheme ecosystem. But in recent years, Scheme has been phased out in favor of Python at places such as MIT and UC Berkeley. It's being used in only five schools in this list.

Scratch is the only visual, blocks-based language that made this list. It's one of the most popular languages of this genre, which includes related projects such as Alice, App Inventor, Etoys, Kodu, StarLogo, and TouchDevelop. The creators of these sorts of languages focus mostly on K-12 education, which might explain why they haven't gotten as much adoption at the university level.

Finally, note that three interesting sets of languages didn't make it on this chart because they were used in either zero or one university in our sample:

Statically-typed functional languages such as Haskell and ML variants, which are popular amongst PL researchers

Dynamically-typed languages such as JavaScript, Ruby, and PHP, which are popular amongst Web programmers

Widely-used industry languages that are commonly associated with specific proprietary platforms, such as Objective-C (Apple) and C#/Visual Basic (Microsoft)

If we revisit this analysis in five, ten, or twenty years, which language will be in the lead then?

Appendix: Full Course Listing

Here is the raw data that generated the main chart in this article. I manually collected all of this data in July 2014 by browsing through department and course websites, mostly from the 2013-2014 academic year. (This table is a bit hard to read since I could not add the proper lines and spacing using ACM's blogging software.)

I have not linked to individual course Web pages, since those links often refer to past offerings and quickly grow stale. But course names should remain stable for the foreseeable future.

Note that many schools offer both CS0 and CS1 courses (often multiple versions of each), and that some courses are taught using multiple programming languages.

Please email any questions or correction requests to philip@pgbovine.net

Rank University Python Java MATLAB C C++ Scheme Scratch
1 Carnegie Mellon 15-110, 15-112 15-122
1 MIT 6.00, 6.01
1 Stanford CS106A
1 UC Berkeley CS61A CS10
5 UIUC CS103 CS125 CS101 CS101
6 Cornell CS1110 CS1112
6 U.Washington CSE140 CSE142
8 Princeton COS126
9 Georgia Tech CS1301, CS1315 CS1371
9 UT Austin CS303E CS312
11 Caltech CS1
11 U. Wisconsin Madison CS302 CS202
13 UCLA CS31
13 U. Michigan EECS182 EECS182, 183
15 Columbia ENGI E1006 COMS W1004-1 COMS W1005-1
15 UCSD CSE8A CSE7 CSE5A
15 U. Maryland - College Park CMSC 198C,D,E CMSC131
18 Harvard CS50
19 U. Penn CIS 110, 120
20 Brown CSCI0931 CSCI0150 CSCI0040 0170,0190 0080
20 Purdue CS17700 CS18000 CS15900 CS15800, CS15900
20 Rice COMP 140
20 USC CSCI 101 CSCI 103 CSCI 103
20 Yale CPSC112 CPSC201
25 Duke CompSci 101
25 UMass Amherst CMPSCI 119 CMPSCI 121
25 UNC Chapel Hill Comp 110 Comp 401
28 Johns Hopkins EN600.107
29 NYU CSCI-UA.2 CSCI-UA.0101
29 Penn State CMPSC 121
29 UC Irvine CSE 41
29 U. Minnesota CSci 1001,1901 CSci 1103 CSci 1901
29 U. Virginia CS 1120 CS 1110, 1111
34 Northwestern EECS110 EECS110 EECS111
34 Ohio State CSE201 CSE 205 CSE 202
34 Rutgers CS111
34 UC Davis ECS 10
34 UC Santa Barbara CS8
34 U. Chicago CMSC 12100, 12200, 12300 CMSC 12100, 12200, 12300 CMSC 12100, 12200, 12300 CMSC 10500, 11500
Total 27 22 8 7 6 5 3

Here are the edits made to this table after its initial publication on 2014-07-07, thanks to reader-submitted corrections:

Added Columbia ENGI E1006 (Python), since a current student emailed me about it. Here's an excerpt: "Very few people are required to take the Matlab course [1005] (mostly just a few specific engineering majors), whereas all CS majors, and all engineers in general have to take 1006. This python course was only added in the last two years, so it's certainly a sign of Columbia following the national trend of Python becoming more popular as an intro course!"

Added UNC Chapel Hill Comp 110 (Python), since an instructor emailed me about it.

Added Northwestern EECS 111 (Scheme), since a student emailed me: "CS1 is called EECS111, and it is taught in Meta (A scheme-like language with .NET binding built by a professor here), Scheme, or Racket, depending on the professors. I'd say its fair to say that CS1 is taught in Scheme, as all the languages, specially the features we use, are almost Scheme."

Added UMass CMPSCI 121 (Java), since a professor emailed me about it.

Added Georgia Tech CS1371 (MATLAB), since a current student emailed me.

Added USC CSCI 103 (C and C++), since a current student emailed me about it.

Added Brown CSCI0190 (Scheme) and CSCI0931 (Python), since a current student emailed me.

Added U. Minnesota CSci 1103 (Java), removed 1901 as Java, and added 1901 as Python since a current student emailed me and said how 1901 was switching over to Python.

Removed U.T. Austin CS105 (Python) and added CS303E (Python), since an alum emailed me.

Added Carnegie Mellon 15-122 (C), since a professor emailed me.

Added Yale CPSC201 (Scheme), since an alum emailed me.

Comments

Bryant Son

Very interesting analysis. However, I think some of the data elements are misleading. I finished the computer science degree program at University of Texas at Austin, and I took the CS105 Introduction to Python class as well. The CS 105 course is an optional class that not many UT computer science students decide to take.

Jay Howard

When I was at the University of Texas, CS105 wasn't really an introductory course. A brief Google search suggests that it still isn't. Instead, it was a sort of "primer" course for specific languages. There was a C++ version, a Java version, a PHP version, etc. The students in these courses usually ranged over the entire spectrum of experience, from freshmen to graduate students.

From what I can tell, the three introductory courses are currently: CS 303E, CS 305J and CS 312. The first two seem to be for non-majors and/or folks with zero prior exposure to programming.

Leon Mar

Hi,

I would like to correct one piece of information in your discussion section. In relation to C#, it is a myopic view that C# is associated "with specific proprietary platforms". C#, for your information, is an ISO/ECMA standard language that is widely implemented on Linux/iOS/Android (under the name of the Mono project, http://www.mono-project.com/Main_Page).

It is wrong to lump Visual Basic with C# because the former is proprietary, much like Java, without conforming to any standard (only a vendor standard). VisualBasic.Net has been implemented on Linux/Android/iOS, though not as completely as C#. Of course Visual Basic (non-.Net) is both obsolete and proprietary.

John Orcutt

This is quite useful in introducing students to Python. I am at UCSD, but in the Scripps Institution of Oceanography department. I teach Python to first-year PhD students, and interest is great from marine biology to geophysics.

Stephen Lecheler

While I am all for Python, I take umbrage at using Ph.D.-level rankings to determine the top 39 schools as the sample set for introductory programming languages. Using the end product of the educational environment to filter the beginning of the process is not logical. Some colleges may have superb undergraduate programs, but because they are outside the mainstream and lack the prestige of a tier-1 name, they are not considered. A better sample set may be to look at the top 3 college programs per state (ex: University of Texas - Austin, Texas A&M - College Station, Texas Tech - Lubbock) to see what is being offered.

Robert Harper

Carnegie Mellon does not use Python for introductory programming for CS majors or majors in related disciplines such as ECE. We do offer a service course for non-majors that is based on Python, but it has nothing to do with core CS curriculum, which is based on ML and (a safe dialect of) C. Even our introductory data structures and algorithms course is now based on functional programming in ML, with emphasis on parallelism and persistent data structures.

David Andersen

I'll respectfully add a bit of nuance to Bob's comment: Python *is* used in the introductory course for CS and ECE majors (15-112), but students can skip this course by demonstrating prior programming experience. The majority (perhaps 75%, from memory) of first-year CS students do skip it, but a significant fraction take it, and a larger percent of ECE students do. 15-112 is a "skippable" prerequisite for the ML and safe-C introductory courses. Bob is correct that the median first-year CS undergraduate with extensive prior programming experience will jump directly into the more formally-minded core courses for imperative and functional programming.

Michael Vaganek

This is an unfortunate trend. Python has its uses and its place, but as an introduction to CS and programming, it's not much better than BASIC was, way back in the day.

While Java remains widely taught, it too has drawbacks. Neither language could be considered a general purpose programming language. Each has a paradigm, and each can be better appreciated after a grounding in general purpose programming.

Yes, folks, good ol' C is an ideal language for introduction to programming. It's a general purpose language, is actually a standard (unlike C#, despite partisan claims to the contrary), and is not locked into any one idiom or approach to programming.

No matter that it may not be the best vocational preparation. Languages, tools, and platforms come and go. But a good foundation in the fundamentals prepares someone to continue learning as things evolve in the larger world.

'Higher level' languages like Java, Python, Ruby, C#, and the like hide too much. Good to have the convenience and support of those languages later, after one can "do it by hand".

At its core, programming is about thinking. Too much 'magic' too early does a disservice.

Wayne Brehaut

I'm not sure of the correctness of your statement that "Python...narrowly surpassed Java". Python seems to have scored about 27 and Java about 22, so though a difference of 5 may seem "narrow" it is, in fact, almost 23% greater, if this sample size makes such a comparison relevant? Certainly any betting person would consider that significant and hardly "narrow".

What we need is someone to resurrect the more comprehensive annual survey that I believe Jean E. Sammet used to do up to a few decades ago--but now make it self-reporting--say on the SIGCSE site?

K.R. Chowdhary

High-level languages are designed for two purposes: 1) to do the job more efficiently, accurately, and at less cost, and 2) to make the architecture visible to the programmer through the language. Point (1) has a commercial aspect, while point (2) may have a commercial aspect in the long run, but more importantly it helps programmers understand how things work internally in a computer: how the memory is arranged, how it can be acquired and released, where to allocate (in the CPU or in RAM), how data structures relate to physical memory, how low-level operations are actually performed, how an algorithm actually works, and so on.

Feature (1) is prevalent in C++, Java, Fortran, Perl, Python, etc., while feature (2) is prevalent in C in a most beautiful way, and nowhere else to that extent. Hence, in my opinion, C will continue to rule in the engineering and technical fields, and also wherever light, fast programs are the prime criterion. Hence, this language should be taught to all engineers, preferably as the first language. In other words, replacing C with something else is as difficult as replacing trigonometry or calculus!
