Python as a First Language

John M. Zelle

Department of Mathematics, Computer Science, and Physics

Wartburg College

Waverly, IA 50677

zelle@wartburg.edu

Abstract

Currently, there is little consensus about which programming language is most appropriate for introductory computer science classes. Most schools use a traditional system programming language such as C, C++, Java, or Ada in CS1 and CS2. However, scripting languages such as Tcl, Perl and Python are becoming increasingly popular software development tools. This article discusses the advantages of using a scripting language as the first language in the computer science curriculum. Scripting languages are simpler, safer and more flexible than system languages. Python in particular emerges as a near ideal candidate for a first programming language.

1 Introduction

Some have suggested that the ``language wars'' are over and the (only) reasonable choices are: C, C++, Java, Ada (and maybe Eiffel). (Footnote: This seemed to be the consensus of the panel discussion ``Possible Futures for CS2'' at SIGCSE '98.) I believe declaring the wars over is admitting defeat too soon. A time when there is little consensus on a single first language seems the appropriate time to go back to first principles and consider what a first language should be. Along the way, we might discover that some of the best candidates have not even been considered.

In this paper, I argue that very high-level scripting languages such as Python, Perl, Tcl, Rexx, and Visual Basic are better candidates for a first language. One in particular, Python, seems nearly ideal.

2 Criteria for a First Language

2.1 Assumptions about CS1 and CS2

First, the CS1/CS2 sequence is fundamentally about computer programming. While these courses certainly address broader issues of computer science theory and practice, the core of computer science and of these first classes is still problem-solving, design, and programming. Learning to program is very much a hands-on activity, and these classes involve design and programming projects of various sizes in either open- or closed-lab settings.

Second, the programming language per se is not the focus of these classes. While our students often refer to these classes as ``the C++ class'' or ``the Java class,'' the courses are designed to provide an introduction to the field of computer science. The language being used is really a secondary issue. The tendency to think of CS1 as being an introduction to a particular language is a symptom of the complexity of the languages that are often used.

Third, these classes introduce students to the primary paradigms for design and problem solving in use today, namely structured and object-oriented methods implemented in an imperative (statement-oriented) language. While there are strong advocates for other approaches (e.g. using a functional language in the first class), the vast majority of introductory classes tend to follow the more traditional path.

Fourth, and perhaps most controversial, I assume that the goal of CS1 is to engage and educate computer science majors and, perhaps, recruit new ones. Given the current demand for our graduates and the need for all educated individuals to understand information technology, it is a major disservice to consider CS1 as a ``weedout'' class. Programming is hard, but we should strive to make it no harder than it needs to be.

2.2 Implications for Programming Language Choice

The hands-on, experimental nature of the introductory courses also has implications for the choice of language. A language that allows designs to be expressed with minimal overhead encourages experimentation and rewriting. Therefore, the language should be very high-level and flexible, enabling students to quickly and easily experiment with alternative designs. This facilitates thinking about algorithm and design issues rather than low-level implementation details. To the extent possible, the language should also provide safety for experimentation. We should favor languages that guard against mysterious crashes from pointer or array-bounds errors. Students at this level are just learning techniques for tracking down and fixing errors; the language should help them learn, not frustrate them.

It is also important that the language support modern approaches to design involving abstraction, encapsulation and object-oriented techniques. While object-based designs can be implemented in any language, doing so in a language that supports objects is much more intuitive and straightforward. Again, this allows the course to focus on higher-level conceptual issues rather than implementation details.

Finally, there are practical considerations in choosing a language. It is highly desirable that the language be widely available on a variety of platforms. Similarly, a language that is used by practitioners outside of academia is preferable to a language that is for ``teaching only,'' provided it meets the other criteria discussed above. Teaching language X simply because it is a commonly used language should not be an important consideration by itself. Our students will learn and use many languages during their careers. What is important in the first courses is giving them the best possible foundation of core principles and techniques that will allow them to adopt and adapt to the various languages they will confront in the ``real world.''

3 The Python Advantage

3.1 The Case for Scripting Languages

...several recent trends, such as faster machines, better scripting languages, the increasing importance of graphical user interfaces and component architectures, and the growth of the Internet, have greatly increased the applicability of scripting languages. These trends will continue over the next decade with more and more new applications written entirely in scripting languages and system programming languages used primarily for creating components.

Traditionally, computer science programs have emphasized system programming languages over scripting languages. However, scripting languages would seem to offer a number of benefits, particularly for the introductory programming sequence. Scripting languages generally have simpler syntax and semantics than system languages. Because of dynamic typing and interpretation they are very flexible and encourage experimentation. The very high-level nature allows students to build more sophisticated and interesting projects with less implementation effort.

Probably, the lack of interest in scripting languages has stemmed from the perception that they are ``toy'' languages and not suited to general purpose programming. While that may have been true of early scripting languages (e.g. Unix shell scripts), it is certainly not true of modern variants.

3.2 A Little Bit of Python

3.2.1 Python is simple

    if (x < 0)
        cout << "x was negative";
        x = -x;

In Python the corresponding code executes as expected, since the indentation itself determines the block:

    if x < 0:
        print "x was negative"
        x = -x

Python supports the use of functions and classes but does not force it. Simple programs really are simple. For example, consider the ubiquitous ``Hello World'' program in Python:

    print "Hello World!"

C++ requires this program to be wrapped in a function and preceded by a preprocessor directive:

    #include <iostream.h>
    int main()
    {
        cout << "Hello World!";
    }

In Java, the situation is even worse, as all code must be inside of a class:

    public class helloWorld {
        public static void main(String [] args) {
            System.out.println("Hello World!");
        }
    }

Semantically, Python is also simple. Python is dynamically typed, so there is no need for variable declarations. This reduces the amount of code that students have to write and also eliminates common errors stemming from misunderstanding the subtle distinctions of declaration, definition and use. For example, students in C++ and Java often ``accidentally'' redeclare variables where they really only want to use them (typing int count = 0; when they mean count = 0). Such mistakes can be hard to track down.
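As a small sketch of what the absence of declarations looks like in practice (assuming a present-day Python 3 interpreter; the variable names are purely illustrative), a name comes into existence at its first assignment and may later refer to a value of a different type:

```python
# Sketch only: assumes Python 3; names are illustrative.

count = 0                             # first assignment creates the name
count = count + 1                     # ordinary use, no declaration anywhere
kind_before = type(count).__name__    # the value, not the name, carries the type

count = "one"                         # the same name may later refer to a string
kind_after = type(count).__name__
```

The type belongs to the object a name currently refers to, which is why there is nothing to declare and nothing to redeclare by accident.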

Python has a minimal but complete set of simple control structures: one selection construct (if-elif-else), one definite loop (for) and one indefinite loop (while). Python also has a modern exception handling mechanism similar to that found in C++ and Java. Unlike Java, however, you do not have to understand the exception mechanism to write simple programs. From a pedagogical perspective, Python's for loop is illustrative. It allows a control variable to take on successive values in a sequence. It can be used to iterate through any sequence such as a list (array) or string. For example, the items in a list can be printed as follows:

    for item in List:
        print item

The range operation produces a sequence of numbers in a given range. For example, range(5) produces the list [0,1,2,3,4]. This can be used to provide numerically-controlled loops. The previous code could have been written (less clearly) as:

    for i in range(len(List)):
        print List[i]
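The same loop form can be tried on several kinds of sequence. The following is a minimal sketch, assuming a present-day Python 3 interpreter (where print is a function and range yields a lazy sequence rather than a list); the names collect and names are invented for the illustration:

```python
# Sketch only: assumes Python 3; `collect` and `names` are illustrative names.

def collect(sequence):
    """Gather the successive values a for loop sees into a list."""
    seen = []
    for item in sequence:        # the control variable takes each value in turn
        seen.append(item)
    return seen

names = ["Ada", "Eiffel", "Python"]

# One loop form serves a list, a string, and a numeric range.
print(collect(names))
print(collect("abc"))
print(collect(range(5)))

# Index-based version: visits the same items, but less clearly.
by_index = []
for i in range(len(names)):
    by_index.append(names[i])
```

The index-based loop produces the same items as the direct loop; it simply makes the reader do more work.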

Python has a simple uniform data model. Variables are always references to heap-allocated values (objects). The model is consistent, avoiding the confusion over heap versus automatic variables in C++ or primitive versus object types in Java. Both of these languages require the teaching of multiple allocation models to implement even relatively simple programs.
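One consequence of the reference model is easy to demonstrate. This is a hedged sketch (assuming Python 3; all names are illustrative) showing that assignment makes a second name for the same object, while an explicit copy makes a new object:

```python
# Sketch only: assumes Python 3; variable names are illustrative.

a = [1, 2, 3]
b = a                  # b refers to the same list object; nothing is copied
b.append(4)            # the change is visible through both names

c = list(a)            # an explicit copy creates a second, independent list
c.append(5)            # a and b are unaffected

same_object = a is b       # two names, one object
copied_object = a is c     # c is a different object
```

Because every variable behaves this way, there is only one rule for students to learn.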

Similarly, Python has only a single parameter passing mechanism (by value). Parameter passing is simply assignment of actual to formal parameters at call time. Once students understand the simple assignment model, they get parameter passing for free.
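To make the "parameter passing is assignment" point concrete, here is a minimal sketch (assuming Python 3; the function names rebind and mutate are invented for the example). Assigning to the formal parameter only rebinds a local name, whereas calling a method on it acts on the object the caller also refers to:

```python
# Sketch only: assumes Python 3; `rebind` and `mutate` are illustrative names.

def rebind(items):
    items = [0]            # rebinds the local name; the caller's list is untouched

def mutate(items):
    items.append(0)        # changes the object that the caller also refers to

data = [1, 2]

rebind(data)
after_rebind = list(data)  # snapshot of data after the first call

mutate(data)
after_mutate = list(data)  # snapshot of data after the second call
```

Both behaviors follow directly from what `items = data` would do as an ordinary assignment statement.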

3.2.2 Python is safe

3.2.3 Python supports object-oriented programming

One weakness of Python from a systems development perspective is that encapsulation is only enforced through convention. There is no mechanism for specifying that class members are private. Pedagogically, this does not seem to be a major weakness, since it is still possible to teach the principles of data-hiding; it is just not enforced by the language. The language provides an elegant mechanism but keeps it simple, avoiding the complexity of the various ``visibility modes'' that must be discussed in C++ or Java.
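A hedged sketch of data-hiding by convention follows (assuming Python 3; the Account class and the leading-underscore naming are illustrative choices, not something prescribed by the text above). Clients are expected to use the methods, although the language does not stop them from reaching the attribute directly:

```python
# Sketch only: assumes Python 3; `Account` is a hypothetical example class.

class Account:
    def __init__(self, balance):
        # Leading underscore signals "internal detail" to human readers.
        self._balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance = self._balance + amount

    def balance(self):
        return self._balance

acct = Account(100)
acct.deposit(50)
current = acct.balance()     # the intended way to read the balance
```

The principle of access through an interface can be taught with this class even though nothing in the interpreter enforces it.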

The dynamic typing model of Python makes it particularly convenient for discussing container classes in a data structures course. A stack class, for example, can be used to store any type of object. It can be an int stack, a float stack, a string stack, or a mixture of types. This is accomplished without having to introduce generics (templates) or performing dynamic type casting. Figure 1 shows an example definition of a simple bounded stack class.

    class Stack:
        def __init__(self,size):
            self.data = [None]*size
            self.size = 0
        def push(self,item):
            self.data[self.size] = item
            self.size = self.size + 1
        def pop(self):
            self.size = self.size - 1
            return self.data[self.size]
        def is_empty(self):
            return self.size == 0
        def is_full(self):
            return self.size == len(self.data)

Figure 1: bstack.py --a simple bounded stack.
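The claim that one stack class serves for any mixture of types can be checked directly. The sketch below repeats the class of Figure 1 so that it runs on its own (assuming Python 3; the variable names mixed and popped are illustrative):

```python
# Sketch only: assumes Python 3. The class body follows Figure 1.

class Stack:
    def __init__(self, size):
        self.data = [None] * size
        self.size = 0
    def push(self, item):
        self.data[self.size] = item
        self.size = self.size + 1
    def pop(self):
        self.size = self.size - 1
        return self.data[self.size]
    def is_empty(self):
        return self.size == 0
    def is_full(self):
        return self.size == len(self.data)

mixed = Stack(3)
mixed.push(42)           # an int
mixed.push(3.14)         # a float
mixed.push("hello")      # a string
full_now = mixed.is_full()

popped = []
while not mixed.is_empty():
    popped.append(mixed.pop())   # items come back in last-in, first-out order
```

No template parameter or cast appears anywhere; the container simply holds references to whatever objects it is given.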

Python also provides a clean module system that dynamically loads files at run-time similar to Java (minus the cumbersome package organization restrictions). This allows for easy management of modular projects (Laird & Soraiz, 1998a) without the need for the header files and preprocessor directives of C++. The implementation of stack could be used via an import statement:

    from bstack import Stack
    myStack = Stack(100)
    myStack.push("Hello")

A different implementation can be substituted by simply changing the module name in the import statement.
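As an illustration of such a substitution, here is a sketch of an alternative stack with the same method names (assuming Python 3; the module name ustack and the unbounded, list-backed design are hypothetical choices made for this example):

```python
# Sketch only: assumes Python 3. Imagine this class living in a hypothetical
# module file ustack.py; the module name is an assumption for illustration.

class Stack:
    """Unbounded stack offering the same methods as the bounded version."""
    def __init__(self, size=None):
        # size is accepted so existing client calls still work, then ignored
        self.data = []
    def push(self, item):
        self.data.append(item)
    def pop(self):
        return self.data.pop()
    def is_empty(self):
        return len(self.data) == 0
    def is_full(self):
        return False             # a list-backed stack never fills up

# Client code is unchanged apart from the import line, which would become
#     from ustack import Stack
myStack = Stack(100)
myStack.push("Hello")
top = myStack.pop()
```

Because the client depends only on the method names, it cannot tell which implementation it received.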

3.2.4 Python is fun

3.2.5 Python is practical

4 Some Obstacles (Real and Imagined)

4.1 Lack of Compile-Time Checking

Experience teaching languages like Pascal, C++, and Java has convinced me that the supposed advantages of compile-time error checking for neophyte programmers are illusory. First, the vast majority of errors detected by a compiler are quite pedestrian (e.g. the ubiquitous missing ";"). A language like Python eliminates many of these common errors through a simpler syntax. Furthermore, most remaining pure syntax errors will also be reported immediately by the Python interpreter, which analyzes the syntax of the program at load time. Second, there is little advantage in catching more subtle errors (e.g. type incompatibilities) at compile time. A common type error is caused by mismatches between declaration and use. Many of these errors are simply errors in declaration. In a language without declarations, these errors do not occur. When the error is a genuine error in how a type is used, the error will still be caught in a dynamically-typed language. The difference is that it will be caught and diagnosed at run-time. For the types of programs typically written in the first two CS classes, compile-time checking is not much of an advantage. The simplicity of the edit-interpret cycle far outweighs any benefit of finding multiple errors at compile time.
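A genuine type error of the kind described above might look like the following sketch (assuming Python 3; the average function and its data are invented for the example). The mistake is caught when the offending operation executes, and the interpreter's message names the types involved:

```python
# Sketch only: assumes Python 3; `average` is an illustrative function.

def average(values):
    return sum(values) / len(values)

good = average([90, 85, 100])        # all numbers: works as expected

try:
    average([90, "85", 100])         # a string slipped in among the numbers
    message = None
except TypeError as err:
    message = str(err)               # diagnosis produced at run time
```

In a classroom setting the try/except would normally be omitted, and the student would simply see the traceback pointing at the failing line.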

Compile-time checking can actually be detrimental for some students in a couple of ways. First, it is demoralizing. Students must have a complete syntactically correct program before they get any results. Compiling a program and staring at a screenful of nagging messages is a dull and exasperating activity. With an interpreted language, at least something happens; the program generates partial output before stopping for an error. The student sees the program in (partial) action and has just one error to fix at a time. Each fix brings more progress. This is a much more encouraging situation.

A second drawback of extensive compile-time checking is that it gives students the illusion of thoroughness. They believe that once a program compiles, it must be pretty much correct. One symptom of this is inadequate testing. Another problem is that it delays detection of design errors. In my experience, it is not uncommon for students to invest considerable time in a program and feel they are almost done because there is just one little bug they can't figure out. Only it turns out the one little bug is still a compile error. Far from being ``almost done,'' the assignment is due in a couple of hours and they haven't even gotten the program to compile yet! They have not had a chance to discover major flaws in the logic of their solution. In an interpreted language, finding errors goes hand in hand with testing; the presence of type errors does not necessarily prevent the discovery of more serious design errors.

4.2 Scripting Languages are Too Inefficient

In any case this concern for execution efficiency is misplaced. Beginning programmers are usually not writing production code. Their programs generally do not have hard time constraints and will only be run a few times. The real efficiency concern is the amount of time spent developing the program. This is where scripting languages shine. The fast edit/interpret cycle, absence of declarations, and the very high-level nature of scripting languages make them a perfect tool for this environment.

4.3 Students Need to Learn a System Language

The question becomes, then, which should be the language of the first class(es)? One argument for teaching a language like C++, Java or Ada in the first class is that these languages themselves are so complex and difficult that students need to start learning them right away in order to have enough time to master them. This argument gets it backwards. The first classes should not be about language, but rather about computer science and, fundamentally, design. Trying to teach a complex language inherently detracts from that goal, since students must spend more time mastering the language and, hence, less time mastering other material. This is the reason some educators are looking at Java as a simpler alternative to C++, but even Java is very complex compared to Python.

A more sensible approach is to teach design first, starting students with a simple but powerful language. With a solid grasp of programming and design it is much easier to understand concepts such as static typing, visibility, generics and polymorphism. As a thought experiment, imagine really explaining the meaning of each part of the helloWorld Java program to a novice programmer. Imagine how much easier it would be to explain public, class, static, void, and String [] to a student who already understands functions, classes, instance variables, class variables, data types and arrays. The more complicated constructs of system languages (e.g. C++ templates, virtual methods, dynamic casts) are really mechanisms for achieving some of the flexibility provided by dynamic languages inside a statically typed framework. Why not teach the concepts first in a language that does not require such complexity to express them? A sophomore-level principles of programming languages class or a systems programming class seems a more appropriate place to tackle the intricacies of a system language.

4.4 Python is Unfamiliar

4.5 Our Students Want Language X

4.6 There Aren't any Textbooks

This article is a wake-up call for those who are considering a switch to Java. If you might be switching languages, carefully consider the motives for your switch. Pedagogical considerations argue for a move away from system languages to scripting languages, and Python is a very good choice. What is needed is a few stout souls to take the plunge with Python and develop the appropriate materials. Perhaps some current authors might also be persuaded to develop Python versions of popular texts.

4.7 What About Scheme?

The weakness of Scheme is that it is perceived as a marginal language that is quite different from the system languages that are used elsewhere in the curriculum. This has limited its popularity in CS1. Python offers many of the advantages of Scheme while still being similar to languages such as C++, Java, and Ada, thus easing the transition to those languages. For those wishing to explore multiple programming paradigms in the first courses, Python has support for the functional style including first-class functions, map, apply, lambda and closures.
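A brief sketch of the functional style follows, assuming a present-day Python 3 interpreter. In Python 3 the built-in apply no longer exists, so the helper call below is an illustrative stand-in that uses argument unpacking instead; make_adder and the other names are likewise invented for the example:

```python
# Sketch only: assumes Python 3; all names are illustrative.

def make_adder(n):
    """Return a closure: the inner function remembers n."""
    def add(x):
        return x + n
    return add

add3 = make_adder(3)                 # functions are values that can be stored

squares = list(map(lambda x: x * x, [1, 2, 3, 4]))   # map with a lambda
shifted = list(map(add3, [1, 2, 3]))                 # map with a closure

def call(func, args):
    """Stand-in for the older apply(func, args), via argument unpacking."""
    return func(*args)

largest = call(max, (2, 9, 4))
```

Functions are passed, returned, and stored like any other object, which is the property a functional-style exercise depends on.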

5 Conclusions

One of the major precepts we try to instill in our students is the idea of using the proper tool for a given job. If a client came to me and suggested she needed a program written in a very short time frame, that the program had no tight time or memory constraints, and that once completed it would only be run a few times, I would immediately suggest a scripting language as the appropriate tool. These are exactly the conditions under which programming occurs in most CS1 and CS2 classes. Given the existence of scripting languages such as Python that also provide good support for modular and object-oriented program design, there is no good reason for not using them. Scripting languages are the most appropriate tool for our introductory courses. Given the current lack of consensus on a single first language, it seems like the ideal time to begin a movement to scripting languages. Consider using Python in your introductory classes.

References

Laird, C., Soraiz, K., (1998). Get a Grip on Scripts, Byte, June, pp. 89-96.

Lutz, M., (1996). Programming Python, O'Reilly & Associates, Inc.

McCauley, R. and Manaris, B., (1998). Computer Science Programs: What Do They Look Like? Proceedings of the 29th SIGCSE Technical Symposium on Computer Science Education, February, pp. 15-19.

Ousterhout, J., (1998). Scripting: Higher Level Programming for the 21st Century, IEEE Computer, March.

Watters, A., van Rossum, G., Ahlstrom, J., (1996). Internet Programming with Python, M & T Books, New York, New York.