The Explorer

The Adventures of a Pythonista in Schemeland/7

by Michele Simionato

October 21, 2008



Scheme and Lisp have a particular data type which is missing in most languages (with the exception of Ruby): the symbol.

From the point of view of the grammar, a symbol is just a quoted identifier, i.e. a sequence of characters corresponding to a valid identifier preceded by a quote. For instance, 'a, 'b1 and 'c_ are symbols. On the set of symbols there is an equality operator eq? which is able to determine if two symbols are the same or not:

> (define sym 'a)
> (eq? sym 'b)
#f
> (eq? sym 'a)
#t

#f and #t are the Boolean values False and True respectively, as you may have imagined. The equality operator is extremely efficient on symbols, since the compiler associates an integer number with every symbol (this operation is called hashing) and stores it in an internal registry (this operation is called interning): when the compiler checks the identity of two symbols it actually checks the equality of two integer numbers, which is an extremely fast operation.

You may get the number associated with a symbol with the function symbol-hash:

> (symbol-hash sym)
117416170
> (symbol-hash 'b)
134650981
> (symbol-hash 'a)
117416170
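The interning idea described above can be sketched in a few lines of Python. This is only an illustration, not how any Scheme implementation actually works: the class Symbol and its _registry table are hypothetical names introduced here. The point is that, once every name maps to a single object, comparing two symbols reduces to an identity check.

```python
class Symbol:
    """A hypothetical symbol type: at most one object per name."""

    _registry = {}  # name -> the single Symbol object for that name

    def __new__(cls, name):
        # Interning: reuse the existing object if this name was seen before.
        try:
            return cls._registry[name]
        except KeyError:
            sym = super().__new__(cls)
            sym.name = name
            cls._registry[name] = sym
            return sym

    def __repr__(self):
        return "'" + self.name


sym = Symbol("a")
print(sym is Symbol("b"))  # identity check, in the spirit of (eq? sym 'b)
print(sym is Symbol("a"))  # in the spirit of (eq? sym 'a)
```

The two checks print False and True, mirroring the eq? session shown earlier.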

It is always possible to convert a string into a symbol and vice versa thanks to the functions string->symbol and symbol->string; however, conceptually - and also practically - symbols in Scheme are completely different from strings.

The situation is not really different in Python. It is true that symbols do not exist as a primitive data type; however, strings corresponding to names of Python objects are actually treated as symbols. You can infer this from the documentation about the builtin functions hash and intern, which says: normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys. BTW, if you want to know exactly how string comparison works in Python I suggest you look at this post:

Scheme has many more valid identifiers than Python or C, where the valid characters are restricted to a-zA-Z0-9_ (I am ignoring the possibility of having Unicode characters in identifiers, which is possible both in R6RS Scheme and Python 3.0). By convention, symbols ending in ? are associated with boolean values or with boolean-valued functions, whereas symbols ending in ! are associated with functions or macros with side effects.
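To see the difference from the Python side, here is a small check using the string method isidentifier. The list of candidate names is made up for illustration; they are all legal in Scheme, but only some of them pass Python's rules.

```python
# Names that are legal Scheme identifiers, tested against Python's rules.
candidates = ["b1", "c_", "null?", "set-car!", "string->symbol"]

# str.isidentifier() says whether a string is a valid Python identifier.
valid = {name: name.isidentifier() for name in candidates}

for name, ok in valid.items():
    print(name, ok)
```

Only b1 and c_ are accepted: the characters ?, !, - and > cannot appear in a Python name.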

The function eq? is polymorphic and works on any kind of object, but it may surprise you sometimes:

> (eq? "pippo" "pippo")
#f

The reason is that eq? (corresponding to is in Python) checks if two objects are the same object at the pointer level, but it does not check the content. Actually, Python works the same. It is only by accident that "pippo" is "pippo" returns True on my machine, since the CPython implementation handles "short" strings differently from "long" strings:

>>> "a"*10 is "a"*10 # a short string
True
>>> "a"*100 is "a"*100 # a long string
False
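If you want the identity check to succeed by design and not by accident, you can intern the strings explicitly. The sketch below assumes Python 3, where the intern builtin mentioned earlier lives in the sys module as sys.intern; the variable names are just for illustration.

```python
import sys

# Build two equal strings at run time, so they are not compile-time constants.
s1 = "".join(["pip", "po"] * 50)
s2 = "".join(["pip", "po"] * 50)

print(s1 == s2)  # same content
# Whether `s1 is s2` holds here is an implementation detail: do not rely on it.

# After explicit interning both results are the single registered copy.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)
```

Both lines print True: the first because the contents are equal, the second because sys.intern returns the same object for equal strings.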

If you want to check if two objects have the same content you should use the function equal?, corresponding to == in Python:

> (equal? "pippo" "pippo")
#t
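The same distinction can be shown in Python with mutable objects, where no interning can blur the picture. This is a minimal sketch with made-up variable names: two lists built separately have equal content but are different objects.

```python
a = [1, 2, 3]
b = [1, 2, 3]  # a distinct list with the same content
c = a          # a second name for the same object

print(a == b)  # content comparison, like equal?
print(a is b)  # identity comparison, like eq?
print(a is c)  # the same object under two names
```

The output is True, False, True.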

If you know the type of the objects you can use more efficient equality operators; for instance, for strings you can use string=? and for integer numbers =:

> (string=? "pippo" "pippo")
#t
> (= 42 42)
#t