A.D. Corlan > Language Benchmarks

Programming language benchmarks

Translations of this page

This page contains the same program, implemented in the same way, in C, Ada, FORTRAN, Lisp, FORTH, Java, Perl, R and Ruby and run on a 300MHz Pentium under the "woody" Debian GNU/Linux, using exclusively tools that come with the Debian distribution, except for compiling Java I used the jikes compiler from IBM.

This is relevant for applications which involve intensive computations for multiple users, such as simulators with open web access (like warfarissimo and autovaca). For such applications, performance translates directly into cost: the number of machines you need to buy and maintain in order to serve a given number of users or to achieve a required response time is directly proportional to computation time.

The program we benchmarked computes the same 100-term polynomial 500,000 times, using exactly the same algorithm. In all programs the indices of the polynomial are kept in a local float vector. In this, the program only tests the quality of code which accesses local vectors and performs simple arithmetics in loops, and is free from differences in the standard library, operating system calls and, indeed, the presence of any advanced language features.

There are two versions of the program: (1) all computations are done inside a single function body; (2) each polynomial is computed through a call to a function; the vector of the 100 coefficients are kept in a statically dimensioned array inside this function; it is called 500000 times.

The execution speed of the first version was also explored through benchmarks on AMD 64x2.

Results are below (in seconds needed to perform the simple task outlined above). Click on the link with a language name to see the details (program and how to compile and run) for each language:

naive (no hand-optimization) gcc-compiled C is some 30% slower than fortran and ada, otherwise none is affected by a detectable overhead when allocating a local vector in a function;

the gcj compiler does a good work, increasing the speed of code more than twice; however, allocation of local variables (the `pol' array for example) offsets all the gain obtained through native code compilation;

the performance of the CMU-CL lisp compiler was stellar; it was better than Java, even compiled natively, despite its highly dynamic nature; of 10.69 seconds, some 6.5 are for garbage collection.

bytecompiled Perl code is comparable in speed with interpreted R code and is not much slower than Perl code compiled natively; notice the fact that in R we use vectorial operators which also make the text very compact.

Discussion. The relevance of these tests rests in the fact that in any language, performance is governed by how quickly operations, loop, access to array elements, function calls, etc, are performed. Machine-oriented languages (C, FORTRAN, Ada and to some extent Java) perform on average 100 times faster than human-expression-oriented languages (Perl, R, Python, Ruby).

Programming a 1 GHz Pentium [fastest, 1000$, 2002 PC] in Perl is like programming a 10 MHz something [overclocked, 50$, 1982 Z80-based ZX Spectrum] in FORTRAN!

This is no big surprise as we are looking at the classical trade-off between the power of a language (allowing the programmer to express something in a compact way) and the performace of an implementation (which classically was related to the language being close to the machine representation).

However, the huge exception is CommonLisp. Lisp is the most powerful language that is, representing the classical extreme choice for expressive power instead of efficient implementation.

This choice seems to have paid off. Given enough expression power it was possible to write a compiler that competes with any machine-oriented language.

Conclusions (for me):

For new projects in which one is free to choose the language, one should choose Lisp. An example is the offline generation of web views (such as this) of complex, heterogenous databases where there are two requirements: quick implementation of new ideas and a preference for updating the static web site in 10 minutes rather than 15 hours.

If efficiency is very critical and only if the application must be multithreaded, one should choose Ada. Multithreaded, online simulation servers, like the ones mentioned above, are typical examples.

Other languages, with which one might be highly familiar due to historical reasons, could become efficient languages by being translated into CommonLisp and compiled with CMU-CL.

Appendix. Details of each program and how to compile and run it:

tespol.f

program tespol dimension pol(100) real pol integer i,j,n real su,pu,mu real x n = 500000 x = 0.2 mu = 10.0 pu = 0.0 do i = 1,n do j=1,100 mu = (mu + 2.0) / 2.0 pol(j) = mu enddo su = 0.0 do j=1,100 su = x * su + pol(j) enddo pu = pu + su enddo write (*,*) pu end

f77 tespol.f -O6 -o tespol time ./tespol

tespol2.f

function dopoly(x) real x real su,mu integer j dimension pol(100) real pol do j=1,100 mu = (mu + 2.0) / 2.0 pol(j) = mu enddo su = 0.0 do j=1,100 su = x * su + pol(j) enddo dopoly = su end program tespol integer i real pu real x n = 500000 x = 0.2 mu = 10.0 pu = 0.0 do i = 1,n pu = pu + dopoly(x) enddo write (*,*) pu end

f77 tespol2.f -O6 -o tespol2 time ./tespol2

tpol.adb

with Ada.Command_Line; use Ada.Command_Line; with Ada.Text_Io; use Ada.Text_Io; procedure Tpol is Pol: array(1..100) of Float; N: Integer:= Integer'Value(Argument(1)); X: Float:= Float'Value(Argument(2)); S: Float; Mu: Float:= 10.0; Pu: Float:= 0.0; begin for I in 1..N loop for J in 1..100 loop Mu := (Mu + 2.0) / 2.0; Pol(J) := Mu; end loop; S := 0.0; for J in 1..100 loop S := X * S + Pol(J); end loop; Pu := Pu+S; end loop; Put_Line(Float'Image(Pu)); end Tpol;

Compile and run with:

gnatmake -O6 tpol time ./tpol 500000 0.2

tpol2.adb

with Ada.Command_Line; use Ada.Command_Line; with Ada.Text_Io; use Ada.Text_Io; procedure Tpol2 is N: Integer:= Integer'Value(Argument(1)); X: Float:= Float'Value(Argument(2)); Pu: Float:= 0.0; function Dopol(X: Float) return Float is Pol: array(1..100) of Float; S: Float; Mu: Float:= 10.0; begin for J in 1..100 loop Mu := (Mu + 2.0) / 2.0; Pol(J) := Mu; end loop; S := 0.0; for J in 1..100 loop S := X * S + Pol(J); end loop; return S; end Dopol; begin for I in 1..N loop Pu := Pu+Dopol(X); end loop; Put_Line(Float'Image(Pu)); end Tpol2;

Compile and run with:

gnatmake -O6 tpol2 time ./tpol2 500000 0.2

public class tpoly { static public void main(String argv[]) { float mu = (float)10.0; float x,s; float pu = (float)0.0; int su, i, j, n; float pol[] = new float[100]; n = 500000; x = (float)0.2; for(i=0; i<n; i++) { for (j=0; j<100; j++) { mu = (mu + (float)2.0) / (float)2.0; pol[j] = mu; } s = (float)0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } pu += s; } System.out.println(pu); } }

gcj-3.0 --main=tpoly --classpath=/usr/share/java/libgcj.jar -O2 -o tpoly tpoly.java time ./tpoly

jikes -O tpoly.java time java tpoly

public class tpoly2 { static float dopoly(float x) { float pol[] = new float[100]; int j; float mu = (float)10.0; float s; for (j=0; j<100; j++) { mu = (mu + (float)2.0) / (float)2.0; pol[j] = mu; } s = (float)0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } return s; } static public void main(String argv[]) { float x; float pu = (float)0.0; int i, n; n = 500000; x = (float)0.2; for(i=0; i<n; i++) { pu += dopoly(x); } System.out.println(pu); } }

gcj-3.0 --main=tpoly --classpath=/usr/share/java/libgcj.jar -O2 -o tpoly2 tpoly2.java time ./tpoly2

jikes -O tpoly2.java time java tpoly2

C

tepol.c

#include <stdio.h> #include <stdlib.h> main(short argc, char **argv) { float mu = 10.0; float x,s; float pu = 0.0; int su, i, j, n; float pol[100]; n = atol(argv[1]); x = atof(argv[2]); for(i=0; i<n; i++) { for (j=0; j<100; j++) { pol[j] = mu = (mu + 2.0) / 2.0; } s = 0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } pu += s; } printf("%f

",pu); }

gcc -O6 tepol.c -o tepol time ./tepol 500000 0.2

tepol2.c

#include <stdio.h> #include <stdlib.h> float dopoly(float x) { float mu = 10.0; float s; int j; float pol[100]; for (j=0; j<100; j++) { pol[j] = mu = (mu + 2.0) / 2.0; } s = 0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } return s; } main(short argc, char **argv) { float x; float pu = 0.0; int i, n; n = atol(argv[1]); x = atof(argv[2]); for(i=0; i<n; i++) { pu += dopoly(x); } printf("%f

",pu); }

gcc -O6 tepol2.c -o tepol2 time ./tepol2 500000 0.2

Hand optimised C

bench1.c (by Mihai Manolescu)

#include <stdio.h> #include <stdlib.h> main(short argc, char **argv) { float mu = 10.0; float x,s; float pu = 0.0; int su, i, j, n; float pol[100]; register tm1; register int tp1; n = atol(argv[1]); x = atof(argv[2]); for(i=0; i<n; i++) { tp1=2; tm1=1/2.0; for (j=0; j<100; j++) { pol[j] = mu = (mu + tp1) * tm1; } s = 0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } pu += s; } printf("%f

",pu); }

Common-Lisp

testpol.lisp

(defun eval-pol (n x) (declare (fixnum n) (single-float x)) (let ((su 0.0) (mu 10.0) (pu 0.0) (pol (make-array 100 :element-type 'single-float))) (declare (single-float su) (single-float mu) (single-float pu)) (dotimes (i n) (declare (fixnum i)) (setf su 0.0) (dotimes (j 100) (declare (fixnum j)) (setf mu (the single-float (/ (+ mu 2.0) 2.0))) (setf (aref pol j) (the single-float mu))) (dotimes (j 100) (declare (fixnum j)) (setf su (the single-float (+ (aref pol j) (the single-float (* su x)))))) (setf pu (the single-float (+ pu su))) ) (prin1 pu) ))

lisp

(load "testpol.lisp") (compile #'eval-pol) (time (eval-pol 500000 0.2))

testpol2.lisp

(proclaim '(optimize)) (declaim (start-block dopoly eval-pol)) (defun dopoly (x) (declare (ftype (function (single-float) single-float) dopoly)) (declare (single-float x)) (let ((su 0.0) (mu 10.0) (pol (make-array 100 :element-type 'single-float))) (declare (single-float su) (single-float mu)) (dotimes (j 100) (declare (fixnum j)) (setf mu (the single-float (/ (the single-float (+ mu 2.0)) 2.0))) (setf (aref pol j) (the single-float mu))) (dotimes (j 100) (declare (fixnum j)) (setf su (the single-float (+ (aref pol j) (the single-float (* su x)))))) su) ) (defun eval-pol (n x) (declare (fixnum n) (single-float x)) (let ((pu 0.0)) (declare (single-float pu)) (dotimes (i n) (declare (fixnum i)) (setf pu (the single-float (+ pu (the single-float (dopoly x))))) ) (prin1 pu) )) (declaim (end-block))

lisp

(load "testpol2.lisp") (compile #'eval-pol) (time (eval-pol 500000 0.2))

D

tepol.d (by Mihai Militaru)

//tepol.d import std.stdio; import std.conv; int main(char[][] args) { float mu = 10.0; float x,s; float pu = 0.0; int su, i, j, n; float pol[100]; n = toLong(args[1]); x = toFloat(args[2]); for(i=0; i<n; i++) { for (j=0; j<100; j++) { pol[j] = mu = (mu + 2.0) / 2.0; } s = 0.0; for (j=0; j%lt;100; j++) { s = x*s + pol[j]; } pu += s; } writefln("%f

",pu); return 0; }

gdmd -O -release -inline tepol.d

time ./tepol 500000 0.2

tepol2.d (by Mihai Militaru)

//tepol2.d import std.stdio; import std.conv; float dopoly(float x) { float mu = 10.0; float s; int j; float pol[100]; for (j=0; j<100; j++) { pol[j] = mu = (mu + 2.0) / 2.0; } s = 0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } return s; } int main(char[][] args) { float x; float pu = 0.0; int i, n; n = toLong(args[1]); x = toFloat(args[2]); for(i=0; i<n; i++) { pu += dopoly(x); } printf("%f

",pu); return 0; }

gdmd -O -release -inline tepol2.d

time ./tepol2 500000 0.2

Python

tpytpol.py (by Mihai Manolescu)

n = 500000 x = 0.2 def t(x): mu = 10.0 pu = 0.0 pol =[0] * 100 r = range(0,100) for i in range(0,n): for j in r: pol[j] = mu = (mu + 2.0) / 2.0 su = 0.0 for j in r: su = x * su + pol[j] pu = pu + su print pu t(x)

time python tpytpol.py

Perl

tperlpol.pl (by Matei Conovici)

#! /usr/bin/perl $n = $ARGV[0]; $x = $ARGV[1]; $mu = 10; $pu = 0; @pol = (); for ($i = 0; $i < $n; $i++) { for ($j = 0; $j < 100; $j++) { $mu = ($mu + 2) / 2; $pol[$j] = $mu; } $s = 0; for ($j = 0; $j < 100; $j++) { $s = $x * $s + $pol[$j]; } $pu += $s; } print "$pu

";

perlcc -O -o tperlpol tperlpol.pl time ./tperlpol 500000 0.2

perlcc -B -o tperlpol tperlpol.pl time ./tperlpol 500000 0.2

Perl (hand optimized)

pl.pl (by zgrim )

my($n,$x,$mu,$pu,@pol)=(500000,0.2,10,0,(0..99)); for(1..$n) { for (0..$#pol) { $pol[$_] = $mu = ( $mu + 2 ) * 0.5 } $s = 0; for(0..$#pol) { $s = $x * $s + $pol[$_] } $pu += $s } print qq[$pu

];

perlcc -B -o pl pl.pl time ./pl

Perl (hand optimized for native compilation)

perl3.pl by Radu Greab

#!/usr/bin/perl -w use strict; my $n = $ARGV[0]; my $x = $ARGV[1]; my $mu = 10; my $pu = 0; my @pol; foreach (0 .. $n - 1) { foreach (0 .. 99) { $pol[$_] = $mu = ($mu + 2) / 2; } my $s = 0; foreach (0 .. 99) { $s = $x * $s + $pol[$_]; } $pu += $s; } print "$pu

";

perlcc -O -o perl3 perl3.pl time ./perl3

R (or S, Splus)

trpol2.R