Following the interest in this question, I thought it would be interesting to make answers a bit more objective and quantitative by proposing a contest.

The idea is simple: I have generated a binary file containing 50 million Gaussian-distributed doubles (mean 0, standard deviation 1). The goal is to write a program that sorts these in memory as fast as possible. A very simple reference implementation in Python takes 1m4s to complete. How low can we go?

The rules are as follows: answer with a program that opens the file "gaussian.dat" and sorts the numbers in memory (no need to output them), along with instructions for building and running it. The program must work on my Arch Linux machine (meaning you can use any programming language or library that is easily installable on this system).

The program must be reasonably readable, so that I can make sure it is safe to launch (no assembler-only solution, please!).

I will run the answers on my machine (quad core, 4 Gigabytes of RAM). The fastest solution will get the accepted answer and a 100 points bounty :)

The program used to generate the numbers:

    #!/usr/bin/env python
    import random
    from array import array
    from sys import argv

    count = int(argv[1])
    a = array('d', (random.gauss(0, 1) for x in xrange(count)))
    f = open("gaussian.dat", "wb")
    a.tofile(f)

The simple reference implementation:

    #!/usr/bin/env python
    from array import array
    from sys import argv

    count = int(argv[1])
    a = array('d')
    a.fromfile(open("gaussian.dat", "rb"), count)
    print "sorting..."
    b = sorted(a)
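The snippets above are Python 2 (`xrange`, the `print` statement). For anyone following along on a current system, here is a sketch of the same reference reader in Python 3, wrapped in a function for easier testing; the function name is mine, not part of the contest.

```python
# Python 3 sketch of the reference implementation above.
# load_and_sort is an illustrative name, not from the original post.
from array import array

def load_and_sort(path, count):
    a = array('d')
    with open(path, "rb") as f:  # binary mode: the file holds raw doubles
        a.fromfile(f, count)
    print("sorting...")
    return sorted(a)
```

Reading in binary mode matters: the file is raw IEEE-754 doubles, not text.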

EDIT: only 4 GB of RAM, sorry

EDIT#2: Note that the point of the contest is to see if we can use prior information about the data. It is not supposed to be a pissing match between different programming language implementations!
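To illustrate the kind of prior-information trick the contest is after (this is my own sketch, not a submitted answer): since the data is known to be standard normal, mapping each value through the normal CDF gives roughly uniform bucket occupancy, so a bucket sort with small per-bucket sorts can beat a general comparison sort. The names (`gaussian_bucket_sort`, `nbuckets`) are illustrative.

```python
# Distribution-aware bucket sort sketch: the standard normal CDF maps
# Gaussian values to (0, 1) roughly uniformly, so buckets fill evenly.
import math
import random

def cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_bucket_sort(values, nbuckets=1024):
    buckets = [[] for _ in range(nbuckets)]
    for v in values:
        # cdf(v) lies in (0, 1); scale it to a bucket index.
        i = min(int(cdf(v) * nbuckets), nbuckets - 1)
        buckets[i].append(v)
    out = []
    for b in buckets:
        b.sort()  # each bucket is small, so these sorts are cheap
        out.extend(b)
    return out
```

Because the CDF is monotonic, concatenating the sorted buckets in order yields a fully sorted list; a real contest entry would do this in parallel over a preallocated array rather than with Python lists.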