Common Lisp Series package

Introduction

Richard C. Waters's SERIES package for Common Lisp introduces the series data type, which represents lazy sequences, together with a set of operations on it. The interesting thing about this package is that it lets you replace many typical loop constructs with a more functional style without losing performance.

Restrictions

The idea is to use the CL macro system to process a composition of expressions into a call graph and then generate plain loops from it. Of course, this can only be done for the subset of expressions that satisfy the following restrictions on optimizable expressions:

- Expressions must be statically analyzable
- Expressions must be straight-line computations
- Procedures called by expressions must be preorder
- Intermediate values in computations must be sequences
- Every non-directed data-flow in expression must be on-line

The good thing about these restrictions is that the SERIES package explicitly checks them during the code-walking phase and issues a warning if some expression cannot be optimized. In that case the developer has an opportunity to optimize it manually or to restructure it into an optimizable expression.

How to install and use

Just use

    (ql:quickload "series")

to install.

To use it, one can either call the functions directly from the series package or call series::install, which sets up the necessary redefinitions of defun, let, etc. so that the code is analyzed automatically, and also sets up helper reader macros:

    (eval-when (:compile-toplevel :execute :load-toplevel)
      (series::install))
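A minimal sketch of the first option, calling the operators directly through the package prefix without series::install. It assumes Quicklisp is available; SQUARES-OF is a hypothetical helper name chosen for illustration and is not part of SERIES.

```lisp
;; Assumes Quicklisp is installed; loading at compile time as well lets
;; the reader resolve the SERIES: prefix in the forms below.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "series"))

;; SQUARES-OF is a hypothetical helper. The operators are reached through
;; the SERIES: prefix, so nothing in the current package is redefined.
(defun squares-of (lst)
  (series:collect
    (series:mapping ((x (series:scan lst)))
      (* x x))))

(squares-of '(1 2 3)) ; => (1 4 9)
```

The prefix form is handy for trying things out in an existing package; series::install is the more convenient choice when a whole file is written in series style.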

Performance

To test this, I compared SERIES against purely functional implementations of some extremely simple functions. All measurements were made with LispWorks 7.0 32-bit for Mac OS X. In each case I compare a purely functional implementation of the algorithm with a SERIES implementation, and include a simple loop implementation for reference.

Sum of the list

The first, trivial task is to calculate the sum of the squares of the numbers in a list. The purely functional implementation:

    (defun sum-squares-cl (lst)
      (declare (optimize (safety 0) (speed 3)))
      (reduce #'+ (mapcar (lambda (x) (expt x 2)) lst)))

The purely functional implementation is a reduce over the list produced by mapping the squaring function over the input. The loop-based implementation looks like this:

    (defun sum-squares-loop2 (lst)
      (declare (optimize (safety 0) (speed 3)))
      (loop for x in lst summing (expt x 2)))

Here we use the summing clause of the loop macro, which avoids introducing an accumulator variable.

The implementation using SERIES package:

    (defun sum-squares-series2 (lst)
      (declare (optimize (safety 0) (speed 3)))
      (collect-sum
       (mapping ((x (scan lst)))
         (expt x 2))))

Here the list lst is converted into a series by the scan function. The mapping macro takes a list of let-like bindings as its first argument and the body of the function as its second. collect-sum sums the resulting series. A more generic implementation is below:

    (defun sum-squares-series1 (lst)
      (declare (optimize (safety 0) (speed 3)))
      (collect-fn 'integer
                  (lambda () 0)
                  #'+
                  (mapping ((x (scan lst)))
                    (expt x 2))))

The more generic version uses the collect-fn function, the building block for accumulation functions over series expressions.
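To make the individual stages concrete, here is a small sketch that runs each of them on a three-element list and then uses collect-fn with a different accumulator. It assumes Quicklisp is available and uses package-qualified names instead of series::install; PRODUCT-OF-SQUARES is a hypothetical name used only for illustration.

```lisp
;; Assumes Quicklisp is installed.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "series"))

;; Stage 1: SCAN turns a list into a series; COLLECT turns it back.
(series:collect (series:scan '(1 2 3)))        ; => (1 2 3)

;; Stage 2: MAPPING applies its body to every element.
(series:collect
  (series:mapping ((x (series:scan '(1 2 3))))
    (expt x 2)))                               ; => (1 4 9)

;; Stage 3: COLLECT-SUM reduces the series with +.
(series:collect-sum
  (series:mapping ((x (series:scan '(1 2 3))))
    (expt x 2)))                               ; => 14

;; COLLECT-FN takes a result type, a zero-argument function producing
;; the initial value, and a two-argument accumulation function.
;; PRODUCT-OF-SQUARES is a hypothetical example, not part of SERIES.
(defun product-of-squares (lst)
  (series:collect-fn 'integer
                     (lambda () 1)
                     #'*
                     (series:mapping ((x (series:scan lst)))
                       (expt x 2))))

(product-of-squares '(1 2 3)) ; => 36
```

Swapping the initial value and the accumulation function is all it takes to get a different reduction, which is why collect-fn serves as the general form behind collectors such as collect-sum.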

Let's generate test data using alexandria's iota function, which generates a list of integers from 0 up to (but not including) its argument (thanks to Steve Losh for the suggestion):

    (defparameter *data* (loop :for i :below 4000 :collect (iota i)))
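To get a feel for the shape and size of this data set, the sketch below builds a four-list version of it and counts the total number of elements in the full one. IOTA-SKETCH is a hypothetical stand-in for a single-argument call to alexandria's iota, so that the snippet needs no libraries; *SMALL-DATA* and TOTAL-ELEMENTS are likewise illustrative names.

```lisp
;; IOTA-SKETCH is a hypothetical stand-in for alexandria:iota called
;; with a single argument: the integers 0, 1, ..., n-1.
(defun iota-sketch (n)
  (loop for i from 0 below n collect i))

;; Same construction as *data*, but with 4 lists instead of 4000.
(defparameter *small-data*
  (loop for i below 4 collect (iota-sketch i)))
;; *small-data* => (NIL (0) (0 1) (0 1 2))

;; Total number of elements across COUNT lists: 0 + 1 + ... + (count - 1).
(defun total-elements (count)
  (loop for i below count sum i))

(total-elements 4000) ; => 7998000
```

So each timed run processes 4000 lists of lengths 0 through 3999, roughly 8 million elements in total.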

A simple comparison gives the following results:

    SERIES-TEST 4 > (time (loop for d in *data* do (sum-squares-cl d)))
    Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-CL D))
    User time = 1.870
    System time = 0.002
    Elapsed time = 1.858
    Allocation = 155486812 bytes
    0 Page faults
    Calls to %EVAL 76022
    NIL

    SERIES-TEST 5 > (time (loop for d in *data* do (sum-squares-loop2 d)))
    Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-LOOP2 D))
    User time = 0.928
    System time = 0.001
    Elapsed time = 0.908
    Allocation = 59525512 bytes
    1 Page faults
    Calls to %EVAL 76022
    NIL

    SERIES-TEST 6 > (time (loop for d in *data* do (sum-squares-series1 d)))
    Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-SERIES1 D))
    User time = 0.861
    System time = 0.001
    Elapsed time = 0.843
    Allocation = 59521180 bytes
    0 Page faults
    Calls to %EVAL 76022
    NIL

    SERIES-TEST 7 > (time (loop for d in *data* do (sum-squares-series2 d)))
    Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-SERIES2 D))
    User time = 0.943
    System time = 0.002
    Elapsed time = 0.927
    Allocation = 59518424 bytes
    0 Page faults
    Calls to %EVAL 76022
    NIL

From these results one can see that the purely functional implementation is the slowest, while both series versions perform about the same as the version using the loop macro.

Let’s try to macroexpand the series implementation of this function:

    SERIES-TEST 8 > (pprint (macroexpand '(collect-sum (mapping ((x (scan lst))) (expt x 2)))))

    (COMMON-LISP:LET* ((#7=#:OUT-86150 LST))
      (COMMON-LISP:LET (#8=#:ELEMENTS-86142
                        #1=#:LISTPTR-86144
                        #6=#:TEMP-86145
                        (#2=#:LIMIT-86146 0)
                        (#3=#:INDEX-86147 -1)
                        #4=#:LSTP-86148
                        #9=#:ITEMS-86152
                        (#5=#:SUM-86139 0))
        (DECLARE (TYPE LIST #1#)
                 (TYPE SERIES::VECTOR-INDEX+ #2#)
                 (TYPE SERIES::-VECTOR-INDEX #3#)
                 (TYPE BOOLEAN #4#)
                 (TYPE NUMBER #5#))
        (LOCALLY
          (DECLARE (TYPE ARRAY #6#))
          (IF (SETQ #4# (LISTP #7#))
              (SETQ #1# #7# #6# #())
              (LOCALLY
                (DECLARE (TYPE ARRAY #7#))
                (SETQ #6# #7#)
                (SETQ #2# (SERIES::ARRAY-FILL-POINTER-OR-TOTAL-SIZE #7#))))
          (TAGBODY
           #10=#:LL-86153
            (IF #4#
                (PROGN
                  (IF (ENDP #1#) (GO SERIES::END))
                  (SETQ #8# (CAR #1#))
                  (SETQ #1# (CDR #1#)))
                (PROGN
                  (INCF #3#)
                  (LOCALLY
                    (DECLARE (TYPE ARRAY #7#) (TYPE SERIES::VECTOR-INDEX #3#))
                    (IF (>= #3# #2#) (GO SERIES::END))
                    (SETQ #8# (THE SERIES::*TYPE* (ROW-MAJOR-AREF #7# #3#))))))
            (SETQ #9# ((LAMBDA (X) (EXPT X 2)) #8#))
            (SETQ #5# (+ #5# #9#))
            (GO #10#)
           SERIES::END)
          #5#)))

Everything up to the TAGBODY form is declarations and setup. As one can see, the actual algorithm starts at TAGBODY, with the iteration label #10 (LL-86153) and the end-of-loop label SERIES::END.
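Stripped of the vector branch and the type declarations, the list path of the expansion above amounts to the following hand-written sketch. SUM-SQUARES-TAGBODY and its variable names are illustrative only; this is not output produced by SERIES.

```lisp
;; Hand-written sketch of the list branch only; SUM-SQUARES-TAGBODY and
;; its variable names are illustrative, not SERIES output.
(defun sum-squares-tagbody (lst)
  (let ((listptr lst)   ; remaining part of the list (LISTPTR in the expansion)
        (element nil)   ; current element (ELEMENTS in the expansion)
        (sum 0))        ; running total (SUM in the expansion)
    (tagbody
     next
       (when (endp listptr) (go end))       ; termination test
       (setq element (car listptr))         ; fetch the current element
       (setq listptr (cdr listptr))         ; advance the list pointer
       (setq sum (+ sum (expt element 2)))  ; MAPPING body, then the summation
       (go next)
     end)
    sum))

(sum-squares-tagbody '(1 2 3)) ; => 14
```

No intermediate list is built anywhere in this loop, which is consistent with the series versions allocating about as much as the loop version in the timings.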

Sum of the list with conditions

Now let's evolve this example. The task is to calculate the sum of the first 5 squares that are above 50. For the purely functional implementation we need to add a take-first-n function, which takes the first N elements of a list (basically a wrapper around subseq):

    (defun take-first-n (lst n)
      (declare (optimize (safety 0) (speed 3)))
      (let ((l (length lst)))
        (subseq lst 0 (min n l))))

Now the implementation is trivial:

    (defun sum-squares-of-5-above-50-cl (lst)
      (declare (optimize (safety 0) (speed 3)))
      (reduce #'+
              (take-first-n
               (remove-if (lambda (x) (< x 50))
                          (mapcar (lambda (x) (expt x 2)) lst))
               5)))

The obvious problem with this implementation is that it creates 3 intermediate lists: the first with mapcar, the next with remove-if, and the last with take-first-n.

The implementation with SERIES will look similar but will not need the wrapper function:

    (defun sum-squares-of-5-above-50-series (lst)
      (declare (optimize (safety 0) (speed 3)))
      (collect-sum
       (subseries
        (choose-if (lambda (x) (>= x 50))
                   (mapping ((x (scan lst)))
                     (expt x 2)))
        0 5)))
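The two new operators here are choose-if, which filters a series by a predicate, and subseries, which selects a range of elements by index. The sketch below shows each on a small list and then runs the same pipeline as the function above. It assumes Quicklisp is available and uses package-qualified names; SUM-FIRST-5-SQUARES-ABOVE-50 is a hypothetical name for this illustration.

```lisp
;; Assumes Quicklisp is installed.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "series"))

;; CHOOSE-IF keeps only the elements satisfying the predicate.
(series:collect
  (series:choose-if #'evenp (series:scan '(1 2 3 4 5 6 7 8)))) ; => (2 4 6 8)

;; SUBSERIES with start 0 and end 2 keeps the first two of those.
(series:collect
  (series:subseries
    (series:choose-if #'evenp (series:scan '(1 2 3 4 5 6 7 8)))
    0 2))                                                      ; => (2 4)

;; The same pipeline as in the article, under a hypothetical name.
(defun sum-first-5-squares-above-50 (lst)
  (series:collect-sum
    (series:subseries
      (series:choose-if (lambda (x) (>= x 50))
                        (series:mapping ((x (series:scan lst)))
                          (expt x 2)))
      0 5)))

;; Squares of 8..12 are 64, 81, 100, 121, 144; 169 is the sixth and is skipped.
(sum-first-5-squares-above-50 '(1 2 3 4 5 6 7 8 9 10 11 12 13)) ; => 510
```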

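The loop-based implementation is not as easily readable:

    (defun sum-squares-of-5-above-50-naive (lst)
      (declare (optimize (safety 0) (speed 3)))
      (let ((sum 0)
            (counter 0))
        ;; COUNTER is bound outside the LOOP so that it is not reset on
        ;; every iteration; WHILE stops once five squares have been summed.
        (loop for x in lst
              for y = (expt x 2)
              while (< counter 5)
              when (>= y 50)
                do (incf counter)
                   (incf sum y))
        sum))

Here we need an intermediate variable, counter, to be able to break out of the loop. Note that a clause such as for counter = 0 would re-evaluate its expression on every iteration, so the counter has to be bound outside the iteration clauses. Let's measure the performance: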

    SERIES-TEST 9 > (time (loop for d in *data* do (sum-squares-of-5-above-50-cl d)))
    Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-OF-5-ABOVE-50-CL D))
    User time = 1.233
    System time = 0.187
    Elapsed time = 1.856
    Allocation = 194243052 bytes
    3006 Page faults
    Calls to %EVAL 76022
    NIL

    SERIES-TEST 10 > (time (loop for d in *data* do (sum-squares-of-5-above-50-series d)))
    Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-OF-5-ABOVE-50-SERIES D))
    User time = 0.107
    System time = 0.003
    Elapsed time = 0.088
    Allocation = 2419888 bytes
    37 Page faults
    Calls to %EVAL 76022
    NIL

    SERIES-TEST 11 > (time (loop for d in *data* do (sum-squares-of-5-above-50-naive d)))
    Timing the evaluation of (LOOP FOR D IN *DATA* DO (SUM-SQUARES-OF-5-ABOVE-50-NAIVE D))
    User time = 0.036
    System time = 0.000
    Elapsed time = 0.019
    Allocation = 2399620 bytes
    0 Page faults
    Calls to %EVAL 76022
    NIL

As can be seen, the functional implementation is more than 10 times slower than the series and loop-based implementations.

Conclusion

It is preferable to use the SERIES package instead of the purely functional approach when possible, since it keeps the code just as readable and maintainable without giving up performance. Obviously, not all algorithms can be implemented with SERIES; the authors of the SERIES package claim, however, that 80% of the loops they analyzed could be implemented with it.

Updated test results after Steve Losh’s suggestion.