Consider the following exercise (Ex. 1.5 in “Paradigms of Artificial Intelligence Programming” by Peter Norvig):

Write a function to compute the dot product of two sequences of numbers, represented as lists. The dot product is computed by multiplying corresponding elements and then adding up the resulting products. Example:

(dot-product '(10 20) '(3 4)) => 110

Here are four implementations (the first one is mine, the other three are the solutions to the exercise in the book):

1. My solution: Recursion with accumulator — tail call optimized (TCO)

(defun dot-product (lst1 lst2 &optional (acc 0))
  "Computes the dot product of two sequences, represented as lists."
  (if (not lst1)
      acc
      (let ((x (first lst1))
            (y (first lst2))
            (lst11 (rest lst1))
            (lst22 (rest lst2)))
        (dot-product lst11 lst22 (+ acc (* x y))))))
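Note that this version tests only lst1 for exhaustion, so it assumes both lists have the same length (as the exercise implies); if lst2 were shorter, (* x nil) would signal a type error. A minimal sketch of a variant that stops at the shorter list (dot-product* is my name, not part of the exercise):

```lisp
;; Accumulator version that stops when either list runs out.
(defun dot-product* (lst1 lst2 &optional (acc 0))
  (if (or (endp lst1) (endp lst2))
      acc
      (dot-product* (rest lst1) (rest lst2)
                    (+ acc (* (first lst1) (first lst2))))))

(dot-product* '(10 20) '(3 4)) ; => 110
```

Like the original, the recursive call is a self tail call, so SBCL compiles it to a jump at its default optimization settings (the Common Lisp standard itself does not guarantee tail-call elimination).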

2. Solution 1 in PAIP: apply and mapcar

(defun dot-product1 (lst1 lst2)
  (apply #'+ (mapcar #'* lst1 lst2)))

3. Solution 2 in PAIP: recursive without TCO

(defun dot-product2 (lst1 lst2)
  (if (or (null lst1) (null lst2))
      0
      (+ (* (first lst1) (first lst2))
         (dot-product2 (rest lst1) (rest lst2)))))

4. Solution 3 in PAIP: iteration and indexing

(defun dot-product3 (lst1 lst2)
  (let ((sum 0))
    (dotimes (i (length lst1))
      (incf sum (* (elt lst1 i) (elt lst2 i))))
    sum))

Performance

We test the solutions like this:

Set up some test data:

CL-USER> (defparameter *a* (make-list 100000 :initial-element 1))

dot-product: Use the time macro to measure the performance of the function dot-product:

CL-USER> (time (dot-product *a* *a*))
Evaluation took:
  0.002 seconds of real time
  0.001302 seconds of total run time (0.001301 user, 0.000001 system)
  50.00% CPU
  3,450,092 processor cycles
  0 bytes consed
100000

The function does not allocate any new memory, uses 50% CPU on average, and finishes in 0.001302 seconds for the 100,000-element list.

Similarly, we test the other three functions:

dot-product1:

CL-USER> (time (dot-product1 *a* *a*))
Evaluation took:
  0.003 seconds of real time
  0.002625 seconds of total run time (0.002611 user, 0.000014 system)
  100.00% CPU
  6,967,544 processor cycles
  3,205,632 bytes consed
100000

dot-product2:

CL-USER> (time (dot-product2 *a* *a*))
Control stack guard page temporarily disabled: proceed with caution
...

This one aborted because it ran out of stack space before it could complete: without tail-call optimization, every element of the list adds another stack frame.

dot-product3:

CL-USER> (time (dot-product3 *a* *a*))
Evaluation took:
  58.350 seconds of real time
  58.347480 seconds of total run time (58.330376 user, 0.017104 system)
  99.99% CPU
  155,215,653,812 processor cycles
  0 bytes consed
100000
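The slowdown comes from elt: on a list it has to walk from the head on every call, so the dotimes loop does O(n²) work in total. On a vector, elt is constant time; a hypothetical vector-based variant of the same code (dot-product3v is my name, not PAIP's):

```lisp
;; Same algorithm as dot-product3, but over vectors,
;; where elt is O(1) instead of O(n).
(defun dot-product3v (v1 v2)
  (let ((sum 0))
    (dotimes (i (length v1))
      (incf sum (* (elt v1 i) (elt v2 i))))
    sum))

(dot-product3v #(10 20) #(3 4)) ; => 110
```

The exercise asks for lists, of course; this is only to show where the quadratic cost comes from.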

So it seems my solution is about twice as fast as the fastest one in PAIP and has the advantage of not allocating any extra memory, whereas that PAIP solution conses nearly 3 MB (about 32 bytes per element). My solution is some 44,813 times faster than the PAIP solution that does not allocate extra memory. Admittedly, the PAIP solutions are not about performance but about showing how to write code in Lisp, and the author has not introduced tail-call optimization at that point in the book; so I am not criticizing PAIP at all.

Well, dot-product1 looks very concise and elegant, doesn’t it? What if we try to run it with a larger list?

CL-USER> (defparameter *a* (make-list 1000000 :initial-element 1))
*A*

And now, alas, it also exhausts the stack space:

CL-USER> (time (dot-product1 *a* *a*))
Control stack guard page temporarily disabled: proceed with caution
; Evaluation aborted on #<SB-KERNEL::CONTROL-STACK-EXHAUSTED {100341FE93}>.

Presumably this is because apply spreads the 1,000,000 list elements onto the stack as individual arguments (the standard only guarantees that apply can handle up to call-arguments-limit of them, and each one consumes stack space). Perhaps we can tweak this a bit so that the arguments are not pushed onto the stack:

(defun dot-product4 (lst1 lst2)
  (reduce #'+ (mapcar #'* lst1 lst2) :initial-value 0))

And now this function does run:

CL-USER> (time (dot-product4 *a* *a*))
Evaluation took:
  0.212 seconds of real time
  0.210735 seconds of total run time (0.182024 user, 0.028711 system)
  [ Run times consist of 0.143 seconds GC time, and 0.068 seconds non-GC time. ]
  99.53% CPU
  563,737,032 processor cycles
  47,974,560 bytes consed
1000000

Hmm… it is now consing even more bytes per element than before (about 48 bytes per element), and there is some GC overhead: reduce avoids pushing the arguments onto the stack, but mapcar still allocates the full intermediate list of products. What about my function?

CL-USER> (time (dot-product *a* *a*)) ; *a* is a list with a million elements
Evaluation took:
  0.014 seconds of real time
  0.013474 seconds of total run time (0.013462 user, 0.000012 system)
  92.86% CPU
  36,021,059 processor cycles
  0 bytes consed
1000000

Still jolly good!

Update

Svente posted another (more efficient, very readable) solution in the comments below:

(defun dot-product5 (lst1 lst2)
  (loop for element1 in lst1
        for element2 in lst2
        sum (* element1 element2)))

This code uses two for clauses that step through the lists in parallel, and an implicit local variable to hold the sum. How does it perform?

CL-USER> (time (dot-product5 *a* *a*))
Evaluation took:
  0.007 seconds of real time
  0.007470 seconds of total run time (0.007470 user, 0.000000 system)
  100.00% CPU
  19,859,546 processor cycles
  0 bytes consed
1000000

It performs very well indeed; better than my recursive solution.
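For readers less used to loop, here is a rough hand-written equivalent of dot-product5 with the accumulator made explicit (a sketch of what the parallel for clauses do, not the actual macro expansion):

```lisp
;; Explicit do-based equivalent of dot-product5's parallel loop:
;; step both lists together, accumulating the sum, and stop
;; when either list is exhausted.
(defun dot-product5-do (lst1 lst2)
  (do ((l1 lst1 (rest l1))
       (l2 lst2 (rest l2))
       (sum 0))
      ((or (endp l1) (endp l2)) sum)
    (incf sum (* (first l1) (first l2)))))

(dot-product5-do '(10 20) '(3 4)) ; => 110
```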

Summary

CPU, time, and memory taken with a list of 100,000 elements:

Solution       CPU %     Time (s)    Processor cycles/element   Bytes consed
dot-product    50.00%    0.001302    34.5                       0
dot-product1   100.00%   0.002625    69.7                       3,205,632
dot-product2   does not even run to completion
dot-product3   99.99%    58.35       ~1,552,157                 0

CPU, time, and memory taken with a list of 1,000,000 elements:

Solution       CPU %     Time (s)    Processor cycles/element   Bytes consed
dot-product    100.0%    0.010221    27.2                       0
dot-product4   99.21%    0.1259      336                        48M
dot-product5   100.0%    0.0079      21.2                       0