Last month, I published a post on Finding The Intersection of Two Lists. The impetus for the post was an Emacs Stack Exchange question asking if there was a better way to find the intersection of two lists than looping over both of them. I thought putting the entries of the first list in a hash table and checking the entries of the second list against it was a better solution and I wrote some code to check out that hypothesis. You can see the details in the original post.

I got some gentle pushback from two Elisp luminaries, Chris Wellons and Alphapapa. Wellons noted that I could get a small speed increase by using dolist instead of mapc . More importantly, he noted that I should have compiled my code with lexical mode as that makes a large difference in running time.

Alphapapa, commenting as NoonianAtall, noted that I wasn’t eliminating duplicates and pointed me at some macros that make benchmarking this type of thing easier. He published a gist that showed one of those macros in action.

Naturally, I had to check all this out. I wrote a new version of hash-intersection that used dolist instead of mapc and also made a small change ( hash-intersection-dedup ) to the algorithm so that duplicates were eliminated. Finally, I stole Alphapapa’s version of the code that uses the loop macro instead dolist .

Here’s all the code, including the bit to generate random lists, for reference:

( require ' epdh ) ( require ' dash-functional ) ( defun make-random-list (n) ( let ((l nil)) ( dotimes (i n l) ( push (random 1500) l)))) ( defconst lst1 (make-random-list 1000)) ( defconst lst2 (make-random-list 1000)) ( defun hash-intersection-orig (l1 l2) ( let ((ht (make-hash-table :test #'equal)) (acc nil)) (mapc ( lambda (x) (puthash x t ht)) l1) (mapc ( lambda (x) ( if (gethash x ht nil) ( push x acc))) l2) acc)) ( defun hash-intersection-dolist (l1 l2) ( let ((ht (make-hash-table :test #'equal)) (acc nil)) ( dolist (l l1) (puthash l t ht)) ( dolist (l l2) ( if (gethash l ht nil) ( push l acc))) acc)) ( defun hash-intersection-dedup (l1 l2) ( let ((ht (make-hash-table :test #'equal)) (acc nil)) ( dolist (l l1) (puthash l t ht)) ( dolist (l l2) ( when (gethash l ht nil) (puthash l nil ht) ; only one to a customer ( push l acc))) acc)) ;; From Alphapapa ( defun hash-intersection-loop (l1 l2) ( let ((ht (make-hash-table :test #'equal) )) ( cl-loop for e1 in l1 do (puthash e1 t ht)) ( cl-loop for e2 in l2 when (gethash e2 ht) collect it)))

Now we’re ready to test Wellons’ claims. First, I used the bench-dynamic-vs-lexical-binding macro to measure the difference that compiling with lexical bindings made:

( bench-dynamic-vs-lexical-binding :times 100 :forms (( "hash-intersection-orig" ( progn ( defun hash-intersection-orig (l1 l2) ( let ((ht (make-hash-table :test #'equal)) (acc nil)) (mapc ( lambda (x) (puthash x t ht)) l1) (mapc ( lambda (x) ( if (gethash x ht nil) ( push x acc))) l2) acc)) (hash-intersection-orig lst1 lst2)))))

Form x faster than next Total runtime # of GCs Total GC runtime Lexical: hash-intersection-orig 1.25 0.027208 0 0 Dynamic: hash-intersection-orig slowest 0.033979 0 0

Just as Wellons promised, the lexical version is 25% faster. Next, I compared all the implementations (as well as seq-difference ) compiled with lexical bindings. There were a couple of surprises. First, the deduping version ran slightly faster than the dolist version. That was a pretty consistent result across several runs of the comparison. The only reason for it that I can see is that eliminates the push for all the duplicates.

A bit more surprising is that the dolist version ran 28% faster than the mapc version. That seems pretty high but the numbers did jump around a bit between different runs so perhaps it’s just an anomaly. If you want to play around with Alphapapa’s macros (they’re here) that would be a good experiment to run. There are more macros than I’ve mentioned here so they should meet whatever your benchmarking needs are.

( bench-multi-lexical :times 100 :forms (( "hash-intersection-orig" ( progn ( defun hash-intersection-orig (l1 l2) ( let ((ht (make-hash-table :test #'equal)) (acc nil)) (mapc ( lambda (x) (puthash x t ht)) l1) (mapc ( lambda (x) ( if (gethash x ht nil) ( push x acc))) l2) acc)) (hash-intersection-orig lst1 lst2))) ( "hash-intersection-dolist" ( progn ( defun hash-intersection-dolist (l1 l2) ( let ((ht (make-hash-table :test #'equal)) (acc nil)) ( dolist (l l1) (puthash l t ht)) ( dolist (l l2) ( if (gethash l ht nil) ( push l acc))) acc)) (hash-intersection-dolist lst1 lst2))) ( "seq-intersection" (seq-intersection lst1 lst2)) ( "hash-intersection-loop" ( progn ( defun hash-intersection-loop (l1 l2) ( let ((ht (make-hash-table :test #'equal) )) ( cl-loop for e1 in l1 do (puthash e1 t ht)) ( cl-loop for e2 in l2 when (gethash e2 ht) collect it))) (hash-intersection-loop lst1 lst2))) ( "hash-intersection-dedup" ( progn ( defun hash-intersection-dedup (l1 l2) ( let ((ht (make-hash-table :test #'equal)) (acc nil)) ( dolist (l l1) (puthash l t ht)) ( dolist (l l2) ( when (gethash l ht nil) (puthash l nil ht) ; only one to a customer ( push l acc))) acc)) (hash-intersection-dedup lst1 lst2)))))

Form x faster than next Total runtime # of GCs Total GC runtime hash-intersection-dedup 1.02 0.021943 0 0 hash-intersection-loop 1.09 0.022341 0 0 hash-intersection-dolist 1.28 0.024367 0 0 hash-intersection-orig 315.94 0.031197 0 0 seq-intersection slowest 9.856230 0 0

Finally, I should mention that the macros are part of Alphapapa’s Emacs Package Developer’s Handbook, which has a wealth of useful information—even if you aren’t writing packages—and is definitely worth taking a look at if you haven’t already.