Last month, I published a post on Finding The Intersection of Two Lists. The impetus for the post was an Emacs Stack Exchange question asking if there was a better way to find the intersection of two lists than looping over both of them. I thought putting the entries of the first list in a hash table and checking the entries of the second list against it was a better solution and I wrote some code to check out that hypothesis. You can see the details in the original post.
I got some gentle pushback from two Elisp luminaries, Chris Wellons and Alphapapa. Wellons noted that I could get a small speed increase by using dolist
instead of mapc
. More importantly, he noted that I should have compiled my code with lexical mode as that makes a large difference in running time.
Alphapapa, commenting as NoonianAtall, noted that I wasn’t eliminating duplicates and pointed me at some macros that make benchmarking this type of thing easier. He published a gist that showed one of those macros in action.
Naturally, I had to check all this out. I wrote a new version of hash-intersection
that used dolist
instead of mapc
and also made a small change (hash-intersection-dedup
) to the algorithm so that duplicates were eliminated. Finally, I stole Alphapapa’s version of the code that uses the loop
macro instead dolist
.
Here’s all the code, including the bit to generate random lists, for reference:
(require 'epdh) (require 'dash-functional) (defun make-random-list (n) (let ((l nil)) (dotimes (i n l) (push (random 1500) l)))) (defconst lst1 (make-random-list 1000)) (defconst lst2 (make-random-list 1000)) (defun hash-intersection-orig (l1 l2) (let ((ht (make-hash-table :test #'equal)) (acc nil)) (mapc (lambda (x) (puthash x t ht)) l1) (mapc (lambda (x) (if (gethash x ht nil) (push x acc))) l2) acc)) (defun hash-intersection-dolist (l1 l2) (let ((ht (make-hash-table :test #'equal)) (acc nil)) (dolist (l l1) (puthash l t ht)) (dolist (l l2) (if (gethash l ht nil) (push l acc))) acc)) (defun hash-intersection-dedup (l1 l2) (let ((ht (make-hash-table :test #'equal)) (acc nil)) (dolist (l l1) (puthash l t ht)) (dolist (l l2) (when (gethash l ht nil) (puthash l nil ht) ;only one to a customer (push l acc))) acc)) ;; From Alphapapa (defun hash-intersection-loop (l1 l2) (let ((ht (make-hash-table :test #'equal) )) (cl-loop for e1 in l1 do (puthash e1 t ht)) (cl-loop for e2 in l2 when (gethash e2 ht) collect it)))
Now we’re ready to test Wellons’ claims. First, I used the bench-dynamic-vs-lexical-binding
macro to measure the difference that compiling with lexical bindings made:
(bench-dynamic-vs-lexical-binding :times 100 :forms (("hash-intersection-orig" (progn (defun hash-intersection-orig (l1 l2) (let ((ht (make-hash-table :test #'equal)) (acc nil)) (mapc (lambda (x) (puthash x t ht)) l1) (mapc (lambda (x) (if (gethash x ht nil) (push x acc))) l2) acc)) (hash-intersection-orig lst1 lst2)))))
Form | x faster than next | Total runtime | # of GCs | Total GC runtime |
---|---|---|---|---|
Lexical: hash-intersection-orig | 1.25 | 0.027208 | 0 | 0 |
Dynamic: hash-intersection-orig | slowest | 0.033979 | 0 | 0 |
Just as Wellons promised, the lexical version is 25% faster. Next, I compared all the implementations (as well as seq-difference
) compiled with lexical bindings. There were a couple of surprises. First, the deduping version ran slightly faster than the dolist
version. That was a pretty consistent result across several runs of the comparison. The only reason for it that I can see is that eliminates the push
for all the duplicates.
A bit more surprising is that the dolist
version ran 28% faster than the mapc
version. That seems pretty high but the numbers did jump around a bit between different runs so perhaps it’s just an anomaly. If you want to play around with Alphapapa’s macros (they’re here) that would be a good experiment to run. There are more macros than I’ve mentioned here so they should meet whatever your benchmarking needs are.
(bench-multi-lexical :times 100 :forms (("hash-intersection-orig" (progn (defun hash-intersection-orig (l1 l2) (let ((ht (make-hash-table :test #'equal)) (acc nil)) (mapc (lambda (x) (puthash x t ht)) l1) (mapc (lambda (x) (if (gethash x ht nil) (push x acc))) l2) acc)) (hash-intersection-orig lst1 lst2))) ("hash-intersection-dolist" (progn (defun hash-intersection-dolist (l1 l2) (let ((ht (make-hash-table :test #'equal)) (acc nil)) (dolist (l l1) (puthash l t ht)) (dolist (l l2) (if (gethash l ht nil) (push l acc))) acc)) (hash-intersection-dolist lst1 lst2))) ("seq-intersection" (seq-intersection lst1 lst2)) ("hash-intersection-loop" (progn (defun hash-intersection-loop (l1 l2) (let ((ht (make-hash-table :test #'equal) )) (cl-loop for e1 in l1 do (puthash e1 t ht)) (cl-loop for e2 in l2 when (gethash e2 ht) collect it))) (hash-intersection-loop lst1 lst2))) ("hash-intersection-dedup" (progn (defun hash-intersection-dedup (l1 l2) (let ((ht (make-hash-table :test #'equal)) (acc nil)) (dolist (l l1) (puthash l t ht)) (dolist (l l2) (when (gethash l ht nil) (puthash l nil ht) ;only one to a customer (push l acc))) acc)) (hash-intersection-dedup lst1 lst2)))))
Form | x faster than next | Total runtime | # of GCs | Total GC runtime |
---|---|---|---|---|
hash-intersection-dedup | 1.02 | 0.021943 | 0 | 0 |
hash-intersection-loop | 1.09 | 0.022341 | 0 | 0 |
hash-intersection-dolist | 1.28 | 0.024367 | 0 | 0 |
hash-intersection-orig | 315.94 | 0.031197 | 0 | 0 |
seq-intersection | slowest | 9.856230 | 0 | 0 |
Finally, I should mention that the macros are part of Alphapapa’s Emacs Package Developer’s Handbook, which has a wealth of useful information—even if you aren’t writing packages—and is definitely worth taking a look at if you haven’t already.
Update NoonianAtall (Alphapapa) says that I should have included cl-intersection and -intersection in my comparisons. Fortunately, he’s already done that in the gist that I mentioned so you can check them out there. Cl-intersection
is, of course, included with Emacs and -intersection
is from Dash which you probably also have since it’s a dependency of many popular packages.