Last month, I published a post on Finding The Intersection of Two Lists. The impetus for the post was an Emacs Stack Exchange question asking if there was a better way to find the intersection of two lists than looping over both of them. I thought putting the entries of the first list in a hash table and checking the entries of the second list against it was a better solution and I wrote some code to check out that hypothesis. You can see the details in the original post.
I got some gentle pushback from two Elisp luminaries, Chris Wellons and Alphapapa. Wellons noted that I could get a small speed increase by using dolist instead of mapc. More importantly, he noted that I should have compiled my code with lexical mode as that makes a large difference in running time.
Alphapapa, commenting as NoonianAtall, noted that I wasn’t eliminating duplicates and pointed me at some macros that make benchmarking this type of thing easier. He published a gist that showed one of those macros in action.
Naturally, I had to check all this out. I wrote a new version of hash-intersection that used dolist instead of mapc and also made a small change (hash-intersection-dedup) to the algorithm so that duplicates were eliminated. Finally, I stole Alphapapa’s version of the code that uses the loop macro instead dolist.
Here’s all the code, including the bit to generate random lists, for reference:
(require 'epdh)
(require 'dash-functional)
(defun make-random-list (n)
(let ((l nil))
(dotimes (i n l)
(push (random 1500) l))))
(defconst lst1 (make-random-list 1000))
(defconst lst2 (make-random-list 1000))
(defun hash-intersection-orig (l1 l2)
(let ((ht (make-hash-table :test #'equal))
(acc nil))
(mapc (lambda (x) (puthash x t ht)) l1)
(mapc (lambda (x) (if (gethash x ht nil)
(push x acc)))
l2)
acc))
(defun hash-intersection-dolist (l1 l2)
(let ((ht (make-hash-table :test #'equal))
(acc nil))
(dolist (l l1)
(puthash l t ht))
(dolist (l l2)
(if (gethash l ht nil)
(push l acc)))
acc))
(defun hash-intersection-dedup (l1 l2)
(let ((ht (make-hash-table :test #'equal))
(acc nil))
(dolist (l l1)
(puthash l t ht))
(dolist (l l2)
(when (gethash l ht nil)
(puthash l nil ht) ;only one to a customer
(push l acc)))
acc))
;; From Alphapapa
(defun hash-intersection-loop (l1 l2)
(let ((ht (make-hash-table :test #'equal) ))
(cl-loop for e1 in l1
do (puthash e1 t ht))
(cl-loop for e2 in l2
when (gethash e2 ht)
collect it)))
Now we’re ready to test Wellons’ claims. First, I used the bench-dynamic-vs-lexical-binding macro to measure the difference that compiling with lexical bindings made:
(bench-dynamic-vs-lexical-binding
:times 100
:forms (("hash-intersection-orig" (progn
(defun hash-intersection-orig (l1 l2)
(let ((ht (make-hash-table :test #'equal))
(acc nil))
(mapc (lambda (x) (puthash x t ht)) l1)
(mapc (lambda (x) (if (gethash x ht nil)
(push x acc)))
l2)
acc))
(hash-intersection-orig lst1 lst2)))))
| Form |
x faster than next |
Total runtime |
# of GCs |
Total GC runtime |
| Lexical: hash-intersection-orig |
1.25 |
0.027208 |
0 |
0 |
| Dynamic: hash-intersection-orig |
slowest |
0.033979 |
0 |
0 |
Just as Wellons promised, the lexical version is 25% faster. Next, I compared all the implementations (as well as seq-difference) compiled with lexical bindings. There were a couple of surprises. First, the deduping version ran slightly faster than the dolist version. That was a pretty consistent result across several runs of the comparison. The only reason for it that I can see is that eliminates the push for all the duplicates.
A bit more surprising is that the dolist version ran 28% faster than the mapc version. That seems pretty high but the numbers did jump around a bit between different runs so perhaps it’s just an anomaly. If you want to play around with Alphapapa’s macros (they’re here) that would be a good experiment to run. There are more macros than I’ve mentioned here so they should meet whatever your benchmarking needs are.
(bench-multi-lexical
:times 100
:forms (("hash-intersection-orig" (progn
(defun hash-intersection-orig (l1 l2)
(let ((ht (make-hash-table :test #'equal))
(acc nil))
(mapc (lambda (x) (puthash x t ht)) l1)
(mapc (lambda (x) (if (gethash x ht nil)
(push x acc)))
l2)
acc))
(hash-intersection-orig lst1 lst2)))
("hash-intersection-dolist" (progn
(defun hash-intersection-dolist (l1 l2)
(let ((ht (make-hash-table :test #'equal))
(acc nil))
(dolist (l l1)
(puthash l t ht))
(dolist (l l2)
(if (gethash l ht nil)
(push l acc)))
acc))
(hash-intersection-dolist lst1 lst2)))
("seq-intersection" (seq-intersection lst1 lst2))
("hash-intersection-loop" (progn
(defun hash-intersection-loop (l1 l2)
(let ((ht (make-hash-table :test #'equal) ))
(cl-loop for e1 in l1
do (puthash e1 t ht))
(cl-loop for e2 in l2
when (gethash e2 ht)
collect it)))
(hash-intersection-loop lst1 lst2)))
("hash-intersection-dedup" (progn
(defun hash-intersection-dedup (l1 l2)
(let ((ht (make-hash-table :test #'equal))
(acc nil))
(dolist (l l1)
(puthash l t ht))
(dolist (l l2)
(when (gethash l ht nil)
(puthash l nil ht) ;only one to a customer
(push l acc)))
acc))
(hash-intersection-dedup lst1 lst2)))))
| Form |
x faster than next |
Total runtime |
# of GCs |
Total GC runtime |
| hash-intersection-dedup |
1.02 |
0.021943 |
0 |
0 |
| hash-intersection-loop |
1.09 |
0.022341 |
0 |
0 |
| hash-intersection-dolist |
1.28 |
0.024367 |
0 |
0 |
| hash-intersection-orig |
315.94 |
0.031197 |
0 |
0 |
| seq-intersection |
slowest |
9.856230 |
0 |
0 |
Finally, I should mention that the macros are part of Alphapapa’s Emacs Package Developer’s Handbook, which has a wealth of useful information—even if you aren’t writing packages—and is definitely worth taking a look at if you haven’t already.
Update [2020-02-08 Sat 11:22]: NoonianAtall (Alphapapa) says that I should have included cl-intersection and -intersection in my comparisons. Fortunately, he’s already done that in the gist that I mentioned so you can check them out there. Cl-intersection is, of course, included with Emacs and -intersection is from Dash which you probably also have since it’s a dependency of many popular packages.