Finding Pangrams in the NYT Spelling Bee Puzzle

From time to time I find a reference to a New York Times article in my feed. I had to register with the NYT to be able to read them and one side effect of that registration is that every morning I get a short news summary emailed to me. At the end of the summary, they always have the day’s Spelling Bee puzzle. You can read the exact rules at the link but the TL;DR is that you’re given 7 letters and have to make as many words as possible from them, possibly repeating letters. A word that contains all 7 letters is called a pangram and earns extra points.

Working the puzzle is an open ended process because you never know when you’ve found all the words. I don’t have the time for that but I do like to see if I can find the pangram. A quirk of my mental makeup makes me pretty good at it but sometimes I can’t find the answer. Naturally this caused the nerd in me to rise up and I started thinking about how I could automate the search. It turns out to be straightforward and easy so I’ll devote the rest of this post to showing how I did it with Elisp.

The basic idea is to start with a word list and assign each word a key. Then assign a key to the 7 letters in the same way and look for matches. For a word list, I used the dictionary word list that is part of every Unix/Linux distribution. The key is simply the unique letters of the word in alphabetical order. Here’s the Elisp to calculate it:

(defun bee--calc-key (word)
  "Derive a hash key from a word."
  (-> word
      (string-to-list)
      (sort #'<)
      (delete-consecutive-dups)
      concat))

The word is converted to a list of it’s letters, the list is sorted, the duplicate letters are deleted, and the remaining sorted letters are turned back into a string. If you’re not familiar with the thread first macro (->) see my post on threading macros.

Next, we have to read in the word list and massage it a bit:

(defun bee--get-word-list ()
  "Load the dictionary words as a lowercase list."
  (with-temp-buffer
    (insert-file-contents-literally "/usr/share/dict/words")
    (delete-consecutive-dups (mapcar #'downcase
                                     (split-string  (buffer-string))))))

To avoid ambiguities, we convert everything to lower case. This can cause some duplicates (a and A, for example) so we eliminate duplicates.

We do the “matching” with a hash table, of course. The next two functions build the hash table. The bee--build-hash function loops through the (massaged) word list and hashes each word by its key using the bee-hash-item function. The actual hashing is done by bee--hash-item. It retrieves the previous list of words with that key (possibly nil) and conses the new word onto it.

(defun bee--hash-item (word)
  "Add a single word to the hash."
  (let ((key (bee--calc-key word)))
    (puthash key (cons  word (gethash key bee--hash)) bee--hash)))

(setq bee--hash (make-hash-table :test #'equal))

(defun bee--build-hash ()
  "Add each dictionary word to the hash."
  (interactive)
  (mapcar #'bee--hash-item (bee--get-word-list)))

Finally, we ask for the 7 letters, turn them into a key, and query the hash table for matches:

(defun bee-solve (letters)
  "Find all the pangrams for the NYT Spelling Bee."
  (interactive "sLetters: ")
  (when (hash-table-empty-p bee--hash)
    (bee--build-hash))
  (-> letters
      (bee--calc-key)
      (gethash bee--hash)
      print))

Coda

When I first wrote this, I used delete-duplicates instead of delete-consecutive-dups. That makes a huge difference in run time because delete-duplicates doesn’t take advantage of the list being sorted so it has to do a lot of extra work. The original took between half an hour and a hour to run. With delete-consecutive-dups it takes about a second. I could probably eliminate even that latency by preprocessing the word list with something like

tr "A-Z" "a-z" </usr/share/dict/words | uniq

but I didn’t bother.

It would be pretty easy to turn this into a complete solution—finding all words that can be formed using the 7 letters not just the pangrams. All you’d need do is find the subsets of the 7 letters (the subsets of at least length 4 according to the rules) and call bee-solve on each of them. I’ll leave that as an exercise for the interested.

This entry was posted in General and tagged , . Bookmark the permalink.