Solution To The Add-A-Gram Challenge

Last week, I issued a challenge to solve the Add-A-Gram puzzle using Emacs and Elisp. The puzzle statement is here.

This is an interesting problem that’s easy to get wrong.The solution seems straightforward:

Start with a 3-letter word (car, say).
For each letter of the alphabet, check if the original letters (a, c, r) plus the new letter can be rearranged to make a new word in the dictionary. If it can, remember it for later processing. If not, remember the length of what you’ve got so far.
For each remembered word, repeat the process starting at step 2.

Here’s the guts of that plan in Common Lisp

(defun one-word (word)
  (let (stack)
    (push (list word) stack)
    (while stack
      (let* ((chain (pop stack))
             (w (car chain)))
        (dolist (l *letters*)  ; ("a" "b" "c" ... "z")
          (let* ((trial (concatenate 'string w l))
                (anagram (get-anagram trial)))
            (if anagram
                (push (cons anagram chain) stack)
                (longest chain))))))))

The get-anagram function returns a word in the dictionary having the same letters as the string given as its argument. The call to longest just records the length of the current chain and the chain itself if it’s longer than the previously remembered chain. Here’s an example run

CL-USER> (progn (one-word "car") (show-longest))
("trisoctahedrons" "orchestrations" "orchestration" "retroactions"
"retractions" "carrotiest" "retroacts" "tractors" "cottars" "tracts"
"tract" "cart" "car")

showing the chain from “car” to “trisoctahedrons” for a length of 15. The rest of the code just goes through the dictionary line by line looking for 3-letter words and calls one-word with each one it finds. That seems simple enough except that finding the chain starting with “car” took 40.2 seconds on my 4-core i5 iMac. The longest chain takes 58.6 seconds. There are 978 3-letter words in the dictionary. Finding the longest chain with this method took just under 5 hours with compiled Common Lisp.

The problem with this method is that it examines every possibility, even if it starts with the 3-letter word for the longest chain. To find a better solution, we must find a method that

Stops as soon as it finds the solution.
Starts with the most likely candidates for that solution.

To see how to do that, ask yourself, “What’s the best I can possibly do?” The answer is ethylenediaminetetraacetates, the longest word in the dictionary. Can we build a chain from a 3-letter word to ethylenediaminetetraacetates? If we try each 3-letter starting point, we’re right back to original solution. Instead, let’s turn the question upside down and ask if we can build a chain down from ethylenediaminetetraacetates to some 3-letter word. If we can’t, then try with the second longest word and so on.

To help with that, I sorted the dictionary by decreasing word size. I could have done this inside Emacs, I suppose, but I needed it for the CL version too so I just used

awk '{print length(), $0 |"sort -nr"}' WORD.LST | cut -d ' ' -f 2 >sorted-words

The rest is pretty easy. Here’s the entire code

(require 'cl)

(defun make-key (str)
  "Return a string with the same characters as STR but in sorted order"
  (coerce (sort (coerce str 'list) #'<) 'string))

(defun get-anagram (str)
  "Return an anagram of the letters in STR"
  (gethash (make-key str) anagrams))

(defun load-anagrams ()
  "Build the anagram hash"
  (beginning-of-buffer)
  (while (search-forward-regexp "\\(\\w+\\)" nil t)
    (puthash (make-key (match-string 1)) (match-string 1) anagrams)))

(defun down (chain)
  "Try to build a chain down to a 3 letter word"
  (if (= (length (car chain)) 3)
      (throw 'done (mapcar 'get-anagram chain))
    (let* ((word (car chain))
           (uniques (coerce (remove-duplicates word) 'list)))
      (dolist (c uniques)
        (let ((new-word (remove* c word :count 1)))
          (if (get-anagram new-word)
              (down (cons new-word chain))))))))

(defun solve (dict)
  "Call down with every word in DICT until it finds a solution"
  (let ((anagrams (make-hash-table :test 'equal :size 17500)))
    (with-temp-buffer
      (insert-file-contents dict)
      (load-anagrams)
      (beginning-of-buffer)
      (print (catch 'done
         (while (search-forward-regexp "\\(\\w+\\)" nil t)
           (down (list (match-string 1)))))))))

All the real work is done in down and is pretty much the opposite of what we did in the first solution:

Start with the target (ethylenediaminetetraacetates, say).
For each unique letter in the target, remove one copy of that letter from the target and check if you can make a new word. If you can, recursively call down with the shortened word.
If you get down to 3 letters, quit immediately with a throw, returning the chain.
If you can’t get down to 3 letters, get the next longest word from the dictionary and start again.

The solve function loads the dictionary into a temporary buffer, builds the anagram hash, and calls down for each word in the dictionary until a solution is found. When the throw is taken and catch returns, we know we have a solution because nothing longer worked and the rest of the words in the dictionary are no longer than the current word being tested.

Here’s the run

ELISP> (solve "~/Desktop/sorted-words")
("ion" "into" "niton" "nitons" "anoints" "enations" "antinoise"
"antimonies" "inseminator" "terminations" "antimodernist"
"determinations" "underestimation" "underestiations")

Notice that the solution is not unique because we record only 1 anagram for each key. If we chose other anagrams, we would have gotten different words but the length would be the same.

The important thing to notice here, though, is how much work Emacs does. Rather than writing a loop to read in each line of the dictionary (twice) as I had to do in the CL version of this solution, I just used insert-file-contents to load it into a buffer once and for all. This is what Xah Lee calls leveraging the power to Emacs to make these types of chores easier. The problem itself is not something that Emacs is supposed to deal with; we just use the facilities it provides to make our job easier.

By the way, “old and slow” Elisp took 2.564695 seconds to find the solution and that’s without byte compiling it. A nearly identical compiled CL version took 0.579200 seconds so Emacs/Elisp didn’t do badly at all.

A final question: The JavaScript solution of the problem at the problem statement link takes about 10 minutes to run. Granted that’s in the browser and they are displaying things as they go along but that still seems too long. I looked at the JavaScript but I’m not fluent enough to figure out what they’re doing. If any JavaScript gurus read this and want to spend a couple minutes figuring out how they solve the problem, leave a comment.

Solution To The Add-A-Gram Challenge

Recent Posts

Links

Archives

Categories

Meta