Slicing Words With Elisp

I was playing around with another ITA hiring puzzle today and needed to “slice” a sequence of words. That is, I wanted to form a list of strings consisting of the first letters of each word, the second letters of each word, and so on. For example:

hand lest more → hlm aeo nsr dte

Think of writing the words down in rows

hand
lest
more

and forming the slices by reading down each column.

This turns out to be simpler to do in Elisp than in Common Lisp (although it’s not really hard in CL). The thing that makes it easy is the string function, which takes characters as its arguments and concatenates them into a string:

ELISP> (string ?a ?b ?c)
"abc"

Common Lisp also has a string function, but it behaves differently. Here’s how to slice words in Elisp (map here comes from the cl package):

ELISP> (map 'list #'string "hand" "lest" "more")
("hlm" "aeo" "nsr" "dte")

This is nothing earth-shattering, of course, but it does serve as a nice example of the conciseness of Lisp in general and of Elisp in particular.


Solution To The Add-A-Gram Challenge

Last week, I issued a challenge to solve the Add-A-Gram puzzle using Emacs and Elisp. The puzzle statement is here.

This is an interesting problem that’s easy to get wrong. The solution seems straightforward:

  1. Start with a 3-letter word (car, say).
  2. For each letter of the alphabet, check whether the original letters (a, c, r) plus the new letter can be rearranged to make a new word in the dictionary. If they can, remember the new chain for later processing. If not, record the length of the chain you have so far.
  3. For each remembered word, repeat the process starting at step 2.

Here’s the guts of that plan in Common Lisp

(defun one-word (word)
  (let (stack)
    (push (list word) stack)
    (loop while stack do
      (let* ((chain (pop stack))
             (w (car chain)))
        (dolist (l *letters*)  ; ("a" "b" "c" ... "z")
          (let* ((trial (concatenate 'string w l))
                 (anagram (get-anagram trial)))
            (if anagram
                (push (cons anagram chain) stack)
                (longest chain))))))))

The get-anagram function returns a word in the dictionary having the same letters as the string given as its argument. The call to longest just records the length of the current chain and the chain itself if it’s longer than the previously remembered chain. Here’s an example run

CL-USER> (progn (one-word "car") (show-longest))
("trisoctahedrons" "orchestrations" "orchestration" "retroactions"
"retractions" "carrotiest" "retroacts" "tractors" "cottars" "tracts"
"tract" "cart" "car")

showing the chain from “car” to the 15-letter word “trisoctahedrons”. The rest of the code just goes through the dictionary line by line looking for 3-letter words and calls one-word with each one it finds. That seems simple enough, except that finding the chain starting with “car” took 40.2 seconds on my 4-core i5 iMac. The longest chain takes 58.6 seconds. There are 978 3-letter words in the dictionary. Finding the longest chain with this method took just under 5 hours with compiled Common Lisp.

The problem with this method is that it examines every possibility, even when it happens to start with the 3-letter word that begins the longest chain. To find a better solution, we must find a method that

  1. Stops as soon as it finds the solution.
  2. Starts with the most likely candidates for that solution.

To see how to do that, ask yourself, “What’s the best I can possibly do?” The answer is ethylenediaminetetraacetates, the longest word in the dictionary. Can we build a chain from a 3-letter word to ethylenediaminetetraacetates? If we try each 3-letter starting point, we’re right back to the original solution. Instead, let’s turn the question upside down and ask whether we can build a chain down from ethylenediaminetetraacetates to some 3-letter word. If we can’t, we try again with the second-longest word, and so on.

To help with that, I sorted the dictionary by decreasing word size. I could have done this inside Emacs, I suppose, but I needed it for the CL version too, so I just used

awk '{print length(), $0 |"sort -nr"}' WORD.LST | cut -d ' ' -f 2 >sorted-words
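
For the record, here’s a minimal sketch of how the same sort might be done inside Emacs (the function name and file arguments are my own, purely for illustration):

(defun sort-words-by-length (infile outfile)
  "Write the words in INFILE to OUTFILE, longest word first."
  (with-temp-buffer
    (insert-file-contents infile)
    (let ((words (sort (split-string (buffer-string) "\n" t)
                       (lambda (a b) (> (length a) (length b))))))
      (with-temp-file outfile
        (insert (mapconcat #'identity words "\n") "\n")))))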

The rest is pretty easy. Here’s the entire code

(require 'cl)

(defun make-key (str)
  "Return a string with the same characters as STR but in sorted order"
  (coerce (sort (coerce str 'list) #'<) 'string))

(defun get-anagram (str)
  "Return an anagram of the letters in STR"
  (gethash (make-key str) anagrams))

(defun load-anagrams ()
  "Build the anagram hash"
  (beginning-of-buffer)
  (while (search-forward-regexp "\\(\\w+\\)" nil t)
    (puthash (make-key (match-string 1)) (match-string 1) anagrams)))

(defun down (chain)
  "Try to build a chain down to a 3 letter word"
  (if (= (length (car chain)) 3)
      (throw 'done (mapcar 'get-anagram chain))
    (let* ((word (car chain))
           (uniques (coerce (remove-duplicates word) 'list)))
      (dolist (c uniques)
        (let ((new-word (remove* c word :count 1)))
          (if (get-anagram new-word)
              (down (cons new-word chain))))))))

(defun solve (dict)
  "Call down with every word in DICT until it finds a solution"
  (let ((anagrams (make-hash-table :test 'equal :size 17500)))
    (with-temp-buffer
      (insert-file-contents dict)
      (load-anagrams)
      (beginning-of-buffer)
      (print (catch 'done
         (while (search-forward-regexp "\\(\\w+\\)" nil t)
           (down (list (match-string 1)))))))))

All the real work is done in down and is pretty much the opposite of what we did in the first solution:

  1. Start with the target (ethylenediaminetetraacetates, say).
  2. For each unique letter in the target, remove one copy of that letter from the target and check whether the remaining letters can be rearranged into a word in the dictionary. If they can, recursively call down with the shortened word.
  3. If you get down to 3 letters, quit immediately with a throw, returning the chain.
  4. If you can’t get down to 3 letters, get the next-longest word from the dictionary and start again.

The solve function loads the dictionary into a temporary buffer, builds the anagram hash, and calls down for each word in the dictionary until a solution is found. When the throw is taken and catch returns, we know we have a solution because nothing longer worked and the rest of the words in the dictionary are no longer than the current word being tested.

Here’s the run

ELISP> (solve "~/Desktop/sorted-words")
("ion" "into" "niton" "nitons" "anoints" "enations" "antinoise"
"antimonies" "inseminator" "terminations" "antimodernist"
"determinations" "underestimation" "underestiations")

Notice that the solution is not unique because we record only one anagram for each key. If we had chosen other anagrams, we would have gotten different words, but the length of the chain would be the same.

The important thing to notice here, though, is how much work Emacs does. Rather than writing a loop to read in each line of the dictionary (twice) as I had to do in the CL version of this solution, I just used insert-file-contents to load it into a buffer once and for all. This is what Xah Lee calls leveraging the power of Emacs to make these types of chores easier. The problem itself is not something that Emacs is supposed to deal with; we just use the facilities it provides to make our job easier.

By the way, “old and slow” Elisp took 2.564695 seconds to find the solution and that’s without byte compiling it. A nearly identical compiled CL version took 0.579200 seconds so Emacs/Elisp didn’t do badly at all.

A final question: The JavaScript solution of the problem at the problem statement link takes about 10 minutes to run. Granted, that’s in the browser and they’re displaying things as they go along, but that still seems too long. I looked at the JavaScript, but I’m not fluent enough to figure out what they’re doing. If any JavaScript gurus read this and want to spend a couple of minutes figuring out how they solve the problem, leave a comment.


Two From Garret

I was trawling through Ron Garret’s site (see yesterday’s post) and came upon two excellent short papers. The first, The Idiot’s Guide to Special Variables, discusses the difference between lexical and special (or dynamic) variables. The second, The Idiot’s Guide to Common Lisp Packages, discusses CL packages: what they are, how they operate, and the many non-obvious pitfalls you can find yourself in when using them.
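
If you want a quick taste of the lexical/special distinction in Elisp terms before reading the paper, here’s a tiny sketch (the variable and function names are mine, not Garret’s):

;;; -*- lexical-binding: t; -*-

(defvar my-special 'top)          ; defvar marks the variable as special (dynamic)

(defun peek () my-special)        ; looks up whatever binding is current when it is called

(let ((my-special 'inner))
  (peek))                         ; => inner, because the let rebinds the special dynamically

(let ((my-lexical 'inner))        ; no defvar, so this binding is lexical
  (funcall (lambda () my-lexical)))  ; => inner via the closure; code defined elsewhere can't see it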

These are both excellent little tutorials and, despite the “Idiot” in the titles, they are not at all trivial. If you’re a CL user, you might find them helpful.


Lisp At JPL

A long time ago, I read and enjoyed Ron Garret’s story of using Lisp at the Jet Propulsion Laboratory. Now, he retells the story in a Google Talk that is well worth watching. Part of the story explains how they were able to debug a problem when the DS1 spacecraft was a 10 or 45 light-minute round trip away (it’s not clear from the video which) because the Lisp image that was running the program had a REPL.

Another story involved the Galileo magnetometer, which was controlled by a Forth program. One of the bytes on the computer running Forth went bad, so the engineers had to modify the program to avoid using the bad byte. The problem was that the program was originally developed using an Apple II Forth environment, and the Apple II systems had long since been decommissioned. They solved this problem by recreating the entire Apple II Forth environment in Lisp.

If you enjoy hearing Lisp stories or stories about space flight and exploration then you should invest 70 minutes and watch the video.


Clozure Common Lisp 1.8

I took advantage of the weekend to upgrade my machines to Clozure CL 1.8. As usual, installation was a snap. Just get the distribution with SVN:

svn co http://svn.clozure.com/publicsvn/openmcl/release/1.8/darwinx86/ccl

for Mac OS X and then

ccl64 -n
? (ccl:rebuild-ccl :full t)

and you’re done. Users of other platforms will need to change darwinx86 to the directory appropriate for their systems as explained on the Getting Clozure CL page.

Lisp programs compiled with CCL 1.8 run a bit faster than those compiled with CCL 1.7, and, of course, there are various bug fixes and other cleanup, as described in the release notes.

I’ve only started using it, but so far I’m happy with what I see. There are no obvious incompatibilities except for a couple of improvements to the directory function and a minor change to the way the kernel handles a single command-line argument, which makes it easier to call standalone binaries. With performance enhancements and bug fixes, there’s not much to dislike.


Elisp Challenge No. 2

The other day, I found a reference to the Add-A-Gram puzzle, which is one of the retired challenges from ITA Software for prospective employees. Today’s challenge is to solve the Add-A-Gram puzzle with Emacs/Elisp. I’ll give a solution next week.


Weitz On Macros

Edi Weitz has a very nice set of notes on The Power of Lisp Macros. They’re from a talk he gave at freiheit.com and are aimed at the beginning Lisper. They’re amazingly easy to follow even though you aren’t hearing the accompanying commentary. It’s definitely worth a few minutes of your time.

Weitz is, of course, the author of a great deal of open source Common Lisp software, including the indispensable CL-PPCRE, which brings industrial-strength Perl-compatible regular expressions to Common Lisp. There’s lots of stuff on his site, so it’s definitely worth a visit.


Org Mode And Living Documents

Vsevolod Dyomkin over at Lisp, The Universe and Everything has another in his series of interviews with Lisp hackers. This time it’s with Christophe Rhodes, the principal maintainer of SBCL. It’s an interesting interview and well worth a read so head on over.

One thing that Rhodes said that really resonated with me (probably because I feel the same) is this little bit about Org mode

“I have in the last couple of years fallen in love with org-mode and in particular with “reproducible research”, or living documents; the ability to have executable sections of code for generating results, which can then themselves be further analysed by other bits of code — in other languages, if that’s appropriate — and the whole exported as a document typeset to what I consider to be high-quality standards. The fact that there is also acceptable-quality web output from the exact same process is gravy.”

There’s just something great about having a single document that contains all your notes, code to generate diagrams or compute results, and markup that can generate high quality typeset documents or HTML.
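
To make that concrete, here’s a minimal sketch of what such a living document looks like (the title and block name are made up; the numbers come from the add-a-gram post above). Typing C-c C-c on the source block evaluates it and inserts the result, and C-c C-e exports the whole file to a typeset document or to HTML:

#+TITLE: Add-a-gram notes

Prose describing the problem and the results goes here.

#+NAME: letters-added
#+BEGIN_SRC emacs-lisp :exports both
  ;; letters added going from "ion" up to "underestimations"
  (- (length "underestimations") (length "ion"))
#+END_SRC

#+RESULTS: letters-added
: 13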

If any Emacs users out there have not yet tried Org mode, I urge you to do so without delay. There’s a compact guide to get you started as well as a full manual; both are available as PDFs. Also, Bernt Hansen’s excellent Org Mode—Organize Your Life In Plain Text! is a great example of how to leverage Org mode. I found it very helpful as a go-by when I was learning Org mode.


The TSA (Again)

Yesterday I wrote that I don’t often post about the TSA but the latest news is just too good to pass up. Congress is holding hearings on the TSA so naturally they invited Bruce Schneier to testify. That only makes sense: he’s an acknowledged expert in the security field and has written widely on airport security and the TSA.

Except that the TSA was afraid of the heat and asked that he be removed from the witness list. The politicians, craven as always, complied.


Huh?

Why the Programming Language C is Obsolete. C obsolete? Really? Someone forgot to tell Linus. Bjarne Stroustrup is clearly a smart guy and by all accounts a nice guy, so I hate to call his baby ugly, but Damn! that’s one ugly baby.

At one point in my career I tried really hard to like C++, but I could never convince myself that it was anything other than a disaster. Perhaps I’m not smart enough to learn C++ adequately or to see its manifest superiority, because it still looks ugly to me. One advantage of writing in C rather than a Lisp-like language or Python or Ruby is that you can get almost down to the metal. Lots of fine-grained control but, of course, you pay for that by having to worry about memory management and other annoying details. Another advantage, for me, is that I have an excellent mental model of what code will be generated for a given C statement and can use that model to pick between alternative C constructs for a particular task.

C++ adds complexity and destroys my mental model but still requires me to worry about low-level details like memory management. How is this an improvement? Object Oriented Programming, I hear you say. Except that it’s not clear that the benefits are worth the costs as far as C++ goes. I know, I know: Apostasy! Still, I can’t help but agree with Linus on this.
