A Puzzle

After stumbling onto Justin Heyes-Jones’ compile-command tip that I wrote about previously, I decided to trawl through JustinHJ’s Coding Blog to see what other goodies I could find. One interesting post, Word numbers programming puzzle, discusses one of the ITA candidate puzzles from before they were acquired by Google. These were non-trivial programming challenges that candidates were to submit with their applications.

This particular challenge asks you to generate all the numbers from 1 to 999,999,999 as English words, sort them, and then find the 50,000,000,000 character excluding spaces, commas, dashes, and the word “and”. When Justin tried the obvious solution, he quickly ran out of heap space and even smaller problems took considerable run time. That’s not surprising. Even from the statement of the problem we know that the resulting sorted text exceeds 50 gigabytes, far more memory than anyone reading this blog is apt to have available.

It’s pretty clear that to solve the problem you have to generate each of the numbers in alphabetical order and feed them to the counting routine one at a time. My idea was to use 3 lists of 999 entries each. One list contained the numbers one to ninehundredninetynine in alphabetical order. The seconds list was generated from the first by appending “thousand” to each entry and resorting it. The third list was like the second but with “million” appended.

Most of the numbers are of the form: XmillionYthousandZ so it’s tempting to write something like

(dolist (m third-list)
  (dolist (t second-list)
    (dolist (o first-list)
      (count (concatenate 'string m t o)))))

but unfortunately a small number (2,997,000) of the words don’t follow that pattern. You’ve got numbers like ten and fourmillionthree, so you’ve got to generate them separately and in the right order. After some careful thought (and false starts) I realized that these numbers occur in two ways: without a “million” prefix and after a “million” prefix but without any “thousand” term. The numbers 1 through 999,999 are the first case and the numbers 1 through 999 are the second. To deal with the first case I made a new list by merging the first and second lists. The second case is handled by reusing the first list.

With that introduction, here is the code

(defparameter *word-count* 0)       ;count of generated number words
(defparameter *ones* nil)           ;number words 1..999 sorted
(defparameter *thousands* nil)      ;number words 1..999 thousand sorted
(defparameter *millions* nil)       ;number words 1..999 million sorted
(defparameter *prefixless* nil)     ;number words with no million prefix

(defun make-char-counter (goal)
  "Report the goalth character of the generated number words."
  (let ((cnt 0))
    (lambda (p pl w)
      (incf cnt (+ pl (cdr w)))
      (when (>= cnt goal)
        (let* ((word (concatenate 'string p (car w)))
               (word-len (length word))
               (goal-char (+ (- word-len cnt 1) goal)))
          (format t "The goal character is ~c~%"
                  (subseq word goal-char (1+ goal-char)))
          (throw 'done t))))))

(defun run-prefixless (action)
  "Generate a prefixless word."
  (if (numberp (cdar *prefixless*))
      (funcall action "" 0 (pop *prefixless*))
      (run-single-thousand action (car (pop *prefixless*)))))

(defun run-ones (action prefix)
  "Add the 1..999 suffixes in alphabetical order."
  (let ((prefix-len (length prefix)))
    (dolist (w *ones*)
      (funcall action prefix prefix-len w))))

(defun run-single-thousand (action prefix)
  "Generate all the words for Xthousand..Xthousandninehundredninetynine."
  (funcall action prefix (length prefix) '("" . 0))
  (run-ones action prefix))

(defun run-thousands (action prefix thousands ones)
  "Generate all the thousands for a particular million."
  (unless (null thousands)
    (while (and ones (string-lessp (caar ones) (car thousands)))
      (funcall action prefix (length prefix) (pop ones)))
    (let ((prefix (concatenate 'string prefix (car thousands))))
      (run-single-thousand action prefix))
    (run-thousands action prefix (cdr thousands) ones)))

(defun run-millions (action millions)
  "Generate all the number words in alphabetical order."
  (unless (null millions)
    (let ((prefix (car millions)))
      (while (and *prefixless* (string-lessp (caar *prefixless*) prefix))
        (run-prefixless action))
      (funcall action prefix (length prefix) '("" . 0))
      (run-thousands action prefix *thousands* *ones*))
    (run-millions action (cdr millions)))
  (while *prefixless*
    (run-prefixless action)))

(defun generate-words (&optional (i 1) (w nil))
  "Generate one..ninehundredninetynine in alphabetical order."
  (if (< i 1000)
      (generate-words
       (1+ i)
       (cons (remove-if-not #'alpha-char-p (format nil "~R" i)) w))
      (sort w #'string-lessp)))

(defun generate (action)
  "Initialize globals and call run-millions to generate the number words."
  (let ((words (generate-words)))
    (setq *ones* (mapcar (lambda (w) (cons w (length w))) words))
    (setq *thousands* (sort (mapcar
                             (lambda (w) (concatenate 'string w "thousand"))
                             words) #'string-lessp))
    (setq *millions* (sort (mapcar
                            (lambda (w) (concatenate 'string w "million"))
                            words) #'string-lessp)))
  (setq *prefixless* (merge 'list (copy-list *ones*)
                            (mapcar (lambda (w) (cons w t)) *thousands*)
                            #'string-lessp :key #'car))
  (setq *word-count* 0)
  (catch 'done (run-millions action *millions*)))

The main function is generate, which you can think of as a machine that generates each number word in the correct order and passes them to the action function. It starts out by generating the four lists I talked about in the introduction and then calls run-millions to get things rolling. The run-millions routine implements the idea of the three nested dolists from the introduction but also looks for those numbers with no “million” part. Similarly, run-thousands takes care of adding the “thousand” component but also checks to see if it’s time to put in one of the numbers without a “thousand” component.

The action function that counts the characters is returned by make-char-counter. When it finds that the character count has exceeded the goal it looks at the current word and backtracks to the correct character. It outputs that character and then throws an exception to exit the program.

None of the code is very complicated so I’ll leave it to those interested to dig out the details. One point that’s not obvious is why I passed in the action function. That’s because I had several debugging functions that I used during development. For example, make-checker returns a function that checks that the generated words are in alphabetical order and that the right number of words is generated.

(defun make-checker ()
  "Ensure the correct number of words are generated words and in order."
  (let ((last ""))
    (lambda (p pl w)
      (declare (ignore pl))
      (let ((num (concatenate 'string p (car w))))
        (unless (string-lessp last num)
          (cerror "~s >= ~s~%" 'abort last num))
        (setf last num)
        (incf *word-count*)))))

Interestingly, checking that the generator outputs the correct words takes much longer to run than the actual character counting. With make-char-counter ((generator (make-char-counter 50000000000))), the run time is just under 12 seconds (on a 2.66 GHz Intel Core i5 iMac). Running with make-checker ((generator (make-checker))) takes a bit more than 11 minutes, 15 seconds. That’s not too surprising since most of the lengths are precalculated so there’s not much work to do for finding the 50 billionth character but the checker has to concatenate the three parts of each word and then do a string compare.

Update: Rereading Justin’s post, I see that I solved a slightly different problem. I’ll leave it as an exercise to the reader, as they say, to make the changes necessary to solve the actual problem. Since the hard part of generating those numbers one at a time in alphabetical order is already done, the necessary changes are pretty easy.

This entry was posted in Programming and tagged . Bookmark the permalink.