Xah's Challenge

A couple of days ago Xah Lee posed a programming challenge. The problem was to write a miniparser in your favorite language to detect mismatched bracketing characters in a set of text files. The set of brackets to be checked were: () {} [] “” ‹› «» 【】 〈〉 《》 「」 『』. His solution is here.

I solved this problem in two “different” ways, both in Emacs lisp. This post is my first solution. I'll post the second solution tomorrow.

This solution is similar to Lee's but written in a more functional style. The strategy is to hunt down the brackets with search-forward-regexp, push the opening brackets onto a stack, and check that each closing bracket matches the top of the stack. If it does, we pop the stack and keep going. If it doesn't, we record the error and write it to an error buffer. If there is anything left on the stack when we finish, that's also an error because there is an unmatched opening bracket.

The main function, check-file-for-mismatches uses a complicated regular expression to locate the brackets and an alist to type the brackets and identify the matching bracket. In a production system, these would be computed before run time to save on execution time. As Paul Graham has said, the nice thing about Lisp is that all of Lisp is available all the time. That means that we can use Lisp as we are writing the code to compute the regexp and alist. In this example, I compute them at run time so that you can see how I did it.

We start with the list of brackets:

(defvar brackets '(("(" . ")")
                   ("{" . "}")
                   ("[" . "]")
                   ("“" . "”")
                   ("‹" . "›")
                   ("«" . "»")
                   ("【" . "】")
                   ("〈" . "〉")
                   ("《" . "》")
                   ("「" . "」")
                   ("『" . "』")))

and then compute the regexp with:

(defun make-br-regexp ()
  "Make a regexp to check for any of the delimiters in brackets"
  (mapconcat (lambda (bp)
               (concat (regexp-quote (car bp)) "\\|" (regexp-quote (cdr bp))))
             brackets "\\|"))

and the alist with:

(defun make-br-alist ()
  "Make an alist of the brackets. Each entry is of the form:
bracket open-or-close matching-bracket"
  (reduce #'append (mapcar (lambda (bp)
                             (list (list (car bp) 'open (cdr bp))
                                   (list (cdr bp) 'close (car bp))))
                           brackets)))

Each entry in the alist has the form (bracket type matching-bracket) where type is either open or close and indicates whether the bracket is an opening or closing bracket.

As I said, all of this could be done at coding time and the results made into constants. Instead we define a couple of constants on the fly:

(defconst br-regexp (make-br-regexp))
(defconst br-alist (make-br-alist))

All the work is done in check-file-for-mismatches.

 1:  (defun check-file-for-mismatches (fpath)
 2:    "Check a file for mismatched brackets."
 3:    (let ((fb (get-buffer-create "*Temp*"))
 4:          (stack nil)
 5:          (mismatches nil))
 6:      (set-buffer fb)
 7:      (insert-file-contents fpath nil nil nil t)
 8:      (goto-char 1)
 9:      (while (search-forward-regexp br-regexp nil t)
10:        (let* ((bpos (point))
11:               (brk (buffer-substring-no-properties (1- bpos) bpos))
12:               (blist (assoc brk br-alist))
13:               (btype (cadr blist)))
14:          (cond
15:           ((eq btype 'open)  (push (cons brk (1- bpos)) stack))
16:           ((eq btype 'close) (if (or (null stack)
17:                                      (not (string= (caar stack) (caddr blist))))
18:                                  (push (1- bpos) mismatches)
19:                                (pop stack)))
20:           (t (error "FAIL--found %s instead of bracket" brk)))))
21:      (if mismatches
22:          (mapc (lambda (e)
23:                  (print (format "Mismatch in %s at position %d"
24:                                 fpath e)))
25:                (reverse mismatches)))
26:      (if stack
27:          (mapc (lambda (e)
28:                  (print (format "%s has an unclosed %s at position %d"
29:                                 fpath (car e) (cdr e))))
30:                (reverse stack)))
31:      (kill-buffer fb)))
32:  

The actual checking occurs in the while loop on lines 9–20. We find a bracket with the regexp search, record its position and type and then check to see if it's an opening or closing bracket. For opening brackets, we just push it on the stack. For closing brackets we check for the matching opening bracket on the stack and record an error if it's not a match.

On lines 21–25, we print any mismatches that we found in the file. On lines 26–30 we print an error message for any open brackets still on the stack. Almost half of check-file-for-mismatches is concerned with error reporting. It's worth noting that after the first error the parsing tends to get out of sync and you can get cascading errors but the error report tells you what happened so that you can fix the real errors and rerun the program as a final check.

To run check-file-for-mismatches we need a function to feed it the files and establish the error buffer. The sample function, do-files that I show below checks files with the extension txt in the ~/elisp directory. What you actually do depends on how you are selecting your files.

(defun do-files ()
  "Check a list of files for bracket mismatches"
  (interactive)
  (with-output-to-temp-buffer "*Mismatch Report*"
    (mapc #'check-file-for-mismatches (directory-files "~/elisp" t "\.txt$"))))

Many of the ideas for these functions come from Lee's Emacs Lisp Tutorial. This page, in particular, has several useful templates for handling files and buffers in a way similar to what we've done here.

This entry was posted in Programming and tagged . Bookmark the permalink.