Converting S-Expressions To XML In Emacs

My last two posts mentioned the ease with which s-expressions can be converted to XML and vice versa. Out of curiosity, I decided to write an sexpr-to-XML function in Elisp.

The implementation consists of two functions. The first, sexpr->xml, does the actual conversion, while the second, convert-to-xml, takes care of Emacs bookkeeping details and acts as a driver. For convenience, convert-to-xml takes an sexpr as its input but it would be easy to have it operate on a region or even a whole buffer.
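As a rough illustration of the region idea, here is a minimal sketch. The names my-read-sexpr-in-region and my-convert-region-to-xml are hypothetical and not part of the code in this post, and the command assumes convert-to-xml (defined later in the post) has already been loaded.

```elisp
;; Hypothetical helper: read one sexpr out of the text between BEG and END.
(defun my-read-sexpr-in-region (beg end)
  "Read and return the first sexpr between BEG and END.
A leading quote, as in '(record ...), is stripped."
  (let ((form (car (read-from-string
                    (buffer-substring-no-properties beg end)))))
    ;; The reader turns '(record ...) into (quote (record ...)).
    (if (eq (car-safe form) 'quote)
        (cadr form)
      form)))

;; Hypothetical command: assumes `convert-to-xml' is defined elsewhere.
(defun my-convert-region-to-xml (beg end)
  "Run `convert-to-xml' on the sexpr in the region from BEG to END."
  (interactive "r")
  (convert-to-xml (my-read-sexpr-in-region beg end)))
```

With the sexpr selected, M-x my-convert-region-to-xml would hand it to the driver. Passing (point-min) and (point-max) instead of the region bounds would cover a whole buffer, as long as the buffer holds a single top-level form.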

We'll use the example sexpr from Yegge's The Emacs Problem post. Note that, again for simplicity, I am dealing with only a single record but everything would work exactly the same if there were multiple records wrapped in a '(log …) sexpr.

  '(record
    (date "2005-02-21T18:57:39")
    (millis 1109041059800)
    (sequence 1)
    (logger nil)
    (level 'SEVERE)
    (class "java.util.logging.LogManager$RootLogger")
    (method 'log)
    (thread 10)
    (message "A very very bad thing has happened!")
    (exception
      (message "java.lang.Exception")
      (frame
        (class "logtest")
        (method 'main)
        (line 30))))

The converter is very simple.

1:  (defun sexpr->xml (sexpr)
2:    (let ((tag (car sexpr)))
3:      (princ (format "<%s>" tag))
4:      (dolist (o (cdr sexpr))
5:        (if (atom o)
6:            (princ (format "%s " o))
7:          (sexpr->xml o)))
8:      (princ (format "</%s>" tag))))

It's passed an sexpr whose first symbol is the XML tag. That gets saved away in the let on line 2 and printed as the opening tag on line 3. The dolist loop in lines 4–7 looks at each of the other objects in the sexpr. If an object is not another sexpr, it is printed with a trailing space. If it is another sexpr, sexpr->xml is called recursively on line 7 to process it. When all the objects in the sexpr have been processed, the end tag is printed on line 8.

The output of sexpr->xml is a single line of XML with no formatting at all. Also, any nils will appear explicitly in the XML instead of leaving the tag pair empty, and all the quoted symbols will be wrapped in <quote></quote> tags because the Lisp reader turns 'symbol into (quote symbol).
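To see that raw output without involving a display buffer, one option is to capture what princ writes with with-output-to-string. The sketch below repeats the sexpr->xml definition from above (minus the line numbers) so that it can be evaluated on its own. The small test inputs are fragments of the sample record.

```elisp
;; Same definition as in the post, without the line numbers.
(defun sexpr->xml (sexpr)
  (let ((tag (car sexpr)))
    (princ (format "<%s>" tag))
    (dolist (o (cdr sexpr))
      (if (atom o)
          (princ (format "%s " o))
        (sexpr->xml o)))
    (princ (format "</%s>" tag))))

;; `with-output-to-string' collects everything `princ' writes.
(with-output-to-string (sexpr->xml '(thread 10)))
;; => "<thread>10 </thread>"

;; nil is an atom, so it is printed literally.
(with-output-to-string (sexpr->xml '(logger nil)))
;; => "<logger>nil </logger>"

;; 'SEVERE is read as (quote SEVERE), which is treated as a nested element.
(with-output-to-string (sexpr->xml '(level 'SEVERE)))
;; => "<level><quote>SEVERE </quote></level>"
```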

Now let's look at the driver function:

1:  (defun convert-to-xml (sexpr)
2:    (with-output-to-temp-buffer "*XML*"
3:      (sexpr->xml sexpr)
4:      (set-buffer "*XML*")
5:      (xml-mode)
6:      (replace-regexp "\\bnil\\b\\|<quote>\\|</quote>" "" nil (point-min) (point-max))
7:      (sgml-pretty-print (point-min) (point-max))))

Lines 2 and 3 call sexpr->xml and arrange for its output to go into a buffer named *XML*. When sexpr->xml returns, the *XML* buffer is selected and set to xml-mode (really nxml-mode). The replace-regexp on line 6 deletes any occurrences of nil and gets rid of <quote> and </quote> tags. Finally, sgml-pretty-print is called to format the XML nicely.
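The docstring for replace-regexp suggests a re-search-forward and replace-match loop when the replacement is done from Lisp code rather than interactively. Here is a minimal sketch of the same cleanup written that way. The helper name my-xml-cleanup-string is hypothetical, and it works on a string in a temporary buffer rather than on the *XML* buffer.

```elisp
;; Hypothetical helper, not part of the post's driver.
(defun my-xml-cleanup-string (xml)
  "Return XML with literal nils and <quote>/</quote> tags removed.
Uses the same regexp as line 6 of convert-to-xml."
  (with-temp-buffer
    (insert xml)
    (goto-char (point-min))
    ;; Delete each match in turn until the search fails.
    (while (re-search-forward "\\bnil\\b\\|<quote>\\|</quote>" nil t)
      (replace-match "" t t))
    (buffer-string)))

(my-xml-cleanup-string
 "<level><quote>SEVERE </quote></level><logger>nil </logger>")
;; => "<level>SEVERE </level><logger> </logger>"
```

Like the original regexp, this also deletes the word nil if it happens to appear inside ordinary text content, so it is only safe for data where that cannot occur.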

That's a lot of work for not very much code. We could, of course, take care of the formatting and fix up the nil and <quote> problems right in sexpr->xml, but I wanted to show how simple the conversion can be without a lot of busy details. Besides, we have the power of Emacs, so it would be foolish not to use it.
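For comparison, here is a rough sketch of what handling nil and quote inside the converter might look like. The function my-sexpr->xml-string is a hypothetical variant, not the code used in this post. It returns a string instead of printing, and it still does no pretty-printing and no escaping of characters such as < or &.

```elisp
;; Hypothetical variant of sexpr->xml; returns a string instead of printing.
(defun my-sexpr->xml-string (sexpr)
  "Return SEXPR as an XML string, skipping nils and unwrapping (quote X)."
  (let ((tag (car sexpr)))
    (concat
     (format "<%s>" tag)
     (mapconcat
      (lambda (o)
        ;; The reader turns 'sym into (quote sym); unwrap it first.
        (when (eq (car-safe o) 'quote)
          (setq o (cadr o)))
        (cond ((null o) "")                   ; nil becomes an empty element
              ((atom o) (format "%s" o))      ; text content, no trailing space
              (t (my-sexpr->xml-string o))))  ; nested element
      (cdr sexpr)
      "")
     (format "</%s>" tag))))

(my-sexpr->xml-string '(record (logger nil) (level 'SEVERE) (thread 10)))
;; => "<record><logger></logger><level>SEVERE</level><thread>10</thread></record>"
```

The trade-off is visible even in this small version: the quote check means an element that is genuinely named quote can no longer be represented, which is the kind of busy detail the two-function approach avoids.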

The final result of running convert-to-xml on our sample sexpr is

  <message>A very very bad thing has happened!

Any Elisp coders out there might want to consider what a translator in the other direction (XML to s-expressions) would look like. Not having s-expressions to leverage makes the processing a little more difficult but not by much.
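One possible starting point, sketched under some assumptions: let the xml.el library that ships with Emacs do the parsing, then reshape its (tag attributes children...) nodes into the (tag children...) form used above. The names my-xml-node->sexpr and my-xml-string->sexpr are hypothetical. Attributes are simply dropped, whitespace-only text is discarded, and all values come back as strings, so the original distinction between numbers, symbols, and strings is not recovered.

```elisp
(require 'xml)     ; xml-parse-region, xml-node-name, xml-node-children
(require 'subr-x)  ; string-trim

;; Hypothetical helper: reshape one xml.el node into a plain sexpr.
(defun my-xml-node->sexpr (node)
  "Convert NODE, an xml.el parse tree, into a (tag child...) sexpr.
Attributes are dropped and whitespace-only text is discarded."
  (cons (xml-node-name node)
        (delq nil
              (mapcar (lambda (child)
                        (if (stringp child)
                            ;; Text node: keep it unless it is only whitespace.
                            (let ((text (string-trim child)))
                              (unless (string= text "") text))
                          ;; Element node: convert recursively.
                          (my-xml-node->sexpr child)))
                      (xml-node-children node)))))

;; Hypothetical driver: parse the first top-level element in the string XML.
(defun my-xml-string->sexpr (xml)
  "Parse the string XML and return its first element as an sexpr."
  (with-temp-buffer
    (insert xml)
    (my-xml-node->sexpr
     (car (xml-parse-region (point-min) (point-max))))))

(my-xml-string->sexpr
 "<record>\n  <thread>10</thread>\n  <message>java.lang.Exception</message>\n</record>")
;; => (record (thread "10") (message "java.lang.Exception"))
```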

  • Boyd Adamson

    Hmm… if we do go the other way, then maybe we could edit html with paredit… sweet :)

    • jcs

      Heh heh. Yes, that was exactly my thought. If I had to deal with XML log files, the first thing I would do would be to convert them to sexprs and deal with them in Lisp. Happily, that's pretty easy to do.

      As for HTML, Paul Graham has already shown us the utility of going from sexprs to HTML, but I'm not sure I see the case for going in the opposite direction except as a sort of decompile. OTOH, maybe that's enough reason.