Almost all Emacs users are familiar with Steve Yegge’s Effective Emacs post. If you’re an Emacs user and haven’t read it, stop what you’re doing and go read it right now. As it happens, Yegge wrote several posts about Emacs and one of my favorites, which I’ve just reread, is The Emacs Problem.
That post begins with a claim by one of Yegge’s colleagues that Lisp is not a good language for text processing, that Emacs goes to show that, and that Emacs should be rewritten with Ruby as the interpreter. I know, I know; me too. But Yegge examines that claim seriously and in the process generates a wonderfully readable case that nothing but Lisp is really any good at text processing.
He begins by noting that when we think of text processing we almost always think of regular expressions as the main tool. He notes that Lispers are generally skeptical of regexps and say things like, “Regexps aren’t useful for tree structured data and why are you storing your data as text in the first place instead of s-expressions (that is, as Lisp)?” Then he goes into a comedy routine about log files and how they’re usually just a line or two of text that is most easily processed with regexps and that those Lisp losers appear not to know that.
But then he says that he just noticed that his java.util.logging output had suddenly changed to being XML (this was with Java 1.5 in 2005). He gives an example of a short text-based log entry and the corresponding entry in XML and concedes that sometimes the extra metadata in the XML can be useful and that tools like XPath provide a powerful way of processing and querying XML data.
Next, he gives the same data as an sexpr. It’s shorter, clearer, and much easier to read than the XML and can be operated on by XPath-like tools available for Lisp. In fact, you can think of the data as executable and write functions or macros that can make it transform or process itself. He then identifies the Text-Processing Conundrum:
- You want to be able to store and process text data.
- Doing this effectively requires the data be tree structured in all but trivial cases.
- The only good, general tool for doing this is XML.
- XML processing is supposed to be easy but rapidly becomes complex when you start using tools like XSLT and XQuery or, worse yet, write your own transformations using a SAX or DOM processor in the language of your choice.
- But those are your only options.
Unless you’re using Lisp. With Lisp, data is code and so you can store your data as a Lisp program. Querying and transforming it are almost trivial.
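Here’s a minimal sketch of what that might look like in Emacs Lisp. It is not Yegge’s example: the record layout, the field names, and the `my/` helper functions are all assumptions made up for illustration. The log is an ordinary quoted list, and querying it takes a couple of one-line functions rather than an XPath engine.

```elisp
(require 'seq)

;; A hypothetical log stored as plain Lisp data instead of text or XML.
;; The field names (date, level, thread, message) are illustrative only.
(defvar my/log-records
  '((record (date "2005-02-21T18:57:39") (level SEVERE)
            (thread 10) (message "A very bad thing has happened"))
    (record (date "2005-02-21T18:57:40") (level INFO)
            (thread 10) (message "Recovered")))
  "Illustrative log entries.")

(defun my/field (name record)
  "Return the value of field NAME in RECORD, or nil if it is absent."
  ;; Everything after the `record' tag is an alist of (NAME VALUE) lists.
  (cadr (assq name (cdr record))))

(defun my/records-with-level (level records)
  "Return the entries in RECORDS whose level field is LEVEL."
  (seq-filter (lambda (r) (eq (my/field 'level r) level)) records))

;; Query: the messages of all SEVERE entries.
(mapcar (lambda (r) (my/field 'message r))
        (my/records-with-level 'SEVERE my/log-records))
;; => ("A very bad thing has happened")
```

Because the data is already a Lisp structure, the reader does the parsing and ordinary list functions do the rest.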
He goes on to consider other text processing such as configuration files and finds that the same principles apply only more so. By writing your configuration file in Lisp, it ceases being a configuration file and becomes part of your program.
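To make that concrete, here’s a small sketch of a settings file written in Emacs Lisp. The file name, the variables, and the paths are hypothetical; the point is only that a setting can be computed, or even be a function, rather than a static key and value.

```elisp
;; Hypothetical contents of a settings file such as ~/.myapp.el.
;; Every name and path here is made up for illustration.

(defvar my/app-log-dir
  (if (eq system-type 'windows-nt)
      "C:/logs/myapp/"
    "/var/log/myapp/")
  "Directory for log files, chosen per platform when the file is loaded.")

(defvar my/app-max-log-size (* 10 1024 1024)
  "Maximum log size in bytes, computed rather than hard-coded.")

(defun my/app-log-file (name)
  "Return the full path of log file NAME inside `my/app-log-dir'."
  (expand-file-name name my/app-log-dir))
```

A program would pick this up with something like `(load "~/.myapp.el")`, which is essentially how Emacs treats its own init file.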
Finally, he considers the question of rewriting Emacs in some other language. He goes through all the usual reasons why that’s not practical (at least until Guile becomes the interpreter for Emacs, if it ever does) and recites the usual litany of political problems surrounding Emacs and RMS.
To my mind, though, he misses the main point. Why would you want to write it in something besides a Lisp-like language? As he amply demonstrates in the first part of the post, Lisp is a great language for text processing and the only one without serious problems. Sure, a lot of the text that Emacs deals with isn’t tree structured, but a lot of it is and Emacs has a powerful regexp system for the text that isn’t. Just read through Xah Lee’s excellent series of articles on text processing with Emacs, for instance. A recurring theme in Lee’s posts is text-soup automation in which he uses Elisp to transform arbitrary text.
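As a tiny illustration of the regexp side of that, here’s a sketch of an Elisp function that rewrites dates in unstructured text. It is not taken from Lee’s articles; the function name and the date format are assumptions chosen for the example. It uses the standard `replace-regexp-in-string` function.

```elisp
(defun my/iso-dates (text)
  "Return TEXT with MM/DD/YYYY dates rewritten as YYYY-MM-DD."
  (replace-regexp-in-string
   ;; Three groups: month, day, year.
   "\\([0-9]\\{2\\}\\)/\\([0-9]\\{2\\}\\)/\\([0-9]\\{4\\}\\)"
   ;; Reorder the groups as year-month-day.
   "\\3-\\1-\\2"
   text))

(my/iso-dates "Released 02/21/2005.")
;; => "Released 2005-02-21."
```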
Yegge’s post is excellent and if you haven’t read it, you’re missing out on a treat. Also be sure to read the comments. He develops some of the ideas in the post further in responses to the commenters.