The Emacs regexp-opt Function

Sacha Chua introduced me to regexp-opt, an Emacs function that I hadn’t seen before. The idea is that regexp-opt will take a list of strings and return a more or less optimal regexp to recognize any of those strings.

Because it does analysis and restructuring of the list of strings, regexp-opt will generally produce a more efficient regexp than the equivalent

(mapconcat 'regexp-quote strings "\\|")

In addition to the list of strings, regexp-opt takes an optional second argument that if not nil will surround the regexp with grouping parentheses. If the second argument is 'words, then regexp-opt also surrounds the regexp with \\< and \\>.

Here’s an example from the source code for regexp-opt that shows how it optimizes a regexp.

(let ((strings '("cond" "if" "when" "unless" "while"
                    "let" "let*" "progn" "prog1" "prog2"
                    "save-restriction" "save-excursion" "save-window-excursion"
                    "save-current-buffer" "save-match-data"
                    "catch" "throw" "unwind-protect" "condition-case")))
  (concat "(" (regexp-opt strings t) "\\>"))
 => "(\\(c\\(atch\\|ond\\(ition-case\\)?\\)\\|if\\|let\\*?\\|prog[12n]\\|save-\\(current-buffer\\|excursion\\|match-data\\|restriction\\|window-excursion\\)\\|throw\\|un\\(less\\|wind-protect\\)\\|wh\\(en\\|ile\\)\\)\\>"

This regexp takes approximately two-thirds the amount of time that the equivalent mapconcat regexp does.

Simon Marshall, the code’s author, notes that the package was written to produce efficient regexps, not to produce regexps efficiently and therefore you should avoid using it in-line too much unless you use eval-when-compile. See the commentary in regexp-opt.el for more on this. You can, of course, easily get to the code from the function description by clicking on the file name.

Update: Fixed mapconcat example.

This entry was posted in Programming and tagged , . Bookmark the permalink.