Sacha Chua introduced me to regexp-opt
, an Emacs function that I hadn’t seen before. The idea is that regexp-opt
will take a list of strings and return a more or less optimal regexp to recognize any of those strings.
Because it does analysis and restructuring of the list of strings, regexp-opt
will generally produce a more efficient regexp than the equivalent
(mapconcat 'regexp-quote strings "\\|")
In addition to the list of strings, regexp-opt
takes an optional second argument that if not nil
will surround the regexp with grouping parentheses. If the second argument is 'words
, then regexp-opt
also surrounds the regexp with \\<
and \\>
.
Here’s an example from the source code for regexp-opt
that shows how it optimizes a regexp.
(let ((strings '("cond" "if" "when" "unless" "while" "let" "let*" "progn" "prog1" "prog2" "save-restriction" "save-excursion" "save-window-excursion" "save-current-buffer" "save-match-data" "catch" "throw" "unwind-protect" "condition-case"))) (concat "(" (regexp-opt strings t) "\\>")) => "(\\(c\\(atch\\|ond\\(ition-case\\)?\\)\\|if\\|let\\*?\\|prog[12n]\\|save-\\(current-buffer\\|excursion\\|match-data\\|restriction\\|window-excursion\\)\\|throw\\|un\\(less\\|wind-protect\\)\\|wh\\(en\\|ile\\)\\)\\>"
This regexp takes approximately two-thirds the amount of time that the equivalent mapconcat
regexp does.
Simon Marshall, the code’s author, notes that the package was written to produce efficient regexps, not to produce regexps efficiently and therefore you should avoid using it in-line too much unless you use eval-when-compile
. See the commentary in regexp-opt.el
for more on this. You can, of course, easily get to the code from the function description by clicking on the file name.
Update: Fixed mapconcat
example.