Emacs Regular Expressions

Jamie Zawinski notwithstanding, I find regular expressions extraordinarily powerful and useful. Most of us in software engineering use them daily and almost all editors support them. One of the complaints I hear all the time is that Emacs regular expressions are not Perl compatible and are therefore hard to learn and use.

They are (slightly) different from Perl’s implementation but guess what: Perl regular expressions are different from the original grep / egrep flavors as well, and yet we all somehow learned to use Perl’s version. Probably the difference that causes the most problems is parentheses. In most regular expression systems, parentheses are used for grouping and a literal parenthesis must be escaped as \( or \). Emacs inverts this usage because parentheses are so common in Lisp: a bare ( or ) matches a literal parenthesis, and \( and \) do the grouping. For me, the most annoying issue is having to use things like [[:digit:]] instead of \d.
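To make the inversion concrete, here is a minimal Python sketch that rewrites a simple Emacs-style pattern into the Perl-like syntax used by Python's re module. The function name emacs_to_python is hypothetical, and it only covers the cases mentioned above (the swapped escaping of parentheses, braces and the alternation bar, plus [[:digit:]]). It is not a full translator and ignores Emacs-only constructs such as syntax classes.

```python
import re

# Characters whose escaped/unescaped meaning is swapped between the two syntaxes.
_SWAP = set("(){}|")

def emacs_to_python(pattern: str) -> str:
    """Translate a small subset of Emacs regex syntax to Python `re` syntax.

    Hypothetical helper for illustration only.
    """
    # Assumption: [[:digit:]] is rewritten as [0-9], which is what Perl users
    # would usually write as \d. Other POSIX classes are not handled.
    pattern = pattern.replace("[[:digit:]]", "[0-9]")
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in _SWAP:
                out.append(nxt)         # \( groups in Emacs, so emit a bare (
            else:
                out.append(ch + nxt)    # leave any other escape untouched
            i += 2
        elif ch in _SWAP:
            out.append("\\" + ch)       # bare ( is literal in Emacs, so escape it
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# An Emacs pattern matching a literal "(defun 42)" or "(defvar 42)",
# capturing the keyword.
emacs_pattern = r"(\(defun\|defvar\) [[:digit:]]+)"
python_pattern = emacs_to_python(emacs_pattern)
print(python_pattern)  # prints \((defun|defvar) [0-9]+\)
print(re.fullmatch(python_pattern, "(defvar 42)").group(1))  # prints defvar
```

Reading the two patterns side by side shows that every parenthesis simply flips between its escaped and unescaped form. That is the whole of the grouping difference.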

In any event, Xah Lee has come to the rescue with his recently updated Emacs Regex Tutorial. He points out the differences and how to deal with Emacs-specific issues such as case folding. It’s a short tutorial and well worth taking the time to read.

  • I get that there is no distinction between code & data in eLisp, but DSLs are one point where a little distinction would have been a good thing.

    Python's triple quote mechanism is a triumph of pragmatism over purity, IMO.

    • jcs

      Sorry, I'm not sure I see your point. Are you saying that the choice of when to escape parentheses was made for Lisp purity reasons? If so, let me know so that I can respond on point.

  • Phil

    Escaped parentheses for grouping is BRE syntax, and I always assumed Emacs had just built on that basis. It's not obvious to me that it's anything to do with lisp using parens. I've just failed in my brief attempt to research this, though, so I really have no idea.

    • jcs

      I'm pretty sure I remember reading somewhere---the Elisp documentation, I think---that Emacs inverted the "normal" escaping rule because Emacs was used for writing Lisp a lot and therefore had a lot of parentheses in text and it was easier to escape the grouping case instead. I was waiting for Smitty to verify that that was what he was questioning before I put any time in researching the matter.

      That said I, who do almost all my programming in some dialect of Lisp, end up escaping almost all parentheses in my regex templates so it's not clear the choice was a good one.

  • Osmo Salomaa

    It's also possible to just forget about Emacs's syntax. I find pcre-mode works fine.

    https://github.com/joddie/pcre2el#pcre-mode-experimental