The pcre2el Package

In my post the other day on Mike Zamansky’s Dired video, I meant to mention one of the packages he talked about, but I forgot. That package is pcre2el, which, among other things, converts Perl-compatible regular expressions to Emacs-style regular expressions. Many people have trouble with Emacs’ regular expressions because of their syntax. They were originally optimized for working with Lisp, so some of the escaping rules are just the opposite of those in Perl and most other regular expression systems. For example, in most regex systems, parentheses are used for grouping and if you want a literal parenthesis, you have to escape it. Because parentheses are ubiquitous in Lisp, the rule is exactly the reverse for Emacs regexes: a bare parenthesis is literal and an escaped one groups. These days, of course, most people aren’t writing in Lisp, so these rules don’t appear to make sense and are annoying.
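To make the reversed escaping concrete, here is a minimal Python sketch of a toy converter. It is not how pcre2el works (that package actually parses the expression); the function name `perl_to_emacs_toy` is made up for illustration, and the sketch assumes the pattern contains no character classes and no Perl-only features such as `\d` or lookahead, which it would pass through unchanged and therefore get wrong. All it does is flip the backslash on the characters whose escaping is reversed between the two syntaxes.

```python
# Characters that are special when bare in Perl-style regexes
# but special only when backslash-escaped in Emacs regexes.
SWAPPED = "(){}|"


def perl_to_emacs_toy(pattern: str) -> str:
    """Flip the escaping of grouping, alternation, and interval characters.

    Toy illustration only: it ignores character classes and every
    Perl feature that has no Emacs equivalent.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Escaped in Perl means literal; in Emacs the bare character is literal.
            # Any other escape sequence is copied through untouched.
            out.append(nxt if nxt in SWAPPED else ch + nxt)
            i += 2
        elif ch in SWAPPED:
            # Bare in Perl means special; Emacs needs a backslash for that.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Group "foo" or "bar", then match a literal "(x)".
    print(perl_to_emacs_toy(r"(foo|bar)\(x\)"))
    # Interval quantifier.
    print(perl_to_emacs_toy(r"a{2,3}"))
```

The output is the expression as the Emacs regex engine sees it. Inside an Emacs Lisp string literal each backslash has to be doubled again, which is a large part of why hand-written Emacs regexes are so hard to read.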

For those who find this confusing or annoying, the pcre2el package offers a partial solution. The package is essentially a translator that parses Perl regexes and translates them into Emacs regexes. The two engines are different: Perl regexes have capabilities that Emacs regexes lack, so the Perl syntax supporting those features can’t be translated. Still, if you’d like to work with the Perl syntax instead of switching back and forth, this package may be just what you need. Take a look at Zamansky’s video to see an example of it in action.

  • Perry Metzger

    I don't think the Emacs RE syntax is like that because it was better for working in Lisp, but rather because it corresponds to the old Unix "basic" regular expressions, which no one touches any more and which were broken in this manner. You still see some hints of this problem in a few places though (like if you still use sed for anything). All sane commands moved to modern regexes decades ago.

    I think it would be a good idea for Emacs to just adopt a modern RE format, keeping the old functions for backward compat but encouraging migration to new ones.

    • jcs

      "I don't think the Emacs RE syntax is like that because it was better for working in Lisp, but rather because it corresponds to the old Unix "basic" regular expressions, which no one touches any more and which were broken in this manner."

      Hmm. I could swear that I've read Lisp was the reason for this choice but, of course, I can't find it now. What I did find was this comment from last year in which Phil also suggests that the Emacs choice was left over from BRE regular expressions. As I said in the reply to Phil's comment, even as someone who mostly writes in Lisp these days, I still end up escaping most parentheses so you and Phil are probably right.

      "I think it would be a good idea for Emacs to just adopt a modern RE format, keeping the old functions for backward compat but encouraging migration to new ones."

      I agree but I don't know how you fix the syntax problem and still maintain backward compatibility. Probably something like pcre2el, I guess.

      • Perry Metzger

        An alternative is just to create a new fleet of functions with slightly distinguished names that use the new RE format and leave the old ones in place. You encourage new apps to use the new ones and let users pick the new ones as the back ends for things like isearch on regexps.