The Many Faces of Regex

Regular expressions, Jamie Zawinski notwithstanding, are one of a programmer’s most useful tools. In a strictly Lisp world, s-expressions solve an astounding number of problems, but in the real world of mixed technologies regexes are indispensable. The problem is that there is no single “standard regex” syntax.

Even in the Unix world, where regular expressions made their first beachhead, there are two versions of regular expressions: basic and extended. Today there are probably dozens of versions, but for most of us only four really matter:

  • Unix basic regular expressions
  • Unix extended regular expressions
  • Perl regular expressions
  • Emacs regular expressions

Of the four, Perl regular expressions are probably the richest variety, but each has its advantages. In an ideal world grep, egrep, perl, and Emacs would all support Perl regular expressions plus whatever useful features the others have. Sadly, that is not the case, so we must deal with multiple versions.
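To make the differences concrete, here is a minimal sketch in Python. The date-like pattern, the `FLAVORS` table, and the `bre_to_perl` helper are hypothetical illustrations, and Python’s `re` module stands in for Perl syntax because it is Perl-like. The main point is where the backslashes go: grouping and interval braces are escaped in basic (BRE) and Emacs syntax but bare in extended (ERE) and Perl syntax. The Emacs string happens to match the BRE string here, but the two flavors are not identical (Emacs treats a bare `+` and `?` as operators, POSIX BRE treats them as literals), so the toy translator targets BRE only.

```python
import re

# One pattern (a YYYY-MM prefix, capturing year and month) in each flavor.
# Grouping and interval delimiters carry a backslash in BRE and Emacs syntax
# but not in ERE or Perl syntax.
FLAVORS = {
    "Unix basic (BRE)": r"\([0-9]\{4\}\)-\([0-9]\{2\}\)",
    "Unix extended (ERE)": r"([0-9]{4})-([0-9]{2})",
    "Perl": r"(\d{4})-(\d{2})",
    "Emacs": r"\([0-9]\{4\}\)-\([0-9]\{2\}\)",
}

# In BRE these are operators only when escaped: \( \) \{ \}
_ESCAPED_OPS = set("(){}")
# In BRE these are literals when bare, but operators in ERE/Perl.
_BARE_LITERALS = set("(){}+?|")


def bre_to_perl(pattern: str) -> str:
    """Hypothetical toy translator from POSIX basic syntax to Perl-style syntax.

    Handles only grouping/interval delimiters and the literal + ? |.
    Bracket expressions are copied verbatim, so POSIX classes such as
    [[:digit:]] are NOT translated. Other escapes (e.g. \\. or \\1) pass
    through unchanged.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \( \) \{ \} become bare operators; other escapes are kept.
            out.append(nxt if nxt in _ESCAPED_OPS else c + nxt)
            i += 2
        elif c == "[":
            # Copy a bracket expression verbatim. A ']' directly after
            # '[' or '[^' is a literal member, not the closing bracket.
            j = i + 1
            if j < len(pattern) and pattern[j] == "^":
                j += 1
            if j < len(pattern) and pattern[j] == "]":
                j += 1
            while j < len(pattern) and pattern[j] != "]":
                j += 1
            out.append(pattern[i : j + 1])
            i = j + 1
        elif c in _BARE_LITERALS:
            # Bare ( ) { } + ? | are literals in BRE, so escape them.
            out.append("\\" + c)
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

For example, `bre_to_perl(r"x\{2,3\}")` yields `x{2,3}`, while a bare `a+b` in BRE becomes `a\+b`, because `+` is an ordinary character in POSIX basic syntax. Fed directly to Python’s `re`, the untranslated BRE string would match literal parentheses and braces, which is exactly the kind of flavor mix-up the post warns about.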

For the weak-minded proprietor of Irreal, keeping the differences straight is always a challenge. One way I’ve dealt with this is Xah Lee’s Regex Tutorial. Lee has recently updated it, so I’m mentioning it again for the benefit of those who missed my previous discussion. I have it bookmarked, and if you, like me, need help remembering the details of the various flavors, you should too.

This entry was posted in Programming.