Unicode in AWK

A few days ago I wrote about the excellent video of David Brailsford and Brian Kernighan discussing AWK and its history. In the video, Kernighan mentions that he’s been working on enabling Unicode in the One True AWK. Here’s a pull request from Kernighan showing that he’s mostly accomplished that goal.

At one level, it’s easy to believe that adding Unicode support is basically a trivial change, but as AWK demonstrates, it’s not always so easy. Probably the hardest part is fixing AWK’s regex parser to accept and handle Unicode. But even simple things like calculating the length of a string can be a problem, because in UTF-8 a single character can occupy anywhere from one to four bytes.
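To make the length and regex problems concrete, here’s a small sketch in Python. It is not how the One True AWK does it (that’s C, and I haven’t reproduced Kernighan’s code). It simply shows what goes wrong when strings are treated as bytes. The helper `utf8_char_count` is hypothetical and assumes well-formed UTF-8 input.

```python
import re

def utf8_char_count(data: bytes) -> int:
    """Hypothetical helper: count code points in well-formed UTF-8."""
    # Each code point starts with exactly one non-continuation byte;
    # continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF),
    # so counting every byte that isn't a continuation byte counts characters.
    return sum(1 for b in data if (b & 0xC0) != 0x80)

text = "naïve café"
raw = text.encode("utf-8")

# A byte-oriented length() over-counts: ï and é are two bytes each.
print(len(raw))               # 12
print(utf8_char_count(raw))   # 10

# Likewise, a byte-oriented regex treats "." as a single byte, so it
# cannot match the two-byte é, while a character-oriented regex can.
print(re.fullmatch(rb"caf.", "café".encode("utf-8")))  # None
print(re.fullmatch(r"caf.", "café") is not None)       # True
```

An AWK that has been taught about UTF-8 has to get both of these right everywhere: in `length`, `substr`, `index`, and in every character class and `.` in its regular expressions.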

When AWK was first developed, and for a long time afterwards, ASCII was sufficient. These days, it’s a real imposition to deal with an app that doesn’t support Unicode. By porting AWK to Unicode, Kernighan is ensuring that it will remain a useful tool not only for English-speaking people but also for those who speak languages with non-ASCII characters.

If you’re a young engineer, the idea of open source and having access to the source code of your tools seems unexceptional. That’s just the way it is. But AWK comes from a time when that wasn’t true. It’s great to see the original AWK still available and still under development. AWK and its developers are a national treasure for which we should all be thankful.
