If you’re like me, you probably don’t think about globbing very much. It’s just a way of fuzzy matching file names in a manner reminiscent of regular expression matching. We all use it everyday but have you ever thought about its implementation or efficiency?
Russ Cox, of course, has thought about these issues and has posted a really interesting article with his results. He starts with a simple benchmark that measures the time to complete ls (a*)nb
in a directory containing the single file a100
where the exponents represent string repetition. He discovered that this resulted in an exponential run time whereas ls | grep (a*)nb
ran in \(O(n)\) time.
Of course, globbing is done by the shell so he tried the (non-grep) experiment with different shells. Most shells showed exponential time but csh
was linear in \(n\). Cox also discovered that some FTP servers exhibit the same behaviors.
The results are odd because you’d think globbing is a much simpler problem than regex matching. It can be, of course, and Cox gives an example implementation that is both simple and fast.
Be sure to read the article to see all Cox’s results and analysis. As usual with Cox’s posts, it’s really interesting and well worth your time.
UPDATE
: Added link to article.