Reusability and NIH

Peter Donis has a nice post discussing a programming challenge given to Knuth and critiqued by McIlroy. Don Knuth, certainly one of the premier computer scientists (period), was asked to demonstrate literate programming by solving a simple problem: finding the n most frequent words in a given text. Doug McIlroy, who’s also well known but not nearly as much as he should be [1], was asked to critique Knuth’s solution. McIlroy’s critique consisted of a 6-line shell script that solved the exact same problem as Knuth’s 10-page Pascal program.

McIlroy’s larger point was about reusability (follow the link to Donis’s post for the details) but what struck me about the story—and made me vaguely uncomfortable because I’m often guilty of this—was how often those of us with the hacker disposition are inclined to jump in and implement a solution to a problem in our $FAVORITE_LANGUAGE rather than use existing functionality as McIlroy did.

It’s amazing how often a short shell script can solve a problem simply and elegantly. It’s just crazy that we would rather write pages of code than use a pipeline of a few robust, well-tested Unix utilities. The problem, I think, is that we really love programming and it seems like cheating to solve a problem using a line or two of shell script.
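
To make that concrete, here is a minimal sketch of the kind of pipeline I have in mind for the word-frequency problem. It follows the general shape of the pipeline usually attributed to McIlroy’s critique, but it is my own rendering, not a quotation. The function name top_words and the default of 10 are assumptions made for illustration, and a “word” here is simply a run of ASCII letters.

```shell
#!/bin/sh
# top_words N: print the N most frequent words read from standard input.
# The name top_words and the default N=10 are hypothetical choices.
top_words() {
    tr -cs 'A-Za-z' '\n' |   # turn every run of non-letters into one newline
    tr 'A-Z' 'a-z' |         # fold everything to lower case
    sort |                   # bring identical words next to each other
    uniq -c |                # collapse duplicates, prefixing each with its count
    sort -rn |               # order by count, most frequent first
    head -n "${1:-10}"       # keep only the first N lines
}
```

With the function defined in the current shell, something like `top_words 5 < some-file.txt` should print the five most frequent words along with their counts. Note that text beginning with a non-letter yields one empty “word”; a real script would want to filter that out.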

It’s worth remembering that the goal is—most often—solving the problem in the fastest and easiest way. Sometimes that means a simple shell script rather than rolling up our sleeves and cranking out code.

Footnotes:

[1] This story is from Dennis Ritchie as told on his home page.

9 Responses to Reusability and NIH

  1. FMM says:

    It’s not really fair to critique Knuth for the length of his solution. By your own description, he was given the task of demonstrating literate programming. Likely, Knuth was intentionally verbose for the sake of providing plenty of examples.

    The world could honestly do with less “I coulda written that in 4 lines of code” sentiment. Maybe I’ve seen too many “magic” Perl one-liners that are entirely incomprehensible and have dismal runtime complexity.

    • jcs says:

      Donis addresses that issue in his post and comments. I’m guessing that even McIlroy understood that the critique was, in some sense, unfair but his larger point, as I said, was about reusability. See Donis’s post for more on that.

      I was really just using the post as a springboard to discuss a related but distinct point: the tendency of many of us to use a big glop of custom code in our favorite language rather than leveraging the built in utilities and a few lines of shell script to tie them together.

  2. rdm says:

    Meh…

    Hypothetically speaking, code like this:

    perl -pe 's/\W+/ /g' | fmt -w1 | sort -f | uniq -ic | sort -n | tail -10
    # perl: collapse each run of non-word characters to a single space
    # fmt: break each word onto its own line
    # sort -f: put copies of the same word next to each other
    # uniq -ic: count identical words, ignoring case
    # sort -n: order them with the most frequent words appearing last
    # tail -10: pick the last n words (10 in this case)

    should be just as easy to understand as Knuth’s 10 pages of Pascal, and perhaps easier.

    That’s probably not what appeared in the critique, but it’s close enough to what’s being described here for my purposes.

    (And, if we want to handle contractions we might update that perl statement to perl -pe "s/[^'\w]+/ /g")

    And, yes, it’s perfectly possible to write incomprehensible code in perl. But comprehensibility of a statement is a reflection of the author of that statement and not a reflection of the language used by the author (except in the case where the listener is ignorant of the language — but that’s a characteristic of the listener and not of the language).

    • jcs says:

      I haven’t seen the 10 pages of Pascal but I can pretty much guarantee that your code is easier to understand.

      • rdm says:

        After sleeping on this… probably need a | grep . | after fmt, to prevent blank lines from being treated as words. (bug fix)

        Also, I should probably acknowledge that normalizing text (here: replacing sequences of characters which do not represent words with a space, and my treatment of case folding) can be a complex and domain dependent issue. (specification issue)

  3. Kevin says:

    While likely not the case for Knuth, I think that this misses a very important point that plays a huge role in these decisions most of the time: there are lots of pre-existing tools and languages, and it’s exceptionally hard to know how to use all of them well, or even to know that most of them exist. I can’t count the number of times I’ve implemented something only to find, later on, that there was a much simpler, more elegant way to deal with it that Googling for the answer before “cranking out code” didn’t turn up.

    • jcs says:

      Yup, and it’s getting harder every day. I feel your pain, believe me. Still, I think almost everyone knows (or certainly should know) the standard Unix utilities.

  4. Am I the only one finding it very weird for programming posts to reference opinions from the 60s when people were using punched cards to program stuff?

    While all our programming is based on C/Pascal (you name it), our methodology has changed sharply.
