Goodhart's Law

Kontra has a useful reminder:

We see applications of this all the time. One famous example is the notion that we should use standardized tests to measure teacher and school effectiveness and base compensation and budgets on the results. It’s one of those ideas that sounds good when you first hear it—who isn’t for rewarding more effective teachers, after all—but in reality those being measured soon learn how to game the system and we end up with “teaching to the test.”

We see it in our own industry every time some bean counter decides to evaluate programmer productivity by counting lines of code committed or some equally silly measure. Whatever the metric, programmers soon learn to maximize it, even if the result is less work of poorer quality getting done.

The idea that making some measurement a target results in the measurement becoming useless is so common that it even has a name: Goodhart’s Law. It’s one of those things that everyone knows but always fails to take into consideration when setting out to measure some human activity with the aim of affecting policy in some way that matters to those being measured.

  • davidmanheim

    I think we often forget that Goodhart suggested the phenomenon in the context of economic policy. Metrics are an effective way to track and incentivize behavior in many domains, and they work well when deployed carefully. Clearly the claim is overstated, and I would argue that the tendency to over-universalize this "Law" comes from people not appreciating the dynamics of how it occurs.

    I discuss the dynamics here.
    And I clarify what I see as the largest contributor, unclear / underspecified goals, here.

    • jcs

      I'm not seeing how we disagree in any essential way. (The popular version of) Goodhart's law is very often true precisely because what we measure is not actually what we want to optimize (as you say in your second article). The two cases I mentioned are notorious examples of this.

      Everyone knows who the effective teachers and productive programmers are but it appears to be impossible, as a practical matter, to derive an analytical measure for them. As a result, we measure proxies for what we are trying to optimize (test scores and lines of code) and Goodhart's law exerts itself.

      Irreal readers are, I would guess, mostly scientists and engineers so we all value measurement and for many of us it's a sine qua non. Nonetheless, as soon as you use a measurement to affect the person being measured, you can be sure that whatever is being measured will be optimized by the subjects of the measurement. It's fine to say, "Then we need to measure the right thing," but that turns out to be a practical impossibility in many (most?) cases.

      • davidmanheim

        I do think we mostly agree, but I disagree that it is impossible to build useful analytical measures for programmers or teachers. That's why I think a fuller discussion of Goodhart's law was useful - it gives us ways around the problem.

        For example, reducing the complexity of what is being measured can have pernicious effects, but it can also have salutary ones; we usually want to reduce the complexity of problems to be solved, and by reducing them carefully, we can incentivize work more properly.

        For programmers, for example, TLOC is dumb - but using code execution time, resource usage, and memory efficiency as KPIs for a program is still a good idea. Similarly, if you are concerned about code readability and test cases, you can incentivize them directly via peer or outside review. The use of properly specified goals limits the application of metrics - we no longer have a single number to judge - but it can still properly incentivize the things we want done.
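
        As a minimal sketch of what such program-level KPIs might look like (the function names and the two toy implementations are hypothetical, not from the discussion above), Python's standard library can already report wall-clock time and peak memory for a run:

        ```python
        import time
        import tracemalloc

        def measure_kpis(func, *args, **kwargs):
            """Run func and report two simple KPIs: elapsed seconds and peak allocated bytes."""
            tracemalloc.start()
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            return result, {"seconds": elapsed, "peak_bytes": peak}

        # Two hypothetical implementations of the same task, to compare on the KPIs.
        def sum_squares_list(n):
            return sum([i * i for i in range(n)])   # allocates the whole list up front

        def sum_squares_gen(n):
            return sum(i * i for i in range(n))     # streams values one at a time

        r1, kpi1 = measure_kpis(sum_squares_list, 100_000)
        r2, kpi2 = measure_kpis(sum_squares_gen, 100_000)
        assert r1 == r2  # same answer; the KPIs distinguish how it was produced
        ```

        The point is that these KPIs measure properties of the program's behavior rather than of the programmer's typing, so they are harder to game in the TLOC sense - though, per Goodhart, any single one of them would still invite gaming if it became the sole target.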