Russ Cox has an excellent post on managing software dependencies. He begins by describing the problem: we’ve gone from talking about code reuse without ever practicing it to the wholesale use of packages/libraries/modules that we often know little about. Most often these dependencies are downloaded from the Internet, and all too often they are integrated into projects without any analysis or investigation. This, Cox says, is like hiring a programmer you’ve never heard of and about whom you know nothing. The results are well known within the programming community, and Cox mentions a few.
The rest of the post concerns strategies for dealing with the problem. Sadly, the solutions are difficult to implement and require a large commitment of engineering time and resources. They involve things like:
- Investigating the history of the project that produced the package.
- Examining the code for obvious red flags and quality.
- Making sure the project uses regression tests and running them yourself.
- Identifying the package’s dependencies and iterating the process on them.
- Writing and running your own tests that focus on your use of the package (see the sketch after this list).
and other strategies.
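To make that last point concrete, here’s a minimal sketch of what such a focused test might look like in Go. The dependency used here (github.com/google/uuid) is just a stand-in for whatever package you actually rely on; the idea is to encode exactly the behavior your code depends on, so that an upgrade that silently changes it fails your tests rather than your users.

```go
// dep_pin_test.go — a regression test that pins the one behavior we
// rely on from a third-party dependency. github.com/google/uuid is
// used purely as a stand-in for whatever package you depend on.
package myapp

import (
	"testing"

	"github.com/google/uuid"
)

// We depend on the canonical 36-character, version-4 string form.
// Pin that down so an upgrade that changes it breaks the build,
// not production.
func TestUUIDBehaviorWeDependOn(t *testing.T) {
	u := uuid.New()
	s := u.String()
	if len(s) != 36 {
		t.Errorf("uuid.New().String() = %q; want 36 characters", s)
	}
	if u.Version() != 4 {
		t.Errorf("uuid.New().Version() = %d; want 4", u.Version())
	}
	if _, err := uuid.Parse(s); err != nil {
		t.Errorf("uuid.Parse(%q) failed: %v", s, err)
	}
}
```

The value of a test like this is that it records your expectations of the package, not the package author’s.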
The problem is ongoing: every time the package is upgraded, you have to rerun the regression tests and review the diffs to make sure it still does what you need it to do.
Finally, Cox considers technical solutions to the problem. By this he means things like enhancing package managers to track subdependencies and enhancing compilers to embed a manifest of dependencies and their version numbers in the binary.
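The second idea is not hypothetical in Go’s case: the toolchain already embeds a manifest of module paths, versions, and checksums in the binaries it builds, and the standard library can read it back. A minimal sketch:

```go
// printdeps.go — dump the dependency manifest the Go toolchain
// embeds in the running binary (requires building with module support).
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no embedded build info (not built with module support)")
		return
	}
	fmt.Printf("main module: %s %s\n", info.Main.Path, info.Main.Version)
	for _, dep := range info.Deps {
		// Each entry records the module path, resolved version, and checksum.
		fmt.Printf("dep: %s %s %s\n", dep.Path, dep.Version, dep.Sum)
	}
}
```

The same manifest can be read out of any compiled binary with `go version -m`.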
As usual with Cox, the post is interesting and thought-provoking. If you’re working on large projects that use external libraries, you should definitely read it. Of course, implementing his recommendations is hard. Even Cox admits that they often don’t do so at Google because it’s so much work and takes so long.