blogstrapping

Project Activity != Project Health

Especially since the advent of DVCS-focused code sharing sites like GitHub and Bitbucket, but even before that point, people seem obsessed with the level of development activity for any given open source software project. I believe this is increasingly becoming a big deal because sites like these provide easily accessible measures of project activity. When metrics are easy to access and provide simple statistical data that appear to correlate strongly with desirable or undesirable characteristics of some situation, people tend to latch onto those metrics as if they provide the One True Measure of Quality.

This plays itself out everywhere. If two locales happen to have antithetical legal approaches to a given issue that is not itself synonymous with criminal activity (drug use, abortion, ownership of firearms, gay marriage, pick your poison), and those two locales also happen to differ substantially in crime rates, hordes of people will assume that one difference must necessarily be the cause of the other difference. Such people forget, or perhaps never realized, that simple correlation does not imply causation. Try telling them that, though, and they will accuse you of biases, stupidity, and anything else they can think of to dismiss your arguments rather than think about them.

While I have no direct proof, and am essentially just speculating, I believe quite strongly that in any case where an intuitively comprehensible metric is presented in a clear manner, people will overestimate the importance of that metric for making decisions about matters that may or may not actually be related. In short:

Easy access to metrics leads to overvaluation of metrics.

This has been amply illustrated by the way corporate middle managers the world over have wasted uncounted man-years of effort trying to find the One True Metric for accurate measurement of programmer productivity. The canonical example, now thoroughly debunked amongst programmers who really know their stuff (but still held in the highest esteem by many petty bureaucrats), is lines of code per time period. If you write five hundred lines of code per day, you must be a good programmer!

I touched on this before, in another venue. In "lines spent", I quoted legendary computer scientist Edsger W. Dijkstra's "On the cruelty of really teaching computing science", wherein he said:

From there it is only a small step to measuring "programmer productivity" in terms of "number of lines of code produced per month". This is a very costly measuring unit because it encourages the writing of insipid code, but today I am less interested in how foolish a unit it is from even a pure business point of view. My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

Clearly, measuring programmer effectiveness in terms of "lines spent" alone is also a foolish approach, but Dijkstra's point is well made. For a given block of functionality, assuming your code is well written so as to be readable and maintainable, and all else being equal, fewer lines of code are generally better. The simple words "lines spent" sum that up nicely; it is better to spend less, as long as you get optimal returns on your investment.
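To make that concrete, consider a small Python sketch of my own devising (the function names are hypothetical, not taken from any real metrics tool): two implementations of the same functionality, and the naive line count that rewards the worse one.

    # Purely illustrative: two equivalent implementations of the same
    # functionality, and the naive metric that rewards the worse one.
    import inspect

    def squares_concise(n):
        # One clear line of real work.
        return [i * i for i in range(n)]

    def squares_padded(n):
        # The same behavior, inflated in exactly the way a
        # lines-of-code metric encourages.
        result = []
        i = 0
        while i < n:
            value = i * i
            result.append(value)
            i = i + 1
        return result

    def lines_of_code(func):
        # Count a function's source lines -- trivially easy to compute,
        # which is exactly why the metric gets overvalued.
        return len(inspect.getsource(func).splitlines())

    assert squares_concise(10) == squares_padded(10)
    print(lines_of_code(squares_concise), lines_of_code(squares_padded))

By the "lines produced" mindset, the padded version looks like the work of the more productive programmer; by Dijkstra's "lines spent" mindset, it is simply the more expensive way to buy the same returns.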

On the other hand, if you assume minimal lines of code for maximal functionality, all else being equal, you come back to a point where writing more lines of code means delivering more functionality, and thus being a more productive programmer. The problem is that there is no simple way to measure all of that. The only thing you can measure clearly in a single metric is lines of code -- and all those corporate middle managers are constantly looking for the simplest possible, single-factor metric they can use to judge which underlings to reward (or at least leave alone) and which to punish or fire. Thus, the moment "lines of code" enters the discussion, many managers immediately latch onto it as the One True Measure of Quality. Once again, I find myself considering this simple statement:

Easy access to metrics leads to overvaluation of metrics.

The fact that sites like GitHub and Bitbucket make project activity immediately obvious to people browsing the site is, in this regard, no different from the ease of counting lines of code in a corporate setting. People tend to leap all too quickly to the conclusion that the rate of commits somehow has a nearly 1:1 relationship to project health and software quality.
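Part of the problem is just how cheap such a number is to produce. As a purely hypothetical sketch (assuming a local git clone; the function name is my own invention), the headline "activity" figure amounts to little more than this:

    # Hypothetical sketch: the trivially available "activity" number that
    # hosting sites put front and center, computed from a local clone.
    import subprocess

    def commits_last_month(repo_path):
        # Count commits from the past month; says nothing about quality.
        result = subprocess.run(
            ["git", "-C", repo_path, "rev-list", "--count",
             "--since=1 month ago", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return int(result.stdout.strip())

    print(commits_last_month("."))

Nothing in that count distinguishes a flurry of hasty bug fixes from a stable project that simply has nothing left to fix.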

I was tempted to title this essay thusly:

Project Activity Metrics Considered Harmful

Fortunately, I came to my senses. Pithy it may be, but it is also sensational and, strictly speaking, inaccurate. The true harm comes from the simplistic thought processes of people who forget that the world is usually not so one-dimensional. Even cartoons are two-dimensional, making the tendency of many GitHub users in particular to judge project quality and health by activity metrics alone worse than cartoonishly wrong.

I recently commented at Hacker News, somewhat popularly (22 upvotes as of this writing), on a counterexample to the unsophisticated idea that Project Activity == Project Health:

Y'know, there are some projects that haven't had commits for a while because they do exactly what they're supposed to do, and don't need a bunch of commits. I consider it a good thing when a project gets to the point where nobody can find any bugs and they stop adding features because it has enough of them already. I sure as hell don't want my quick, clean, elegant, productivity enhancing window manager to turn into a featuritis infected monstrosity like OpenOffice.org, after all.

I then added to the comment a little while later:

edit: Actually, my window manager of choice was basically abandoned by its developer several years ago, and it's still bug-free, stable, and lacking basically nothing. I finally came up with a feature enhancement I'd like it to have today, after using it for five years with no complaints or wants -- and that enhancement is really just an improvement of an existing feature.

More to the point, it's an enhancement that no window manager has, so I'm likely to need to write the code for this feature enhancement myself if I want it badly enough. I may pick up maintainership for the project to do just that.

Think about that a moment: one feature enhancement requested in [half] a decade, merely an extension of an already existing feature, and it's something entirely new to window managers as far as I'm aware. This thing has had no need for additional code in all that time, and it has been better as a result of all that.

Do not mislead yourself with simplistic metrics. Do not assume that because one project sees less activity than another, it is "sick" or otherwise less worthy. The same goes for feature counts, the percentage of commits that come from one source as opposed to another, and even the size of the user base.

These factors can certainly be relevant, but no single metric (nor even two of them) is likely to provide a clear picture of the health and quality of a particular project.
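If you must automate any such judgment at all, the least misleading shape for it is several signals presented side by side, with no attempt to collapse them into one score. A hypothetical sketch (all field names are mine):

    # Hypothetical sketch: keep multiple signals visible instead of
    # reducing them to a single "health" number.
    from dataclasses import dataclass

    @dataclass
    class ProjectSignals:
        commits_last_month: int
        open_bugs: int
        distinct_contributors: int
        age_years: float

        def summary(self):
            # Present the signals side by side; resist the urge to
            # weight them into one score.
            return (f"{self.commits_last_month} commits/mo, "
                    f"{self.open_bugs} open bugs, "
                    f"{self.distinct_contributors} contributors, "
                    f"{self.age_years:.1f} years old")

Any weighting of those fields into a single number would just smuggle the single-metric mindset back in through the side door.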

Beware the single-metric mindset.