Using Code Metrics with Purpose

I know plenty of developers who have had tactical success with static source code analysis tools, using them to find and root out code smells. When PMD tells us there's an empty catch block at line 207, for instance, it's pretty straightforward what to do.

At an aggregate level, however, code metrics are seldom so helpful or straightforward. When we see that a source tree has 160,000 lines of code or an average cyclomatic complexity of 4.12, our first thought is usually "interesting!", followed shortly by "well, now what?".

The problem is, in my experience, we often look at our code metrics in isolation, without good comparison points, leaving us to wonder whether the numbers we see are big or small, typical or abnormal, good or bad. In the end, it's not clear what to do, if anything.

For code metrics to have any real strategic value, whether that's justifying a major refactoring effort to the executive team or just proving that the recent adoption of some new practice (e.g. code reviews, automated testing, etc.) actually had the salubrious effect you said it would, they must be looked at within some broader context. And there must be a purpose. Otherwise they're just interesting numbers, no more, no less.

So, to that end, here's a very simple, low-overhead approach I've taken on a few projects, with some success:

Define a Purpose

If code metrics are to actually be useful and not just interesting, there must be some clear business-relevant question that you're trying to answer. Here are a few examples:

What is the pace of development? Are we moving slower or faster than before?
Is the code inherently difficult to maintain? Are there modules that are hopeless?
Should we expect greater or fewer bugs in this iteration? In general, is quality improving or getting worse?

Pick which Code Metrics to Use

Once the questions you're trying to answer are clear, you can pick the metrics and tools that will help build your case. Here are a few metrics and tools (Java-based) to consider, but there are plenty more:

Lines of code (JavaNCSS, Cloc)
Code complexity (JavaNCSS, PMD, CheckStyle)
Unit Test Coverage (Cobertura)
Copy-paste instances (PMD)
Style violations (CheckStyle)
Design violations (PMD, CheckStyle, FindBugs)
Possible bugs (FindBugs)
Number of check-ins

The important thing, again, is that these metrics map to some business question you're interested in. Less is more, in my experience - it's very easy to get lost in a sea of code statistics, so spend your time only on numbers that could have some practical impact, avoiding those that are inexact, nebulous, inaccurate, or just generally not useful.

Track Code Metrics over Time

Very often your question is time-oriented - i.e. "are things getting better or worse?" If that's the case, then track the metrics over time, either by iteration, sprint, or release, or by day, week, or month.

And although there are a few tools that do this aggregation for you (metric_fu for Ruby, QALab or PMDReports for Java, etc.), I've found that few give me exactly what I need with little setup or configuration hassle. In my explorations, I've found it easier to just use the tools I need separately, and aggregate the metrics (dare I say it!) manually in a spreadsheet. For example, two metrics I often keep track of are lines of code and average cyclomatic complexity.

To count the lines of code I use a nifty tool called Cloc (which takes no more than 5 minutes to set up):


  C:\> cloc \codebase\myproj\src

...which gives me something like this:

For the complexity statistics, I use JavaNCSS plus a simple program I wrote to filter out the getters and setters (since these can significantly skew the complexity numbers, making your system look a lot simpler than it probably is!):


  C:\> javancss -function -recursive c:\codebase\myproj\src | java FilterGettersAndSetters

...which gives me this:

Setting up these tools (downloading, putting things in the path and classpath, etc.) takes maybe half an hour, and each run after that takes just seconds. Typically, I'll do this at the end of every sprint, put the metrics in a spreadsheet, and then generate a handy graph that I can send to the team or use in a strategy discussion or status report.
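The getter/setter filter mentioned above isn't shown here, so what follows is only a minimal sketch of what such a program might look like, not the original. It assumes the JavaNCSS function report prints one row per method in the form "number, NCSS, CCN, JVDC, qualified function name", and it treats any method whose simple name starts with get, set, or is followed by an uppercase letter as an accessor. Both the row layout and the naming heuristic are assumptions to check against your own output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reconstruction: reads a JavaNCSS function report on stdin and
// prints the average CCN with getter/setter-style methods left out.
class FilterGettersAndSetters {

    // Assumed row layout: "<nr> <ncss> <ccn> <jvdc> <qualified.function(args)>".
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+\\s+\\d+\\s+(\\d+)\\s+\\d+\\s+(\\S.*)$");

    // Accessor heuristic: simple name is get/set/is followed by an uppercase letter.
    private static final Pattern ACCESSOR = Pattern.compile("^(get|set|is)[A-Z].*");

    static boolean isAccessor(String qualifiedFunction) {
        int paren = qualifiedFunction.indexOf('(');
        String beforeArgs = paren >= 0 ? qualifiedFunction.substring(0, paren) : qualifiedFunction;
        String simpleName = beforeArgs.substring(beforeArgs.lastIndexOf('.') + 1);
        return ACCESSOR.matcher(simpleName).matches();
    }

    // Returns the mean CCN over non-accessor rows, or NaN if no rows matched.
    static double averageComplexity(Iterable<String> lines) {
        long total = 0;
        int count = 0;
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (m.matches() && !isAccessor(m.group(2).trim())) {
                total += Long.parseLong(m.group(1));
                count++;
            }
        }
        return count == 0 ? Double.NaN : (double) total / count;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        System.out.printf("Average CCN (accessors excluded): %.2f%n", averageComplexity(lines));
    }
}
```

Header and summary lines don't match the row pattern, so they are simply skipped. A name-based heuristic will also drop the occasional non-trivial method that happens to be called getSomething, which is a trade-off worth knowing about before trusting the average.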

Note that without the context of time, seeing a complexity score of 3.5 alone doesn't mean much. Knowing, however, that complexity increased from 2.4 in the last sprint to 3.5 in this sprint does - it means something happened, whether too much functionality was squeezed into the most recent sprint, resulting in a slip in quality, or perhaps a new junior developer joined the project and probably needs to have their code reviewed.
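If the spreadsheet step ever feels too manual, the same sprint-over-sprint comparison can be sketched in a few lines of code. The class name, the history values, and the 25% threshold below are all made up for illustration; only the 2.4 to 3.5 jump comes from the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: flags sprints whose average complexity jumped sharply
// compared with the previous sprint. Names and threshold are assumptions.
class ComplexityTrend {

    // Returns the array indexes of sprints whose value rose by more than
    // maxRelativeIncrease (e.g. 0.25 = 25%) over the preceding sprint.
    static List<Integer> flagJumps(double[] averageCcnPerSprint, double maxRelativeIncrease) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 1; i < averageCcnPerSprint.length; i++) {
            double previous = averageCcnPerSprint[i - 1];
            double current = averageCcnPerSprint[i];
            if (previous > 0 && (current - previous) / previous > maxRelativeIncrease) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Hypothetical history; the last two values are the 2.4 -> 3.5 example.
        double[] history = {2.3, 2.5, 2.4, 3.5};
        for (int i : flagJumps(history, 0.25)) {
            System.out.printf("Sprint %d: %.1f -> %.1f, worth a closer look%n",
                    i + 1, history[i - 1], history[i]);
        }
    }
}
```

A relative threshold is used rather than an absolute one so that the same check makes sense for metrics on very different scales, such as lines of code and complexity.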

Compare Code Metrics across Projects

Another way to give your metrics meaning is to compare them against other projects - whether internal to your organization ("look, project A has a 12% unit test coverage rate whereas project B has 78%!") or against open source projects ("our project has an average cyclomatic complexity of 4.57, but Spring, Hibernate, and Struts all have less than 2.5"). Although your sample size probably won't be statistically significant, it can give a good rough benchmark to help you determine what's acceptable and what merits concern.

Note: these are just hypothetical numbers to show how metrics could be used! Again, with little effort, these metrics can be collected manually using simple static code analysis tools, as long as you know what you're looking for.
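One rough way to turn such a comparison into a single number is to divide your project's value by the median of the reference projects. This is just a sketch: the class name and the reference values are invented, and the median is my choice of benchmark, not something the tools above produce.

```java
import java.util.Arrays;

// Illustrative only: compares one project's metric with a small set of
// reference projects. All numbers used here are made up.
class ProjectBenchmark {

    // Median of the reference values; expects a non-empty array and leaves it unmodified.
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Ratio of our value to the reference median; 1.0 means "typical".
    static double ratioToMedian(double ours, double[] reference) {
        return ours / median(reference);
    }

    public static void main(String[] args) {
        double ours = 4.57;                    // hypothetical average complexity
        double[] reference = {2.1, 2.3, 2.4};  // hypothetical reference projects
        System.out.printf("Our average complexity is %.1fx the reference median%n",
                ratioToMedian(ours, reference));
    }
}
```

With only a handful of reference projects the median is less sensitive to one oddball than the mean, which is about as much statistical rigor as a rough benchmark like this deserves.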

Conclusion

In the end, code metrics alone won't tell the whole story, but used thoughtfully and purposefully, they can be invaluable when making strategic product or project decisions - for example, helping to illuminate maintainability problem areas in your architecture, predict where and when quality slippages can be expected, and, in general, determine whether the ship is going in the right direction or not.

Anyway, I'd like to hear about your experiences with using code metrics. Have aggregate statistics helped in any way? Which ones? How have you used them? And which tools have been useful?