I love Agile's idea of velocity in theory - that after accumulating a few weeks/months of data, a team can derive the average number of story points it can implement per sprint, and then use this as a basis for knowing both how much it can commit to in the next sprint (short term) and also when the project will be finished (long term).
In practice, however, I believe any calculation of velocity (based on story points) is doomed to be dangerously inaccurate and misleading for both the short and the long term. Here's why...
The first problem is that story points are not additive. Mike Cohn touches on (but does not fully address) this in a recent post. Using his example, imagine a team that uses story point buckets of 1, 2, 3, 5, and 8. Further, consider the median number of hours to complete user stories of each size:
Story Points | Median Hours |
---|---|
1 | 21 |
2 | 52 |
3 | 64 |
5 | 100 |
8 | 111 |
Now assume your team's velocity is determined to be 16 points. It would seem that you could pluck any combination of user stories off the backlog (according to the business owner's prioritization, of course!), and be fairly confident that you could complete this work so long as the stories all sum to 16 points. It's easy to illustrate, however, why this doesn't work. For example, take just two different combinations of stories:
Story Combination | Actual Hours |
---|---|
8, 8 = 16 | 111 + 111 = 222 |
5, 3, 2, 2, 2, 1, 1 = 16 | 100 + 64 + 52 + 52 + 52 + 21 + 21 = 362 |
Note that although the story points both sum to 16, the actual hours differ significantly: 222 hours for the first combination of stories and 362 for the second. This is not a negligible amount! It could very realistically cause a delay in a release (or alternatively, a copious amount of slack time for the development team). Either way, something is wrong.
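To make the arithmetic easy to rerun, here is a minimal Python sketch of the comparison above. The median hours come straight from the first table; the names `MEDIAN_HOURS` and `expected_hours` are just illustrative choices of mine, and using the median as the "expected" cost of a story is a simplifying assumption.

```python
# Median hours per story-point bucket, copied from the table above.
MEDIAN_HOURS = {1: 21, 2: 52, 3: 64, 5: 100, 8: 111}


def expected_hours(stories):
    """Sum the median hours for a list of story-point sizes."""
    return sum(MEDIAN_HOURS[points] for points in stories)


combo_a = [8, 8]
combo_b = [5, 3, 2, 2, 2, 1, 1]

for combo in (combo_a, combo_b):
    print(sum(combo), "points ->", expected_hours(combo), "hours")
# 16 points -> 222 hours
# 16 points -> 362 hours
```

Any other combination that sums to 16 points can be passed to `expected_hours` the same way to see how far it lands from either figure.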
The root problem here is in the correspondence of story points to actual hours. To be able to calculate velocity, it must be the case that story points preserve not only a relative ordering, but their numeric ratios as well. In other words, it must be true that two 1-point stories take the same amount of effort as one 2-point story. Unfortunately, story points in practice are little more than ordinal values - e.g. we can rely on a 2-point story being bigger than a 1-point story, but not necessarily on a 2-pointer being twice as big as a 1-pointer.
In general, the worse this correspondence between story points and actual hours is on your project, the more unreliable your velocity will be. To keep the correspondence consistent, therefore, a project manager must be vigilant about calculating these statistics (as in the first table above) and presenting them to the team, so that the team can adjust its estimates. For example, using the data above, if the team wanted to keep its notion of a "1" at roughly 21 hours, it would need to adjust down its assessment of a "2", which currently runs 52 hours rather than the 42 that doubling would imply.
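One way a project manager might run this check is sketched below. It assumes the "1" bucket is the anchor the team wants to keep, and the function names are hypothetical; nothing here is a standard Agile tool.

```python
# Median hours per story-point bucket, copied from the first table.
MEDIAN_HOURS = {1: 21, 2: 52, 3: 64, 5: 100, 8: 111}


def hours_per_point(median_hours):
    """Hours spent per story point in each bucket.

    If story points were a true ratio scale, every bucket would
    produce roughly the same value.
    """
    return {points: hours / points for points, hours in median_hours.items()}


def implied_hours(median_hours, anchor=1):
    """Hours each bucket would take if it scaled linearly from `anchor`."""
    unit = median_hours[anchor] / anchor
    return {points: unit * points for points in median_hours}


rates = hours_per_point(MEDIAN_HOURS)
implied = implied_hours(MEDIAN_HOURS)
for points, hours in MEDIAN_HOURS.items():
    print(points, hours, implied[points], round(rates[points], 1))
```

With the table's numbers, the linearly implied hours are 21, 42, 63, 105, and 168. The "3" and "5" buckets sit close to that line (64 and 100 hours), while the "2" runs high at 52 and the "8" runs far low at 111.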
Assume then that the correspondence between story points and actual hours is perfect. There's still a second problem: teams very seldom track actual hours. Instead, "actuals", as in the example above, are most often derived from the hour-based estimates assigned to tasks during the planning phase. For example, a simple "Add order" user story might be broken into three tasks, each given an hour estimate before the sprint starts:
- Implement UI (24 hours)
- Implement business logic (10 hours)
- Test (4 hours)
Summing these tasks, the project manager gets the total number of hours for the story (38 here), which then becomes the starting point for the burndown. The problem, of course, is that these hour estimates seldom match how long the tasks actually took, so using them as "actual hours" to measure the correspondence of story points to hours gives a spurious result. Again, velocity is chimerical.
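A small sketch of where that "actual" figure comes from, using the three task estimates listed above. The `estimation_error` helper is hypothetical and only meaningful for a team that really does track time, which, as noted, most do not.

```python
# Planning-time estimates for the "Add order" story, from the list above.
TASK_ESTIMATES = {
    "Implement UI": 24,
    "Implement business logic": 10,
    "Test": 4,
}


def estimated_story_hours(task_estimates):
    """Total hours for a story, as summed from its task estimates."""
    return sum(task_estimates.values())


def estimation_error(estimated, actual):
    """Relative error of an estimate against tracked hours.

    Requires real time tracking; `actual` is not something the
    estimates above can provide.
    """
    return (actual - estimated) / estimated


print(estimated_story_hours(TASK_ESTIMATES))  # 38
```

The 38 is the number that ends up in the "hours" column for this story, even though nobody measured it.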
Finally, a third problem, as a number of experts have pointed out, is that a team's assessment of story points adjusts over time. For example, what the team used to think of as a "2" a few months back may now be considered a "3". This type of adjustment is fine, again, if story points are only used as relative values. But if they are used to calculate velocity, it destroys any hope for accuracy, because it is not mathematically valid to add together values measured on different scales.
As an analogy, if half-way through your sophomore year your college switched from a 4 point scale to a 5 point scale, they obviously couldn't calculate your GPA by just adding all your grades together and dividing by the number of units. The scale changed! Again, the point is that because story points aren't additive, velocity just doesn't work.
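The GPA analogy can be sketched in a few lines of Python. The transcript below is entirely hypothetical (four equally weighted courses, two graded on each scale), and both function names are mine.

```python
def naive_gpa(grades):
    """Average the raw grade values, ignoring the scale each was given on."""
    return sum(value for value, _scale in grades) / len(grades)


def normalized_gpa(grades, target_scale=4.0):
    """Rescale every grade to `target_scale` before averaging."""
    rescaled = [value / scale * target_scale for value, scale in grades]
    return sum(rescaled) / len(rescaled)


# Hypothetical transcript: (grade, top of the scale in force at the time).
grades = [(4.0, 4.0), (4.0, 4.0), (5.0, 5.0), (5.0, 5.0)]

print(naive_gpa(grades))       # 4.5
print(normalized_gpa(grades))  # 4.0
```

The naive average of 4.5 is not a valid grade on the 4 point scale at all. The college can fix this because it knows the exact conversion between its two scales; a team whose sense of a "2" has drifted has no such conversion factor to apply to its old sprints.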
In the end, I completely understand that story points and velocity were designed specifically to be low-ceremony estimates, and further that the purpose of agile is to spend more time delivering working software and less time producing plans for delivering software. That's great. However, there seems to be a very pervasive and dogmatic belief in the robustness of velocity, and it's just not empirically warranted. The idea is great in concept, but it's based on a faulty premise: that story points can be used like numbers.
Personally, I seriously doubt that velocity can be used reliably for long-term forecasting, and definitely not for sprint planning. Just my thoughts - what do you think?