The best we can do is size up the chances, calculate the risks involved, estimate our ability to deal with them, and then make our plans with confidence. (Henry Ford)
The greatest of all gifts is the power to estimate things at their true worth. (François de la Rochefoucauld)
Estimating the size, effort, complexity and cost of software projects is possibly the most difficult task in all of Software Development and Project Management. Even estimating the time required to complete seemingly small and straightforward tasks can be annoyingly, or even dangerously, difficult. Software Project Success is determined in large part by the ability of the team to meet stakeholder expectations. To be predictive, you need data, and most prediction models use historical data as the basis of their forecasts. Stock market analysis and weather forecasting are classic examples. In spite of mountains of historical data, advanced algorithms and supercomputers to perform the calculations, weather forecasters are accurate less than 50% of the time. Market analysts have even more historical data, incredibly sophisticated algorithms and nearly the same computing power at their disposal. In the markets, some fare better than others, and fortunately, to be successful in the stock market you need only be right more often than you are wrong. Some win very big, but most struggle to hit that 50% mark.
There are aspects unique to software development that make software estimation inherently difficult and different from other forms of forecasting. Some of the reasons for this are obvious, while many are not. When I set out to write this essay, I planned to discuss those reasons in depth, but soon realized (not without a bit of irony) that I had seriously underestimated the scope and complexity of the task. This topic alone could easily be the basis of an entire book, and there is a lot of information available on the web if you are really interested. So instead of focusing on why we fail so miserably at estimating software development effort, I will focus on a purely pragmatic view of how we can do a better job of it in Agile projects. That being said, here are just a few of the factors that can impact the accuracy of your estimates:
- In 1979 Kahneman and Tversky found that human judgment is generally optimistic due to overconfidence and insufficient consideration of distributional information about outcomes. Further, risks typically are underestimated and benefits overestimated. This is a human bias resulting from our “inside view” of the project.
- There is a human tendency to want to please others, and in so doing we bias our estimates optimistically in the name of pleasing our stakeholders. Changing the estimate does not, however, change the amount of work that needs to be done, and in the longer term, shortcuts rarely turn out to be shortcuts. Ultimately we disappoint the stakeholders by delivering late.
- External forces (budget, customer deadlines, competition, etc.) pressure us to complete things as quickly as possible. Again, this external pressure creates internal tensions that the team tries to resolve. It usually distorts the estimates, but rarely changes the reality.
- Things change: the project requirements shift, and the client needs features added or removed.
- Technical uncertainty causes us to sometimes take a wrong path and have to redo work.
- The tools and technologies are constantly changing, forcing developers to continually learn and adapt to the latest releases. Each new release of a vendor's tools fixes some bugs, potentially breaking old workarounds, while at the same time introducing new ones.
- Interruptions and distractions affect productivity: Noisy workplace, ineffective meetings, poor lighting, uncomfortable seating, inefficient processes, etc…
- In many, if not most cases, there is no baseline. How can you estimate how long it will take you to do something that you’ve never done before? This leads to the question: If you’ve done it before, why are you doing it again?
- The tasks are often too many, too large and too complex, and with too many interdependencies to fully understand their implications.
- Software is part science, but in large part it is art. Because so much of it is art, the creativity and productivity of individual team members vary dramatically, and the quality and quantity of their contributions vary correspondingly. Somewhat counterintuitively, productivity is not related to levels of education or years of experience. Researchers have measured differences in productivity of as much as 100 times between two individuals with essentially identical education and work experience. These are the intangibles of insight, creativity and commitment. They are far more important than education or certification, but nearly impossible to measure.
One of the most significant influences affecting accuracy of estimates is illustrated in the following example.
Focused attention distorts perceptions of time. Years ago I read about a study of highly trained fighter pilots. It examined how the degree of mental focus their tasks demanded affected their ability to estimate the passing of time. As I recall, the pilots were placed in flight simulators, and the people running the test started a stopwatch and asked each pilot to indicate when ten minutes had passed. The scenarios varied in complexity. They ranged from routine in-flight functions, such as communicating with the control tower, to full air combat simulation. As the complexity of the tasks and the corresponding need for focused attention increased, the pilots' perception of time became increasingly and dramatically distorted. In the simple tasks, the pilots routinely estimated the duration to within a few seconds. At the extreme end (full air combat simulation), the estimates were dramatically off, sometimes by a factor of three. In other words, when the greatest attention and focus were required, highly trained pilots let as much as 30 minutes pass while believing that only ten minutes had elapsed.
Productive programming requires similar levels of focused concentration. Thus as a programmer’s focus and corresponding productivity increase, their ability to determine how long it takes to do something declines. For anyone who has done extensive software development this effect is clearly evident, yet very difficult to compensate for in estimates as it is unique to each individual. My wife has for years witnessed me disappear into “The Zone” and has come to refer to this effect as “Programmer Time” vs. “Real Time”. The irony in this situation is that we want developers to spend as much time as possible in The Zone where their productivity is maximized, but while in The Zone, their estimates of time are dramatically distorted. We then somehow expect them to use this hugely distorted perception of time as the basis for their estimates.
What Can We Do About It?
Clearly we cannot contain or compensate for many (if not most) of the factors that influence the accuracy of our estimates. That being said, we still need some degree of predictability in our work; "I don't know" is not a good enough answer. Obviously, the larger and more complex the task we are trying to estimate, the harder it will be to produce an accurate estimate. We also want to pay the greatest attention to the things that matter most to the success of our project. Fortunately, Agile helps us here. Used appropriately, the methodology decomposes a project into small units of work and, by definition, focuses our effort and attention on the things that yield the greatest value to the users of the software. Agile practitioners, and Scrum practitioners in particular, have proposed a number of scales for calibrating estimated effort in projects, including:
- Ranking effort on a scale of one to three – one being the smallest, and three being the largest.
- Using a Fibonacci Sequence [1, 2, 3, 5, 8]. A Story ranked as an eight is a Story that is too large to accurately estimate and should likely be classified as an Epic and decomposed into a smaller set of Stories.
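To make the two scales concrete, here is a small Python sketch that checks a ranking against either scale. The helper name `review_ranking` and its messages are hypothetical. The only rule it encodes is the one stated above: on the Fibonacci scale, an 8 marks a Story that is likely an Epic.

```python
# Hypothetical helper: validate a Story's relative-effort ranking against one of
# the two scales described above, flagging Fibonacci 8s as likely Epics.
ONE_TO_THREE = (1, 2, 3)
FIBONACCI = (1, 2, 3, 5, 8)


def review_ranking(story: str, points: int, scale: tuple[int, ...] = FIBONACCI) -> str:
    """Return a short verdict for a Story's ranking on the given scale."""
    if points not in scale:
        raise ValueError(f"{story}: {points} is not on the scale {scale}")
    # An 8 on the Fibonacci scale signals a Story too large to estimate
    # accurately: treat it as a likely Epic and decompose it.
    if scale == FIBONACCI and points == max(scale):
        return f"{story}: likely an Epic, decompose into smaller Stories"
    return f"{story}: {points} point(s)"


if __name__ == "__main__":
    print(review_ranking("Export report", 8))
    print(review_ranking("Fix typo on login page", 1, ONE_TO_THREE))
```

Note that the numbers are only relative rankings. Nothing in the helper converts them into time.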
There are other methods, but these are the two most common ones I have encountered. Note that in both cases the estimates are not expressed in units of time; they are merely expressions of Relative Effort. There are several good reasons for this approach, but principally it recognizes variations in team dynamics, experience and productivity. For example, on the Fibonacci scale, a task ranked as a five might take a highly efficient and very experienced developer one day to complete, whereas it might take a junior developer five days. The same time differences may also exist between two senior developers with very similar experience. This may have nothing to do with the overall aptitude of the individuals. One developer may simply have a problem-solving style that is more effective in that specific instance, or may have solved a similar problem in the past, which makes the solution to this one obvious. Relative Effort is thus a good comparative yardstick.
While both of these methods are effective and widely used, I believe they do not take into account the underlying elements that affect effort and uncertainty. I have thus developed a different model that I find to be very effective. This model is also consistent with the way I develop rankings of Stories, Defects and Risk. If you’ve already read my articles on these topics you will know where this is headed, so here we go…
Developing The Effort Matrix
The multitude of factors I enumerated (in part) above makes it clear that accurate and effective estimation requires a multidimensional view. The challenge, however, is deciding which dimensions to measure. Suppose we classify the possibilities with a SWOT analysis according to Internal vs. External influences (see my article on managing Risk: Five Simple Steps To Agile Risk Management). We can then eliminate many of the candidates by focusing on the things we can influence and paying less attention to those we can't. I also limit the model to two vectors so that the process stays simple enough that we actually use it, rather than sidestep it because it is too cumbersome. Using two vectors also keeps the model consistent with the other areas of the methodology. Here is what I've found works best:
Story Size
Story size is an estimate of the relative scale of the work in terms of actual development effort.
The wording provided here is a suggestion. Develop wording that works best for your team. Remember that these are guidelines – not rules.
Complexity (Story and Technical)
This is the complexity of the Story's requirements, of its technical implementation, or of both. Complexity introduces uncertainty to the estimate: more complexity means more uncertainty.
The wording provided here is a suggestion. Develop wording that works best for your team. Not all factors listed need to be met – remember, these are guidelines, not rules.
From these two vectors, I determine Effort with the following simple formula:
Effort = Complexity x Size
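The sketch below lays out the formula as a Python Effort Matrix. It assumes that both Size and Complexity are ranked on a 1-to-5 scale. The rating tables themselves are not reproduced here, so that scale is an assumption. It is, however, consistent with the Effort values of 1, 2, 4, 15, 20 and 25 discussed later in this article.

```python
# Sketch of the Effort Matrix: Effort = Complexity x Size.
# ASSUMPTION: Size and Complexity are each ranked on a 1-to-5 scale.
SCALE = range(1, 6)


def effort(complexity: int, size: int) -> int:
    """Relative Effort of a Story as the product of its two rankings."""
    if complexity not in SCALE or size not in SCALE:
        raise ValueError("complexity and size must each be between 1 and 5")
    return complexity * size


def effort_matrix() -> list[list[int]]:
    """Rows are Complexity 1..5, columns are Size 1..5."""
    return [[effort(c, s) for s in SCALE] for c in SCALE]


if __name__ == "__main__":
    print("C\\S " + "".join(f"{s:>4}" for s in SCALE))
    for c, row in zip(SCALE, effort_matrix()):
        print(f"{c:>3} " + "".join(f"{e:>4}" for e in row))
```

Under this assumption, only 14 distinct Effort values are possible: 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 20 and 25. The extremes of 1, 2, 20 and 25 occur only when both rankings sit at the very ends of the scale.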
Are The Estimates Accurate?
How do you know if you are doing a good job of estimating Effort? The answer is simple, though arriving at it takes a bit of work. It is a straightforward exercise in basic statistics without very much math. To determine whether you are doing a good job of estimating, look for two key things:
- Distribution: A measure of how well the distribution of your estimates maps to a bell curve.
- Velocity: A measure of how many units of Effort are completed per sprint compared to how many units of Effort were forecast. Velocity is a large topic in its own right and will not be discussed further in this article. Calculating Velocity and comparing it to forecasts is part of a larger Calibration process. Calibration is the feedback loop used to set the baseline metrics that are at the core of making all of your processes and estimates more predictable. We will use external mechanisms (to be discussed in a future article) to calibrate the Intrinsic measure of Relative Effort (as determined by the team). We will then use an Extrinsic view to measure Velocity (Relative Effort compared to Time) and turn it into Time Estimates. This will help us offset The Zone effect in our time estimates and improve their accuracy. Look for a discussion on this in the next few weeks.
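Here is a minimal sketch of the Velocity comparison defined above. It is not the fuller Calibration process, which is left for a future article. For some sprint data, it computes the fraction of forecast Effort that was actually completed and the average Effort completed per sprint. The `Sprint` type and the sample numbers are illustrative assumptions.

```python
# Minimal sketch: compare Effort completed per sprint with Effort forecast.
# The Sprint type and the sample history are hypothetical.
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sprint:
    forecast: int   # units of Effort the team forecast for the sprint
    completed: int  # units of Effort actually finished


def forecast_accuracy(sprint: Sprint) -> float:
    """Completed / forecast; 1.0 means the sprint delivered exactly what was forecast."""
    if sprint.forecast <= 0:
        raise ValueError("forecast must be positive")
    return sprint.completed / sprint.forecast


def average_velocity(sprints: list[Sprint]) -> float:
    """Mean units of Effort completed per sprint."""
    return mean(s.completed for s in sprints)


history = [Sprint(forecast=60, completed=48), Sprint(60, 54), Sprint(55, 55)]

if __name__ == "__main__":
    for i, s in enumerate(history, start=1):
        print(f"Sprint {i}: {forecast_accuracy(s):.0%} of forecast")
    print(f"Average velocity: {average_velocity(history):.1f} units of Effort per sprint")
```

With the sample history, the three sprints deliver 80%, 90% and 100% of forecast, with an average velocity of 52.3 units of Effort per sprint.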
If you are estimating well and your Stories are scoped appropriately, the distribution of your Effort estimates should approximate the classic Bell Curve. One could go into tremendous detail calculating Standard Deviation and performing other analyses. What we are looking for here, however, is a simple litmus test to act as a reality check on our estimates. It is a quick and easy exercise: gather the raw data in a spreadsheet and chart it. Your distribution should look something like this.
What we are looking for is a clustering of estimates in the range of [4..15]. Ideally, you do not want to have anything in the [1,2,20,25] ranges.
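If you would rather script the litmus test than build a spreadsheet chart, the Python sketch below counts Effort values, prints a text histogram and reports how many Stories fall in [4..15]. It treats [1,2,20,25] as the individual values 1, 2, 20 and 25. The function name `distribution_report` and the sample backlog are hypothetical. In keeping with the spirit of a quick reality check, the sketch deliberately skips Standard Deviation.

```python
# Litmus test for the distribution of Effort estimates (hypothetical helper).
# Thresholds follow the article: cluster in [4..15]; avoid 1, 2, 20 and 25.
from collections import Counter

SWEET_SPOT = range(4, 16)  # 4..15 inclusive
EXTREMES = {1, 2, 20, 25}


def distribution_report(efforts: list[int]) -> dict:
    """Summarize a list of Effort values (Complexity x Size) for a backlog."""
    if not efforts:
        raise ValueError("need at least one estimate")
    counts = Counter(efforts)
    in_sweet_spot = sum(n for e, n in counts.items() if e in SWEET_SPOT)
    return {
        "histogram": dict(sorted(counts.items())),
        "share_in_sweet_spot": in_sweet_spot / len(efforts),
        "extreme_values": sorted(e for e in counts if e in EXTREMES),
    }


if __name__ == "__main__":
    # Hypothetical backlog; replace with your own spreadsheet column.
    estimates = [4, 6, 6, 8, 8, 8, 9, 9, 10, 12, 12, 15, 20]
    report = distribution_report(estimates)
    for effort, n in report["histogram"].items():
        print(f"{effort:>3} {'#' * n}")  # text histogram, one '#' per Story
    print(f"{report['share_in_sweet_spot']:.0%} of Stories in [4..15]; "
          f"extreme values present: {report['extreme_values']}")
```

For the sample backlog, the script reports that 92% of Stories fall in [4..15] and flags the single Story with an Effort of 20.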
The two most significant factors in how your estimates will be distributed are:
- Appropriate scoping of Stories: If your Stories are written so that your distribution is weighted in the [10..20] range, they are likely too large, too complex, or both. If the distribution is weighted in the [2..9] range, they are likely too small or not sufficiently complex. In either case, work through the Stories in successive iterations (Sprints) to re-scope and refine them. The sweet spot is to scope your Stories so that the vast majority fall in the [4..15] range.
- Accuracy of your estimates: This article defines the structure for managing Estimates. An upcoming article will describe the actual real-world mechanism for determining individual estimates. Look for it in the weeks to come.
Every team and every organization is unique, so treat everything in this article as a suggestive guideline rather than a prescription. As the person leading your team, you will have to work with the team to find the balance point that calibrates its estimates most accurately. For both factors listed above, use an iterative process to discover and refine the mechanisms that work best for your organization.
As always, I look forward to your comments.