Estimating Effort For Your Agile Stories

Determining the Effort required to complete an Agile Story
Wrong Way – Go Back (flickr – naz’s stuff)

The best we can do is size up the chances, calculate the risks involved, estimate our ability to deal with them, and then make our plans with confidence. (Henry Ford)

The greatest of all gifts is the power to estimate things at their true worth. (François de la Rochefoucauld)

The Problem

Estimation is the calculated approximation of a result which is usable even if input data may be incomplete or uncertain. (Wikipedia)

Estimating the size, effort, complexity and cost of software projects is possibly the most difficult task in all of Software Development and Project Management. Even estimating the time required to complete seemingly small and straightforward tasks can be annoyingly, or even dangerously, difficult. Software Project Success is determined in large part by the ability of the team to meet stakeholder expectations, and that requires being predictive. To be predictive, you need data, and most prediction models use historical data as the basis of their forecasts. Stock market analysis and weather forecasting are classic examples. In spite of mountains of historical data, advanced algorithms and supercomputers to perform the calculations, weather forecasters are accurate less than 50% of the time. There is even more historical data, incredibly sophisticated algorithms and nearly the same computing power available to market analysts. In the markets, some do better than others, and fortunately, to be successful in the stock market, you need only be right more often than you are wrong. Some win very big, but most struggle to hit that 50% mark.

There are aspects unique to software development that make software estimation inherently difficult and different from other forms of forecasting. Some of the reasons for this are obvious, while many are not. When I set out to write this essay, I planned to speak in depth about these reasons, but soon realized (not without a bit of irony) that I had seriously underestimated the scope and complexity of the task. This topic alone could easily be the basis of an entire book, and there is a lot of information available on the web if you are really interested. So instead of focusing on why we fail so miserably at estimating software development effort, I will focus on a purely pragmatic view of how we can do a better job of it in Agile projects. That being said, here are just a few of the factors that can impact the accuracy of your estimates:

  • In 1979 Kahneman and Tversky found that human judgment is generally optimistic due to overconfidence and insufficient consideration of distributional information about outcomes. Further, risks typically are underestimated and benefits overestimated. This is a human bias resulting from our “inside view” of the project.
  • There is a human tendency for us to want to please others and in so doing we bias our estimates optimistically in the name of pleasing our stakeholders. Changing the estimate does not, however, change the amount of work that needs to be done, and in the longer term, shortcuts rarely turn out to be shortcuts. Ultimately we disappoint the stakeholders by delivering late.
  • External forces (budget, customer deadlines, competition, etc…) pressure us to complete things as quickly as possible. Again, this is an external pressure that creates internal tensions which the team tries to satisfy. It usually distorts the estimates, but rarely changes the reality.
  • Things change – the project requirements shift, the client needs more features to be added or removed.
  • Technical uncertainty causes us to sometimes take a wrong path and have to redo work.
  • The tools and technologies are constantly changing, causing developers to continually learn and adapt to the latest releases. In each new release of tools that they use from vendors, they will encounter bugs that are fixed, potentially causing old workarounds to break, while at the same time introducing new bugs.
  • Interruptions and distractions affect productivity: Noisy workplace, ineffective meetings, poor lighting, uncomfortable seating, inefficient processes, etc…
  • In many, if not most cases, there is no baseline. How can you estimate how long it will take you to do something that you’ve never done before? This leads to the question: If you’ve done it before, why are you doing it again?
  • The tasks are often too many, too large and too complex, and with too many interdependencies to fully understand their implications.
  • Software is part science, but a large part of it is art. Because a lot of it is art, the creativity and productivity of individual team members will vary dramatically, and the quality and quantity of their output will vary correspondingly. Somewhat counterintuitively, productivity is not related to levels of education or years of experience. Given two individuals with essentially identical education and work experience, researchers have measured differences in productivity of as much as 100 times. These are the intangibles of insight, creativity and commitment, and they are far more important than education or certification, but nearly impossible to measure.

One of the most significant influences affecting accuracy of estimates is illustrated in the following example.

Focused attention distorts perceptions of time. I remember years ago reading about a study that was performed on highly trained fighter pilots to determine how their ability to estimate the passing of time was affected by their degree of mental focus applied to the tasks they were performing. As I recall, the test was set up so that the pilots were placed in flight simulators and the people running the test started a stopwatch and requested that the pilot indicate when ten minutes had passed. There were several scenarios of varying complexity that required pilot engagement ranging from routine in-flight functions like communication with the control tower to full air combat simulation. As the complexity of the tasks and corresponding need for focused attention increased, their perception of time became increasingly and dramatically distorted. In the simple task tests, the pilots routinely estimated the duration within a few seconds accuracy. At the extreme end (full air combat simulation) the estimates were dramatically off – sometimes in excess of 300%. In other words, when the greatest attention and focus was required, highly trained pilots let as much as 30 minutes pass, while believing that only ten minutes had elapsed.

Productive programming requires similar levels of focused concentration. Thus as a programmer’s focus and corresponding productivity increase, their ability to determine how long it takes to do something declines. For anyone who has done extensive software development this effect is clearly evident, yet very difficult to compensate for in estimates as it is unique to each individual. My wife has for years witnessed me disappear into “The Zone” and has come to refer to this effect as “Programmer Time” vs. “Real Time”. The irony in this situation is that we want developers to spend as much time as possible in The Zone where their productivity is maximized, but while in The Zone, their estimates of time are dramatically distorted. We then somehow expect them to use this hugely distorted perception of time as the basis for their estimates.

What Can We Do About It?

Clearly we cannot contain or compensate for many (if not most) of the factors that influence the accuracy of our estimates. That being said, we still need some degree of predictability in our work; “I don’t know” is not a good enough answer. Obviously, the larger and more complex the task we are trying to estimate, the more challenging it will be to produce an accurate estimate. We also want to make sure that we pay the greatest attention to the things that are most important to the success of our project. Fortunately Agile helps us in this respect, because appropriate use of the methodology decomposes a project into small units of work and, by definition, focuses our effort and attention on the things that yield the greatest value to the users of the software. Agile practitioners, and Scrum practitioners in particular, have proposed a number of scales for calibrating estimated effort in projects, including:

  • Ranking effort on a scale of one to three – one being the smallest, and three being the largest.
  • Using a Fibonacci Sequence [1, 2, 3, 5, 8]. A Story ranked as an eight is a Story that is too large to accurately estimate and should likely be classified as an Epic and decomposed into a smaller set of Stories.

There are other methods, but these are the two most common ones that I have encountered. Of note in both cases, the estimates are not produced in terms of units of time. Rather they are merely expressions of Relative Effort. There are several good reasons for this approach, but principally it is recognition of the variations of team dynamics, experience and productivity. For example: Using the Fibonacci Sequence scale, a task ranked as a five for a highly efficient and very experienced developer might take one day to complete whereas it might take a junior developer five days to complete. Alternatively, the same time differences may exist between two senior developers with very similar experience. This may have nothing to do with the overall aptitude of the individuals, but may be due to a personal problem solving style that is more effective in that specific instance. Or one developer may have solved a similar problem in the past that caused the solution of this particular problem to be obvious. Relative Effort is thus a good comparative yardstick.

While both of these methods are effective and widely used, I believe they do not take into account the underlying elements that affect effort and uncertainty. I have thus developed a different model that I find to be very effective. This model is also consistent with the way I develop rankings of Stories, Defects and Risk. If you’ve already read my articles on these topics you will know where this is headed, so here we go…

Developing The Effort Matrix

As the (partial) list of factors above makes clear, accurate estimation requires a multidimensional view. The challenge, however, is deciding which dimensions to measure. If we classify the possibilities using a SWOT analysis (see my article on managing Risk: Five Simple Steps To Agile Risk Management) according to Internal vs. External influences, we can eliminate many of the candidates by focusing our attention on the things over which we have influence and paying less attention to those over which we don’t. I also limit the vectors to two, to keep the process simple enough that we actually use it rather than sidestep it because it is too cumbersome. Using two vectors also maintains consistency with the other areas of the methodology. Here is what I’ve found works best:

Story Size

Story size is an estimate of the relative scale of the work in terms of actual development effort.

Value Guidelines
5
  • An extremely large story
  • Too large to accurately estimate
  • Should almost certainly be broken down into a set of smaller Stories
  • May be a candidate for separation into a new project
4
  • A very large Story
  • Requires the focused effort of a developer for a long period of time – Think in terms of more than a week of work
  • Should consider breaking it down into a set of smaller stories
3
  • A moderately large story
  • Think in terms of two to five days of work
2
  • Think in terms of roughly a day or two of work
1
  • A very small story representing tiny effort level.
  • Think in terms of only a few hours of work.

The wording provided here is a suggestion. Develop wording that works best for your team. Remember that these are guidelines – not rules.

Complexity (Story and Technical)

This is the complexity of the Story’s requirements, of its technical implementation, or of both. Complexity introduces uncertainty to the estimate – more complexity means more uncertainty.

Value Guidelines
5
  • Extremely complex
  • Many dependencies on other stories, other systems or subsystems
  • Represents a skill set or experience that is important, but absent in the team
  • Story is difficult to accurately describe
  • Many unknowns
  • Requires significant refactoring
  • Requires extensive research
  • Requires difficult judgement calls
  • Effects of the Story have significant impact external to the story itself
4
  • Very complex
  • Multiple dependencies on other stories, other systems or subsystems
  • Represents a skill set or experience that is important, but not strong in the team
  • Story is somewhat difficult for product owner to accurately describe
  • Multiple unknowns
  • Comparatively large amount of refactoring required
  • Requires research
  • Requires senior level programming skills to complete
  • Requires somewhat difficult judgement calls
  • Effects of the Story have moderate impact external to the story itself
3
  • Moderately complex
  • Moderate number of dependencies on other stories, other systems or subsystems
  • Represents a skill set or experience that is reasonably strong in the team
  • Story is somewhat difficult for owner to accurately describe
  • Moderate level of unknowns
  • Some refactoring may be required
  • Requires intermediate programming skills to complete
  • Requires little research
  • Requires few important judgement calls
  • Effects of the Story have minimal impact external to the story itself
2
  • Easily understood technical and business requirements
  • Little or no research required
  • Few unknowns
  • Requires basic to intermediate programming skills to complete
  • Effects of the Story are almost completely localized to the Story itself
1
  • Very straightforward with few if any unknowns
  • Technical and business requirements very clear with no ambiguity
  • No unknowns
  • No research required
  • Requires basic programming skills to complete
  • Effects of Story are completely localized to the Story itself

The wording provided here is a suggestion. Develop wording that works best for your team. Not all factors listed need to be met – remember, these are guidelines, not rules.

Effort

Using these two vectors, I determine effort using the following simple formula:

Effort = Complexity x Size

Figure: Effort Matrix
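The formula above is simple enough to tabulate. As a rough sketch (not the original chart, whose colour ranges are not reproduced here), the following Python builds the 5 x 5 grid of Effort values. The names `effort` and `effort_matrix` are hypothetical, chosen only for this illustration, and the 1 to 5 bounds are taken from the Value Guidelines above.

```python
# Sketch only: tabulates Effort = Complexity x Size for the 1..5 scales
# described in the guidelines. Function names are illustrative assumptions.
SCALE = range(1, 6)  # both vectors are scored from 1 to 5


def effort(complexity: int, size: int) -> int:
    """Return the Effort score for one Story."""
    if complexity not in SCALE or size not in SCALE:
        raise ValueError("complexity and size must each be between 1 and 5")
    return complexity * size


def effort_matrix() -> list[list[int]]:
    """Rows are Complexity 1..5, columns are Size 1..5."""
    return [[effort(c, s) for s in SCALE] for c in SCALE]


if __name__ == "__main__":
    # Header row lists Size; each following row starts with its Complexity.
    print("C\\S " + " ".join(f"{s:>3}" for s in SCALE))
    for c, row in zip(SCALE, effort_matrix()):
        print(f"{c:>3} " + " ".join(f"{v:>3}" for v in row))
```

Because both vectors are whole numbers from one to five, only fourteen distinct Effort values can occur (1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 20 and 25), which is worth keeping in mind when reading the distribution later on.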

Are The Estimates Accurate?

The Metrics

How do you know if you are doing a good job of estimating Effort? The answer is simple, though it may take a bit of work to arrive at. It is a straightforward exercise in basic statistics, without very much math. To determine if you are doing a good job of estimating, you need to look at two key things:

  1. Distribution: This is a measure of how well the distribution of your estimates maps to a bell curve.
  2. Velocity: A measure of how many units of Effort are completed per sprint as compared to how many units of Effort were forecast. Velocity is a large topic on its own and will not be discussed in this article. The process of calculating and comparing Velocity to forecasts is part of a larger Calibration process. Calibration is the feedback loop used in the setting of baseline metrics that are at the core of making all of your processes and estimates more predictable. We will use external mechanisms (to be discussed in a future article) to calibrate the Intrinsic measure of Relative Effort (as determined by the team) and use an Extrinsic view to measure Velocity (Relative Effort Compared to Time) and turn it into Time Estimates. This will help us offset The Zone effect in our time estimates and improve the accuracy. Look for a discussion on this in the next few weeks.

Distribution

If you are estimating well, and your Stories are scoped appropriately, then there should be a distribution of Effort approximating the distribution of the classic Bell Curve. While one could go into tremendous detail calculating Standard Deviation and performing other analysis, what we are looking for here is a simple litmus test to act as a reality check on our estimates. This is a quick and easy exercise with a spreadsheet to gather the raw data and chart it. Your distribution should look something like this.

Figure: Effort Distribution

What we are looking for is a clustering of estimates in the range of [4..15]. Ideally, you do not want to have anything in the [1,2,20,25] ranges.
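If you would rather script the litmus test than chart it in a spreadsheet, something along these lines would do. This is a minimal sketch: the function names `summarize` and `text_histogram` are hypothetical, the sample data is made up, and the only figures taken from the article are the [4..15] range and the [1,2,20,25] values to avoid.

```python
from collections import Counter

SWEET_SPOT = range(4, 16)   # the [4..15] range named in the text
EXTREMES = {1, 2, 20, 25}   # values the text says you ideally avoid


def summarize(efforts):
    """Count Stories per Effort value and report how many land where."""
    counts = Counter(efforts)
    total = len(efforts)
    in_range = sum(n for e, n in counts.items() if e in SWEET_SPOT)
    extremes = sum(n for e, n in counts.items() if e in EXTREMES)
    return {
        "counts": dict(sorted(counts.items())),
        "sweet_spot_share": in_range / total if total else 0.0,
        "extreme_count": extremes,
    }


def text_histogram(efforts):
    """Render one row of '#' marks per Effort value that occurs."""
    counts = Counter(efforts)
    return "\n".join(f"{e:>2} | {'#' * counts[e]}" for e in sorted(counts))


if __name__ == "__main__":
    # Made-up Effort scores for one backlog, purely for illustration.
    sample = [6, 8, 9, 9, 12, 12, 12, 10, 4, 15, 2, 20, 16, 6, 8]
    print(text_histogram(sample))
    print(summarize(sample))
```

This is only a reality check, in keeping with the article: it says nothing about standard deviation, just whether the bulk of the Stories sit in the middle of the scale.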

The two most significant factors in how your estimates will be distributed are:

  1. Appropriate scoping of stories: If your stories are written so that your distribution is weighted in the [10..20] range, your Stories are likely too large and/or too complex. If the distribution is weighted in the [2..9] range, then they are likely too small or not sufficiently complex. In either case, work through the Stories in successive iterations (Sprints) to re-scope and refine them. The sweet spot is to have the vast majority of your Stories fall in the [4..15] range.
  2. Accuracy of your estimates: This article defines the structure for managing Estimates. An upcoming article will describe the actual real-world mechanism for determining individual estimates. Look for it in the weeks to come.

Every team and every organization will be unique and everything in this article should be considered a guideline and suggestive rather than prescriptive. As the person leading your team, you will have to work with the team to find the balance point that works best for accurate calibration of your team. In both examples listed above, use an iterative process to discover and refine the mechanisms that work best for your organization.

As always, I look forward to your comments.

Michael

I’ve been designing and building software and leading software teams for over 20 years. I am the founder of projectyap.com (agile project management that’s social) and yapagame.com (social media and micro blogging for sports fans).


6 Comments

  1. Michael Thuma
    Posted July 12, 2010 at 5:13 am | Permalink

    The hardest thing to accept – a lot easier to understand – is that complexity, size and ideal days spent are independent from each other outside a box. It will be interesting to see/read what experience you had with velocity.

    In general I would agree. Most of the stories are from 9 to 12; they are less risky.

    Anything that goes beyond that, or some of those that are considered simple, should be rethought to check whether they are really understood or just written down in 2 sentences – because everyone knows. “It is just …” If it really is just that, then it is an evolution of an existing story.

    Anything dark yellow or red can require (“separate project”) a more structured approach to satisfy the requirement, e.g. more formal design in order to figure out the right questions.

    Assuming a one-month iteration, I think this methodology can prove itself in practice for ideal days of a team with a very satisfying velocity.

    Mike

    • Posted July 18, 2010 at 10:54 pm | Permalink

      Michael,

      Thank you for your detailed comments.

      I used to use one-month sprints. I now tend to use two-week sprints. I think the one-month sprints were, in part, because in moving to Agile I found it hard to give up the “Big Release” mentality. Two-week sprints may not be the ideal in every situation, but I seem to get better results because mid-course corrections occur sooner.

      The next topic(s) will be about calibration and velocity. I’ve not yet decided if this will be in a single post or split into two separate posts. Whichever it is will be posted Wednesday or Thursday.

      Michael

  2. Posted July 7, 2010 at 5:13 am | Permalink

    Nice idea, I would like to see it tried in practice. Personally I don’t think it will work, as people will think complexity when you ask them to think about size. I have myself tried similar things, but never really gotten them to work. This multiplication thing is commonly found in traditional risk analysis, as you know.

    So .. interesting idea … doubting whether it will work in practice. But perhaps?

    • Michael
      Posted July 8, 2010 at 10:11 am | Permalink

      Thank you for your thoughtful comments. The key to getting something like this to work is more related to the social dynamics of the situation than anything else. If the team feels that the information will be used to control them or to be used as a metric to monitor them (both very dehumanizing things) they will resist. If, however, they can see that it will benefit them by making their job easier, more predictable, or they are more likely to receive recognition for a job well done, they are far more likely to embrace a system like this.

      Michael

  3. Posted July 6, 2010 at 8:43 am | Permalink

    Very interesting and concise. I’ve never thought of using a histogram for identifying how small are the tasks. Nice idea.

    I find your posts so well written that you should start a book :) These simple tools are the one managers really need (besides soft skills, of course).

    Keep it up. See you.

    • Michael
      Posted July 6, 2010 at 10:50 pm | Permalink

      Hi Alexandre,

      Thank you very much for the compliments. I’ve never really thought of myself as a writer, but it is indeed very gratifying to know that people find this blog to be useful.

      I realized this evening that I left out a section of the article. I forgot to include a section and corresponding chart that explains the four colour ranges. I will add it in the next day or two.

      Michael
