Five Simple Steps to Agile Risk Management

Risk Factory (flickr: kyz)

This report, by its very length, defends itself against the risk of being read.  (Winston Churchill)

Background

In my post on Agile Project Charters I outlined the embarrassingly high failure rate of software projects. Success rates today are only marginally better than they were when the Standish Group released its first Chaos report in 1995. In response to the tremendous misalignment between project expectations and project results, a variety of tools and methods have evolved to improve the odds of success, chief among them Project Management methodologies. Yet even with fifteen years of experience, improved software development tools and better methods, software project success rates have eked out only marginal gains. This is not a vilification of project management methodologies. Rather, it is a statement that software development is an inherently and increasingly complex undertaking with many uncertainties. With Risk Management, we attempt to identify the things we don’t know (the uncertainties) and quantify them so that they can be managed. This sounds like a paradox (how can you quantify what you don’t know?), but it is a paradox we can manage.

Agile Methods such as Scrum are a relatively new entrant into the field of project management. A basic tenet of Agile Methods is that teams produce a continuous series of usable software builds in very short cycles called Sprints. Each build is assessed and issues are identified; the backlog of tasks is then reviewed and prioritized, and the most important tasks are scheduled for the next Sprint. It sounds like an ideal approach, and for many teams it works extremely well: Agile teams tend to claim higher project success rates than teams using more traditional methods. There is not much empirical data comparing Agile project success rates with those of other methodologies, but what data does exist tends to support those claims.

Most methodologies place considerable importance on Risk Management. Agile approaches tend to manage Risk implicitly. That might be adequate if the only things affecting the outcome of a project were the decisions developers make to implement the solution, but as we shall shortly see, a multitude of factors can have a significant impact on a project’s success. I maintain that explicit Risk identification and management can further improve the success rate of Agile projects. In this article I will outline a Risk Management methodology I use that is quick, simple, fairly comprehensive and very Agile-friendly. As the title implies, I have broken the process down into five steps:

  1. Identify
  2. Classify
  3. Quantify
  4. Plan
  5. Act
  6. Repeat

Oops, I lied… there are six steps. Actually, there are only five steps, but it is worth stating Repeat as a sixth step to emphasize that our Agile Risk Management Process defines a virtuous circle of continuous improvement.

The Basics

Yes, risk taking is inherently failure-prone.  Otherwise, it would be called sure-thing-taking.  (Tim McMahon)

  • Risks are influencing factors that might adversely affect the outcome of a project.
  • Risk is the direct result of uncertainty. If there is no uncertainty, it is not a risk – it is a certainty.
  • Risk analysis is used to help a team understand uncertainty that could affect the outcome of the project.
  • Risk management (sometimes called Risk Mitigation) is the plan that the team puts into place to pre-empt, contain or mitigate the effects of risk to a project.

The important thing to remember is that even in simple projects, things can and will go wrong, and that you need to make plans to minimize the impact of those events when they occur.

1. Identify

The Dimensions of Risk

Risk factors can be assessed along two dimensions. The first, Helpful/Harmful, is a simple assessment of whether a factor has a potentially positive or negative influence on the success of our project:

  1. Helpful: Factors that advance the objectives of the project.
  2. Harmful: Factors that hinder or imperil the outcome of the project.

The second dimension of Risk is the identification of the source of the Risk:

  1. Internal: Factors originating inside the organization or within the sphere of influence of the project.
  2. External: Factors originating outside of the organization or project that cannot usually be influenced by the project.

Combining these two dimensions gives us the classic SWOT Analysis view of our project: Strengths, Weaknesses, Opportunities and Threats. In the diagram below, the two dimensions (four factor categories) are arranged in a matrix, with the Helpful/Harmful dimension represented as columns and the Internal/External dimension represented as rows.

SWOT Diagram: Strengths, Weaknesses, Opportunities and Threats

Risk Management is primarily interested in the Harmful column and that is what we will focus on in this article.
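The two dimensions can be captured as a pair of flags on each factor. The Python sketch below is a hypothetical illustration (the `Factor` and `Quadrant` types and both helper functions are invented for this example, not part of any tool) that places a factor in its SWOT quadrant and keeps only the Harmful column.

```python
from enum import Enum
from typing import List, NamedTuple


class Quadrant(Enum):
    STRENGTH = "Strength"
    WEAKNESS = "Weakness"
    OPPORTUNITY = "Opportunity"
    THREAT = "Threat"


class Factor(NamedTuple):
    name: str
    helpful: bool   # Helpful/Harmful dimension (the columns)
    internal: bool  # Internal/External dimension (the rows)


def swot_quadrant(factor: Factor) -> Quadrant:
    """Place a factor in the SWOT matrix using its two dimensions."""
    if factor.internal:
        return Quadrant.STRENGTH if factor.helpful else Quadrant.WEAKNESS
    return Quadrant.OPPORTUNITY if factor.helpful else Quadrant.THREAT


def harmful_column(factors: List[Factor]) -> List[Factor]:
    """Keep only Weaknesses and Threats, the column Risk Management focuses on."""
    return [f for f in factors if not f.helpful]


factors = [
    Factor("Experienced team", helpful=True, internal=True),
    Factor("Aggressive timeline", helpful=False, internal=True),
    Factor("Changing legislation", helpful=False, internal=False),
]
for f in harmful_column(factors):
    print(f.name, "->", swot_quadrant(f).value)
```

Filtering on the Helpful flag alone is enough to isolate the Harmful column; the Internal flag then tells you whether a given Risk is a Weakness you can influence or a Threat you can only prepare for.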

Examples of Weakness

  • Insufficient resources
  • Limited budget
  • Aggressive timeline
  • Important skills lacking in the team
  • Technological uncertainties
  • Lack of stakeholder consensus
  • Lack of a disaster recovery plan

Examples of Threats

  • Rapid and significant changes in the economy
  • Pandemics
  • Geopolitical tensions
  • Economic uncertainty
  • Changing legislation
  • Changing competitive landscape
  • Trade tariffs

Weaknesses are factors over which we tend to have some degree of control. Threats, however, are factors over which we tend to have little or no control. It is important to understand that even though we may have no control over a factor such as a pandemic, there are usually things we can do to manage or minimize the Risk effects on our project.

2. Classify

Each Risk needs to be categorized by the area it affects, its likelihood and the level of Impact it may have on the project. Risk Classes are used primarily for organizing, summarizing and reporting Risks to management and stakeholders. Some Risks you identify may affect more than one Class; if they do, they should be reflected in the summaries of each of those Classes.

The next chart lists the Risk Classes I typically use. These categories are not prescriptive, and you may wish to add others, such as Reputation or Environmental Impact, to suit your project or company needs. Solution, Timeline, Budget, Privacy and Security should be of interest to everyone with a stake in the outcome of the project. Resources and Scope are primarily relevant to the development team, but they can have a significant impact on the other categories and are therefore included in the set. I will show how the Risk Classes are summarized later in this article.

Risk Class: Explanation
  • Solution: Does it satisfy end-user requirements (quality, features, performance, etc.)?
  • Timeline: Is the project on time?
  • Budget: Is the project within its budget, and/or is there sufficient budget to complete the project?
  • Privacy: Does the solution comply with privacy legislation and the company’s own privacy policy?
  • Security: Is the solution adequately protected from intruders?
  • Resources: Are there sufficient skilled resources available to complete the project?
  • Scope: Is project scope properly contained?
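Because a single Risk may touch several Classes, a summary per Class is a small aggregation. Here is a minimal Python sketch under one assumption of my own: a Class’s summary is the highest Risk Rating among the Risks tagged with it, and a Risk tagged with several Classes counts toward each of them. `Risk`, `summarize_by_class` and the sample Risks are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set

RISK_CLASSES = ("Solution", "Timeline", "Budget", "Privacy",
                "Security", "Resources", "Scope")


@dataclass
class Risk:
    name: str
    rating: int                                   # Probability x Impact, 1..25
    classes: Set[str] = field(default_factory=set)


def summarize_by_class(risks: Iterable[Risk]) -> Dict[str, int]:
    """Highest Risk Rating per Risk Class (0 means no Risks recorded).

    A Risk tagged with several classes counts toward every one of them.
    """
    summary = {name: 0 for name in RISK_CLASSES}
    for risk in risks:
        for cls in risk.classes:
            if cls not in summary:
                raise ValueError(f"Unknown Risk Class: {cls}")
            summary[cls] = max(summary[cls], risk.rating)
    return summary


risks = [
    Risk("Key developer may leave", 12, {"Resources", "Timeline"}),
    Risk("Unpatched third-party library", 20, {"Security"}),
]
print(summarize_by_class(risks))
```

To add classes such as Reputation or Environmental Impact, extend `RISK_CLASSES`; the unknown-class check keeps typos from silently creating a new category.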

What Do You Assess?

I maintain the Risk ratings of each Story or Defect directly on the Cards. I use the same pattern of recording three numbers (Probability, Impact and Risk Rating) and use a highlighter to colour-code the Risks. At the Story level, most Risks will likely be fairly benign, so don’t obsess over or spend a lot of time on the low-risk items; focus on the ones that are genuine threats. Defects may require more attention, if only because as Defects they likely have higher visibility in the organization. In both cases, write a few details about the Risk directly on the card. An added benefit of having developers assess the Risk associated with Stories and Defects is that it adds a new dimension to their thinking about the work they are doing and helps them stay cognizant of the effect their work has on the overall success of the project.

Tracking Risk only through Stories and Defects is insufficient, especially for Threats (factors external to the project). For Threats, and for any identified Risk that is not tied to a Story or a Defect, I use a Risk Register (more on that later). Stories and Defects that receive a high Risk Rating are also tracked in the Risk Register.
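The card-level record and the rule for promoting high-rated cards into the Risk Register can be sketched in a few lines of Python. This is a hypothetical model of a physical card, not a tool: `Card`, `promote_to_register` and especially `REGISTER_THRESHOLD = 10` are assumptions, and the threshold should match whatever counts as “high” in your own colour bands.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-off for "high"; align it with your own colour bands.
REGISTER_THRESHOLD = 10


@dataclass
class Card:
    title: str
    kind: str         # "Story" or "Defect"
    probability: int  # 1..5
    impact: int       # 1..5
    notes: str = ""   # the few details written on the card

    def __post_init__(self) -> None:
        if not (1 <= self.probability <= 5 and 1 <= self.impact <= 5):
            raise ValueError("probability and impact must be between 1 and 5")

    @property
    def rating(self) -> int:
        return self.probability * self.impact


def promote_to_register(cards: List[Card],
                        threshold: int = REGISTER_THRESHOLD) -> List[Card]:
    """Cards whose Risk Rating is high enough to also track in the Risk Register."""
    return [c for c in cards if c.rating >= threshold]


cards = [
    Card("Edit profile page", "Story", probability=2, impact=2),
    Card("Payment gateway timeout", "Defect", probability=4, impact=4,
         notes="Vendor SLA unclear"),
]
for card in promote_to_register(cards):
    print(card.kind, card.title, card.rating)
```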

3. Quantify

Great, so now we know what to assess, but how do we go about measuring it? If you’ve read my three previous blog posts, you’ve likely already guessed that we will use a matrix based on two vectors. In this case the two vectors are Probability and Impact, and each Risk you identify must be assessed against both.

The assessment of each Risk must be performed by the respective SME (Subject Matter Expert). A project manager is not qualified to assess system security unless he or she is also a security SME, and the same is likely true for assessing Risks related to system performance, quality and privacy. Scrum is about teamwork, so depend on the team to bring their expertise to the table. Another reason for SMEs to do the assessment is that in some organizations I have witnessed political pressure applied to PMs to produce Risk Reports that reflect a particular, desired Risk profile. This may force the PM to game the numbers to produce the desired results. The ethics of such practices are highly questionable. If you experience a situation like this, you’ve got much bigger issues on your hands than managing the Risk in the project, and you should perhaps consider looking for a new job. Having the SMEs do the assessment helps insulate the PM from such pressures.

Once the SMEs have performed their assessments, it is useful to discuss them as a team to ensure that a consistent approach and weighting have been applied across all assessments. This also allows the thinking and assumptions behind the assessments to be shared amongst the team and brings the team’s collective wisdom to bear on evolving potential solutions. It may even uncover additional Risks due to Risk interdependencies.

Impact

The Impact of a Risk is a measure of its effect on the project. It ranges from Minimal (1) at the low end, where the consequences would be very small, up to Extreme (5) at the high end. You and your team should devise wording to describe each Impact level to suit the realities of your organization. Whatever you decide upon should be consistent throughout the entire organization so as to minimize confusion. The wording should not be viewed as a set of rules; instead, it is a set of guidelines. Here is some suggested wording:

Impact: Description

5 Extreme
  • May result in project failure
  • Budget overrun could exceed 50%
  • Project late by more than 50%
  • Could affect the ability of the organization to continue functioning

4 High
  • May result in significant impact on expected features, functionality or quality
  • Budget overrun exceeding 25%
  • Project late by more than 25%

3 Moderate
  • Significant effects on the project are unlikely
  • Budget overrun exceeding 10%
  • Project or subsystem late by more than 10%

2 Nominal
  • Does not require monitoring or review
  • Budget overrun exceeding 5%
  • Project late by more than 5%

1 Minimal
  • Little or no impact on any aspect of the project
  • Should be reviewed quarterly
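The quantitative bullets in the Impact table (budget overrun and schedule slip) can be turned into a starting suggestion for the SME. This Python sketch is hypothetical: `impact_from_overrun` is an invented helper, it reads “exceeding” and “more than” as strictly greater than, and it deliberately ignores the qualitative bullets (project failure, organizational continuity, feature or quality impact), which remain a judgment call and may push the level higher.

```python
def impact_from_overrun(budget_overrun_pct: float, schedule_slip_pct: float) -> int:
    """Suggest an Impact level from the budget and schedule bullets only.

    The worse of the two overruns decides the level:
    >50% -> 5 Extreme, >25% -> 4 High, >10% -> 3 Moderate,
    >5% -> 2 Nominal, otherwise 1 Minimal.
    """
    worst = max(budget_overrun_pct, schedule_slip_pct)
    for threshold, level in ((50, 5), (25, 4), (10, 3), (5, 2)):
        if worst > threshold:
            return level
    return 1


print(impact_from_overrun(30, 8))  # 4: the 30% budget overrun exceeds 25%
print(impact_from_overrun(5, 0))   # 1: exactly 5% does not exceed 5%
```

Taking the worse of the two overruns reflects the table’s structure, where either bullet on its own is enough to reach a level.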

Probability

If there is a very high probability that a Risk will be realized, then it clearly deserves the team’s attention. Conversely, if there is a very low probability of the Risk being realized, it should likely receive less attention from the team. All else being equal, the greatest attention should therefore go to the Risks with the highest probability of occurring. The following chart provides a suggested scale for assessing the probability that a Risk will manifest.

Probability Description
5 91 – 100% or Very likely to occur
4 61 – 90% or Likely to occur
3 41 – 60% or May occur about half of the time
2 11 – 40% or Unlikely to occur
1 0 – 10% or Very unlikely to occur
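If an SME estimates a likelihood as a percentage, converting it to the 1 to 5 scale is a simple band lookup. A minimal Python sketch, assuming the bands in the chart above; `probability_score` is a hypothetical helper, and fractional values that fall between the chart’s integer bands (such as 10.5%) are assigned to the higher band.

```python
def probability_score(percent: float) -> int:
    """Convert an estimated likelihood (0-100%) to the 1-5 Probability scale."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Upper bound of each band from the chart: 0-10, 11-40, 41-60, 61-90.
    for upper, score in ((10, 1), (40, 2), (60, 3), (90, 4)):
        if percent <= upper:
            return score
    return 5  # 91-100%


for p in (5, 35, 50, 75, 95):
    print(p, probability_score(p))
```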

Enter the Matrix

We now have two Risk Vectors. As we did in the prioritization of Stories and Defects (see my previous posts), we multiply the two vectors together; the simple product is the Risk value. Using the same thresholds as for Stories and Defects, as well as the corresponding colour system, we end up with a Risk Matrix that looks like this:

Risk Matrix
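The matrix itself is just the product of the two vectors laid out on a 5×5 grid. The Python sketch below prints such a grid with a colour letter in each cell. The colour thresholds in `COLOUR_BANDS` are placeholders of my own, not the ones in the diagram or in the earlier Story and Defect posts, so substitute your own; `risk_rating`, `colour` and `print_matrix` are hypothetical names.

```python
# Hypothetical colour bands (upper bound of each band). Replace these with
# the thresholds from your own colour system.
COLOUR_BANDS = ((4, "Green"), (9, "Yellow"), (16, "Orange"), (25, "Red"))


def risk_rating(probability: int, impact: int) -> int:
    """Risk Rating is the simple product of the two vectors."""
    if not (1 <= probability <= 5 and 1 <= impact <= 5):
        raise ValueError("probability and impact must be between 1 and 5")
    return probability * impact


def colour(rating: int) -> str:
    for upper, name in COLOUR_BANDS:
        if 1 <= rating <= upper:
            return name
    raise ValueError(f"rating out of range: {rating}")


def print_matrix() -> None:
    """Print the 5x5 matrix: Probability down the side, Impact across the bottom."""
    for p in range(5, 0, -1):
        cells = []
        for i in range(1, 6):
            r = risk_rating(p, i)
            cells.append(f"{r:>2} {colour(r)[0]}")  # e.g. "16 O"
        print(f"P{p} | " + "  ".join(cells))
    print("     " + "  ".join(f"  I{i}" for i in range(1, 6)))


print_matrix()
```

Because the rating is a product, the same value can arise from different combinations (4×2 and 2×4 both give 8); if your organization wants high-Impact Risks to stand out, that is a reason to colour by cell rather than by rating alone.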

4. Plan

Risk Rating

Now that you’ve identified the important Risks that threaten the success of your project, what should you do about them? You can make your Risk Planning as comprehensive as you wish, but as with most things in life, the simplest approach is often the best. Unlike the Impact and Probability wording, the wording here should not be treated as a guideline. For each Risk Rating, we want specific things to occur, because the Risk thresholds are triggers that mobilize the team or stakeholders to take action to mitigate the Risk. Here is some suggested wording for your Risk Planning. The wording you use in your company will likely differ from mine and should reflect the realities of your organization, but it is important that it be focused on Actions to manage the Risk:

Risk Ratings
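Treating each rating band as a trigger can be expressed as a lookup from rating to required action. This Python sketch is illustrative only: both the band boundaries and the action wording in `ACTION_BANDS` are hypothetical stand-ins, and `required_action` and `triggered` are invented helpers. The point it shows is that every rating maps to exactly one mandatory action.

```python
from typing import List, Tuple

# Hypothetical bands and action wording: replace both with your organization's own.
# Each entry is (upper bound of the band, colour, action the rating triggers).
ACTION_BANDS = (
    (4, "Green", "No action required; review at sprint planning"),
    (9, "Yellow", "Assign an owner and monitor every Sprint"),
    (16, "Orange", "Owner puts a mitigation plan in place this Sprint"),
    (25, "Red", "Escalate to stakeholders now; mitigation work takes priority"),
)


def required_action(rating: int) -> Tuple[str, str]:
    """Return (colour, action) for a Risk Rating between 1 and 25."""
    for upper, colour, action in ACTION_BANDS:
        if 1 <= rating <= upper:
            return colour, action
    raise ValueError(f"rating out of range: {rating}")


def triggered(register: List[Tuple[str, int]]) -> List[Tuple[str, str, str]]:
    """Every Risk whose rating triggers more than routine review."""
    result = []
    for name, rating in register:
        colour, action = required_action(rating)
        if colour != "Green":
            result.append((name, colour, action))
    return result


for row in triggered([("Throughput target", 20), ("Minor UI polish", 2)]):
    print(*row, sep=" | ")
```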

Risk Register

To track and manage Risk on a project, I use a Risk Register kept in a spreadsheet. Each time I do a Risk Assessment (ideally at each sprint planning session), I add a new sheet to the spreadsheet, so each sheet is the Risk Assessment for a particular Sprint. This way I can track how a Risk has changed over the course of a project, and I can also monitor how Risks are added to and removed from the Register. As you near the end of the project, you should see all of your Risks gradually move into the green (minimal) range. If this does not happen, you are doing something wrong: if you still have Orange or Red Risks in the late stages of your project, you have not been managing the Risk and you are rolling the dice on project success. All of the time, effort and money invested up to that point is at risk of being lost.

Risk Registry
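The one-sheet-per-Sprint Register lends itself to a simple comparison between consecutive snapshots. Here is a minimal Python sketch, assuming each snapshot is just a mapping from Risk name to Risk Rating (a real sheet would carry more columns). `compare_snapshots`, `still_orange_or_red` and the default threshold of 10 are hypothetical; set the threshold to wherever Orange begins in your own colour system.

```python
from typing import Dict, List

# One snapshot per Sprint (one sheet per Sprint in the spreadsheet):
# Risk name -> Risk Rating (Probability x Impact).
Snapshot = Dict[str, int]


def compare_snapshots(previous: Snapshot, current: Snapshot) -> dict:
    """Summarize how the Register changed between two Sprint assessments."""
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": {
            name: (previous[name], current[name])
            for name in sorted(current.keys() & previous.keys())
            if previous[name] != current[name]
        },
    }


def still_orange_or_red(snapshot: Snapshot, threshold: int = 10) -> List[str]:
    """Risks at or above a hypothetical Orange threshold; late in a project this should be empty."""
    return sorted(name for name, rating in snapshot.items() if rating >= threshold)


register = {
    "Sprint 1": {"Throughput target": 20, "Key developer leaving": 12},
    "Sprint 2": {"Throughput target": 8, "New privacy legislation": 9},
}
print(compare_snapshots(register["Sprint 1"], register["Sprint 2"]))
print(still_orange_or_red(register["Sprint 2"]))
```

Running `still_orange_or_red` on the latest snapshot is a quick check for the warning sign described above: anything it returns in the late stages of a project is a Risk that has not been managed down.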

5. Act

Insanity: doing the same thing over and over again and expecting different results.
(Albert Einstein)

First Things First

Act is simply that: the implementation of the defined Risk Mitigation Strategies. Well, it’s actually not that simple. Human nature is such that we tend to put off the things that aren’t fun or interesting, or that are just plain hard work. This is project suicide when it comes to Risk. It is imperative that you deal with the high Risk items first. Deferring performance testing and finding out a week before implementation that you can’t possibly achieve the requisite transactional throughput may be the death of your project. At a minimum, someone is going to have to do a lot of explaining; don’t let that be you.
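Dealing with the high Risk items first amounts to ordering the mitigation work by Risk Rating, highest first. A minimal Python sketch; `RiskItem` and `work_order` are hypothetical, and breaking ties in favour of the higher Impact is my own assumption rather than part of the method.

```python
from typing import List, NamedTuple


class RiskItem(NamedTuple):
    name: str
    probability: int  # 1..5
    impact: int       # 1..5

    @property
    def rating(self) -> int:
        return self.probability * self.impact


def work_order(items: List[RiskItem]) -> List[RiskItem]:
    """Highest Risk Rating first; ties go to the higher Impact, then by name."""
    return sorted(items, key=lambda r: (-r.rating, -r.impact, r.name))


backlog = [
    RiskItem("UI polish", probability=2, impact=2),
    RiskItem("Performance testing", probability=4, impact=5),
    RiskItem("Data migration", probability=5, impact=3),
]
for item in work_order(backlog):
    print(item.rating, item.name)
```

With this ordering, the performance testing from the example above (rating 20) lands at the front of the queue instead of the week before implementation.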

Fail Early

The “Fail Early” phrase is becoming very popular in the world of venture capital. In essence, it means finding out as early as possible whether what you are doing will succeed. These findings are essentially the go/no-go decision for your project. If success is not possible, either stop (kill the project) and move on to something else, or rethink the project and come at it from a different angle. Either way, do the difficult, gnarly, risky stuff up front. An added benefit is that it helps you define the boundaries of your system and sets expectations as to what is possible/realistic and what is impossible/unrealistic. It could even bring to light unrealistic success criteria, in which case the definition of project success may need to change. When this happens, the project may still live, but under a revised and possibly more realistic set of stakeholder expectations. It may also stimulate commitments such as a larger budget or access to key people.

As simple and as obvious as this may sound, it is amazing how often such critical, high Risk items are left until the final stages of the project. From my own observations over the years, this is one of the biggest reasons for project failure. Do the Risky stuff first and Fail Early.

6. Repeat

This process is very lightweight and very quick to perform. Identifying Risks early and implementing appropriate Risk Mitigation Strategies for each is essential to the success of projects. Done properly, it is a continuous virtuous circle of Assessment and Action to constantly identify, manage and minimize Risk.

Your Risk Plan should be reviewed at least quarterly. Better still, your reviews should coincide with your sprint planning sessions. At these sessions you have access to your team, and everyone is already looking at the stories, reviewing effort estimates, etc. You don’t need to do an exhaustive review each time, but pay particular attention to the Risks you are tracking in your Risk Register. Also look for any new Risks that might start appearing as the team progresses through the project and learns more about the challenges. As always, if you discover Risks that are high, deal with them early.

Summary

In this article I have presented a simple five-step process for assessing and managing Risk in an Agile project. My next post will look at how you can aggregate the Risks of multiple Projects into a Program view of Risk.

As always, I look forward to your comments.

I’ve been designing and building software and leading software teams for over 20 years. I am the founder of projectyap.com (agile project management that’s social) and yapagame.com (social media and micro-blogging for sports fans).


This entry was posted in Agile, Privacy, Project Management, Risk Management, Scrum, Security, Technology.

6 Comments

  1. Posted August 10, 2010 at 8:46 pm

    Hello Michael,

    Great post! I like the simplistic model using vectors (kudos to you for understanding how simple vector products work) and the color codes for visible cues to risk. This would be a fairly valuable way of presenting risk and getting executive buy-in during resolution of impediments and in non-Scrum team (external) influences on the project.

    I would propose that beyond the summation of the individual risk per story, on a per Sprint basis, this would be a great first step to project planning. Think of how this concept aligns with TDD principles. You set out your tests and write the code to pass. Here, you set out your risks and you take active steps to eliminate them, in priority order. Some elimination of risk is development work, while other elimination is by delivery of work product to the Scrum team by outside groups. This puts the Scrum Master in a position of potentially influencing non-Scrum teams in order to gain mitigation on their projects, which tends to be a political issue at times in larger corporate environments.

    I am currently building a set of risk assessment questions based on Agile adoption factors, like scrum master experience, product owner involvement, legacy systems support requirements, new technology vs. refactoring existing, the % of code coverage your system has in unit tests and so on. These questions can be sorted and grouped in a logical manner, then analyzed to present risk across categories, much as you have outlined in your blog.

    Thank you for posting these concepts, I find them useful food for thought during my current meanderings into Risk Assessment for Agile Teams.

    Cheers!

    • Posted August 11, 2010 at 9:07 pm

      Hi Mathew,

      Great comments – thanks!

      I regularly use these risk ratings with executives to help them understand where the issues lie. The simple system works well to quantify and present the level of various risks relative to each other. It helps cut out a lot of the extraneous discussion based on “I think…” or “I feel…” positioning and makes it more quantitative. Take a look at my post on Agile Program Risk Management where I discuss how the process can be elevated to a higher level (less detail) and focus on risk across a portfolio of projects.

      I like the idea of formally tying the risks to TDD. It is a very effective way for the Scrum Master to add a bit of rigour to managing relationships between participants, and like managing executive interactions, the conversations are less qualitative and more quantitative.

      I would very much like to hear about your questions on Agile adoption factors when you have finished with them.

      Michael

  2. Hal Stull
    Posted June 11, 2010 at 6:40 pm

    Michael,
    I have worked in and managed projects in several Agile shops. There are two inherent risks in the methodology that you don’t mention.

    You have Solution and Scope as two risk factors. What I have seen in Agile shops is Scope creep originating in the IT team. One project produced a workable document control system in nine months with a team comprised of three senior developers working with a customer. The second year budget was for five full time developers enhancing the system. One of the enhancements was changing technology from Cold Fusion to .Net, clearly not a customer need.

    With repeating builds, the Time line and Budget are rolled up to the project level, not set at project start. Agile teams often don’t look beyond the Transition. The customer will use the system for years after the project. For cost effectiveness, completed systems should be run by operations. For Security, developers should be locked out of production systems. Tier 3 support should be infrequent. Yet, I have seen systems where the senior developers have been supporting systems on a daily basis for over five years.

    You have some generic techniques that could be used in any project. I like the color coding schemes. What is unique to Agile risk management isn’t really clear.

    • Michael
      Posted June 12, 2010 at 1:16 am

      Hi Hal,

      Thank you for your comments.

      There are indeed other Risk factors that could well be included, and what is included will often vary from one shop to another. The IT feature/scope creep issue you have described is one I’ve encountered as well. Developers often want to add in capability that they believe could make the product much better, but as you say, these are not necessarily customer needs that are being satisfied. The developers may be right with what they want to do, but equally they could be wrong. This is where leadership combined with having the product management team involved in ranking stories (see my article on prioritizing stories) comes in. Deciding to track it as a separate Risk is up to you and your organization.

      I also agree with your assertions about support and ongoing maintenance of an operational system. I don’t think this is an issue that should necessarily be tracked as project Risk – at least not in the development project. Once in production and the original project has been closed, you should open a maintenance project and that project should have a backlog, sprints, etc… just like any Agile project. It should also be subject to the same processes for building, staging, testing and deploying as any other project.

      As you state, this method for management of Risk is usable in many types of projects. You could use it, for instance, in planning for and managing the Risk of a wedding. The Risk Classes would be very different from the ones proposed here, but the principles would be the same. As to what is unique about this approach as it applies to Agile: because it could in fact be applied to a project methodology of any type, there is nothing specifically designed for Agile projects except that it is meant to be lightweight. The point is not that it is a method specifically for Agile projects so much as it is an approach that works really well with Agile projects as well as other types of projects.

      The colours are indeed useful, and you will find that project teams begin to refer to project status by colour.

      Michael

  3. Posted June 5, 2010 at 3:46 am

    Typo at nominal impact “Budget overrun exceeding 50%” (ought to be 5%). The same applies for “project late”. It also might be useful to remind the reader that W/T at “Risk Register” refer to Weakness and Threat (SWOT). It took a while of head-scratching to figure that one out. :)

    The post reminded me of a project management course I participated in a few years ago. I like particularly the color coding system that helps to highlight the real problems. Do you happen to know any specialized tools that help in tracking risks this way?

    I agree on the point of “failing early”. It’s way too common to postpone important tasks till it’s too late to act properly. Traditionally it takes some amount of heroism to pull these kinds of projects through due to the deadline. I’m not saying heroism is a bad thing. It just can be really draining and is not the optimal way to work in the longer term.

    • Michael
      Posted June 6, 2010 at 10:45 pm

      Thanks for the heads-up on the typo, and sorry about the confusion with W/T. I should know better than to do final edits at 2:00 am.

      I find the colour coding to be really helpful as well. It simplifies things, and people actually start talking about statuses as Red, Orange, Yellow and Green. It’s a nice visual and verbal shorthand. I’ve not spent a lot of time looking for tools to track project risk in this fashion, so I can’t really say what is out there. I have been using spreadsheets, but I am now building a tool for my own use.

      Failing early is important, as long as it is not too early. Sometimes the best ideas come when you think you’ve run out of options (necessity is the mother of invention). The art of it is knowing when to quit, but the science is doing the really tough, high-risk stuff first.

      Sometimes in spite of best laid plans, bad stuff happens and heroics is the only thing that can save the project. Heroics can work for a while, but they are not sustainable, and they are not scalable. That being said, heroics has saved many a project, including a few of mine.

      Thanks for your thoughtful comments.