Does Your Risk Framework Enable Good Asset Management?
In the world of asset management, navigating risk is paramount. Well-defined frameworks can empower organisations to make informed decisions and achieve their objectives. Explore the nuances of consequence levels, likelihood scales, and the importance of clear risk assessment.
We often think of risk in terms of health, safety and environment (HSE) risks, financial risks, or even reputation risks. But what about the risks of meeting (or not meeting) the organisation's Asset Management objectives?
When I talk about the risk framework, I'm talking about not just the risk matrix, but also the likelihood scale, the consequence categories, and the consequence scale. I'm also including the risk levels and their granularity, or the rate at which each scale increases.
It's also prudent to point out here that the ISO definition of risk is:
The effect of uncertainty on objectives.
Risk frameworks are a critically important part of any organisation that deals with risks in terms of health and safety, the environment, and/or shareholder capital, for example. What I want to discuss in this article is how organisations describe the risks that they potentially face, and how useful, or not, that framework is in the context of asset management.
While some organisations have very well thought out risk definitions, I've seen enough poor ones to motivate me to write this article. Risk ratings are often used to justify expenditure to help the organisation achieve its objectives. The problem arises when the risks cannot be adequately or appropriately articulated in that justification.
In this article, I will look at some typical uses, and some of the problems arising from poor frameworks.
Consequence Categories
There are many different variations of consequence categories. They usually involve financial and HSE. Financial consequences are a direct and measurable cost to the organisation, so of course it makes sense to include them. HSE consequences can also have a significant financial impact, but can also impact an organisation's social licence. Again, a no-brainer. In the industries I've worked in, legal compliance, or legal action, is also a common consequence category.
The categories that are included in the risk definition should be what the organisation thinks is important. For example, for a company that produces widgets, where the production process is time constrained, downtime might be an important factor to consider. As such, downtime should be represented in the consequence categories. Another example might be a factory where the process is supply constrained. Risks to the supply chain will have a much greater impact on achieving the organisational objectives than downtime if production can always catch back up.
Consequence Levels
The rate at which the consequences escalate from one level to the next is also important. Usually the levels grow at either a linear or an exponential rate. If we use downtime as an example, a factory might use the following scale:
Consequence Level | Consequence Description |
---|---|
Very Low | Limited interruption, short delays in production. |
Low | Partial production stoppage, affecting specific areas or processes for a few days. |
Moderate | Significant production stoppage, impacting multiple areas or processes for about a week. |
High | Extensive production stoppage, affecting the entire factory for several weeks. |
Very High | Complete production halt, with the factory unable to operate for up to 3 months. |
This example is purposely bad. The descriptions of the consequences are vague. Most of the time, guessing the outcome of events that have never happened is a bit of a thumb-suck, but when the downtime is a known quantity, then what is "several weeks"? How long is "about a week"? Where does a definite 5 days or 6 days of lost production sit? Where does 10 days sit? The boundaries of these levels need to be clearly defined so that you can split hairs when you need to.
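One way to remove the ambiguity is to give each level an explicit upper bound in days. The sketch below is only an illustration: the boundary values and the `downtime_level` function are my own assumptions, not taken from any real framework, and each organisation would need to choose its own numbers.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds, in days of lost production.
# These values are illustrative assumptions only.
DOWNTIME_UPPER_BOUNDS_DAYS = [1, 5, 10, 30, 90]
LEVELS = ["Very Low", "Low", "Moderate", "High", "Very High"]


def downtime_level(days_lost: float) -> str:
    """Return the consequence level for a known number of days lost."""
    if days_lost < 0:
        raise ValueError("days_lost cannot be negative")
    # bisect_left finds the first band whose upper bound is >= days_lost.
    index = bisect_left(DOWNTIME_UPPER_BOUNDS_DAYS, days_lost)
    if index >= len(LEVELS):
        raise ValueError("downtime is beyond the top of the scale")
    return LEVELS[index]


for days in (5, 6, 10):
    print(days, "days ->", downtime_level(days))
```

With these assumed bounds, 5 days is Low while 6 days and 10 days are both Moderate. Whether those are the right answers is a decision for the organisation, but at least the question now has exactly one answer.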
A financial consequence table might look like:
Consequence Level | Financial Consequence |
---|---|
Very Low | up to $10,000 |
Low | up to $50,000 |
Moderate | up to $100,000 |
High | up to $200,000 |
Very High | up to $500,000 |
The steps between each level should not be so large that a significant difference in loss is not reflected by a change in consequence level. For example, using the above financial scale, if a $250,000 machine at the factory is damaged beyond repair, then the consequence is Very High. If two machines are damaged, then the consequence level is still Very High, but we've actually got double the loss. That's not insignificant.
How can an asset manager justify a demonstrable saving to an organisation by limiting the damages to one machine, yet in the universal language of the organisation's consequence levels, there's no difference?
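To make the point concrete, here is a minimal sketch that uses the upper bounds from the financial table above. The function names and the two example losses are hypothetical. It shows that two losses, one double the other, can share a level, and it reports how quickly the scale grows from one level to the next.

```python
from bisect import bisect_left

# Inclusive upper bounds in dollars, copied from the example financial table.
FINANCIAL_UPPER_BOUNDS = [10_000, 50_000, 100_000, 200_000, 500_000]
LEVELS = ["Very Low", "Low", "Moderate", "High", "Very High"]


def financial_level(loss: float) -> str:
    """Return the consequence level for a loss in dollars."""
    index = bisect_left(FINANCIAL_UPPER_BOUNDS, loss)
    if index >= len(LEVELS):
        raise ValueError("loss is beyond the top of the scale")
    return LEVELS[index]


def step_ratios(bounds):
    """Ratio of each upper bound to the one before it."""
    return [upper / lower for lower, upper in zip(bounds, bounds[1:])]


# Hypothetical losses: the second is double the first.
print(financial_level(250_000), "|", financial_level(500_000))
print(step_ratios(FINANCIAL_UPPER_BOUNDS))
```

Both losses land in Very High. The step ratios for this scale come out as 5, 2, 2 and 2.5, so it is neither linear nor consistently exponential, which is worth knowing before relying on it to compare options.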
Additionally, the lower and upper bounds need to make sense for the organisation. There's no sense in the lower limit being very low when the organisation nets a billion dollars per year. This artificially raises risks when the organisation could instead raise its appetite. On the other hand, there's no sense in a moderate consequence being absolutely devastating to the company, unless the organisation has a high risk appetite. There was one organisation I worked with that had the lowest financial consequence level starting at half a million dollars. That was an eye-watering amount of money for the first level of responsible decision makers to be betting on, especially since that amount would have sunk some of the business units.
In summary, the consequence boundaries need to fit the organisation in its current context, and the steps need to be small enough that significant increases are reflected on the scale.
Likelihood Levels
There's not much variation in the typical likelihood levels that I've seen. But I've seen some ineffective, and downright confusing, likelihood levels. Most organisations use some variation on a theme of occurrences in a time-scale that grows by a factor of 10, or a probability expressed as a percentage.
The issue with expressing probability as a percentage is that some events are 100% going to happen. It's just a matter of when. There's no value in expressing the risk of a machine or component failure using a probability scale without considering the time domain. But then how do you express risk by picking one box in the table when it varies over time?
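The time dependence is easy to see with a small calculation. The sketch below assumes a constant failure rate (an exponential model) and a hypothetical component with a 10-year mean time to failure. Both are simplifying assumptions for illustration; real components often wear out, so their failure rate is not constant.

```python
import math


def probability_of_failure(mean_time_to_failure_years: float,
                           horizon_years: float) -> float:
    """Chance of at least one failure within the horizon.

    Assumes a constant failure rate, which is a simplification.
    """
    rate = 1.0 / mean_time_to_failure_years
    return 1.0 - math.exp(-rate * horizon_years)


# Hypothetical component with a 10-year mean time to failure.
for horizon in (1, 5, 10, 20):
    p = probability_of_failure(10, horizon)
    print(f"{horizon:>2} years: {p:.0%}")
```

Under these assumptions the same component is roughly a 10% risk over one year, 39% over five, 63% over ten and 86% over twenty. A percentage on its own says nothing until the time horizon is stated.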
With the "one in x time" scale, these need to be clearly described, and the boundaries need to be relevant for the organisation.
Risk Table (Matrix)
Since risk is the product of likelihood and consequence, it's only natural that it is represented in a table like the one below. If you search the interwebs for "risk matrix", you can find many examples like this:
Impact / Likelihood | Very Low | Low | Moderate | High | Very High |
---|---|---|---|---|---|
Very Likely | Medium | High | Extreme | Extreme | Extreme |
Likely | Medium | Medium | High | Extreme | Extreme |
Possible | Low | Medium | High | High | Extreme |
Unlikely | Low | Low | Medium | High | High |
Very Unlikely | Low | Low | Medium | Medium | High |
This is but one example. There are many variations of this. You might notice that I called this a table. That's what it is. It's not a matrix. A matrix is one of these:
[[ 1, 0 ],
[ 0, 1 ]]
I digress...
While there's nothing really wrong with these sorts of tables, there is something that is glaringly obvious. And that is...
Risk Levels
Most organisations that I've interacted with use only four levels of risk:
- Low
- Medium
- High
- Extreme
I think there is one case where the organisation had five levels.
I don't know about you, but I find that only four or five levels cannot adequately describe the differences between risks, or the effectiveness of controls. As you can see in the above example table, there are multiple possibilities where reducing the consequence or reducing the likelihood does not reduce the risk level. There are even two cases (low and extreme) where reducing both likelihood and consequence has no bearing on the outcome.
In cases where a risk control clearly reduces the risk by consequence or likelihood, but does not reduce the resulting risk level, it is difficult to justify the costs of the control.
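To put a number on how often this happens, the sketch below encodes the example table from the Risk Table section and counts the one-step reductions that leave the risk level unchanged. The function name and the list layout are my own choices for illustration.

```python
# Rows run from Very Unlikely (index 0) to Very Likely (index 4).
# Columns run from Very Low (index 0) to Very High (index 4) consequence.
RISK_TABLE = [
    ["Low", "Low", "Medium", "Medium", "High"],           # Very Unlikely
    ["Low", "Low", "Medium", "High", "High"],             # Unlikely
    ["Low", "Medium", "High", "High", "Extreme"],         # Possible
    ["Medium", "Medium", "High", "Extreme", "Extreme"],   # Likely
    ["Medium", "High", "Extreme", "Extreme", "Extreme"],  # Very Likely
]


def unchanged_moves(d_likelihood: int, d_consequence: int) -> tuple[int, int]:
    """Return (unchanged, total) for reductions of the given step sizes."""
    unchanged = total = 0
    for row in range(d_likelihood, 5):
        for col in range(d_consequence, 5):
            total += 1
            before = RISK_TABLE[row][col]
            after = RISK_TABLE[row - d_likelihood][col - d_consequence]
            if before == after:
                unchanged += 1
    return unchanged, total


print("likelihood only: ", unchanged_moves(1, 0))
print("consequence only:", unchanged_moves(0, 1))
print("both:            ", unchanged_moves(1, 1))
```

For this table, 12 of the 20 possible one-step likelihood reductions and 9 of the 20 possible one-step consequence reductions leave the risk level where it was. Reducing both by one step fails to move the level in 2 of 16 cases.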
Perhaps a more useful system might include a level of granularity that helps us to demonstrate a change in risk, where that granular system can still be mapped to a simplified level. For example, in the 5x5 table above, there are 25 possible outcomes. Suppose these are given numbers from 1 to 25. We should be able to show that risk number 13 is lower than risk number 14, yet they might carry the same descriptor of "Medium". I'm not sold on this yet. There has to be a better way, but I'm not sure what it is.
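As a rough sketch of the numbering idea, the code below ranks the 25 cells of the example table from 1 to 25. The ordering rule is an assumption of mine, not an established method: cells are sorted by their descriptor first, so the numbers never contradict the table, then by the product of their likelihood and consequence positions, with the consequence position as an arbitrary tie-break.

```python
LEVEL_ORDER = ["Low", "Medium", "High", "Extreme"]

# Rows: Very Unlikely (0) to Very Likely (4).
# Columns: Very Low (0) to Very High (4) consequence.
RISK_TABLE = [
    ["Low", "Low", "Medium", "Medium", "High"],
    ["Low", "Low", "Medium", "High", "High"],
    ["Low", "Medium", "High", "High", "Extreme"],
    ["Medium", "Medium", "High", "Extreme", "Extreme"],
    ["Medium", "High", "Extreme", "Extreme", "Extreme"],
]


def ranked_cells() -> dict[tuple[int, int], int]:
    """Map each (likelihood, consequence) position to a rank from 1 to 25."""
    cells = [(row, col) for row in range(5) for col in range(5)]
    cells.sort(key=lambda rc: (
        LEVEL_ORDER.index(RISK_TABLE[rc[0]][rc[1]]),  # descriptor first
        (rc[0] + 1) * (rc[1] + 1),                    # then position product
        rc[1],                                        # arbitrary tie-break
    ))
    return {cell: rank for rank, cell in enumerate(cells, start=1)}


ranks = ranked_cells()
# Likely/Very Low and Likely/Low are both "Medium" but get different numbers.
print(ranks[(3, 0)], RISK_TABLE[3][0])
print(ranks[(3, 1)], RISK_TABLE[3][1])
```

A control that moves a risk from Likely/Low to Likely/Very Low now shows up as a lower number, even though the descriptor stays at Medium. The tie-break is a judgment call, which is part of why I remain unconvinced that this is the answer.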
Tolerable Risk
At some stage, senior management needs to draw a line in the sand and say "this is the amount of risk that we will tolerate". That line defines the tolerance level for the organisation. More often than not, this tolerance level is not set. Furthermore, the tolerance level needs to be clear, and it needs to be communicated.
A mature organisation may have a delegation of authority for risk levels. For example, a supervisor may only be able to accept medium risks, a manager can accept high risks, and the executive accepts extreme risks. This is sometimes referred to as the risk escalation criteria.
Clearly defining these tolerance levels empowers people to make risk-related decisions. This can improve the efficiency of internal processes and delegate appropriate levels of responsibility throughout the organisation.
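A delegation of authority like the one described above can be written down as a simple lookup. This is a minimal sketch using the supervisor, manager and executive example. The function names are hypothetical, and a real organisation would have its own roles and limits.

```python
LEVEL_ORDER = ["Low", "Medium", "High", "Extreme"]

# Highest risk level each role may accept, listed from least to most senior.
ACCEPTANCE_LIMITS = {
    "supervisor": "Medium",
    "manager": "High",
    "executive": "Extreme",
}


def can_accept(role: str, risk_level: str) -> bool:
    """True if the role is allowed to accept a risk at this level."""
    limit = ACCEPTANCE_LIMITS[role]
    return LEVEL_ORDER.index(risk_level) <= LEVEL_ORDER.index(limit)


def escalate_to(risk_level: str) -> str:
    """Return the least senior role allowed to accept this risk level."""
    # Dicts preserve insertion order, so roles are checked least senior first.
    for role in ACCEPTANCE_LIMITS:
        if can_accept(role, risk_level):
            return role
    raise ValueError(f"no role can accept risk level {risk_level!r}")


print(escalate_to("High"))
```

Here a High risk is escalated to the manager, because the supervisor's limit stops at Medium.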
Bow-ties
Bow-ties can be an effective tool for understanding the risks of complicated systems with varying levels of likelihood from different possible causes. Additionally, they highlight the various pathways and controls preventing an event. These controls can then be assessed and tested for effectiveness.
Bow-ties should generally be reserved for the more complex, less frequent, more serious events. We often refer to these events as "process safety" events.
While bow-ties are great for managing these complex risk scenarios, they may be too cumbersome for managing asset management risks, particularly when related to achieving the asset management objectives. However, I wouldn't rule them out, as they may be handy for visualising multiple threats to long-term objectives.
Conclusion
In asset management, we need to work together with different departments to achieve a common goal. We can only be coordinated if we're speaking the same language. Risk is one of those important languages. We must communicate risk in a way that everyone in the organisation understands. It can be mighty difficult to demonstrate the effectiveness of controls and reductions in risks if the risk levels are too broad.
We may not be able to change the risk frameworks our organisations use, but we should make sure that we articulate risks and controls by other relevant means in a clear and simple language that our organisation understands.