Thursday, June 4, 2009

A Risk Analysis of Risk Analysis

The title of this post is taken from a sobering and sensible paper published last year by Jay Lund, a distinguished professor of civil engineering at the University of California, Davis, who specialises in water management. The paper discusses the merits of Probabilistic Risk Assessment (PRA), which is a “systematic and comprehensive methodology to evaluate risks associated with a complex engineered technological entity”. PRA is notably used by NASA (see their 320-page guide), and it was essentially mandated for assessing the operational risks of nuclear power plants during the 1980s and 1990s.

Professor Lund’s views are derived from his experience in applying PRA to decision-making and policy-setting for effective water management, as well as from teaching PRA methods. His paper starts with two propositions: (1) PRA is a venerated collection of mathematically rigorous methods for performing engineering risk assessments, and (2) PRA is rarely used in practice. Given the first proposition, he seeks to provide some insight into the “irrational behaviour” that has led to the second. Why don’t risk assessors use the best tools available?

Discussions on the merits of using modeling and quantitative risk analysis in IT Security flare up quite regularly in the blogosphere. Most of the time the discussions are just storms in HTML teacups – the participants usually make some good points but the thread rapidly peters out since both the detractors and defenders typically have no real experience or evidence to offer either way. So you either believe quant methods would be a good idea to use or you don’t. With Lund we have a more informed subject who understands the benefits and limits of a sophisticated risk methodology, and has experience with its use in practice for both projects and policy-setting.

Know Your Decision-Makers

After a brief introduction to PRA, Lund begins by providing some anecdotal quotes and reasoning for PRA being passed over in practice.

People would rather live with a problem that they cannot solve than accept a solution that they cannot understand.

Decision-makers are more comfortable with what they are already using. As I was once told by a Corps manager, “I don’t trust anything that comes from a computer or from a Ph.D.”

“Dream on! Hardly anyone in decision-making authority will ever be able to understand this stuff.”

PRA is too hard to understand. While in theory PRA is transparent, in practical terms, PRA is not transparent at all to most people, especially lay decision makers, without considerable investments of time and effort.

So the first barrier is the lack of transparency in PRA to the untrained, who will often be the decision-makers. There is an assumption here that risk support for decisions under uncertainty can be provided in the form of concise, transparent and correct recommendations – and PRA is not giving decision makers that type of output. But I think in at least some cases this expectation is unreasonable. For some decisions there will be a certain amount of inherent complexity and uncertainty which cannot be winnowed away for the convenience of presentation. I am not sure, for example, to what extent the risks associated with a major IT infrastructure outsourcing can be made transparent to non-specialists.

The next few comments from Lund are quite telling.

People who achieve decision-making positions typically do so based on intuitive and social skills and not detailed PRA analysis skills.

Most decisions are not driven by objectives included in PRA. Decision-makers are elected or appointed. Being re-elected can be more important than being technically correct on a particular issue. Empirical demonstration of good decisions from PRA is often unavailable during a person’s career.

So decision-makers usually do not reach their positions because of their analytical skills, and what motivates them may well lie outside the scope of what PRA considers “useful” decision criteria. Developing a methodology tailored to solving risk problems, in isolation from the intended decision-making audience, is counter-productive.

And here is the paradox as I see it:

A poorly-presented or poorly-understood PRA can raise public controversy and reduce the transparency and credibility of public decisions. These difficulties are more likely for novel and controversial decisions (the same sorts of problems where PRA should be at its most rigorous).

So for complex decisions that potentially have the greatest impact in terms of costs and/or reputation, in exactly the circumstances where a thorough risk assessment is required, transparency rather than rigour is the order of the day.

Process Reliability

Lund notes that PRA involves a sequence of steps that must each succeed to produce a reliable result. Those steps are formulating the problem, solving it accurately, interpreting the results correctly, and then communicating the results properly to stakeholders or decision-makers. In summary, we have four steps: formulation, solution, interpretation and communication. He asks:

What is the probability that a typical consultant, agency engineer, lay decision-maker, or even a water resources engineering professor will accurately formulate, calculate, interpret, or understand a PRA problem?

He makes the simple assumption that the probability of each step succeeding is independent, which he justifies by saying that the steps are often segregated in large organizations. In any case, he presents the following graph, which plots step (component) success against overall success.

[Figure: overall PRA success plotted against step (component) success]

Lund describes this as a sobering plot, since it shows that even with a 93% chance of success at each step the final PRA succeeds only 75% of the time. When the step success is only 80%, the PRA success is just 41% (not worth doing). We should not take the graph as an accurate plot, but rather as an illustration of the perhaps non-intuitive relation between step (component) success and overall success.
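The arithmetic behind these figures is easy to check. The following is a minimal sketch, assuming (as Lund does) four independent steps that all share the same success probability; the function name `overall_success` is my own and does not come from the paper.

```python
def overall_success(step_success: float, steps: int = 4) -> float:
    """Probability that every step succeeds, assuming the steps are
    independent and share the same success probability."""
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be a probability in [0, 1]")
    return step_success ** steps


# Formulation, solution, interpretation and communication: four steps.
for p in (0.99, 0.93, 0.80):
    print(f"step success {p:.0%} -> overall success {overall_success(p):.0%}")
# step success 99% -> overall success 96%
# step success 93% -> overall success 75%
# step success 80% -> overall success 41%
```

Because the step probabilities multiply, every additional step or small drop in per-step reliability compounds, which is why the curve falls away faster than intuition suggests.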

A Partial PRA Example

Lund also describes an interesting example of a partial PRA, where deriving a range of solutions likely to contain the optimal solution to support decision-making is just as helpful as finding the exact optimal solution. The problem he considers is straightforward: given an area of land that has a fixed damage potential D, what is the risk-based optimal height of a levee (barrier or dyke) to protect the land which minimizes expected annual costs? The graph below plots the annual cost outcomes across a wide range of options.

[Figure: annual cost (left axis) and recurrence period (right axis) plotted against levee height]

There are three axes to consider – one horizontal (the levee height), a left vertical (annual cost) and a right vertical (recurrence period). Considering the left vertical for a zero-height levee (that is, no levee), total annual costs are about $850 million, or the best part of a billion dollars in damage if the risk is left unaddressed. Considering the right vertical for a 20m levee, costs are dominated by maintaining the levee, and water levels exceeding the levee height (called an overtopping event) are expected less than once per thousand years.

The recurrence period states that water levels reaching a given height H will be a 1-in-T year event, which can also be interpreted as saying that the probability of the water level reaching H in any one year is 1/T. For a levee of less than 6m in height there is no material difference between the total cost and the cost of damage, which we can interpret as small levees being cheap and an overtopping event being likely.
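The recurrence-period interpretation can be made concrete with a short calculation. This is a sketch under an assumption of my own that the paper does not state here: that years are independent, so the chance of at least one exceedance over a planning horizon follows from the annual probability 1/T. The function names are hypothetical.

```python
def annual_exceedance_probability(recurrence_years: float) -> float:
    """Probability that the water level reaches height H in any one year,
    given that reaching H is a 1-in-T year event."""
    if recurrence_years < 1.0:
        raise ValueError("recurrence period must be at least one year")
    return 1.0 / recurrence_years


def exceedance_within(recurrence_years: float, horizon_years: int) -> float:
    """Probability of at least one exceedance during the horizon.
    Assumption (mine): each year is independent of the others."""
    p = annual_exceedance_probability(recurrence_years)
    return 1.0 - (1.0 - p) ** horizon_years


# A 1-in-100 year event is far from negligible over a 30-year horizon.
print(f"{exceedance_within(100, 30):.0%}")
```

Under that independence assumption, a 1-in-100 year event has roughly a one-in-four chance of occurring at least once in 30 years, which is a more useful figure for a decision-maker than the recurrence period alone.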

At 8m to 10m we start to see a separation between the total and damage cost curves, as the likelihood of an overtopping event decreases and the levee cost increases. At 14m, levee costs are dominant and the expected annual damage from overtopping seems marginal. In fact, the optimal solution is a levee of height 14.5m, yielding a recurrence period for overtopping of 102 years. Varying the levee height by 1m around the optimal value (either up or down) gives a range of $65.6 to $66.8 million for total annual costs. Lund draws some excellent conclusions from this example about what such an analysis offers:

a) Identifying the range of promising solutions which are probably robust to most estimation errors,

b) Indicating that within this range a variety of additional non-economic objectives might be economically accommodated, and

c) Providing a basis for policy-making which avoids under-protecting or over-protecting an area, but which can be somewhat flexible.

I think that this is exactly the type of risk support for decision-making that we should be aiming for in IT Risk management.
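To show how such a range of promising solutions can be found, here is a minimal sketch of the levee trade-off. Everything numeric in it is hypothetical and chosen only for illustration: the exponential overtopping model, the constants `DAMAGE_POTENTIAL`, `STAGE_SCALE_M` and `LEVEE_COST_PER_M`, and the 2% tolerance are my assumptions, not Lund's model or data, so the output will not reproduce his 14.5m optimum.

```python
import math

# All three constants are hypothetical, for illustration only.
DAMAGE_POTENTIAL = 850.0  # $ million of damage per overtopping event
STAGE_SCALE_M = 3.0       # metres; controls how fast overtopping risk decays
LEVEE_COST_PER_M = 4.0    # annualised levee cost, $ million per metre


def overtopping_probability(height_m: float) -> float:
    """Assumed annual probability that water exceeds a levee of this height."""
    return math.exp(-height_m / STAGE_SCALE_M)


def total_annual_cost(height_m: float) -> float:
    """Annualised levee cost plus expected annual damage, in $ million."""
    expected_damage = DAMAGE_POTENTIAL * overtopping_probability(height_m)
    return LEVEE_COST_PER_M * height_m + expected_damage


# Evaluate candidate heights from 0m to 20m in 0.1m steps.
heights = [i / 10 for i in range(0, 201)]
best_height = min(heights, key=total_annual_cost)
best_cost = total_annual_cost(best_height)

# The "partial" answer: every height whose cost is within 2% of the minimum.
TOLERANCE = 0.02
near_optimal = [h for h in heights
                if total_annual_cost(h) <= best_cost * (1 + TOLERANCE)]

print(f"lowest-cost height: {best_height:.1f}m")
print(f"heights within {TOLERANCE:.0%} of the minimum cost: "
      f"{min(near_optimal):.1f}m to {max(near_optimal):.1f}m")
```

The point of the sketch is the last step: because the total cost curve is flat near its minimum, a band of heights costs almost the same, and reporting that band leaves room for the non-economic objectives and policy flexibility that Lund mentions.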

Last Remarks

The paper by Professor Lund is required reading, and at only 8 pages it is a quick one. PRA, he surmises, can be sub-optimal when it has high costs, a potentially low probability of success, or inconclusive results. His final recommendation is to reserve its use for situations involving very large expenditures or very large consequences, large enough to justify the kinds of expenses needed for a PRA to be reliable. Note that he does not doubt PRA can be reliable, but you really have to pay for it. In IT risk management I think we have more to learn from Cape Canaveral and Chernobyl than from Wall Street.
