Time To Put Numbers on Internal Controls
August 2005
There's a difference in culture and approach
between two of the major groups involved in "risk management" in its widest
sense. On one side are the insurance specialists, busy putting
numbers on risk. Their high priesthood,
the actuaries, like mathematics and thirst for empirical data. On the other
side are the accountants and auditors, busy putting
words on risk. Their high priesthood, the
audit partners in big external audit firms, rely on judgment and seek "comfort."
by Matthew Leitch
Why this difference when, in effect, external audit is a form of insurance
too? The differences are that external audit firms very rarely receive "claims"
(i.e., get taken to court), and each claim is decided by a court after extensive
analysis of the auditor's actions. An insurance company, by contrast, receives
many claims and simply pays out if the terms of the policy are met.
Consequently, the auditor's strategy is based on leaving no evidence that
a hostile lawyer could exploit. Although the big firms have occasionally flirted
with quantitative methods for setting audit sample sizes, for example, they
have shied away from them, preferring to obscure their decisions under the general
heading of "professional judgment."
Risks are described as "high," assurance is often described as "reasonable"
(or "high" if necessary), and control weaknesses may be deemed "significant,"
"serious," or even "material." None of these key terms has any quantitative
basis.
The Problem with "Professional Judgment"
As far as I know, "professional judgment" as exercised by auditors and controls
specialists from that background (including me) has never been systematically
tested. If it were, what might be found?
Judgments of probability have been studied in a few other professions. By
far the best data comes from studies of weather forecasters. These have shown
that people who routinely make probability judgments and get feedback on their
accuracy can become what is called "well calibrated." This means that, for example,
if you take all the instances when the forecaster has said the probability of
rain is 50 percent, then in fact there was rain on about 50 percent of those
instances.
(You might think that makes their probabilities good ones. Not necessarily.
There is rain over Cambridge on about 50 percent of days. If a forecaster says
the chance of rain is 50 percent every day, he or she will be perfectly calibrated,
but how useful is that?)
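A calibration check of this kind can be sketched in a few lines of Python. This is only an illustration: the function name calibration_table and the forecast records are invented for the example and are not real forecasting data.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group (stated_probability, event_happened) pairs by stated probability
    and return {stated_probability: (count, observed_frequency)}."""
    groups = defaultdict(list)
    for stated, happened in forecasts:
        groups[stated].append(1 if happened else 0)
    return {p: (len(v), sum(v) / len(v)) for p, v in sorted(groups.items())}

# Hypothetical records: (forecast probability of rain, did it rain?)
records = (
    [(0.5, True)] * 5 + [(0.5, False)] * 5
    + [(0.9, True)] * 9 + [(0.9, False)] * 1
    + [(0.1, True)] * 1 + [(0.1, False)] * 9
)

for stated, (n, observed) in calibration_table(records).items():
    print(f"stated {stated:.0%}: {n} forecasts, rain on {observed:.0%}")
```

A well-calibrated forecaster shows an observed frequency close to the stated probability in every row. A forecaster who always says 50 percent would produce a single row, which is why calibration on its own says little about usefulness.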
The key point is that this good calibration is rare and the result of long
practice and quantitative feedback. Controls specialists get a lot of practice
but next to no quantitative feedback, so it seems unlikely that they will be
well calibrated.
Indeed, research into human judgment generally paints a bleak picture. We
are biased in favor of confirming what we already believe. We cannot weigh more
than two or three factors simultaneously without being inconsistent.
The Need for Empirical Support
For "professional judgment," the precedents are discouraging, and we should
consider the controls specialists guilty until proven innocent. For all anybody
knows, organizations across the world could be spending millions on audit work
and controls they do not need, while doing nothing on controls that matter more
than anyone realizes.
If I were a lawyer building a case against an audit firm, I would argue that
the firm had been negligent in not collecting and making use of empirical data
to support its judgments. The fact that other auditors have also failed to do
this is no defense because other risk management professionals, such as those
dealing with insurance, health, safety, and project management, have all made
efforts to gather and use empirical data—as any educated person would expect.
A Lot Can Be Done
The data we need is all around us, collected already. Virtually all large
organizations collect data on processing errors, fraud, safety incidents, and
so on. Usability testing generates thousands of statistics a year about error
rates from different kinds of computer interface in different situations and
tasks. Manufacturers test their computer systems and peripherals, providing extensive
information about their reliability. Telecom companies continually monitor their
network availability. Internal and external auditors perform hundreds of thousands
of audits a year, gathering information about millions of internal controls,
and investigating hundreds of thousands of errors and a smaller number of frauds.
There's no shortage of data. The trick is simply to pull some of it together
into a usable form. I have two suggestions.
Suggestion 1: Controls Designers' Risk Tables
Imagine you are tasked with designing controls over some process or system
and estimate that one of the controls you want would cost $100,000 to implement
but would save money if your assumptions about the number of errors it will
have to handle are correct. You are challenged on the need for spending $100,000.
Wouldn't it be helpful to be able to refer to tables of error rates for different
types of work and system derived from research? It may be that the tables do
not cover the exact situation you are looking at, but even having something
slightly similar would give you a starting point. Let's imagine the tables say
1 in 50 invoices will be wrong without the control you are thinking of, but
in your case, the risk factors are slightly worse than those in the table. At
least you have a starting point, and that's a lot better than nothing, which
is what we have today.
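The arithmetic behind such a challenge might look like the sketch below. Only the $100,000 cost and the 1 in 50 error rate come from the example; the invoice volume, the cost per error, the adjustment for worse risk factors, and the share of errors the control catches are all assumed figures chosen purely for illustration.

```python
def expected_annual_saving(items_per_year, base_error_rate, risk_multiplier,
                           cost_per_error, share_of_errors_caught):
    """Expected yearly loss avoided by a control, under simple assumptions."""
    # Scale the table rate for local risk factors, capped at 100 percent.
    adjusted_rate = min(1.0, base_error_rate * risk_multiplier)
    expected_errors = items_per_year * adjusted_rate
    return expected_errors * share_of_errors_caught * cost_per_error

control_cost = 100_000          # from the example in the text
table_error_rate = 1 / 50       # from the imagined risk table

# Every figure below is an assumption made for illustration only.
saving = expected_annual_saving(
    items_per_year=40_000,
    base_error_rate=table_error_rate,
    risk_multiplier=1.25,       # "slightly worse" risk factors
    cost_per_error=200,
    share_of_errors_caught=0.8,
)
print(f"Expected saving: ${saving:,.0f} against a cost of ${control_cost:,}")
```

The point is not the particular answer but that each input is explicit and can be challenged separately, with the table rate as the one figure that rests on research rather than opinion.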
Suggestion 2: Research Risk Factors in Your Organization
Consider the benefits of understanding and quantifying how risk works in
your organization. For example, you may have the view that staff turnover leads
to more mistakes and bigger backlogs of work. It drives risk and productivity.
But how much? Does it matter who changes their job? Is there an interaction
with the complexity of work? When weighing decisions about staff, how important
are these effects?
Another example concerns the effectiveness of attempts to improve controls.
Did past attempts actually improve control, or did they just encourage people to
change their priorities and let quality problems show somewhere else? For example,
after being pushed for greater accuracy, people often slow down, and this increases
incidences of lateness. Taking all things into consideration—including changes
in workload, customer and supplier originated errors, new developments, staffing
changes, and so on—did our attempts to improve controls actually work?
Or consider workload. If workload is related to mistakes and backlogs, then
can the impact on process performance of taking on new business be quantified?
Is it possible to say how much effort might be needed on controls development
to meet new requirements?
One final example concerns money lost through undetected billing errors.
In telecom this is called "revenue leakage," and it has been searched for extensively
over the last several years. A big part of solving revenue leakage problems
is deciding where to search for them. Common sense says that some risk factors
make leakage more likely, but the problem is that nobody knows for sure which
factors are most important, and how much weight to put on each factor. Research
is needed.
Already, leading banks seeking to measure Operational Risk accurately are
gathering and analyzing data in an attempt to quantify the sort of problems
that internal control has traditionally tackled. Although the objectives of
this work are narrow and concerned with compliance, perhaps we will see some
exciting progress nonetheless.
Useful Techniques
To get the most out of our data (for either of my suggestions), we need to
use multivariate statistics to tease out the impact of different factors on
error and other risks. Fortunately, there are several great software packages
that offer a range of techniques for visualizing and quantifying these multiple
relationships.
Although the mathematics and algorithms underlying these packages are often
complex and hard to understand, we don't have to understand everything to use
it effectively. (Just as you don't have to understand how a car works to drive
safely from A to B.)
The techniques available go far beyond fitting straight lines to scatterplots,
as most of us did at school. The model doesn't have to be a straight line. It
can rely on a potentially large number of variables, some of which are numbers
while others are categories. This is an exciting area of development with new
ideas being tried all the time.
Neural networks are still an important area. Decision trees (e.g., C4.5,
CART) use measures such as information gain to identify the most important
variables. Statistical learning can also be done using kernel machines (not
actually physical machines) such as Support Vector Machines. Many of these tools
work best when given a large amount of data to learn from (e.g., hundreds or
thousands of examples). However, there are good reasons to consider using
mathematical methods even where the amount of data available is low.
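To give a feel for how a decision tree algorithm ranks variables, here is a minimal sketch of information gain, the entropy-based measure behind the C4.5 family (C4.5 itself normalizes it into a gain ratio). The records, the variable names new_staff and system, and the two helper functions are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, variable, target):
    """Reduction in entropy of `target` from splitting rows on `variable`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[variable]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical records of processed items; all values are invented.
rows = [
    {"new_staff": "yes", "system": "A", "error": "yes"},
    {"new_staff": "yes", "system": "B", "error": "yes"},
    {"new_staff": "yes", "system": "A", "error": "yes"},
    {"new_staff": "yes", "system": "B", "error": "no"},
    {"new_staff": "no", "system": "A", "error": "no"},
    {"new_staff": "no", "system": "B", "error": "no"},
    {"new_staff": "no", "system": "A", "error": "no"},
    {"new_staff": "no", "system": "B", "error": "yes"},
]

gains = {v: information_gain(rows, v, "error") for v in ("new_staff", "system")}
for variable, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{variable}: {gain:.3f} bits")
```

In this invented data set, knowing whether new staff handled the item reduces uncertainty about errors, while knowing the system does not, so a tree would split on new_staff first.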
One problem with human judgment is that we struggle to weigh more than two
or three variables at once. We try to eliminate variables from consideration
by finding one or two that are decisive on their own, or by pairing up pros
and cons in the hope that we can ignore those that seem to balance. But, often,
none of these strategies is applicable or safe.
Using even very crude mathematical formulae can be advantageous. The formula
does not make mistakes, succumb to confirmation bias, or give way to special
pleading. If we want to make objective decisions, this is very useful.
Some traditional statistical methods do not give sensible results on the
basis of small amounts of empirical data, or simply refuse to provide any estimates
at all. In contrast, Bayesian methods involve starting with an initial view
and modifying it as data are received. Consequently, there is always some view
to use.
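One simple way to picture this is a Beta-binomial update of an error rate, sketched below. The starting view (about 1 error in 50 items, held as weakly as if it rested on 50 earlier observations) and the batches of checked items are assumptions made for the illustration, and the function names are hypothetical.

```python
def update_beta(prior_errors, prior_correct, observed_errors, observed_correct):
    """Conjugate update of a Beta prior on an error rate with binomial data."""
    return prior_errors + observed_errors, prior_correct + observed_correct

def beta_mean(errors, correct):
    """Mean of a Beta(errors, correct) distribution."""
    return errors / (errors + correct)

# Assumed initial view: 1 error and 49 correct items' worth of prior belief.
belief = (1.0, 49.0)
print(f"initial estimate: {beta_mean(*belief):.3f}")

# Hypothetical small batches of checked items: (errors found, items correct)
for batch in [(2, 18), (1, 29), (0, 50)]:
    belief = update_beta(*belief, *batch)
    print(f"after batch {batch}: {beta_mean(*belief):.3f}")
```

An estimate is available before any data arrive and after every batch, however small, and the influence of the initial view shrinks as evidence accumulates.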
Another fascinating approach is the use of "fast and frugal" algorithms that
analyze the database of past experience only when a prediction is required.
One interesting example is PROBEX (PROBabilities from EXemplars), which relies
on a similarity function and then computes probabilities as a similarity-weighted
frequency. Although it does not discriminate between important and irrelevant
variables, it still performs well, giving results close to human judgment but
without the mistakes and biases.
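The idea of a similarity-weighted frequency can be sketched as follows. This is a loose illustration of the principle only, not the published PROBEX algorithm, and it leaves out other parts of that model. The multiplicative similarity rule, the mismatch_penalty value, the feature names, and the past cases are all assumptions made for the example.

```python
def similarity(probe, exemplar, mismatch_penalty=0.3):
    """Multiply in a penalty (between 0 and 1) for each feature that differs."""
    s = 1.0
    for a, b in zip(probe, exemplar):
        if a != b:
            s *= mismatch_penalty
    return s

def similarity_weighted_probability(probe, exemplars):
    """Estimate P(outcome) as a similarity-weighted frequency of past outcomes."""
    weights = [similarity(probe, features) for features, _ in exemplars]
    hits = sum(w for w, (_, outcome) in zip(weights, exemplars) if outcome)
    return hits / sum(weights)

# Hypothetical past cases:
# (high_workload, new_staff, manual_entry) -> was an error found?
past_cases = [
    ((1, 1, 1), True),
    ((1, 1, 0), True),
    ((1, 0, 1), False),
    ((0, 0, 1), False),
    ((0, 0, 0), False),
]

p_risky = similarity_weighted_probability((1, 1, 1), past_cases)
p_quiet = similarity_weighted_probability((0, 0, 0), past_cases)
print(f"risky-looking case: {p_risky:.2f}, quiet-looking case: {p_quiet:.2f}")
```

Every feature is treated alike, with no weighting for importance, and no model is fitted in advance: the past cases are consulted only when an estimate is asked for.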
Conclusions
For too long, internal control and audit specialists have relied on "professional
judgment" and failed to seek and use empirical evidence relevant to their judgments
about risk and control. The data to remedy this is all around us, and today's
powerful, yet easy to use, statistical tools give us the best chance ever to
make sense of it.
Further reading:
"Fast and Frugal Use of Cue Direction in States of Limited Knowledge," by Magnus Persson and Peter Juslin.
Opinions expressed in Expert Commentary articles are those of the author and are
not necessarily held by the author’s employer or IRMI. This article does not purport
to provide legal, accounting, or other professional advice or opinion. If such advice
is needed, consult with your attorney, accountant, or other qualified adviser.