Performance Management and the Human Error Factor: A New Perspective

Peter Furst | December 1, 2010

At the operational level, most organizations can be boiled down to three key elements. The organization produces an output (a product or service). The organization has systems and processes with which to create the output, and it has the people to energize, control, and manage the systems/processes so as to produce the output.

The processes that produce the output also produce a number of undesired outcomes. These can be given many labels, including, but not limited to, activities that create no value, unproductive or ineffectual effort, deficiencies, defects, barriers, discrepancies, waste, injuries, and losses.

The operational system resides within a larger system: the organization, which also has systems and people. To differentiate between the two, the people at the operational level are the producers, and the people at the organizational level are the managers. The systems at the operational level may include the plant and equipment as well as the practices and processes focused on producing the product or rendering the service. At the organizational level, the systems are the policies and procedures designed to run the business and manage the workers efficiently and productively.

Figure 1: Organization and Its Operational System

Performance Management

Traditionally, performance has been managed by setting goals for producers (employees) to achieve. These goals may relate to production, quality, or injuries. When the goals are not achieved, a familiar series of actions follows: the worker is trained, counseled, retrained, admonished, and possibly punished, demoted, or let go. These interventions are generally directed at the worker, ignoring the fact that the operational model described above has two sources of failure risk: people and processes. Because producers work within (and interface with) the system, the system influences them and may lead them to take actions, make choices, or make decisions that result in errors or discrepancies and, in turn, underachievement. The organizational systems affect the producers as well.

Such human failings (producer errors) are generally attributed to inattentiveness, poor judgment, lack of focus or capability, or negligence, to name a few. This attribution prevents digging into the inner workings of the organization for the reasons behind the failure, which generally reside deep in its systems, processes, procedures, and practices. Human error is simply a difference between an actual state and a desired state. It is important to note that not all human errors result in catastrophic outcomes; in many cases, the results are tolerable or inconsequential, and some may even turn out to be positive. To understand failure, we must also understand our reaction and response to it. People do not operate in a vacuum in which they can decide and act with complete control. To err or not to err is not a choice; instead, people's work is subject to, and constrained by, multiple factors.

The Human Error Factor

The impact of human error on organizations is far-reaching, affecting productivity, customer service, quality, teamwork, decision making, execution, injuries, and losses. Few statistics are available for most of these categories, with the exception of accidents. In many of the most serious accidents of the last 50 years, almost all of the initial findings attributed the failures primarily to human error. Examples include the following:

In 1965, near Searcy, Arkansas, 53 contract workers were killed during a fire at a Titan II missile silo assigned to Little Rock Air Force Base.
In 1978, a cooling tower under construction at a power plant in Willow Island, West Virginia, collapsed, killing 51 workers.
In 1984, in Bhopal, India, a release of methyl isocyanate gas from a Union Carbide plant killed, by some estimates, as many as 20,000 people.
In 1988, the Piper Alpha oil platform explosion killed 167 and resulted in a major oil spill.
The 1989 Phillips explosion in Pasadena, Texas, killed 23.
The 1989 Exxon Valdez oil spill in Alaska was a major environmental disaster.
In 1991, the Hamlet chicken processing plant fire in North Carolina killed 25 workers.
In 2005, the Texas City BP refinery explosion killed 15 workers.
In 2008, a sugar refinery dust explosion in Port Wentworth, Georgia, killed 14 workers.
In 2010, the explosion on BP's Deepwater Horizon drilling rig in the Gulf of Mexico killed 11 men, injured 17 others, and led to a massive oil spill.

Returning to the organizational model within which the operational model exists, we identified two elements: systems and people (management). Managers devise the systems and, being human, are fallible, so they create systems with latent defects. The producers at the operational level must function within these systems, and the latent defects, combined with operator errors, may lead to failures. The progression of latent conditions may start with the organization's hiring practices, followed by employee development, promotion practices, management's actions, supervisors' goals, operational constraints and requirements, communication and information flow, task design, and the physical environment. All these latent conditions influence the producers' (workers') choices and decision making. When a worker makes a wrong choice or an error, it may lead to an active failure, which may or may not have adverse effects on whatever is being evaluated (production, quality, or injuries). Latent conditions, in other words, are discrepancies in the systems that make errors on the part of the producer more likely.

Traditional Approaches to Combatting Human Error

The traditional approaches to managing human performance are not highly effective. They are driven mostly by performance goals, metrics, and recognition or rewards. One underlying reason for their limited effectiveness is that goals and metrics are set without a thorough understanding of the impact they may have on other aspects of performance and results. There is a vast array of reasons for underperformance, and one of the most insidious is human error. This is a relatively new area of study within human factors, and until recently its causal analysis and interventions have been more an art than a science.

Performance has to be reliable, and the system has to be robust. That means the human-task interface has to be free of error, and the system has to tolerate unexpected conditions should they arise. Another aspect of performance is resilience: the system has to be able to recover and return to a steady state without much difficulty or delay.

Human error is inevitable and occurs for many reasons, which may reside with the individual or with the organization's systems. The mismatch between actual and desired performance may stem from a misunderstanding of the task, task demands, capability, knowledge, motivation, goals, information, communication, politics, human dynamics, supervision, climate, culture, and leadership, to name a few. Human error is also shaped by people's ability to perform a task in a myriad of different ways, or to (often unintentionally) break an "unbreakable" system. This is one reason why protective systems that have been put in place are sometimes breached.

Research has shown that humans do learn from their mistakes, so from this perspective, making mistakes is not all bad. It would seem, then, that the way to address performance issues is to make the results (consequences) of mistakes as inconsequential as possible. A performance management strategy might therefore include a number of elements. One might be designing out the error-producing elements of the systems, or at least reducing their frequency. The next might be handling the consequences of an error so that it does not affect achievement of the goal or mission, by returning the process to its former, unimpaired state.

Human Error Prevention

There are two ways to prevent human error from affecting performance. The first is prevention, either by stopping people from making mistakes (avoidance) or by keeping a mistake from affecting the system (interception). These preventive interventions require that possible or potential errors be known before they occur. Techniques include design, automation, reduction of exposure time, error proofing, and training. For training to be most effective, it has to focus on concepts (education) and not just on practices and procedures. Stopping mistakes from occurring has proven difficult, because humans invariably find different ways to perform their tasks, bypass interlocks or aids, and simply make mistakes. That does not mean giving up on prevention; it has benefits and reduces some of the opportunity for error.

Another aspect of human error is that the error may be made by someone upstream of the producer's activities. These are latent errors (defects). The process itself may fail and cause the producers to fail. Designing systems with an understanding of recovery time is also important. Consider the Soyuz 11 capsule. During its return to Earth, at the capsule separation stage, a pressure equalization valve opened prematurely, venting the cabin atmosphere. The venting took about 45 seconds, whereas closing the valve manually took 60 seconds. There is evidence that the crew attempted to close the valve, but events overtook them, and they perished. The design should have allowed the manual operation to be completed before the breathable air was lost.

Developing Error Tolerance

But errors are going to be made, and error avoidance is not "foolproof," so the next step is critical to optimizing performance: minimizing the consequences of errors. Error tolerance can be achieved in a couple of ways.

For systems in which errors cannot be designed out or blocked, there should be a way to detect errors early, along with mechanisms to recover from them without significant impairment of performance. One example is a checklist used before engaging in an activity. Pilots routinely go through a preflight checklist, which has helped make flying safer. Checklists can also be used after an activity, such as maintenance, is completed, to ensure that the equipment is operable and in good working order.

Deviations or errors that are not detected, or are detected late, will have consequences. These unexpected and undesired outcomes must be minimized effectively so that they do not adversely affect performance; doing so keeps an error from escalating into a major undesirable event. Examples include routine maintenance, redundant systems, seatbelts, and fall arrest systems.

Become Resilient

The next element in managing human error is making the organization and its systems resilient. That means having a built-in mechanism to deal effectively with errors and changing conditions and to recover from adverse effects, returning quickly and seamlessly to "normal" operations. Agile resilience has five elements: leadership, culture, people, systems, and the work environment.

Figure 2: Illustration of Resilient Elements

Resilience begins with a vision set by leadership. The organization must select the right people and provide the resources to devise the systems that foster resilience. Leadership must also establish the acceptable level of risk and the "right" balance between risk taking and risk avoidance. Leadership must create a climate in which it is okay to make mistakes and, once mistakes are made, ensure that lessons are learned and disseminated throughout the organization.

A resilient culture is built on four pillars: trust, purpose, empowerment, and accountability. Such an organization has a strong sense of purpose that flows vertically and horizontally to all employees. It encourages self-directed teams that innovate and communicate cross-functionally. The four pillars bind the organization into a cohesive, innovative, purposeful group with a commitment to action, problem resolution, and win-win thinking, and a passion for excellence.

The core of any organization is its people. The organization must select the "right" people: people who are motivated, have the courage to challenge the process, are willing to work toward a common goal, share a common vision and purpose, and are willing to overcome obstacles and barriers. The organization must provide timely information and resources that facilitate effective decision making and problem solving.

Systems in a resilient organization have an open structure that allows information and resources to flow. Such systems foster innovation and agility. The systems and subsystems are integrated and aligned with the organization's goals and objectives. They enhance risk assessment and selection, allow for effective planning and strategy implementation, support and reward innovation and cooperation, enhance flow, and create value.

The work environment in a resilient organization is flexible and conducive to learning (including learning from one's mistakes). It is designed to minimize latent defects in the systems. The strategy, objectives, goals, and metrics are integrated so as to achieve excellence.

Conclusion

Performance management has taken on new urgency in the realities of the 21st century. To paraphrase Einstein, the traditional business models and management approaches that worked well in the past cannot be used to solve the problems of today. It is these very tools and techniques that have brought us to where we find ourselves now. A more productive approach is to identify the challenges, define the problems, face reality, stop treating the symptoms, dispel the myths, assess the organizational system and people constraints, foster integration, communicate a compelling vision, move away from command and control, foster trust, empower the people, and lead, lead, and lead.

Opinions expressed in Expert Commentary articles are those of the author and are not necessarily held by the author's employer or IRMI. Expert Commentary articles and other IRMI Online content do not purport to provide legal, accounting, or other professional advice or opinion. If such advice is needed, consult with your attorney, accountant, or other qualified adviser.