---
created_at: '2016-07-01T14:23:37.000Z'
title: Automated to Death (2009)
url: http://spectrum.ieee.org/computing/software/automated-to-death
author: mcspecter
points: 100
story_text:
comment_text:
num_comments: 45
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1467383017
_tags:
- story
- author_mcspecter
- story_12016234
objectID: '12016234'
year: 2009
---
The passengers and crew of Malaysia Airlines Flight 124 were just
settling into their five-hour flight from Perth to Kuala Lumpur late in
the afternoon of 1 August 2005. Approximately 18 minutes into
the flight, as the Boeing 777-200 series aircraft was climbing through
36 000 feet altitude on autopilot, the aircraft—suddenly and without
warning—pitched to 18 degrees, nose up, and started to climb rapidly. As
the plane passed 39 000 feet, the stall and overspeed warning indicators
came on simultaneously—something that's supposed to be impossible, and a
situation the crew is not trained to handle.
At 41 000 feet, the command pilot disconnected the autopilot and lowered
the airplane's nose. The autothrottle then commanded an increase in
thrust, and the craft plunged 4000 feet. The pilot countered by manually
moving the throttles back to the idle position. The nose pitched up
again, and the aircraft climbed 2000 feet before the pilot regained
control.
The flight crew notified air-traffic control that they could not
maintain altitude and requested to return to Perth. The crew and the 177
shaken but uninjured passengers safely returned to the ground.
The [Australian Transport Safety Bureau
investigation](http://www.atsb.gov.au/publications/investigation_reports/2005/aair/aair200503722.aspx)
discovered that the air data inertial reference unit (ADIRU)—which
provides air data and inertial reference data to several systems on the
Boeing 777, including the primary flight control and autopilot flight
director systems—had two faulty accelerometers. One had gone bad in
2001. The other failed as Flight 124 passed 36 571 feet.
The fault-tolerant ADIRU was designed to operate with a failed
accelerometer (it has six). The redundant design of the ADIRU also meant
that it wasn't mandatory to replace the unit when an accelerometer
failed.
Photo: John Boyd
Watch a forensic simulation of key moments of Malaysia Airlines Flight 124
However, when the second accelerometer failed, a latent software anomaly
allowed inputs from the first faulty accelerometer to be used, resulting
in the erroneous feed of acceleration information into the flight
control systems. The anomaly, which lay hidden for a decade, wasn't
found in testing because the ADIRU's designers had never considered that
such an event might occur.
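The investigation's description maps onto a failure pattern familiar from ordinary sensor-voting code: a unit flagged as bad is excluded only while enough healthy sources remain, and a later failure silently reinstates it. The Python below is a minimal sketch of that pattern, not the ADIRU's actual logic; the data structure, the five-source threshold, and the averaging rule are all invented for illustration.

```python
# Illustrative sketch only -- NOT the ADIRU's real code. A sensor-selection
# routine with a latent flaw: accelerometers flagged as failed are meant to
# be excluded, but the fallback path quietly reinstates a previously failed
# unit once a second failure shrinks the "healthy" set.
from dataclasses import dataclass

@dataclass
class Accelerometer:
    reading: float          # acceleration value fed downstream
    failed: bool = False    # set when a built-in test flags the unit

def select_acceleration(units, minimum_sources=5):
    healthy = [u for u in units if not u.failed]
    if len(healthy) >= minimum_sources:
        return sum(u.reading for u in healthy) / len(healthy)
    # Latent anomaly: with too few healthy units, fall back to *all* units,
    # including one that failed years earlier, instead of degrading
    # gracefully or annunciating a fault to the crew.
    return sum(u.reading for u in units) / len(units)

# One of six units failed in 2001 and is stuck at a wildly wrong value; the
# redundant design tolerates that. When a second unit fails in flight, the
# fallback path feeds the stale, erroneous reading to the flight controls.
units = [Accelerometer(0.02) for _ in range(4)]
units.append(Accelerometer(9.8, failed=True))    # failed in 2001
units.append(Accelerometer(0.02, failed=True))   # fails during Flight 124
print(select_acceleration(units))                # contaminated by the bad unit
```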
The Flight 124 crew had fallen prey to what psychologist Lisanne
Bainbridge in the early 1980s identified as the ironies and paradoxes of
automation. The irony, she said, is that the more advanced the automated
system, the more crucial the contribution of the human operator becomes
to the successful operation of the system. Bainbridge also discusses the
paradoxes of automation, the main one being that the more reliable the
automation, the less the human operator may be able to contribute to
that success. Consequently, operators are increasingly left out of the
loop, at least until something unexpected happens. Then the operators
need to get involved quickly and flawlessly, says Raja Parasuraman,
professor of psychology at George Mason University in Fairfax, Va., who
has been studying the issue of increasingly reliable automation and how
that affects human performance, and therefore overall system
performance.
“There will always be a set of circumstances that was not expected, that
the automation either was not designed to handle or other things that
just cannot be predicted,” explains Parasuraman. So as system
reliability approaches—but doesn't quite reach—100 percent, “the more
difficult it is to detect the error and recover from it,” he says.
And when the human operator can't detect the system's error, the
consequences can be tragic.
![DC crash](/image/1464827)
Photo: Win McNamee/Getty Images
In June of this year, a Washington Metropolitan Area Transit Authority
(Metro) Red Line subway train operated by Jeanice McMillan rear-ended a
stationary subway train outside Fort Totten station in northeast
Washington, killing McMillan and eight others and injuring 80. The cause
is still under investigation by the U.S. National Transportation Safety
Board (NTSB), but it appears that a safety-signal system design anomaly
was at fault, in which a spurious signal generated by a track circuit
module transmitter mimicked a valid signal and bypassed the rails via an
unintended signal path. The spurious signal was sensed by the module
receiver, which resulted in the train not being detected when it stopped
in the track circuit where the accident occurred. So the safety system
thought the track was clear when it was not. When she saw the other
train in her path, a surprised McMillan hit the emergency brake in an
attempt to slow her train, which may have been traveling nearly 95
kilometers per hour (59 miles per hour), but it was too late.
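A toy model makes the track-circuit failure easier to picture. In a conventional track circuit, a transmitter drives a signal through the rails and a train's wheels shunt it, so "no signal at the receiver" means "block occupied." The sketch below, with an assumed threshold and made-up signal levels, shows how a spurious signal reaching the receiver by an unintended path can mimic the legitimate one and hide a stopped train; it illustrates the principle, not WMATA's actual equipment.

```python
# Toy model of track-circuit occupancy detection (illustrative only).
# A train's axles short the rails, so the receiver normally loses the
# transmitted signal and the block reads as occupied. A spurious signal
# that bypasses the rails is unaffected by the train and can mask it.
SIGNAL_THRESHOLD = 0.5  # assumed detection threshold (arbitrary units)

def received_level(train_present, spurious_coupling=0.0):
    rail_path = 0.0 if train_present else 1.0   # shunted to ~zero by a train
    return max(rail_path, spurious_coupling)    # unintended path adds in

def block_is_clear(train_present, spurious_coupling=0.0):
    return received_level(train_present, spurious_coupling) > SIGNAL_THRESHOLD

print(block_is_clear(train_present=True))                         # False: train detected
print(block_is_clear(train_present=True, spurious_coupling=0.9))  # True: train masked
```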
To put this accident in perspective, however, it was only the second
fatal crash involving Washington, D.C.'s Metro in its 33 years of
operation. In 2008, customers took 215 million trips on the system. Not
counting train-vehicle accidents, a total of 27 people were killed and
324 people were injured in [train accidents in the United States
in 2008](http://safetydata.fra.dot.gov/OfficeofSafety/publicsite/Query/statsSas.aspx).
This compares with statistics from 1910, when W.L. Park, general
superintendent of the Union Pacific Railroad, asserted that “one human
being is killed every hour, and one injured every 10 minutes.”
Not only has automation improved train safety, but travel by plane,
ship, and automobile is safer too. [According to
Boeing](http://www.boeing.com/commercial/safety/pf/pf_howsafe.html), in
2000 the world's commercial jet airlines carried approximately 1.09
billion people on 18 million flights and suffered only 20 fatal
accidents. The NTSB estimates that traffic deaths in the United States
may drop by 30 percent after electronic stability control becomes
mandatory in 2012 for automobiles.
Charles Perrow, professor emeritus of sociology at Yale University and
author of the landmark book Normal Accidents: Living With High-Risk
Technologies (Princeton University Press, 1999), contends that
“productivity, successful launches, successful targeting, and so on,
increase sharply with automation,” with the result being that “system
failures become more rare.”
One can see this in aviation. As automation has increased aircraft
safety, the rarity of crashes has made it harder to find common causes
for them, the NTSB says.
However, the underlying reason for this rarity, namely the ubiquity of
increasingly reliable automation, is also becoming a concern for system
designers and safety regulators alike, especially as systems become ever
more complex. While designers are trying to automate as much as they
can, complex interactions of hardware systems and their software end up
causing surprising emergencies that the designers never considered—as on
Flight 124—and which humans are often ill-equipped to deal with.
“The really hard things to automate or synthesize, we leave to the
operator to do,” says Ericka Rovira, an assistant professor of
engineering psychology at the U.S. Military Academy at West Point. That
means people have to be alert and ready to act at the most crucial
moments, even though the monotony of monitoring supposedly reliable
systems can leave them figuratively or physically asleep at the wheel.
![Royal Majesty cruise ship](/image/1464842)
Photo: Steven Senne/AP Photo
That was the case in June 1995, when the 568-foot-long [cruise ship
Royal Majesty ran
aground](http://www.ntsb.gov/publictn/1997/MAR9701.htm) onto the sandy
Rose and Crown Shoal about 10 miles east of Nantucket Island, off the
coast of Massachusetts. Fifty-two minutes after leaving St. George's,
Bermuda, on its way to Boston, the Royal Majesty's GPS antenna cable
became detached from the GPS antenna. This placed the GPS in
dead-reckoning mode, which does not take into consideration wind or sea
changes. The degraded GPS continued to feed the ship's autopilot. No one
noticed the change in GPS status, even though the GPS position was
supposed to be checked hourly against the Loran-C radio navigation
system, which is accurate to within roughly one-half to 1 nautical mile
at sea, and a quarter-mile as a ship approaches shore. The Royal Majesty
proceeded on autopilot for the next 34 hours until it hit the Rose and
Crown Shoal.
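The arithmetic of dead reckoning shows why the failure stayed invisible from hour to hour yet proved decisive over a day and a half. The sketch below propagates a position from course and speed alone, as a receiver in dead-reckoning mode must; the departure point, speed, heading, and the half-knot current are assumptions for illustration, not figures from the Royal Majesty's log.

```python
# Dead-reckoning sketch with assumed numbers (not the ship's actual data).
import math

def dead_reckon(lat, lon, heading_deg, speed_kn, hours):
    """Advance a position along a constant course; 1 degree of latitude is ~60 nm."""
    dist_nm = speed_kn * hours
    d_lat = dist_nm * math.cos(math.radians(heading_deg)) / 60.0
    d_lon = dist_nm * math.sin(math.radians(heading_deg)) / (
        60.0 * math.cos(math.radians(lat)))
    return lat + d_lat, lon + d_lon

# Assumed: a ship making 14 knots on a northerly course from a point near
# Bermuda, with the receiver stuck in dead-reckoning mode for 34 hours.
lat, lon = dead_reckon(32.4, -64.7, heading_deg=0.0, speed_kn=14.0, hours=34.0)
print(f"dead-reckoned position after 34 h: {lat:.2f} N, {abs(lon):.2f} W")

# An unmodeled half-knot set, far too small to notice on any single watch,
# displaces the real ship from that estimate by 0.5 * 34 = 17 nautical miles.
print(f"accumulated cross-track error: about {0.5 * 34:.0f} nm")
```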
Why hadn't the watch officers noticed something was wrong? One major
reason, the NTSB said, was that the ship's watch officers had become
overreliant on the automated features of the integrated bridge system.
Watch officers, who in less-automated times actively monitored the
current environment and used this information to control the ship, are
now relegated to passively monitoring the status and performance of the
ship's automated systems, the NTSB said. The previous flawless
performance of the equipment also likely encouraged this overreliance.
Checking the accuracy of the GPS system and autopilot perhaps seemed
like a waste of time to the watch officers, like checking a watch
against the Coordinated Universal Time clock every hour.
In many ways, operators are being asked to be omniscient systems
administrators who are able to jump into the middle of a situation that
a complex automated system can't or wasn't designed to handle, quickly
diagnose the problem, and then find a satisfactory and safe solution.
And if they don't, the operators, not the system's designers, often get
the blame.
Adding another system to help detect the error and recover from it isn't
a straightforward solution either. In Flight 124, the fault-tolerant,
redundant system design helped to mask the problem. In fact, such
redundancy often merely serves to act as yet another layer that
abstracts the human operator from the system's operational control.
“In other words, the initial control loop is done by one system, and
then you have a computer that is backing up that system, and another is
backing up that one,” according to Parasuraman. “Finally, you have to
display some information to the operator, but the operator is now so far
from the system and the complexity is so great that their developing a
\[mental\] model of how to deal with something going wrong becomes very,
very difficult.”
Economics also figures into the equation. The ADIRU in Flight 124's
Boeing 777 was designed to be fault-tolerant and redundant not only to
increase safety but also to reduce operating costs by deferring
maintenance.
“The assumption is that automation is not only going to make \[what you
are doing\] safer but that it will make it more efficient,” says Martyn
Thomas, a Fellow of the UK's Royal Academy of Engineering. “This creates
a rather nasty feedback loop, which means that when adverse events
become relatively rare, it is taken as an opportunity to deskill the
people you're employing or to reduce their number in order to reduce a
cost.”
This erosion of skills in pilots was a major concern raised in the last
decade as glass cockpits in aircraft became common \[see IEEE Spectrum's
article [The Glass
Cockpit,](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=406460&isnumber=9128)
September 1995\].
Peter Ladkin, a professor of computer networks and distributed systems
at Bielefeld University, in Germany, is heavily involved in aircraft
accident investigations and is a pilot himself. “Line pilots are trained
and told—and their procedures also say—to use the automation all the
time. Many older pilots are really worried that when they get into
difficulties, they aren't going to \[know how to\] get out of them,” he
says.
![Turkish Airlines photo](/image/1464857)
Photo: United Photos/Reuters
The crash in February of Turkish Airlines Flight 1951 just short of
Amsterdam's Schiphol International Airport, which killed 9 people and
injured 86 others, raised this concern anew. As the aircraft passed
through 1950 feet, [the left radio altimeter failed and indicated an
altitude of 8
feet](http://www.onderzoeksraad.nl/en/index.php/onderzoeken/Neergestort-tijdens-nadering/),
which it passed on to the autopilot, which in turn reduced engine power
because it assumed the aircraft was in the final stages of approach. The
pilots did not react to the warnings that something was wrong until it
was too late to recover the aircraft.
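The mechanism resembles any mode logic keyed to a single sensor: one wildly wrong input can flip the system into a regime meant for a completely different phase of flight. The sketch below is a hypothetical autothrottle rule with an assumed threshold and invented mode names, not Boeing's control law; it shows only how an 8-foot reading at 1950 feet can trigger landing-flare behavior.

```python
# Hypothetical autothrottle mode logic (illustrative assumption, not
# Boeing's implementation): pull power to idle when radio altitude says
# the aircraft is about to touch down.
RETARD_ALTITUDE_FT = 27  # assumed flare threshold

def autothrottle_mode(radio_altitude_ft, on_approach):
    if on_approach and radio_altitude_ft <= RETARD_ALTITUDE_FT:
        return "RETARD"   # command thrust toward idle for the landing flare
    return "SPEED"        # otherwise hold the selected approach speed

# The aircraft is really at about 1950 feet, but the failed left radio
# altimeter reports 8 feet, so the logic behaves as if the flare has begun.
print(autothrottle_mode(radio_altitude_ft=8, on_approach=True))     # RETARD
print(autothrottle_mode(radio_altitude_ft=1950, on_approach=True))  # SPEED
```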
“When we start removing active learning for the operator, the operator
begins to overtrust the automation,” Rovira says. “They're not going
back and gathering those data pieces that they need” to make an
effective decision.
Another issue associated with overtrusting the automation is that it can
encourage “practical drift,” a term coined by Scott Snook in his book
Friendly Fire: The Accidental Shootdown of U.S. Black Hawks over
Northern Iraq (Princeton University Press, 2002). The phrase refers to
the slow, steady uncoupling of practice from written procedures.
![NW Flight 188 map](/image/1464873)
Illustration: Boswell/MCT/Landov
We see how that happened in the Royal Majesty incident, where the watch
officers failed to follow established procedures. This was also the case
in the October incident involving Northwest Airlines Flight 188 on its
way to Minneapolis-St. Paul International Airport, which overshot the
airport by 150 miles. The pilots claimed they were working on their
laptops and lost track of the time and their location. The aircraft was
on autopilot, which in normal circumstances leaves the pilots with
little left to do other than monitor the controls.
Again, when you are only a system's monitor, especially for an
automated system that rarely, if ever, fails, it is hard not to get
fatigued or bored and start taking shortcuts.
The situation isn't hopeless, however. For some time now, researchers
have been working to address the ironies and paradoxes of automation.
One new approach has been to address the issues from the human point of
view instead of the point of view of the system.
“We draw a system's boundary in the wrong place,” Thomas states. “There
is an assumption that the system boundary that the engineer should be
interested in \[sits\] at the boundary of the sensors and actuators of
the box that is being designed by the engineers. The humans who are
interrelating with these systems are outside it. Whether they are
operators, pilots, controllers, or clinicians, they are not part of the
system.
“That is just wrong,” Thomas adds. “The systems designer, engineer, and
overall architect all need to accept responsibility for the ways those
people are going to act.”
Victor Riley, associate technical fellow in Boeing Flight Deck, Crew
Operations, argues that there needs to be a two-way dialogue between the
operator and the automated system.
“The operator-to-the-system part of the dialogue is more important than
the system-to-the-operator part,” Riley says. “People see what they
expect to see, and what they expect to see is based on what they thought
they told the system to do.”
Studies by Parasuraman, Rovira, and others have found that operators of
highly reliable automated systems will often perform worse than if they
were operating a lower-reliability system, which seems paradoxical.
Parasuraman explains that “if you deliberately engineer anomalies into
the automation, people rely less on it and will perform a little bit
better in monitoring the system. For example, if the system is 90
percent reliable, operators will be better at picking up the 10 percent
of the errors than if the system is 99 percent reliable.”
Rovira also says that operators need to be able to see how well the
automation is working in a given context.
“The goal for us as designers is to provide an interface that allows a
drill-down if the operator needs to query the system, in the event they
have a different perspective of the decision than the automation has
given them,” Rovira says. “Or if not a drill-down, there should be some
visibility or transparency right up front about what the underlying
constraints or variables are that make this decision not totally
reliable.”
Maybe one way to remind ourselves of the potential effects of the
ironies and paradoxes of automation is to simply pull the plug.
“If we don't want people to depend on automated systems, we need to turn
them off sometimes,” Thomas observes. “People, after all, are the backup
systems, and they aren't being exercised.”
## About the Author
Robert N. Charette, an IEEE Spectrum contributing editor, is a
self-described “risk ecologist” who investigates the impact of the
changing concept of risk on technology and societal development.
Charette also writes Spectrum's blog [The Risk
Factor](/blog/computing/it/riskfactor).