463 lines
25 KiB
Markdown
463 lines
25 KiB
Markdown
---
|
||
created_at: '2015-10-22T07:26:33.000Z'
|
||
title: A small British firm shows that software bugs aren't inevitable (2005)
|
||
url: http://spectrum.ieee.org/computing/software/the-exterminators
|
||
author: cjg
|
||
points: 55
|
||
story_text:
|
||
comment_text:
|
||
num_comments: 58
|
||
story_id:
|
||
story_title:
|
||
story_url:
|
||
parent_id:
|
||
created_at_i: 1445498793
|
||
_tags:
|
||
- story
|
||
- author_cjg
|
||
- story_10430862
|
||
objectID: '10430862'
|
||
year: 2005
|
||
|
||
---
|
||
![Photo of Peter Amey \[left\] and Roderick
|
||
Chapman](/image/MzAxMzYyNw.jpeg) Photo: Peter Searle **Checking Code:**
|
||
At Praxis, Peter Amey \[left\] and Roderick Chapman use mathematical
|
||
logic to make sure their programs do not contain errors.
|
||
|
||
Peter Amey was an aeronautical engineer serving in the United Kingdom’s
|
||
Royal Air Force in the early 1980s when he found a serious flaw in an
|
||
aircraft missile-control system being deployed at the time. It wasn’t a
|
||
defect in any of the thousands of mechanical and electronic parts that
|
||
constituted the system’s hardware. The problem was in the system’s
|
||
software. Amey found an erroneous piece of program code—a bug \[see
|
||
photo, “Checking Code”\]. Because of it, the unthinkable could happen:
|
||
under rare circumstances, a missile could fire without anyone’s having
|
||
commanded it to do so.
|
||
|
||
Amey says his superiors, rather than commending his discovery,
|
||
complained that it would delay the system’s deployment. Like most
|
||
project managers, they didn’t like the idea of fixing errors at the end
|
||
of the development process. After all, good design ought to keep errors
|
||
out in the first place. Yet time and again, Amey knew, the software
|
||
development process didn’t prevent bugs; it merely put off dealing with
|
||
them until the end. Did it have to be that way? Or could developers
|
||
avoid bugs in the first place? He would find the answer to be “yes”
|
||
when, years later, he joined Praxis High Integrity Systems \[see photo,
|
||
“[Bug
|
||
Killer](/images/sep05/images/extf2.jpg "Photo: Peter Searle")”\].
|
||
|
||
Praxis, headquartered in Bath, 2 hours from London by car, was founded
|
||
in 1983 by a group of software experts who firmly believed they could
|
||
put together a sound methodology to ruthlessly exterminate bugs during
|
||
all stages of a software project.
|
||
|
||
At the time, the software world was in a malaise that it hasn’t fully
|
||
shaken even today \[see “[Why Software Fails](/sep05/1685),” in this
|
||
issue\]. Software projects were getting larger and more complex, and as
|
||
many as 70 percent of them, by some estimates, were running into
|
||
trouble: going over budget, missing deadlines, or collapsing completely.
|
||
Even projects considered successful were sometimes delivering software
|
||
without all the features that had been promised or with too many
|
||
errors—errors that, as in the missile-firing system, were sometimes
|
||
extremely serious. The personal computer era, then just starting, only
|
||
reinforced a development routine of “compile first, debug later.”
|
||
|
||
Praxis armed itself not only with an arsenal of the latest software
|
||
engineering methods but also with something a little more unusual in the
|
||
field: mathematical logic. The company is one of the foremost software
|
||
houses to use mathematically based techniques, known as formal methods,
|
||
to develop software.
|
||
|
||
Basically, formal methods require that programmers begin their work not
|
||
by writing code but rather by stringing together special symbols that
|
||
represent the program’s logic. Like a mathematical theorem, these symbol
|
||
strings can be checked to verify that they form logically correct
|
||
statements. Once the programmer has checked that the program doesn’t
|
||
have logical flaws, it’s a relatively simple matter to convert those
|
||
symbols into programming code. It’s a way to eliminate bugs even before
|
||
you start writing the actual program.
|
||
|
||
Praxis doesn’t claim it can make bug-free software, says Amey, now the
|
||
company’s chief technical officer. But he says the methodology pays off.
|
||
Bugs are notoriously hard to count, and estimates of how common they are
|
||
vary hugely. With an average of less than one error in every 10 000
|
||
lines of delivered code, however, Praxis claims a bug rate that is at
|
||
least 50—and possibly as much as 1000—times better than the industry
|
||
standard.
|
||
|
||
Praxis is still a small, lonely asteroid compared to the Jupiter-size
|
||
companies that dominate the software universe—companies like Microsoft,
|
||
Oracle, and SAP. The tiny British software house doesn’t make products
|
||
for the masses; it focuses on complex, custom systems that need to be
|
||
highly reliable. Such mission-critical systems are used to control
|
||
military systems, industrial processes, and financial applications,
|
||
among other things.
|
||
|
||
Sometimes the software needs to work 99.999 percent of the time, like an
|
||
air-traffic control program Praxis delivered some years ago. Sometimes
|
||
it needs to be really, really secure, like the one Praxis recently
|
||
developed for the National Security Agency, the supersecret U.S. signals
|
||
intelligence and cryptographic agency, in Fort Meade, Md.
|
||
|
||
And though Praxis employs just 100 people, its name has become
|
||
surprisingly well known. “They’re very, very talented, with a very
|
||
different approach,” says John C. Knight, a professor of computer
|
||
science at the University of Virginia and the editor in chief of IEEE
|
||
Transactions on Software Engineering. Praxis’s founders, he says,
|
||
believed that building software wasn’t as hard as people made it out to
|
||
be. “They thought, it isn’t rocket science, just very careful
|
||
engineering.”
|
||
|
||
Watts S. Humphrey, who once ran software development at IBM and is now a
|
||
fellow at the Software Engineering Institute at Carnegie Mellon
|
||
University, in Pittsburgh, also speaks highly of Praxis. He says the
|
||
company’s methodology incorporates things like quality control that
|
||
should be more widely used in the field. In fact, Humphrey spent this
|
||
past summer at Praxis headquarters to learn how they do things. He wants
|
||
to use that knowledge to improve a complementary methodology he
|
||
developed to help organizations better manage their software projects.
|
||
|
||
Praxis’s approach, however, isn’t perfect and isn’t for everybody.
|
||
Formal methods obviously are no silver bullet. For one thing, using
|
||
formal methods can take more time and require new skills, all of which
|
||
can mean higher up-front costs for a client. In fact, Praxis charges
|
||
more—50 percent more in some cases—than the standard daily rate. To this
|
||
its engineers will say: “You get what you pay for; our bug rate speaks
|
||
for itself.”
|
||
|
||
And although formal methods have been used to great effect in small and
|
||
medium-size projects, no one has yet managed to apply them to large
|
||
ones. There’s some reason to think no one ever will, except perhaps in a
|
||
limited fashion. Nevertheless, even though the company may not have all
|
||
the answers to make software projects more successful, those working in
|
||
the field can learn plenty of lessons from it, say advocates like Knight
|
||
and Humphrey.
|
||
|
||
**Software was conceived as** a mathematical artifact in the early days
|
||
of modern computing, when British mathematician Alan Turing formalized
|
||
the concept of algorithm and computation by means of his now famous
|
||
Turing Machine, which boils the idea of a computer down to an idealized
|
||
device that steps though logical states.
|
||
|
||
But over time, software development gradually became more of a craft
|
||
than a science. Forget the abstractions and the mathematical
|
||
philosophers. Enter the realm of fearless, caffeinated programmers who
|
||
can churn out hundreds of lines of code a day (often by hastily gluing
|
||
together existing pieces of code). The problem is that for some
|
||
projects, even tirelessness, dedication, and skill aren’t good enough if
|
||
the strategy is wrong.
|
||
|
||
Large, complex software systems usually involve so many modules that
|
||
dealing with them all can overwhelm a team following an insufficiently
|
||
structured approach. That’s especially true of the mission-critical
|
||
applications Praxis develops, as well as of large enterprise
|
||
resource-planning systems, of the sort used by Fortune 500 companies,
|
||
and complex data-driven software, such as the FBI’s Virtual Case File
|
||
project \[see “[Who Killed the Virtual Case
|
||
File?](/computing/software/who-killed-the-virtual-case-file)” in this
|
||
issue\].
|
||
|
||
Even when you break such a big program down into small, seemingly
|
||
manageable pieces, making a change to one turns out to affect 10 others,
|
||
which may in turn affect scores or maybe even hundreds of other pieces.
|
||
It may happen that making all the fixes will require more time and money
|
||
than you have. If your system’s correct operation depends on those
|
||
changes, you’ll have to either admit defeat or scramble to find a way to
|
||
salvage what you’ve done so far, perhaps by giving up on some of the
|
||
capabilities or features you’d hoped to have in the software.
|
||
|
||
As it turns out, complete failure—projects canceled before completion—is
|
||
the fate of 18 percent of all information technology projects surveyed
|
||
in a 2004 study by consultancy Standish Group International Inc., in
|
||
West Yarmouth, Mass. Apparently that’s the good news; the rate 10 years
|
||
ago, according to Standish, was 31 percent.
|
||
|
||
Still, the overall picture is pretty bleak. Standish asserts that more
|
||
than 50 percent of the thousands of projects it surveyed faced problems,
|
||
from being turned over without significant features to going well beyond
|
||
their deadlines or budgets. In the end, according to the Standish
|
||
numbers, only 28 percent of projects could be considered successes by
|
||
any rigorous definition.
|
||
|
||
Standish’s numbers, however, are far from universally accepted in the
|
||
computer industry. For contract software projects, more specifically,
|
||
other analyses in recent years have put the success rate as low as 16
|
||
percent and as high as 62 percent. Nevertheless, even using those
|
||
numbers as a guide, it’s hard not to see the contract software business
|
||
as anything but an enterprise all too often mired in mediocrity. As one
|
||
study by consultant Capers Jones, in Marlborough, Mass., put it: “Large
|
||
software systems...have one of the highest failure rates of any
|
||
manufactured object in human history.”
|
||
|
||
Today, ever more sophisticated tools are available to help companies
|
||
manage all aspects of their software projects. These tools help
|
||
conceptualize and design the system; manage all people, files,
|
||
computers, and documents involved; keep track of all versions and
|
||
changes made to the system and its modules; and automate a number of
|
||
tests that can be used to find system errors.
|
||
|
||
Indeed, worldwide sales of such software development tools, according to
|
||
Stamford, Conn.based market research firm Gartner Inc., generate more
|
||
than US $3 billion a year. Rational Software Corp., a company acquired
|
||
by IBM in 2002 for $2.1 billion, is currently the market leader,
|
||
followed by Microsoft, Computer Associates International, Compuware,
|
||
Borland, and others, according to Gartner.
|
||
|
||
But the effect of widespread use of these tools on overall software
|
||
quality hasn’t been gauged in a detailed or rigorous way. Some would
|
||
even argue that the sector is a little reminiscent of the market for
|
||
diet products: it, too, is a billion-dollar industry, and yet, somehow,
|
||
obesity as a public health problem hasn’t gone away. And, just as the
|
||
few successful diet strategies all seem to require a major change in
|
||
lifestyle, perhaps, too, the software failure rates won’t improve
|
||
significantly without a basic and widespread shift in tactics.
|
||
|
||
Certainly, Praxis’s experience supports that idea. Consider one of the
|
||
company’s recent projects, for Mondex International Inc., a financial
|
||
services company founded in the UK that is now a subsidiary of
|
||
MasterCard International Inc. First, a little background. Mondex had a
|
||
product called an electronic purse, a credit cardlike payment card that
|
||
stored electronic cash. That is, it did not debit a bank account or draw
|
||
on a line of credit; it stored the cash digitally in an embedded chip.
|
||
Mondex wanted to make the card flexible enough to run a variety of
|
||
applications that would keep track not only of electronic cash but also
|
||
of discount coupons, loyalty reward points, and other items still
|
||
unimagined.
|
||
|
||
The critical issue was to make sure that only cards with legitimate
|
||
applications would work; any other card, even if programmed to pass as a
|
||
Mondex card, would be deemed invalid. The solution Mondex chose was to
|
||
use a special program, known as a certification authority, that would
|
||
run on a central computer at the company’s headquarters. The
|
||
certification authority would generate unique digital certificates—long
|
||
strings of numbers—to accompany all applications on the cards. That way,
|
||
a card reader at, say, a store could validate a card’s certificates by
|
||
running them through a series of mathematical operations that would
|
||
prove unequivocally that they came from Mondex.
|
||
|
||
Mondex hired Praxis to develop the certification authority, which was
|
||
the most critical part of the whole system. After all, if the security
|
||
of one card was broken, then just that one card could be forged. But
|
||
compromising the certification authority would allow mass forgery of
|
||
cards.
|
||
|
||
**The Praxis team began working** on the solution in late 1998. The
|
||
first step was to hammer out what, precisely, the Mondex system was
|
||
supposed to do—in software jargon, the system’s requirements. These are
|
||
essentially English-language bullet points that detail everything the
|
||
program will do but not how it will be done.
|
||
|
||
Getting the requirements right is perhaps the most critical part of
|
||
Praxis’s methodology. For that reason, Praxis engineers held many
|
||
exhaustive meetings with the people from Mondex, during which they tried
|
||
to imagine all possible scenarios of what could happen. As Praxis does
|
||
for all its projects, it insisted that Mondex make available not only
|
||
its IT people but everyone who would have any involvement with the
|
||
product—salespeople, accountants, senior managers, and perhaps even the
|
||
CEO. “We focus very hard on identifying all stakeholders, everybody that
|
||
cares,” says Roderick Chapman, a principal engineer at Praxis \[see
|
||
photo, “[Spark
|
||
Maker](/images/sep05/images/extf3.jpg "Photo: Peter Searle")”\].
|
||
|
||
To make sure Praxis was on target with the system requirements, it
|
||
devised a prototype program that simulated the graphical interface of
|
||
the proposed system. This prototype had no real system underlying it;
|
||
data and commands entered through the interface served only to check the
|
||
requirements. In fact, Praxis made no further use of the prototype—the
|
||
real graphical interface would be developed later, using much more
|
||
rigorous methods. In following this approach, Praxis was complying with
|
||
an edict from Frederick P. Brooks’s 1975 classic study of software
|
||
development, The Mythical Man-Month: Essays on Software Engineering
|
||
(Addison-Wesley, 2nd edition, 1995):
|
||
|
||
> In most projects, the first system built is barely usable. It may be
|
||
> too slow, too big, awkward to use, or all three. There is no
|
||
> alternative but to start again, smarting but smarter and build a
|
||
> redesigned version in which these problems are solved. The discard and
|
||
> redesign may be done in one lump, or it may be done piece-by-piece.
|
||
> But all large-system experience shows that it will be done....
|
||
>
|
||
> Hence plan to throw one away; you will, anyhow.
|
||
|
||
Once Praxis’s engineers had a general idea of what the system would do,
|
||
they began to describe it in great detail, in pages and pages of
|
||
specifications. For example, if a requirement said that every user’s
|
||
action on the system should produce an audit report, then the
|
||
corresponding specification would flesh out what data should be logged,
|
||
how the information should be formatted, and so on.
|
||
|
||
This is the first math-intensive phase, because the specifications are
|
||
written mostly in a special language called Z (pronounced the British
|
||
way: “zed”). It’s not a programming language—it doesn’t tell a computer
|
||
how to do something—but it is a formal specification language that
|
||
expresses notions in ways that can be subjected to proof. Its purpose is
|
||
simple: to detect ambiguities and inconsistencies. This forces engineers
|
||
to resolve the problems right then and there, before the problems are
|
||
built into the system.
|
||
|
||
Z, which was principally designed at the University of Oxford, in
|
||
England, in the late 1970s and early 1980s, is based on set theory and
|
||
predicate logic. Once translated into Z, a program’s validity can be
|
||
reviewed by eye or put through theorem-proving software tools. The goal
|
||
is to spot bugs as soon as possible \[see sidebar, “[Bugproof
|
||
Code](/computing/software/the-exterminators/bugproof-code)”\].
|
||
|
||
The process is time-consuming. For the Mondex project, spec-writing took
|
||
nearly a year, or about 25 percent of the entire development process.
|
||
That was a long time to go without producing anything that looks like a
|
||
payoff, concedes Andrew Calvert, Mondex’s information technology liaison
|
||
for the project. “Senior management would say: ‘We are 20 percent into
|
||
the project and we’re getting nothing. Why aren’t we seeing code? Why
|
||
aren’t we seeing implementation?’” he recalls. “I had to explain that we
|
||
were investing much more than usual in the initial analysis, and that we
|
||
wouldn’t see anything until 50 percent of the way through.” For
|
||
comparison, in most projects, programmers start writing code before the
|
||
quarter-way mark.
|
||
|
||
Only after Praxis’s engineers are sure that they have logically correct
|
||
specifications written in Z do they start turning the statements into
|
||
actual computer code. The programming language they used in this case,
|
||
called Spark, was also selected for its precision. Spark, based on Ada,
|
||
a programming language created in the 1970s and backed by the U.S.
|
||
Department of Defense, was designed by Praxis to eliminate all
|
||
expressions, functions, and notations that can make a program behave
|
||
unpredictably.
|
||
|
||
By contrast, many common programming languages suffer from ambiguity.
|
||
Take, for example, the programming language C and the expression “i++ \*
|
||
i++,” in which “\*” denotes a multiplication and “++” means you should
|
||
increment the variable “i” by 1. It’s not an expression a programmer
|
||
would normally use; yet it serves to illustrate the problem. Suppose “i”
|
||
equals 7. What’s the value of the expression? Answer: it is not possible
|
||
to know. Different compilers—the special programs that transform source
|
||
code into instructions that microprocessors can understand—would
|
||
interpret the expression in different ways. Some would do the
|
||
multiplication before incrementing either “i,” giving 49 as the answer.
|
||
Others would increment the first “i” only and then do the
|
||
multiplication, giving 56 as the answer. Yet others would do unexpected
|
||
things.
|
||
|
||
Such a problem could not happen in Spark, says Praxis’s Chapman, because
|
||
all such ambiguous cases were considered—and eliminated—when the
|
||
language was created. Coding with Spark thus helps Praxis achieve
|
||
reduced bug rates. In fact, once Spark code has been written, Chapman
|
||
says, it has the uncanny tendency to work the first time, just as you
|
||
wanted. “Our defect rate with Spark is at least 10 times, sometimes 100
|
||
times lower than those created with other languages,” he says.
|
||
|
||
Peter Amey explains that the two-step translation—from English to Z and
|
||
from Z to Spark—lets engineers keep everything in mind. “You can’t
|
||
reason across the semantic gap between English and code,” he says, “but
|
||
the gap from English to an unambiguous mathematical language is smaller,
|
||
as is the gap from that language to code.”
|
||
|
||
What’s more, Spark lets engineers analyze certain properties of a
|
||
program—the way data flows through the program’s variables, for
|
||
example—without actually having to compile and run it. Such a
|
||
technique, called static analysis, often lets them prevent two serious
|
||
software errors: using uninitialized variables, which may inject
|
||
spurious values into the program, and allocating data to a memory area
|
||
that is too small, a problem known as buffer overflow.
|
||
|
||
In practice, though, not everything can be put through the mathematical
|
||
wringer. Problems with the way different modules exchange data, for
|
||
instance, by and large have to be solved the old-fashioned way: by
|
||
thinking. Nor can Praxis completely eliminate classic trial-and-error
|
||
testing, in which the programmers try to simulate every situation the
|
||
software is likely to confront.
|
||
|
||
But what Praxis does do is make such simulation a last resort, instead
|
||
of the main line of defense against bugs. (As famed computer scientist
|
||
Edsger Dijkstra wrote, “Program testing can be used to show the presence
|
||
of bugs, but never to show their absence\!”) For the Mondex project,
|
||
such testing took up 34 percent of the contract time. That’s in the
|
||
lower end of the usual share, which typically ranges from 30 to 90
|
||
percent. Reduced efforts on testing mean huge savings that go a long way
|
||
toward balancing the extra time spent on the initial analysis.
|
||
|
||
The system went live in 1999. Though it cost more up front, the contract
|
||
called for Praxis to fix for free any problem—that is, any deviation
|
||
from the specification—that came up in the first year of operation, a
|
||
guarantee rarely offered in the world of contract software. That first
|
||
year, just four defects triggered the clause. According to Chapman,
|
||
three of the problems were so trivial that they took no more than a few
|
||
hours to correct. Only one was functionally significant; it took two
|
||
days to fix. With about 100 000 lines of code, that’s an average of 0.04
|
||
faults per 1000 lines. Fault rates for projects not using formal
|
||
methods, by some estimates, can vary from 2 to 10 per 1000 lines of
|
||
code, and sometimes more.
|
||
|
||
For Mondex, fewer bugs meant saving money. Calvert estimates that Mondex
|
||
will spend 20 to 25 percent less than the norm in maintenance costs over
|
||
the lifetime of the project.
|
||
|
||
**Formal methods were relatively new** when Praxis started using them,
|
||
and after some ups and downs, they have recently been gaining
|
||
popularity. Among their leading proponents are John Rushby at SRI
|
||
International, Menlo Park, Calif.; Constance Heitmeyer, at the U.S.
|
||
Naval Research Laboratory’s Center for High Assurance Computer Systems,
|
||
Washington, D.C.; Jonathan Bowen at London South Bank University; the
|
||
developers of Z at the University of Oxford and other institutions; and
|
||
the supporters of other specification languages, such as B, VDM, Larch,
|
||
Specware, and Promela.
|
||
|
||
In recent years, even Microsoft has used formal methods, applying them
|
||
to develop small applications, such as a bug-finding tool used in-house
|
||
and also a theorem-proving “driver verifier,” which makes sure device
|
||
drivers run properly under Windows.
|
||
|
||
But still, the perceived difficulty of formal tools repels the
|
||
rank-and-file programmer. After all, coders don’t want to solve logical
|
||
problems with the help of set theory and predicate logic. They want to,
|
||
well, code. “Few people, even among those who complete computer science
|
||
degrees, are skilled in those branches of pure mathematics,” says
|
||
Bernard Cohen, a professor in the department of computing at City
|
||
University, in London.
|
||
|
||
In every other branch of engineering, he insists, practitioners master
|
||
difficult mathematical notations. “Ask any professional engineer if he
|
||
could do the job without math, and you’ll get a very rude reply,” Cohen
|
||
says. But in programming, he adds, the emphasis has often been to ship
|
||
it and let the customer find the bugs.
|
||
|
||
Until formal methods become easier to use, Cohen says, Praxis and
|
||
companies like it will continue to rely on clients’
|
||
“self-selection”—only those users who are highly motivated to get
|
||
rock-solid software will beat a path to their door. Those that need
|
||
software to handle functions critical to life, limb, national security,
|
||
or the survival of a company will self-select; so will those that are
|
||
contractually obligated to meet software requirements set by some
|
||
regulator. That’s the case with many military contractors that now need
|
||
to demonstrate their use of formal methodologies to government
|
||
purchasers; the same goes for financial institutions. Mondex, for
|
||
instance, required the approval of the Bank of England, in London, and
|
||
formal methods were part of that approval.
|
||
|
||
Yet even if regulators were omnipresent, not all problems would be
|
||
amenable to formal methods, at least not to those that are available
|
||
now. First, there is the problem of scaling. The largest system Praxis
|
||
has ever built had 200 000 lines of code. For comparison, Microsoft
|
||
Windows XP has around 40 million, and some Linux versions have more than
|
||
200 million. And that’s nothing compared with the monster programs that
|
||
process tax returns for the U.S. Internal Revenue Service or manage a
|
||
large telecom company’s infrastructure. Such systems can total hundreds
|
||
of millions of lines of code.
|
||
|
||
What does Praxis say about that? “The simple answer is, we’ve never gone
|
||
that big,” says Chapman. “We believe these methods should scale, but we
|
||
have no evidence that they won’t or that they will.” So what if a client
|
||
approaches Praxis with a really big project? Would the company handle
|
||
it? “The key weapon is abstraction,” he says. “If you can build
|
||
abstractions well enough, you should be able to break things down into
|
||
bits you can handle.” That maxim guides every other discipline in
|
||
engineering, not least the design of computer hardware. Why not apply it
|
||
to software, too?
|
||
|
||
## About the Author
|
||
|
||
Philip E. Ross (IEEE Member) wrote “[Managing Care Through the
|
||
Air](/computing/networks/managing-care-through-the-air)” for the
|
||
December 2004 issue of IEEE Spectrum. His work has also appeared in
|
||
Scientific American, Forbes, and Red Herring.
|