---
created_at: '2017-10-08T01:26:46.000Z'
title: Personal Observations on Reliability of Shuttle (1986)
url: https://history.nasa.gov/rogersrep/v2appf.htm
author: michaelsbradley
points: 109
story_text:
comment_text:
num_comments: 15
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1507426006
_tags:
- story
- author_michaelsbradley
- story_15426562
objectID: '15426562'
year: 1986
---

**Report of the PRESIDENTIAL COMMISSION on the Space Shuttle Challenger
Accident**

**Volume 2: Appendix F - Personal Observations on Reliability of
Shuttle**

by R. P. Feynman

**Introduction**

\[**F1**\] It appears that there are enormous differences of opinion as
to the probability of a failure with loss of vehicle and of human life.
The estimates range from roughly 1 in 100 to 1 in 100,000. The higher
figures come from the working engineers, and the very low figures from
management. What are the causes and consequences of this lack of
agreement? Since 1 part in 100,000 would imply that one could put a
Shuttle up each day for 300 years expecting to lose only one, we could
properly ask "What is the cause of management's fantastic faith in the
machinery?"

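The 300-year figure can be checked with one line of arithmetic. The sketch below assumes one launch per calendar day and a 365-day year; the variable names are illustrative only.

```python
# Sketch: how long would daily launches take to accumulate 100,000 flights?
FLIGHTS_PER_LOSS = 100_000   # management's estimate: 1 loss in 100,000 flights
LAUNCHES_PER_YEAR = 365      # assumption: one launch every day of the year

years_per_expected_loss = FLIGHTS_PER_LOSS / LAUNCHES_PER_YEAR
print(f"{years_per_expected_loss:.0f} years")  # about 274 years, i.e. roughly 300
```
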
We have also found that certification criteria used in Flight Readiness
Reviews often develop a gradually decreasing strictness. The argument
that the same risk was flown before without failure is often accepted as
an argument for the safety of accepting it again. Because of this,
obvious weaknesses are accepted again and again, sometimes without a
sufficiently serious attempt to remedy them, or to delay a flight
because of their continued presence.

There are several sources of information. There are published criteria
for certification, including a history of modifications in the form of
waivers and deviations. In addition, the records of the Flight Readiness
Reviews for each flight document the arguments used to accept the risks
of the flight. Information was obtained from the direct testimony and
the reports of the range safety officer, Louis J. Ullian, with respect
to the history of success of solid fuel rockets. There was a further
study by him, as chairman of the Launch Abort Safety Panel (LASP), in an
attempt to determine the risks involved in possible accidents leading to
radioactive contamination from attempting to fly a plutonium power
supply (RTG) for future planetary missions. The NASA study of the same
question is also available. For the history of the Space Shuttle Main
Engines, interviews with management and engineers at Marshall, and
informal interviews with engineers at Rocketdyne, were made. An
independent (Cal Tech) mechanical engineer who consulted for NASA about
engines was also interviewed informally. A visit to Johnson was made to
gather information on the reliability of the avionics (computers,
sensors, and effectors). Finally, there is a report, "A Review of
Certification Practices, Potentially Applicable to Man-rated Reusable
Rocket Engines," prepared at the Jet Propulsion Laboratory by N. Moore,
et al., in February, 1986, for NASA Headquarters, Office of Space
Flight. It deals with the methods used by the FAA and the military to
certify their gas turbine and rocket engines. These authors were also
interviewed informally.

**Solid Fuel Rockets (SRB)**

An estimate of the reliability of solid rockets was made by the range
safety officer, by studying the experience of all previous rocket
flights. Out of a total of nearly 2,900 flights, 121 failed (1 in 25).
This includes, however, what may be called early errors: rockets flown
for the first few times, in which design errors are discovered and fixed.
A more reasonable figure for the mature rockets might be 1 in 50. With
special care in the selection of parts and in inspection, a figure of
below 1 in 100 might be achieved, but 1 in 1,000 is probably not
attainable with today's technology. (Since there are two rockets on the
Shuttle, these rocket failure rates must be doubled to get Shuttle
failure rates from Solid Rocket Booster failure.)

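The figures in this paragraph can be reproduced in a few lines. This is only a sketch: it assumes the two boosters fail independently of each other, which the text does not state, and the function name is hypothetical.

```python
# Historical record quoted from the range safety officer.
flights, failures = 2900, 121
per_rocket_all = failures / flights          # about 0.042, i.e. roughly 1 in 24

def shuttle_failure_from_boosters(p_rocket: float, boosters: int = 2) -> float:
    """Chance that at least one booster fails, assuming independent failures."""
    return 1.0 - (1.0 - p_rocket) ** boosters

for label, p in [("all flights", per_rocket_all),
                 ("mature rockets", 1 / 50),
                 ("special care", 1 / 100)]:
    exact = shuttle_failure_from_boosters(p)
    doubled = 2 * p                          # the text's "must be doubled" rule
    print(f"{label}: per rocket 1 in {1 / p:.0f}, "
          f"per Shuttle {exact:.4f} (doubling gives {doubled:.4f})")
```

For small probabilities the exact value and the doubled value nearly coincide, which is why simple doubling is an adequate rule here.
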
NASA officials argue that the figure is much lower. They point out that
these figures are for unmanned rockets but, since the Shuttle is a manned
vehicle, "the probability of mission success is necessarily very close to
1.0." It is not very clear what this phrase means. Does it mean it is
close to 1 or that it ought to be close to 1? They go on to explain,
"Historically this extremely high degree of mission success has given
rise to a difference in philosophy between manned space flight programs
and unmanned programs; i.e., numerical probability usage versus
engineering judgment." (These quotations are from "Space Shuttle Data
for Planetary Mission RTG Safety Analysis," Pages 3-1, 3-2, February 15,
1985, NASA, JSC.) It is true that if the probability of failure was as
low as 1 in 100,000 it would take an inordinate number of tests to
determine it (you would get nothing but a string of perfect flights
from which no precise figure could be drawn, other than that the
probability is likely less than the reciprocal of the number of such
flights in the string so far). But, if the
real probability is not so small, flights would show troubles, near
failures, and possible actual failures with a reasonable number of
trials, and standard statistical methods could give a reasonable
estimate. In fact, previous NASA experience had shown, on occasion, just
such difficulties, near accidents, and accidents, all giving warning
that the probability of flight failure was not so very small. The
inconsistency of the argument not to determine reliability through
historical experience, as the range safety officer did, is that NASA
also appeals to history, beginning "Historically this high degree of
mission success..." Finally, if we are to replace standard numerical
probability usage with engineering judgment, why do we find such an
enormous disparity between the management estimate and the judgment of
the engineers? It would appear that, for whatever purpose, be it for
internal or external consumption, the management of NASA exaggerates the
reliability of its product, to the point of fantasy.

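The point about a string of perfect flights can be made quantitative with a standard zero-failure bound. The sketch below assumes independent flights with a constant failure probability and a 95% confidence level; neither assumption comes from the report, and the function names are hypothetical.

```python
import math

def upper_bound_no_failures(n_flights: int, confidence: float = 0.95) -> float:
    """Largest failure probability still consistent, at the given confidence,
    with n_flights consecutive successes and no failures."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_flights)

def flights_needed(p_target: float, confidence: float = 0.95) -> int:
    """Consecutive failure-free flights needed to bound the failure
    probability below p_target at the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_target))

print(upper_bound_no_failures(100))  # roughly 0.03 after 100 perfect flights
print(flights_needed(1e-5))          # roughly 300,000 flights for 1 in 100,000
```

Under these assumptions, supporting a figure of 1 in 100,000 from flight experience alone would take on the order of hundreds of thousands of flawless flights, which is the "inordinate number of tests" the text refers to.
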
The history of the certification and Flight Readiness Reviews will not
be repeated here. (See other parts of the Commission report.) The
phenomenon of accepting for flight seals that had shown erosion and
blow-by in previous flights is very clear. The Challenger flight is an
excellent example. There are several references to flights that had gone
before. The acceptance and success of these flights are taken as
evidence of safety. But erosion and blow-by are not what the design
expected. They are warnings that something is wrong. The equipment is
not operating as expected, and therefore there is a danger that it can
operate with even wider deviations in this unexpected and not thoroughly
understood way. The fact that this danger did not lead to a catastrophe
before is no guarantee that it will not the next time, unless it is
completely understood. When playing Russian roulette, the fact that the
first shot got off safely is little comfort for the next. The origin and
consequences of the erosion and blow-by were not understood. They did
not occur equally on all flights and all joints; sometimes more, and
sometimes less. Why not, sometime, when whatever conditions determined it
were right, still more, leading to catastrophe?

In spite of these variations from case to case, officials behaved as if
they understood it, giving apparently logical arguments to each other,
often depending on the "success" of previous flights. For example, in
determining if flight 51-L was safe to fly in the face of ring erosion
in flight 51-C, it was noted that the erosion depth was only one-third
of the radius. It had been noted in an \[**F2**\] experiment cutting the
ring that cutting it as deep as one radius was necessary before the ring
failed. Instead of being very concerned that variations of poorly
understood conditions might reasonably create a deeper erosion this
time, it was asserted that there was "a safety factor of three." This is
a strange use of the engineer's term "safety factor." If a bridge is
built to withstand a certain load without the beams permanently
deforming, cracking, or breaking, it may be designed for the materials
used to actually stand up under three times the load. This "safety
factor" is to allow for uncertain excesses of load, or unknown extra
loads, or weaknesses in the material that might have unexpected flaws,
etc. If now the expected load comes on to the new bridge and a crack
appears in a beam, this is a failure of the design. There was no safety
factor at all, even though the bridge did not actually collapse because
the crack went only one-third of the way through the beam. The O-rings
of the Solid Rocket Boosters were not designed to erode. Erosion was a
clue that something was wrong. Erosion was not something from which
safety can be inferred.

There was no way, without full understanding, that one could have
confidence that conditions the next time might not produce erosion three
times more severe than the time before. Nevertheless, officials fooled
themselves into thinking they had such understanding and confidence, in
spite of the peculiar variations from case to case. A mathematical model
was made to calculate erosion. This was a model based not on physical
understanding but on empirical curve fitting. To be more detailed, it
was supposed that a stream of hot gas impinged on the O-ring material,
and the heat was determined at the point of stagnation (so far, with
reasonable physical, thermodynamic laws). But to determine how much
rubber eroded it was assumed this depended only on this heat by a
formula suggested by data on a similar material. A logarithmic plot
suggested a straight line, so it was supposed that the erosion varied as
the .58 power of the heat, the .58 being determined by a nearest fit. At
any rate, adjusting some other numbers, it was determined that the model
agreed with the erosion (to depth of one-third the radius of the ring).
There is nothing much so wrong with this as believing the answer\!
Uncertainties appear everywhere. How strong the gas stream might be was
unpredictable; it depended on holes formed in the putty. Blow-by showed
that the ring might fail even though not, or only partially, eroded
through. The empirical formula was known to be uncertain, for it did not
go directly through the very data points by which it was determined.
There was a cloud of points, some twice above and some twice below the
fitted curve, so erosions twice those predicted were reasonable from that
cause alone. Similar uncertainties surrounded the other constants in the
formula, etc., etc. When using a mathematical model, careful attention
must be given to uncertainties in the model.

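The effect of the factor-of-two scatter on the claimed "safety factor of three" can be sketched numerically. The code below is an illustration only: depths are expressed in units of the O-ring radius, the coefficient of the power law is a placeholder parameter (the text does not give it), and the function names are hypothetical.

```python
EXPONENT = 0.58  # from the text: erosion varied as the 0.58 power of the heat

def predicted_erosion(heat: float, coefficient: float) -> float:
    """Empirical curve fit: erosion = coefficient * heat ** 0.58.
    The coefficient is a placeholder; the report does not quote a value."""
    return coefficient * heat ** EXPONENT

def erosion_band(predicted: float, scatter_factor: float = 2.0):
    """Range implied by data points lying up to scatter_factor above or
    below the fitted curve."""
    return predicted / scatter_factor, predicted * scatter_factor

predicted = 1.0 / 3.0      # model matched an erosion depth of one-third radius
failure_depth = 1.0        # a cut of one radius was needed before the ring failed

low, high = erosion_band(predicted)
nominal_margin = failure_depth / predicted    # about 3, the claimed factor
margin_with_scatter = failure_depth / high    # about 1.5 from curve scatter alone
print(nominal_margin, margin_with_scatter)
```

The scatter in the fit by itself halves the apparent margin, before the uncertainty in the gas stream or in the other constants is considered.
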
**Liquid Fuel Engine (SSME)**

During the flight of 51-L the three Space Shuttle Main Engines all
worked perfectly, even, at the last moment, beginning to shut down the
engines as the fuel supply began to fail. The question arises, however,
as to whether, had an engine failed, and had we investigated it in as
much detail as we did the Solid Rocket Booster, we would find a similar
lack of attention to faults and a deteriorating reliability. In other
words, were the organizational weaknesses that contributed to the
accident confined to the Solid Rocket Booster sector or were they a more
general characteristic of NASA? To that end the Space Shuttle Main
Engines and the avionics were both investigated. No similar study of the
Orbiter or the External Tank was made.

The engine is a much more complicated structure than the Solid Rocket
Booster, and a great deal more detailed engineering goes into it.
Generally, the engineering seems to be of high quality and apparently
considerable attention is paid to deficiencies and faults found in
operation.

The usual way that such engines are designed (for military or civilian
aircraft) may be called the component system, or bottom-up design. First
it is necessary to thoroughly understand the properties and limitations
of the materials to be used (for turbine blades, for example), and tests
are begun in experimental rigs to determine those. With this knowledge
larger component parts (such as bearings) are designed and tested
individually. As deficiencies and design errors are noted they are
corrected and verified with further testing. Since one tests only parts
at a time, these tests and modifications are not overly expensive.
Finally one works up to the final design of the entire engine, to the
necessary specifications. There is a good chance, by this time, that the
engine will generally succeed, or that any failures are easily isolated
and analyzed because the failure modes, limitations of materials, etc.,
are so well understood. There is a very good chance that the
modifications to the engine to get around the final difficulties are not
very hard to make, for most of the serious problems have already been
discovered and dealt with in the earlier, less expensive, stages of the
process.

The Space Shuttle Main Engine was handled in a different manner,
top-down, we might say. The engine was designed and put together all at
once with relatively little detailed preliminary study of the material
and components. Then when troubles are found in the bearings, turbine
blades, coolant pipes, etc., it is more expensive and difficult to
discover the causes and make changes. For example, cracks have been
found in the turbine blades of the high pressure oxygen turbopump. Are
they caused by flaws in the material, the effect of the oxygen
atmosphere on the properties of the material, the thermal stresses of
startup or shutdown, the vibration and stresses of steady running, or
mainly at some resonance at certain speeds, etc.? How long can we run
from crack initiation to crack failure, and how does this depend on
power level? Using the completed engine as a test bed to resolve such
questions is extremely expensive. One does not wish to lose an entire
engine in order to find out where and how failure occurs. Yet, an
accurate knowledge of this information is essential to acquire a
confidence in the engine reliability in use. Without detailed
understanding, confidence cannot be attained.

A further disadvantage of the top-down method is that, if an
understanding of a fault is obtained, a simple fix, such as a new shape
for the turbine housing, may be impossible to implement without a
redesign of the entire engine.

The Space Shuttle Main Engine is a very remarkable machine. It has a
greater ratio of thrust to weight than any previous engine. It is built
at the edge of, or outside of, previous engineering experience.
Therefore, as expected, many different kinds of flaws and difficulties
have turned up. Because, unfortunately, it was built in the top-down
manner, they are difficult to find and fix. The design aim of a lifetime
of 55 mission-equivalent firings (27,000 seconds of operation, either
in a mission of 500 seconds, or on a test stand) has not been obtained.
The engine now requires very frequent maintenance and replacement of
important parts, such as turbopumps, bearings, sheet metal housings,
etc. The high-pressure fuel turbopump had to be replaced every three or
four mission equivalents (although that may have been fixed, now) and
the high pressure oxygen turbopump every five or six. This is at most
ten percent of the original specification. But our main concern here is
the determination of reliability.

In a total of about 250,000 seconds of operation, the engines have
failed seriously perhaps 16 times. Engineering pays close attention to
these failings and tries to remedy them as quickly as possible. This it
does by test studies on special rigs experimentally designed for the
flaws in question, by careful inspection of the engine for suggestive
clues (like cracks), and by considerable study and analysis. In this
way, in spite of the difficulties of top-down design, through hard work,
many of the problems have apparently been solved.

\[**F3**\] A list of some of the problems follows. Those followed by an
asterisk (\*) are probably solved:

- Turbine blade cracks in high pressure fuel turbopumps (HPFTP). (May
  have been solved.)
- Turbine blade cracks in high pressure oxygen turbopumps (HPOTP).
- Augmented Spark Igniter (ASI) line rupture.\*
- Purge check valve failure.\*
- ASI chamber erosion.\*
- HPFTP turbine sheet metal cracking.
- HPFTP coolant liner failure.\*
- Main combustion chamber outlet elbow failure.\*
- Main combustion chamber inlet elbow weld offset.\*
- HPOTP subsynchronous whirl.\*
- Flight acceleration safety cutoff system (partial failure in a
  redundant system).\*
- Bearing spalling (partially solved).
- A vibration at 4,000 Hertz making some engines inoperable, etc.

Many of these solved problems are the early difficulties of a new
design, for 13 of them occurred in the first 125,000 seconds and only
three in the second 125,000 seconds. Naturally, one can never be sure
that all the bugs are out, and, for some, the fix may not have addressed
the true cause. Thus, it is not unreasonable to guess there may be at
least one surprise in the next 250,000 seconds, a probability of 1/500
per engine per mission. On a mission there are three engines, but some
accidents would possibly be contained, and only affect one engine. The
system can abort with only two engines. Therefore let us say that the
unknown surprises do not, even of themselves, permit us to guess that the
probability of mission failure due to the Space Shuttle Main Engine is
less than 1/500. To this we must add the chance of failure from known,
but as yet unsolved, problems (those without the asterisk in the list
above). These we discuss below. (Engineers at Rocketdyne, the
manufacturer, estimate the total probability as 1/10,000. Engineers at
Marshall estimate it as 1/300, while NASA management, to whom these
engineers report, claims it is 1/100,000. An independent engineer
consulting for NASA thought 1 or 2 per 100 a reasonable estimate.)

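The 1/500 figure follows from the mission length quoted earlier (500 seconds of engine operation per mission). A short sketch of that arithmetic, together with the spread among the quoted estimates, is given below; the variable names are illustrative and the dictionary labels are shorthand for the sources named in the text.

```python
MISSION_SECONDS = 500        # engine operating time per mission, from the text
TOTAL_SECONDS = 250_000      # accumulated operating time so far

# Serious failures per mission equivalent in each half of the record.
missions_per_half = (TOTAL_SECONDS / 2) / MISSION_SECONDS   # 250 mission equivalents
rate_first_half = 13 / missions_per_half                    # 0.052
rate_second_half = 3 / missions_per_half                    # 0.012

# One surprise guessed for the next 250,000 seconds of operation.
p_surprise = 1 / (TOTAL_SECONDS / MISSION_SECONDS)          # 1/500 per engine per mission

estimates = {
    "NASA management": 1 / 100_000,
    "Rocketdyne engineers": 1 / 10_000,
    "surprise guess above": p_surprise,
    "Marshall engineers": 1 / 300,
    "independent consultant (low end)": 1 / 100,
}
for source, p in sorted(estimates.items(), key=lambda item: item[1]):
    print(f"{source}: 1 in {1 / p:,.0f}")

spread = max(estimates.values()) / min(estimates.values())  # about a factor of 1,000
```
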
The history of the certification principles for these engines is
confusing and difficult to explain. Initially the rule seems to have
been that two sample engines must each have had twice the time operating
without failure as the operating time of the engine to be certified
(rule of 2x). At least that is the FAA practice, and NASA seems to have
adopted it, originally expecting the certified time to be 10 missions
(hence 20 missions for each sample). Obviously the best engines to use
for comparison would be those of greatest total (flight plus test)
operating time -- the so-called "fleet leaders." But what if a third
sample and several others fail in a short time? Surely we will not be
safe because two were unusual in lasting longer. The short time might be
more representative of the real possibilities, and in the spirit of the
safety factor of 2, we should only operate at half the time of the
short-lived samples.

The slow shift toward decreasing safety factor can be seen in many
examples. We take that of the HPFTP turbine blades. First of all the
idea of testing an entire engine was abandoned. Each engine number has
had many important parts (like the turbopumps themselves) replaced at
frequent intervals, so that the rule must be shifted from engines to
components. We accept an HPFTP for a certification time if two samples
have each run successfully for twice that time (and of course, as a
practical matter, no longer insisting that this time be as large as 10
missions). But what is "successfully"? The FAA calls a turbine blade
crack a failure, in order, in practice, to really provide a safety
factor greater than 2. There is some time that an engine can run between
the time a crack originally starts until the time it has grown large
enough to fracture. (The FAA is contemplating new rules that take this
extra safety time into account, but only if it is very carefully
analyzed through known models within a known range of experience and
with materials thoroughly tested. None of these conditions apply to the
Space Shuttle Main Engine.)

Cracks were found in many second stage HPFTP turbine blades. In one case
three were found after 1,900 seconds, while in another they were not
found after 4,200 seconds, although usually these longer runs showed
cracks. To follow this story further we shall have to realize that the
stress depends a great deal on the power level. The Challenger flight
was to be at, and previous flights had been at, a power level called
104% of rated power level during most of the time the engines were
operating. Judging from some material data it is supposed that at the
level 104% of rated power level, the time to crack is about twice that
at 109% or full power level (FPL). Future flights were to be at this
level because of heavier payloads, and many tests were made at this
level. Therefore dividing time at 104% by 2, we obtain units called
equivalent full power level (EFPL). (Obviously, some uncertainty is
introduced by that, but it has not been studied.) The earliest cracks
mentioned above occurred at 1,375 EFPL.

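The EFPL conversion described above amounts to a weighted sum of running time. A minimal sketch follows; the function name and the example inputs are hypothetical, and the factor of 2 is the supposition stated in the text, not a measured constant.

```python
def efpl_seconds(seconds_at_104: float, seconds_at_fpl: float = 0.0) -> float:
    """Equivalent full power level (EFPL) seconds: time at 104% of rated
    power counts half, time at 109% (full power level) counts in full."""
    return seconds_at_104 / 2.0 + seconds_at_fpl

# Hypothetical inputs, for illustration only.
print(efpl_seconds(500))          # a 500-second run entirely at 104%: 250.0
print(efpl_seconds(1000, 300))    # mixed test history: 800.0
```
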
Now the certification rule becomes "limit all second stage blades to a
maximum of 1,375 seconds EFPL." If one objects that the safety factor of
2 is lost, it is pointed out that one turbine ran for 3,800 seconds
EFPL without cracks, and half of this is 1,900, so we are being more
conservative. We have fooled ourselves in three ways. First, we have only
one sample, and it is not the fleet leader, for the other two samples of
3,800 or more seconds had 17 cracked blades between them. (There are 59
blades in the engine.) Next, we have abandoned the 2x rule and
substituted equal time. And finally, 1,375 is where we did see a crack.
We can say that no crack had been found below 1,375, but the last time
we looked and saw no cracks was 1,100 seconds EFPL. We do not know when
the crack formed between these times; for example, cracks may have formed
at 1,150 seconds EFPL. (Approximately 2/3 of the blade sets tested in
excess of 1,375 seconds EFPL had cracks. Some recent experiments have,
indeed, shown cracks as early as 1,150 seconds.) It was important to
keep the number high, for the Challenger was to fly an engine very close
to the limit by the time the flight was over.

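One way to see how far the adopted limit departs from the rule of 2x is to apply that rule to the numbers in this paragraph. The sketch below is one reading of the rule (certify for half the time of the shortest-lived sample), with a crack counted as a failure in the FAA manner; the function name is hypothetical and the resulting limits are illustrative, not figures from the report.

```python
def limit_2x(sample_times_efpl):
    """Rule of 2x: certify for half the failure-free running time of the
    shortest-lived sample (all times in seconds EFPL)."""
    return min(sample_times_efpl) / 2.0

ADOPTED_LIMIT = 1375          # the certification limit actually adopted
FIRST_CRACK_SEEN = 1375       # earliest inspection at which cracks were found
LAST_CLEAN_INSPECTION = 1100  # latest inspection at which no cracks were seen

# The argument offered in defense: one crack-free turbine at 3,800 s EFPL.
defended_limit = limit_2x([3800])                         # 1900.0

# Counting a crack as a failure, as the FAA does:
limit_from_first_crack = limit_2x([FIRST_CRACK_SEEN])     # 687.5
limit_from_last_clean = limit_2x([LAST_CLEAN_INSPECTION]) # 550.0

effective_factor = FIRST_CRACK_SEEN / ADOPTED_LIMIT       # 1.0, i.e. "equal time"
print(defended_limit, limit_from_first_crack, limit_from_last_clean, effective_factor)
```
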
Finally it is claimed that the criteria are not abandoned, and the
system is safe, by giving up the FAA convention that there should be no
cracks, and considering only a completely fractured blade a failure.
With this definition no engine has yet failed. The idea is that since
there is sufficient time for a crack to grow to a fracture we can ensure
that all is safe by inspecting all blades for cracks. If they are found,
replace them, and if none are found we have enough time for a safe
mission. This makes the crack problem not a flight safety problem, but
merely a maintenance problem.

This may in fact be true. But how well do we know that cracks always
grow slowly enough that no fracture can occur in a mission? Three
engines have run for long times with a few cracked blades (about 3,000
seconds EFPL) with no blades broken off.

But a fix for this cracking may have been found. By changing the blade
shape, shot-peening the surface, and covering with insulation to exclude
thermal shock, the blades have not cracked so far.

A very similar story appears in the history of certification of the
HPOTP, but we shall not give the details here.

It is evident, in summary, that the Flight Readiness Reviews and
certification rules show a deterioration for some of the problems of the
Space Shuttle Main Engine that is closely analogous to the deterioration
seen in the rules for the Solid Rocket Booster.

**Avionics**

By "avionics" is meant the computer system on the Orbiter as well as its
input sensors and output actuators. At first we will restrict ourselves
to the computers proper and not be concerned with the reliability of the
input information from the sensors of \[**F4**\] temperature, pressure,
etc., nor with whether the computer output is faithfully followed by the
actuators of rocket firings, mechanical controls, displays to
astronauts, etc.

The computer system is very elaborate, having over 250,000 lines of
code. It is responsible, among many other things, for the automatic
control of the entire ascent to orbit, and for the descent until well
into the atmosphere (below Mach 1) once one button is pushed deciding
the landing site desired. It would be possible to make the entire
landing automatically (except that the landing gear lowering signal is
expressly left out of computer control, and must be provided by the
pilot, ostensibly for safety reasons) but such an entirely automatic
landing is probably not as safe as a pilot controlled landing. During
orbital flight it is used in the control of payloads, in displaying
information to the astronauts, and in the exchange of information with
the ground. It is evident that the safety of flight requires guaranteed
accuracy of this elaborate system of computer hardware and software.

In brief, the hardware reliability is ensured by having four essentially
independent identical computer systems. Where possible each sensor also
has multiple copies, usually four, and each copy feeds all four of the
computer lines. If the inputs from the sensors disagree, then, depending
on circumstances, either certain averages or a majority selection is
used as the effective input. The algorithm used by each of the four
computers is exactly the same, so their inputs (since each sees all
copies of the sensors) are the same. Therefore at each step the results
in each computer should be identical. From time to time they are
compared, but because they might operate at slightly different speeds a
system of stopping and waiting at specific times is instituted before
each comparison is made. If one of the computers disagrees, or is too
late in having its answer ready, the three which do agree are assumed to
be correct and the errant computer is taken completely out of the system.
If, now, another computer fails, as judged by the agreement of the other
two, it is taken out of the system, and the rest of the flight canceled,
and descent to the landing site is instituted, controlled by the two
remaining computers. It is seen that this is a redundant system since
the failure of only one computer does not affect the mission. Finally,
as an extra feature of safety, there is a fifth independent computer,
whose memory is loaded with only the programs of ascent and descent, and
which is capable of controlling the descent if there is a failure of
more than two of the computers of the main line four.

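The comparison scheme described above can be sketched as a simple majority vote. This is an illustration of the idea only, not the Orbiter's actual software: the computer labels, the use of `None` for an answer that arrives too late, and the function names are all assumptions made for the example.

```python
from collections import Counter

def compare_outputs(outputs):
    """outputs maps a computer label to its result (None if the answer was late).
    Returns (agreed_value, errant_labels); raises if there is no strict majority."""
    counts = Counter(v for v in outputs.values() if v is not None)
    if not counts:
        raise ValueError("no computer produced an answer in time")
    value, votes = counts.most_common(1)[0]
    if votes <= len(outputs) / 2:
        raise ValueError("no majority; a backup would have to take over")
    errant = sorted(label for label, v in outputs.items() if v != value)
    return value, errant

def run_step(active, outputs):
    """Compare the active computers and drop any that disagree or are late."""
    value, errant = compare_outputs({c: outputs.get(c) for c in active})
    return value, [c for c in active if c not in errant]

active = ["A", "B", "C", "D"]
value, active = run_step(active, {"A": 7, "B": 7, "C": 9, "D": 7})  # C disagrees
print(value, active)   # 7 ['A', 'B', 'D']
value, active = run_step(active, {"A": 7, "B": None, "D": 7})       # B is late
print(value, active)   # 7 ['A', 'D']; per the text, the flight is now cut short
```

With only two computers left a disagreement cannot be resolved by voting, which is the situation the fifth, independent computer is there to cover.
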
There is not enough room in the memory of the main line computers for
all the programs of ascent, descent, and payload programs in flight, so
the memory is loaded about four times from tapes, by the astronauts.

Because of the enormous effort required to replace the software for such
an elaborate system, and for checking a new system out, no change has
been made to the hardware since the system began about fifteen years
ago. The actual hardware is obsolete; for example, the memories are of
the old ferrite core type. It is becoming more difficult to find
manufacturers to supply such old-fashioned computers reliably and of
high quality. Modern computers are very much more reliable, can run much
faster, simplifying circuits, and allowing more to be done, and would
not require so much loading of memory, for the memories are much larger.

The software is checked very carefully in a bottom-up fashion. First,
|
|
|
|
|
each new line of code is checked, then sections of code or modules with
|
|
|
|
|
special functions are verified. The scope is increased step by step
|
|
|
|
|
until the new changes are incorporated into a complete system and
|
|
|
|
|
checked. This complete output is considered the final product, newly
|
|
|
|
|
released. But entirely separately there is an independent
|
|
|
|
|
verification group that takes an adversary attitude to the software
|
|
|
|
|
development group, and tests and verifies the software as if it were a
|
|
|
|
|
customer of the delivered product. There is additional verification in
|
|
|
|
|
using the new programs in simulators, etc. A discovery of an error
|
|
|
|
|
during verification testing is considered very serious, and its origin
|
|
|
|
|
studied very carefully to avoid such mistakes in the future. Such
|
|
|
|
|
unexpected errors have been found only about six times in all the
|
|
|
|
|
programming and program changing (for new or altered payloads) that has
|
|
|
|
|
been done. The principle that is followed is that all the verification
|
|
|
|
|
is not an aspect of program safety; it is merely a test of that safety,
|
|
|
|
|
in a non-catastrophic verification. Flight safety is to be judged solely
|
|
|
|
|
on how well the programs do in the verification tests. A failure here
|
|
|
|
|
generates considerable concern.
|
|
|
|
|
|
|
|
|
|
To summarize, then, the computer software checking system and attitude are
|
|
|
|
|
of the highest quality. There appears to be no process of gradually
|
|
|
|
|
fooling oneself while degrading standards so characteristic of the Solid
|
|
|
|
|
Rocket Booster or Space Shuttle Main Engine safety systems. To be sure,
|
|
|
|
|
there have been recent suggestions by management to curtail such
|
|
|
|
|
elaborate and expensive tests as being unnecessary at this late date in
|
|
|
|
|
Shuttle history. This must be resisted for it does not appreciate the
|
|
|
|
|
mutual subtle influences, and sources of error generated by even small
|
|
|
|
|
changes of one part of a program on another. There are perpetual
|
|
|
|
|
requests for changes as new payloads and new demands and modifications
|
|
|
|
|
are suggested by the users. Changes are expensive because they require
|
|
|
|
|
extensive testing. The proper way to save money is to curtail the number
|
|
|
|
|
of requested changes, not the quality of testing for each.
|
|
|
|
|
|
|
|
|
|
One might add that the elaborate system could be very much improved by
|
|
|
|
|
more modern hardware and programming techniques. Any outside competition
|
|
|
|
|
would have all the advantages of starting over, and whether that is a
|
|
|
|
|
good idea for NASA now should be carefully considered.
|
|
|
|
|
|
|
|
|
|
Finally, returning to the sensors and actuators of the avionics system,
|
|
|
|
|
we find that the attitude to system failure and reliability is not
|
|
|
|
|
nearly as good as for the computer system. For example, a difficulty was
|
|
|
|
|
found with certain temperature sensors sometimes failing. Yet 18 months
|
|
|
|
|
later the same sensors were still being used, still sometimes failing,
|
|
|
|
|
until a launch had to be scrubbed because two of them failed at the same
|
|
|
|
|
time. Even on a succeeding flight this unreliable sensor was used again.
|
|
|
|
|
Again, the reaction control systems, the rocket jets used for reorienting and
|
|
|
|
|
control in flight, still are somewhat unreliable. There is considerable
|
|
|
|
|
redundancy, but a long history of failures, none of which has yet been
|
|
|
|
|
extensive enough to seriously affect flight. The action of the jets is
|
|
|
|
|
checked by sensors, and, if they fail to fire, the computers choose
|
|
|
|
|
another jet to fire. But they are not designed to fail, and the problem
|
|
|
|
|
should be solved.
|
|
|
|
|
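The jet fallback described above amounts to a simple retry over redundant units. The following is a minimal sketch under assumptions: the function name, the jet labels, and the two callbacks (`fire` to command a jet, `fired_ok` to read the sensor that checks it) are hypothetical stand-ins, not the Shuttle's actual interfaces.

```python
def fire_with_failover(candidate_jets, fire, fired_ok):
    """Command redundant jets in order until a sensor confirms one fired.

    candidate_jets: jets that can produce the wanted thrust, in order of
                    preference (hypothetical labels).
    fire:           callable that commands a jet to fire.
    fired_ok:       callable that reports whether the sensors saw it fire.
    Returns the jet that fired, or None if every candidate failed.
    """
    for jet in candidate_jets:
        fire(jet)
        if fired_ok(jet):
            return jet
    return None
```

The sketch makes the point of the paragraph visible: redundancy lets the loop succeed despite a failed jet, but each pass through the loop is still a failure of a part that was not designed to fail.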
|
|
|
|
|
**Conclusions**
|
|
|
|
|
|
|
|
|
|
If a reasonable launch schedule is to be maintained, engineering often
|
|
|
|
|
cannot be done fast enough to keep up with the expectations of
|
|
|
|
|
originally conservative certification criteria designed to guarantee a
|
|
|
|
|
very safe vehicle. In these situations, subtly, and often with
|
|
|
|
|
apparently logical arguments, the criteria are altered so that flights
|
|
|
|
|
may still be certified in time. They therefore fly in a relatively
|
|
|
|
|
unsafe condition, with a chance of failure of the order of a percent (it
|
|
|
|
|
is difficult to be more accurate).
|
|
|
|
|
|
|
|
|
|
Official management, on the other hand, claims to believe the
|
|
|
|
|
probability of failure is a thousand times less. One reason for this may
|
|
|
|
|
be an attempt to assure the government of NASA perfection and success in
|
|
|
|
|
order to ensure the supply of funds. The other may be that they
|
|
|
|
|
sincerely believed it to be true, demonstrating an almost incredible
|
|
|
|
|
lack of communication between themselves and their working engineers.
|
|
|
|
|
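The gap between the two estimates can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every flight carries the same independent chance of failure; the function names and the choice of 100 flights are hypothetical. The two probabilities are the report's own figures: about 1 in 100 from the working engineers, and a thousand times less, 1 in 100,000, from management.

```python
def p_at_least_one_loss(p_per_flight, flights):
    """Chance of losing at least one vehicle in a run of flights,
    assuming independent flights with a constant failure probability."""
    return 1.0 - (1.0 - p_per_flight) ** flights


def expected_losses(p_per_flight, flights):
    """Expected number of vehicles lost over the same run of flights."""
    return p_per_flight * flights


ENGINEERS = 1 / 100        # "of the order of a percent"
MANAGEMENT = 1 / 100_000   # "a thousand times less"

if __name__ == "__main__":
    for label, p in (("engineers", ENGINEERS), ("management", MANAGEMENT)):
        print(label, p_at_least_one_loss(p, 100), expected_losses(p, 100))
```

Over 100 flights, the engineers' figure gives roughly a 63% chance of at least one loss, while management's figure gives about 0.1%. At 1 in 100,000, a launch every day for 300 years (about 109,500 flights) would be expected to lose roughly one vehicle.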
|
|
|
|
|
In any event this has had very unfortunate consequences, the most
|
|
|
|
|
serious of which is to encourage ordinary citizens to fly in such a
|
|
|
|
|
dangerous machine, as if it had attained the safety of an ordinary
|
|
|
|
|
airliner. The astronauts, like test pilots, should know their risks, and
|
|
|
|
|
we honor them for their courage. Who can doubt that McAuliffe was
|
|
|
|
|
equally a person of great courage, who was closer to an awareness of the
|
|
|
|
|
true risk than NASA management would have us believe?
|
|
|
|
|
|
|
|
|
|
\[**F5**\] Let us make recommendations to ensure that NASA officials
|
|
|
|
|
deal in a world of reality in understanding technological weaknesses and
|
|
|
|
|
imperfections well enough to be actively trying to eliminate them. They
|
|
|
|
|
must live in reality in comparing the costs and utility of the Shuttle
|
|
|
|
|
to other methods of entering space. And they must be realistic in making
|
|
|
|
|
contracts, in estimating costs, and the difficulty of the projects. Only
|
|
|
|
|
realistic flight schedules should be proposed, schedules that have a
|
|
|
|
|
reasonable chance of being met. If in this way the government would not
|
|
|
|
|
support them, then so be it. NASA owes it to the citizens from whom it
|
|
|
|
|
asks support to be frank, honest, and informative, so that these
|
|
|
|
|
citizens can make the wisest decisions for the use of their limited
|
|
|
|
|
resources.
|
|
|
|
|
|
|
|
|
|
For a successful technology, reality must take precedence over public
|
|
|
|
|
relations, for nature cannot be fooled.
|