It isn’t really a book, but Richard Feynman’s Appendix to the Challenger Disaster Report is definitely something you should read. It’s not particularly long, but it’s educational and relevant, not just as an example of critical thinking in action, but as a reminder not to fool oneself, either individually or at an organizational level. Sadly, while much was learned from the events leading to and surrounding the Challenger disaster, over thirty years later many of us can still relate a lot of the same things to our own professional lives. There isn’t a single magic solution, because these problems are subtle and often masquerade as normal.
Feynman and the Challenger Disaster
Richard Feynman (1918-1988) was a Nobel Prize-winning physicist and one of the best-known scientists of his time. In 1986 he somewhat reluctantly agreed to join the Rogers Commission, whose task was to investigate the Challenger disaster. The space shuttle Challenger had broken apart a little more than a minute after launch, killing all seven crew members on board. The commission’s job was to find out what had gone wrong and how it had happened, and to figure out how to keep it from happening again.
Feynman, who had recently undergone cancer-related surgery, was initially reluctant to join the commission for simple reasons: he didn’t want to go anywhere near Washington and didn’t want anything at all to do with government. As for the shuttle itself, he would read about it going up and coming down, but it bothered him a little that he never saw the results of any shuttle experiments published in a scientific journal, so he hadn’t paid it much attention. Ultimately, he did join the commission, and he was changed in the process. The shuttle and its related systems were feats of engineering at (and sometimes beyond) the very limits of the technology of the time. He hadn’t fully appreciated the enormous number of people working on the shuttle, or the sheer scale of their dedicated effort. He came to see what a terrible blow the accident was, and he became determined to do whatever he could to help.
It emerged that O-rings sealing a joint between segments of one of the solid rocket boosters had failed, but that was only the proximate cause of the disaster. The real problem was a culture in which NASA management kept loosening its criteria and accepting more and more errors while, as Feynman put it, “engineers are screaming from below HELP! and This is a RED ALERT!” This points to an almost incredible lack of communication between management and working engineers, which was itself a clue to what was really wrong at an organizational level.
How Did This Happen?
This situation didn’t happen all at once; it grew over time. NASA was filled with dedicated people, but at an organizational level it had developed a culture of gradually decreasing strictness when it came to certifying flight readiness. A common argument for accepting a flight risk was that the same risk had been flown before without failure, and that fact was accepted as an argument for the safety of accepting it again. As a result, obvious weaknesses and problems were accepted again and again. This was not limited to the solid rocket booster O-rings that caused Challenger’s catastrophic failure. A slow shift toward lower standards was evident time and time again in other areas as well; safety criteria were being subtly altered, each time with an apparently logical argument for doing so. NASA was fooling itself.
Fooling Oneself Still Happens Today
Much has been learned from the Challenger disaster and similar cases, but over thirty years later people and organizations still struggle with the same basic issues and end up in environments that foster bad decision-making. There’s no easy solution, but it is at least possible to understand more about what to look out for. There isn’t always a particular broken part to blame or replace, and there isn’t always someone specifically at fault. The things that go wrong can be subtle and numerous, and the environment they create can seem perfectly normal, in an oh-well-what-can-you-do kind of way.
Feynman often asserted that you must never fool yourself. Once you have succeeded in not fooling yourself, it’s easier not to fool others. Feynman observed several ways in which NASA had come to fool itself, and a common thread was a lack of communication.
For example, management confidently estimated the chance of shuttle failure as 1 in 100,000, whereas engineers estimated it closer to 1 in 100 or 1 in 200 and were fervently crossing their fingers for every flight.
Feynman’s experiences on the commission led him to think hard about how this situation had actually come about. It struck him that there was a fair bit of fishiness associated with the “big cheeses” at NASA. Every time the commission spoke to higher-level managers, they kept saying they didn’t know anything about the problems below them. Assuming the higher-ups weren’t lying, there had to be some reason why problems at lower levels weren’t making it up to them. Feynman suspected that it came down to bad communication caused by management and engineering having fundamentally different priorities and goals. When the engineers at the bottom say things like “No, no! We can’t do that unless this because it would mean such-and-such!” and the higher-ups don’t want to hear such talk, attitudes soon start to change and you get an environment that suppresses bad news. Feynman described this process in “What Do You Care What Other People Think?”:
Maybe they don’t say explicitly “Don’t tell me,” but they discourage communication, which amounts to the same thing. It’s not a question of what has been written down, or who should tell what to whom; it’s a question of whether, when you do tell someone about some problem, they’re delighted to hear about it and they say “Tell me more,” and “Have you tried such-and-such?” or whether they instead say “Well, see what you can do about it” — which is a completely different atmosphere. If you try once or twice to communicate and get pushed back, pretty soon you decide “To hell with it.”
That was Feynman’s theory: because the promises made at the top were inconsistent with reality at the bottom, communication slowed and ultimately jammed, and that’s how the higher-ups at NASA could genuinely not know about the problems below them. I can’t help but think of all the modern-day situations where technical staff are left to figure out how to deliver on unrealistic promises made by sales or executives, somehow get through the mess, and then have to do it all over again the next week, and I wonder if Feynman wasn’t right on the money.
Two things are certain: people and organizations still fool themselves in similar ways today, and lack of communication is always a factor. But fooling others always starts with fooling oneself, and when it comes to that Feynman had clear advice: “The first principle is that you must not fool yourself — and you are the easiest person to fool. After you’ve not fooled yourself, it’s easy not to fool others. You just have to be honest in a conventional way after that.”
Note: There is a proper book related to this article. “What Do You Care What Other People Think?” by Richard Feynman devotes its second half to Feynman’s experience working on the Rogers Commission.