The crisis over getting the U.S. Affordable Care Act, or “Obamacare,” website to work might benefit from a little historical perspective. At its root is a fundamental disconnect between the capabilities of computer and networking hardware, and the craft nature of software production.
Advances in hardware and networking are based on engineering principles with deep roots in electrical and communications engineering going back to the 19th century. Not that hardware is perfect, but those engineers draw on a long tradition of testing, prototyping, failure analysis, redundancy and other practices drilled into engineering curricula from day one. We see the results: pocket-sized computers that are rugged and powerful, with enormous storage capacity, able to communicate around the world almost instantaneously.
Software, in spite of a long effort to emulate those traditions, is forever playing catch-up. As early as 1968, when a typical computer filled a room and was programmed by punched cards, a group of computer scientists convened a conference on the topic under the auspices of the NATO Science Committee. They published their findings in a report with a deliberately provocative title: “Software Engineering.” It was a bold attempt to place software on an equal, rigorous footing with the classical engineering disciplines. The phrase “software engineering” took root, but the engineering rigor it promised never quite materialized. It is humbling to read some of the comments expressed at that conference:
- “Many people agreed that one of the main problems was the pressure to produce even bigger and more sophisticated systems. … I am concerned about the current growth of systems, and what I expect is probably an exponential growth of errors.”
- Another participant described an application intended for a hospital. The problem with its implementation was traced to a poor interface between the applications software that the users saw and the underlying “systems” software buried deeper in the machine. This is apparently the same difficulty that has plagued the Affordable Care Act website, HealthCare.gov. As Joseph November demonstrated in his recent book “Biomedical Computing,” health care delivery to this day stubbornly resists the full benefits of the computer revolution.
- Although women were involved in programming at the time, one conference presenter was obsessed with the number of “man-months” it took to produce a complex program. A few years after the conference, Frederick P. Brooks, who had managed the development of IBM’s OS/360 operating system, wrote a book called “The Mythical Man-Month,” in which he argued that when a software project is in trouble, throwing people (men or women) at it makes things worse, not better: newcomers must be trained by the very people who are behind schedule, and every added person multiplies the communication paths within the team. The observation has since become known as “Brooks’ Law”: adding people to a late software project makes it later. The law is well known and presumably taught in introductory computer science and engineering classes, but it is often honored in the breach.
It is painful to read the NATO report in the context of the difficulties of the Affordable Care Act website. Regardless of how one feels about the merits of this legislation, one ought to be concerned that this failed computer system threatens Americans’ ability to obtain good health care.
It did not have to be this way. In August 2012, NASA’s Curiosity rover landed on Mars, executing line after line of complex code on a machine millions of miles from Earth, with no chance of fixing a bug if one occurred. The software obviously did not face the challenge confronting the Obamacare website, namely the need to accept input from millions of citizens trying to log on. However, the Curiosity software did consist of a number of complex modules, each tailored for a portion of the “seven minutes of terror” as the craft descended to the red planet. At predetermined moments, control was handed from one module to another, precisely the point in a complex system where bugs, often fatal ones, occur. But the people at the Jet Propulsion Laboratory, and the contractors who worked with them on the programming, knew that, and they made sure that the interfaces were cleanly specified. The software had to work right the first time, and it did.
Developers of aerospace software, who typically work in-house at government laboratories or are contractors with very close ties to the project managers, have learned the lessons of the 1968 NATO conference.
I once interviewed a woman who worked at the Massachusetts Institute of Technology (MIT) Instrumentation Laboratory (now called Draper Lab), where the critical code for the Apollo Moon landings was written. This was in the mid-1960s, around the time of the NATO conference and before many formal methods of software checking were known. She told me that her team tested the software by the “Auge Kugel” method, a word-for-word German rendering of “eyeball”: They looked at the code and tried to find errors in it. That was hardly a formal, engineered way of debugging software, and at NASA’s insistence it was supplemented by more formal methods (although the Auge Kugel method, now known by the more prosaic term “walkthrough,” was not abandoned). At first the MIT programmers resisted, but they came around after a NASA programmer checked the code and found some potentially serious errors. The result came to be known as a formal “verification and validation” procedure, variations of which have served NASA well over the following decades.
I also think that by that term she meant something more: When human lives depend on the software, you do not rest until you are sure you have thought of every possible thing that could go wrong with the code you have written. And after you do that, you test, and test again.
The software developers for the Curiosity rover had a similar arrangement: They set up a separate team whose sole job was to look at others’ code and see if they could find errors in it.
Since 1968 a number of more formal techniques have been developed, but in the case of the Affordable Care Act website, none of them seems to have been used to any effect. To be fair, the specifications for the Apollo software were unambiguous (get two astronauts onto the surface of the Moon, and get the crew back home safely) in a way that the requirements for a sprawling insurance marketplace may never be. Still, ambiguous specifications are no excuse when a piece of software does not work. Making excuses and pointing fingers only makes the profession of computer programming look bad.
There is a famous phrase attributed to Harry S. Truman: “The only thing new in the world is the history you don’t know.” That is an overstatement, but it applies well in this case. I am sure the programmers working on a fix for the current problem have a number of techniques at their disposal. To those techniques I would add one more: a knowledge of the half-century history of their field.
Paul Ceruzzi is chairman of the Space History Division at the Smithsonian Institution’s National Air and Space Museum. He can be reached at ceruzzip@si.edu.