Artemis II’s Fault-Tolerant Computer

The other day, I wrote about the amazing engineering behind Voyager 1. Space is a harsh environment and equipment that would be perfectly fine here on Earth would not survive in space.

Artemis II, which carried the crew farther away from Earth than any human had ever been, is another case in point. Obviously, the Artemis II program is full of excellent engineering but, as recounted in the online Communications of the ACM, Artemis II featured a “fail silent” computer with multiple CPUs that could survive everything from a cosmic-ray-induced bit flip to total processor divergence. The goal was to survive any hardware failure with no downtime.

This was important because, unlike the Apollo computer, which was concerned only with guidance, Artemis II’s computer had a hand in almost every safety-critical system. The computer is built so that effectively 8 CPUs are running the flight software in parallel. These CPUs are spread across two computers, each with two Flight Control Modules. The term “fail silent” means that a faulty computer will remain silent rather than give a wrong answer. The system gets its answer from one of the computers that hasn’t failed. Naturally, the failed computers reset themselves and resynchronize with the others.
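
To make the fail-silent idea a bit more concrete, here is a minimal Python sketch. It is only an illustration, not the Artemis II flight software. I am assuming that each module checks itself by running the same computation on two CPUs and comparing the results, which is one common way to build a fail-silent unit; the article summary above does not spell out the mechanism. All the names (`fail_silent_module`, `select_output`, `healthy`, `bit_flipped`) are made up for the example.

```python
from typing import Callable, Optional, Sequence

# One "lane" stands in for a single CPU running the flight software (hypothetical).
Lane = Callable[[float], float]


def fail_silent_module(lane_a: Lane, lane_b: Lane, x: float) -> Optional[float]:
    """Run the same computation on two lanes; go silent (None) if they disagree."""
    a, b = lane_a(x), lane_b(x)
    return a if a == b else None


def select_output(outputs: Sequence[Optional[float]]) -> Optional[float]:
    """Take the answer from the first module that has not gone silent."""
    for out in outputs:
        if out is not None:
            return out
    return None  # every module is silent


def healthy(x: float) -> float:
    return 2.0 * x + 1.0


def bit_flipped(x: float) -> float:
    # Simulated upset: same computation, but the result is corrupted.
    return healthy(x) + 0.5


# Four modules of two lanes each, eight lanes in total (assumed layout).
modules = [
    (healthy, bit_flipped),  # this module will detect a mismatch and go silent
    (healthy, healthy),
    (healthy, healthy),
    (healthy, healthy),
]

outputs = [fail_silent_module(a, b, 3.0) for a, b in modules]
result = select_output(outputs)
print(outputs)  # [None, 7.0, 7.0, 7.0]
print(result)   # 7.0
```

The point of the design is that the selector never has to decide which answer is right. A module either produces an answer it has already cross-checked or produces nothing, so any non-silent answer can be used directly.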

Read the article to see how amazing this system is. The software is equally amazing. They even have a Backup Flight Software system that is implemented on different hardware running a different OS and software to help guard against a software failure on the primary system.

As we now know, this system operated perfectly and was able to take the Artemis II crew on a Moon flyby and deliver them back to Earth without a problem. Even if you aren’t a hardware nerd, you’ll enjoy reading this article.
