The interconnect link can be anything from a simple serial port to an Ethernet link or a custom parallel bus, so the update section of the software is critical. The challenge is that the software engineer building the FT software must work with the hardware team, and understand the system requirements, in order to implement the interlink update software.
Let us say we are implementing an FT system for a telephone exchange. From the revenue point of view, both duplicated controllers must hold consistent billing data for all ongoing calls. Assuming a metering data resolution of one second and a controller that handles about 500 customers, a link speed of 2Mbps to 5Mbps will be adequate for consistency. The choice of interlink is based on the system’s needs and the smallest resolution that the system has to handle. This is a critical aspect that every implementer should be aware of.
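The sizing argument above can be checked with a rough back-of-the-envelope calculation. The sketch below is illustrative only: the function name, the 64-byte metering record and the 2x protocol overhead factor are assumptions made for the example, not figures from the exchange design. Only the 500 customers and the one-second resolution come from the text.

```python
def required_link_bps(customers, record_bytes, updates_per_second, overhead_factor=2.0):
    """Rough steady-state interlink bandwidth, in bits per second.

    record_bytes and overhead_factor are assumed values; a real design
    would take them from the actual record layout and link protocol.
    """
    payload_bps = customers * record_bytes * 8 * updates_per_second
    return payload_bps * overhead_factor


# 500 customers and one update per second (from the text);
# 64-byte records and 2x framing/acknowledgement overhead (assumed).
estimate = required_link_bps(customers=500, record_bytes=64, updates_per_second=1)
print(f"Estimated steady load: {estimate / 1e6:.3f} Mbps")  # prints 0.512 Mbps
```

Under these assumptions the steady load is about 0.5Mbps, so a 2Mbps link would leave roughly a fourfold margin for bursts and retransmissions. A different record size or a finer metering resolution changes the result proportionally, which is why the smallest resolution drives the choice of interlink.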
Often, the interlink speed and performance requirement is treated as an afterthought, leading to sub-optimal performance of the system. Fig. 6 shows the high-level software architecture for FT systems.
For mission-critical systems, like avionics, railway-signalling controllers, medical devices and nuclear plant systems, a failure may be life-threatening. These systems have multiple CPUs (three or more) and use a complex, majority-logic-based voting system to implement the FT system.
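The core idea of majority voting can be sketched in a few lines. This is a minimal illustration and not how any particular avionics or signalling system implements it: the function name is hypothetical, and the sketch assumes each CPU reports a single exactly comparable value. Real voters typically also deal with tolerances, timing skew and faults in the voter itself.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of channels.

    Returns None when no value has more than half of the votes,
    which the caller should treat as a system-level fault.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


# Three CPUs, one of them faulty: the two agreeing outputs win.
print(majority_vote([1, 1, 0]))  # prints 1
# All three disagree: no majority, so no output can be trusted.
print(majority_vote([1, 2, 3]))  # prints None
```

With three channels, one faulty CPU is outvoted by the other two, which is why such systems need at least three CPUs.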
CON-MON architecture. One architecture frequently used in avionics and other systems is CON-MON, which stands for control-and-monitor processor architecture. It is not fault-tolerant, as its main function is to sound an alarm when the main CPU fails. It uses two CPUs: the main CPU, which controls the function of the system, and a small microcontroller, which monitors the main CPU through the WDT run-out.
You may wonder what the advantage of this architecture is. Let us assume that the small microcontroller is not there. When the main CPU restarts due to a fault, details about the fault, like its time and duration, are lost, and the failure is known only when the main CPU stops working. With the CON-MON architecture, the smaller controller logs this data, apart from raising the alarm.
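The monitor's role can be illustrated with a small simulation. The class and method names below are hypothetical, and timestamps are passed in explicitly so that the behaviour is easy to follow. On a real microcontroller the same logic would be driven by a hardware timer and the watchdog signal from the main CPU. Clearing the alarm on recovery is also an assumption; some systems would latch it until an operator acknowledges it.

```python
class ConMonMonitor:
    """Sketch of the monitor side of a CON-MON pair (illustrative only)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # WDT run-out period, in seconds
        self.last_kick = None        # time of the last watchdog service
        self.fault_start = None      # time at which the WDT ran out
        self.alarm = False
        self.log = []                # (fault_start, duration) records

    def kick(self, now):
        """Called whenever the main CPU services the watchdog."""
        if self.fault_start is not None:
            # Main CPU is back: record when the fault began and how long it lasted.
            self.log.append((self.fault_start, now - self.fault_start))
            self.fault_start = None
            self.alarm = False
        self.last_kick = now

    def poll(self, now):
        """Called periodically by the monitor; raises the alarm on WDT run-out."""
        if self.last_kick is None or self.fault_start is not None:
            return self.alarm
        if now - self.last_kick > self.timeout_s:
            self.fault_start = self.last_kick + self.timeout_s
            self.alarm = True
        return self.alarm


monitor = ConMonMonitor(timeout_s=2.0)
monitor.kick(0.0)         # main CPU healthy at t = 0
monitor.poll(1.0)         # within the timeout, no alarm
monitor.poll(3.0)         # WDT ran out at t = 2, alarm raised
monitor.kick(5.0)         # main CPU recovers at t = 5
print(monitor.log)        # prints [(2.0, 3.0)]
```

The log entry holds exactly the details that would be lost without the second controller: when the fault started and how long the main CPU was down.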