I’ve worked in two industries where the safety and reliability of soft designs are serious concerns: manned spaceflight at NASA and FDA regulated medical devices (and there were a few years in the Navy Nuclear Propulsion program, but I was just a knuckle-dragging MM3 back then). If you are a software engineer working on a safety critical system and you’ve never lost a little sleep worrying about whether you made a mistake, you probably have made one. For safety critical systems, the key questions are: how do I know my design will do what I say it does, and how do I know it does not have some unforeseen latent error? FPGA based state machines (also referred to as Finite State Machines, or FSMs) offer a unique advantage over a traditional CPU design in that safety critical tasks can be truly isolated and run on hardware tailored to the tasks.
A CPU is a state machine. Every CPU architecture is a general purpose state machine that accepts instructions (aka assembly, machine code) and performs IO or manipulates data based on those instructions. The advances in the speed, power, and price of CPUs are breathtaking. We have a $25 Raspberry Pi at the office that can render real time texture shading onto a 1080p image. But all this power carries several drawbacks for high reliability systems if you use the typical recipe of “high level language + compiler + CPU”:
- If you have any kind of hardware IO, you share CPU cycles with an interrupt handler that cannot be deterministically modeled over time. Interrupts happen when they happen and your safety critical code shares the same processor and memory.
- The abstraction from the machine code provided by a high level language is just that: an abstraction from what is really happening at the low level. Yes, you can examine the machine code produced by the compiler, but that somewhat negates the value of a high level language. In other words, you can’t have both abstraction from the underlying complexity and clear insight into the inner workings. Decreasing design complexity increases confidence in correctness.
- The CPU and memory are a shared resource among all software routines executing on a processor. If one swimmer in the pool exhibits poor judgement, all swimmers are exposed to the same water. Yes, you can have various memory protection schemes that are well tested and robust, but even those have their own cost in time, complexity, and money.
FPGAs offer an alternative platform for implementing the same digital logic and algorithms as a CPU plus software, but in a dedicated purpose state machine. Sure, you get less processing power per dollar or per watt (usually), but you gain several key advantages:
- The algorithm can be truly isolated from any non-essential IO. Your algorithm gets its own swimming pool. You don’t have to prove that tweaking the GUI update task did not alter the behavior of your reactor cooling monitor because you can point to physical and logical isolation of the tasks.
- You can tailor your algorithm to your problem. FPGAs have less abstraction between language and implementation, which leads to greater clarity of design. It is much easier to prove a positive in testing and documentation than to prove a negative. With an isolated VHDL algorithm, I don’t have to show why the parameters of my execution environment (other tasks, stack size, interrupt handlers, OS time-slicing, etc.) don’t hurt me, because I define my execution environment as part of my algorithm.
- Algorithm re-use gets easier when your algorithms don’t intersect each other. Once timing is closed on a particular block, it can be moved unchanged to other FPGA fabrics.
Although funding concerns have prevented this project from reaching market, we had a recent medical device project that served as the inspiration and proving ground for this approach. The customer was under time pressure to build a prototype of a complex medical device that integrated several OEM modules from different vendors into a single device. They wanted to base the prototype on an existing, mature medical device that had already been thoroughly tested. The challenge was that the existing product had only one serial port available for this expansion, but it needed to communicate with a half dozen additional sensors that all wanted to communicate via serial.
We chose to route all the new sensors into an FPGA and build a serial data router out of state machines in the FPGA fabric. The biggest advantage of this approach is fault isolation. Each sensor had its own dedicated data path that shared no resources with the other sensors. The serial data stream from each sensor was first de-serialized, then passed through a “parser” that used pattern matching to extract the data of interest. The parsed data of interest was then re-packetized into a single serial data stream of a common format. In this way we interleaved serial data in many formats into a single stream with a common format.
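To make the per-sensor data path concrete, here is a minimal behavioral sketch in Python. It is only a software model of the idea, not the FPGA design itself, which lived in the fabric. Everything specific in it is an assumption made for illustration: the sensor framing (one start byte followed by a fixed-length payload), the common packet layout (sync, sensor ID, length, payload, checksum), and names such as SensorParser, repacketize, and feed.

```python
from enum import Enum, auto

COMMON_SYNC = 0x7E  # hypothetical sync byte for the common output format


class State(Enum):
    IDLE = auto()     # waiting for the sensor's start byte
    PAYLOAD = auto()  # collecting a fixed number of payload bytes


def repacketize(sensor_id, payload):
    """Wrap a payload in the (hypothetical) common format:
    sync, sensor id, length, payload bytes, 8-bit additive checksum."""
    checksum = (sensor_id + len(payload) + sum(payload)) & 0xFF
    return bytes([COMMON_SYNC, sensor_id, len(payload)]) + payload + bytes([checksum])


class SensorParser:
    """One parser instance per sensor. Each instance owns its own state
    and buffer, so nothing is shared between sensor data paths."""

    def __init__(self, sensor_id, start_byte, payload_len):
        self.sensor_id = sensor_id
        self.start_byte = start_byte    # assumed framing: a single start byte
        self.payload_len = payload_len  # assumed framing: fixed-length payload
        self.state = State.IDLE
        self.buf = bytearray()

    def step(self, byte):
        """Consume one de-serialized byte. Return a common-format packet
        when a frame completes, otherwise None."""
        if self.state is State.IDLE:
            if byte == self.start_byte:
                self.buf.clear()
                self.state = State.PAYLOAD
            return None
        # State.PAYLOAD: accumulate until the payload is complete
        self.buf.append(byte)
        if len(self.buf) < self.payload_len:
            return None
        self.state = State.IDLE
        return repacketize(self.sensor_id, bytes(self.buf))


def feed(parser, data):
    """Push a run of bytes through one parser and collect finished packets."""
    packets = []
    for byte in data:
        packet = parser.step(byte)
        if packet is not None:
            packets.append(packet)
    return packets
```

In this model, a garbage byte arriving on one sensor's line only ever touches that sensor's parser object, which is the software analogue of giving each sensor its own block of fabric. A real design would also need the UART de-serializer and an arbiter for the shared output stream, both of which are left out here.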
The reverse path to command all the different sensors from a single serial stream was similar but in the opposite direction. The command stream was de-serialized and the data of interest was then routed to unique command handlers based upon a pattern matching algorithm. Each sensor had its own command handler that took the data of interest and re-packetized it into the native sensor format.
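The command direction can be modeled the same way. The sketch below is again a Python behavioral model under stated assumptions, not the actual FPGA logic: the common command frame (sync, sensor ID, length, payload, checksum) and both "native" sensor formats are invented for illustration, as are the names route_command, to_sensor_a, and to_sensor_b.

```python
COMMON_SYNC = 0x7E  # hypothetical sync byte of the common command format


def to_sensor_a(payload):
    """Hypothetical native format A: ASCII '$', hex digits, carriage return."""
    return b"$" + payload.hex().upper().encode("ascii") + b"\r"


def to_sensor_b(payload):
    """Hypothetical native format B: binary header 0xAA, length, payload."""
    return bytes([0xAA, len(payload)]) + payload


# One dedicated handler per sensor, selected by the sensor id in the frame
HANDLERS = {1: to_sensor_a, 2: to_sensor_b}


def route_command(frame):
    """Match one common-format command frame and hand its payload to the
    matching sensor handler.

    Returns (sensor_id, native_bytes), or None if the frame is rejected."""
    if len(frame) < 4 or frame[0] != COMMON_SYNC:
        return None  # too short, or does not start with the sync pattern
    sensor_id, length = frame[1], frame[2]
    if len(frame) != length + 4:
        return None  # declared length disagrees with the frame size
    payload = frame[3:3 + length]
    if (sensor_id + length + sum(payload)) & 0xFF != frame[-1]:
        return None  # checksum mismatch
    handler = HANDLERS.get(sensor_id)
    if handler is None:
        return None  # no handler for this sensor id
    return sensor_id, handler(payload)
```

For example, under these assumed formats, the frame 7E 01 02 0A FF 0C is routed to sensor 1 and re-packetized as the ASCII string "$0AFF" followed by a carriage return. A malformed frame is dropped before any handler sees it, so a bad command aimed at one sensor never reaches another sensor's handler.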
With this design, we could isolate all our algorithms from each other. Plus we had the ability to rapidly prototype our soft design for each sensor on a totally different FPGA platform. But that is another blog post.