You may be used to rebooting a server every so often to ensure that it doesn't crash because of some resource problem, but what about a modern jet airliner like the Boeing 787?

The inevitable creep of software into engineering brings with it the problem of bugs. Embedded systems engineers have a long history of trying to find ways of making software provably correct. Languages used for process control tend to be single-tasking, as do their operating systems, and there are usually lots of hardware checks to make sure that nothing serious can go wrong.

This makes a recent directive from the US Federal Aviation Administration all the more shocking.

Basically, it says that all Boeing 787 Dreamliners have to be powered down every 248 days. If they are not reset, the generator control units (GCUs) will go into failsafe mode and the plane will lose all AC electrical power.

Why exactly?

To quote the FAA directive:

This condition is caused by a software counter internal to the GCUs that will overflow after 248 days of continuous power. We are issuing this AD to prevent loss of all AC electrical power, which could result in loss of control of the airplane.

A simple guess suggests that the problem is a signed 32-bit overflow, as 2^31 is roughly the number of seconds in 248 days multiplied by 100, i.e. a counter in hundredths of a second (2^31 / 100 seconds is about 248.55 days).
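To check the arithmetic behind this guess, here is a minimal Python sketch. It assumes, as the guess does (the FAA directive does not say this), that the counter is a signed 32-bit integer incremented every hundredth of a second. It computes how long such a counter lasts and shows what a two's-complement wrap does one tick past the maximum; the function names are hypothetical.

```python
# Assumption (not from the FAA directive): a signed 32-bit counter
# incremented 100 times a second, i.e. in hundredths of a second.
TICKS_PER_SECOND = 100
SECONDS_PER_DAY = 24 * 60 * 60
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reduce an integer to the range of a two's-complement int32, as hardware would."""
    return (value + 2**31) % 2**32 - 2**31


def days_until_overflow(bits: int = 32, ticks_per_second: int = TICKS_PER_SECOND) -> float:
    """Days of continuous counting before a signed counter of `bits` width overflows."""
    return 2 ** (bits - 1) / ticks_per_second / SECONDS_PER_DAY


print(f"{days_until_overflow():.2f} days")  # 248.55 days
print(wrap_int32(INT32_MAX + 1))            # -2147483648
```

One tick after about 248.55 days the counter jumps from its largest positive value to its most negative one, which is exactly the kind of discontinuity that software comparing times would not expect.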



So, the problem is a simple, classic overflow. You would think that this is something that could have been spotted by formal methods, but think for a moment: how are you going to implement this sort of counter?

Your options are to increase the number of bits used, which puts off the overflow, or to work with infinite-precision arithmetic, which would slowly use up the available memory and finally bring the system down.
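The trade-off between these two options can be put into numbers. The Python sketch below again assumes a counter ticking 100 times a second; it is illustrative arithmetic with hypothetical function names, not anything taken from the GCU software.

```python
# Assumption: a counter incremented 100 times a second, as in the guess above.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year


def years_until_overflow(bits: int, ticks_per_second: int = 100) -> float:
    """Years before a signed counter of the given width overflows."""
    return 2 ** (bits - 1) / ticks_per_second / SECONDS_PER_YEAR


def bits_needed(years: float, ticks_per_second: int = 100) -> int:
    """Bits an unbounded signed counter needs after running for `years`."""
    ticks = int(years * SECONDS_PER_YEAR * ticks_per_second)
    return ticks.bit_length() + 1  # +1 for the sign bit


print(f"32-bit: {years_until_overflow(32):.2f} years")
print(f"64-bit: {years_until_overflow(64):.3g} years")
print(f"bits after 30 years: {bits_needed(30)}")
```

Under these assumptions a 64-bit counter would take roughly 2.9 billion years to overflow, which in practice puts the problem off for good, while an infinite-precision counter never overflows but its storage keeps growing, if only logarithmically with running time.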

Perhaps the new overflow detection system from MIT (see MIT Finds Overflow Bugs) would have pointed it out, and the programmers could then have implemented a test and a safe clock reset routine, which is the best that could be hoped for.
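What such a test and reset routine might look like is sketched below in Python. The class, its `limit` parameter and the `on_reset` hook are all hypothetical; real avionics code would be written very differently, but the idea is the same: check before incrementing, and replace a silent wrap with a deliberate, observable reset. Counting the resets also lets the total elapsed time be reconstructed.

```python
from typing import Callable

INT32_MAX = 2**31 - 1


class CheckedTickCounter:
    """Hypothetical tick counter that never silently overflows."""

    def __init__(self, on_reset: Callable[[], None], limit: int = INT32_MAX) -> None:
        self.count = 0
        self.resets = 0
        self.limit = limit
        self._on_reset = on_reset

    def tick(self) -> None:
        if self.count >= self.limit:
            # The next increment would overflow: reset deliberately instead.
            self._on_reset()
            self.count = 0
            self.resets += 1
        else:
            self.count += 1

    def elapsed_ticks(self) -> int:
        # Each reset cycle covers limit + 1 ticks (values 0..limit).
        return self.resets * (self.limit + 1) + self.count


# Usage with a tiny limit so the reset is easy to see.
events = []
counter = CheckedTickCounter(on_reset=lambda: events.append("reset"), limit=3)
for _ in range(5):
    counter.tick()
print(counter.count, counter.resets, counter.elapsed_ticks())  # 1 1 5
```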

Until there is a patch for the problem, all Dreamliners have to be rebooted before the 248-day period is up. Apparently, if the worst does happen and the GCUs overflow and switch off the power, the plane should have enough backup power from a lithium-ion battery for about 6 seconds while a ram air turbine deploys to generate emergency power. So, with luck, this isn't a bug that could cause planes to fall out of the sky.

One interesting fact is that the FAA says it will take about one hour to reboot the GCUs, so there clearly isn't a reset button.

Not a possible fix for the Dreamliner.

More cartoon fun at xkcd, a webcomic of romance, sarcasm, math, and language.

It is estimated that the Airbus A380, comparable in complexity to the Dreamliner, has more than 100 million lines of code.






