The West Australian reported that two autonomous haulage system (AHS) trucks experienced a collision when one of the trucks backed into the cab of a second truck that was stationary at the time. This is of interest to us because the AHS trucks are software-controlled and they crashed. Clearly a failure mode. The initial report is that the sequence of events leading to the crash was:
- The first truck was in a reverse operation when WIFI coverage was lost.
- The first truck stopped in response to the loss of WIFI coverage.
- When the WIFI signal returned, the first truck activated LiDAR (light detection and ranging) and presumably remained stationary because the LiDAR detected the second truck.
- For reasons not known at this time, the first truck then resumed the reverse operation, at which point it struck the second truck.
Interestingly, the company executive said that the crash [of the software-controlled trucks] was not the result of any failure of the autonomous system. So what did fail?
As software professionals we can learn much from real-world scenarios and actual events. This event draws attention to the concept of “safe states”: what is the proper safe state, given the intended use of the system and the potential for harm whether the system is moving or stationary? Stopping the truck in response to WIFI loss seems an obvious and reasonable way to prevent a crash; we would call this a risk control measure and likely a “safe state.” However, should an autonomous system require operator intervention to restart, or does it have adequate sensor inputs to perform a restart on its own? The activation of the LiDAR appears to be part of a safe restart protocol, but one wonders whether the LiDAR should already have been active during a reverse operation (of course, we are only reading an early news account). Perhaps the software processing of the LiDAR data needs historical information to establish the presence of an obstacle (i.e., to determine that an object is new to the field of view rather than present in the original scan). These are just the kinds of questions one begins to pose in the root cause analysis process.
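To make the safe-state idea concrete, here is a minimal sketch of a restart-gating state machine. It is purely illustrative: the states, class, and method names are our own invention based on the news account, not anything from the actual AHS software.

```python
from enum import Enum, auto


class TruckState(Enum):
    REVERSING = auto()
    SAFE_STOP = auto()          # motion halted; the presumed "safe state"
    AWAITING_OPERATOR = auto()  # alternative policy: require human sign-off


class ReverseController:
    """Toy model of stop-on-comms-loss plus a sensor-gated restart."""

    def __init__(self, require_operator_ack: bool = False):
        self.state = TruckState.REVERSING
        self.require_operator_ack = require_operator_ack

    def on_wifi_lost(self) -> None:
        # Risk control measure: halt motion immediately on loss of comms.
        self.state = TruckState.SAFE_STOP

    def on_wifi_restored(self, lidar_obstacle_detected: bool,
                         operator_ack: bool = False) -> None:
        # Restart is gated on sensing, not merely on comms recovery.
        if lidar_obstacle_detected:
            return  # stay stopped while anything is in the reverse path
        if self.require_operator_ack and not operator_ack:
            self.state = TruckState.AWAITING_OPERATOR
            return
        self.state = TruckState.REVERSING
```

In this toy model the first truck would have remained in SAFE_STOP while the second truck was detected, which is exactly why the resumption of the reverse operation is the puzzling step in the reported sequence.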
What about testing? We should test fault conditions and plausible scenarios, such as loss of WIFI while in a reverse operation. We are challenged to consider failure modes, and we must do this activity as a team, because no single person can think of the many scenarios worth considering.
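As a sketch of what such fault-condition tests might look like, here are a few pytest-style cases against the toy controller above (again, the names and behavior are our assumptions, not the real system’s):

```python
# Assumes the ReverseController sketch above lives in a module
# named truck_controller (a hypothetical name).
from truck_controller import ReverseController, TruckState


def test_wifi_loss_during_reverse_forces_safe_stop():
    truck = ReverseController()
    truck.on_wifi_lost()
    assert truck.state is TruckState.SAFE_STOP


def test_restart_blocked_while_obstacle_detected():
    truck = ReverseController()
    truck.on_wifi_lost()
    # WIFI returns, but LiDAR still sees the second truck:
    truck.on_wifi_restored(lidar_obstacle_detected=True)
    assert truck.state is TruckState.SAFE_STOP  # must not resume into it


def test_operator_ack_required_when_policy_demands_it():
    truck = ReverseController(require_operator_ack=True)
    truck.on_wifi_lost()
    truck.on_wifi_restored(lidar_obstacle_detected=False)
    assert truck.state is TruckState.AWAITING_OPERATOR
```

The second test encodes essentially the scenario from the news account; a team brainstorming failure modes would keep adding cases like these (sensor dropout during restart, stale LiDAR data, simultaneous comms and sensor loss, and so on).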
It will be interesting to follow the investigation, as it should reveal much about the fault analysis and software design process for this autonomous system. Hopefully the details of the root cause discovery will be made public.
Another domain where autonomous operation is critical is outer space operations. Ron Baerg wrote an interesting blog post related to this a while back: read NASA Software Verification Article and Key Points.