CFBMC-2191: AFF A900, FAS9500: Communication failures with multiple sensors result in environmental shutdown
Issue
On AFF A900 and FAS9500 systems, communication failures with multiple sensors can result in an environmental shutdown. The failures occur when the Controller Area Network (CAN) bus manager on the Baseboard Management Controller (BMC) can no longer access the CAN bus because a previous "unlock" operation on the bus failed transiently.
The BMC might have previously rebooted to recover from this condition, as indicated by this example event in the BMC system event log:
Record 538: Tue Aug 02 05:53:45.994305 2022 [BMC.critical]: Rebooting BMC due to CAN skt error
Events similar to the following also appear in the BMC messages log:
Aug 16 08:09:53 (none) canbusmngr2can2can3[1099]: [1099 : 1104 CRITICAL][semaph.c:433]semtimedop error
Aug 16 08:09:53 (none) canbusmngr2can2can3[1099]: [1099 : 1104 CRITICAL][ipc.c:319]Waiting for semaphore failed
Aug 16 08:09:53 (none) canbusmngr2can2can3[1099]: [1099 : 1104 WARNING] mutex.c:335]gm_lock_mutex(key=37, timeout=15000) : mutex_lock_recursive() FAILED!
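To check a saved copy of the BMC logs for these signatures, a small script along the following lines may help. This is a minimal sketch, not a NetApp tool: the function name `scan_lines` and the signature labels are hypothetical, and the patterns are derived only from the example events shown above, so the exact wording might differ on other firmware versions.

```python
import re

# Patterns derived from the example events in this article (an assumption:
# other firmware versions might log slightly different text).
SIGNATURES = {
    "bmc_reboot": re.compile(r"Rebooting BMC due to CAN skt error"),
    "semtimedop": re.compile(r"canbusmngr\S*\[\d+\]:.*semtimedop error"),
    "semaphore_wait": re.compile(r"canbusmngr\S*\[\d+\]:.*Waiting for semaphore failed"),
    "mutex_lock": re.compile(r"canbusmngr\S*\[\d+\]:.*mutex_lock_recursive\(\) FAILED"),
}


def scan_lines(lines):
    """Return a dict mapping each signature label to the log lines that match it."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def summarize(hits):
    """Return one 'label: count' string per signature, in definition order."""
    return [f"{name}: {len(matched)}" for name, matched in hits.items()]
```

For example, with the log text saved to a local file (the file name is up to you), `scan_lines(open(path))` followed by `print("\n".join(summarize(hits)))` gives a per-signature count. A nonzero count only indicates that the log contains text resembling the events above; it does not by itself confirm this issue.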