StorageGRID node stuck in rebooting with multiple HIC ports showing unknown
Applies to
NetApp StorageGRID Appliances
Issue
After a StorageGRID upgrade, a storage node is stuck in a rebooting state.
kern.log reports:
2025-02-20T12:59:21.404932+00:00 StorageGRID-PGE root: [2025-02-20 12:59:21+00:00 SGA] A Mellanox device is missing, issuing a full-chip reset and rebooting
2025-02-20T12:59:21.480570+00:00 StorageGRID-PGE root: [2025-02-20 12:59:21+00:00 SGA] [root@StorageGRID-PGE:/root] >>> mstfwreset --level 3 --type 0 --yes --device 1c:00.0 reset
2025-02-20T12:59:21.779344+00:00 StorageGRID-PGE root: [2025-02-20 12:59:21+00:00 SGA] -E- Failed to send Register MFRL: ME_ICMD_STATUS_ICMD_NOT_READY (523).
2025-02-20T12:59:07.955396+00:00 StorageGRID-PGE kernel: [ 253.452041] mlx5_core: probe of 0000:1c:00.1 failed with error -16
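To check a collected kern.log for the same messages, a small script along the following lines may help. This is a minimal sketch, not a NetApp tool: the function names, the signature labels, and the choice to require all three messages together are assumptions made for illustration, and the patterns are taken only from the excerpt above.

```python
import re
import sys

# Patterns copied from the log excerpt in this article.
# The dictionary keys are illustrative labels, not official error names.
SIGNATURES = {
    "mellanox_missing": re.compile(r"A Mellanox device is missing"),
    "mfrl_not_ready": re.compile(r"ME_ICMD_STATUS_ICMD_NOT_READY"),
    "mlx5_probe_failed": re.compile(
        r"mlx5_core: probe of (\S+) failed with error (-?\d+)"
    ),
}


def scan_kern_log(lines):
    """Return a dict mapping each signature label to the lines that match it."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def matches_issue(hits):
    """True when every signature was seen at least once (an assumed rule)."""
    return all(hits[name] for name in SIGNATURES)


if __name__ == "__main__":
    # The log path is supplied by the caller; no default location is assumed.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        found = scan_kern_log(handle)
    for name, lines in found.items():
        print(f"{name}: {len(lines)} match(es)")
    print("Matches this article:", matches_issue(found))
```

Run it with the path to a copy of the log as the only argument. A positive result only means the log contains the same messages as this article and does not by itself confirm the cause.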
Appliance Installer shows ports in Unknown status:
