CHW-1659: Multiple drives go missing on partner node when motherboard is removed
Issue
- On FAS 27XX and AFF-A2XX series platforms with internal disk shelves, the partner (surviving) node panics with a fatal multi-disk error during a motherboard replacement, or during any other replacement that requires removing the motherboard from the chassis:
Panic String: aggr aggr_01: raid volfsm, fatal multi-disk error. raid type raid_dp Group name plex0/rg0 state NORMAL
3 disks failed in the group.
Disk 0b.00.0P1 Shelf 0 Bay 0 [NETAPP X343_STBTE1T8A10 NA02] S/N [W3Z12ABCDE001]
UID [6000C500:BBBE6E33:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000] error disk does not exist.
Disk 0b.00.1P1 Shelf 0 Bay 1 [NETAPP X343_STBTE1T8A10 NA02] S/N [W3Z12ABCDE002]
UID [6000C500:BBAEAEDB:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000] error disk does not exist.
Disk 0b.00.3P1 Shelf 0 Bay 3 [NETAPP X343_STBTE1T8A10 NA02] S/N [W3Z12ABCDE003]
UID [6000C500:BBAEF7B7:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000] error disk does not exist.
in SK process config_thread on release 9.9.1P9 (C)
- The issue occurs when multiple drives drop out of the internal shelf at the same moment, and the shelf logs show a burst of events such as the following:
Sat Nov 26 02:41:32 2024 ( 636+02:03:20.110); 0303014D; S0; ENC_MGT; drive_manager; 02; Drive removed : 1
Sat Nov 26 02:41:32 2024 ( 636+02:03:20.128); 0303014D; S0; ENC_MGT; drive_manager; 02; Drive removed : 3
Sat Nov 26 02:41:32 2024 ( 636+02:03:20.148); 0303014D; S0; ENC_MGT; drive_manager; 02; Drive removed : 5
Sat Nov 26 02:41:32 2024 ( 636+02:03:20.165); 0303014D; S0; ENC_MGT; drive_manager; 02; Drive removed : 6
Sat Nov 26 02:41:32 2024 ( 636+02:03:20.166); 0303014D; S0; ENC_MGT; drive_manager; 02; Drive removed : 7
Sat Nov 26 02:41:32 2024 ( 636+02:03:20.186); 0303014D; S0; ENC_MGT; drive_manager; 02; Drive removed : 8
Sat Nov 26 02:41:32 2024 ( 636+02:03:20.205); 0303014D; S0; ENC_MGT; drive_manager; 02; Drive removed : 10
Sat Nov 26 02:41:32 2024 ( 636+02:03:20.205); 0303014D; S0; ENC_MGT; drive_manager; 02; Drive removed : 11
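A minimal Python sketch for checking a saved panic string and shelf log against this signature. The helper names (`missing_disks`, `exceeds_raid_dp`, `removal_bursts`), the regular expressions, and the three-drive burst threshold are illustrative assumptions built only from the excerpts above. This is not a NetApp tool, and log layouts may differ between ONTAP and shelf firmware releases. RAID-DP keeps two parity disks per RAID group, so a group survives at most two failed disks. Three missing disks in `plex0/rg0` exceed that limit, which is why the panic is fatal.

```python
import re
from collections import defaultdict

# Hypothetical helpers: the patterns below assume the exact line layouts shown
# in this article and may need adjusting for other releases.

# "Disk 0b.00.0P1 Shelf 0 Bay 0 [...]" -> disk name, shelf, bay
DISK_RE = re.compile(r"^Disk\s+(\S+)\s+Shelf\s+(\d+)\s+Bay\s+(\d+)")
# "Sat Nov 26 02:41:32 2024 ( ... ); ...; Drive removed : 1" -> timestamp, drive number
REMOVED_RE = re.compile(r"^(\w{3} \w{3} +\d+ [\d:]+ \d{4}) .*Drive removed : (\d+)\s*$")

RAID_DP_TOLERATED_FAILURES = 2  # RAID-DP survives at most two failed disks per RAID group


def missing_disks(panic_lines):
    """Return (disk, shelf, bay) for each disk whose UID line says 'disk does not exist'."""
    result = []
    pending = None  # last "Disk ..." line seen, waiting for its UID line
    for line in panic_lines:
        line = line.strip()
        m = DISK_RE.match(line)
        if m:
            pending = (m.group(1), int(m.group(2)), int(m.group(3)))
        elif pending and line.startswith("UID") and line.endswith("error disk does not exist."):
            result.append(pending)
            pending = None
    return result


def exceeds_raid_dp(missing):
    """True if more disks are missing than RAID-DP tolerates.

    Assumes all listed disks belong to the same RAID group, as in the panic
    string above ("3 disks failed in the group").
    """
    return len(missing) > RAID_DP_TOLERATED_FAILURES


def removal_bursts(shelf_log_lines, min_drives=3):
    """Group 'Drive removed' events by wall-clock second; keep groups of >= min_drives."""
    by_second = defaultdict(list)
    for line in shelf_log_lines:
        m = REMOVED_RE.match(line.strip())
        if m:
            by_second[m.group(1)].append(int(m.group(2)))
    return {ts: drives for ts, drives in by_second.items() if len(drives) >= min_drives}
```

Grouping by wall-clock second is a deliberate simplification. Each shelf log line also carries what appears to be a finer-grained elapsed-time field (for example `636+02:03:20.110`), which could be used instead if a burst straddles a second boundary.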