Windows Guest OS reporting Event 129 Reset to device, \Device\RaidPort0
Applies to
- Windows Server 2008 and above
- VMware ESXi 6.5
- E-Series
- FAS/AFF
- NetApp HCI
- iSCSI
Issue
- Multiple Windows Server 2016 VMs become unresponsive when Event ID 129 is reported, then recover automatically.
- The affected VMs are write intensive.
- This environment has a mix of E-Series and FAS storage controllers.
- Event ID 129 is observed on VMs backed by both FAS and E-Series storage.
- A packet capture between the ESXi host and the E-Series controller shows the initiator aborting SCSI read requests within ~7 ms of issuing them.
Example:
Snippet from vmkernel logs:
2020-05-04T14:43:08.215Z cpu54:65940)NMP: nmp_ThrottleLogForDevice:3616: Cmd 0x88 (0x43959767ba40, 9089744) to dev "naa.600a098000fb3005000004355d6f1da3" on path "vmhba64:C2:T1:L5" Failed: H:0x8 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL
2020-05-04T14:43:08.215Z cpu54:65940)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.600a098000fb3005000004355d6f1da3" state in doubt; requested fast path state update...
2020-05-04T14:43:08.215Z cpu54:65940)ScsiDeviceIO: 2965: Cmd(0x43959767ba40) 0x88, CmdSN 0xffffd2880bfd9210 from world 9089744 to dev "naa.600a098000fb3005000004355d6f1da3" failed H:0x8 D:0x0 P:0x0
2020-05-04T14:43:11.216Z cpu11:9089750)WARNING: VSCSI: 3502: handle 170795(vscsi1:2):WaitForCIF: Issuing reset; number of CIF:16
2020-05-04T14:43:11.216Z cpu11:9089750)WARNING: VSCSI: 2650: handle 170795(vscsi1:2):Ignoring double reset
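To pull failed commands out of a full vmkernel.log rather than reading them by eye, a minimal Python sketch like the one below can help. It assumes the NMP "Failed" line format shown in the snippet above; the regular expression, the hypothetical `parse_failures` helper and the small opcode table are illustrations only, not a VMware or NetApp tool. Opcode 0x88 is the SCSI READ(16) command, which matches the Read(16) requests seen in the packet trace.

```python
import re

# Sample lines copied from the vmkernel snippet. Only the NMP "... on path ... Failed:"
# form is matched; the ScsiDeviceIO line is included to show it is skipped.
LOG = """\
2020-05-04T14:43:08.215Z cpu54:65940)NMP: nmp_ThrottleLogForDevice:3616: Cmd 0x88 (0x43959767ba40, 9089744) to dev "naa.600a098000fb3005000004355d6f1da3" on path "vmhba64:C2:T1:L5" Failed: H:0x8 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL
2020-05-04T14:43:08.215Z cpu54:65940)ScsiDeviceIO: 2965: Cmd(0x43959767ba40) 0x88, CmdSN 0xffffd2880bfd9210 from world 9089744 to dev "naa.600a098000fb3005000004355d6f1da3" failed H:0x8 D:0x0 P:0x0
"""

# Hypothetical, deliberately small opcode table; unknown opcodes fall back to hex.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x88: "READ(16)", 0x8A: "WRITE(16)"}

# Captures opcode, device, path and the H/D/P status bytes from an NMP failure line.
NMP_RE = re.compile(
    r'Cmd 0x(?P<op>[0-9a-f]+) .*? to dev "(?P<dev>[^"]+)" on path "(?P<path>[^"]+)" '
    r'Failed: H:0x(?P<h>[0-9a-f]+) D:0x(?P<d>[0-9a-f]+) P:0x(?P<p>[0-9a-f]+)',
    re.IGNORECASE,
)


def parse_failures(text):
    """Return one dict per NMP 'Failed' line found in text (hypothetical helper)."""
    results = []
    for line in text.splitlines():
        m = NMP_RE.search(line)
        if not m:
            continue
        op = int(m["op"], 16)
        results.append({
            "timestamp": line.split(" ", 1)[0],
            "opcode": OPCODES.get(op, hex(op)),
            "device": m["dev"],
            "path": m["path"],
            "host_status": int(m["h"], 16),
            "device_status": int(m["d"], 16),
            "plugin_status": int(m["p"], 16),
        })
    return results


for failure in parse_failures(LOG):
    print(failure)
```

Run against a complete log, grouping the results by `path` shows whether the failures are confined to one iSCSI path or spread across all paths to the device.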
Snippet from packet trace:
211411 May 4, 2020 14:43:07.493525000 UTC 192.168.20.32 192.168.20.21 44468,3260 SCSI: Read(16) LUN: 0x05 (LBA: 40408422656, Len: 128)
211412 May 4, 2020 14:43:07.500645000 UTC 192.168.20.32 192.168.20.21 44468,3260 Task Management Function (Abort Task)
211413 May 4, 2020 14:43:07.501001000 UTC 192.168.20.21 192.168.20.32 3260,44468 Task Management Function Response (Function complete)
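The ~7 ms figure in the Issue section can be checked directly from these frame timestamps. The Python sketch below assumes all three frames come from the same capture clock on the same day. The nanosecond fields are truncated to microseconds because `strptime`'s `%f` accepts at most six digits (the dropped digits are zeros here). `ms_between` is a hypothetical helper.

```python
from datetime import datetime

# Frame number, time of day (truncated to microseconds) and summary from the packet trace.
FRAMES = [
    (211411, "14:43:07.493525", "SCSI: Read(16) LUN: 0x05"),
    (211412, "14:43:07.500645", "Task Management Function (Abort Task)"),
    (211413, "14:43:07.501001", "Task Management Function Response (Function complete)"),
]


def ms_between(t1, t2):
    """Elapsed milliseconds between two HH:MM:SS.ffffff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)
    return delta.total_seconds() * 1000


read_ts, abort_ts, resp_ts = (frame[1] for frame in FRAMES)

read_to_abort = ms_between(read_ts, abort_ts)
abort_to_response = ms_between(abort_ts, resp_ts)
print(f"Read(16) -> Abort Task: {read_to_abort:.3f} ms")        # prints 7.120 ms
print(f"Abort Task -> response: {abort_to_response:.3f} ms")    # prints 0.356 ms
```

The same helper can be applied to every Abort Task in a larger capture to see whether the roughly 7 ms interval between a read and its abort is consistent across requests.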