OracleDB instance down caused by NetApp disk not ready and long CP
Applies to
- ONTAP 9
- VMware ESXi 8.0.3
- Oracle 11.2.0.4.0
- RHEL 6.7
- Fabric OS v9.1.1d
- OracleDB running as VM guest
Issue
- OracleDB instance down
- Linux kernel log shows multiple SCSI device resets:
sd 2:0:2:0: [sdl] SCSI device reset on scsi2:2
sd 1:0:0:0: [sdh] SCSI device reset on scsi1:0
sd 2:0:3:0: [sdm] SCSI device reset on scsi2:3
sd 1:0:1:0: [sdi] SCSI device reset on scsi1:1
sd 2:0:4:0: [sdn] SCSI device reset on scsi2:4
- NetApp EMS log shows a RAID group missing a disk, reconstruction started, and repeated I/O errors:
raid.rg.recons.missing: RAID group /nasontap13_aggr1/plex0/rg1 is missing 1 disk(s)
scsi.cmd.pastTimeToLive:error: Disk device 0a.01.10: request failed after try #1: cdb 0x2a:0d199c00:0200
disk.IO.status: SCSI:not ready - Drive spinning up
wafl.cp.toolong:error: Aggregate nasontap13_aggr1 experienced a long CP
- VMware/OS logs show lost access to volume due to connectivity issues
- OracleDB instance fails due to I/O errors and storage disconnect
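
To confirm the SCSI reset symptom on the guest, the kernel log can be scanned for reset messages and tallied per device. The following is a minimal sketch, not a supported tool: the function name `count_scsi_resets` and the regular expression are illustrative assumptions, written against the message format shown in the kernel log sample above.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the kernel log sample in this article:
#   sd 2:0:2:0: [sdl] SCSI device reset on scsi2:2
RESET_RE = re.compile(
    r"sd (?P<hctl>\d+:\d+:\d+:\d+): \[(?P<dev>sd[a-z]+)\] SCSI device reset"
)


def count_scsi_resets(lines):
    """Return a Counter mapping block device name to number of reset messages."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Sample input copied from the kernel log excerpt in this article.
SAMPLE = [
    "sd 2:0:2:0: [sdl] SCSI device reset on scsi2:2",
    "sd 1:0:0:0: [sdh] SCSI device reset on scsi1:0",
    "sd 2:0:3:0: [sdm] SCSI device reset on scsi2:3",
    "sd 1:0:1:0: [sdi] SCSI device reset on scsi1:1",
    "sd 2:0:4:0: [sdn] SCSI device reset on scsi2:4",
]

if __name__ == "__main__":
    # Print one line per affected device, sorted by device name.
    for dev, n in sorted(count_scsi_resets(SAMPLE).items()):
        print(f"{dev}: {n} reset(s)")
```

In practice the function could be fed the lines of the guest's kernel log file instead of `SAMPLE`; the log location depends on the system's syslog configuration. Resets spread across several devices and both SCSI hosts, as in the sample, point toward a shared storage-side cause rather than a single failed path.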
