I/O timeout on NVMe drive triggers a node panic
- Category:
- solidfire-chassis
- Specialty:
- solidfire
- Last Updated:
- 9/12/2024, 2:08:29 PM
Applies to
- H-Series Storage Nodes
- SF-Series Storage Nodes
- Element Software 11.x and 12.0
Issue
The node panics after encountering an I/O timeout on an NVMe drive.
Example (kern.log):
2020-05-30T15:01:29.908585Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993359] nvme nvme7: I/O 251 QID 5 timeout, aborting
2020-05-30T15:01:29.908599Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993368] nvme nvme7: I/O 808 QID 5 timeout, aborting
2020-05-30T15:01:29.908601Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993374] nvme nvme7: I/O 76 QID 8 timeout, aborting
2020-05-30T15:01:29.908602Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993377] nvme nvme7: I/O 79 QID 8 timeout, aborting
2020-05-30T15:01:29.908604Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993380] nvme nvme7: I/O 180 QID 8 timeout, aborting
2020-05-30T15:01:29.908608Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993384] nvme nvme7: I/O 181 QID 8 timeout, aborting
2020-05-30T15:01:29.908609Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993387] nvme nvme7: I/O 182 QID 8 timeout, aborting
2020-05-30T15:01:29.908610Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993390] nvme nvme7: I/O 183 QID 8 timeout, aborting
2020-05-30T15:02:00.948585Z KUL01-SFCL01-H610S1-06 kernel: [3319634.032001] nvme nvme7: I/O 251 QID 5 timeout, reset controller
2020-05-30T15:02:04.698578Z KUL01-SFCL01-H610S1-06 kernel: [3319637.781819] nvme nvme7: I/O 19 QID 0 timeout, reset controller
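
When checking whether a node matches this signature, it can help to tally the timeout messages per NVMe device and per kernel action ("aborting" versus "reset controller"). The following is a minimal sketch, not a NetApp tool: the function name `summarize_nvme_timeouts` and the regular expression are assumptions modeled only on the log lines quoted above, and other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Pattern modeled on the kern.log excerpt above (an assumption, not an
# official format definition). It captures the device name, command tag,
# queue ID, and the action the kernel took after the timeout.
NVME_TIMEOUT = re.compile(
    r"nvme (?P<dev>nvme\d+): I/O (?P<tag>\d+) QID (?P<qid>\d+) "
    r"timeout, (?P<action>.+)$"
)


def summarize_nvme_timeouts(lines):
    """Count NVMe timeout messages per (device, action) pair."""
    counts = Counter()
    for line in lines:
        match = NVME_TIMEOUT.search(line.strip())
        if match:
            counts[(match.group("dev"), match.group("action"))] += 1
    return counts


# Three lines copied from the example above.
sample = [
    "2020-05-30T15:01:29.908585Z KUL01-SFCL01-H610S1-06 kernel: "
    "[3319602.993359] nvme nvme7: I/O 251 QID 5 timeout, aborting",
    "2020-05-30T15:02:00.948585Z KUL01-SFCL01-H610S1-06 kernel: "
    "[3319634.032001] nvme nvme7: I/O 251 QID 5 timeout, reset controller",
    "2020-05-30T15:02:04.698578Z KUL01-SFCL01-H610S1-06 kernel: "
    "[3319637.781819] nvme nvme7: I/O 19 QID 0 timeout, reset controller",
]

for (dev, action), n in sorted(summarize_nvme_timeouts(sample).items()):
    print(f"{dev}: {n} x {action}")
```

In the example log, the "reset controller" message on QID 0 involves the admin queue rather than an I/O queue, so a per-action tally like this makes it easy to see when timeouts have escalated from command aborts to controller resets.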