NFS client hangs if storage replies with NFS4ERR_DELAY followed by NFS4ERR_SEQ_MISORDERED
Applies to
- ONTAP 9
- NFSv4.1
- pNFS
- Red Hat
Issue
- The client and the storage system are performing pNFS I/O
- The network goes down and comes back up because of a switch issue
- After the TCP connection is re-established, the client starts sending non-pNFS I/O requests
- For these non-pNFS I/O requests, the storage system first returns NFS4ERR_DELAY instead of NFS4ERR_SEQ_MISORDERED
- According to RFC 8881, the client cannot handle this delayed NFS4ERR_SEQ_MISORDERED response from the storage system
- The client and the storage system get stuck in an infinite loop, and the client hangs
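For background, the check that normally produces NFS4ERR_SEQ_MISORDERED can be sketched as below. This is only an illustrative model of the slot sequence ID rule described in RFC 8881 (section 2.10.6.1); it is not ONTAP or Linux client code, and the names `SeqResult` and `check_slot_sequence` are hypothetical.

```python
from enum import Enum


class SeqResult(Enum):
    NEW_REQUEST = "execute as a new request"
    REPLAY = "return the cached reply"
    MISORDERED = "NFS4ERR_SEQ_MISORDERED"


def check_slot_sequence(cached_seqid: int, request_seqid: int) -> SeqResult:
    """Classify a request against the last sequence ID seen on a session slot.

    Hypothetical helper: models only the comparison rule, not a real server.
    """
    # Sequence IDs are unsigned 32-bit values, so the arithmetic is mod 2^32.
    next_seqid = (cached_seqid + 1) & 0xFFFFFFFF
    if request_seqid == next_seqid:
        return SeqResult.NEW_REQUEST
    if request_seqid == cached_seqid:
        return SeqResult.REPLAY
    # Any other value is out of order for this slot.
    return SeqResult.MISORDERED


print(check_slot_sequence(5, 6).value)
print(check_slot_sequence(5, 5).value)
print(check_slot_sequence(5, 9).value)
```

In this model a misordered request is rejected immediately. In the issue described above, the storage system answers with NFS4ERR_DELAY several times before it returns NFS4ERR_SEQ_MISORDERED.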
Example:
Before the network goes down:
608 2022-03-02 20:32:07.696321 1.745212 client_ip storage_ip 224 NFS 0x0002598a 0 12272513024 V4 Call READ StateID: 0xc7c1 Offset: 12272513024 Len: 4096
(pNFS V4 call)
After the network goes down and comes back up due to the switch issue:
628 2022-03-02 20:32:44.479467 0.000027 client_ip storage_ip 208 NFS 0x0002598b 0 12272513024 V4 Call (Reply In 636) READ StateID: 0xc7c1 Offset: 12272513024 Len: 4096
(non-pNFS V4 call)
629 2022-03-02 20:32:44.479473 0.000006 client_ip storage_ip 228 NFS 0x00002f7c 3 0 V4 Call (Reply In 638) LAYOUTRETURN
(non-pNFS V4 call)
636 2022-03-02 20:32:45.479110 0.000015 storage_ip client_ip 48 NFS V4 Reply (Call In 628) SEQUENCE Status: NFS4ERR_DELAY
638 2022-03-02 20:32:45.479116 0.000001 storage_ip client_ip 48 NFS V4 Reply (Call In 629) SEQUENCE Status: NFS4ERR_DELAY
645 2022-03-02 20:33:00.487253 0.000081 client_ip storage_ip 568 NFS 0x00002f7c,0x0002598b,0x0000afae 3,0,1 0,12272513024 V4 Call (Reply In 649) LAYOUTRETURN ; V4 Call (Reply In 651) READ StateID: 0xc7c1 Offset: 12272513024 Len: 4096 ; V4 Call (Reply In 652) SEQUENCE
(non-pNFS V4 call)
649 2022-03-02 20:33:01.487050 0.000010 storage_ip client_ip 48 NFS V4 Reply (Call In 645) SEQUENCE Status: NFS4ERR_DELAY
662 2022-03-02 20:33:16.607231 0.000046 client_ip storage_ip 628 NFS 0x00004985,0x0002598b,0x00002f7c 2,0,3 12272513024,0 V4 Call (Reply In 666) GETATTR FH: 0x6769b2c2 ; V4 Call (Reply In 668) READ StateID: 0xc7c1 Offset: 12272513024 Len: 4096 ; V4 Call (Reply In 670)
(non-pNFS V4 call)
670 2022-03-02 20:33:17.607069 0.000004 storage_ip client_ip 48 NFS V4 Reply (Call In 662) SEQUENCE Status: NFS4ERR_DELAY
683 2022-03-02 20:33:32.671209 0.000037 client_ip storage_ip 568 NFS 0x00002f7c,0x0002598b,0x0000afae 3,0,1 0,12272513024 V4 Call (Reply In 684) LAYOUTRETURN ; V4 Call (Reply In 686) READ StateID: 0xc7c1 Offset: 12272513024 Len: 4096 ; V4 Call (Reply In 688) SEQUENCE
(non-pNFS V4 call)
684 2022-03-02 20:33:32.671406 0.000197 storage_ip client_ip 104 NFS 0x00002f7c 3 V4 Reply (Call In 683) LAYOUTRETURN
697 2022-03-02 20:33:47.679159 3.935932 client_ip storage_ip 208 NFS 0x0002598b 0 12272513024 V4 Call (Reply In 698) READ StateID: 0xc7c1 Offset: 12272513024 Len: 4096
(non-pNFS V4 call)
698 2022-03-02 20:33:47.679338 0.000179 storage_ip client_ip 48 NFS V4 Reply (Call In 697) SEQUENCE Status: NFS4ERR_SEQ_MISORDERED
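To check another capture for the same pattern, a text export of the trace can be scanned for SEQUENCE error replies. The sketch below assumes one packet per line in the same column layout as the example above; the function names and the regular expression are illustrative and not part of any NetApp or Wireshark tool.

```python
import re

# Matches lines such as:
# "636 <date> <time> ... V4 Reply (Call In 628) SEQUENCE Status: NFS4ERR_DELAY"
STATUS_RE = re.compile(
    r"^(\d+)\s.*Reply \(Call In (\d+)\) SEQUENCE Status: (NFS4ERR_\w+)"
)


def sequence_errors(lines):
    """Yield (reply_frame, call_frame, status) for each SEQUENCE error reply."""
    for line in lines:
        match = STATUS_RE.match(line.strip())
        if match:
            yield int(match.group(1)), int(match.group(2)), match.group(3)


def delay_then_misordered(lines):
    """Return True if an NFS4ERR_DELAY reply is later followed by NFS4ERR_SEQ_MISORDERED."""
    seen_delay = False
    for _, _, status in sequence_errors(lines):
        if status == "NFS4ERR_DELAY":
            seen_delay = True
        elif status == "NFS4ERR_SEQ_MISORDERED" and seen_delay:
            return True
    return False


# Two reply lines taken from the example trace above.
sample = [
    "636 2022-03-02 20:32:45.479110 0.000015 storage_ip client_ip 48 NFS "
    "V4 Reply (Call In 628) SEQUENCE Status: NFS4ERR_DELAY",
    "698 2022-03-02 20:33:47.679338 0.000179 storage_ip client_ip 48 NFS "
    "V4 Reply (Call In 697) SEQUENCE Status: NFS4ERR_SEQ_MISORDERED",
]
print(delay_then_misordered(sample))  # prints True
```

A True result only indicates that the reply sequence matches the example; confirming this issue still requires checking that the replies follow a network interruption and a switch from pNFS to non-pNFS I/O.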