Top RHEL Issues/Workarounds/Best Practices for NFSv3/v4.0

Category: data-ontap-8

Specialty: nas

Applies to

ONTAP 9
Clustered Data ONTAP 8
Data ONTAP 7-Mode
Red Hat Enterprise Linux

Answer

This article lists the top known issues, workarounds, and best practices for RHEL clients used with NetApp FAS controllers running Data ONTAP 7-Mode or clustered Data ONTAP.

Known Issues and Workarounds:

RHEL 5.x

Issue 1:
RHEL5.8: Mount with sec=krb5 displays the warning 'rpc.idmapd/rpc.gssd appears not to be running'.
Description: The mount command reports 'rpc.idmapd/rpc.gssd appears not to be running' even though both daemons are running, and the NFS Kerberos mount with sec=krb5 fails.
Workaround: Update the nfs-utils package to version 1.0.9-66.el5 or later. For more information, see [RHEL5.8] NFSv4 'mount' command produces warnings about RPC services not running.
Issue 2:
RHEL5.9: chmod, chgrp, and chown commands fail on NFSv3 mounts.
Workaround: Traverse to the mounted path, and then run the chmod, chgrp, and chown commands.
Issue 3:
RHEL 5.10 and RHEL 5.11: NLM clients can fail to recover locks with pending UNLOCK operations.
Description: During NLM lock recovery, RHEL 5.10 and RHEL 5.11 NLM clients can get into a state where they stop issuing pending UNLOCK operations and only reclaim existing locks. Locks with pending UNLOCK operations are then neither unlocked nor reclaimed.
Workaround: No workaround is available in RHEL 5. Upgrade to RHEL 6.6 GA clients, which properly handle pending UNLOCK operations during lock recovery so that all locks are cleaned up after recovery.
Issue 4:
RHEL 5.11: "EIO (errno=5)" is hit on RHEL 5.11 NFSv3 mounts during storage failover operations.
Description: RHEL 5.11 does not have the latest SUNRPC layer fixes, which results in client EIO errors.
Workaround: No workaround is available in RHEL 5. Upgrade the client to a later RHEL kernel such as RHEL 6.6 GA, which addresses this problem and has the latest fixes.
Issue 5:
RHEL 5.11: "ERROR: No locks available (errno=37)" is hit on NFSv3 mounts during storage failover operations.
Description: RHEL 5.11 does not have the latest SUNRPC layer fixes, which results in the "error # 37" failure.
Workaround: No workaround is available in RHEL 5. Upgrade the client to a later RHEL kernel such as RHEL 6.6 GA, which addresses this problem and has the latest fixes.
Issue 6:
The following error messages are reported:
- In /var/log/messages:
  Aug 16 19:08:17 uscf1plat0 kernel: NFS: v4 server 10.113.49.8 returned a bad sequence-id error!
- Output of grep sequence-id messages.1:
  Aug 16 19:08:17 uscf1plat0 kernel: NFS: v4 server 10.113.49.8 returned a bad sequence-id error!
  ...
  Aug 22 01:09:27 uscf1plat0 kernel: NFS: v4 server 10.113.49.8 returned a bad sequence-id error!
- No matching logs on the storage system.
- Packet trace (pktt): after the client goes into a loop on a write call, the storage system returns NFS4ERR_BAD_STATEID.
Workaround: For more information, see Red Hat BUG 620502 - [NetApp 5.6 bug] RHEL NFS clients disconnect from NetApp NFSv4 shares with: v4 server returned a bad sequence-id error!
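To gauge how often a client hits this condition, the kernel messages quoted above can be counted per server. The following is a minimal sketch, not a supported tool: the function name count_bad_seqid is hypothetical, and the pattern is taken only from the sample log lines in this article.

```python
import re
from collections import Counter

# Pattern based on the kernel message quoted in this article.
BAD_SEQID = re.compile(r"NFS: v4 server (\S+) returned a bad sequence-id error!")


def count_bad_seqid(lines):
    """Count 'bad sequence-id' kernel messages per NFSv4 server address."""
    counts = Counter()
    for line in lines:
        match = BAD_SEQID.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Aug 16 19:08:17 uscf1plat0 kernel: NFS: v4 server 10.113.49.8 returned a bad sequence-id error!",
        "Aug 22 01:09:27 uscf1plat0 kernel: NFS: v4 server 10.113.49.8 returned a bad sequence-id error!",
    ]
    for server, hits in count_bad_seqid(sample).items():
        print(server, hits)
```

In practice the lines would come from an open log file such as /var/log/messages, passed to the function as an iterable.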

RHEL 6.x

Issue 1:
RHEL6.x: nfs-kerberos fails with a permission denied error if allow_weak_crypto is disabled.
Description: When kinit is run on these clients, NFS Kerberos authentication fails and kinit displays: kinit: No supported encryption types (config file error?).
Workaround: Add allow_weak_crypto = yes to the [libdefaults] section of /etc/krb5.conf to get NFS Kerberos working.
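Before changing the client configuration, it can help to confirm whether the setting is already present. The sketch below scans krb5.conf text for allow_weak_crypto in the [libdefaults] section. The function name weak_crypto_allowed is hypothetical, the realm EXAMPLE.COM is a placeholder, and the set of accepted true values is an assumption about how MIT krb5 parses booleans; this is not a full krb5.conf parser.

```python
# Assumed spellings that MIT krb5 treats as "true" for boolean settings.
TRUE_VALUES = {"y", "yes", "true", "t", "1", "on"}


def weak_crypto_allowed(krb5_conf_text: str) -> bool:
    """Return True if [libdefaults] sets allow_weak_crypto to a true value."""
    section = None
    allowed = False
    for raw in krb5_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "libdefaults" and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "allow_weak_crypto":
                allowed = value.strip().lower() in TRUE_VALUES
    return allowed


if __name__ == "__main__":
    example = """\
[libdefaults]
 default_realm = EXAMPLE.COM
 allow_weak_crypto = yes
"""
    print(weak_crypto_allowed(example))
```

The function takes the file contents as a string, so the caller decides which krb5.conf path to read on a given client.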
Issue 2:
RHEL6.3/6.4: chmod, chgrp, and chown commands fail on NFSv3 mounts.
Workaround: Traverse to the mounted path, and then run the chmod, chgrp, and chown commands.
Issue 3:
RHEL 6.2/6.3: Applications running on Red Hat 6.2/6.3 clients might not be able to stop writing to files on an NFS mount point, and pressing Ctrl+C does not help. The intr/nointr mount options are deprecated in kernels later than 2.6.25, which includes RHEL 6. Hung applications must be killed with SIGKILL.
Workaround: Use the command kill -9 pid to kill the process.
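Where the hung application is supervised by a script rather than an interactive shell, the same workaround can be applied programmatically. This is a minimal sketch: the helper name force_kill is hypothetical, and the sleeping child process only stands in for a hung writer.

```python
import os
import signal
import subprocess
import sys


def force_kill(pid: int) -> None:
    """Send SIGKILL to `pid`, the programmatic equivalent of `kill -9 pid`."""
    os.kill(pid, signal.SIGKILL)


if __name__ == "__main__":
    # Stand-in for a hung writer: a child process that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    force_kill(child.pid)
    child.wait()
    # A negative return code means the child was terminated by that signal.
    print(child.returncode == -signal.SIGKILL)
```

SIGKILL cannot be caught or ignored by the target process, which is why it is used here instead of the SIGINT that Ctrl+C sends.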
Issue 4:
RHEL6.3/6.4: The NFSv3 process occasionally exits with 'No locks available' when the NFS server crashes.
Description: NLM does not retry a waiting byte-range lock request after the server crashes; it fails the application with the -ENOLCK error.
Workaround: Upgrade to the RHEL 6.3 errata kernel (2.6.32-279.46.1.el6) or the RHEL 6.4 errata kernel (2.6.32-358.49.1.el6).
Issue 5:
RHEL 6.1/6.2/6.3:
Description: Applications running on RHEL 6.1/6.2/6.3 clients might hang with the BAD_STATEID error while NFSv4.0 delegations are enabled.
Workaround: Upgrade to the RHEL 6.3 errata kernel (2.6.32-279.46.1.el6).
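Several workarounds in this section name a specific errata kernel. The sketch below compares a running kernel release string (as printed by uname -r) against those builds. The function names are hypothetical, and the comparison is deliberately limited to kernels from the same build stream (for example 2.6.32-279.x), because a higher build number from a different minor release does not by itself show that a fix is included.

```python
def kernel_tuple(release: str) -> tuple:
    """Turn '2.6.32-279.46.1.el6.x86_64' into (2, 6, 32, 279, 46, 1)."""
    version, _, build = release.partition("-")
    nums = [int(n) for n in version.split(".")]
    for part in build.split("."):
        if not part.isdigit():
            break  # stop at the dist tag, e.g. 'el6'
        nums.append(int(part))
    return tuple(nums)


def has_errata_fix(running: str, errata_kernels):
    """Return True/False when `running` shares a build stream with a listed
    errata kernel, or None when no listed stream matches."""
    run = kernel_tuple(running)
    for fixed in map(kernel_tuple, errata_kernels):
        if run[:4] == fixed[:4]:  # same base build, e.g. 2.6.32-279
            return run >= fixed
    return None


# Errata kernels named in the workarounds above.
ERRATA_KERNELS = ["2.6.32-279.46.1.el6", "2.6.32-358.49.1.el6"]

if __name__ == "__main__":
    import platform
    print(has_errata_fix(platform.release(), ERRATA_KERNELS))
```

A result of None means the running kernel belongs to a stream this list does not cover, so the release notes for that kernel need to be checked separately.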
Issue 6:
RHEL 6.4: NFS v4.0 referral directories are missing from the directory listing on RHEL 6.4 clients.
Description: The RHEL 6.4 client gets lookups for all the volumes but lists only the volumes of one of the HA storage nodes. This is being tracked with Red Hat at https://bugzilla.redhat.com/show_bug.cgi?id=963337
Issue 7:
RHEL 6.4: Mount on a RHEL 6.4 client succeeds with sec=krb5 even though the volume's export rule has ro set to sys.
Description: The RHEL 6.4 client switches its security flavor from 'RPCSEC_GSS' to 'AUTH_UNIX' during mount, so a mount with sec=krb5 succeeds even though the volume is exported with the ro rule set to sys. The client switches the security flavor based on the SECINFO reply, but this appears to be a security gap. This is being tracked with Red Hat at https://bugzilla.redhat.com/show_bug.cgi?id=948145
Issue 8:
RHEL 6.4: NFS v4.0 state recovery can deadlock with permission checking on delegated OPEN.
Description: The state manager can deadlock on an open owner's NFSv4 sequence ID if it is recovering OPEN state while the client is also checking open permission for a delegated OPEN.
Workaround: Upgrade to RHEL 6.4 Errata Kernel (2.6.32-358.49.1.el6.x86_64).
Issue 9:
RHEL 6.4: NFSv4.0: Fix handling of revoked delegations by setattr.
Description: _nfs4_do_setattr() will use the delegation stateid if no writeable open file stateid is available. If the server revokes that delegation stateid, then the call to nfs4_handle_exception() will fail to handle the error due to the lack of a struct nfs4_state, and will just convert the error into an EIO.
Workaround: Upgrade the client to the RHEL 6.4 errata kernel (2.6.32-358.49.1.el6.x86_64) or later.

Issue 10:
RHEL6.3/6.4: NFS v4.0 stateid recovery can result in a zero_stateid NFS4ERR_BAD_STATEID loop. This is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=923939.
Workaround: Upgrade the client to RHEL 6.5 or later.

Issue 11:
RHEL6.x: CDOT: Circular directory structure warning when running chown -R on junctioned volumes via NFSv4. This is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1223978.
Description: When a multilevel junction path (/vol1/vol2/vol3) is mounted, you might see a recursive directory structure warning while doing setattr or chmod.
Workaround: Upgrade to the latest RHEL 6.7 kernel.

Issue 12:
RHEL6.x: The MIT krb5 gss_accept_sec_context() implementation used by SECD rejects an incoming token if the ticket end time is before the server's current time, with no Kerberos clock skew consideration. This is being tracked at #8268: krb5 gss_accept_sec_context() does not allow clock skew
Description: If the check triggers, gss_accept_sec_context() returns GSS_S_CREDENTIALS_EXPIRED on GSS context refresh, which is incorrect because it suggests that the NFS client's verifier_cred_handle has expired.
The check fails and "Permission Denied" is returned to the application when:
1) The server clock is ahead of the client clock, but within the server's configured Kerberos clock skew.
2) The maximum service ticket lifetime is less than the maximum TGT lifetime.
3) The NFS client is sending requests on a Kerberos share during the "clock skew window", when the client service ticket is good according to the client clock and expired according to the server clock.
This results in a "Permission Denied" error being returned to the application during the clock skew window. In this case, the server rejects the NFS request with an AUTH_ERROR, which triggers a GSS context refresh attempt on the client. The client uses the existing server service ticket for the refresh, which the server rejects with the GSS_S_CREDENTIALS_EXPIRED error. Note that if the service ticket is refreshable, this situation heals itself once the service ticket expires on the client, forcing the client to refresh the service ticket.
Workaround: Configure the KDC maximum service ticket lifetime to be equal to the TGT maximum lifetime. In this case, when the client is asked to refresh the service ticket, it must first refresh the TGT, which on the Linux NFS client takes the Kerberos clock skew into consideration.
Issue 13:
RHEL6.3/6.4/6.5: RHEL NFS client hangs and I/O outages with older RHEL 6 kernels and NFSv4.1 (BURT:814789).
Description: When using NFSv4.1 with Red Hat Enterprise Linux (RHEL) NFS clients running older RHEL 6 kernels such as RHEL 6.4 and RHEL 6.3, you might encounter client issues such as hangs and long I/O outages when performing NetApp-specific controller tasks, such as LIF migrations, that require clients to recover state under heavy workloads.
Workaround: These issues have been addressed in the latest RHEL 6.5.z errata packages listed below:
1) kernel-2.6.32-431.29.2.el6
2) nfs-utils-1.2.3-39.el6_5.3
3) libtirpc-0.2.1-6.el6_5.2
To avoid these issues, it is recommended to upgrade clients to these RHEL 6.5.z errata packages.
Issue 14:
RHEL 6.4/6.5/6.5z: RHEL NFS client I/O outages and errors (BURT:866544).
Description: I/O outages and errors are observed on Red Hat Enterprise Linux (RHEL) NFS clients running older kernels such as RHEL 6.5. This is seen for all NFS versions, including NFSv3, v4.0, and v4.1.
Workaround: These issues have been addressed in the RHEL 6.6 GA kernel-2.6.32-504.el6. It is recommended to upgrade clients to RHEL 6.6 GA to obtain these fixes.

RHEL 7.x

Issue 1:
RHEL7.0x: RHEL 7.0 NFS client issues and fixes (BURT:915798)
Description: NFS client I/O errors, kernel crashes, permission-denied errors on Kerberized mount points, and an intermittent data corruption issue have been observed on older Red Hat Enterprise Linux (RHEL) kernels such as RHEL 7.0. This is seen across all NFS versions, including NFS v3, v4.0, and v4.1.
Workaround: These issues are addressed in the RHEL 7.1 GA kernel-3.10.0-229.el7. Upgrade clients to the RHEL 7.1 GA release to obtain these fixes.
Issue 2:
RHEL7.x: On pNFS mounts, umount does not destroy the session to the DS after getting an EIO error. This is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1234986.
Description: On RHEL 7.x clients mounted using pNFS, the client might fail to delete the session with the server when you run the umount command.
Workaround: Upgrade to the latest RHEL 7.1z or 7.2 kernel (later than kernel 3.10.0-320).
Issue 3:
RHEL7.x: Received EIO because of ADMIN_REVOKED on the SETATTR on RHEL 7.1 with pNFS. This is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1214410
Description: On RHEL 7.x clients mounted using pNFS, you might see an EIO error (error=5) when the server goes through a failover (takeover/giveback).
Workaround: Upgrade to the latest RHEL 7.1z or 7.2 kernel (later than kernel 3.10.0-289).
Issue 4:
RHEL7.1: NFS I/O for volumes on the disaster-side cluster intermittently fails after a MetroCluster switchover. This is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1240790
Description: On RHEL 7.x clients mounted using NFSv4, you might see an EIO error (error=5) when the storage system goes through a MetroCluster switchover.
Workaround: Upgrade to the latest RHEL 7.1z or 7.2 kernel (later than kernel 3.10.0-295).
Issue 5:
RHEL7.x: CDOT: Circular directory structure warning when running chown -R on junctioned volumes via NFSv4. This is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1225090.
Description: When a multilevel junction path (/vol1/vol2/vol3) is mounted, you might see a recursive directory structure warning while doing setattr or chmod.
Workaround: Upgrade to the latest RHEL 7.1 kernel.
Issue 6:
RHEL7.1: NFSv4 share locks are not cleared after closing files following a LIF migrate with new RHEL 7.1 clients. This is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1263376
Description: If there are open files and locks on an NFSv4.x mount and the server goes through a LIF migrate or an SFO, the locks might not be cleared. This might lead to data corruption.
Workaround: Use kernel version 3.10.0-320 or later.
Issue 7:
RHEL6.x/7.x: The MIT krb5 gss_accept_sec_context() implementation used by SECD rejects an incoming token if the ticket end time is before the server's current time, with no Kerberos clock skew consideration. This is being tracked at http://krbdev.mit.edu/rt/Ticket/Display.html?id=8268
Description: If the check triggers, gss_accept_sec_context() returns GSS_S_CREDENTIALS_EXPIRED on GSS context refresh, which is incorrect because it suggests that the NFS client's verifier_cred_handle has expired.
The check fails and "Permission Denied" is returned to the application when:
1) The server clock is ahead of the client clock, but within the server's configured Kerberos clock skew.
2) The maximum service ticket lifetime is less than the maximum TGT lifetime.
3) The NFS client is sending requests on a Kerberos share during the "clock skew window", when the client service ticket is good according to the client clock and expired according to the server clock.
This results in a "Permission Denied" error being returned to the application during the clock skew window. In this case, the server rejects the NFS request with an AUTH_ERROR, which triggers a GSS context refresh attempt on the client. The client uses the existing server service ticket for the refresh, which the server rejects with the GSS_S_CREDENTIALS_EXPIRED error. Note that if the service ticket is refreshable, this situation heals itself once the service ticket expires on the client, forcing the client to refresh the service ticket.
Workaround: Configure the KDC maximum service ticket lifetime to be equal to the TGT maximum lifetime. In this case, when the client is asked to refresh the service ticket, it must first refresh the TGT, which on the Linux NFS client takes the Kerberos clock skew into consideration.

Best Practices:

See TR-4067, 'Clustered Data ONTAP NFS Best Practice and Implementation Guide'.
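The three conditions of the Kerberos clock skew issue above can be expressed as a single predicate, which makes it easier to see why aligning the service ticket and TGT lifetimes closes the window. This is an illustrative sketch only: the function name, its parameters, and the timestamps are hypothetical, and the five-minute skew is just an example value.

```python
from datetime import datetime, timedelta


def in_clock_skew_window(client_now, server_now, ticket_end, tgt_end, clock_skew):
    """Return True when all three conditions from the description hold."""
    # 1) The server clock is ahead of the client, but within the allowed skew.
    server_ahead_within_skew = timedelta(0) < (server_now - client_now) <= clock_skew
    # 2) The service ticket expires before the TGT does.
    ticket_shorter_than_tgt = ticket_end < tgt_end
    # 3) The ticket is still good on the client but expired on the server.
    good_on_client_expired_on_server = client_now < ticket_end <= server_now
    return (server_ahead_within_skew
            and ticket_shorter_than_tgt
            and good_on_client_expired_on_server)


if __name__ == "__main__":
    client_now = datetime(2024, 1, 1, 12, 0, 0)
    server_now = client_now + timedelta(minutes=2)   # server is 2 minutes ahead
    ticket_end = client_now + timedelta(minutes=1)   # expires inside the gap
    tgt_end = client_now + timedelta(hours=10)
    skew = timedelta(minutes=5)
    print(in_clock_skew_window(client_now, server_now, ticket_end, tgt_end, skew))
    # With the workaround applied, the service ticket ends with the TGT.
    print(in_clock_skew_window(client_now, server_now, ticket_end, ticket_end, skew))
```

When the two lifetimes are equal, condition 2 no longer holds, which corresponds to the client refreshing the TGT first, as the workaround describes.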

Additional Information

N/A