CONTAP-115704: After upgrading to ONTAP 9.13.1 or later, the /tmp/ directory on one or more nodes intermittently reaches 100% capacity and vol0 inode usage increases permanently
Issue
- To hit this defect, users must authenticate over SSH using public key authentication.
- The more frequent the SSH public key authentication sessions (for example, from automation scripts), the higher the chance of hitting this defect.
- The defect leaves orphaned inodes behind in the node's /tmp/ directory until /tmp/ reaches 100% usage. /tmp/ then stays at 100% for anywhere from minutes to hours before finally dropping back to ~1-2%.
- If SSH public key authentication continues while /tmp/ is at 100% usage, the defect leaves permanent files behind at /mroot/etc/cluster_config/vserver/.vserver_*/config/auth.[0-9,a-z,A-Z]+
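As a rough sketch, assuming you already have a list of file names from one of the .vserver_*/config/ directories (how to obtain that listing is not covered here), the Python helper below flags names that fit the leftover auth.* pattern. The regex reads the article's [0-9,a-z,A-Z] as "ASCII letters and digits"; that interpretation, the helper name find_leftover_auth_files, and the sample names are assumptions for illustration only.

```python
import re
from typing import Iterable, List

# Assumption: "auth.[0-9,a-z,A-Z]+" means "auth." followed by one or more
# ASCII letters or digits; the commas are read as list separators, not literals.
LEFTOVER_AUTH_RE = re.compile(r"auth\.[0-9A-Za-z]+")


def find_leftover_auth_files(names: Iterable[str]) -> List[str]:
    """Return the names that fully match the leftover auth.* pattern (hypothetical helper)."""
    return sorted(name for name in names if LEFTOVER_AUTH_RE.fullmatch(name))


if __name__ == "__main__":
    # Hypothetical sample names, not taken from a real system.
    sample = ["auth.aB3xQ9", "auth.", "vserver.cfg", "auth.k2Lm.bak"]
    print(find_leftover_auth_files(sample))  # prints ['auth.aB3xQ9']
```

Running the helper against periodic listings and comparing the counts is one way to see whether these files keep accumulating.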
The signatures resulting from this defect can vary, for example:
1. When logging in to ONTAP through the serial console or the virtual console opened by the SP/BMC "system console" command, errors such as the following are shown:
SP MyCluster-01> system console
Type "exit" to log out of the console session.
Type Ctrl+D to return to the SP prompt.
SP-login: ad
pid 36994 (top), uid 0 inumber 20 on /tmp: filesystem full
pid 6889 (bash), uid 0 inumber 22 on /tmp: filesystem full
2. An error similar to the following is raised:
Apply failed for Object: publickey_ui Method: baseline. Reason: Failed to generate fingerprint for the public key.
3. SSH logins using public key authentication fail and fall back to password authentication. If the user has a password configured in ONTAP in addition to the SSH public key, password authentication still works at the same time.
4. The ONTAP EMS log shows errors such as:
1/2/2025 15:05:23 node1 NOTICE sshd.auth.loginDenied: message="Failed password for user from 192.168.1.1 port 31337 ssh2 "
1/2/2025 15:05:11 node1 NOTICE sshd.auth.loginDenied: message="Failed keyboard-interactive / pam for user from 192.168.1.1 port 31337 ssh2 "
5. The node root volume, usually "vol0", shows a consumed-inode count that keeps increasing over the long term and does not recover.
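To get a quick count of the sshd.auth.loginDenied events from item 4 per node and source address, the sketch below parses lines laid out like the sample EMS output above (date, time, node, severity, event name, message). It assumes that exact whitespace-separated layout and the "Failed <method> for <user> from <ip> port <port>" message shape; real EMS output may be formatted differently, so treat the regex and the hypothetical function count_denials_by_source as a starting point only.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout, based only on the sample EMS lines in this article:
# <date> <time> <node> <severity> sshd.auth.loginDenied: message="Failed <method> for <user> from <ip> port <port> ssh2 "
DENIED_RE = re.compile(
    r'^(?P<date>\S+)\s+(?P<time>\S+)\s+(?P<node>\S+)\s+(?P<severity>\S+)\s+'
    r'sshd\.auth\.loginDenied:\s+message="Failed (?P<method>.+?) for (?P<user>.+?) '
    r'from (?P<ip>\S+) port (?P<port>\d+)'
)


def count_denials_by_source(lines: Iterable[str]) -> Counter:
    """Count sshd.auth.loginDenied events per (node, source IP) pair; other lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = DENIED_RE.match(line.strip())
        if match:
            counts[(match["node"], match["ip"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '1/2/2025 15:05:23 node1 NOTICE sshd.auth.loginDenied: message="Failed password for user from 192.168.1.1 port 31337 ssh2 "',
        '1/2/2025 15:05:11 node1 NOTICE sshd.auth.loginDenied: message="Failed keyboard-interactive / pam for user from 192.168.1.1 port 31337 ssh2 "',
    ]
    print(count_denials_by_source(sample))  # prints Counter({('node1', '192.168.1.1'): 2})
```

A rising denial count from an automation host that normally uses public keys is consistent with the fall-back behavior described in item 3.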