Traversing a large sparse directory is extremely slow with long-running READDIR requests
Applies to
- ONTAP 9
- NFS
- SMB
Issue
- It takes a long time to traverse a large directory with the following possible symptoms:
- A packet trace shows many long-running READDIR/READDIR+ requests, which can take tens of seconds, or even hundreds of seconds in the worst case
- Note: The same applies to SMB2 QUERY_DIRECTORY requests or their equivalents in other SMB versions
- Timeout on the NFS/SMB clients
- Very high latency reported by performance monitoring tools
- Error messages such as the following appear in the EMS logs:
[<node_name>: wafl_exemptxx: wafl.readdir.expired:error]: A READDIR file operation has expired for the directory associated with volume <volume_name>/@vserver:<vserver_uuid> Snapshot copy ID xx and inode <inode_number>.
[<node_name>: wafl_exemptxx: wafl.readdir.sparse:notice]: A READDIR file operation has detected that a directory associated with volume <volume_name>/@vserver:<vserver_uuid> Snapshot ID xx and inode <inode_name> is sparse.
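To see which directories are affected, the two EMS messages above can be tallied per volume and inode. The following is a minimal sketch, not a NetApp tool: it assumes the EMS log has already been exported to plain-text lines in the format quoted above, and the function name `summarize_readdir_events` and the sample node, volume, and inode values are hypothetical.

```python
import re
from collections import Counter

# Matches only the two message formats quoted above
# (wafl.readdir.expired and wafl.readdir.sparse).
EMS_READDIR = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): "
    r"(?P<event>wafl\.readdir\.(?:expired|sparse)):(?P<severity>\w+)\]: "
    r".*?volume (?P<volume>[^/\s]+)/@vserver:(?P<vserver>\S+)"
    r".*?inode (?P<inode>\d+)"
)


def summarize_readdir_events(lines):
    """Count READDIR EMS events per (volume, inode, event name)."""
    counts = Counter()
    for line in lines:
        match = EMS_READDIR.search(line)
        if match:  # lines for unrelated events are ignored
            key = (match["volume"], match["inode"], match["event"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; real values come from the exported EMS log.
    sample = [
        "[node-01: wafl_exempt03: wafl.readdir.expired:error]: A READDIR file "
        "operation has expired for the directory associated with volume "
        "vol1/@vserver:11111111-2222-3333-4444-555555555555 Snapshot copy ID 0 "
        "and inode 4242.",
        "[node-01: wafl_exempt05: wafl.readdir.sparse:notice]: A READDIR file "
        "operation has detected that a directory associated with volume "
        "vol1/@vserver:11111111-2222-3333-4444-555555555555 Snapshot ID 0 "
        "and inode 4242 is sparse.",
    ]
    for (volume, inode, event), n in summarize_readdir_events(sample).items():
        print(f"{volume} inode {inode}: {event} x{n}")
```

A volume and inode pair that shows both `wafl.readdir.sparse` and repeated `wafl.readdir.expired` events is a likely candidate for the sparse directory behind the slow traversal.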