Why file transfer speeds differ between small and large files
- ONTAP 9
- Clustered Data ONTAP 8
- Data ONTAP operating in 7-Mode
- Copying many small files to a location takes longer than copying a few large files, whose total size equals that of the small files, to the same location. This is a well-known, but poorly understood, host-based filesystem performance issue.
- With all major operating systems, reading or writing a large number of small files incurs significant operating system overhead, because more time is spent at the OS level performing find, open, and close operations on every file being processed.
- While these operations don't take long for a single file, the cost adds up quickly when processing hundreds or thousands of small files. The problem shows up during activities such as backups, restores, and virus scans. With small files (for example, 4 KB), more time can be spent finding, opening, and closing each file than reading or writing its data.
- Only after the OS locates the file in the filesystem and opens it does it read or write the file's contents and communicate with the storage array. That is why everything looks normal from the storage array's performance perspective.
- The array responds very quickly to each read/write request from the OS. In these scenarios, faster CPUs and the lowest-latency disks help, but the issue can never be eliminated entirely: even if all the files were stored on SSD or a RAM disk, the OS would still have to perform the same system calls on each file.
- With large files, the same find, open, and close operations still occur, but because there are fewer files, less time is spent on these steps and more time is spent on the actual reads and writes.
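The per-file overhead described above can be demonstrated with a small sketch (the file counts, sizes, and names here are arbitrary choices for illustration): writing 1000 files of 4 KB each forces 1000 open/close cycles, while writing the same 4 MB of data to a single file needs only one.

```python
import os
import tempfile
import time

def write_small_files(directory, count=1000, size=4096):
    """Write `count` separate files of `size` bytes each.

    Every file pays its own open() and close() system-call cost,
    in addition to the actual write of the data.
    """
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, f"small_{i}.bin"), "wb") as f:
            f.write(payload)
    return time.perf_counter() - start

def write_large_file(directory, count=1000, size=4096):
    """Write the same total amount of data (count * size bytes)
    to a single file: one open(), one close(), many writes."""
    payload = b"x" * size
    start = time.perf_counter()
    with open(os.path.join(directory, "large.bin"), "wb") as f:
        for _ in range(count):
            f.write(payload)
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    small_time = write_small_files(d)
    large_time = write_large_file(d)
    print(f"1000 x 4 KB files: {small_time:.3f}s")
    print(f"one 4 MB file:     {large_time:.3f}s")
```

On most filesystems the many-small-files run takes noticeably longer even though both write exactly the same number of bytes, because the extra time goes into per-file metadata operations rather than data transfer.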
Microsoft knowledge base article covering slow write performance for small files