How to troubleshoot pBlk exhaustion due to vscan server on Data ONTAP 8 7-Mode
Applies to
- Data ONTAP 8 operating in 7-Mode
- Data ONTAP 7
Description
- The response time of an external Vscan server directly impacts the ability of a Storage Controller to respond to client requests.
- In Data ONTAP 7 and Data ONTAP 8 7-Mode, Vscan servers are external to the Storage Controller.
- pBlk usage increases because each client request consumes a pBlk.
- Scanning of the file (via opening the file) from the Vscan server accounts for additional pBlk usage.
- The faster the Vscan server completes the file scan, the sooner Data ONTAP can respond to the original client request and free the pBlk(s).
Consider the following four aspects of pBlk exhaustion and external Vscan servers:
- Number of Vscan servers:
- The Storage Controller can send a maximum of 50 scan requests to a Vscan server at any given time. (If Multistore is used, 100 requests can be sent to one server with two vfilers; 50 per vfiler.)
- If 100 requests come in at the same time, one Vscan server has to process the first 50 requests before it can start the second 50 requests.
- In this scenario, the Max gOffloadQueue depth becomes 50, since the Storage Controller has to wait for some of the first block of 50 to finish before sending requests held in the second block of 50.
- In this example, pBlk exhaustion might not have occurred, but the scenario shows why the AV infrastructure needs to perform optimally as more clients are added to a Storage Controller.
- Speed of Vscan Servers:
- Because the speed of external Vscan servers is critical, it is recommended to run Vscan servers on dedicated hardware rather than as virtual machines. For current information on Vscan in Data ONTAP 7.x environments, see TR-3107: Antivirus Scanning Best Practices Guide.
- If performance of the external Vscan server is degraded, it will take longer to respond to Storage Controller Vscan requests, resulting in pBlks being held for longer periods of time.
- If the Vscan server is degraded enough and enough clients send requests in a short period of time, pBlk exhaustion can occur.
- Configuration of Vscan servers:
- Vscan vendors control the tunable options for their application.
- The best place to start is the installation and configuration guides from the Vscan server vendor to make sure that the best practices are being met for the Vscan product.
- Configurations not meeting the vendor's best practices will likely result in decreased performance, putting the Storage Controller at risk of pBlk exhaustion.
- Vscan options timeout:
- In addition to a properly sized Vscan infrastructure, Data ONTAP has options that control the amount of time it will wait for virus scans to complete.
- It is imperative that these values are set according to the Vscan vendor's specifications and best practices.
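To make the 50-request limit concrete, the following Python sketch estimates how a burst of simultaneous scan requests splits into in-flight and queued requests. It is a simplified model and not Data ONTAP behavior: the function name `dispatch_waves`, the even spread of requests across servers, and the treatment of dispatch as whole waves are assumptions made for illustration. Only the limit of 50 requests per Vscan server comes from the text above.

```python
import math

# Limit stated in the article: 50 outstanding scan requests per Vscan server.
MAX_IN_FLIGHT_PER_SERVER = 50


def dispatch_waves(pending_requests, vscan_servers=1,
                   per_server_limit=MAX_IN_FLIGHT_PER_SERVER):
    """Return (in_flight, queued, waves) for a burst of simultaneous requests.

    Assumption (not from the article): requests are spread evenly across
    servers and are dispatched in whole waves of `capacity` requests.
    """
    if pending_requests < 0 or vscan_servers < 1 or per_server_limit < 1:
        raise ValueError("counts must be positive")
    capacity = vscan_servers * per_server_limit
    in_flight = min(pending_requests, capacity)   # sent to scanners right away
    queued = pending_requests - in_flight         # held until slots free up
    waves = math.ceil(pending_requests / capacity)
    return in_flight, queued, waves


if __name__ == "__main__":
    # The article's scenario: 100 requests arrive at once, one Vscan server.
    print(dispatch_waves(100))
    # Hypothetical variation: the same burst with two Vscan servers.
    print(dispatch_waves(100, vscan_servers=2))
```

Under this model, 100 simultaneous requests against one Vscan server leave 50 requests queued, and each queued request keeps holding its pBlk while it waits. Adding a second server removes the queue only if the even-spread assumption holds.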
Example of pBlk consumption (the assumption here is that the file needs to be scanned by the Vscan server):
This is a very high-level overview of the process. The numbers below are not real-world values and are for example purposes only.
1. The client issues a read request for fileA.txt on the storage controller. The storage controller allocates a pBlk for the client read request. Total pBlks consumed = 1
2. The filer issues an RPC call to the Vscan server (VSCAN01) requesting a scan of fileA.txt. Total pBlks consumed = 1
3. The Vscan server, VSCAN01, makes a request over a share on the filer, ONTAP_ADMIN$, to retrieve the file to be scanned. To scan the file, the Vscan server has to read all or part of it. Total pBlks consumed = 2. Note the increase.
4. The Vscan server, VSCAN01, finishes reading the file and then satisfies the scan operation by sending a reply to the storage controller. Total pBlks consumed = 1. Note the decrease.
5. The storage controller does internal accounting to mark fileA.txt as scanned. Total pBlks consumed = 1
6. The filer responds to the client's read request. Total pBlks consumed = 0
During the entire time between steps 1 and 6, the client request holds a pBlk until the virus scan of the file is complete.
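The accounting in the example above can be replayed with a short Python sketch. It only mirrors the illustrative numbers from the walkthrough; the function name `trace_pblks` and the per-step deltas are a hypothetical model for this example and do not describe how Data ONTAP tracks pBlks internally.

```python
def trace_pblks():
    """Replay the six example steps and return (step, description, total) rows."""
    # Each entry: (description, change in pBlks held), taken from the example.
    steps = [
        ("Client read request for fileA.txt", +1),
        ("Filer sends scan RPC to VSCAN01", 0),
        ("VSCAN01 reads fileA.txt via ONTAP_ADMIN$", +1),
        ("VSCAN01 finishes reading and replies", -1),
        ("Storage controller marks fileA.txt as scanned", 0),
        ("Filer responds to the client read request", -1),
    ]
    total = 0
    trace = []
    for number, (description, delta) in enumerate(steps, start=1):
        total += delta
        trace.append((number, description, total))
    return trace


if __name__ == "__main__":
    for number, description, total in trace_pblks():
        print(f"step {number}: {description:<48} total pBlks = {total}")
```

The trace peaks at two pBlks for a single client read, and the client's pBlk is not released until the final step, which is why slow scans multiply pBlk usage across many concurrent clients.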