What is an FPolicy EAGAIN error and when does it occur?
EAGAIN errors occur when the send or receive buffer allocated to a specific TCP session between a LIF and an FPolicy server fills up, leaving no room for requests to be sent or responses to be received. The buffer drains as the FPolicy server responds to requests or as ONTAP pulls responses out of the buffer. When an EAGAIN occurs, a rewind behavior is triggered for that SVM to that FPolicy server. The rewind/delay behavior varies by ONTAP version:
- Versions without the fix for bug 1372994:
  - ONTAP pauses FPolicy requests on that TCP session for 2 seconds, then resumes sending the queued requests.
- Versions with the fix for bug 1372994:
  - On the first attempt, ONTAP retries the request after 1 ms. If another EAGAIN occurs for the same request, ONTAP continues to retry every 100 ms for a maximum of 2 seconds, and closes the socket if the request cannot be added to the buffer before the timeout. If the request is added to the buffer before the disconnection, the process reverts to the 1 ms delay timer for the next EAGAIN occurrence.
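The retry flow for versions with the fix can be sketched as a small simulation. This is only an illustration of the timings described above, not ONTAP code: the function name `simulate_retries` and the parameter `buffer_free_at_ms` are hypothetical, and the sketch assumes the 2-second limit is measured from the first EAGAIN for the request.

```python
# Timings taken from the description above; everything else is an assumption.
FIRST_RETRY_MS = 1       # first retry after 1 ms
RETRY_INTERVAL_MS = 100  # subsequent retries every 100 ms
TIMEOUT_MS = 2000        # give up (close the socket) after 2 seconds


def simulate_retries(buffer_free_at_ms):
    """Simulate one request that hit EAGAIN at t = 0 ms.

    buffer_free_at_ms is a hypothetical time (in ms) at which the TCP
    buffer has room again, or None if it never frees up.
    Returns (outcome, list of retry times in ms).
    """
    retry_times = []
    t = FIRST_RETRY_MS
    while t <= TIMEOUT_MS:
        retry_times.append(t)
        if buffer_free_at_ms is not None and t >= buffer_free_at_ms:
            # Request fits in the buffer; the next EAGAIN starts at 1 ms again.
            return "sent", retry_times
        t += RETRY_INTERVAL_MS
    # Request could not be queued before the timeout.
    return "socket closed", retry_times


if __name__ == "__main__":
    print(simulate_retries(150))
    print(simulate_retries(None)[0])
```

Under these assumptions, a buffer that frees up after 150 ms lets the request through on the third retry, while a buffer that never drains ends with the socket being closed.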
Examples of errors:
- EMS or event log show:
[filer1: fpolicy: fpolicy.eagain.on.write:notice]: Write returned EAGAIN while sending notification to the FPolicy server "184.108.40.206" for vserver ID 3.
- Fpolicy.log reports EAGAIN errors similar to the following:
[kern_fpolicy:error:1552] Write returned EAGAIN [0x0x80c408d00] src/fsm/fsm_external_engine.cc:864
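When scanning collected event logs for these messages, a small parser can pull out the affected FPolicy server and vserver ID. The following is a minimal sketch: the regular expression is derived only from the single EMS sample line above and may need adjusting for other message variants, and the function name `parse_eagain_event` is hypothetical.

```python
import re

# Pattern based solely on the sample EMS line shown above (an assumption
# about the general message layout, not a documented format).
EAGAIN_EMS = re.compile(
    r'fpolicy\.eagain\.on\.write:\w+\]: .*?'
    r'FPolicy server "([^"]+)" for vserver ID (\d+)'
)


def parse_eagain_event(line):
    """Return (fpolicy_server, vserver_id) or None if the line does not match."""
    match = EAGAIN_EMS.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    sample = (
        '[filer1: fpolicy: fpolicy.eagain.on.write:notice]: Write returned '
        'EAGAIN while sending notification to the FPolicy server '
        '"184.108.40.206" for vserver ID 3.'
    )
    print(parse_eagain_event(sample))
```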
Versions with the fix for bug 1479704 extend the flow above:
- If the full flow is completed and a disconnect occurs, a 2-minute timer is started.
- If 4 disconnects due to EAGAIN occur within the 2-minute timer, the FPolicy server is permanently disconnected and an EMS is triggered.
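The disconnect-counting rule can be illustrated with a short sketch. It is not ONTAP's implementation: the class `EagainDisconnectTracker` is hypothetical, and the sketch assumes that the disconnect which starts the 2-minute timer counts as the first of the 4, and that a disconnect after the timer expires starts a new timer.

```python
WINDOW_S = 120        # 2-minute timer, from the description above
MAX_DISCONNECTS = 4   # disconnects within the timer that trigger a permanent disconnect


class EagainDisconnectTracker:
    """Tracks EAGAIN-driven disconnects for one FPolicy server (illustrative only)."""

    def __init__(self):
        self.window_start = None  # time of the disconnect that started the timer
        self.count = 0            # disconnects seen within the current timer
        self.permanent = False    # True once the server is permanently disconnected

    def record_disconnect(self, now_s):
        """Record a disconnect at now_s (seconds); return True if permanently disconnected."""
        if self.permanent:
            return True
        # Assumption: an expired timer is restarted by the next disconnect.
        if self.window_start is None or now_s - self.window_start > WINDOW_S:
            self.window_start = now_s
            self.count = 0
        self.count += 1
        if self.count >= MAX_DISCONNECTS:
            self.permanent = True  # this is where an EMS would be raised
        return self.permanent


if __name__ == "__main__":
    tracker = EagainDisconnectTracker()
    for t in (0, 30, 60, 90):
        print(t, tracker.record_disconnect(t))
```

With these assumptions, four disconnects at 0, 30, 60 and 90 seconds lead to a permanent disconnect, whereas a fourth disconnect arriving after the 2-minute timer has expired only starts a new timer.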
EMS details: fpolicy.eagain events
Bug 1372994: EAGAIN errors during FPolicy screening might lead to high latency