Mellanox switch hung due to "Too many open files"
Applies to
- Mellanox SN2010
- Onyx version 3.9.3220
Issue
- The switch hung and did not respond until it was rebooted.
- CLI login succeeds, but no prompt is returned, so commands cannot be executed.
- Web UI login succeeds, but the switch cannot be managed.
- Sysdump log samples:
SNMP is continuously trying to send a trap but fails.
Line 68934: Jul 16 16:13:28 DC-ENCOA-FL5-SN2010-21 snmpd[5488]: [snmpd.ERR]: snmpd: send_trap: Failure in sendto (No route to host)
Line 68935: Jul 16 16:13:28 DC-ENCOA-FL5-SN2010-21 snmpd[5488]: message repeated 8 times: [ [snmpd.ERR]: snmpd: send_trap: Failure in sendto (No route to host)]
Line 68936: Jul 16 16:13:28 DC-ENCOA-FL5-SN2010-21 snmpd[5488]: [snmpd.ERR]: snmpd: send_trap: Failure in sendto (No route to host)
Line 68937: Jul 16 16:13:28 DC-ENCOA-FL5-SN2010-21 snmpd[5488]: message repeated 8 times: [ [snmpd.ERR]: snmpd: send_trap: Failure in sendto (No route to host)]
Line 68938: Jul 16 16:13:28 DC-ENCOA-FL5-SN2010-21 snmpd[5488]: [snmpd.ERR]: snmpd: send_trap: Failure in sendto (No route to host)
As a result, mgmtd reports "Too many open files", which indicates that it has run out of available file descriptors:
Line 68918: Jul 16 16:13:23 DC-ENCOA-FL5-SN2010-21 mgmtd[6612]: [mgmtd.ERR]: lc_launch_pre_fork(), proc_utils.c:726, build 1: Too many open files: Making temp file with base name /vtmp/proc-output
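
To check another sysdump for the same signature, the two messages shown above can be counted offline. The following is a minimal Python sketch, not an Onyx tool: the function names (count_events, scan_file) are hypothetical, and the patterns are assumed from the sample lines in this article only. It treats a syslog "message repeated N times" line as N occurrences.

```python
import re
from collections import Counter

# Patterns assumed from the log samples in this article; adjust as needed.
PATTERNS = {
    "snmp_trap_failure": re.compile(r"snmpd\[\d+\]: .*send_trap: Failure in sendto"),
    "too_many_open_files": re.compile(r"Too many open files"),
}

# syslog collapses duplicates into "message repeated N times: [...]".
REPEAT = re.compile(r"message repeated (\d+) times")


def count_events(lines):
    """Count occurrences of each pattern in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                repeated = REPEAT.search(line)
                # A collapsed line stands for N occurrences, otherwise one.
                counts[name] += int(repeated.group(1)) if repeated else 1
    return counts


def scan_file(path):
    """Count events in a log file extracted from a sysdump (path is supplied by the caller)."""
    with open(path, "r", errors="replace") as handle:
        return count_events(handle)
```

For example, calling scan_file() on the messages file extracted from the sysdump returns a Counter with one entry per pattern. A large snmp_trap_failure count alongside a non-zero too_many_open_files count would match the symptoms described in this article.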