What are the SolidFire FC cluster best practices?
- NetApp SolidFire scale-out storage system
- NetApp Element software
Here is the list of major recommendations for SolidFire Fibre Channel (FC) clusters:
Enable FC In-Order Delivery (IOD)
- Frames that arrive out of order are “dropped”. This forces the initiator to restart the transaction, which consumes initiator, target, and possibly switch resources (very broadly, this adds to fabric latency and to ESXi Permanent Device Loss (PDL) events).
- Eventually, this impacts the fabric: initiator timeouts expire, which causes another series of fabric events (Aborts).
- Aborting an existing sequence causes additional work in the NetApp SolidFire target that consumes FC node resources. This may cause additional fabric delays and more abort sequences.
For more information about IOD, see: When is in-order delivery (IOD) required and how is it set?
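The following Python sketch is a deliberately simplified model of the behavior described above, not the actual FC or SolidFire implementation. A hypothetical target accepts the frames of an exchange only in consecutive sequence-count order and discards the whole exchange when a frame arrives out of order; the initiator would then have to restart that exchange. The `Frame` type and `receive` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    exchange_id: int  # which transaction (exchange) the frame belongs to
    seq_cnt: int      # position of the frame within the exchange, starting at 0
    last: bool        # True for the final frame of the exchange


def receive(frames):
    """Simplified target: accept frames only in consecutive seq_cnt order.

    Returns (completed, dropped) lists of exchange IDs. A dropped exchange
    would have to be restarted by the initiator.
    """
    expected = {}  # exchange_id -> next seq_cnt the target expects
    completed, dropped = [], []
    for f in frames:
        if f.exchange_id in dropped:
            continue  # exchange already discarded; ignore its remaining frames
        if f.seq_cnt != expected.get(f.exchange_id, 0):
            dropped.append(f.exchange_id)  # out of order: discard the exchange
            expected.pop(f.exchange_id, None)
            continue
        expected[f.exchange_id] = f.seq_cnt + 1
        if f.last:
            completed.append(f.exchange_id)
            del expected[f.exchange_id]
    return completed, dropped


in_order = [Frame(1, 0, False), Frame(1, 1, False), Frame(1, 2, True)]
reordered = [Frame(2, 0, False), Frame(2, 2, True), Frame(2, 1, False)]
print(receive(in_order))   # ([1], [])
print(receive(reordered))  # ([], [2])
```

With IOD enabled on the fabric, the reordered case would not occur, so no exchange is discarded and nothing needs to be retried.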
Disable ESXi smartd polling
- At a regular interval, ESXi hosts request SMART data from storage devices by sending SCSI command 0x85 (ATA PASS-THROUGH (16)) to look at mode page 0x1C.
- NetApp SolidFire storage does not support this command and responds with CHECK CONDITION status and Sense Key 5h (ILLEGAL REQUEST).
- After the initial rejection, ESXi hosts continue to send the command. Processing this command interrupts the flow of data transfers (READs and WRITEs) and adds to the workload of the cluster (see the IOD section above).
- In some cases, the ESXi host also concludes that the device is in a Permanent Device Loss (PDL) condition.
Refer to the VMware documentation for instructions on disabling smartd.
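When the cluster rejects the ATA PASS-THROUGH (16) command, the host receives SCSI sense data along with the CHECK CONDITION status. As an illustration, this Python sketch decodes the sense key, additional sense code (ASC), and qualifier (ASCQ) from fixed-format and descriptor-format sense data as laid out in the SCSI Primary Commands (SPC) standard. The sample buffer, including ASC 20h (INVALID COMMAND OPERATION CODE), is an assumption for illustration; this article does not state which ASC/ASCQ SolidFire returns.

```python
from __future__ import annotations

# Subset of SCSI sense keys (SPC); 5h is the key returned for the smartd command.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

CHECK_CONDITION = 0x02  # SCSI status byte value that accompanies sense data


def decode_sense(buf: bytes) -> tuple[int, int, int]:
    """Return (sense_key, asc, ascq) from fixed- or descriptor-format sense data."""
    response_code = buf[0] & 0x7F
    if response_code in (0x70, 0x71):  # fixed format
        return buf[2] & 0x0F, buf[12], buf[13]
    if response_code in (0x72, 0x73):  # descriptor format
        return buf[1] & 0x0F, buf[2], buf[3]
    raise ValueError(f"unsupported sense response code 0x{response_code:02x}")


# Hypothetical fixed-format sense data: ILLEGAL REQUEST, ASC 20h, ASCQ 00h.
sample = bytearray(18)
sample[0] = 0x70   # current error, fixed format
sample[2] = 0x05   # sense key
sample[7] = 10     # additional sense length (bytes following byte 7)
sample[12] = 0x20  # ASC
sample[13] = 0x00  # ASCQ

key, asc, ascq = decode_sense(bytes(sample))
print(SENSE_KEYS[key], hex(asc), hex(ascq))  # ILLEGAL REQUEST 0x20 0x0
```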
Use single-initiator, single-target zoning
- For NetApp SolidFire storage, each zone should contain exactly two members:
- one Initiator WWPN
- one Target WWPN
- This keeps State Change Notifications (RSCNs) confined to the initiator and target in the zone, so those ports do not need to process events from the rest of the fabric. State Change Notifications are bounded and minimized.
For example configurations, see Examples of recommended zone configuration below.
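A minimal Python sketch of checking a zone set against the single-initiator, single-target rule. The zone names and WWPNs are hypothetical, and the sketch assumes you already know which WWPNs belong to host initiators and which belong to SolidFire FC target ports; how you export the zone set from your switch is outside its scope.

```python
def normalize(wwpn: str) -> str:
    """Lower-case a WWPN and drop colons so different notations compare equal."""
    return wwpn.replace(":", "").lower()


def invalid_zones(zones, initiators, targets):
    """Return names of zones that are not exactly one initiator plus one target."""
    inits = {normalize(w) for w in initiators}
    tgts = {normalize(w) for w in targets}
    bad = []
    for name, members in zones.items():
        m = [normalize(w) for w in members]
        n_init = sum(w in inits for w in m)
        n_tgt = sum(w in tgts for w in m)
        if len(m) != 2 or n_init != 1 or n_tgt != 1:
            bad.append(name)
    return bad


# Hypothetical WWPNs for illustration only.
host_a = "20:00:00:00:00:00:00:a1"
host_b = "20:00:00:00:00:00:00:b1"
sf_tgt = "20:00:00:00:00:00:00:f1"

zones = {
    "zone_ok": [host_a, sf_tgt],
    "zone_two_initiators": [host_a, host_b, sf_tgt],
}
print(invalid_zones(zones, [host_a, host_b], [sf_tgt]))  # ['zone_two_initiators']
```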
FC nodes are not optimal as Cluster Master
- The primary task of the FC node is to manage the transfer of data between the front-end Fibre Channel interfaces and back-end storage.
- Additional load is placed on the node when it takes on Cluster Master activity. In a heavily loaded system, this contention can increase latency.
- In the short term, to restore operating margins, work with your Support representative to demote any FC node that is a Cluster Master, or upgrade to Element software 12.3 or later.
For additional recommendations, contact NetApp Support.
LACP mandatory for all Bond10G interfaces
- Without LACP configured on the data path cluster node ports and corresponding switch ports, there may not be enough bandwidth available for optimal cluster operation.
- In a heavily loaded system, this network bottleneck can increase latency.
- LACP is required on the Bond10G interfaces (storage network) of all SolidFire FC and storage nodes and on the corresponding switch ports.
Configuring Max I/O Size
NetApp SolidFire FC nodes support a maximum I/O size of 2MB and advertise this limit to Fibre Channel initiators during the login process. In certain circumstances, VMware ignores this limit, causing tasks that send I/O requests larger than 2MB to the SolidFire cluster, such as backup jobs, to fail. There are two possible workarounds:
- Set the VMware advanced setting Disk.DiskMaxIOSize to 2048 (the value is in KB, so 2048 KB = 2MB) according to the instructions in VMware KB article 1003469. This limits all workloads on the ESXi host to the specified I/O size.
- Reconfigure the application issuing the large I/Os to issue a maximum I/O size of 2MB.
For example, to limit the maximum I/O size for Veeam® Backup & Replication™ software, see:
How to limit maximum IO size for the Veeam Backup Replication software
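The arithmetic behind the 2MB limit can be sketched as follows: 2MB is 2048 KB, the value used for Disk.DiskMaxIOSize, and any transfer larger than the limit has to be issued as several requests of at most 2MB each. This Python sketch (the `split_io` helper is hypothetical) shows how one large request maps to compliant requests; it illustrates the size limit only and is not the ESXi or Veeam implementation.

```python
DISK_MAX_IO_SIZE_KB = 2048                 # Disk.DiskMaxIOSize value, in KB
MAX_IO_BYTES = DISK_MAX_IO_SIZE_KB * 1024  # 2MB limit advertised by SolidFire FC nodes


def split_io(offset: int, length: int, max_io: int = MAX_IO_BYTES):
    """Split one request into (offset, size) pieces of at most max_io bytes."""
    pieces = []
    end = offset + length
    while offset < end:
        size = min(max_io, end - offset)
        pieces.append((offset, size))
        offset += size
    return pieces


# A hypothetical 5MB backup read becomes three requests: 2MB, 2MB, 1MB.
print(split_io(0, 5 * 1024 * 1024))
# [(0, 2097152), (2097152, 2097152), (4194304, 1048576)]
```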
Examples of recommended zone configuration:
Four paths to storage - prevent path explosion and keep IxL count low
Consider an ESXi host with a dual-port HBA and a four-node FC cluster.
Create "Zone A" with ESXi host and FC Node A HBA-1 Port-A
Create "Zone B" with ESXi host and FC Node B HBA-1 Port-A
Create "Zone C" with ESXi host and FC Node C HBA-1 Port-A
Create "Zone D" with ESXi host and FC Node D HBA-1 Port-A
Zone the next host the same way as the first, except that its zones use FC node HBA-2 Port-A instead of HBA-1 Port-A (you need only two fibers per node).
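The four-node zoning pattern above can be expressed as a small Python sketch that generates one (zone, initiator, target port) entry per zone and confirms that each host gets four paths while no FC target port is shared between the two hosts. The function name is hypothetical, the host name stands in for a specific host WWPN, and alternating HBA-1/HBA-2 for hosts beyond the second is an assumption (the article describes only the first and next host).

```python
from collections import Counter


def four_node_plan(hosts, nodes=("A", "B", "C", "D")):
    """Return (zone_name, initiator, target_port) tuples, one initiator and one target per zone."""
    plan = []
    for i, host in enumerate(hosts):
        # First host uses HBA-1, the next uses HBA-2; alternating beyond two hosts is an assumption.
        hba = "HBA-1" if i % 2 == 0 else "HBA-2"
        for node in nodes:
            # Zone A pairs with FC Node A, Zone B with FC Node B, and so on.
            plan.append((f"{host} Zone {node}", host, f"FC Node {node} {hba} Port-A"))
    return plan


plan = four_node_plan(["ESXi-1", "ESXi-2"])
paths_per_host = Counter(initiator for _, initiator, _ in plan)
zones_per_port = Counter(target for _, _, target in plan)
print(dict(paths_per_host))          # {'ESXi-1': 4, 'ESXi-2': 4}
print(max(zones_per_port.values()))  # 1
```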
Four paths to storage for a two-node FC cluster - prevent path explosion and keep IxL count low.
Consider an ESXi host with a dual-port HBA and a two-node FC cluster.
Create "Zone A" with ESXi-1 host and FC Node A HBA-1 Port-A
Create "Zone B" with ESXi-1 host and FC Node A HBA-2 Port-A
Create "Zone C" with ESXi-1 host and FC Node B HBA-1 Port-A
Create "Zone D" with ESXi-1 host and FC Node B HBA-2 Port-A
For the next ESXi host (ESXi-2), create four zones using Port-B of the FC node HBAs. This distributes the load across all FC ports on each node:
Create "Zone A" with ESXi-2 host and FC Node A HBA-1 Port-B
Create "Zone B" with ESXi-2 host and FC Node A HBA-2 Port-B
Create "Zone C" with ESXi-2 host and FC Node B HBA-1 Port-B
Create "Zone D" with ESXi-2 host and FC Node B HBA-2 Port-B
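Similarly, here is a Python sketch of the two-node pattern, showing that alternating Port-A and Port-B across hosts spreads zones evenly over all eight FC target ports. The function name is hypothetical, and continuing the Port-A/Port-B alternation for hosts beyond ESXi-2 is an assumption for illustration.

```python
from collections import Counter

# FC node target ports in the order of Zones A-D from the two-node example.
TARGETS = [("A", "HBA-1"), ("A", "HBA-2"), ("B", "HBA-1"), ("B", "HBA-2")]


def two_node_plan(hosts):
    """Return (zone_name, initiator, target_port) tuples for a two-node FC cluster."""
    plan = []
    for i, host in enumerate(hosts):
        # ESXi-1 uses Port-A, ESXi-2 uses Port-B; alternating for further hosts is an assumption.
        port = "Port-A" if i % 2 == 0 else "Port-B"
        for letter, (node, hba) in zip("ABCD", TARGETS):
            plan.append((f"{host} Zone {letter}", host, f"FC Node {node} {hba} {port}"))
    return plan


plan = two_node_plan(["ESXi-1", "ESXi-2", "ESXi-3", "ESXi-4"])
zones_per_port = Counter(target for _, _, target in plan)
print(len(zones_per_port), sorted(set(zones_per_port.values())))  # 8 [2]
```

With four hosts, every one of the eight target ports carries exactly two zones, which is the even load distribution the example aims for.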
For more information on IxL count and other FC node limits, see What are the SolidFire FC cluster limits?