NetApp Knowledge Base

SolidFire FC cluster best practice

Applies to

  •   SolidFire All Flash Arrays    
  •   SolidFire Element OS 

Answer

What are the SolidFire FC cluster best practices?

The major recommendations for a SolidFire FC cluster are listed below.

Fibre Channel In-Order Delivery (IOD) - IOD is required with NetApp SolidFire storage clusters that use Fibre Channel nodes

  • Frames that arrive out of order are dropped. This forces the initiator to restart the transaction, which consumes initiator, target, and possibly switch resources. Very broadly, this adds to fabric latency and can contribute to ESXi Permanent Device Loss (PDL) conditions.
  • Eventually this impacts the fabric: initiator timeouts expire, which triggers another series of fabric events (aborts).
  • Aborting an existing sequence causes additional work in the NetApp/SolidFire target and consumes FC node resources. This may cause additional fabric delays and more abort sequences.
For more information about IOD, see the KB article When is in-order delivery (IOD) required and how is it set?

ESXi smartd polling - global smartd disable recommendation

  • At a regular interval, ESXi hosts request SMART data from storage devices by sending SCSI command 0x85 (ATA PASS-THROUGH (16)) to a device to look at mode page 0x1c.
  • NetApp/SolidFire storage does not support this command and responds with CHECK CONDITION status and Sense Key 5h (ILLEGAL REQUEST).
  • After the initial rejection, ESXi hosts continue to send the command. Processing this command interrupts the flow of data transfers (READs and WRITEs) and adds to the workload of the cluster (see the IOD section above).
  • In some cases, the ESXi host also concludes that the device is in a Permanent Device Loss (PDL) condition.
Refer to the VMware documentation for how to disable smartd.

Zone recommendation

  • For NetApp SolidFire storage, a zone should contain exactly two members:
    • one Initiator WWPN
    • one Target WWPN
  • This keeps State Change notifications confined to the Initiator and Target in the zone, so neither has to process notifications from the rest of the fabric.
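
The one-initiator, one-target rule is easy to check mechanically. The following is a minimal sketch, assuming zones have already been exported from the switch into some in-memory form; the `Zone` class, the function name, and the WWPNs are hypothetical placeholders, not part of any SolidFire or switch API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Zone:
    """A fabric zone described by the WWPNs of its members (hypothetical model)."""
    name: str
    initiators: Tuple[str, ...]
    targets: Tuple[str, ...]


def is_single_initiator_single_target(zone: Zone) -> bool:
    """Return True when the zone holds exactly one initiator and one target WWPN."""
    return len(zone.initiators) == 1 and len(zone.targets) == 1


# Placeholder WWPNs for illustration only.
good = Zone("zone_a", ("20:00:00:00:00:00:00:01",), ("21:00:00:00:00:00:00:01",))
bad = Zone(
    "zone_all",
    ("20:00:00:00:00:00:00:01", "20:00:00:00:00:00:00:02"),
    ("21:00:00:00:00:00:00:01",),
)

print(is_single_initiator_single_target(good))  # True
print(is_single_initiator_single_target(bad))   # False
```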

Examples:
Four paths to storage - prevent path explosion and keep IxL count low

Consider an ESXi host with a dual-port HBA and a four-node FC cluster.
Create "Zone A" with ESXi host and FC Node A HBA-1 Port-A
Create "Zone B" with ESXi host and FC Node B HBA-1 Port-A
Create "Zone C" with ESXi host and FC Node C HBA-1 Port-A
Create "Zone D" with ESXi host and FC Node D HBA-1 Port-A

For the next host, follow the same pattern, but zone to HBA-2 Port-A on each FC node instead of HBA-1 Port-A (only two fibers per node are needed).

Four paths to storage for two node FC cluster - prevent path explosion and keep IxL count low.

Consider an ESXi host with a dual-port HBA and a two-node FC cluster.
Create "Zone A" with ESXi-1 host and FC Node A HBA-1 Port-A
Create "Zone B" with ESXi-1 host and FC Node A HBA-2 Port-A
Create "Zone C" with ESXi-1 host and FC Node B HBA-1 Port-A
Create "Zone D" with ESXi-1 host and FC Node B HBA-2 Port-A

For the next ESXi host, create four zones using Port-B of the FC node HBAs. This spreads the load across the FC ports on each node.

Example:
Create "Zone A" with ESXi-2 host and FC Node A HBA-1 Port-B
Create "Zone B" with ESXi-2 host and FC Node A HBA-2 Port-B
Create "Zone C" with ESXi-2 host and FC Node B HBA-1 Port-B
Create "Zone D" with ESXi-2 host and FC Node B HBA-2 Port-B
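
The two-node layout above can be sketched as a small generator. This is an illustration only: the function name and the labels are hypothetical, each host is represented by a label rather than a real initiator WWPN, and alternating back to Port-A for a third host is an assumption that extends the two-host pattern shown above.

```python
from itertools import product


def zones_for_host(host_index, fc_nodes=("A", "B"), hbas=(1, 2), ports=("A", "B")):
    """Build one-initiator, one-target zone pairs for a single ESXi host.

    Hosts alternate between FC node ports: host 1 uses Port-A, host 2 uses
    Port-B, and (by assumption) host 3 uses Port-A again.
    """
    port = ports[(host_index - 1) % len(ports)]
    zones = []
    for node, hba in product(fc_nodes, hbas):
        target = f"FC Node {node} HBA-{hba} Port-{port}"
        zones.append((f"ESXi-{host_index}", target))
    return zones


for host in (1, 2):
    for initiator, target in zones_for_host(host):
        print(f"{initiator} <-> {target}")
```

Each host ends up with four zones, and therefore four paths, regardless of how many hosts are added.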

For more information on IxL count and other FC node limits, see the KB article What are the SolidFire FC cluster limits?

FC node non-optimal as Cluster Master

  • The primary task of the FC node is to manage the transfer of data between the front-end Fibre Channel interfaces and back-end storage.
  • Additional load is placed on the node when it takes on Cluster Master activity. In a heavily loaded system, this contention can increase latency.
  • In the short term, to restore operating margins, work with your Support representative to demote any FC node that is a Cluster Master.

For additional recommendations, contact the SolidFire Support team.

LACP mandatory for all Bond10G interfaces

  • Without LACP configured on the data path cluster node ports and corresponding switch ports, there may not be enough bandwidth available for optimal cluster operation.
  • In a heavily loaded system, this network bottleneck can increase latency.
  • LACP is required on the Bond10G interfaces (storage network) of all SolidFire FC and storage nodes, and on the corresponding switch ports.

 

Configuring Max I/O Size

Fibre Channel nodes support a maximum I/O size of 2 MB; the nodes advertise this limit to Fibre Channel initiators during the login process. In certain circumstances, VMware ignores this limit, so tasks that send I/O requests larger than 2 MB to the SolidFire cluster, such as backup jobs, fail.

There are two possible workarounds:

  • You can set the VMware advanced setting Disk.DiskMaxIOSize to 2048 (the value is expressed in KB) according to the instructions in VMware KB article 1003469. This limits all workloads on the ESXi host to the specified I/O size.
  • You can reconfigure the application issuing the large I/Os to issue a maximum I/O size of 2 MB.
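
To make the numbers concrete, here is a minimal sketch of what capping I/O at 2 MB means for a larger request. It is not how ESXi or any backup product splits I/O; `split_io` and the constants are hypothetical names, and the KB conversion assumes that Disk.DiskMaxIOSize is expressed in KB, as the value 2048 suggests.

```python
MAX_IO_BYTES = 2 * 1024 * 1024             # 2 MB limit stated for Fibre Channel nodes
DISK_MAX_IO_SIZE_KB = MAX_IO_BYTES // 1024  # 2048, assuming the setting is in KB


def split_io(offset, length, max_io=MAX_IO_BYTES):
    """Yield (offset, length) pieces that each fit within max_io bytes."""
    end = offset + length
    while offset < end:
        size = min(max_io, end - offset)
        yield offset, size
        offset += size


# A 5 MB request becomes two 2 MB pieces plus one 1 MB piece.
pieces = list(split_io(0, 5 * 1024 * 1024))
print(DISK_MAX_IO_SIZE_KB)  # 2048
print(pieces)  # [(0, 2097152), (2097152, 2097152), (4194304, 1048576)]
```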

For example, to limit the max I/O size for the Veeam® Backup & Replication™ software, complete the following task:

Procedure

1. Log in to the Veeam proxy server.

2. Open the registry editor (regedit.exe).

3. Navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication.

4. Create a new DWORD value named VddkPreReadBufferSize and set its decimal value to 2097152 (2 MB).

5. Restart the Veeam Backup Service or reboot the proxy.

Configuring this key decreases the I/O size that Veeam requests from 4 MB, which is the default, to 2 MB.
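
As a cross-check on the value, 2097152 bytes is 0x200000 in hexadecimal. The sketch below builds the text of a .reg file with the same key path and value name as the procedure above; the function name is hypothetical, and importing a .reg file is an assumed alternative to editing the registry by hand, so review the output before using it.

```python
def veeam_reg_file(buffer_bytes=2 * 1024 * 1024):
    """Return .reg file text that sets VddkPreReadBufferSize as a DWORD."""
    key = r"HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication"
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key}]\r\n"
        # DWORD values in .reg files are written as 8 hexadecimal digits.
        f'"VddkPreReadBufferSize"=dword:{buffer_bytes:08x}\r\n'
    )


print(veeam_reg_file())
```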
