Solaris host support considerations in a MetroCluster configuration
Applies to
- Solaris
- MetroCluster
- ONTAP 9
Answer
By default, the Solaris OS can survive an 'All Paths Down' (APD) condition for up to 20 seconds; this is controlled by the fcp_offline_delay parameter.
For Solaris hosts to continue without disruption during all MetroCluster workflows (Negotiated Switchover, Switchback, Tiebreaker unplanned Switchover, and Automated Unplanned Switchover), it is recommended to set fcp_offline_delay to 120 seconds.
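To check the value currently in effect, one option is the kernel debugger. This is a sketch, assuming the fcp driver exposes the tunable as a kernel global of the same name (the value shown is the 20-second default):
# echo "fcp_offline_delay/D" | mdb -k
fcp_offline_delay:
fcp_offline_delay:              20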
Important MetroCluster support considerations:
Host response to local HA failover | When the fcp_offline_delay value is increased, application service resumption time also increases during a local HA failover (for example, a node panic followed by the surviving node taking over the panicked node).
FCP error handling | With the default value of fcp_offline_delay, the fcp driver takes 110 seconds to notify the upper layers (MPxIO) after an initiator port connection fails. With fcp_offline_delay increased to 120 seconds, the total time taken by the driver to notify the upper layers (MPxIO) is 210 seconds, which may cause an I/O delay. When a Fibre Channel port fails, an additional 110-second delay may be seen before the device is offlined. Refer to Oracle Doc ID 1018952.1.
Co-existence with third-party arrays | Because fcp_offline_delay is a global parameter, it may affect the interaction with all storage connected to the fcp driver.
How to modify the fcp_offline_delay setting:
For Solaris 10u8, 10u9, 10u10, and 10u11:
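The command cells in the source table are empty; the following is a sketch of the usual approach, which sets the tunable in the fcp driver configuration file (a reboot is required for the change to take effect):
Add the following line to /kernel/drv/fcp.conf:
fcp_offline_delay=120;
Then reboot the host:
# reboot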
For Solaris 11:
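On Solaris 11, local driver customizations belong in /etc/driver/drv rather than /kernel/drv. A sketch, assuming no customized fcp.conf exists there yet:
# cp /kernel/drv/fcp.conf /etc/driver/drv/fcp.conf
Add the following line to /etc/driver/drv/fcp.conf:
fcp_offline_delay=120;
Then reboot the host:
# reboot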
Host Recovery example:
If a disaster failover or an unplanned switchover takes abnormally long (more than 120 seconds), host applications may fail. Review the example below before remediating the host applications:
Zpool Recovery:
Ensure all the LUNs are online.
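One way to verify this, assuming MPxIO multipathing (consistent with the c0t...d0 device names used below), is to list each logical unit and its path counts; the path counts shown here are illustrative:
# mpathadm list lu
        /dev/rdsk/c0t600A098051764656362B45346144764Bd0s2
                Total Path Count: 4
                Operational Path Count: 4
All LUNs should report a nonzero operational path count before proceeding.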
Run the following commands:
# zpool list
NAME SIZE ALLOC FREE CAP HEALTH ALTROOT
n_zpool_site_a 99.4G 1.31G 98.1G 1% OFFLINE -
n_zpool_site_b 124G 2.28G 122G 1% OFFLINE -
Check the individual pool status:
# zpool status n_zpool_site_b
pool: n_zpool_site_b
state: SUSPENDED  ==============> POOL SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://www.sun.com/msg/ZFS-8000-HC
scan: none requested
config:
NAME STATE READ WRITE CKSUM
n_zpool_site_b UNAVAIL 1 1.64K 0 experienced I/O failures
c0t600A098051764656362B45346144764Bd0 UNAVAIL 1 0 0 experienced I/O failures
c0t600A098051764656362B453461447649d0 UNAVAIL 1 40 0 experienced I/O failures
c0t600A098051764656362B453461447648d0 UNAVAIL 0 38 0 experienced I/O failures
c0t600A098051764656362B453461447647d0 UNAVAIL 0 28 0 experienced I/O failures
c0t600A098051764656362B453461447646d0 UNAVAIL 0 34 0 experienced I/O failures
c0t600A09805176465657244536514A7647d0 UNAVAIL 0 1.03K 0 experienced I/O failures
c0t600A098051764656362B453461447645d0 UNAVAIL 0 32 0 experienced I/O failures
c0t600A098051764656362B45346144764Ad0 UNAVAIL 0 34 0 experienced I/O failures
c0t600A09805176465657244536514A764Ad0 UNAVAIL 0 1.03K 0 experienced I/O failures
c0t600A09805176465657244536514A764Bd0 UNAVAIL 0 1.04K 0 experienced I/O failures
c0t600A098051764656362B45346145464Cd0 UNAVAIL 1 2 0 experienced I/O failures
The pool above is suspended because of I/O failures.
Run the following commands to clear the pool status:
# zpool clear n_zpool_site_b
Check the pool again:
# zpool status n_zpool_site_b
pool: n_zpool_site_b
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scan: none requested
config:
NAME STATE READ WRITE CKSUM
n_zpool_site_b ONLINE 0 0 0
c0t600A098051764656362B45346144764Bd0 ONLINE 0 0 0
c0t600A098051764656362B453461447649d0 ONLINE 0 0 0
c0t600A098051764656362B453461447648d0 ONLINE 0 0 0
c0t600A098051764656362B453461447647d0 ONLINE 0 0 0
c0t600A098051764656362B453461447646d0 ONLINE 0 0 0
c0t600A09805176465657244536514A7647d0 ONLINE 0 0 0
c0t600A098051764656362B453461447645d0 ONLINE 0 0 0
c0t600A098051764656362B45346144764Ad0 ONLINE 0 0 0
c0t600A09805176465657244536514A764Ad0 ONLINE 0 0 0
c0t600A09805176465657244536514A764Bd0 ONLINE 0 0 0
c0t600A098051764656362B45346145464Cd0 ONLINE 0 0 0
errors: 1679 data errors, use '-v' for a list
Check the pool status again; here a disk in the pool is degraded.
[22] 05:44:07 (root@host1) /
# zpool status -v n_zpool_site_b
pool: n_zpool_site_b
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scan: scrub repaired 0 in 0h0m with 0 errors on Fri Dec 4 05:44:17 2015
config:
NAME STATE READ WRITE CKSUM
n_zpool_site_b DEGRADED 0 0 0
c0t600A098051764656362B45346144764Bd0 ONLINE 0 0 0
c0t600A098051764656362B453461447649d0 ONLINE 0 0 0
c0t600A098051764656362B453461447648d0 ONLINE 0 0 0
c0t600A098051764656362B453461447647d0 ONLINE 0 0 0
c0t600A098051764656362B453461447646d0 ONLINE 0 0 0
c0t600A09805176465657244536514A7647d0 DEGRADED 0 0 0 too many errors
c0t600A098051764656362B453461447645d0 ONLINE 0 0 0
c0t600A098051764656362B45346144764Ad0 ONLINE 0 0 0
c0t600A09805176465657244536514A764Ad0 ONLINE 0 0 0
c0t600A09805176465657244536514A764Bd0 ONLINE 0 0 0
c0t600A098051764656362B45346145464Cd0 ONLINE 0 0 0
errors: No known data errors
Clear the disk error by running the following command:
# zpool clear n_zpool_site_b c0t600A09805176465657244536514A7647d0
[24] 05:45:17 (root@host1) /
# zpool status -v n_zpool_site_b
pool: n_zpool_site_b
state: ONLINE
scan: scrub repaired 0 in 0h0m with 0 errors on Fri Dec 4 05:44:17 2015
config:
NAME STATE READ WRITE CKSUM
n_zpool_site_b ONLINE 0 0 0
c0t600A098051764656362B45346144764Bd0 ONLINE 0 0 0
c0t600A098051764656362B453461447649d0 ONLINE 0 0 0
c0t600A098051764656362B453461447648d0 ONLINE 0 0 0
c0t600A098051764656362B453461447647d0 ONLINE 0 0 0
c0t600A098051764656362B453461447646d0 ONLINE 0 0 0
c0t600A09805176465657244536514A7647d0 ONLINE 0 0 0
c0t600A098051764656362B453461447645d0 ONLINE 0 0 0
c0t600A098051764656362B45346144764Ad0 ONLINE 0 0 0
c0t600A09805176465657244536514A764Ad0 ONLINE 0 0 0
c0t600A09805176465657244536514A764Bd0 ONLINE 0 0 0
c0t600A098051764656362B45346145464Cd0 ONLINE 0 0 0
errors: No known data errors
Alternatively, export and import the zpool:
# zpool export n_zpool_site_b
# zpool import n_zpool_site_b
The pool is now online.
If the above steps do not recover the pool, reboot the host.
Solaris Volume Manager (SVM) metaset recovery:
Ensure all the LUNs are online, reboot the system, and then take ownership of the metaset and mount its file systems.
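A minimal sketch of the metaset recovery, using a hypothetical diskset name (n_metaset_site_b), metadevice (d100), and mount point (/mnt/site_b):
# metaset -s n_metaset_site_b -t
# metastat -s n_metaset_site_b
# mount /dev/md/n_metaset_site_b/dsk/d100 /mnt/site_b
The -t option takes ownership of the diskset; metastat should show the metadevices as Okay before mounting.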
Additional Information
N/A