ONTAP Select giveback vetoed by raid and partner iSCSI sessions up/down
Applies to
- ONTAP Select with software RAID configuration
- ESXi 7.0U2 with multipathing configured to use HPP (High-Performance Plug-in) and the path selection scheme set to FIXED
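Whether a device is claimed by HPP and which path selection scheme it uses can be verified from the ESXi shell; for example (the device ID shown is the one that appears in the vmkernel.log excerpt further below):
 # esxcli storage hpp device list
 # esxcli storage core device list -d naa.51402ec010d30442
Devices claimed by HPP are expected to report FIXED as their path selection scheme in this configuration.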
Issue
- An automatic or manual takeover was issued; in this example, node cluster-01, which owns all *_01 aggregates, was taken over
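The HA state after the takeover can be confirmed from the surviving node; the rebooted node is expected to show a waiting-for-giveback state description:
 ::> storage failover show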
- The previously rebooted node boots into "waiting for giveback", but all plexes associated with the rebooted node remain offline and do not start resyncing:
::> aggr plex show
                    Is      Is         Resyncing
Aggregate Plex      Online  Resyncing  Percent   Status
--------- --------- ------- ---------- --------- ---------------
aggr1_01  plex0     false   false      -         failed,inactive
aggr1_01  plex1     true    false      -         normal,active
aggr2_02  plex0     true    false      -         normal,active
aggr2_02  plex1     false   false      -         failed,inactive
aggr0_01  plex0     false   false      -         failed,inactive
aggr0_01  plex4     true    false      -         normal,active
aggr0_02  plex0     true    false      -         normal,active
aggr0_02  plex4     false   false      -         failed,inactive
8 entries were displayed.
::> storage aggregate plex online -aggregate aggr0_01 -plex plex0
Error: command failed: Failed to bring plex aggr0_01/plex0 online. Reason: Plex is failed and cannot be operated on.
Note: In this example, the plexes numbered greater than 0 for node cluster-02's aggregates (plex1, plex4) are owned by the down node cluster-01.
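As a further check (commands illustrative, using names from this example), the failed plex and the disks backing it can be inspected with:
 ::> storage aggregate plex show -aggregate aggr0_01 -plex plex0 -instance
 ::> storage disk show -aggregate aggr0_01
While the partner iSCSI sessions are down (see below), the disks backing the failed plexes are expected to be reported as failed or unreachable, so the plexes cannot be brought online or resynced.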
- Giveback for the rebooted node is vetoed by the raid module:
::> storage failover giveback -ofnode cluster-01
::> storage failover show-giveback
               Partner
Node           Aggregate         Giveback Status
-------------- ----------------- ---------------------------------------------
Warning: Unable to list entries on node cluster-01. RPC: Couldn't make connection [from mgwd on node "cluster-02" (VSID: -1) to mgwd at 169.254.133.31]
cluster-02
               CFO Aggregates    Failed module: raid. Giveback vetoed: Cannot
                                 send all specified aggregates home. Use the
                                 "event log show -message-name
                                 gb.sfo.abort.raid.fm|gb.cfo.abort.raid.fm"
                                 command to get more information, and follow
                                 the provided corrective actions. To execute
                                 the giveback without checks, use the
                                 "override-vetoes" parameter. Warning:
                                 overriding vetoes may result in a data
                                 service outage.
               aggr1             Not attempted yet
2 entries were displayed.
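The details referenced in the veto can be retrieved with the EMS command quoted in the message, for example:
 ::> event log show -message-name gb.cfo.abort.raid.fm
 ::> event log show -message-name gb.sfo.abort.raid.fm
As the veto message itself warns, forcing the giveback with the "override-vetoes" parameter may result in a data service outage.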
- iSCSI sessions to the rebooted node are down and do not come up:
::*> storage iscsi-initiator show
                                                                       Status
Node Type Label    Target Portal      Target Name                      Admin/Op
---- ---- -------- ------------------ -------------------------------- --------
Warning: Unable to list entries on node cluster-01. RPC: Couldn't make connection [from mgwd on node "cluster-02" (VSID: -1) to mgwd at 169.254.133.31]
cluster-02
     mailbox
          2fef72d8-3b87-11ea-9c42-005056a16698-mailbox
                   10.0.0.1           iqn.2012-05.local:mailbox.target.select000000
                                                                       up/up
     partner
          2fee0ea2-3b87-11ea-9c42-005056a16698-partner
                   169.254.123.123:65200
                                      iqn.2012-06.com.bsdctl:target0   up/down
          Failure Reason: no ping reply after 58218 seconds
     partner2
          2fee0ea2-3b87-11ea-9c42-005056a16698-partner2
                   169.254.123.123:65200
                                      iqn.2012-06.com.bsdctl:target0   up/down
          Failure Reason: no ping reply after 58218 seconds
3 entries were displayed.
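In ONTAP Select with software RAID, the partner iSCSI targets are how the surviving node reaches the rebooted node's disks (and the mailbox target carries the HA mailbox), so while these sessions stay down the failed plexes cannot resynchronize. As an additional, illustrative data point, the HA mailbox disks can be checked at advanced privilege:
 ::*> storage failover mailbox-disk show -node cluster-02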
- The ESXi vmkernel.log shows the physical device in APD (All Paths Down) state and HPP in use:
2023-09-19T11:27:28.124Z cpu0:2097216)ScsiDeviceIO: 4176: Cmd(0x45b8c1940ac8) 0x9e, CmdSN 0x800101b8 from world 1234567 to dev "naa.51402ec010d30442" failed H:0x1 D:0x0 P:0x0
2023-09-19T11:27:28.926Z cpu22:2097425)ScsiVmas: 1074: Inquiry for VPD page 00 to device naa.51402ec010d30442 failed with error No connection
2023-09-19T11:27:28.930Z cpu17:2097424)HPP: HppIsDeviceAPD:5142: APD detected for HPP device "naa.51402ec010d30442".
2023-09-19T11:27:28.930Z cpu17:2097424)StorageDevice: 7060: End path evaluation for device naa.51402ec010d30442
2023-09-19T11:27:52.661Z cpu5:2097205)ScsiDeviceIO: 4176: Cmd(0x45b8d5516e08) 0x1a, CmdSN 0x80010005 from world 2152968 to dev "naa.51402ec010d30442" failed H:0x1 D:0x0 P:0x0
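The APD condition on the backing device can be cross-checked from the ESXi shell, for example (device ID taken from the log above):
 # esxcli storage core device list -d naa.51402ec010d30442
 # esxcli storage core path list -d naa.51402ec010d30442
Paths to a device in all-paths-down state are reported as dead until connectivity to the physical disk is restored.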