NetApp Knowledge Base

How to troubleshoot frontend SAN issues using the fcHosts 3 command (part of supportdata collection)

Applies to

Fibre channel

Description

Introduction

This article describes how the fcHosts 3 shell command can be used to find the bad component in a frontend SAN if too many bad FC frames are received. In such a case, the controller logs one of the following events in the Major Event Log (MEL):

0x1207 Fibre channel link errors - threshold exceeded
or
0x1206 Fibre channel link errors continue

The thresholds are defined in NVSRAM at offset 0x38. As a rule of thumb, if either of the events above appears in the event log, the affected server(s) will usually also see an impact, so the issue should be investigated.

Overview

  • The fcHosts 3 shell command is old but still useful. It is part of the supportdata collection (captured in statecapturedata.txt) if an FC host connection is found.
  • The output displays the history of the communication between the FC HBA and the controller port that the HBA is logged in to.
  • The maximum number of events that are listed is 50.
  • The downside of the command is that the output lists only the time and not the date; therefore, the events could have occurred days ago. However, when there is an issue, many events are usually logged within a short period of time and, most of the time, they are from the same day the support data was captured.
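
Because the log carries only a time of day, it can help to check how tightly a run of events clusters. The following is a minimal sketch and not part of any NetApp tooling: the helper name `burst_span_seconds` is hypothetical, and it assumes that the entries are in chronological order and that a backward jump in time means the clock passed midnight exactly once, which the output itself cannot confirm.

```python
from datetime import datetime, timedelta


def burst_span_seconds(times):
    """Return the seconds between the first and last HH:MM:SS time stamp.

    Assumption (not stated by the log): entries are chronological, and a
    time earlier than its predecessor means one midnight rollover.
    """
    if not times:
        return 0
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    total = timedelta()
    for prev, cur in zip(parsed, parsed[1:]):
        delta = cur - prev
        if delta < timedelta(0):
            delta += timedelta(days=1)  # assumed single day wrap
        total += delta
    return int(total.total_seconds())


# Time stamps in the format used by the fcHosts 3 log.
print(burst_span_seconds(["15:21:16", "15:21:21", "15:23:12"]))
```

A short span with many error entries fits the pattern described above; a long or wrapped span is a hint that the entries may be older than the capture date.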

Example:

The following is an example of the information provided:

Executing fcHosts(3,0,0,0,0,0,0,0,0,0) on controller A:

<-snip->

=============== HOST 10 =====================

Hst-Role(Ch) PortId       PortWwn            NodeWwn      DstNPort  CmdRecv Label
10-Host( 2)  0e0000  10008c7c-ff2057ba  20008c7c-ff2057ba 0eaabf80   195719 SRV-MDC2-HBA-P0

PERMITS: 0x00000008 HsdPort
FLAGS:   0x00001406 Plogi Prli LoginRcvd Analyze
LastActivity: 11/19/13-17:15:47 (GMT)

HOST LOG==> logCtl:0eab1540  logIndex:   5  goodIoCount:88836
            dstNPort:0eaabf80  maxIndex:  50   logIoCount:1

                                        RepeatCounts -- IO Types
                                   (R=read,W=write,O=other,N=nonScsi)
Num Time       LogCode  Qualifier   LogCode     GoodIo      Outstand
                                    Cnt Type    Cnt Type    Cnt Type
  1 11:58:40     First   00000000     1 ----  >100K RWO-      1 ----
  2 13:45:47  RscnRecv   000e0000     1 ----      1 ----      1 ----
  3 13:45:47    Logout    RscnMis     1 ----      1 ----      1 ----
  4 13:49:51     Login   ff2057ba     1 ----   <100 R---      1 ----
  5 13:49:51   ChkCond   06290400   <10 R---   <100 R---      1 R---
  6 13:49:51  RscnRecv   000e0000     1 ----  <100K RWO-      1 ----

Explanation:

  • 10-Host   : 10 is the same as the ITN number in the tditnall output of the same controller
  • Ch(X)     : 2 is the channel (host port) the HBA is logged in to. Use fcAll/chall to find the actual host port
  • PortId    : 0e0000 is the 24 bit address of the switch port the HBA is connected to
  • PortWwn   : 10008c7c-ff2057ba is the FC Port WWN of the HBA
  • NodeWwn   : 20008c7c-ff2057ba is the FC Node WWN of the HBA
  • CmdRecv   : 195719 is the number of SCSI commands received from this HBA
  • Label     : SRV-MDC2-HBA-P0 is the alias of the HBA defined in SANtricity
  • Time      : The Time when the event happened (there is no Date).
  • LogCode   : The event that happened
  • Qualifier : A qualifier for the event. For a check condition (ChkCond), it is the sense data
  • LogCodeCnt: Number of consecutive occurrences of this logCode event
  • GoodIo Cnt: Number of IOs returned with good status after the first occurrence
  • Outst. Cnt: Number of outstanding IOs when the first occurrence was logged
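
The field layout above can be sketched as a small parser for the host summary line. This is an illustration only: the names `HostEntry` and `parse_host_line` are hypothetical, and the regular expression is derived from the single sample line in this article, so other firmware versions may format the line differently.

```python
import re
from typing import NamedTuple, Optional


class HostEntry(NamedTuple):
    itn: int          # same number as the ITN in tditnall
    role: str         # for example "Host"
    channel: int      # controller channel the HBA is logged in to
    port_id: str      # 24-bit FC address, hex
    port_wwn: str
    node_wwn: str
    dst_nport: str
    cmd_recv: int     # SCSI commands received from this HBA
    label: str        # alias defined in SANtricity


# Pattern assumed from the one sample line shown in this article.
_HOST_LINE = re.compile(
    r"^\s*(\d+)-(\w+)\(\s*(\d+)\)\s+"
    r"([0-9a-fA-F]{6})\s+"
    r"([0-9a-fA-F]{8}-[0-9a-fA-F]{8})\s+"
    r"([0-9a-fA-F]{8}-[0-9a-fA-F]{8})\s+"
    r"([0-9a-fA-F]+)\s+"
    r"(\d+)\s*(.*)$"
)


def parse_host_line(line: str) -> Optional[HostEntry]:
    """Return a HostEntry, or None if the line does not match the layout."""
    m = _HOST_LINE.match(line)
    if m is None:
        return None
    itn, role, ch, port_id, pwwn, nwwn, dst, cmds, label = m.groups()
    return HostEntry(int(itn), role, int(ch), port_id.lower(), pwwn, nwwn,
                     dst, int(cmds), label.strip())


sample = ("10-Host( 2)  0e0000  10008c7c-ff2057ba  "
          "20008c7c-ff2057ba 0eaabf80   195719 SRV-MDC2-HBA-P0")
print(parse_host_line(sample))
```

Returning `None` for non-matching lines lets the same function be run over every line of statecapturedata.txt without special handling for the column header.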

The following is seen from the example above:

The HBA with WWPN 10008c7c-ff2057ba is connected to FC switch port 0x0e0000 and is logged in to the controller through channel 2 (use fcAll/chall to find the actual host port). In the host mapping section of SANtricity, the user has given the HBA the alias SRV-MDC2-HBA-P0. The history shows that after the First entry (the beginning of the capture), the HBA sent many IOs without issues and then sent a Rescan, followed by an FC port Logout and Login. The controller acknowledged the Logout/Login by returning a check condition with sense key 06, ASC 29, ASCQ 04, which decodes to "Device Internal Reset". The HBA then sent another Rescan. Overall, nothing indicates a communication issue between the HBA and the controller. A few Logins/Logouts are usually not a problem.
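
The way a ChkCond qualifier splits into sense key, ASC, and ASCQ can be sketched as follows. This is a hedged illustration: the function name `decode_chkcond` is hypothetical, the lookup tables contain only the two decodings this article gives (the sense key names are the standard SCSI ones), and the fourth byte of the qualifier is ignored because the article does not describe it.

```python
# Standard SCSI sense key names for the two keys that appear in this article.
SENSE_KEYS = {0x06: "UNIT ATTENTION", 0x0B: "ABORTED COMMAND"}

# Only the ASC/ASCQ pairs decoded in this article; not a complete table.
ASC_ASCQ = {
    (0x29, 0x04): "Device Internal Reset",
    (0x47, 0x00): "SCSI Parity Error",
}


def decode_chkcond(qualifier):
    """Split an 8-hex-digit ChkCond qualifier into key/ASC/ASCQ text."""
    raw = bytes.fromhex(qualifier)  # raises ValueError on non-hex input
    if len(raw) != 4:
        raise ValueError("expected 8 hex digits, got %r" % qualifier)
    key, asc, ascq = raw[0], raw[1], raw[2]  # fourth byte intentionally unused
    key_name = SENSE_KEYS.get(key, "unknown sense key")
    text = ASC_ASCQ.get((asc, ascq), "not in this sketch's table")
    return f"{key:02x}/{asc:02x}/{ascq:02x} {key_name}: {text}"


print(decode_chkcond("06290400"))
```

Qualifiers of other LogCodes, such as RscnMis, are not sense data and make `bytes.fromhex` raise `ValueError`, so the function should be called only for ChkCond entries.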

List of LogCodes and Qualifiers

LogCodes:

  • AbtsRecv =  Session Abort received (an indication of a path issue)
  • ChkCond  =  Controller sent a SCSI check condition (sense data) to the HBA (see Qualifier for details)
  • First    =  Start of capture
  • GoodIo   =  HBA sent good IO
  • Login    =  HBA did a Port Login into the controller
  • Logout   =  HBA logged out of the controller
  • LinkDown =  Connection to the HBA is down
  • Qfull    =  Queue full condition met
  • ResvCon  =  Controller returned a reservation conflict to the HBA (could be normal in a cluster configuration!)
  • RscnRecv =  HBA sent a Rescan
  • ScsiStat =  Other SCSI status occurred

Qualifiers (most common only):

  • Count    =  Count (low-level FC error; an indication of a path issue)
  • Discnct  =  Disconnect
  • FreezeTO =  Freeze Timeout
  • Logo     =  Logout
  • Observed =  Event observed
  • ReplyTO  =  Reply Timeout (an indication of a path issue)
  • RscnMis  =  Rescan device missing
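
The entries that these lists mark as path-issue indicators (the AbtsRecv LogCode and the Count and ReplyTO qualifiers) can be counted with a short script. This is a sketch under stated assumptions: the names `LogRow`, `parse_log_row`, and `count_path_issues` are hypothetical, and the parser assumes whitespace-separated rows in the layout shown in this article.

```python
from typing import NamedTuple, Optional

# Taken from the lists above: items described as indicating a path issue.
PATH_ISSUE_LOGCODES = {"AbtsRecv"}
PATH_ISSUE_QUALIFIERS = {"Count", "ReplyTO"}


class LogRow(NamedTuple):
    num: int
    time: str
    log_code: str
    qualifier: str


def parse_log_row(line: str) -> Optional[LogRow]:
    """Parse one HOST LOG row; return None for headers and other lines."""
    fields = line.split()
    if len(fields) < 4 or not fields[0].isdigit():
        return None
    return LogRow(int(fields[0]), fields[1], fields[2], fields[3])


def is_path_issue(row: LogRow) -> bool:
    return (row.log_code in PATH_ISSUE_LOGCODES
            or row.qualifier in PATH_ISSUE_QUALIFIERS)


def count_path_issues(lines) -> int:
    rows = (parse_log_row(line) for line in lines)
    return sum(1 for r in rows if r is not None and is_path_issue(r))


rows = [
    "Num Time       LogCode  Qualifier   LogCode     GoodIo      Outstand",
    "  2 13:45:47  RscnRecv   000e0000     1 ----      1 ----      1 ----",
    "  3 13:45:47    Logout    RscnMis     1 ----      1 ----      1 ----",
    "  4 13:49:51     Login   ff2057ba     1 ----   <100 R---      1 ----",
]
print(count_path_issues(rows))  # prints 0: no path-issue indicators
```

A count of zero matches the healthy example earlier in this article; a host whose 50-entry log is dominated by flagged rows is a candidate for closer inspection of the path.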

The following are two examples of an HBA having issues talking to the controller:

                                        RepeatCounts -- IO Types
                                   (R=read,W=write,O=other,N=nonScsi)
Num Time       LogCode  Qualifier   LogCode     GoodIo      Outstand
                                    Cnt Type    Cnt Type    Cnt Type
 42 15:20:40    GoodIo   00000000     1 ----   <100 -W--      1 -W--
 43 15:21:16  SetError    ReplyTO     1 -W--   <100 -W--      1 -W-- 
 44 15:21:21   ChkCond   0b470000    <5 -W--   <100 -W--     <5 -W-- 
 45 15:21:26  SetError    ReplyTO     1 -W--    <10 RW--      1 -W-- 
 46 15:21:28   ChkCond   0b470000    <5 -W--   <100 -W--      1 -W-- 
 47 15:22:36  AbtsRecv   00000000     1 -W--    <1K RW--      1 -W-- 
 48 15:22:42   ChkCond   0b470000   <10 -W--    <1K RW--      1 -W-- 
 49 15:23:11    GoodIo   00000000     1 ----    <10 -W--      1 -W-- 
 50 15:23:12   ChkCond   0b470000   <10 -W--    <1K RW--      1 -W-- 
The sense data 0b/47/00 decodes to "SCSI Parity Error".
In an FC environment, this means that the controller received an FC frame with an incorrect CRC.

and:

 42 17:13:11  SetError    ReplyTO    <5 -W--    <1K -W--     <5 -W-- 
 43 17:13:14    GoodIo   00000000     1 ----    <1K -W--    <10 -W-- 
 44 17:13:14  SetError    ReplyTO    <5 -W--    <1K -W--     <5 -W-- 
 45 17:13:16    GoodIo   00000000     1 ----    <1K -W--     <5 -W-- 
 46 17:13:19  SetError    ReplyTO     1 -W--  <100K RW--      1 -W-- 
 47 17:13:55  SetError      Count    <5 -W--    <1K -W--      1 -W-- 
 48 17:13:56    GoodIo   00000000     1 ----    <1K -W--     <5 -W-- 
 49 17:14:05  SetError    ReplyTO    <5 -W--   <100 R---     <5 -W-- 
 50 17:14:12    GoodIo   00000000     1 ----  <100K RW--      1 R--- 

Note: If an HBA shows a lot of the above errors, it does NOT automatically mean that the HBA is faulty. It means that FC frames are being corrupted or dropped somewhere between this HBA and the controller. A bad HBA is just one possible cause.

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.