- Brocade switch
What is the information given from a portErrShow output on a Brocade switch?
The porterrshow counters are cumulative since the last time errors were cleared. For proper analysis, clear the statistics for the ports (portstatsclear or statsclear) and let the switch run for some time to see which errors, if any, are currently accumulating.
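The same comparison can be made between any two samples taken some time apart. The following is a minimal Python sketch, assuming the counters have already been collected into dictionaries keyed by port number and counter name. The data layout, the sample values, and the function name `accumulating_errors` are hypothetical and are not part of Fabric OS.

```python
def accumulating_errors(before, after):
    """Return {port: {counter: increase}} for counters that grew between two samples."""
    growth = {}
    for port, counters in after.items():
        earlier = before.get(port, {})
        deltas = {
            name: value - earlier.get(name, 0)
            for name, value in counters.items()
            if value - earlier.get(name, 0) > 0
        }
        if deltas:
            growth[port] = deltas
    return growth


# Hypothetical samples taken some time apart (values are illustrative only).
sample_1 = {0: {"enc_out": 1, "crc_err": 0}, 2: {"enc_out": 8, "crc_err": 0}}
sample_2 = {0: {"enc_out": 1, "crc_err": 0}, 2: {"enc_out": 15, "crc_err": 3}}

print(accumulating_errors(sample_1, sample_2))
```

Only ports whose counters increased are reported, so a port with old, static error counts does not show up.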
The output is similar to the following:
        frames        enc   crc   crc   too   too   bad   enc  disc  link  loss  loss  frjt  fbsy   c3timeout
         tx      rx    in   err g_eof  shrt  long   eof   out    c3  fail  sync   sig                tx    rx
 0:  219.5m  204.4m     0     0     0     0     0     0     1     0     2     0     2     0     0     0     0
 1:    2.5g  542.3m     0     0     0     0     0     0     0     0     0     0     2     0     0     0     0
 2:    4.0g  563.1m     0     0     0     0     0     0     8     0     0     0     2     0     0     0     0
 3:       0       0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
 4:    1.8g    3.9g     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
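For scripted analysis, each data row can be split into named counters. The following Python sketch assumes the 17-column layout shown in the sample above and assumes that the k, m, and g suffixes are decimal multipliers (thousand, million, billion). The column names and the helper functions are illustrative choices, not Fabric OS identifiers, and other firmware releases may print a different set of columns.

```python
# Column names in the order they appear in the sample output above.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof",
    "too_shrt", "too_long", "bad_eof", "enc_out", "disc_c3",
    "link_fail", "loss_sync", "loss_sig", "frjt", "fbsy",
    "c3timeout_tx", "c3timeout_rx",
]

# Assumption: k, m and g are decimal multipliers (thousand, million, billion).
SUFFIXES = {"k": 1e3, "m": 1e6, "g": 1e9}


def parse_value(text):
    """Convert a field such as '219.5m' or '8' to an approximate integer."""
    if text[-1].lower() in SUFFIXES:
        return int(round(float(text[:-1]) * SUFFIXES[text[-1].lower()]))
    return int(text)


def parse_row(line):
    """Parse one data row, e.g. '2: 4.0g 563.1m 0 ...', into (port, counters)."""
    port, *fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return int(port.rstrip(":")), dict(zip(COLUMNS, map(parse_value, fields)))


port, counters = parse_row("2: 4.0g 563.1m 0 0 0 0 0 0 8 0 0 0 2 0 0 0 0")
print(port, counters["frames_tx"], counters["enc_out"], counters["loss_sig"])
```

Abbreviated values such as 4.0g are rounded by the switch, so the parsed frame counts are approximations; the error counters in this sample are exact.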
Frames tx/rx: Counters representing the number of frames transmitted and received.
Enc_in: 8b/10b encoding errors inside the frame. Everything that goes onto the wire is encoded using 8b/10b encoding. Words inside frames are encoded; if this encoding is corrupted or an error is detected, an enc_in error is generated. A link that continuously receives frames and only just meets the link bit error rate specification would show approximately one error every 20 minutes. Reinitialization or reboot of the associated Nx_Port can also cause these errors. The bit error rate (BER) is the ratio of bits received in error to the total number of bits received: BER = Nerr/Nbits. It is measured by comparing the transmitted sequence of bits with the received bits and counting the errors. The measured ratio is affected by many factors, including signal-to-noise ratio, distortion, and jitter.
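The BER formula and the expected error interval can be worked through numerically. The following Python sketch assumes a BER limit of 1e-12 and the 1.0625 Gbaud line rate of 1 Gbit/s Fibre Channel; neither figure is stated in this article, and the function names are illustrative. Under these assumptions the interval comes out at roughly a quarter of an hour, the same order of magnitude as the approximately 20 minutes quoted above.

```python
def bit_error_rate(n_err, n_bits):
    """BER = Nerr / Nbits."""
    return n_err / n_bits


def seconds_between_errors(ber, line_rate_bits_per_s):
    """Average time between bit errors on a link that is continuously busy."""
    return 1.0 / (ber * line_rate_bits_per_s)


# Assumptions: a BER limit of 1e-12 and the 1.0625 Gbaud line rate of 1 Gbit/s FC.
SPEC_BER = 1e-12
LINE_RATE = 1.0625e9

interval = seconds_between_errors(SPEC_BER, LINE_RATE)
print(f"about one error every {interval / 60:.1f} minutes")

# Example measurement (illustrative numbers): 3 errors in 10**13 received bits.
print(bit_error_rate(3, 10**13))
```

At higher link speeds the same BER limit corresponds to a proportionally shorter interval between errors.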
Crc_err: CRC errors. A mathematical formula generates a check value at the sending port; the receiving port uses the same formula to recompute the value and compare. Statistically, crc_err and enc_out errors together imply a GBIC/SFP problem (also see Bad_eof below). Crc_err and enc_in errors point to an SFP and/or ASIC issue. Enc_out errors may be seen on loops connecting to a fabric (FMC, for example) if a disk is changed or the loop initializes for any other reason. This loop initialization may not be noticeable from ONTAP, so it is important to know what device the port connects to and what behavior to expect from that connection. Generally speaking, crc_err errors indicate an issue with the SFP.
Too_long: FC frames are 2148 bytes maximum (SOF + header + 2112 bytes of payload + CRC + EOF); this counter records frames that were longer than that. If an EOF is corrupted or data generation is incorrect, a too_long error is reported.
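The 2148-byte maximum can be checked by adding up the fields. The following Python sketch assumes 4 bytes each for SOF, CRC, and EOF and a 24-byte (6-word) header; the function name `is_too_long` is illustrative.

```python
# Field sizes in bytes: SOF, CRC and EOF are one 4-byte word each,
# the header is 6 words, and the payload is at most 2112 bytes.
SOF, HEADER, MAX_PAYLOAD, CRC, EOF = 4, 24, 2112, 4, 4

MAX_FRAME_BYTES = SOF + HEADER + MAX_PAYLOAD + CRC + EOF


def is_too_long(frame_bytes):
    """True if a frame, measured from SOF through EOF, exceeds the FC maximum."""
    return frame_bytes > MAX_FRAME_BYTES


print(MAX_FRAME_BYTES)      # 2148
print(is_too_long(2148))    # False
print(is_too_long(2152))    # True
```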
Too_short: This counter is incremented whenever a frame bounded by an SOF and an EOF is received and the number of words between the SOF and EOF is less than 7 (6 words of header plus 1 word of CRC), that is, whenever the frame is shorter than 36 bytes including the SOF and EOF. This can be caused by the transmitter or by an unreliable link.
Bad_eof: After a loss of synchronization error, continuous-mode alignment allows the receiver to re-establish word alignment at any point in the incoming bit stream while the receiver is operational. If such a re-alignment occurs, detection of the resulting error condition is dependent upon higher level functions (such as invalid CRC or missing EOF).
Enc_out: 8b/10b encoding errors that occurred in words (ordered sets) outside the FC frame. Words outside frames are also encoded; if this encoding is corrupted or an error is detected, an enc_out error is generated. It indicates a problem if it increments faster than the link bit error rate allows, which is approximately once every 20 minutes at 1 Gbit/s. Statistically, enc_out errors on their own imply a cable/connector problem, while enc_out and crc_err errors together imply a GBIC/SFP problem; crc_err and enc_in errors are more likely an SFP and/or ASIC issue. Enc_out errors are also expected every time a port is brought down and up (host reboot, storage subsystem power cycle, cable unplug/plug, portdisable/portenable, and so on), and on a link that connects a 1 Gbit/s port to a 2 Gbit/s port with autonegotiation turned off. On a port connected to a disk loop (FMC), rising enc_out counts are more likely and do not necessarily indicate an issue. To spot a possible issue, investigate other counters that are not part of portErrShow.
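The statistical rules of thumb for enc_in, crc_err, and enc_out can be written down as a small lookup. The following Python sketch is one possible reading of those rules, not a Brocade diagnostic: the function name `likely_cause` is hypothetical, and how to rank a port that shows enc_out and enc_in without crc_err is an assumption.

```python
def likely_cause(counters):
    """Map growth in enc_in, crc_err and enc_out to the cause suggested in the text.

    `counters` holds the increase seen since the statistics were last cleared.
    """
    enc_in = counters.get("enc_in", 0) > 0
    crc_err = counters.get("crc_err", 0) > 0
    enc_out = counters.get("enc_out", 0) > 0

    if enc_out and crc_err:
        return "GBIC/SFP"
    if enc_out and not enc_in:
        return "cable/connector"
    # Assumption: enc_in or crc_err without the patterns above is treated
    # as the SFP/ASIC case.
    if crc_err or enc_in:
        return "SFP and/or ASIC"
    return "no indication from these counters"


print(likely_cause({"enc_out": 8}))
print(likely_cause({"enc_out": 1, "crc_err": 2}))
print(likely_cause({"crc_err": 1}))
```

The result is only a starting point. Port bounces and loop initializations also raise enc_out, so the counts need to be read against what is attached to the port and what happened on it.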
Disc c3: Class 3 discards can be generated by the switch when devices send frames without performing a FLOGI first or with an invalid destination. The counter only reports that a discard occurred. A frame can be discarded for a number of reasons: timeout, destination unreachable, zone discard, and others. Most of the time the reason is a timeout, which means the frame was held in the buffer longer than E_D_TOV. Disc c3 is not trivial to troubleshoot, because the port discarding the frame is not always the one causing the issue.
Link-fail: If a port remains in the LR Receive State for longer than a timeout period (R_A_TOV), a link reset protocol timeout is detected, which results in a link failure condition (the port enters the NOS Transmit State). A link failure also indicates that a loss of signal or loss of sync lasting longer than the R_A_TOV value was detected while the port was not in the offline state.
Loss sync: Synchronization failures on either bit or transmission-word boundaries are not separately identifiable and cause loss-of-synchronization errors. Such errors are also expected every time a user brings a port down and up (reboot host, power-cycle storage subsystem, unplug/plug cable or portdisable/portenable, and so on).
Loss sig: Occurs when a signal is transmitted but none is being received on the same port. Such errors are also expected every time a user brings a port down and up (reboot host, power-cycle storage subsystem, unplug/plug cable or portdisable/portenable, and so on).
Frjt: If the fabric cannot process a class 2 frame, an F_RJT is returned.
Fbsy: If the fabric cannot deliver a class 2 frame within E_D_TOV, an F_BSY is returned.
c3-timeout tx: The number of transmit class 3 frames discarded at the transmission port due to timeout (platform- and port-specific). This indicates an issue with the device connected to the switch.
c3-timeout rx: The number of class 3 frames received at this port and discarded at the transmission port due to timeout (platform- and port-specific). This indicates an issue with the port on the switch.
Note: These errors should always be interpreted in relation to each other and in relation to the device that is connected. A loop with 28 disks behaves differently from an HBA in fabric mode. Additionally, CRC errors with no other errors likely have a different cause than CRC errors accompanied by enc_out errors.