NetApp Knowledge Base

Single SSD causing performance issue

Category:
ontap-9
Specialty:
perf

Applies to

  • AFF, ASA and C-Series systems
  • ONTAP versions without a fix for Bug ID 1479263

Issue

  • A single faulty SSD can degrade performance on an entire aggregate because of high read/write I/O latency on that one drive.
  • If the disk is partitioned, the disk can impact both HA controller partners and more than one aggregate.
  • statit output shows high utilization (ut%) and high latency (usecs) on a single SSD (for example, disk 0c.01.5):
node> statit -e  
                       Disk Statistics (per second)  
        ut% is the percent of time the disk was busy.  
        xfers is the number of data-transfer commands issued per second.  
        xfers = ureads + writes + cpreads + greads + gwrites  
        chain is the average number of 4K blocks per command.  
        usecs is the average disk round-trip time per 4K block.  
disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggr1/plex0/rg0:

0a.00.9            2 275.84    0.00   ....     .  95.38  36.71    32 180.46  18.28    41   0.00   ....     .   0.00   ....     .
0a.00.1            2 276.54    0.50   1.40   120  95.88  36.57    31 180.16  18.30    40   0.00   ....     .   0.00   ....     .
0a.00.3            1 2659.57  2030.59   3.70   131 266.35   7.80    89 362.63   2.86   210   0.00   ....     .   0.00   ....     .
3d.00.4            1 2667.07  2047.99   3.79   112 261.65   8.27    56 357.43   2.93   143   0.00   ....     .   0.00   ....     .
0a.00.5            1 2733.05  2096.08   3.72   108 271.35   8.25    89 365.63   2.95   153   0.00   ....     .   0.00   ....     .
3d.00.6            1 2506.70  1916.42   3.43   124 243.45   8.19    66 346.83   2.85   146   0.00   ....     .   0.00   ....     .
0a.00.7            1 2450.61  1897.82   3.47   109 224.46   8.40    84 328.33   2.84   150   0.00   ....     .   0.00   ....     .
3d.00.8            1 2462.91  1902.72   3.58   117 228.55   8.35    69 331.63   2.89   149   0.00   ....     .   0.00   ....     .
3d.00.10           1 2500.00  1913.12   3.45   117 238.25   7.96    78 348.63   2.76   152   0.00   ....     .   0.00   ....     .
3d.00.2            1 2428.81  1839.93   3.54   117 243.75   7.98    88 345.13   2.92   149   0.00   ....     .   0.00   ....     .
3d.00.0            1 2451.11  1877.52   3.44   120 237.35   8.17    97 336.23   2.89   153   0.00   ....     .   0.00   ....     .
0c.01.5           95 2352.92  1538.77   6.53  2579 385.19  12.08  2353 428.96   3.56  2176   0.00   ....     .   0.00   ....     .
  • The EMS logs show disk errors similar to the following:

Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222800:00000120: Sense Data SCSI:aborted command -  (0xb - 0x2f 0x14 0x0)(4509).
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222928:00000010: Sense Data SCSI:aborted command -  (0xb - 0x2f 0x14 0x0)(4512).
Tue May 17 08:06:00 +0000 [node1: scsi_ecmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222940:000000c0: Sense Data SCSI:aborted command -  (0xb - 0x2f 0x14 0x0)(4514).

  • The drive recovers a short time later:

Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0c.01.5: request successful after retry #0/#1: cdb 0x8a:000000019b222800:00000120 (5017).
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0c.01.5: request successful after retry #0/#1: cdb 0x8a:000000019b222940:000000c0 (5017).

  • If the disk is partitioned, more than one aggregate can be affected, as seen below:

Tue May 17 08:06:00 +0000 [node1: wafl_exempt00: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.
Tue May 17 08:06:45 +0000 [node1: wafl_exempt00: wafl.cp.toolong:error]: Aggregate aggr2 experienced a long CP.
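
The symptoms above can be checked from saved output with a short script. The following is a minimal sketch, not a NetApp tool: it assumes the statit and EMS text has been captured to strings, that disk rows follow the column layout shown above (disk, ut%, xfers, then ureads/chain/usecs), and that an outlier is a disk whose ut% or user-read latency is several times the raid group median. The function names, the 50% utilization floor and the 5x ratio are illustrative assumptions, not documented thresholds.

```python
import re
from collections import Counter
from statistics import median

# Assumed row layout: disk, ut%, xfers, ureads, chain, usecs (rest ignored).
ROW = re.compile(
    r"^(?P<disk>\w+\.\d+\.\d+)\s+(?P<ut>\d+)\s+(?P<xfers>[\d.]+)\s+"
    r"(?P<ureads>[\d.]+)\s+(?P<chain>[\d.]+)\s+(?P<usecs>[\d.]+)"
)
# Assumed EMS layout: "...scsi.cmd.checkCondition:<sev>]: Disk device <disk>: ..."
CHECK_COND = re.compile(r"scsi\.cmd\.checkCondition:\w+\]: Disk device (\S+):")


def _num(token):
    """Return a float, or None for idle placeholders such as '....' or '.'."""
    try:
        return float(token)
    except ValueError:
        return None


def parse_statit_rows(text):
    """Extract disk name, ut% and user-read latency from statit disk rows."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append({
                "disk": m.group("disk"),
                "ut": int(m.group("ut")),
                "read_usecs": _num(m.group("usecs")),
            })
    return rows


def flag_slow_disks(rows, min_ut=50, ratio=5.0):
    """Flag disks far above the median ut% or read latency (assumed thresholds)."""
    if not rows:
        return []
    base_ut = median(r["ut"] for r in rows)
    latencies = [r["read_usecs"] for r in rows if r["read_usecs"] is not None]
    base_lat = median(latencies) if latencies else None
    flagged = []
    for r in rows:
        busy = r["ut"] >= min_ut and r["ut"] > ratio * base_ut
        slow = (base_lat is not None and r["read_usecs"] is not None
                and r["read_usecs"] > ratio * base_lat)
        if busy or slow:
            flagged.append(r["disk"])
    return flagged


def count_check_conditions(ems_text):
    """Count scsi.cmd.checkCondition events per disk in saved EMS text."""
    return Counter(CHECK_COND.findall(ems_text))


# Rows and log lines copied from the output shown above.
STATIT_SAMPLE = """\
0a.00.9            2 275.84    0.00   ....     .  95.38  36.71    32 180.46  18.28    41
0a.00.1            2 276.54    0.50   1.40   120  95.88  36.57    31 180.16  18.30    40
0a.00.3            1 2659.57  2030.59   3.70   131 266.35   7.80    89 362.63   2.86   210
3d.00.4            1 2667.07  2047.99   3.79   112 261.65   8.27    56 357.43   2.93   143
0c.01.5           95 2352.92  1538.77   6.53  2579 385.19  12.08  2353 428.96   3.56  2176
"""
EMS_SAMPLE = """\
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222800:00000120: Sense Data SCSI:aborted command -  (0xb - 0x2f 0x14 0x0)(4509).
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222928:00000010: Sense Data SCSI:aborted command -  (0xb - 0x2f 0x14 0x0)(4512).
"""

if __name__ == "__main__":
    suspects = flag_slow_disks(parse_statit_rows(STATIT_SAMPLE))
    errors = count_check_conditions(EMS_SAMPLE)
    for disk in suspects:
        print(disk, "check conditions:", errors.get(disk, 0))
```

Comparing each disk against the median rather than a fixed latency value keeps the check independent of drive model and workload. A disk that is flagged by both helpers matches the pattern described in this article.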

 

