NetApp Knowledge Base

StorageGRID compute node reports overheating due to no fans installed

Applies to

  • NetApp StorageGRID
  • Appliance Model SG1000

Issue

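CPU overheating is detected on a StorageGRID appliance node. Overheating can damage components and lead to a reboot. The kernel log shows repeated CPU thermal throttling events, followed by high-temperature health errors from the mlx5_core network driver: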

[  608.522299] CPU79: Package temperature above threshold, cpu clock throttled (total events = 118108)
[  608.522327] CPU78: Package temperature above threshold, cpu clock throttled (total events = 118110)
[  608.525326] CPU79: Package temperature/speed normal
[  608.525331] CPU77: Package temperature/speed normal
[  608.525340] CPU78: Package temperature/speed normal
[  608.689771] CPU74: Package temperature above threshold, cpu clock throttled (total events = 118058)
[  608.692332] CPU58: Package temperature above threshold, cpu clock throttled (total events = 176869)
[  608.692341] CPU57: Package temperature above threshold, cpu clock throttled (total events = 176870)
[  608.692344] CPU56: Package temperature above threshold, cpu clock throttled (total events = 176869)
[  608.692359] CPU59: Package temperature above threshold, cpu clock throttled (total events = 176866)
[  608.692369] CPU55: Package temperature above threshold, cpu clock throttled (total events = 176867)
[  608.693295] CPU56: Package temperature/speed normal
[  608.693301] CPU59: Package temperature/speed normal
[  608.693305] CPU58: Package temperature/speed normal
[  608.693324] CPU57: Package temperature/speed normal
[  608.693328] CPU55: Package temperature/speed normal
[ 1070.198284] mlx5_core 0000:af:00.0: device's health compromised - reached miss count
[ 1070.206809] mlx5_core 0000:af:00.0: assert_var[0] 0x00000073
[ 1070.213230] mlx5_core 0000:af:00.0: assert_var[1] 0x00000073
[ 1070.219391] mlx5_core 0000:af:00.0: assert_var[2] 0x00000000
[ 1070.225546] mlx5_core 0000:af:00.0: assert_var[3] 0x00000000
[ 1070.231663] mlx5_core 0000:af:00.0: assert_var[4] 0x00000000
[ 1070.237782] mlx5_core 0000:af:00.0: assert_exit_ptr 0x00a4557c
[ 1070.244029] mlx5_core 0000:af:00.0: assert_callra 0x009a4d90
[ 1070.250132] mlx5_core 0000:af:00.0: fw_ver 16.25.1020
[ 1070.255621] mlx5_core 0000:af:00.0: hw_id 0x0000020d
[ 1070.261080] mlx5_core 0000:af:00.0: irisc_index 0
[ 1070.266316] mlx5_core 0000:af:00.0: synd 0x10: High temperature
[ 1070.272730] mlx5_core 0000:af:00.0: ext_synd 0x0000
[ 1070.278119] mlx5_core 0000:af:00.0: raw fw_ver 0x101903fc
[ 1070.710279] mlx5_core 0000:af:00.1: device's health compromised - reached miss count
[ 1070.718920] mlx5_core 0000:af:00.1: assert_var[0] 0x00000073
[ 1070.725245] mlx5_core 0000:af:00.1: assert_var[1] 0x00000073
[ 1070.731315] mlx5_core 0000:af:00.1: assert_var[2] 0x00000000
[ 1070.737395] mlx5_core 0000:af:00.1: assert_var[3] 0x00000000
[ 1070.743471] mlx5_core 0000:af:00.1: assert_var[4] 0x00000000
[ 1070.749583] mlx5_core 0000:af:00.1: assert_exit_ptr 0x00a4557c
[ 1070.755911] mlx5_core 0000:af:00.1: assert_callra 0x009a4d90
[ 1070.762109] mlx5_core 0000:af:00.1: fw_ver 16.25.1020
[ 1070.767723] mlx5_core 0000:af:00.1: hw_id 0x0000020d
[ 1070.773252] mlx5_core 0000:af:00.1: irisc_index 0
[ 1070.778607] mlx5_core 0000:af:00.1: synd 0x10: High temperature
[ 1070.785116] mlx5_core 0000:af:00.1: ext_synd 0x0000
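To check whether a saved kernel log shows the same pattern, the messages above can be summarized with a short script. The following is a minimal sketch, not a NetApp tool: the function name `summarize_thermal_events` and the `SAMPLE` variable are hypothetical, the regular expressions are derived only from the log lines shown above, and it assumes the log has already been saved as plain text (for example, from `dmesg` output).

```python
import re
from collections import Counter

# Patterns derived from the log excerpt above (assumption: the message
# wording is the same on the node being checked).
THROTTLE_RE = re.compile(r"CPU(\d+): Package temperature above threshold")
MLX5_HOT_RE = re.compile(r"mlx5_core (\S+): synd 0x10: High temperature")


def summarize_thermal_events(log_text):
    """Return (throttle message count per CPU, sorted mlx5 devices reporting high temperature)."""
    throttled = Counter()
    hot_devices = set()
    for line in log_text.splitlines():
        match = THROTTLE_RE.search(line)
        if match:
            # Count each "above threshold" message once per logged line.
            throttled[int(match.group(1))] += 1
            continue
        match = MLX5_HOT_RE.search(line)
        if match:
            # Record the PCI address of the adapter function that reported it.
            hot_devices.add(match.group(1))
    return throttled, sorted(hot_devices)


# A few lines copied from the excerpt above, used here as sample input.
SAMPLE = """\
[  608.522299] CPU79: Package temperature above threshold, cpu clock throttled (total events = 118108)
[  608.525326] CPU79: Package temperature/speed normal
[ 1070.266316] mlx5_core 0000:af:00.0: synd 0x10: High temperature
[ 1070.778607] mlx5_core 0000:af:00.1: synd 0x10: High temperature
"""

if __name__ == "__main__":
    cpu_counts, devices = summarize_thermal_events(SAMPLE)
    for cpu, count in sorted(cpu_counts.items()):
        print(f"CPU{cpu}: {count} throttle message(s)")
    for device in devices:
        print(f"mlx5_core {device}: high temperature reported")
```

The script only counts messages present in the text it is given. The `total events` value printed by the kernel is a running counter and is not parsed here.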

 

