NetApp Knowledgebase

How does the system determine the fan speed for FAS22xx systems?

Applies to

FAS22xx systems

Answer

Factors that affect fan speeds

FAS22xx series storage systems integrate Processor Control Modules (PCMs) into a disk shelf, together with the supporting hardware necessary to power the system. By the time outside air reaches the PCM, where the fan speed decisions are made, it has already passed through the disk drives, over the shelf, and through the mechanics of the canister holding the board. The workload of the board also heats the components, so temperature readings can differ considerably from the ambient temperature around the storage system. The following factors help explain why the system is operating the fans at the current speed:

  • Workload
    If the system is performing non-cached reads or writes, more activity occurs on the disks, resulting in more heat being produced. Since air passes over the drives first, it will be warmer when it reaches the Central Processing Unit (CPU) and sensors controlling the fan speeds.
  • Drive Population/Density
    More drives in the shelf will result in a greater amount of heat in the air before it reaches the Service Processor (SP). 
  • Ambient Temperature
    The temperature of the air being pulled into the system is important: the cooler the better, because cool air can absorb more heat as it passes through the device.
  • Surrounding Air Flow
    Storage systems radiate heat through the components themselves.

How does the SP manage fan speeds?

The SP uses In_Flow_Temp to determine the settings for the fans in the chassis. When the temperature crosses a certain limit, the SP sends a message to change the fan speed according to the thermal management plan. 

During startup, before the SP is running, the system runs the fans at the highest speed because it has no information about the state of the system or the environment. When the SP starts, it sets the fans to the lowest speed; after a few seconds it has gathered enough information from the environmental sensors to request the appropriate fan setting. For most systems, this results in the SP starting the fans at low speed and then increasing the speed to cool the system.

The following are the temperature thresholds for In_Flow_Temp (values are in Celsius):

Fan Speed   Increasing Temperature       Decreasing Temperature
Low         -                            35 (FAS2220), 35 (FAS2240)
Medium      43 (FAS2220), 43 (FAS2240)   45 (FAS2220), 43 (FAS2240)
High        48 (FAS2220), 49 (FAS2240)   -

The increasing and decreasing temperature concepts are important. When the system adjusts the fan speeds upward, it will use a different temperature threshold to adjust the fan speeds downward. This means that once the system has reached a certain state in the environment of increasing temperatures, it will not use the same temperature to decrease the fan speed.

To prevent the fans from reacting to incorrect temperature readings or to small variations in temperature, the SP bases its fan decisions on a moving average of In_Flow_Temp. The moving average is a form of exponential smoothing, which is well suited to time series data that measure a physical phenomenon such as temperature. In most cases, the moving average equals the In_Flow_Temp sensor reading, but it lags the actual readings while the sensor reports increasing or decreasing values. When In_Flow_Temp sits at a threshold, the averaging prevents the fan speed from oscillating unnecessarily between two settings.
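The smoothing described above can be sketched in a few lines. This is an illustration only, not the SP firmware: the function name `smooth` and the smoothing factor `alpha` are assumptions, because the article does not state how the SP weights its average.

```python
def smooth(readings, alpha=0.2):
    """Exponentially smooth a sequence of temperature readings.

    alpha is a hypothetical smoothing factor (0 < alpha <= 1); the
    weighting the SP actually uses is not documented in this article.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    averaged = []
    average = None
    for reading in readings:
        if average is None:
            # Seed the average with the first sample.
            average = float(reading)
        else:
            # Blend the new sample into the running average.
            average = alpha * reading + (1 - alpha) * average
        averaged.append(average)
    return averaged


# A steady sensor gives an average equal to the reading.
steady = smooth([40, 40, 40, 40])

# A one-sample spike to 44 is damped, so the average stays below 43.
spiky = smooth([40, 40, 44, 40])
```

With a steady input the average matches the sensor, and with a changing input it trails behind, which is the lag the article describes.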

The table above shows that once a system boosts the fan speed, it must cool to a different, lower threshold before the fan speed decreases. For instance, if a system crossed the 'low to medium' threshold at 43 degrees Celsius, the moving average must return to 35 degrees Celsius or less before the fans are turned down. If the system is cooled to 37 degrees Celsius, the temperature is below the threshold that caused the fan speed to increase but still above the threshold for decreasing it, so the fans remain at medium speed.
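The two-threshold behavior can be modeled as a small state machine. The sketch below uses the FAS2220 values from the table; the function names are hypothetical, and it assumes that the speed steps up when the average reaches the increasing threshold and steps down when it is at or below the decreasing threshold, one level per evaluation.

```python
# FAS2220 thresholds in Celsius, taken from the table in this article.
LOW_TO_MID, MID_TO_HIGH = 43, 48
HIGH_TO_MID, MID_TO_LOW = 45, 35


def next_fan_speed(current, avg_temp):
    """Return the fan speed after one evaluation of the averaged In_Flow_Temp.

    Assumption: step up at >= the increasing threshold and step down at
    <= the decreasing threshold; the exact comparisons used by the SP
    are not documented here.
    """
    if current == "low":
        return "medium" if avg_temp >= LOW_TO_MID else "low"
    if current == "medium":
        if avg_temp >= MID_TO_HIGH:
            return "high"
        return "low" if avg_temp <= MID_TO_LOW else "medium"
    if current == "high":
        return "medium" if avg_temp <= HIGH_TO_MID else "high"
    raise ValueError(f"unknown fan speed: {current!r}")


def run(temps, start="low"):
    """Feed a series of averaged temperatures through the state machine."""
    speed = start
    history = []
    for temp in temps:
        speed = next_fan_speed(speed, temp)
        history.append(speed)
    return history


# Warm to 43, cool to 37 (fans stay at medium), then cool to 35.
history = run([40, 43, 37, 35])
# -> ['low', 'medium', 'medium', 'low']
```

The gap between 43 and 35 is what keeps the fans from toggling when the temperature hovers near a single threshold.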

 

High temperature overrides

As an additional safety measure, the SP will boost the fan speed to high when either Out_Flow_Temp or CPU0_Temp_Margin reaches its lower non-critical value. The following are the values for the 2U and 4U systems (values are in Celsius).

Sensor             FAS2220   FAS2240
Out_Flow_Temp      55        55
CPU0_Temp_Margin   -5        -5

Out_Flow_Temp is a sensor that is positioned to sample the air as it exits the PCM canister. It is not the temperature of the air as it exits the storage system. CPU0_Temp_Margin is a sensor on the CPU running ONTAP that reports how close the temperature is to the critically hot value, so a system is expected to report a number well below zero.

The sensors above follow the 'bang-bang' control model: when either value reaches its threshold, the fans are immediately boosted to the highest speed setting, overriding the fan setting derived from In_Flow_Temp. Similarly, when the values fall below the thresholds, they no longer influence the control of the fans.
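The override can be expressed as a single check layered on top of the In_Flow_Temp decision. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes the override is active whenever a sensor is at or above its threshold.

```python
# Override thresholds from the table in this article (same for both models).
OUT_FLOW_LIMIT = 55     # Celsius
CPU_MARGIN_LIMIT = -5   # degrees relative to the critical CPU temperature


def effective_fan_speed(in_flow_speed, out_flow_temp, cpu_temp_margin):
    """Apply the high-temperature override to the In_Flow_Temp fan decision.

    Assumption: the override applies at or above each limit and has no
    memory, so it stops applying as soon as both sensors drop below.
    """
    if out_flow_temp >= OUT_FLOW_LIMIT or cpu_temp_margin >= CPU_MARGIN_LIMIT:
        return "high"
    return in_flow_speed


# Out_Flow_Temp over its limit forces high even if In_Flow_Temp asks for low.
forced = effective_fan_speed("low", out_flow_temp=56, cpu_temp_margin=-30)

# Both sensors below their limits: the In_Flow_Temp decision stands.
normal = effective_fan_speed("medium", out_flow_temp=40, cpu_temp_margin=-30)
```

Unlike the In_Flow_Temp logic, this check uses one threshold per sensor and no averaging, which is what makes it 'bang-bang' control.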

Relationship to ONTAP sensors

The In_Flow_Temp sensor used by the SP for making decisions regarding the fan speed is reported by ONTAP as 'In Flow Temp' in the sensor listings. Similarly, the CPU0_Temp_Margin sensor used by the SP is reported by ONTAP as 'CPU Temp Margin', and the SP's 'Out_Flow_Temp' is ONTAP's 'Out Flow Temp'. Apart from any reporting lag, the SP and ONTAP report the same values for these sensors.

Notes in the SP Logs

The SP creates a log of the fan settings that can be viewed through the SP logs. Check with NetApp customer support on how to access the log.

When the SP starts, the following entries will be displayed:

Note: The SP first has to determine the type of chassis in which it resides before making any decisions regarding setting the fan speeds.

Jan  1 00:00:51 (none) : [383 INFO]First time thermal code run
Jan  1 00:00:51 (none) : [383 INFO]Setting fan GPIOs 67 low, 70 low, 71 low, requested low
Jan  1 00:00:51 (none) : [383 INFO]Using 2U thermal policy
Jan  1 00:00:51 (none) : [383 INFO]thermal thresholds: low to mid (43), mid to high (48) high to mid (45) mid to low (35)
Jan  1 00:00:51 (none) : [383 INFO]Setting fan GPIOs 67 low, 70 low, 71 low, requested low

These messages mean that the SP has determined that it is in a 2U system and will use that thermal policy for as long as it is running. The thermal thresholds are displayed, and the SP sets the fans to the lowest speed. The SP communicates the fan speed to the shelf using three General Purpose Input/Output (GPIO) lines, which are shown in the log to aid debugging. It then samples In_Flow_Temp every second and decides how to configure the fans.
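When reviewing an SP log, it can help to extract the thresholds the SP reported at startup. The following sketch parses the 'thermal thresholds' line shown above; the function name and dictionary keys are hypothetical, and the pattern is based only on the sample line in this article, so other firmware versions may format the line differently.

```python
import re

# Pattern derived from the sample log line in this article.
THRESHOLD_RE = re.compile(
    r"low to mid \((\d+)\),?\s*mid to high \((\d+)\),?\s*"
    r"high to mid \((\d+)\),?\s*mid to low \((\d+)\)"
)


def parse_thresholds(line):
    """Return the four thresholds as integers, or None if the line differs."""
    match = THRESHOLD_RE.search(line)
    if match is None:
        return None
    keys = ("low_to_mid", "mid_to_high", "high_to_mid", "mid_to_low")
    return dict(zip(keys, (int(value) for value in match.groups())))


sample = (
    "Jan  1 00:00:51 (none) : [383 INFO]thermal thresholds: "
    "low to mid (43), mid to high (48) high to mid (45) mid to low (35)"
)
thresholds = parse_thresholds(sample)
```

For the sample line, the parsed values match the FAS2220 column of the threshold table.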

When the temperature changes enough to trigger a change in the fan speed, the SP will report the following:

Jan  1 00:00:51 (none) : [383 INFO]Setting fan GPIOs 67 XXX, 70 XXX, 71 XXX, requested FAN_SPEED

Where XXX is high or low, and FAN_SPEED is the requested fan speed - low, medium, or high. 
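To trace fan speed changes across a long log, the 'Setting fan GPIOs' entries can be extracted with a short script. This is a sketch based only on the log format shown in this article; the function name and the returned structure are hypothetical.

```python
import re

# Patterns derived from the sample log lines in this article.
LINE_RE = re.compile(r"Setting fan GPIOs (.+), requested (\w+)")
GPIO_RE = re.compile(r"(\d+) (high|low)")


def parse_fan_change(line):
    """Return GPIO levels and the requested speed, or None for other lines."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    # Map each GPIO number to the level the SP drove it to.
    gpios = {int(num): level for num, level in GPIO_RE.findall(match.group(1))}
    return {"gpios": gpios, "requested": match.group(2)}


sample = (
    "Jan  1 00:00:51 (none) : [383 INFO]"
    "Setting fan GPIOs 67 low, 70 low, 71 low, requested low"
)
change = parse_fan_change(sample)
```

Lines that do not describe a fan change return None, so the function can be applied to every line of a log and the results filtered.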

In the event that the fan speed is boosted because of Out_Flow_Temp or CPU0_Temp_Margin going over the threshold value, no record is made in the SP logs, because the system’s System Event Log (SEL) will contain entries for these events.
