What is database logging due to misaligned I/O?
Applies to
Ontap OS
Answer
In regard to performance issues with NetApp filers, one of the common causes of slow filer performance is partial writes. Data ONTAP and WAFL can normally complete user write requests quickly because it is guaranteed the write is successfully stored once it is written to NVRAM. This operation commonly takes less than 1 millisecond, providing fast write performance to NAS and SAN users.
This assumes the write request begins on a 4k WAFL block boundary and is divisible by 4k, and is not less than 4k in size. The result of this criteria not being met is partial writes, which involves a read, a merge of the data to be written with the data read, and then writing the result to a new block. This process can cause write to suspend where it otherwise might not, adding time to the processing of the operation and increasing latency.
A common source of partial writes is a misaligned LUN I/O. The indicator of a misaligned I/O is a LUN found in the write alignment histograms in the tats perfstat_lun
section of perfstat.
OEM Note
For more information on misaligned I/Os, see KB: What is an unaligned I/O?
This article is intended to discuss the case where a misaligned I/O is caused by MS SQL (or other database) log writes, and the method to determine whether they arecausing a performance impact on the filer.
This type of I/O is commonly seen as writes that arrive in multiple non-0 buckets, with a certain percentage of write_partial_blocks
, as well:
lun:/vol/sql_log/lundggtHJeWTKaT:write_align_histo.0:0%
lun:/vol/sql_log/lundggtHJeWTKaT:write_align_histo.
1:4%
lun:/vol/sql_log/lundggtHJeWTKaT:write_align_histo.2:0%
lun:/vol/sql_log/lundggtHJeWTKaT:write_align_histo.
3:6%
lun:/vol/sql_log/lundggtHJeWTKaT:write_align_histo.4:0%
lun:/vol/sql_log/lundggtHJeWTKaT:write_align_histo.
5:57%
lun:/vol/sql_log/lundggtHJeWTKaT:write_align_histo.
6:21%
lun:/vol/sql_log/lundggtHJeWTKaT:write_align_histo.7:0%
lun:/vol/sql_log/lundggtHJeWTKaT:read_partial_blocks:
4%
lun:/vol/sql_log/lundggtHJeWTKaT:
write_partial_blocks:6%
Commonly a histogram that appears as above will be misinterpreted as a misaligned LUN, but further investigation of the LUN type will reveal the LUN is the correct type for the host application. In fact, the pattern above is common in database log applications. In this case, while an MS SQL Server transaction logging is discussed, these principals might apply to other database applications and other workloads as well.
This article serves to answer the question that often arises in this situation:
'When the LUN is of the correct type and proper alignment on the host side has been ensured, why are there misaligned I/Os on this LUN?'
For this example, a Windows 2008 server is considered. The NetApp LUN type should be windows_2008. I/O to the NTFS filesystem created on this LUN should be automatically aligned because the starting offset will be corrected by default, and the NTFS uses system buffer cache, assuring writes operations are divisible by 4k.
So how can SQL Server logging on this NTFS partition cause a misaligned I/O if NTFS assures properly aligned writes?
The answer is, the Windows CreateFile function provides the FILE_FLAG_NO_BUFFERING
flag which disables this system cache when reading or writing a file. Currently available versions of SQL Server use this flag and do not always attend to alignment concerns so writes can arrive out of alignment. For more information on this flag, see Microsoft's article on File Buffering.
Though SQL Server transaction logging is seen to create a misaligned I/O, it does not necessarily create a significant impact on the filer performance. Take for example the illustration below which involves a write of a 512k SAN block consisting of a batch of SQL logs:
Note: A512k SAN write consumes 128 4k WAFL blocks. If the SAN block arrives misaligned, starting in the middle of a WAFL block as seen in the above operation which starts in bucket 5 of the first WAFL block, that operation is considered to be misaligned and this operation will show up in write_align_histo.5
(bucket 5). Also, the first and last WAFL block used to service this request will be partial writes. In this case, however, for those two partial blocks, there are 126 non-partial blocks, so the impact in this scenario is minimal.
When evaluating write alignment histograms, the average LUN operation size should be considered. Based on the above scenario, larger operation sizes should create less impact than smaller operation sizes.
In order to determine the degree of impact actually being experienced on the filer, observe the following statistics from wafl_susp –w:
pw.over_limit
is the number of occurrences when thewp.partial_write
count exceeds thepartial_write_limit
.partial
_write_limit
was once fixed at 50, but in newer versions of Data ONTAP it is platform dependent.WAFL_WRITE
is the actual number of newwafl_write
messages received by the filer during the iteration.
Though the relationship between pw.over_limit
and wafl_write
new messages is not absolute, as a rule of thumb, the impact can be determined by calculating the ratio of pw.over_limit
to new WAFL_WRITE
messages.
For example:
pw.over_limit = 90145
WAFL_WRITE (from â??New Messagesâ?? section) = 603444
90145 / 603444\xa0= .15 or 15%
In this case, around 15 of every 100 new WAFL writes must synchronously read blocks in order to merge the data to complete the write, as compared to simply writing the data to NVRAM. While WAFL will handle a small number of partial writes asynchronously without blocking the write acknowledgement, once WAFL has more than 50 partial writes outstanding, it begins handling these operations synchronously, which can significantly increase the time required to complete a write. These writes are reflected by the pw.over_limit
counter. The higher the above percentage, the more likely that it causes a performance impact. How high the percentage must be to cause performance impact is dependent on any number of factors, thus a pw.over_limit/wafl_write '
high watermark' cannot be declared arrived at.
Additional Information
additionalInformation_text