NetApp Knowledge Base

Why is metadata consumption abnormally high compared to the actual object ingest with an EC scheme and multipart uploads?

Category:
storagegrid
Specialty:
sgrid

Applies to

  • StorageGRID 11.6
  • ILM rule using Erasure Coding (EC)
  • Multipart Uploads

Answer

Metadata entries do not correlate with the size of the objects ingested into the grid, because they describe the object rather than hold its data. Each metadata entry takes up a similar amount of space in the Cassandra database, regardless of the object's size.
 
Metadata space utilization can be high if an ILM rule performs erasure coding (EC). Take EC 2+1 as an example:
  • Each object is broken into 2 data fragments of the same size, and 1 parity fragment is generated. For example, a 10 MB object becomes two 5 MB data fragments plus one parity fragment. The disk space consumed follows the formula disk space = object size + (object size * storage overhead); for EC 2+1 the storage overhead is 50%, so the 10 MB object uses 15 MB.
  • This creates 3 metadata entries in Cassandra, because each fragment is treated as a separate object in the grid.
  • As a result, each object upload consumes three times as many metadata entries.
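The EC 2+1 arithmetic above can be sketched in Python. This is a minimal illustration, assuming a generic data+parity scheme in which the object is split into equal-size data fragments, each parity fragment has the same size, and the storage overhead is therefore parity/data (50% for 2+1). The helper name `ec_footprint` is hypothetical and not part of any StorageGRID API; following this article, each fragment is counted as one metadata entry.

```python
from dataclasses import dataclass


@dataclass
class EcFootprint:
    fragment_size_mb: float  # size of each data or parity fragment
    disk_space_mb: float     # object size + (object size * storage overhead)
    metadata_entries: int    # one entry per fragment, as described in this article


def ec_footprint(object_size_mb: float, data: int, parity: int) -> EcFootprint:
    """Hypothetical helper for a data+parity erasure-coding scheme (e.g. 2+1)."""
    fragment_size = object_size_mb / data    # equal-size data fragments
    storage_overhead = parity / data         # assumption: overhead = parity/data (1/2 for EC 2+1)
    disk_space = object_size_mb + object_size_mb * storage_overhead
    return EcFootprint(
        fragment_size_mb=fragment_size,
        disk_space_mb=disk_space,
        metadata_entries=data + parity,      # 3 entries for EC 2+1
    )


if __name__ == "__main__":
    print(ec_footprint(10, data=2, parity=1))
```

For the 10 MB example this yields 5 MB fragments, 15 MB of disk space, and 3 metadata entries, matching the bullets above.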
Combining multipart uploads with erasure coding significantly increases metadata consumption. Take the same EC 2+1 example with 81 multipart uploads:
  • Each multipart upload is first stored as two replicated copies in the grid if the ILM rule uses the Dual commit or Balanced ingest behavior. With Balanced, the grid creates the EC fragments immediately if it can; otherwise it performs a dual commit first and later converts the copies to EC, which has the same effect as Dual commit.
  • The parts are joined inside the grid and segmented based on the segment size (1 GB by default).
  • Therefore, metadata consumption is the number of segments (after the parts are joined internally) * 3 metadata entries (for EC 2+1).
  • The number of metadata entries may vary if the grid initially creates dual copies.
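The segment-based count can be sketched the same way. This minimal Python example assumes the parts are joined into one object that is then split into segments no larger than the segment size (the 1 GB default is treated here as 1 GiB), and that each segment is erasure coded with EC 2+1, giving 3 metadata entries per segment. The function names and the 100 MiB part size are hypothetical, since the article does not state a part size, and the sketch ignores any extra entries from interim dual-commit copies.

```python
GIB = 1024 ** 3  # assumption: the 1 GB default segment size is treated as 1 GiB


def segment_count(object_size_bytes: int, segment_size_bytes: int = GIB) -> int:
    """Segments produced after the parts are joined and re-segmented (ceiling division)."""
    return -(-object_size_bytes // segment_size_bytes)


def ec_metadata_entries(object_size_bytes: int, data: int = 2, parity: int = 1,
                        segment_size_bytes: int = GIB) -> int:
    """Segments * (data + parity) fragments, one metadata entry per fragment."""
    return segment_count(object_size_bytes, segment_size_bytes) * (data + parity)


if __name__ == "__main__":
    # Hypothetical upload: 81 parts of 100 MiB each (part size is illustrative only).
    part_size = 100 * 1024 ** 2
    total = 81 * part_size
    print(segment_count(total), ec_metadata_entries(total))
```

With these assumed sizes the joined 8100 MiB object becomes 8 segments, so EC 2+1 produces 24 metadata entries; a small single-segment object still produces 3.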

 

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.