Why is metadata consumption abnormally high when compared to the actual object ingest with EC scheme and Multipart uploads?
Applies to
- StorageGRID 11.6
- ILM rule as Erasure Coding
- Multipart Uploads
Answer
Metadata entries do not correlate with the size of the objects ingested into the GRID, because they hold information about the object rather than the object data itself. Each metadata entry takes up a similar amount of space in the Cassandra database.
Metadata space utilization can be high if an ILM rule performs Erasure Coding (EC). Taking the example of EC 2+1:
- Each object is broken into 2 parts of the same size, and 1 parity part is generated. For example, a 10 MB object becomes a 5 MB part + a 5 MB part + 1 parity part, which consumes disk space according to the formula: object size + (object size * storage overhead).
- This creates 3 metadata entries in Cassandra, because each part is treated as a separate object in the GRID.
- As a result, each object upload generates 3 times as many metadata entries.
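The arithmetic in the EC 2+1 example above can be sketched as follows. This is a minimal illustration, not a StorageGRID tool: the function name `ec_footprint` and its parameters are hypothetical, and it assumes the storage overhead of a scheme equals parity parts divided by data parts (50% for EC 2+1).

```python
def ec_footprint(object_size_mb, data_parts, parity_parts):
    """Estimate disk usage and metadata entries for one erasure-coded object.

    Hypothetical helper for illustration only.
    Assumes storage overhead = parity_parts / data_parts.
    """
    storage_overhead = parity_parts / data_parts
    # Formula from the article: object size + (object size * storage overhead)
    disk_space_mb = object_size_mb + (object_size_mb * storage_overhead)
    # Each data or parity part is tracked as a separate entry
    metadata_entries = data_parts + parity_parts
    return disk_space_mb, metadata_entries


disk_mb, entries = ec_footprint(10, data_parts=2, parity_parts=1)
print(disk_mb, entries)  # 15.0 3
```

Under these assumptions, a 10 MB object consumes about 15 MB of disk space but 3 metadata entries, and the entry count stays at 3 regardless of the object size.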
Using multipart uploads on top of Erasure Coding increases metadata consumption significantly. Taking the same example of EC 2+1 with 81 multipart uploads:
- Each multipart upload is dual copied to the GRID if the ILM rule uses dual commit or balanced ingest. With balanced, the GRID creates the EC copies right away if it is able to do so in time. Otherwise, it performs dual commit first and converts the copies to EC later, which behaves the same as dual commit.
- The multiparts are joined inside the GRID and then segmented based on the segmentation size (1 GB by default).
- Therefore, the metadata consumption would be: number of segments (after joining multiparts internally) * 3 (EC 2+1) metadata entries.
- Metadata entries may vary if the GRID does dual copy initially.
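The segment-based estimate above can be sketched in a few lines. This is a rough illustration under stated assumptions: the function name, its parameters, and the 512 MB part size are hypothetical (the article does not give a part size), the segment size uses the 1 GB default mentioned above, and any additional entries from an initial dual copy are ignored.

```python
import math


def estimate_ec_metadata_entries(part_count, part_size_mb,
                                 segment_size_mb=1024, parts_per_segment=3):
    """Estimate metadata entries for a multipart upload stored with EC.

    Hypothetical helper for illustration only.
    parts_per_segment=3 corresponds to EC 2+1 (2 data parts + 1 parity part).
    """
    total_size_mb = part_count * part_size_mb
    # Multiparts are joined, then split into segments of segment_size_mb
    segments = math.ceil(total_size_mb / segment_size_mb)
    return segments * parts_per_segment


# 81 multiparts of an assumed 512 MB each -> 40.5 GB -> 41 segments
print(estimate_ec_metadata_entries(81, 512))  # 123
```

The estimate scales with the number of segments rather than with the number of objects, so large multipart uploads can produce many entries for a single logical object.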