We use ByteSizeValue to represent settings and API parameters that are byte values. This allows the user to specify the value in a human-readable form and Elasticsearch to interpret it in whatever units are most convenient for what it wants to do with the value (e.g. the user specifies the index buffer size as "56MB" but Elasticsearch converts this to 58720256 bytes and uses that value).
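As a rough sketch of that conversion (the class and method names here are illustrative, not Elasticsearch's actual API), parsing the human-readable string multiplies the number by its unit's factor to get a raw byte count:

```java
// Hypothetical minimal sketch of parsing a byte-size setting string to bytes.
// Not Elasticsearch's real implementation; for illustration only.
public class ByteSizeSketch {
    public static long parseToBytes(String value) {
        String v = value.trim().toLowerCase(java.util.Locale.ROOT);
        if (v.endsWith("mb")) {
            return Long.parseLong(v.substring(0, v.length() - 2).trim()) * 1024 * 1024;
        } else if (v.endsWith("kb")) {
            return Long.parseLong(v.substring(0, v.length() - 2).trim()) * 1024;
        } else if (v.endsWith("b")) {
            return Long.parseLong(v.substring(0, v.length() - 1).trim());
        }
        return Long.parseLong(v); // bare number: plain byte count
    }

    public static void main(String[] args) {
        // "56MB" is interpreted as 56 * 1024 * 1024 = 58720256 bytes
        System.out.println(parseToBytes("56MB")); // 58720256
    }
}
```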
However, the current implementation has a few problems:
- The `toString()` method outputs fractional values (to 1 decimal place), preventing round-trip parsing
- Parsing is sensitive to double rounding errors
- Parsing always converts the provided value to bytes, so it can overflow; large values (in the PB range) end up as Long.MAX_VALUE bytes
- The unit is not serialised, so on deserialisation the value is in bytes
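The first and third problems above can be illustrated with a short sketch. This is not Elasticsearch's code: the issue states the real parser clamps overflowed values to Long.MAX_VALUE, while this sketch just shows the raw long arithmetic wrapping.

```java
// Illustrative only: why a 1-decimal-place toString() is lossy, and why
// converting PB-range values to bytes overflows a signed 64-bit long.
public class ByteSizeProblems {
    public static void main(String[] args) {
        // Problem 1: rendering with one decimal place loses precision.
        long bytes = 105_000_000L;
        double mb = bytes / (1024.0 * 1024);
        String rendered = String.format(java.util.Locale.ROOT, "%.1fmb", mb);
        // rendered is "100.1mb"; parsing that back cannot recover 105_000_000.
        System.out.println(rendered);

        // Problem 2: a PB-range value converted to bytes overflows a long.
        long pb = 1024L * 1024 * 1024 * 1024 * 1024; // 1 PB in bytes (2^50)
        long overflowed = 10_000 * pb;               // wraps past Long.MAX_VALUE
        System.out.println(overflowed < 0);          // true: arithmetic overflowed
    }
}
```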
We should instead prevent fractional values from being used and store the value in the original units.
@jasontedor and I spoke about this and propose the following approach:
- In 6.x:
  - Deprecate fractional parsing
  - Rename ByteSizeValue -> LegacyByteSizeValue
  - Add `@Deprecated` to LegacyByteSizeValue
  - Introduce ByteSizeValue with the desired behavior:
    - Add a `getStringRep()` method which, like TimeValue, will output a String with the current value and unit
    - Throw an exception in the `parseByteSizeValue()` method if the input string contains a fractional value
    - Keep the original unit when parsing (e.g. 20MB is represented with units MB rather than converting it to bytes)
    - Add a
  - Use ByteSizeValue (the new one) in new APIs/settings
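A rough sketch of what the proposed new ByteSizeValue could look like, assuming the behavior described above. The method names `getStringRep()` and `parseByteSizeValue()` come from the issue; everything else (class name, enum, parsing details) is illustrative, not the actual implementation:

```java
import java.util.Locale;

// Sketch of the proposed unit-preserving, integer-only ByteSizeValue.
public class NewByteSizeValueSketch {
    enum Unit {
        KB("kb", 1024L), MB("mb", 1024L * 1024), GB("gb", 1024L * 1024 * 1024);
        final String suffix; final long factor;
        Unit(String suffix, long factor) { this.suffix = suffix; this.factor = factor; }
    }

    final long value;  // stored in the original unit, not converted to bytes
    final Unit unit;

    NewByteSizeValueSketch(long value, Unit unit) { this.value = value; this.unit = unit; }

    /** Round-trippable representation: the value plus its original unit. */
    String getStringRep() { return value + unit.suffix; }

    static NewByteSizeValueSketch parseByteSizeValue(String s) {
        String v = s.trim().toLowerCase(Locale.ROOT);
        for (Unit u : Unit.values()) {
            if (v.endsWith(u.suffix)) {
                String number = v.substring(0, v.length() - u.suffix.length()).trim();
                if (number.contains(".")) {
                    // Proposed behavior: reject fractional values outright.
                    throw new IllegalArgumentException("fractional byte values are not allowed: " + s);
                }
                // Proposed behavior: keep the original unit rather than converting to bytes.
                return new NewByteSizeValueSketch(Long.parseLong(number), u);
            }
        }
        // Plain "b" suffix and bare numbers omitted for brevity.
        throw new IllegalArgumentException("unrecognized byte size: " + s);
    }
}
```

With this shape, `parseByteSizeValue("20mb").getStringRep()` returns `"20mb"` unchanged, so parse and render round-trip exactly, and `"1.5gb"` fails fast instead of being silently rounded.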
- In 7.0:
  - Auto-upgrade cluster and index settings from LegacyByteSizeValue to ByteSizeValue (the new one) and remove LegacyByteSizeValue. Fractional byte values in elasticsearch.yml will need to be changed manually by the user when upgrading to 7.0, or can be changed before upgrading.
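One hypothetical shape for that auto-upgrade (purely illustrative; the issue does not specify the mechanism): re-express a legacy raw-bytes value in the largest unit that divides it evenly, so the upgraded setting uses the new unit-preserving form.

```java
// Hypothetical sketch of a 7.0 settings auto-upgrade step: convert a legacy
// bytes-only value into a value-plus-unit string representation.
public class UpgradeSketch {
    static String upgradeToStringRep(long bytes) {
        long[] factors = {1024L * 1024 * 1024, 1024L * 1024, 1024L};
        String[] suffixes = {"gb", "mb", "kb"};
        // Pick the largest unit that divides the byte count exactly, so no
        // precision is lost in the upgraded representation.
        for (int i = 0; i < factors.length; i++) {
            if (bytes != 0 && bytes % factors[i] == 0) {
                return (bytes / factors[i]) + suffixes[i];
            }
        }
        return bytes + "b";
    }

    public static void main(String[] args) {
        System.out.println(upgradeToStringRep(58720256L)); // 56mb
    }
}
```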