-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Preserve integer-exact representation of numbers and allow conversion of floating point values to integers #128540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@efd6 please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation. |
324e01d
to
6bcb33c
Compare
841b7ae
to
3bc17b7
Compare
modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java
Show resolved
Hide resolved
modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java
Outdated
Show resolved
Hide resolved
@@ -149,6 +157,24 @@ public static Type fromString(String processorTag, String propertyName, String t | |||
); | |||
} | |||
} | |||
|
|||
private static boolean isExactIntegerFloat(Object value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this (and the other one) would be better as isExactLongFloat
? There's the mathematical notion of integers, of course, and then the actual java int
type, but since this is using long
for the work it seems like maybe Long
would be better in the name. This is just a thought, though -- feel free to push back on it. (For example, another satisfactory solution might be a docstring.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, "integer" here is the mathematical concept. That long
is used as the integer type to hold the value is an implementation detail that I don't think anyone should care about beyond the type being wider than the mantissa of the floating point type that's being checked (float
could use int
, but I kept them the same for simplicity).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want this to be 'can we losslessly convert the f.p. type to the integral type' then I think we need four methods, one for each of the combinations, don't we? If we have a value which can be losslessly converted to a long, but is out of range for an int, we want the LONG.convert
to succeed and INTEGER.convert
to fail, don't we?
Pinging @elastic/es-data-management (Team:Data Management) |
bc5fde8
to
d6072d8
Compare
This comment was marked as resolved.
This comment was marked as resolved.
Hi @efd6, I've created a changelog YAML for you. |
@@ -34,7 +34,7 @@ enum Type { | |||
@Override | |||
public Object convert(Object value) { | |||
try { | |||
String strValue = value.toString(); | |||
String strValue = (String) STRING.convert(value); | |||
if (strValue.startsWith("0x") || strValue.startsWith("-0x")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth noting that this (the existing code, strValue.startsWith("0x") || strValue.startsWith("-0x")
) incorrectly classifies hex float as hex integer and will fail if value is a string representing a hex floating point value. I imagine that this will affect ~0 people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be fixed by doing something like also checking it doesn't contain either p
or P
, right? Although if you wanted to actually handle hex floats you'd also need to add support for that case.
9876f1e
to
076ca12
Compare
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
diff --git a/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java b/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java
index e67863be571..eb38fcb2b49 100644
--- a/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java
+++ b/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java
@@ -101,11 +101,17 @@ public final class ConvertProcessor extends AbstractProcessor {
STRING {
@Override
public Object convert(Object value) {
- if (isExactIntegerDouble(value)) {
- return String.valueOf(((Double) value).longValue());
+ if (value instanceof Float f) {
+ long l = f.longValue(); // narrowing conversion
+ if (f == l) { // widening conversion
+ return String.valueOf(l);
+ }
}
- if (isExactIntegerFloat(value)) {
- return String.valueOf(((Float) value).longValue());
+ if (value instanceof Double d) {
+ long l = d.longValue(); // narrowing conversion
+ if (d == l) { // widening conversion
+ return String.valueOf(l);
+ }
}
return value.toString();
}
@@ -154,24 +160,6 @@ public final class ConvertProcessor extends AbstractProcessor {
);
}
}
-
- private static boolean isExactIntegerFloat(Object value) {
- final float ABS_MAX_EXACT_FLOAT = (float) 0x1p24 - 1;
- if (value instanceof Float == false) {
- return false;
- }
- float v = ((Float) value).floatValue();
- return v == (long) v && -ABS_MAX_EXACT_FLOAT <= v && v <= ABS_MAX_EXACT_FLOAT;
- }
-
- private static boolean isExactIntegerDouble(Object value) {
- final double ABS_MAX_EXACT_DOUBLE = 0x1p53 - 1;
- if (value instanceof Double == false) {
- return false;
- }
- double v = ((Double) value).doubleValue();
- return v == (long) v && -ABS_MAX_EXACT_DOUBLE <= v && v <= ABS_MAX_EXACT_DOUBLE;
- }
}
public static final String TYPE = "convert";
This passes tests and seems a bit less magical to me. I don't think it would be significantly faster or significantly slower, I just think it's easier to understand.
What do you think?
That logic is not correct for Demonstrated in Go for my convenience: https://go.dev/play/p/GKBDf1wgy92
The reason for the conservatism in the inequality is that |
… to string This differentiates floating point values that are within the range of exact integer correspondence and renders them as integers rather than using the default java toString method which formats floating point values in E-notation when the absolute value is at least 1e6.
This was introduced from testConvertDouble where it is also unused.
a23454b
to
e4c0875
Compare
Oh, yep, I see it now! |
@@ -0,0 +1,6 @@ | |||
pr: 128540 | |||
summary: Preserve integer-exact representation of numbers and allow conversion of | |||
floating point values to integers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This summary is missing context, it needs to say "in the convert
ingest processor" or something like that.
@@ -34,7 +34,7 @@ enum Type { | |||
@Override | |||
public Object convert(Object value) { | |||
try { | |||
String strValue = value.toString(); | |||
String strValue = (String) STRING.convert(value); | |||
if (strValue.startsWith("0x") || strValue.startsWith("-0x")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be fixed by doing something like also checking it doesn't contain either p
or P
, right? Although if you wanted to actually handle hex floats you'd also need to add support for that case.
} | ||
if (isExactIntegerFloat(value)) { | ||
return String.valueOf(((Float) value).longValue()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to make this change.
My reasoning is that this is altering the behaviour of the convert
processor when the source field is a float
or double
and the target type is string
— it will produce 123
where before it would have produced 123.0
— and that feels like a breaking change, or dangerously close to one.
I believe that thing you're actually trying to do is to make the float
or double
to integer
or long
succeed where it would previously have failed, which is a much safer thing to do. The change in the float
or double
to string
conversion was only ever a side-effect of the way you'd implemented it — and, in my view, an undesirable one.
I think my preferred approach to achieve your goal (as I understand it) would be to make INTEGER.convert()
and LONG.convert()
do explicit special-casing where they make these two checks before falling back to calling toString()
like they did before. This also has the advantage that, in the case where we know we e.g. have a double
and want a long
, we can just do ((Double) value).longValue()
and skip converting it to a string and back again.
@@ -149,6 +154,24 @@ public static Type fromString(String processorTag, String propertyName, String t | |||
); | |||
} | |||
} | |||
|
|||
private static boolean isExactIntegerFloat(Object value) { | |||
final float ABS_MAX_EXACT_FLOAT = (float) 0x1p24 - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to do stuff like this, please make this a class-level constant rather than a local variable which has to be allocated on the stack and initialized every time the method is run.
Also, you can do 0x1p24f
rather than casting to float.
However, see below...
if (value instanceof Double == false) { | ||
return false; | ||
} | ||
double v = ((Double) value).doubleValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to call doubleValue()
here, it will automatically unbox the Double
to a double
.
} | ||
double v = ((Double) value).doubleValue(); | ||
return v == (long) v && -ABS_MAX_EXACT_DOUBLE <= v && v <= ABS_MAX_EXACT_DOUBLE; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that the most efficient way to check whether a double is a (mathematical) integer is:
- Return true if it
v == 0.0
. (N.B. This will succeed for -0.0 as well as==
is true there.) - Define
int exponent = Math.getExponent(v)
. - Return false if
exponent > Double.MAX_EXPONENT
, which eliminatesPOSITIVE_INFINITY
,NEGATIVE_INFINITY
, andNaN
. - Return false if
exponent < Double.MIN_EXPONENT
, which eliminates subnormal values. - Define
significand = (Double.doubleToLongBits(v) & SIGNIFICAND_MASK) | IMPLICIT_BIT
, whereSIGNIFICAND_MASK = 0x000fffffffffffffL
andIMPLICIT_BIT = SIGNIFICAND_MASK + 1
. (This is only true for normal values, but we've already checked for normality.) - Return
SIGNIFICAND_BITS - Long.numberOfTrailingZeros(significand) <= exponent
whereSIGNIFICAND_BITS = 52
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll still need to do a check that the value is within bounds for the target integral type on top of the above, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should test this thoroughly. You're just hard-coding values of 1e7 or 1e9 right now. I'd like to see a lot more than that. It may make sense to give this method package visibility and test it directly, rather than doing all of that through the public API (though others may disagree with that approach).
To take the double-to-int case, for example, I would maybe test with inputs of (double) randomInt()
, 0.0, -0.0, a few integers such as 1.0, a few non-integers such as 0.1, 1.1, some negative values, Double.POSITIVE_INFINITY
, NEGATIVE_INFINITY
, NaN
, MIN_VALUE
, MIN_NORMAL
, (double) Integer.MAX_VALUE
, (double) Integer.MAX_VALUE + 1
, (double) Integer.MIN_VALUE
, (double) Integer.MIN_VALUE - 1
... and so on. (I have deliberately chosen the easiest case there. The other cases are going to be harder because, for example, Long.MAX_VALUE
is not representable as an exact double, so (double) Long.MAX_VALUE
is problematic... You can do tricks based around the fact that 0x1p63
is exactly Long.MAX_VALUE + 1
, and you can use Math.nextUp
and nextDown
to get neighbouring values and stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear on that last point: I think that the conversion of 0x1p63
to a long should fail, because it is (just) out of range. I think that the conversion of Math.nextDown(0x1p63)
should succeed, because it is an integer (in this range, all doubles are integers, because double has only 53-bit precision) and it is in range. (Please check my reasoning there, though!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Return false if
exponent < Double.MIN_EXPONENT
, which eliminates subnormal values.
Actually, we can return false if exponent < 0
, right? That eliminates all values with absolute value less than one. (This stricter check is not required for correctness, but it will make it slightly faster in this case: there is no way that the final SIGNIFICAND_BITS - Long.numberOfTrailingZeros(significand) <= exponent
can be true if exponent < 0
so we're just short-circuiting the computation of the significand.)
@@ -149,6 +157,24 @@ public static Type fromString(String processorTag, String propertyName, String t | |||
); | |||
} | |||
} | |||
|
|||
private static boolean isExactIntegerFloat(Object value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want this to be 'can we losslessly convert the f.p. type to the integral type' then I think we need four methods, one for each of the combinations, don't we? If we have a value which can be losslessly converted to a long, but is out of range for an int, we want the LONG.convert
to succeed and INTEGER.convert
to fail, don't we?
This differentiates integral floating point values that are within the range of exact integer correspondence and renders them as integers rather than using the default java toString method which formats floating point values in E-notation when the absolute value is at least 1e7.
The new formatting is used as the input string for the INTEGER and LONG convert methods enabling conversion from floating point to integer numbers.
ref: elastic/beats#43659
gradle check
?If you are submitting this code for a class then read our policy for that.Questions
Currently only successful
convert
operations are tested for the new change. Do we want tests for long/int converts from floating point inputs that have fractional parts or are outside the exact integer representation range? These are expected (intended) to error out. For the string convert from the same floating point values, we just expect the string to either include a decimal or be in E-notation respectively.