Skip to content

Preserve integer-exact representation of numbers and allow conversion of floating point values to integers #128540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

efd6
Copy link

@efd6 efd6 commented May 28, 2025

This differentiates integral floating point values that are within the range of exact integer correspondence and renders them as integers rather than using the default java toString method which formats floating point values in E-notation when the absolute value is at least 1e7.

The new formatting is used as the input string for the INTEGER and LONG convert methods enabling conversion from floating point to integer numbers.

ref: elastic/beats#43659

  • Have you signed the contributor license agreement? ✅
  • Have you followed the contributor guidelines? ✅
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed. ✅
  • If submitting code, have you checked that your submission is for an OS and architecture that we support? ✅
  • If you are submitting this code for a class then read our policy for that.

Questions

Currently only successful convert operations are tested for the new change. Do we want tests for long/int converts from floating point inputs that have fractional parts or are outside the exact integer representation range? These are expected (intended) to error out. For the string convert from the same floating point values, we just expect the string to either include a decimal or be in E-notation respectively.

@efd6 efd6 self-assigned this May 28, 2025
@elasticsearchmachine
Copy link
Collaborator

@efd6 please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation.

@elasticsearchmachine elasticsearchmachine added v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels May 28, 2025
@efd6 efd6 force-pushed the integer_precise_convert branch 9 times, most recently from 324e01d to 6bcb33c Compare May 28, 2025 07:40
@efd6 efd6 changed the title Preserve integer-exact representation of numbers and allow conversion of floating point values to integers Preserve integer-exact representation of numbers in convert processor to string May 28, 2025
@efd6 efd6 force-pushed the integer_precise_convert branch 2 times, most recently from 841b7ae to 3bc17b7 Compare May 28, 2025 11:03
@efd6 efd6 marked this pull request as ready for review May 28, 2025 12:13
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label May 28, 2025
@efd6 efd6 requested a review from joegallo May 28, 2025 12:16
@@ -149,6 +157,24 @@ public static Type fromString(String processorTag, String propertyName, String t
);
}
}

private static boolean isExactIntegerFloat(Object value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this (and the other one) would be better as isExactLongFloat? There's the mathematical notion of integers, of course, and then the actual java int type, but since this is using long for the work it seems like maybe Long would be better in the name. This is just a thought, though -- feel free to push back on it. (For example, another satisfactory solution might be a docstring.)

Copy link
Author

@efd6 efd6 May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, "integer" here is the mathematical concept. That long is used as the integer type to hold the value is an implementation detail that I don't think anyone should care about beyond the type being wider than the mantissa of the floating point type that's being checked (float could use int, but I kept them the same for simplicity).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want this to be 'can we losslessly convert the f.p. type to the integral type' then I think we need four methods, one for each of the combinations, don't we? If we have a value which can be losslessly converted to a long, but is out of range for an int, we want the LONG.convert to succeed and INTEGER.convert to fail, don't we?

@joegallo joegallo added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v8.19.0 labels May 28, 2025
@joegallo joegallo self-assigned this May 28, 2025
@joegallo joegallo added the Team:Data Management Meta label for data/management team label May 28, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label May 28, 2025
@efd6 efd6 force-pushed the integer_precise_convert branch from bc5fde8 to d6072d8 Compare May 28, 2025 21:53
@efd6

This comment was marked as resolved.

@efd6 efd6 requested a review from joegallo May 29, 2025 00:30
@efd6 efd6 changed the title Preserve integer-exact representation of numbers in convert processor to string Preserve integer-exact representation of numbers and allow conversion of floating point values to integers May 29, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @efd6, I've created a changelog YAML for you.

@@ -34,7 +34,7 @@ enum Type {
@Override
public Object convert(Object value) {
try {
String strValue = value.toString();
String strValue = (String) STRING.convert(value);
if (strValue.startsWith("0x") || strValue.startsWith("-0x")) {
Copy link
Author

@efd6 efd6 May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting that this (the existing code, strValue.startsWith("0x") || strValue.startsWith("-0x")) incorrectly classifies hex float as hex integer and will fail if value is a string representing a hex floating point value. I imagine that this will affect ~0 people.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be fixed by doing something like also checking it doesn't contain either p or P, right? Although if you wanted to actually handle hex floats you'd also need to add support for that case.

@efd6

This comment was marked as resolved.

Copy link
Contributor

@joegallo joegallo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff --git a/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java b/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java
index e67863be571..eb38fcb2b49 100644
--- a/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java
+++ b/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java
@@ -101,11 +101,17 @@ public final class ConvertProcessor extends AbstractProcessor {
         STRING {
             @Override
             public Object convert(Object value) {
-                if (isExactIntegerDouble(value)) {
-                    return String.valueOf(((Double) value).longValue());
+                if (value instanceof Float f) {
+                    long l = f.longValue(); // narrowing conversion
+                    if (f == l) { // widening conversion
+                        return String.valueOf(l);
+                    }
                 }
-                if (isExactIntegerFloat(value)) {
-                    return String.valueOf(((Float) value).longValue());
+                if (value instanceof Double d) {
+                    long l = d.longValue(); // narrowing conversion
+                    if (d == l) { // widening conversion
+                        return String.valueOf(l);
+                    }
                 }
                 return value.toString();
             }
@@ -154,24 +160,6 @@ public final class ConvertProcessor extends AbstractProcessor {
                 );
             }
         }
-
-        private static boolean isExactIntegerFloat(Object value) {
-            final float ABS_MAX_EXACT_FLOAT = (float) 0x1p24 - 1;
-            if (value instanceof Float == false) {
-                return false;
-            }
-            float v = ((Float) value).floatValue();
-            return v == (long) v && -ABS_MAX_EXACT_FLOAT <= v && v <= ABS_MAX_EXACT_FLOAT;
-        }
-
-        private static boolean isExactIntegerDouble(Object value) {
-            final double ABS_MAX_EXACT_DOUBLE = 0x1p53 - 1;
-            if (value instanceof Double == false) {
-                return false;
-            }
-            double v = ((Double) value).doubleValue();
-            return v == (long) v && -ABS_MAX_EXACT_DOUBLE <= v && v <= ABS_MAX_EXACT_DOUBLE;
-        }
     }
 
     public static final String TYPE = "convert";

This passes tests and seems a bit less magical to me. I don't think it would be significantly faster or significantly slower, I just think it's easier to understand.

What do you think?

@kortschak
Copy link

kortschak commented May 30, 2025

That logic is not correct for value above the exact integer representation cut over. It incorrectly assesses that values are equal above the threshold; they may be equal, but the expectation for the majority of values is that they will not be. Incorrectly assessing them as equal is a data corruption behaviour in this setting.

Demonstrated in Go for my convenience: https://go.dev/play/p/GKBDf1wgy92

package main

import "fmt"

func main() {
	for _, v := range []int64{
		int64(0x1p52),
		int64(0x1p53),
		int64(0x1p53) + 1,
		int64(0x1p54) + 1,
	} {

		f := float64(v)
		l := int64(f)

		fmt.Println(v)
		fmt.Println("\tv == l", v == l)
		fmt.Println("\tfloat64(l) == f", float64(l) == f)
		fmt.Println("\tl == uint64(f)", l == int64(f))
		fmt.Println("\tisIntegerExact(f)", isIntegerExact(f))
	}
}

func isIntegerExact(f float64) bool {
	const ABS_MAX_EXACT_DOUBLE = 0x1p53 - 1
	return f == float64(int64(f)) && -ABS_MAX_EXACT_DOUBLE <= f && f <= ABS_MAX_EXACT_DOUBLE
}
4503599627370496
	v == l true
	float64(l) == f true
	l == uint64(f) true
	isIntegerExact(f) true
9007199254740992
	v == l true
	float64(l) == f true
	l == uint64(f) true
	isIntegerExact(f) false
9007199254740993
	v == l false
	float64(l) == f true
	l == uint64(f) true
	isIntegerExact(f) false
18014398509481985
	v == l false
	float64(l) == f true
	l == uint64(f) true
	isIntegerExact(f) false

The reason for the conservatism in the inequality is that float64(int64(0x1p53)) == float64(int64(0x1p53) + 1), so we cannot know that 0x1p53 is not actually the next integer value.

efd6 added 7 commits June 2, 2025 06:10
… to string

This differentiates floating point values that are within the range of
exact integer correspondence and renders them as integers rather than
using the default java toString method which formats floating point
values in E-notation when the absolute value is at least 1e6.
This was introduced from testConvertDouble where it is also unused.
@joegallo
Copy link
Contributor

joegallo commented Jun 4, 2025

That logic is not correct for value above the exact integer representation cut over.

Oh, yep, I see it now!

@@ -0,0 +1,6 @@
pr: 128540
summary: Preserve integer-exact representation of numbers and allow conversion of
floating point values to integers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This summary is missing context, it needs to say "in the convert ingest processor" or something like that.

@@ -34,7 +34,7 @@ enum Type {
@Override
public Object convert(Object value) {
try {
String strValue = value.toString();
String strValue = (String) STRING.convert(value);
if (strValue.startsWith("0x") || strValue.startsWith("-0x")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be fixed by doing something like also checking it doesn't contain either p or P, right? Although if you wanted to actually handle hex floats you'd also need to add support for that case.

}
if (isExactIntegerFloat(value)) {
return String.valueOf(((Float) value).longValue());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to make this change.

My reasoning is that this is altering the behaviour of the convert processor when the source field is a float or double and the target type is string — it will produce 123 where before it would have produced 123.0 — and that feels like a breaking change, or dangerously close to one.

I believe that thing you're actually trying to do is to make the float or double to integer or long succeed where it would previously have failed, which is a much safer thing to do. The change in the float or double to string conversion was only ever a side-effect of the way you'd implemented it — and, in my view, an undesirable one.

I think my preferred approach to achieve your goal (as I understand it) would be to make INTEGER.convert() and LONG.convert() do explicit special-casing where they make these two checks before falling back to calling toString() like they did before. This also has the advantage that, in the case where we know we e.g. have a double and want a long, we can just do ((Double) value).longValue() and skip converting it to a string and back again.

@@ -149,6 +154,24 @@ public static Type fromString(String processorTag, String propertyName, String t
);
}
}

private static boolean isExactIntegerFloat(Object value) {
final float ABS_MAX_EXACT_FLOAT = (float) 0x1p24 - 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to do stuff like this, please make this a class-level constant rather than a local variable which has to be allocated on the stack and initialized every time the method is run.

Also, you can do 0x1p24f rather than casting to float.

However, see below...

if (value instanceof Double == false) {
return false;
}
double v = ((Double) value).doubleValue();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to call doubleValue() here, it will automatically unbox the Double to a double.

}
double v = ((Double) value).doubleValue();
return v == (long) v && -ABS_MAX_EXACT_DOUBLE <= v && v <= ABS_MAX_EXACT_DOUBLE;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the most efficient way to check whether a double is a (mathematical) integer is:

  • Return true if it v == 0.0. (N.B. This will succeed for -0.0 as well as == is true there.)
  • Define int exponent = Math.getExponent(v).
  • Return false if exponent > Double.MAX_EXPONENT, which eliminates POSITIVE_INFINITY, NEGATIVE_INFINITY, and NaN.
  • Return false if exponent < Double.MIN_EXPONENT, which eliminates subnormal values.
  • Define significand = (Double.doubleToLongBits(v) & SIGNIFICAND_MASK) | IMPLICIT_BIT, where SIGNIFICAND_MASK = 0x000fffffffffffffL and IMPLICIT_BIT = SIGNIFICAND_MASK + 1. (This is only true for normal values, but we've already checked for normality.)
  • Return SIGNIFICAND_BITS - Long.numberOfTrailingZeros(significand) <= exponent where SIGNIFICAND_BITS = 52.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll still need to do a check that the value is within bounds for the target integral type on top of the above, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should test this thoroughly. You're just hard-coding values of 1e7 or 1e9 right now. I'd like to see a lot more than that. It may make sense to give this method package visibility and test it directly, rather than doing all of that through the public API (though others may disagree with that approach).

To take the double-to-int case, for example, I would maybe test with inputs of (double) randomInt(), 0.0, -0.0, a few integers such as 1.0, a few non-integers such as 0.1, 1.1, some negative values, Double.POSITIVE_INFINITY, NEGATIVE_INFINITY, NaN, MIN_VALUE, MIN_NORMAL, (double) Integer.MAX_VALUE, (double) Integer.MAX_VALUE + 1, (double) Integer.MIN_VALUE, (double) Integer.MIN_VALUE - 1... and so on. (I have deliberately chosen the easiest case there. The other cases are going to be harder because, for example, Long.MAX_VALUE is not representable as an exact double, so (double) Long.MAX_VALUE is problematic... You can do tricks based around the fact that 0x1p63 is exactly Long.MAX_VALUE + 1, and you can use Math.nextUp and nextDown to get neighbouring values and stuff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear on that last point: I think that the conversion of 0x1p63 to a long should fail, because it is (just) out of range. I think that the conversion of Math.nextDown(0x1p63) should succeed, because it is an integer (in this range, all doubles are integers, because double has only 53-bit precision) and it is in range. (Please check my reasoning there, though!)

Copy link
Member

@PeteGillinElastic PeteGillinElastic Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Return false if exponent < Double.MIN_EXPONENT, which eliminates subnormal values.

Actually, we can return false if exponent < 0, right? That eliminates all values with absolute value less than one. (This stricter check is not required for correctness, but it will make it slightly faster in this case: there is no way that the final SIGNIFICAND_BITS - Long.numberOfTrailingZeros(significand) <= exponent can be true if exponent < 0 so we're just short-circuiting the computation of the significand.)

@@ -149,6 +157,24 @@ public static Type fromString(String processorTag, String propertyName, String t
);
}
}

private static boolean isExactIntegerFloat(Object value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want this to be 'can we losslessly convert the f.p. type to the integral type' then I think we need four methods, one for each of the combinations, don't we? If we have a value which can be losslessly converted to a long, but is out of range for an int, we want the LONG.convert to succeed and INTEGER.convert to fail, don't we?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management Meta label for data/management team v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants