Preserve integer-exact representation of numbers and allow conversion of floating point values to integers #128540

efd6 · 2025-05-28T00:05:33Z

This differentiates integral floating point values that are within the range of exact integer correspondence and renders them as integers rather than using the default java toString method which formats floating point values in E-notation when the absolute value is at least 1e7.

The new formatting is used as the input string for the INTEGER and LONG convert methods enabling conversion from floating point to integer numbers.

ref: elastic/beats#43659

Have you signed the contributor license agreement? ✅
Have you followed the contributor guidelines? ✅
If submitting code, have you built your formula locally prior to submission with gradle check?
If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed. ✅
If submitting code, have you checked that your submission is for an OS and architecture that we support? ✅
~~If you are submitting this code for a class then read our policy for that.~~

Questions

Currently only successful convert operations are tested for the new change. Do we want tests for long/int converts from floating point inputs that have fractional parts or are outside the exact integer representation range? These are expected (intended) to error out. For the string convert from the same floating point values, we just expect the string to either include a decimal or be in E-notation respectively.

elasticsearchmachine · 2025-05-28T00:06:22Z

@efd6 please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation.

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java

joegallo · 2025-05-28T13:44:37Z

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java

@@ -149,6 +157,24 @@ public static Type fromString(String processorTag, String propertyName, String t
                );
            }
        }
+
+        private static boolean isExactIntegerFloat(Object value) {


Maybe this (and the other one) would be better as isExactLongFloat? There's the mathematical notion of integers, of course, and then the actual java int type, but since this is using long for the work it seems like maybe Long would be better in the name. This is just a thought, though -- feel free to push back on it. (For example, another satisfactory solution might be a docstring.)

Yes, "integer" here is the mathematical concept. That long is used as the integer type to hold the value is an implementation detail that I don't think anyone should care about beyond the type being wider than the mantissa of the floating point type that's being checked (float could use int, but I kept them the same for simplicity).

If we want this to be 'can we losslessly convert the f.p. type to the integral type' then I think we need four methods, one for each of the combinations, don't we? If we have a value which can be losslessly converted to a long, but is out of range for an int, we want the LONG.convert to succeed and INTEGER.convert to fail, don't we?

elasticsearchmachine · 2025-05-28T13:47:53Z

Pinging @elastic/es-data-management (Team:Data Management)

elasticsearchmachine · 2025-05-29T03:04:42Z

Hi @efd6, I've created a changelog YAML for you.

efd6 · 2025-05-29T03:08:17Z

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java

@@ -34,7 +34,7 @@ enum Type {
            @Override
            public Object convert(Object value) {
                try {
-                    String strValue = value.toString();
+                    String strValue = (String) STRING.convert(value);
                    if (strValue.startsWith("0x") || strValue.startsWith("-0x")) {


Worth noting that this (the existing code, strValue.startsWith("0x") || strValue.startsWith("-0x")) incorrectly classifies hex float as hex integer and will fail if value is a string representing a hex floating point value. I imagine that this will affect ~0 people.

This could be fixed by doing something like also checking it doesn't contain either p or P, right? Although if you wanted to actually handle hex floats you'd also need to add support for that case.

joegallo

diff --git a/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java b/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java
index e67863be571..eb38fcb2b49 100644
--- a/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java
+++ b/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java
@@ -101,11 +101,17 @@ public final class ConvertProcessor extends AbstractProcessor {
         STRING {
             @Override
             public Object convert(Object value) {
-                if (isExactIntegerDouble(value)) {
-                    return String.valueOf(((Double) value).longValue());
+                if (value instanceof Float f) {
+                    long l = f.longValue(); // narrowing conversion
+                    if (f == l) { // widening conversion
+                        return String.valueOf(l);
+                    }
                 }
-                if (isExactIntegerFloat(value)) {
-                    return String.valueOf(((Float) value).longValue());
+                if (value instanceof Double d) {
+                    long l = d.longValue(); // narrowing conversion
+                    if (d == l) { // widening conversion
+                        return String.valueOf(l);
+                    }
                 }
                 return value.toString();
             }
@@ -154,24 +160,6 @@ public final class ConvertProcessor extends AbstractProcessor {
                 );
             }
         }
-
-        private static boolean isExactIntegerFloat(Object value) {
-            final float ABS_MAX_EXACT_FLOAT = (float) 0x1p24 - 1;
-            if (value instanceof Float == false) {
-                return false;
-            }
-            float v = ((Float) value).floatValue();
-            return v == (long) v && -ABS_MAX_EXACT_FLOAT <= v && v <= ABS_MAX_EXACT_FLOAT;
-        }
-
-        private static boolean isExactIntegerDouble(Object value) {
-            final double ABS_MAX_EXACT_DOUBLE = 0x1p53 - 1;
-            if (value instanceof Double == false) {
-                return false;
-            }
-            double v = ((Double) value).doubleValue();
-            return v == (long) v && -ABS_MAX_EXACT_DOUBLE <= v && v <= ABS_MAX_EXACT_DOUBLE;
-        }
     }
 
     public static final String TYPE = "convert";

This passes tests and seems a bit less magical to me. I don't think it would be significantly faster or significantly slower, I just think it's easier to understand.

What do you think?

kortschak · 2025-05-30T21:47:05Z

That logic is not correct for value above the exact integer representation cut over. It incorrectly assesses that values are equal above the threshold; they may be equal, but the expectation for the majority of values is that they will not be. Incorrectly assessing them as equal is a data corruption behaviour in this setting.

Demonstrated in Go for my convenience: https://go.dev/play/p/GKBDf1wgy92

package main

import "fmt"

func main() {
	for _, v := range []int64{
		int64(0x1p52),
		int64(0x1p53),
		int64(0x1p53) + 1,
		int64(0x1p54) + 1,
	} {

		f := float64(v)
		l := int64(f)

		fmt.Println(v)
		fmt.Println("\tv == l", v == l)
		fmt.Println("\tfloat64(l) == f", float64(l) == f)
		fmt.Println("\tl == uint64(f)", l == int64(f))
		fmt.Println("\tisIntegerExact(f)", isIntegerExact(f))
	}
}

func isIntegerExact(f float64) bool {
	const ABS_MAX_EXACT_DOUBLE = 0x1p53 - 1
	return f == float64(int64(f)) && -ABS_MAX_EXACT_DOUBLE <= f && f <= ABS_MAX_EXACT_DOUBLE
}

4503599627370496
	v == l true
	float64(l) == f true
	l == uint64(f) true
	isIntegerExact(f) true
9007199254740992
	v == l true
	float64(l) == f true
	l == uint64(f) true
	isIntegerExact(f) false
9007199254740993
	v == l false
	float64(l) == f true
	l == uint64(f) true
	isIntegerExact(f) false
18014398509481985
	v == l false
	float64(l) == f true
	l == uint64(f) true
	isIntegerExact(f) false

The reason for the conservatism in the inequality is that float64(int64(0x1p53)) == float64(int64(0x1p53) + 1), so we cannot know that 0x1p53 is not actually the next integer value.

… to string This differentiates floating point values that are within the range of exact integer correspondence and renders them as integers rather than using the default java toString method which formats floating point values in E-notation when the absolute value is at least 1e6.

This was introduced from testConvertDouble where it is also unused.

…vert

joegallo · 2025-06-04T21:34:55Z

That logic is not correct for value above the exact integer representation cut over.

Oh, yep, I see it now!

PeteGillinElastic · 2025-06-05T16:04:03Z

docs/changelog/128540.yaml

@@ -0,0 +1,6 @@
+pr: 128540
+summary: Preserve integer-exact representation of numbers and allow conversion of
+  floating point values to integers


This summary is missing context, it needs to say "in the convert ingest processor" or something like that.

PeteGillinElastic · 2025-06-05T16:08:48Z

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java

@@ -34,7 +34,7 @@ enum Type {
            @Override
            public Object convert(Object value) {
                try {
-                    String strValue = value.toString();
+                    String strValue = (String) STRING.convert(value);
                    if (strValue.startsWith("0x") || strValue.startsWith("-0x")) {


This could be fixed by doing something like also checking it doesn't contain either p or P, right? Although if you wanted to actually handle hex floats you'd also need to add support for that case.

PeteGillinElastic · 2025-06-05T16:39:48Z

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java

+                }
+                if (isExactIntegerFloat(value)) {
+                    return String.valueOf(((Float) value).longValue());
+                }


I would prefer not to make this change.

My reasoning is that this is altering the behaviour of the convert processor when the source field is a float or double and the target type is string — it will produce 123 where before it would have produced 123.0 — and that feels like a breaking change, or dangerously close to one.

I believe that thing you're actually trying to do is to make the float or double to integer or long succeed where it would previously have failed, which is a much safer thing to do. The change in the float or double to string conversion was only ever a side-effect of the way you'd implemented it — and, in my view, an undesirable one.

I think my preferred approach to achieve your goal (as I understand it) would be to make INTEGER.convert() and LONG.convert() do explicit special-casing where they make these two checks before falling back to calling toString() like they did before. This also has the advantage that, in the case where we know we e.g. have a double and want a long, we can just do ((Double) value).longValue() and skip converting it to a string and back again.

PeteGillinElastic · 2025-06-05T16:52:53Z

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java

@@ -149,6 +154,24 @@ public static Type fromString(String processorTag, String propertyName, String t
                );
            }
        }
+
+        private static boolean isExactIntegerFloat(Object value) {
+            final float ABS_MAX_EXACT_FLOAT = (float) 0x1p24 - 1;


If you're going to do stuff like this, please make this a class-level constant rather than a local variable which has to be allocated on the stack and initialized every time the method is run.

Also, you can do 0x1p24f rather than casting to float.

However, see below...

PeteGillinElastic · 2025-06-05T16:53:48Z

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java

+            if (value instanceof Double == false) {
+                return false;
+            }
+            double v = ((Double) value).doubleValue();


There's no need to call doubleValue() here, it will automatically unbox the Double to a double.

PeteGillinElastic · 2025-06-05T17:17:26Z

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java

+            }
+            double v = ((Double) value).doubleValue();
+            return v == (long) v && -ABS_MAX_EXACT_DOUBLE <= v && v <= ABS_MAX_EXACT_DOUBLE;
+        }


I believe that the most efficient way to check whether a double is a (mathematical) integer is:

Return true if it v == 0.0. (N.B. This will succeed for -0.0 as well as == is true there.)

Define int exponent = Math.getExponent(v).

Return false if exponent > Double.MAX_EXPONENT, which eliminates POSITIVE_INFINITY, NEGATIVE_INFINITY, and NaN.

Return false if exponent < Double.MIN_EXPONENT, which eliminates subnormal values.

Define significand = (Double.doubleToLongBits(v) & SIGNIFICAND_MASK) | IMPLICIT_BIT, where SIGNIFICAND_MASK = 0x000fffffffffffffL and IMPLICIT_BIT = SIGNIFICAND_MASK + 1. (This is only true for normal values, but we've already checked for normality.)

Return SIGNIFICAND_BITS - Long.numberOfTrailingZeros(significand) <= exponent where SIGNIFICAND_BITS = 52.

You'll still need to do a check that the value is within bounds for the target integral type on top of the above, I think.

We should test this thoroughly. You're just hard-coding values of 1e7 or 1e9 right now. I'd like to see a lot more than that. It may make sense to give this method package visibility and test it directly, rather than doing all of that through the public API (though others may disagree with that approach).

To take the double-to-int case, for example, I would maybe test with inputs of (double) randomInt(), 0.0, -0.0, a few integers such as 1.0, a few non-integers such as 0.1, 1.1, some negative values, Double.POSITIVE_INFINITY, NEGATIVE_INFINITY, NaN, MIN_VALUE, MIN_NORMAL, (double) Integer.MAX_VALUE, (double) Integer.MAX_VALUE + 1, (double) Integer.MIN_VALUE, (double) Integer.MIN_VALUE - 1... and so on. (I have deliberately chosen the easiest case there. The other cases are going to be harder because, for example, Long.MAX_VALUE is not representable as an exact double, so (double) Long.MAX_VALUE is problematic... You can do tricks based around the fact that 0x1p63 is exactly Long.MAX_VALUE + 1, and you can use Math.nextUp and nextDown to get neighbouring values and stuff.

To be clear on that last point: I think that the conversion of 0x1p63 to a long should fail, because it is (just) out of range. I think that the conversion of Math.nextDown(0x1p63) should succeed, because it is an integer (in this range, all doubles are integers, because double has only 53-bit precision) and it is in range. (Please check my reasoning there, though!)

Return false if exponent < Double.MIN_EXPONENT, which eliminates subnormal values.

Actually, we can return false if exponent < 0, right? That eliminates all values with absolute value less than one. (This stricter check is not required for correctness, but it will make it slightly faster in this case: there is no way that the final SIGNIFICAND_BITS - Long.numberOfTrailingZeros(significand) <= exponent can be true if exponent < 0 so we're just short-circuiting the computation of the significand.)

PeteGillinElastic · 2025-06-05T17:33:55Z

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java

@@ -149,6 +157,24 @@ public static Type fromString(String processorTag, String propertyName, String t
                );
            }
        }
+
+        private static boolean isExactIntegerFloat(Object value) {


If we want this to be 'can we losslessly convert the f.p. type to the integral type' then I think we need four methods, one for each of the combinations, don't we? If we have a value which can be losslessly converted to a long, but is out of range for an int, we want the LONG.convert to succeed and INTEGER.convert to fail, don't we?

efd6 self-assigned this May 28, 2025

elasticsearchmachine added v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels May 28, 2025

efd6 force-pushed the integer_precise_convert branch 9 times, most recently from 324e01d to 6bcb33c Compare May 28, 2025 07:40

efd6 changed the title ~~Preserve integer-exact representation of numbers and allow conversion of floating point values to integers~~ Preserve integer-exact representation of numbers in convert processor to string May 28, 2025

efd6 force-pushed the integer_precise_convert branch 2 times, most recently from 841b7ae to 3bc17b7 Compare May 28, 2025 11:03

efd6 added the >enhancement label May 28, 2025

efd6 marked this pull request as ready for review May 28, 2025 12:13

elasticsearchmachine added the needs:triage Requires assignment of a team area label label May 28, 2025

efd6 commented May 28, 2025

View reviewed changes

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java Show resolved Hide resolved

efd6 requested a review from joegallo May 28, 2025 12:16

joegallo reviewed May 28, 2025

View reviewed changes

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/ConvertProcessor.java Outdated Show resolved Hide resolved

joegallo reviewed May 28, 2025

View reviewed changes

joegallo added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v8.19.0 labels May 28, 2025

joegallo self-assigned this May 28, 2025

joegallo added the Team:Data Management Meta label for data/management team label May 28, 2025

elasticsearchmachine removed the needs:triage Requires assignment of a team area label label May 28, 2025

efd6 force-pushed the integer_precise_convert branch from bc5fde8 to d6072d8 Compare May 28, 2025 21:53

This comment was marked as resolved.

Sign in to view

efd6 requested a review from joegallo May 29, 2025 00:30

efd6 changed the title ~~Preserve integer-exact representation of numbers in convert processor to string~~ Preserve integer-exact representation of numbers and allow conversion of floating point values to integers May 29, 2025

github-actions bot deployed to docs-preview May 29, 2025 03:05 View deployment

efd6 commented May 29, 2025

View reviewed changes

efd6 force-pushed the integer_precise_convert branch from 9876f1e to 076ca12 Compare May 29, 2025 04:20

github-actions bot deployed to docs-preview May 29, 2025 04:21 View deployment

This comment was marked as resolved.

Sign in to view

github-actions bot deployed to docs-preview May 30, 2025 14:27 View deployment

joegallo requested changes May 30, 2025

View reviewed changes

efd6 added 7 commits June 2, 2025 06:10

add tests

6575727

address pr comment

4cf4012

remove unused var

9a60071

This was introduced from testConvertDouble where it is also unused.

try using STRING.convert in place of toString in INTEGER and LONG con…

4490ca6

…vert

add tests

aef9299

Update docs/changelog/128540.yaml

e4c0875

efd6 force-pushed the integer_precise_convert branch from a23454b to e4c0875 Compare June 1, 2025 20:41

github-actions bot deployed to docs-preview June 1, 2025 20:41 View deployment

joegallo added 2 commits June 2, 2025 09:59

Fix some errant whitespace

b6923fd

Silence some warnings from IntelliJ

e04dadd

github-actions bot deployed to docs-preview June 2, 2025 14:01 View deployment

efd6 requested a review from joegallo June 4, 2025 20:53

PeteGillinElastic reviewed Jun 5, 2025

View reviewed changes

Preserve integer-exact representation of numbers and allow conversion of floating point values to integers #128540

Are you sure you want to change the base?

Preserve integer-exact representation of numbers and allow conversion of floating point values to integers #128540

Conversation

efd6 commented May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Questions

Uh oh!

elasticsearchmachine commented May 28, 2025

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

efd6 May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

elasticsearchmachine commented May 28, 2025

Uh oh!

This comment was marked as resolved.

elasticsearchmachine commented May 29, 2025

Uh oh!

efd6 May 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment was marked as resolved.

joegallo left a comment

Choose a reason for hiding this comment

Uh oh!

kortschak commented May 30, 2025 • edited by efd6 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joegallo commented Jun 4, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

PeteGillinElastic Jun 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

efd6 commented May 28, 2025 •

edited

Loading

efd6 May 28, 2025 •

edited

Loading

efd6 May 29, 2025 •

edited

Loading

kortschak commented May 30, 2025 •

edited by efd6

Loading

PeteGillinElastic Jun 5, 2025 •

edited

Loading