Move Text class to libs/xcontent #128780

jordan-powers · 2025-06-02T18:31:54Z

This PR is a precursor to #126492.

It does three things:

- Move org.elasticsearch.common.text.Text from :server to org.elasticsearch.xcontent.Text in :libs:x-content.
- Refactor the Text class to use a new EncodedBytes record instead of the Elasticsearch BytesReference.
- Add the XContentString interface and have the Text class implement it.

These changes were originally implemented in #127666 and #128316; however, they were reverted in #128484 due to problems caused by the mutable nature of Java ByteBuffers. This PR resolves that by using a new immutable EncodedBytes record instead.
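To illustrate the mutability problem described above: a ByteBuffer carries a read position, so decoding from a shared instance changes what the next reader sees. The sketch below contrasts that with a record that only holds an array, an offset, and a length. `Utf8Slice` and `ByteBufferMutabilityDemo` are hypothetical names for illustration, not the classes from this PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for an immutable byte-slice record; not the PR's actual class.
record Utf8Slice(byte[] bytes, int offset, int length) {
    // Each call wraps the array in a fresh ByteBuffer, so no cursor is shared between callers.
    String decode() {
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes, offset, length)).toString();
    }
}

class ByteBufferMutabilityDemo {
    // Decodes directly from the given buffer, which advances its position to the limit.
    static String decodeShared(ByteBuffer shared) {
        return StandardCharsets.UTF_8.decode(shared).toString();
    }

    public static void main(String[] args) {
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);

        ByteBuffer shared = ByteBuffer.wrap(raw);
        System.out.println(decodeShared(shared));           // first read sees all five bytes
        System.out.println(decodeShared(shared).isEmpty()); // second read finds nothing remaining

        Utf8Slice slice = new Utf8Slice(raw, 0, raw.length);
        System.out.println(slice.decode().equals(slice.decode())); // repeated reads agree
    }
}
```

Note that the record has no cursor, but the backing array itself is still shared, so this design relies on callers not modifying the array after handing it over.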

elasticsearchmachine · 2025-06-02T18:32:23Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

prdoyle · 2025-06-03T13:30:22Z

libs/x-content/src/test/java/org/elasticsearch/xcontent/TextTests.java

+
+import java.nio.charset.StandardCharsets;
+
+public class TextTests extends ESTestCase {


This is a pretty good test suite!

Optional: Perhaps consider some tests that use randomization to do a sequence of operations (like string(), stringLength(), bytes()) and assert that the result is right?

This is probably not necessary, since you've already added regression tests for the sequences that bit us before.
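As a rough sketch of the randomized idea above: pick a random sequence of accessor calls and check each result against the expected value, so that any ordering that corrupts cached state gets caught. `LazyText` and `RandomizedTextCheck` are hypothetical stand-ins, and plain `java.util.Random` is used here instead of whatever randomization helpers the real test suite provides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in for Text: holds a string, its UTF-8 bytes, or both, converting lazily.
class LazyText {
    private String string;
    private byte[] utf8;

    LazyText(String string) { this.string = string; }
    LazyText(byte[] utf8) { this.utf8 = utf8; }

    String string() {
        if (string == null) string = new String(utf8, StandardCharsets.UTF_8);
        return string;
    }

    int stringLength() { return string().length(); }

    byte[] bytes() {
        if (utf8 == null) utf8 = string.getBytes(StandardCharsets.UTF_8);
        return utf8;
    }
}

class RandomizedTextCheck {
    // Calls `steps` randomly chosen accessors and verifies each result against the expected value.
    static void check(long seed, String expected, boolean fromBytes, int steps) {
        Random random = new Random(seed);
        byte[] expectedBytes = expected.getBytes(StandardCharsets.UTF_8);
        LazyText text = fromBytes ? new LazyText(expectedBytes.clone()) : new LazyText(expected);
        for (int i = 0; i < steps; i++) {
            switch (random.nextInt(3)) {
                case 0 -> require(expected.equals(text.string()), "string()", seed, i);
                case 1 -> require(expected.length() == text.stringLength(), "stringLength()", seed, i);
                default -> require(Arrays.equals(expectedBytes, text.bytes()), "bytes()", seed, i);
            }
        }
    }

    // Reports the seed and step so a failing sequence can be replayed.
    private static void require(boolean ok, String op, long seed, int step) {
        if (!ok) throw new AssertionError(op + " mismatch at step " + step + " (seed " + seed + ")");
    }

    public static void main(String[] args) {
        for (long seed = 0; seed < 100; seed++) {
            check(seed, "h\u00e9llo w\u00f6rld " + seed, seed % 2 == 0, 20);
        }
        System.out.println("all sequences passed");
    }
}
```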

prdoyle · 2025-06-03T13:31:29Z

server/src/main/java/org/elasticsearch/common/io/stream/StreamInput.java

+            readBytes(bytes, 0, length);
+        }
+        var encoded = new XContentString.EncodedBytes(bytes);
+        return new Text(encoded);


This looks like repeated code that could benefit from being pulled into a helper method, like we did with readBytesReference.
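One possible shape for such a helper, sketched under heavy assumptions: `TextReaderSketch` is hypothetical, `DataInputStream` stands in for the real stream class, and the int length prefix and boolean presence flag are an invented wire format chosen only to keep the example runnable. The point is that both read paths delegate to a single private method for the length-then-bytes step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical reader; the wire format here is assumed, not taken from StreamInput.
class TextReaderSketch {
    private final DataInputStream in;

    TextReaderSketch(byte[] data) {
        this.in = new DataInputStream(new ByteArrayInputStream(data));
    }

    // Shared helper: reads a length prefix, then exactly that many bytes.
    private byte[] readLengthPrefixedBytes() throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes, 0, length);
        return bytes;
    }

    String readText() throws IOException {
        return new String(readLengthPrefixedBytes(), StandardCharsets.UTF_8);
    }

    // A presence flag precedes the payload; null is returned when the flag is false.
    String readOptionalText() throws IOException {
        return in.readBoolean() ? readText() : null;
    }

    // Writes one optional value in the assumed format and reads it back.
    static String roundTripOptional(String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeBoolean(value != null);
            if (value != null) {
                byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);
                out.write(utf8);
            }
            out.flush();
            return new TextReaderSketch(buffer.toByteArray()).readOptionalText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```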

prdoyle · 2025-06-03T13:32:17Z

For the record: why add a new EncodedBytes record instead of moving BytesReference into xcontent?

jordan-powers · 2025-06-03T14:16:48Z

We can't move BytesReference into libs/xcontent because it has several references to Lucene's BytesRef, and Lucene is not included in the xcontent project.

prdoyle

All my comments are either optional, or pertain to javadocs, or don't affect production. I'd like to see the javadocs fixed up, but otherwise this is mergeable.

prdoyle · 2025-06-04T14:20:52Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/Text.java

+    private int stringLength = -1;
+
+    /**
+     * Construct a Text from a UTF-8 encoded ByteBuffer. Since no string length is specified, {@link #stringLength()}


Is this comment correct? It's not using a ByteBuffer now.

~~Also I wonder if the fact that it's UTF-8 should be reflected somewhere in the naming? Perhaps EncodedBytes could be Utf8Bytes? I'm not sure whether that's appropriate but it's a thought.~~

UTF8Bytes makes sense to me! I'll rename it

Looking at the implementation of EncodedBytes, that class does not seem to be aware of UTF-8 in any way, so renaming it is probably not appropriate after all.

Yeah I don't really have an opinion anymore either way TBH.

I'm still leaning towards calling it UTF8Bytes because its main purpose is as a return type from XContentString#bytes(), which specifies in the javadoc that it's always UTF-8 encoded.

prdoyle · 2025-06-04T14:23:12Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/Text.java

-    public Text(BytesReference bytes) {
+    /**
+     * Construct a Text from a UTF-8 encoded ByteBuffer and an explicit string length. Used to avoid string conversion
+     * in {@link #stringLength()}.


I think the javadocs here should answer this question: Is there a requirement that the stringLength matches the value that would have been produced by the other constructor? Or is the caller free to specify a different length?

prdoyle · 2025-06-04T14:24:10Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/Text.java

-        return text == null ? bytes.utf8ToString() : text;
+        if (text == null) {
+            var byteBuff = ByteBuffer.wrap(bytes.bytes(), bytes.offset(), bytes.length());
+            text = StandardCharsets.UTF_8.decode(byteBuff).toString();


Re my comment above: I think we could follow this with:

assert (stringLength < 0) || (stringLength == text.length());

This would catch cases where the wrong string length is specified at construction time.
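A minimal sketch of where that check could sit, assuming a simplified holder class. `LazyDecodedText` is hypothetical and is not the PR's Text class. It uses an explicit exception rather than `assert` so that the behaviour does not depend on the JVM's `-ea` flag. Note that the length being compared is `String.length()`, which counts UTF-16 code units rather than bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical simplification of a lazily decoded text holder.
class LazyDecodedText {
    private final byte[] utf8;
    private final int stringLength; // -1 means the caller did not supply a length
    private String text;

    LazyDecodedText(byte[] utf8) {
        this(utf8, -1);
    }

    LazyDecodedText(byte[] utf8, int stringLength) {
        this.utf8 = utf8;
        this.stringLength = stringLength;
    }

    String string() {
        if (text == null) {
            text = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(utf8)).toString();
            // Same condition as the suggested assert, negated: a supplied length must match.
            if (stringLength >= 0 && stringLength != text.length()) {
                throw new IllegalStateException(
                    "declared length " + stringLength + " but decoded length " + text.length()
                );
            }
        }
        return text;
    }

    // Returns the supplied length when present, avoiding a decode; otherwise decodes.
    int stringLength() {
        return stringLength >= 0 ? stringLength : string().length();
    }
}
```

With this in place, a caller that passes the byte count instead of the character count for non-ASCII input fails on the first `string()` call instead of silently reporting a wrong length.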

prdoyle · 2025-06-04T14:32:02Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/Text.java

@@ -36,31 +31,45 @@ public static Text[] convertFromStringArray(String[] strings) {
        return texts;
    }

-    private BytesReference bytes;
+    private EncodedBytes bytes;
    private String text;


(I wonder why this is called text when we refer to it as "string" everywhere else?)

prdoyle · 2025-06-04T14:33:58Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/XContentString.java

+
+        @Override
+        public int compareTo(EncodedBytes o) {
+            return ByteBuffer.wrap(bytes, offset, length).compareTo(ByteBuffer.wrap(o.bytes, o.offset, o.length));


Random thought: Is there any value in peeling off a fast-path case for when all the fields are identical by ==? Not sure whether this ever actually happens, but it would avoid two object allocations.

Makes sense to me, especially since equals() delegates to compareTo(), so this could happen fairly frequently.

Yeah, essentially you're making sure you're no slower than the built-in Record.equals() for the case that it returns true. Could be a little slower for the false case.
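The fast path discussed in this thread might look roughly like the following. `ByteSlice` is a hypothetical stand-in for the record under review, and the `equals`/`hashCode` overrides are an assumption about how equality is wired to `compareTo`. When both slices point at the same array with the same window, the comparison returns before any ByteBuffer is allocated.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the byte-slice record under review.
record ByteSlice(byte[] bytes, int offset, int length) implements Comparable<ByteSlice> {
    @Override
    public int compareTo(ByteSlice o) {
        // Fast path: same backing array and same window means equal content, with no allocation.
        if (bytes == o.bytes && offset == o.offset && length == o.length) {
            return 0;
        }
        // Slow path: lexicographic comparison of the two windows.
        return ByteBuffer.wrap(bytes, offset, length).compareTo(ByteBuffer.wrap(o.bytes, o.offset, o.length));
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ByteSlice slice && compareTo(slice) == 0;
    }

    @Override
    public int hashCode() {
        // ByteBuffer hashes only its remaining bytes, which keeps this consistent with equals.
        return ByteBuffer.wrap(bytes, offset, length).hashCode();
    }
}
```

The extra three comparisons are paid on every call, which matches the trade-off noted above: no slower than field-wise equality when the answer is true, slightly slower when it is false.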

prdoyle · 2025-06-04T14:38:20Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/XContentString.java

+    String string();
+
+    /**
+     * Returns a UTF8-encoded {@link ByteBuffer} view of the data.


Not a ByteBuffer.

prdoyle · 2025-06-04T14:43:13Z

libs/x-content/src/test/java/org/elasticsearch/xcontent/TextTests.java

+        assertEquals(value, text.string());
+        assertEquals(encoded, text.bytes());
+
+        assertSame(text.bytes(), text.bytes());


prdoyle · 2025-06-04T14:43:47Z

libs/x-content/src/test/java/org/elasticsearch/xcontent/TextTests.java

+        assertEquals(value, text.string());
+        assertEquals(encoded, text.bytes());
+
+        assertSame(encoded, text.bytes());


Probably we could just change all prior assertEquals on encoded to be assertSame instead?

I want to keep the assertEquals to verify the equals() method. I could see a situation where equals() is wrong or messes up the internal state somehow.

prdoyle · 2025-06-04T14:51:12Z

Oh one last question:

Did you ever try your new unit test on the old version of Text from before your changes? (The goal being to ensure the behaviour is unchanged, as opposed to merely reasonable.)

jordan-powers · 2025-06-04T14:58:59Z

@prdoyle Thanks for the review! I haven't yet, but I agree it makes sense to run the new test against the old implementation, so I'll be sure to do that.

jordan-powers · 2025-06-04T16:18:56Z

Ok, I rebased the PR so that the TextTests were added before the updates to the Text class. I then checked out f72a133 and ran the tests against the old implementation and everything passed.

jordan-powers requested review from prdoyle and ldematte June 2, 2025 18:31

jordan-powers added the >non-issue label Jun 2, 2025

jordan-powers requested a review from a team as a code owner June 2, 2025 18:31

jordan-powers added the :Core/Infra/Core (Core issues without another label), auto-backport (Automatically create backport pull requests when merged), v8.19.0, and v9.1.0 labels Jun 2, 2025

jordan-powers self-assigned this Jun 2, 2025

elasticsearchmachine added the Team:Core/Infra (Meta label for core/infra team) label Jun 2, 2025

prdoyle reviewed Jun 3, 2025

This was referenced Jun 3, 2025

[CI] SearchHitsTests testConcurrentEquals failing #127971

Closed

[CI] SearchHitsTests testConcurrentSerialization failing #128029

Closed

prdoyle approved these changes Jun 4, 2025

jordan-powers added 7 commits June 4, 2025 09:16

Add TextTests

f72a133

Reapply changes to Text class

614ec17

This reverts commit 7690f46.

Use immutable XContentText.EncodedBytes record for encoded Text

dcbf2cb

We actually can't use the native java ByteBuffer class here because it's a mutable class (since it keeps track of how many bytes have been consumed), which creates concurrency issues.

Update TextTests

f4f6da6

Pull common StreamInput#readText code into helper method

7bb2f17

Rename to UTF8Bytes and clean up javadoc

0af1611

Add randomized test to TextTests

4a0ddcd

jordan-powers force-pushed the move-text-class-2 branch from 6e9de13 to 4a0ddcd June 4, 2025 16:17

