-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Move Text class to libs/xcontent #128780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Move Text class to libs/xcontent #128780
Conversation
Pinging @elastic/es-core-infra (Team:Core/Infra) |
|
||
import java.nio.charset.StandardCharsets; | ||
|
||
public class TextTests extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty good test suite!
Optional: Perhaps consider some tests that use randomization to do a sequence of operations (like string()
, stringLength()
, bytes()
) and assert that the result is right?
This is probably not necessary, since you've already added regression tests for the sequences that bit us before.
readBytes(bytes, 0, length); | ||
} | ||
var encoded = new XContentString.EncodedBytes(bytes); | ||
return new Text(encoded); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like repeated code that could benefit from being pulled into a helper method, like we did with readBytesReference
.
For the record: why add a new |
We can't move |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All my comments are either optional, or pertain to javadocs, or don't affect production. I'd like to see the javadocs fixed up, but otherwise this is mergeable.
private int stringLength = -1; | ||
|
||
/** | ||
* Construct a Text from a UTF-8 encoded ByteBuffer. Since no string length is specified, {@link #stringLength()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment correct? It's not using a ByteBuffer
now.
Also I wonder if the fact that it's UTF-8 should be reflected somewhere in the naming? Perhaps EncodedBytes
could be Utf8Bytes
? I'm not sure whether that's appropriate but it's a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UTF8Bytes
makes sense to me! I'll rename it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the implementation of EncodedBytes
, that class does not seem to be aware of UTF-8
in any way, so renaming it is probably not appropriate after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I don't really have an opinion anymore either way TBH.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still leaning towards calling it UTF8Bytes
because its main purpose is as a return type from XContentString#bytes()
, which specifies in the javadoc that it's always UTF-8 encoded.
public Text(BytesReference bytes) { | ||
/** | ||
* Construct a Text from a UTF-8 encoded ByteBuffer and an explicit string length. Used to avoid string conversion | ||
* in {@link #stringLength()}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the javadocs here should answer this question: Is there a requirement that the stringLength
matches the value that would have been produced by the other constructor? Or is the caller free to specify a different length?
return text == null ? bytes.utf8ToString() : text; | ||
if (text == null) { | ||
var byteBuff = ByteBuffer.wrap(bytes.bytes(), bytes.offset(), bytes.length()); | ||
text = StandardCharsets.UTF_8.decode(byteBuff).toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re my comment above: I think we could follow this with:
assert (stringLength < 0) || (stringLength == text.length());
This would catch cases where the wrong string length is specified at construction time.
@@ -36,31 +31,45 @@ public static Text[] convertFromStringArray(String[] strings) { | |||
return texts; | |||
} | |||
|
|||
private BytesReference bytes; | |||
private EncodedBytes bytes; | |||
private String text; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I wonder why this is called text
when we refer to it as "string" everywhere else?)
|
||
@Override | ||
public int compareTo(EncodedBytes o) { | ||
return ByteBuffer.wrap(bytes, offset, length).compareTo(ByteBuffer.wrap(o.bytes, o.offset, o.length)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random thought: Is there any value in peeling off a fast-path case for when all the fields are identical by ==
? Not sure whether this ever actually happens, but it would avoid two object allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to me, especially since equal()
delegates to compareTo()
, so this could happen fairly frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, essentially you're making sure you're no slower than the built-in Record.equals()
for the case that it returns true. Could be a little slower for the false
case.
String string(); | ||
|
||
/** | ||
* Returns a UTF8-encoded {@link ByteBuffer} view of the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a ByteBuffer
.
|
||
import java.nio.charset.StandardCharsets; | ||
|
||
public class TextTests extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty good test suite!
Optional: Perhaps consider some tests that use randomization to do a sequence of operations (like string()
, stringLength()
, bytes()
) and assert that the result is right?
This is probably not necessary, since you've already added regression tests for the sequences that bit us before.
assertEquals(value, text.string()); | ||
assertEquals(encoded, text.bytes()); | ||
|
||
assertSame(text.bytes(), text.bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tricky!
assertEquals(value, text.string()); | ||
assertEquals(encoded, text.bytes()); | ||
|
||
assertSame(encoded, text.bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably we could just change all prior assertEquals
on encoded
to be assertSame
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to keep the assertEquals
to verify the equals()
method. I could see a situation where equals()
is wrong or messes up the internal state somehow.
Oh one last question: Did you ever try your new unit test on the old version of |
@prdoyle Thanks for the review! I haven't yet, but I agree it makes sense to run the new test against the old implementation, so I'll be sure to do that. |
This reverts commit 7690f46.
We actually can't use the native java ByteBuffer class here because it's a mutable class (since it keeps track of how many bytes have been consumed), which creates concurrency issues.
6e9de13
to
4a0ddcd
Compare
Ok, I rebased the PR so that the |
This PR is a precursor to #126492.
It does three things:
EncodedBytes
record instead of the elasticsearch BytesReference.These changes were originally implemented in #127666 and #128316, however they were reverted in #128484 due to problems caused by the mutable nature of java ByteBuffers. This is resolved by instead using a new immutable
EncodedBytes
record.