Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spec/java): add strip flag in meta string encoding spec #1565

Merged
merged 3 commits into from
Apr 24, 2024

Conversation

chaokunyang
Copy link
Collaborator

What does this PR do?

add strip flag in meta string encoding spec

Related issues

#1540

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

@chaokunyang
Copy link
Collaborator Author

CI failure seems related to actions/setup-python#850

Copy link
Member

@theweipeng theweipeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chaokunyang chaokunyang merged commit ba451c5 into apache:main Apr 24, 2024
32 checks passed
@qingoba
Copy link
Contributor

qingoba commented Apr 26, 2024

In MetaStringEncoder.java, we use 0x08:

boolean stripLastChar = bytes.length * 8 >= totalBits + bitsPerChar;
if (stripLastChar) {
bytes[0] = (byte) (bytes[0] | 0x80);
}

In MetaString.java, we use 0b1:

if (encoding != Encoding.UTF_8) {
  Preconditions.checkArgument(bytes.length > 0);
  this.stripLastChar = (bytes[0] & 0b1) != 0;
} else {
  this.stripLastChar = false;
}

These two are not consistent.

@LiangliangSui
Copy link
Contributor

In MetaStringEncoder.java, we use 0x08:

boolean stripLastChar = bytes.length * 8 >= totalBits + bitsPerChar;
if (stripLastChar) {
bytes[0] = (byte) (bytes[0] | 0x80);
}

In MetaString.java, we use 0b1:

if (encoding != Encoding.UTF_8) {
  Preconditions.checkArgument(bytes.length > 0);
  this.stripLastChar = (bytes[0] & 0b1) != 0;
} else {
  this.stripLastChar = false;
}

These two are not consistent.

Thanks for pointing out this issue, this is indeed a bug and our unit tests don't cover it.

0x80 should also be used in MetaString.

if (encoding != Encoding.UTF_8) {
  Preconditions.checkArgument(bytes.length > 0);
  this.stripLastChar = (bytes[0] & 0x80) != 0;
} else {
  this.stripLastChar = false;
}

Do you want to submit a PR to fix this issue and add unit tests? @qingoba

@qingoba
Copy link
Contributor

qingoba commented Apr 26, 2024

Sure, I will do this after I finish #1566

@chaokunyang
Copy link
Collaborator Author

In MetaStringEncoder.java, we use 0x08:

boolean stripLastChar = bytes.length * 8 >= totalBits + bitsPerChar;
if (stripLastChar) {
bytes[0] = (byte) (bytes[0] | 0x80);
}

In MetaString.java, we use 0b1:

if (encoding != Encoding.UTF_8) {
  Preconditions.checkArgument(bytes.length > 0);
  this.stripLastChar = (bytes[0] & 0b1) != 0;
} else {
  this.stripLastChar = false;
}

These two are not consistent.

The MetaStringDecoder parse the flag by itself, and didn't use this flag. Currently the org.apache.fury.meta.MetaString#stripLastChar is not called by other classes. So the fury serializaiton ut still succceed.

@qingoba
Copy link
Contributor

qingoba commented Apr 26, 2024

There is another question.
If input string is empty, encoded bytes is also empty:

public MetaString encode(String input, Encoding encoding) {
  Preconditions.checkArgument(
      input.length() < Short.MAX_VALUE, "Long meta string than 32767 is not allowed");
  if (input.isEmpty()) {
    return new MetaString(input, Encoding.UTF_8, specialChar1, specialChar2, new byte[0]);
  }
}

However, decode procedure doesn't check empty. It will access bytes[0] directly:

private String decodeLowerSpecial(byte[] data) {
  StringBuilder decoded = new StringBuilder();
  int totalBits = data.length * 8; // Total number of bits in the data
  boolean stripLastChar = (data[0] & 0x80) != 0; // Check the first bit of the first byte
}

So decode procedure should check empty.

@chaokunyang
Copy link
Collaborator Author

There is another question. If input string is empty, encoded bytes is also empty:

public MetaString encode(String input, Encoding encoding) {
  Preconditions.checkArgument(
      input.length() < Short.MAX_VALUE, "Long meta string than 32767 is not allowed");
  if (input.isEmpty()) {
    return new MetaString(input, Encoding.UTF_8, specialChar1, specialChar2, new byte[0]);
  }
}

However, decode procedure doesn't check empty. It will access bytes[0] directly:

private String decodeLowerSpecial(byte[] data) {
  StringBuilder decoded = new StringBuilder();
  int totalBits = data.length * 8; // Total number of bits in the data
  boolean stripLastChar = (data[0] & 0x80) != 0; // Check the first bit of the first byte
}

So decode procedure should check empty.

Hi @qingoba , do you have time to fix this today? We plan to start release fury v0.5.0 today and include this fix. If you don't have time, I can take over it and submit a fix later

@qingoba
Copy link
Contributor

qingoba commented Apr 27, 2024

I can fix this now.

chaokunyang pushed a commit that referenced this pull request Apr 27, 2024
…1587)

## What does this PR do?
Fix two bugs:
+ fix method `MetaString.stripLastChar()` logical error
+ add empty encoded bytes check in `MetaStringDecoder.decode()`


## Related issues
- #1565 


## Does this PR introduce any user-facing change?
- [No] Does this PR introduce any public API change?
- [No] Does this PR introduce any binary protocol compatibility change?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants