Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -140,7 +140,7 @@ private static int memcmp(
long leftLong = PlatformDependent.getLong(lPos);
long rightLong = PlatformDependent.getLong(rPos);
if (leftLong != rightLong) {
return unsignedLongCompare(leftLong, rightLong);
return unsignedLongCompare(Long.reverseBytes(leftLong), Long.reverseBytes(rightLong));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the purpose is to compare memory segment in lexicographic order, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I added a unittest, please take a look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pprudhvi thanks for your effort.
I think we need to address two problems here:

  1. we need to check the byte order before we decide if we need to reserve bytes.
  2. we must make sure that comparing long values in endian order is identical to comparing 8 bytes in lexicographic order.

I have a little doubtful about 2). If you have an answer to it, please let me know. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following prints 2^24

    ArrowBuf left = allocator.buffer(4);
    left.setByte(0, 0);
    left.setByte(1, 0);
    left.setByte(2, 0);
    left.setByte(3, 1);
    System.out.println(PlatformDependent.getInt(left.memoryAddress()));
    left.close();

this means the bytes are assumed to be in little endian format while reading as int. Directly comparing int won't be the same as comparing in lexicographic order

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pprudhvi Maybe I did not make it clear enough. Let me explain the two points in more detail:

  1. Some machines are big endian while others are little endian. We should check this before reversing the bytes. If the code is running on a big endian machine, there is no need to reverse bytes.

  2. I am not sure if the following two results are always identical when comparing 2 pieces of continuous 8 bytes data:
    a) read both data as longs (in big endian order), and compare the long values.
    b) read both data as 8 bytes, and compare the 8 bytes in sequence, until the first unmatched bytes are found. If all 8 bytes are matched, we return 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the arrow buffers are always encoded in little-endian, and java is always in big-endian. so, I think we don't need to check endianness again.

}
lPos += 8;
rPos += 8;
Expand All @@ -151,7 +151,7 @@ private static int memcmp(
int leftInt = PlatformDependent.getInt(lPos);
int rightInt = PlatformDependent.getInt(rPos);
if (leftInt != rightInt) {
return unsignedIntCompare(leftInt, rightInt);
return unsignedIntCompare(Integer.reverseBytes(leftInt), Integer.reverseBytes(rightInt));
}
lPos += 4;
rPos += 4;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -111,4 +111,26 @@ public void testCompare() {
buffer2.close();

}

@Test
public void testStringCompare() {
String[] leftStrings = {"cat", "cats", "catworld", "dogs", "bags"};
String[] rightStrings = {"dog", "dogs", "dogworld", "dog", "sgab"};

for (int i = 0; i < leftStrings.length; ++i) {
String leftStr = leftStrings[i];
String rightStr = rightStrings[i];

ArrowBuf left = allocator.buffer(SIZE);
left.setBytes(0, leftStr.getBytes());
ArrowBuf right = allocator.buffer(SIZE);
right.setBytes(0, rightStr.getBytes());

assertEquals(leftStr.compareTo(rightStr) < 0 ? -1 : 1,
ByteFunctionHelpers.compare(left, 0, leftStr.length(), right, 0, rightStr.length()));

left.close();
right.close();
}
}
}