-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-28175: Deep copy RpcLogDetails' param field #5481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hbase-server/src/main/java/org/apache/hadoop/hbase/namequeues/RpcLogDetails.java
Outdated
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
0edc56f
to
eafcd3f
Compare
.append("scan", scan).isEquals(); | ||
.append(scan, that.scan).isEquals(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also noticed this bug in the SlowLogParams equals implementation, but I don't think it's related to the slow log payload corruption that we've observed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt if we can write some real test for this, can you think of any @rmdmattingly?
It would be difficult to reliably reproduce the overwriting of the input buffer in a unit test, but maybe it's possible. We could probably write a unit test which confirms the copied nature of the message here |
This might seem silly, but I wonder if you could at least do the simple test of:
This may not be our usual end-to-end test that proves the bug, but it at least proves the fix under the assumption that the bug is occurring periodically. |
Nice, I like that idea! I'm headed out on vacation tonight, but will add that test when I return later next week |
💔 -1 overall
This message was automatically generated. |
An alternative approach here might be to find a way to call ByteBuffer.retain() when creating the SlowLogParams, and then ByteBuffer.release() when consuming in SlowLogService. That might be a little tricker, but would probably be better in terms of GC and time spent copying. It sort of depends on the rate of slow logs. Looking briefly at it, I think we could possibly utilize the method @Apache9 added to RpcCall in HBASE-22539 -- retainByWAL when enqueueing a RpcLogDetails, and then call releaseByWAL in SlowLogQueueService.consumeEventFromDisruptor. Obviously we'd need to rename the methods in that case. What do you think of that Duo? It seems like the exact same problem |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@Test | ||
public void itDeepCopiesRpcLogDetailsParams() throws IOException { | ||
ByteBuffer buffer = ByteBuffer.allocate(scanRequest.toByteArray().length); | ||
CodedInputStream cis = UnsafeByteOperations.unsafeWrap(buffer).newCodedInput(); | ||
cis.enableAliasing(true); | ||
buffer.put(scanRequest.toByteArray()); | ||
Message.Builder messageBuilder = ClientProtos.ScanRequest.newBuilder(); | ||
ProtobufUtil.mergeFrom(messageBuilder, cis, buffer.capacity()); | ||
Message message = messageBuilder.build(); | ||
RpcLogDetails rpcLogDetails = | ||
new RpcLogDetails(getRpcCall(message), message, null, 0L, 0L, null, true, false); | ||
|
||
// log's scan should be equal | ||
ClientProtos.Scan logScan = ((ClientProtos.ScanRequest) rpcLogDetails.getParam()).getScan(); | ||
assertEquals(logScan, scan); | ||
|
||
// ensure we have a different byte array for testing | ||
assertFalse(Arrays.equals(scanRequest.toByteArray(), otherScanRequest.toByteArray())); | ||
|
||
// corrupt the underlying buffer | ||
buffer.position(0); | ||
buffer.put(otherScanRequest.toByteArray(), 0, otherScanRequest.toByteArray().length); | ||
assertArrayEquals(otherScanRequest.toByteArray(), buffer.array()); | ||
|
||
// log scan should still be original scan | ||
assertEquals(logScan, scan); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails without this PR's changes to RpcLogDetails
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
The test failures above are unrelated. I believe this is ready for review |
this.param = param.newBuilderForType().mergeFrom(param.toByteArray()).build(); | ||
} catch (InvalidProtocolBufferException e) { | ||
LOG.error("Failed to parse protobuf for message {}", param, e); | ||
if (this.param == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would likely always be true right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I can't imagine when it wouldn't be true so maybe we should just remove this
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, anything from your side @bbeaudreault sir?
Signed-off-by: Viraj Jasani <vjasani@apache.org> Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org> Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org> Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org>
…pache#5481) Signed-off-by: Viraj Jasani <vjasani@apache.org> Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org>
The RpcLogDetails class represents a slow (or large) log event which will later be consumed by the SlowLogQueueService.
Right now the RpcLogDetails' param field points to the slow call's Message, and this Message is backed by a CodedInputStream which may be overwritten before the given log is consumed. This overwriting of the Message may result in slow query payloads for which the metadata derived post-consumption is inaccurate.
To solve this bug we need to deep copy the Message in the RpcLogDetails constructor. I have this bug reproduced in a QA environment and have confirmed that this resolves the issue.
cc @bbeaudreault @hgromer @eab148 @bozzkar