-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-26122: Implement an optional maximum size for Gets, after which a partial result is returned #3532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-26122: Implement an optional maximum size for Gets, after which a partial result is returned #3532
Conversation
b72fade
to
3ef4560
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
3ef4560
to
8a43862
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
8a43862
to
bef5299
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
f362fda
to
3fc6c7c
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
3fc6c7c
to
5d9c8cb
Compare
🎊 +1 overall
This message was automatically generated. |
… a partial result is returned
5d9c8cb
to
4ab2b9a
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Can only go into branch-2 and master I'd say, not into 2.4 given it adds API. That ok by you @bbeaudreault ?
* | ||
* If set to a value greater than zero, the server may respond with a Result where | ||
* {@link Result#mayHaveMoreCellsInRow()} is true. The user is required to handle | ||
* this case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you do when mayHaveMoreCellsInRow is true @bbeaudreault ? How do you use this boolean in prod (if you don't mind me asking...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At HubSpot we have a wrapper implementation of Table which all downstream users go through. This wrapper table enforces that setMaxResultSize
is set to a standard value that we've deemed safe. If a result comes back and mayHaveMoreCellsInRow
is true, we throw an exception. If a team gets such an exception they can request a temporary allowance which disables the check. In the meantime they are expected to add a filter to paginate so they don't hit the max limit.
This is a little draconian, but we used to have lots of OOM issues due to large gets/puts/scans. Another possible solution is to iterate with PageFilter, like I did in testGetPartialResults
. We planned to do something like that eventually, but in the end we had rolled this out in such a way that the number of exceptions were so few that we never did the work.
Would you be open to an automatic stitching in the future, like we do with Scans? I can't do that now, but might be a reasonable followup jira.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of that last sentence, maybe it's better to not support stitching for Gets. Instead people should rewrite these large Gets as Scans or add filters like above. Stitching obviously increases the latency and that could be very misleading for Gets. Multigets even worse (and harder to implement)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want a Cell Streaming API.
@@ -91,6 +91,8 @@ message Get { | |||
optional Consistency consistency = 12 [default = STRONG]; | |||
repeated ColumnFamilyTimeRange cf_time_range = 13; | |||
optional bool load_column_families_on_demand = 14; /* DO NOT add defaults to load_column_families_on_demand. */ | |||
|
|||
optional uint64 max_result_size = 15; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, these protos we want to let go of eventually. They are for use of downstreamers, not for internal hbase use.... internally we use the shaded stuff. So, its fine adding this here but probably better not adding it.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
Show resolved
Hide resolved
|
||
private Pair<List<Cell>, ScannerContext> getInternal(Get get, boolean withCoprocessor, long nonceGroup, long nonce) | ||
throws IOException { | ||
ScannerContext scannerContext = ScannerContext.newBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every Get will now carry a ScannerContext where previous it was usually null? (IIRC) Do you think this will cost Bryan? Will there be other benefits having a ScannerContext on every Get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point, thanks. I'll push a commit which only adds a ScannerContext if getMaxResultSize > 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the new commit, I also removed the ScannerContext from the version of get
which just returns a List<Cell>
. I probably shouldn't have included it in the first place. It's odd to return a partial result with no way to inform the client to that fact. I'm not sure in what context this method is used, looks like just tests maybe if intellij isn't lying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. LGTM.
? ScannerContext.newBuilder() | ||
.setSizeLimit(LimitScope.BETWEEN_CELLS, get.getMaxResultSize(), get.getMaxResultSize()) | ||
.build() | ||
: null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
[2021-08-10T22:31:00.190Z] +1 overall Fail is our build issue... java.nio.file.FileSystemException: /home/jenkins/jenkins-agent/workspace/Base-PreCommit-GitHub-PR_PR-3532/yetus-jdk11-hadoop3-check: Read-only file system |
https://issues.apache.org/jira/browse/HBASE-26122
Adds a
Get#setMaxResultSize(long)
method, similar to the one in Scan. The default is -1, meaning not enabled. When a max result size is added to a Get and there are more cells than fit in the specified size,Result#mayHaveMoreCellsInRow()
will be true. This utilizes ScannerContext on the server side, since Gets are backed by single row scans.Unlike Scans, no response stitching is implemented. The user must handle the possible true value in
Result#mayHaveMoreCellsInRow()
accordingly. This seems like a fine initial behavior since the default is unlimited, meaning a user would have to opt in to this new functionality with the intent to handle the possible return values.I've added tests to TestHRegion for the HRegion implementation. For the RSRpcServices I wasn't sure of the best convention, so I added it to the existing TestPartialResultsFromClientSide. All new tests pass.