HBASE-28647 Support streams in REST Client, RemoteHTable and RemoteAdmin #6010

Open
wants to merge 5 commits into
base: master
from

Conversation

stoty
Contributor

@stoty stoty commented Jun 20, 2024

No description provided.

@Apache-HBase

This comment has been minimized.

@stoty stoty requested a review from Apache9 June 21, 2024 11:17
Contributor

@Apache9 Apache9 left a comment

I do not fully understand what "streams" means here...

All requests and responses are fully kept in memory here I think?

* For sending a Protobuf encoded object via Apache HttpClient efficiently without an interim byte
* array. This exposes the underlying Apache HttpClient types, but so do the other client classes.
*/
@InterfaceAudience.Public
Contributor

Does this need to be IA.Public?

Contributor Author

Yes, we expect the user to explicitly create these entities.

I have not updated the tests to use them, but the intended usage is similar to how
org.apache.hadoop.hbase.rest.client.RemoteHTable.put(Put) is implemented.

Contributor

At least have a section in the javadoc to show how to use this class?

And since we have not reached an agreement on whether to move RemoteAdmin and RemoteHTable to src/main, I'm not sure whether we should make this class IA.Public...

Contributor Author

I only mentioned RemoteHTable as an example of where this is used, but this
is not limited to RemoteHTable; it should also be used directly in user code.

I will add a standalone test case that uses this class as an illustration.

Contributor Author

@stoty stoty Oct 16, 2024

At least have a section in the javadoc to show how to use this class?

And since we have not reached an agreement on whether to move RemoteAdmin and RemoteHTable to src/main, I'm not sure whether we should make this class IA.Public...

This is not tied to RemoteAdmin / RemoteHTable.
This is just a more efficient way to use the existing public Client class (or any custom client based on Apache HttpClient).

The RemoteAdmin / RemoteHTable API does not even change; this new class is only used internally in the RemoteHTable implementation.


@Override
public boolean isStreaming() {
// TODO Auto-generated method stub
Contributor

Remove this line?

Contributor Author

Sure

@stoty
Contributor Author

stoty commented Jul 8, 2024

I do not fully understand what "streams" means here...

All requests and responses are fully kept in memory here I think?

Currently they are.
The current code calls the Apache HttpClient getResponseBody() method, which causes the client to wait until all data is received and then load it into a byte array.

However, the goal is to avoid having to do that.

Protobuf primarily works on streams, so for a large resultset, we may reduce both processing (wall clock) time and memory consumption by not buffering the whole response into memory, but reading directly from the stream, so that

  • We do not have to wait for the full response to arrive before starting to process it.
  • We do not have to copy the whole response into a single byte array.
  • The processed response segments can be GCd while we are processing the rest of the message.

The Cell/Cellset structures are still kept in memory, but we avoid having to explicitly store them twice during processing (once as the serialized byte array and once as the Java POJOs).
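To make the points above concrete, here is a minimal sketch of buffered versus streaming decoding. It is an illustration only, not code from this patch: it uses only the JDK, and the length-prefixed record format, the class name StreamingReadSketch, and its methods are all hypothetical stand-ins for a protobuf message arriving on an HTTP response stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for protobuf parsing: records are length-prefixed UTF-8 strings. */
final class StreamingReadSketch {

  /** Buffered approach: copy the whole body into one byte array, then decode that copy. */
  static List<String> readBuffered(InputStream body) throws IOException {
    // readAllBytes() needs Java 9+; it blocks until the stream ends and
    // keeps the full serialized body in memory while decoding runs.
    byte[] all = body.readAllBytes();
    return decode(new ByteArrayInputStream(all));
  }

  /** Streaming approach: decode directly from the body as bytes become available. */
  static List<String> readStreaming(InputStream body) throws IOException {
    return decode(body);
  }

  private static List<String> decode(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    List<String> records = new ArrayList<>();
    while (true) {
      int length;
      try {
        length = data.readInt();
      } catch (EOFException endOfBody) {
        return records; // no further length prefix: the body is finished
      }
      // Only this record's serialized bytes are needed at this point;
      // earlier buffers are unreachable and can be garbage collected.
      byte[] record = new byte[length];
      data.readFully(record);
      records.add(new String(record, StandardCharsets.UTF_8));
    }
  }

  /** Helper that builds a body in the hypothetical record format. */
  static byte[] encode(String... records) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    for (String r : records) {
      byte[] b = r.getBytes(StandardCharsets.UTF_8);
      out.writeInt(b.length);
      out.write(b);
    }
    out.flush();
    return bytes.toByteArray();
  }
}
```

Both methods return the same decoded objects; the difference is only that readStreaming never holds a second, serialized copy of the whole response. With a real protobuf message, the streaming branch would pass the response's InputStream to the parser instead of a byte array.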

@stoty stoty requested a review from Apache9 July 8, 2024 14:54
@Apache9
Contributor

Apache9 commented Jul 12, 2024

Please fix the javac and javadoc warnings if possible?

@stoty
Contributor Author

stoty commented Jul 15, 2024

I have added org.apache.hadoop.hbase.rest.TestGetAndPutResource.testMultipleCellPutPBEntity() to illustrate how ProtobufEntity can be used to more efficiently execute PUT operations.

@stoty
Contributor Author

stoty commented Jul 15, 2024

Fixed the warnings (though most were in code that I haven't touched)
and added a test that can serve as an example on using ProtobufEntity.
PTAL @Apache9 .

@stoty
Contributor Author

stoty commented Aug 16, 2024

Can you take a look @Apache9 ?

@stoty
Contributor Author

stoty commented Sep 10, 2024

Now that you're back from vacation and hopefully settled can you take a look at this again, @Apache9 ?

path.append('/');
path.append("dummy_row");
HttpPut httpPut = new HttpPut(path.toString());
httpPut.setEntity(new ProtobufHttpEntity(model));
Contributor Author

This is how a normal client can use ProtobufHttpEntity, @Apache9 .
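
The snippet above relies on the entity writing itself to the connection on demand. A minimal sketch of that idea follows, assuming nothing about the actual ProtobufHttpEntity implementation: Message, RequestBody, and the class StreamingWriteSketch are hypothetical, JDK-only stand-ins for a protobuf model and for an HttpClient entity. In Apache HttpClient, the HttpEntity.writeTo(OutputStream) method plays the role that RequestBody.writeTo plays here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class StreamingWriteSketch {

  /** Hypothetical message type standing in for a protobuf model. */
  static final class Message {
    private final List<String> rows;

    Message(List<String> rows) {
      this.rows = rows;
    }

    /** Serializes straight into the given sink, one row at a time. */
    void writeTo(OutputStream out) throws IOException {
      DataOutputStream data = new DataOutputStream(out);
      for (String row : rows) {
        byte[] b = row.getBytes(StandardCharsets.UTF_8);
        data.writeInt(b.length);
        data.write(b);
      }
      data.flush();
    }

    /** Serializes into an interim array: the copy a streaming entity avoids. */
    byte[] toByteArray() throws IOException {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      writeTo(buffer);
      return buffer.toByteArray();
    }
  }

  /** Hypothetical request body: the HTTP layer calls writeTo when it is ready to send. */
  interface RequestBody {
    void writeTo(OutputStream connection) throws IOException;
  }

  /** Buffered body: the whole message is serialized up front and held until sent. */
  static RequestBody buffered(Message m) throws IOException {
    byte[] bytes = m.toByteArray();
    return connection -> connection.write(bytes);
  }

  /** Streaming body: serialization happens while writing to the connection. */
  static RequestBody streaming(Message m) {
    return m::writeTo;
  }
}
```

Both bodies put identical bytes on the wire. The streaming variant simply defers serialization until the connection's output stream is available, so no interim array is allocated for a large mutation.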

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 16s master passed
+1 💚 compile 0m 21s master passed
+1 💚 javadoc 0m 17s master passed
+1 💚 shadedjars 5m 58s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 2s the patch passed
+1 💚 compile 0m 21s the patch passed
+1 💚 javac 0m 21s the patch passed
+1 💚 javadoc 0m 16s the patch passed
+1 💚 shadedjars 5m 52s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 3m 50s hbase-rest in the patch passed.
24m 43s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6010/9/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6010
Optional Tests javac javadoc unit compile shadedjars
uname Linux 85fb0039742f 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a0803e6
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6010/9/testReport/
Max. process+thread count 1672 (vs. ulimit of 30000)
modules C: hbase-rest U: hbase-rest
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6010/9/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 3m 17s master passed
+1 💚 compile 0m 33s master passed
+1 💚 checkstyle 0m 9s master passed
+1 💚 spotbugs 0m 35s master passed
+1 💚 spotless 0m 45s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 3s the patch passed
+1 💚 compile 0m 31s the patch passed
+1 💚 javac 0m 31s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 9s the patch passed
+1 💚 spotbugs 0m 41s the patch passed
+1 💚 hadoopcheck 11m 42s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 43s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 10s The patch does not generate ASF License warnings.
30m 7s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6010/9/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6010
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux b9ac3c8ec6bf 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a0803e6
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-rest U: hbase-rest
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6010/9/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@stoty
Contributor Author

stoty commented Feb 24, 2025

Can we get back to this patch, @Apache9 ?

To recap:

This is a performance optimization for the included REST Client: instead of copying the full HTTP response into a byte array before feeding it into protobuf, it reads from the response stream, which saves both memory and CPU.

ProtobufHttpEntity is a similar optimization for uploading large mutations. It avoids storing the protobuf output in a byte array before uploading, and instead streams the protobuf marshaller output directly to HTTP.

I have also updated the code to use CloseableHttpResponse instead of HttpResponse.

None of this affects the server code.

While this does expose some Apache HttpClient classes in the API, this is nothing new, as the current Client already does the same.
