[GRPC] Add SMILE/CBOR/YAML document format support to Bulk GRPC endpoint #19744
❌ Gradle check result for 25991d7: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Karen X <karenxyr@gmail.com>
❌ Gradle check result for 4ce3caf: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
This isn't a one-way door, right? If we ever find that the autodetected type is wrong, we have the option of adding a document_type field that will bypass the autodetection.
Nice, simple improvement!
❌ Gradle check result for 41bfa33: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 9ad3d33: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@             Coverage Diff              @@
##               main   #19744      +/-   ##
============================================
- Coverage     73.15%   73.09%   -0.06%
+ Complexity    70958    70940      -18
============================================
  Files          5736     5736
  Lines        324734   324743       +9
  Branches      46979    46980       +1
============================================
- Hits         237548   237380     -168
- Misses        68031    68252     +221
+ Partials      19155    19111      -44
❌ Gradle check result for 7499a2e: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❕ Gradle check result for 7499a2e: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Description
This PR adds auto-detection capability to the gRPC Bulk API to support ingestion of all OpenSearch XContent document formats (CBOR, SMILE, and YAML), not just JSON.
The main motivation is to improve performance via binary formats (CBOR, SMILE). A secondary reason is to maintain feature parity with the HTTP APIs.
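As a rough illustration of what byte-level auto-detection can look like, here is a minimal Java sketch. It is not the logic of MediaTypeRegistry.mediaTypeFromBytes; the class name FormatSniffer, the Format enum, and the exact checks are assumptions made for this example. It relies only on well-known markers: the SMILE header bytes ":)\n", the YAML document start marker "---", the CBOR self-describe tag 0xD9 0xD9 0xF7 or a leading map byte (0xA0 to 0xBF), and a leading "{" for JSON.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of format sniffing; NOT the OpenSearch implementation. */
final class FormatSniffer {
    enum Format { JSON, SMILE, YAML, CBOR, UNKNOWN }

    static Format detect(byte[] doc) {
        if (doc == null || doc.length == 0) {
            return Format.UNKNOWN;
        }
        int b0 = doc[0] & 0xFF;
        if (doc.length >= 3) {
            int b1 = doc[1] & 0xFF;
            int b2 = doc[2] & 0xFF;
            // SMILE documents start with the header ':' ')' '\n'.
            if (b0 == ':' && b1 == ')' && b2 == '\n') {
                return Format.SMILE;
            }
            // YAML documents may start with the "---" document marker.
            if (b0 == '-' && b1 == '-' && b2 == '-') {
                return Format.YAML;
            }
            // CBOR self-describe tag (55799) is encoded as 0xD9 0xD9 0xF7.
            if (b0 == 0xD9 && b1 == 0xD9 && b2 == 0xF7) {
                return Format.CBOR;
            }
        }
        // CBOR major type 5 (map) has an initial byte in 0xA0..0xBF.
        if (b0 >= 0xA0 && b0 <= 0xBF) {
            return Format.CBOR;
        }
        // JSON object: the first non-whitespace byte is '{'.
        for (byte b : doc) {
            if (b == ' ' || b == '\t' || b == '\n' || b == '\r') {
                continue;
            }
            return b == '{' ? Format.JSON : Format.UNKNOWN;
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] json = "{\"field\":1}".getBytes(StandardCharsets.UTF_8);
        byte[] yaml = "---\nfield: 1\n".getBytes(StandardCharsets.UTF_8);
        // SMILE header, version/flags byte, then start-object and end-object tokens.
        byte[] smile = {0x3A, 0x29, 0x0A, 0x01, (byte) 0xFA, (byte) 0xFB};
        // CBOR map with one entry: {"a": 1}.
        byte[] cbor = {(byte) 0xA1, 0x61, 0x61, 0x01};

        System.out.println(detect(json));
        System.out.println(detect(yaml));
        System.out.println(detect(smile));
        System.out.println(detect(cbor));
    }
}
```

A sketch this simple misses cases a production detector has to handle, for example YAML without a leading "---" marker, which is one reason an explicit format field remains a possible fallback.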
Differences: REST Bulk vs gRPC Bulk API
Some differences compared to the HTTP side are:
- The HTTP Bulk API relies on a newline delimiter (\n) to parse the NDJSON format, where each line represents either an action metadata object or a document. (JSON uses \n and SMILE uses 0xFF as the delimiter between documents, but CBOR and YAML have no such delimiters.) Thus HTTP Bulk using NDJSON cannot support CBOR or YAML. gRPC avoids this problem because it uses Protobufs with explicit message boundaries (the bulk_request_body[] array), eliminating the need for stream separators.
- In the HTTP API, a media type (e.g. application/json, application/smile) must be provided to determine the format of the request. The gRPC request parser uses MediaTypeRegistry.mediaTypeFromBytes to auto-detect the document format. An alternative considered was a "document_type" field in the protobuf request that would let the user set the format explicitly, but this did not seem necessary.
Test Plan
Note: SMILE cannot be tested via a grpcurl command because a SMILE document has no plaintext (non-binary) representation. However, unit tests confirm that SMILE format detection and setting work.
Related Issues
Partially resolves #19311
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.