-
Couldn't load subscription status.
- Fork 2.3k
Refactor TransportUpdateAction to use TransportShardBulkAction directly #18996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Refactor TransportUpdateAction to use TransportShardBulkAction directly #18996
Conversation
Eliminates unnecessary network hop through client.bulk() when already on the shard node. Improves performance for updates and retries by executing bulk operations directly at the shard level. Fixes opensearch-project#15264 Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
|
❌ Gradle check result for 766222c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 52518e9: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 52518e9: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 52518e9: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Tagging @mgodwan also to help with the review. |
|
❌ Gradle check result for 52518e9: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 52518e9: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 52518e9: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Gentle ping, please |
| + "] has a default ingest pipeline or a final ingest pipeline, the support of the ingest pipelines for update operation causes unexpected result and will be removed in 3.0.0" | ||
| ); | ||
| } | ||
| client.bulk(toSingleItemBulkRequest(indexRequest), wrapBulkResponse(ActionListener.<IndexResponse>wrap(response -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we use bulk today, ingest pipeline are getting applied when users rely on _update API for the documents which are generated here as part of the indexing request. With this change, we will not be able to leverage the same as we are not going through the TransportBulkAction anymore.
This may end up to be a breaking change for users if we directly delegate to bulk shard action and not apply the operations within the ingest pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I don't think _bulk API applies ingest pipeline on update operations either, so this change may end up unifying the behavior for both bulk/update APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is a breaking change in terms of expectations for users, I don't think it would be ok to do this change, unless we enable support for ingest pipeline here. @shwetathareja thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we enable ingest here, would it not defeat the optimisation that this change looks to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Steps to see the change in final doc:
PUT _ingest/pipeline/mypipeline
{
"description": "This pipeline processes student data",
"processors": [
{
"set": {
"description": "Sets the graduation year to 2023",
"field": "grad_year",
"value": 2023
}
},
{
"set": {
"description": "Sets graduated to true",
"field": "graduated",
"value": true
}
},
{
"uppercase": {
"field": "name"
}
}
]
}
PUT index
{
"settings": {
"index.default_pipeline": "mypipeline",
"number_of_shards": 1,
"number_of_replicas": 0
}
}
PUT index/_doc/1
{
"name": "Clark"
}
POST index/_update/1
{
"doc": {
"name" : "Bruce",
"last_name" : "Wayne"
}
}
GET index/_doc/1
Final o/p without this change:
{
"_index": "index",
"_id": "1",
"_version": 2,
"_seq_no": 1,
"_primary_term": 1,
"found": true,
"_source": {
"graduated": true,
"name": "BRUCE", <-- ingest pipeline applied to change the field's value to upper case
"last_name": "Wayne",
"grad_year": 2023
}
}
Final o/p with this change:
{
"_index": "index",
"_id": "1",
"_version": 2,
"_seq_no": 1,
"_primary_term": 1,
"found": true,
"_source": {
"graduated": true,
"name": "Bruce", <-- see the original input case being persisted due to no ingest pipeline triggered now
"grad_year": 2023,
"last_name": "Wayne"
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i dont think we can have breaking change as users might be using this as a feature to process using ingest pipeline.
|
This PR is stalled because it has been open for 30 days with no activity. |
Refactor TransportUpdateAction to use TransportShardBulkAction directly
Eliminates unnecessary network hop through client.bulk() when already on the
shard node. Improves performance for updates and retries by executing bulk
operations directly at the shard level.
Fixes #15264