Skip to content

Conversation

@atris
Copy link
Contributor

@atris atris commented Aug 8, 2025

Refactor TransportUpdateAction to use TransportShardBulkAction directly

Eliminates unnecessary network hop through client.bulk() when already on the
shard node. Improves performance for updates and retries by executing bulk
operations directly at the shard level.

Fixes #15264

atris added 2 commits August 8, 2025 22:32
  Eliminates unnecessary network hop through client.bulk() when already on the
  shard node. Improves performance for updates and retries by executing bulk
  operations directly at the shard level.

  Fixes opensearch-project#15264

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
@atris atris requested a review from a team as a code owner August 8, 2025 18:49
@github-actions github-actions bot added bug Something isn't working Indexing Indexing, Bulk Indexing and anything related to indexing labels Aug 8, 2025
@atris
Copy link
Contributor Author

atris commented Aug 8, 2025

@mch2 @shwetathareja

@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2025

❌ Gradle check result for 766222c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2025

❌ Gradle check result for 52518e9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 52518e9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 52518e9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@shwetathareja
Copy link
Member

Tagging @mgodwan also to help with the review.

@atris atris closed this Aug 11, 2025
@atris atris reopened this Aug 11, 2025
@github-actions
Copy link
Contributor

❌ Gradle check result for 52518e9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@atris atris closed this Aug 11, 2025
@atris atris reopened this Aug 11, 2025
@github-actions
Copy link
Contributor

❌ Gradle check result for 52518e9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@atris atris closed this Aug 11, 2025
@atris atris reopened this Aug 11, 2025
@github-actions
Copy link
Contributor

❌ Gradle check result for 52518e9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@atris
Copy link
Contributor Author

atris commented Aug 13, 2025

Gentle ping, please

+ "] has a default ingest pipeline or a final ingest pipeline, the support of the ingest pipelines for update operation causes unexpected result and will be removed in 3.0.0"
);
}
client.bulk(toSingleItemBulkRequest(indexRequest), wrapBulkResponse(ActionListener.<IndexResponse>wrap(response -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we use bulk today, ingest pipeline are getting applied when users rely on _update API for the documents which are generated here as part of the indexing request. With this change, we will not be able to leverage the same as we are not going through the TransportBulkAction anymore.

This may end up to be a breaking change for users if we directly delegate to bulk shard action and not apply the operations within the ingest pipeline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I don't think _bulk API applies ingest pipeline on update operations either, so this change may end up unifying the behavior for both bulk/update APIs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is a breaking change in terms of expectations for users, I don't think it would be ok to do this change, unless we enable support for ingest pipeline here. @shwetathareja thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we enable ingest here, would it not defeat the optimisation that this change looks to do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Steps to see the change in final doc:

PUT _ingest/pipeline/mypipeline
{
    "description": "This pipeline processes student data",
    "processors": [
        {
            "set": {
                "description": "Sets the graduation year to 2023",
                "field": "grad_year",
                "value": 2023
            }
        },
        {
            "set": {
                "description": "Sets graduated to true",
                "field": "graduated",
                "value": true
            }
        },
        {
            "uppercase": {
                "field": "name"
            }
        }
    ]
}
PUT index
{
    "settings": {
        "index.default_pipeline": "mypipeline",
        "number_of_shards": 1,
        "number_of_replicas": 0
    }
}
PUT index/_doc/1
{
    "name": "Clark"
}
POST index/_update/1
{
  "doc": {
    "name" : "Bruce",
    "last_name" : "Wayne"
  }
}
GET index/_doc/1

Final o/p without this change:
{
    "_index": "index",
    "_id": "1",
    "_version": 2,
    "_seq_no": 1,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "graduated": true,
        "name": "BRUCE", <-- ingest pipeline applied to change the field's value to upper case
        "last_name": "Wayne",
        "grad_year": 2023
    }
}
Final o/p with this change:
{
    "_index": "index",
    "_id": "1",
    "_version": 2,
    "_seq_no": 1,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "graduated": true,
        "name": "Bruce",  <-- see the original input case being persisted due to no ingest pipeline triggered now
        "grad_year": 2023,
        "last_name": "Wayne"
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i dont think we can have breaking change as users might be using this as a feature to process using ingest pipeline.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Sep 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Indexing Indexing, Bulk Indexing and anything related to indexing stalled Issues that have stalled

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor TransportUpdateAction implementation to use TransportShardBulkAction instead of invoking client.bulk()

3 participants