Possible Bulk Update Optimizations #26802

Closed
@imotov

We have had several reports of slow bulk updates in 5.0 and above. Some of them are summarized in #23792. The main cause of the significant slowdown seems to be the refresh that is now performed before each update operation. Basically, in order to perform a bulk request with 1000 updates on the same record, we have to perform a refresh 1000 times. This problem could be avoided if we combined all update operations on the same record within a shard batch into a single update operation; this way we could get away with performing the refresh only once.

For example, if we have the following bulk request:

update rec A
update rec A
update rec B
update rec A

it currently translates into:

refresh // possibly if A was updated in the previous bulk and wasn't refreshed yet
get A
update A
index A

refresh // always
get A
update A
index A

// no refresh - because B wasn't modified in this bulk yet
get B
update B
index B

refresh // always
get A
update A
index A
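
To make the cost concrete, here is a rough sketch that counts refreshes under the per-operation behaviour described above. It is a toy model, not Elasticsearch code: `count_refreshes_current` and the `dirty_before` parameter are hypothetical names, and it assumes a refresh makes every pending write on the shard visible.

```python
def count_refreshes_current(record_ids, dirty_before=()):
    """Count refreshes for a bulk of updates under the per-operation model.

    record_ids:   ids touched by the update operations, in bulk order.
    dirty_before: ids written by an earlier bulk and not refreshed yet
                  (assumption: this is what makes the first refresh "possible").
    """
    dirty = set(dirty_before)  # records with writes that are not yet visible
    refreshes = 0
    for rec in record_ids:
        if rec in dirty:
            # The get must see the latest write, so a refresh is forced.
            refreshes += 1
            dirty.clear()  # assumption: a refresh exposes all pending writes
        # get + update + index leaves the record with an unrefreshed write
        dirty.add(rec)
    return refreshes


if __name__ == "__main__":
    print(count_refreshes_current(["A", "A", "B", "A"]))
    print(count_refreshes_current(["A"] * 1000, dirty_before={"A"}))
```

For the four-operation example this gives two refreshes, or three if A was still unrefreshed from the previous bulk, and the count grows linearly with the number of repeated updates on one record.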

If we combine all updates on the same record together, we could transform it into:

refresh
get A
update A
update A
update A
index A
get B
update B
index B
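
The combined form above could be sketched roughly as follows. This is only an illustration with a plain dict standing in for the shard: `combine_updates`, `apply_bulk`, and the callable-per-update representation are all hypothetical, and it assumes updates on different records are independent of each other, so regrouping them does not change the result.

```python
def combine_updates(ops):
    """Group update functions by record id, keeping first-seen record order
    and the original order of updates within each record."""
    grouped = {}
    for rec, update_fn in ops:
        grouped.setdefault(rec, []).append(update_fn)
    return grouped


def apply_bulk(store, ops):
    """Apply a bulk of (record_id, update_fn) pairs to a dict-based store.

    Returns the number of refreshes the combined plan would need.
    """
    refreshes = 1  # a single refresh up front, as in the combined plan
    for rec, update_fns in combine_updates(ops).items():
        doc = dict(store.get(rec, {}))  # get (once per record)
        for update_fn in update_fns:
            doc = update_fn(doc)        # update, in original order
        store[rec] = doc                # index (once per record)
    return refreshes


if __name__ == "__main__":
    def inc(doc):
        return {**doc, "n": doc.get("n", 0) + 1}

    store = {"A": {"n": 0}, "B": {"n": 10}}
    bulk = [("A", inc), ("A", inc), ("B", inc), ("A", inc)]
    print(apply_bulk(store, bulk), store)
```

Each record is read and indexed once regardless of how many updates target it, which is where the saving comes from.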

We would still have an issue if a bulk request mixes index or delete operations with update operations on the same record. Perhaps we can either fall back to the current slow way of doing things or optimize such requests only as much as we can. For example, we can ignore all updates before an index operation, and all updates and index operations before a delete, etc.
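
The "ignore everything before the last index or delete" idea might look something like the sketch below. `collapse_ops` and the `(kind, payload)` tuples are made up for illustration. It also glosses over real concerns, such as still reporting a per-item response for each dropped operation and the case where a dropped update would have failed on its own.

```python
def collapse_ops(ops):
    """Collapse the operations for ONE record within a bulk.

    ops: list of (kind, payload) with kind in {"index", "update", "delete"}.
    An index or delete overwrites whatever came before it, so only the
    operations from the last index/delete onwards affect the final state.
    """
    start = 0
    for i, (kind, _payload) in enumerate(ops):
        if kind in ("index", "delete"):
            start = i  # everything before this point is overwritten
    return ops[start:]


if __name__ == "__main__":
    ops = [("update", {"n": 1}), ("index", {"n": 5}), ("update", {"n": 6})]
    print(collapse_ops(ops))
```

A run of updates with no index or delete is returned unchanged and can then be combined as described earlier.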

Labels: :Distributed Indexing/CRUD, discuss