
esutil bulk indexer multiple workers do not partition by id #170

Open
@xiankaing

Description


If I make 100 updates to the same id,
e.g. `for i in 1..100 { addToBulkIndexer(update doc 2 with some_field: i) }`,
it's not guaranteed that `some_field` is 100 afterwards. In fact, it usually isn't.
I suspect this is because the workers in the bulk indexer flush independently and are not assigned a range/bucket of ids, so two updates to the same id can land in different workers' batches and be applied out of order.

Is it possible to have a mode where workers are assigned ids based on a hash function passed in on init?
e.g. `func(id string) int { n, _ := strconv.Atoi(id); return n % 42 }` (obviously something more sophisticated than this in real code)

my thinking:

add:

```go
type BulkIndexerConfig struct {
	AssignWorker func(id string) int // returns an integer that will be modded by NumWorkers
}
```

and instead of having one `bi.queue`, each worker would have its own channel, `bi.queues[workerNum]`,
and `bi.add` would do:

```go
workerNum := bi.config.AssignWorker(item.DocumentID) % bi.config.NumWorkers
bi.queues[workerNum] <- item
```
