If I make 100 updates to the same id, e.g.

```
for i in 1..100 { addToBulkIndexer(update doc 2 with some_field: i) }
```

it's not guaranteed that afterwards `some_field` is 100. In fact, it usually isn't.
I suspect this is because the workers in the bulk indexer flush independently and are not assigned a range/bucket of ids.
Is it possible to have a mode where the workers will be assigned ids based on some hash function we pass in on init?
e.g.

```go
func(id string) int {
	n, _ := strconv.Atoi(id)
	return n % 42
}
```
(obviously something more sophisticated than this in real code)
my thinking:
add

```go
type BulkIndexerConfig struct {
	// ... existing fields ...
	AssignWorker func(id string) int // returns an integer which will be modded by NumWorkers
}
```
and instead of having one bi.queue, each worker w would have its own channel--a bi.queues[workerNum]--and bi.Add would do

```go
workerNum := bi.config.AssignWorker(item.DocumentID) % bi.config.NumWorkers
bi.queues[workerNum] <- item
```