Description
Soft deletes consume a lot of CPU when flushing doc-values updates or computing numDeletesToMerge in SoftDeletesRetentionMergePolicy. I was looking for a way to speed up these operations. The new DocIdSetIterator#intoBitSet API seems to offer a good approach, along these lines:
- Impl intoBitset for IndexedDISI and Docvalues #14529
- Speed up flush of softdelete by intoBitset #14552
- Speed up numDeletesToMerge of SoftDeletesRetentionMergePolicy #14531
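To illustrate why a bulk intoBitSet-style call can help here, below is a minimal, self-contained sketch. It does not use Lucene's classes: `DocIterator`, `RangeIterator`, and `countLiveSoftDeletes` are hypothetical names, and `java.util.BitSet` stands in for Lucene's bit sets. The assumption is that counting soft-deleted documents reduces to loading the iterator's docs into a bit set and intersecting it with the live docs, which is a simplification of what the real code does.

```java
import java.util.BitSet;

final class IntoBitSetSketch {

  /** Hypothetical, simplified iterator over doc IDs that carry a soft-deletes value. */
  interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Current doc ID, or NO_MORE_DOCS when exhausted. */
    int docID();

    /** Advances to the next doc ID and returns it. */
    int nextDoc();

    /**
     * Sets bit (doc - offset) for every remaining doc below upTo, starting at the current doc.
     * The default walks one doc at a time; implementations may override it with a bulk copy.
     */
    default void intoBitSet(int upTo, BitSet bits, int offset) {
      for (int doc = docID(); doc < upTo; doc = nextDoc()) {
        bits.set(doc - offset);
      }
    }
  }

  /** Iterator over the contiguous range [from, to), positioned on its first doc. */
  static final class RangeIterator implements DocIterator {
    private final int to;
    private int doc;

    RangeIterator(int from, int to) {
      this.to = to;
      this.doc = from < to ? from : NO_MORE_DOCS;
    }

    @Override
    public int docID() {
      return doc;
    }

    @Override
    public int nextDoc() {
      if (doc == NO_MORE_DOCS) {
        return doc;
      }
      doc = doc + 1 < to ? doc + 1 : NO_MORE_DOCS;
      return doc;
    }

    @Override
    public void intoBitSet(int upTo, BitSet bits, int offset) {
      if (doc == NO_MORE_DOCS || doc >= upTo) {
        return;
      }
      int end = Math.min(upTo, to);
      bits.set(doc - offset, end - offset); // one range fill instead of a per-doc loop
      doc = end < to ? end : NO_MORE_DOCS;
    }
  }

  /** Counts docs that are soft-deleted and still live, using one bulk load and one AND. */
  static int countLiveSoftDeletes(DocIterator softDeleted, BitSet liveDocs, int maxDoc) {
    BitSet bits = new BitSet(maxDoc);
    softDeleted.intoBitSet(maxDoc, bits, 0);
    bits.and(liveDocs);
    return bits.cardinality();
  }
}
```

In this sketch the dense iterator replaces a per-document loop with a single range fill, and the count becomes a word-at-a-time AND plus a population count.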
Another optimization I'm considering is to expose the fact that a soft-deletes field always uses a single value, so that we can skip the min/max/gcd computation (an idea initially raised in #12557). My current API design is pretty simple, but I'm not sure whether it's a good one:
public abstract class NumericDocValues extends DocValuesIterator {
  /**
   * If the implementation knows that all docs have the same value, returns that value;
   * otherwise returns null.
   */
  public Long singleValue() {
    return null;
  }
  ...
}
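As a rough illustration of how a consumer might use such a hook, here is a self-contained sketch. It is not Lucene's writer code: `Values`, `ConstantValues`, `ArrayValues`, `Stats`, and `computeStats` are hypothetical names, and exposing all values as a `long[]` is a simplification made only for this example. It assumes the statistics of interest are the minimum, the maximum, and the gcd of the offsets from the minimum.

```java
import java.util.Arrays;

final class SingleValueSketch {

  /** Hypothetical stand-in for NumericDocValues, reduced to what this sketch needs. */
  abstract static class Values {
    /** All values in doc order (illustration only; assumed non-empty). */
    abstract long[] values();

    /** Proposed hook: the shared value if known, otherwise null. */
    Long singleValue() {
      return null;
    }
  }

  /** Values source that knows every doc holds the same value. */
  static final class ConstantValues extends Values {
    private final long value;
    private final int docCount;

    ConstantValues(long value, int docCount) {
      this.value = value;
      this.docCount = docCount;
    }

    @Override
    long[] values() {
      long[] all = new long[docCount];
      Arrays.fill(all, value);
      return all;
    }

    @Override
    Long singleValue() {
      return value;
    }
  }

  /** Values source with no special knowledge; singleValue() stays null. */
  static final class ArrayValues extends Values {
    private final long[] vals;

    ArrayValues(long[] vals) {
      this.vals = vals;
    }

    @Override
    long[] values() {
      return vals;
    }
  }

  /** Statistics an encoder might need before choosing a representation. */
  static final class Stats {
    final long min;
    final long max;
    final long gcd;

    Stats(long min, long max, long gcd) {
      this.min = min;
      this.max = max;
      this.gcd = gcd;
    }
  }

  static Stats computeStats(Values in) {
    Long single = in.singleValue();
    if (single != null) {
      // Fast path: min == max and all offsets from min are 0, so no scan is needed.
      return new Stats(single, single, 0);
    }
    // Slow path: scan every value (overflow of v - min is ignored in this sketch).
    long[] vals = in.values();
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long v : vals) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    long gcd = 0;
    for (long v : vals) {
      gcd = gcd(gcd, v - min);
    }
    return new Stats(min, max, gcd);
  }

  private static long gcd(long a, long b) {
    while (b != 0) {
      long t = a % b;
      a = b;
      b = t;
    }
    return a;
  }

  public static void main(String[] args) {
    Stats fast = computeStats(new ConstantValues(1L, 1_000));
    System.out.println(fast.min + " " + fast.max + " " + fast.gcd);
  }
}
```

One design question this raises is the boxed `Long` return type: it keeps the API to a single method, at the cost of a null check at each call site.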