Description
Soft deletes consume a lot of CPU when flushing doc values updates or when calculating `numsToDelete` in `SoftDeleteRetentionMergePolicy`. I've been looking for ways to speed up these operations. The new `DocIdSetIterator#intoBitset` interface seems to provide a good approach, which is as follows:
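Below is a minimal, self-contained sketch of why bulk loading into a bit set can help. It assumes a simplified iterator model: `SimpleDocIdIterator`, `ArrayDocIdIterator`, `RangeDocIdIterator`, `countDocs`, and this `intoBitset` signature are all hypothetical, and `java.util.BitSet` stands in for Lucene's own bit set. The default `intoBitset` visits one doc at a time. An iterator that knows its docs form a contiguous range can override it with a single range call. This does not reproduce Lucene's actual `DocIdSetIterator` contract or implementations.

```java
import java.util.BitSet;

// Entry point first so the file can also be run in single-file source mode.
final class IntoBitSetSketch {
    /** Counts the docs an iterator matches below maxDoc, e.g. soft-deleted docs in a segment. */
    static int countDocs(SimpleDocIdIterator it, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        it.nextDoc(); // position the iterator: intoBitset starts from docID()
        it.intoBitset(maxDoc, bits, 0);
        return bits.cardinality();
    }

    public static void main(String[] args) {
        // Sparse docs: uses the default one-doc-at-a-time loop.
        System.out.println(countDocs(new ArrayDocIdIterator(1, 3, 5), 10));
        // Dense run of docs: a single bulk range set.
        System.out.println(countDocs(new RangeDocIdIterator(100, 1_000_000), 1_000_000));
    }
}

/** Hypothetical, simplified doc ID iterator; NOT Lucene's DocIdSetIterator. */
abstract class SimpleDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Current doc ID: -1 before the first nextDoc(), NO_MORE_DOCS once exhausted. */
    abstract int docID();

    /** Advances to the next doc ID, or returns NO_MORE_DOCS when exhausted. */
    abstract int nextDoc();

    /**
     * Sets bit (doc - offset) for every doc in [docID(), upTo), then leaves the iterator on
     * the first doc >= upTo. The caller must have positioned the iterator with nextDoc().
     * This default visits one doc at a time.
     */
    void intoBitset(int upTo, BitSet bits, int offset) {
        for (int doc = docID(); doc < upTo; doc = nextDoc()) {
            bits.set(doc - offset);
        }
    }
}

/** Iterates a sorted array of distinct doc IDs and keeps the default per-doc loop. */
final class ArrayDocIdIterator extends SimpleDocIdIterator {
    private final int[] docs;
    private int idx = -1;

    ArrayDocIdIterator(int... docs) { this.docs = docs; }

    @Override
    int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    @Override
    int nextDoc() {
        if (idx < docs.length) idx++;
        return docID();
    }
}

/** Matches every doc in [min, max) and overrides intoBitset with one range set. */
final class RangeDocIdIterator extends SimpleDocIdIterator {
    private final int min;
    private final int max;
    private int doc = -1;

    RangeDocIdIterator(int min, int max) { this.min = min; this.max = max; }

    @Override
    int docID() { return doc; }

    @Override
    int nextDoc() {
        if (doc == NO_MORE_DOCS) return doc;
        int next = doc < min ? min : doc + 1;
        doc = next < max ? next : NO_MORE_DOCS;
        return doc;
    }

    @Override
    void intoBitset(int upTo, BitSet bits, int offset) {
        if (doc >= upTo) return; // already at or past upTo: nothing to load
        int end = Math.min(upTo, max);
        bits.set(doc - offset, end - offset); // one bulk call instead of a per-doc loop
        doc = end < max ? end : NO_MORE_DOCS; // first doc >= upTo, or exhausted
    }
}
```

For a segment whose matching docs form one dense run, the range override replaces roughly a million `nextDoc()` calls with one `BitSet.set(from, to)` call. How much a real soft-deletes iterator gains depends on how its docs are laid out.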
Another optimization I'm looking for is to expose the fact that soft-deletes fields always contain a single value, so that we can skip computing the min/max/gcd (an idea initially raised in #12557). My current API design is pretty simple, but I'm not sure whether it's a good one:
```java
public abstract class NumericDocValues extends DocValuesIterator {

  /**
   * If the implementation knows that all docs have the same value, returns that value;
   * otherwise returns {@code null}.
   */
  public Long singleValue() {
    return null;
  }

  // ... existing members unchanged ...
}
```
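Here is a minimal sketch of how a writer could use the proposed `singleValue()` hook to skip the per-doc min/max/gcd pass. `SimpleNumericValues`, `ArrayValues`, `ConstantValues`, `Stats`, and `computeStats` are hypothetical. For brevity they use random access instead of the iterator API above. The gcd-of-differences step only approximates the kind of statistics a numeric doc values writer gathers; it is not Lucene's exact encoding logic.

```java
// Entry point first so the file can also be run in single-file source mode.
final class StatsSketch {
    /** Gathers min/max/gcd, skipping the per-doc pass when singleValue() is known. */
    static Stats computeStats(SimpleNumericValues values) {
        int n = values.size();
        Long single = values.singleValue();
        if (single != null) {
            // Fast path: no per-doc work; gcd 0 means "all values are equal".
            return new Stats(single, single, 0, n);
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long gcd = 0;
        long first = n > 0 ? values.valueAt(0) : 0;
        for (int i = 0; i < n; i++) {
            long v = values.valueAt(i);
            min = Math.min(min, v);
            max = Math.max(max, v);
            gcd = gcd(gcd, v - first); // overflow on extreme values is ignored in this sketch
        }
        return new Stats(min, max, gcd, n);
    }

    static long gcd(long a, long b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(computeStats(new ArrayValues(3, 7, 11)));   // Stats[min=3, max=11, gcd=4, count=3]
        System.out.println(computeStats(new ConstantValues(1, 1000))); // Stats[min=1, max=1, gcd=0, count=1000]
    }
}

/** Hypothetical per-field statistics, loosely modeled on what a numeric writer needs. */
record Stats(long min, long max, long gcd, int count) {}

/** Hypothetical random-access stand-in for NumericDocValues. */
abstract class SimpleNumericValues {
    abstract int size();          // number of docs that have a value
    abstract long valueAt(int i); // value of the i-th doc that has a value

    /** Proposed hook: if all docs are known to share one value, return it, otherwise null. */
    Long singleValue() {
        return null;
    }
}

/** Arbitrary values: always takes the full per-doc pass. */
final class ArrayValues extends SimpleNumericValues {
    private final long[] values;
    ArrayValues(long... values) { this.values = values; }
    @Override int size() { return values.length; }
    @Override long valueAt(int i) { return values[i]; }
}

/** E.g. a soft-deletes field, where every doc that has the field stores the same value. */
final class ConstantValues extends SimpleNumericValues {
    private final long value;
    private final int size;
    ConstantValues(long value, int size) { this.value = value; this.size = size; }
    @Override int size() { return size; }
    @Override long valueAt(int i) { return value; }
    @Override Long singleValue() { return value; } // autoboxed to Long
}
```

The fast path yields the same statistics as the full pass over identical values, so a writer that trusts `singleValue()` only needs the doc count to choose a constant encoding.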