Speed up soft delete #14521

Open
@gf2121

Description

Soft deletes consume a lot of CPU when flushing doc-values updates or when calculating the numsToDelete in SoftDeletesRetentionMergePolicy. I was looking for a way to speed up these operations. The new DocIdSetIterator#intoBitSet interface seems to provide a good approach, which is as follows:
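
A rough sketch of the general idea, using stand-in types rather than Lucene's real classes: load the soft-deleted doc IDs into a bitset in one bulk call, then intersect with the live docs and count, instead of checking liveness one doc at a time. `ArrayDocIterator`, `SoftDeleteCounter`, and `countLiveSoftDeleted` are hypothetical names, and `java.util.BitSet` stands in for Lucene's `FixedBitSet`; the real change may differ.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a doc-ID iterator; NOT Lucene's DocIdSetIterator. */
final class ArrayDocIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docs; // sorted ascending, no duplicates
  private int idx = -1;

  ArrayDocIterator(int... docs) {
    this.docs = docs;
  }

  int docID() {
    if (idx < 0) {
      return -1; // not positioned yet
    }
    return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
  }

  int nextDoc() {
    idx++;
    return docID();
  }

  /**
   * Sets bit (doc - offset) for the current doc and every following doc below upTo.
   * Assumes the iterator is already positioned on a doc.
   */
  void intoBitSet(int upTo, BitSet bits, int offset) {
    for (int doc = docID(); doc < upTo; doc = nextDoc()) {
      bits.set(doc - offset);
    }
  }
}

final class SoftDeleteCounter {
  /** Counts docs that are soft-deleted and still live (not hard-deleted). */
  static int countLiveSoftDeleted(ArrayDocIterator softDeleted, BitSet liveDocs, int maxDoc) {
    BitSet bits = new BitSet(maxDoc);
    // Position the iterator first, then bulk-load the remaining docs.
    if (softDeleted.nextDoc() != ArrayDocIterator.NO_MORE_DOCS) {
      softDeleted.intoBitSet(maxDoc, bits, 0);
    }
    bits.and(liveDocs);        // drop docs that are already hard-deleted
    return bits.cardinality(); // word-at-a-time popcount instead of a per-doc loop
  }
}
```

The potential gain comes from replacing a per-document loop with word-wide bitset operations; whether that pays off depends on how cheaply a real iterator can implement the bulk load.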

Another optimization I'm considering is to expose the fact that soft-deletes fields always use a single value, so that we can skip the pass that computes the min/max/gcd (an idea initially raised in #12557). My current API design is pretty simple, but I'm not sure it's a good one.

public abstract class NumericDocValues extends DocValuesIterator {

  /**
   * If the implementation knows that all docs have the same value, returns that value; otherwise returns null.
   */
  public Long singleValue() {
    return null;
  }

  ...

}
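
To illustrate how a consumer might use such a method, here is a minimal, self-contained sketch. It does not use Lucene's classes: `SimpleNumericValues`, `ConstantValues`, `ArrayValues`, and `ValueStats` are hypothetical stand-ins, and the gcd is taken over differences from the first value as a simplification that ignores overflow.

```java
/** Hypothetical stand-in for a numeric doc-values source; NOT Lucene's NumericDocValues. */
abstract class SimpleNumericValues {
  abstract int size();

  abstract long valueAt(int i);

  /** Returns the shared value if every doc is known to have the same value, else null. */
  Long singleValue() {
    return null;
  }
}

/** Every doc has the same value, as with a soft-deletes field. */
final class ConstantValues extends SimpleNumericValues {
  private final int size;
  private final long value;

  ConstantValues(int size, long value) {
    this.size = size;
    this.value = value;
  }

  @Override int size() { return size; }

  @Override long valueAt(int i) { return value; }

  @Override Long singleValue() { return value; }
}

/** Arbitrary values; keeps the default singleValue() == null. */
final class ArrayValues extends SimpleNumericValues {
  private final long[] values;

  ArrayValues(long... values) { this.values = values; }

  @Override int size() { return values.length; }

  @Override long valueAt(int i) { return values[i]; }
}

final class ValueStats {
  final long min, max, gcd;
  final boolean usedShortcut;

  private ValueStats(long min, long max, long gcd, boolean usedShortcut) {
    this.min = min;
    this.max = max;
    this.gcd = gcd;
    this.usedShortcut = usedShortcut;
  }

  static ValueStats of(SimpleNumericValues values) {
    Long single = values.singleValue();
    if (single != null) {
      // All values equal: min == max and every difference is 0, so no scan is needed.
      return new ValueStats(single, single, 0, true);
    }
    long min = Long.MAX_VALUE, max = Long.MIN_VALUE, gcd = 0;
    for (int i = 0; i < values.size(); i++) {
      long v = values.valueAt(i);
      min = Math.min(min, v);
      max = Math.max(max, v);
      gcd = gcd(gcd, v - values.valueAt(0)); // simplification: no overflow handling
    }
    return new ValueStats(min, max, gcd, false);
  }

  private static long gcd(long a, long b) {
    a = Math.abs(a);
    b = Math.abs(b);
    while (b != 0) {
      long t = a % b;
      a = b;
      b = t;
    }
    return a;
  }
}
```

Returning a boxed `Long` keeps the default case (`null`) cheap to express, at the cost of a possible allocation on the fast path; a primitive-plus-flag pair would be an alternative design.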
