Skip to content

Was swap-remove behavior ever considered when removing entries? #503

Open
@matthieu-m

Description

@matthieu-m

Motivation

One of the scenarios were Swiss Tables tend NOT to fare too well in are scenarios involving deletions.

A Swiss Table in which entries are only inserted -- a fairly common usecase, for example dumping all the HTTP parameters or headers of a an incoming request -- will have contiguous entries in each group of 16. This leads to good cache line density, and helps in reducing cache misses.

On the other hand, once an entry is deleted, a hole is left in the run of entries of the group it was deleted from -- unless it was last, of course -- and in the presence of a good hash, further insertions into the table will likely end up in different groups for a while.

This leads to tables in which a significant number of deletions have been performed looking like... Swiss Cheese. Appropriate, I suppose, but not cache-friendly.

Backward Shifting Deletion

Prior to hashbrown, the standard hashmap in Rust used Robin Hood with Backward Shifting Deletion: upon deletion, any entry immediately following the deleted entry (recursively) which had been displaced due to the now deleted entry, was shifted back. This helped avoiding tombstones in the probe sequence, and led to good cache density.

The cost of shifting all those entries back is however none too good. Deletion is not really O(1) any longer when an unspecified number of elements may need to be shifted back.

So I would not recommend backward shifting deletion...

Swap Remove

But what about swap_remove?

The idea would be that if the entry deleted from a group is not the last entry of the group, then that last entry is swapped in the now available spot. Just like Vec::swap_remove this is O(1), while preserving the contiguity of the entries in the group.

This is a slight additional cost for deletion, in exchange for better cache density for the entire period of time following the deletion up until a new element is inserted back that would have plugged the hole.

Is it worth it?

I don't know! I've never heard of it being used.

Before I try to implement the behavior, however, I thought I'd ask if:

  1. I misread the code, and it's actually already implemented.
  2. This idea was already considered, and discarded for some reason.
  3. Whether it seems worthy of consideration, or I'm heading into a hopeless direction...

I am quite keen on hearing folks thoughts on the matter.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions