Skip to content

Commit

Permalink
fix a mismanagement of the hash table that could lead to data loss
Browse files Browse the repository at this point in the history
This commit fixes a data loss bug in our hopscotch table implementation.
Removing values from the table can result in other values becoming
disconnected and lost.

Let A, B, and C be values that all hash to cell 0.
Assume the hopscotch distance factor H = 2.

   0     1     2
+-----+-----+-----+
|     |     |     |
+-----+-----+-----+

After adding A
   0     1     2
+-----+-----+-----+
|  A  |     |     |
+-----+-----+-----+
   |
   +-Neighbors =

After adding B
   0     1     2
+-----+-----+-----+
|  A  |  B  |     |
+-----+-----+-----+
   |
   +-Neighbors = 1

After adding C
   0     1     2
+-----+-----+-----+
|  A  |  B  |  C  |
+-----+-----+-----+
   |
   +-Neighbors = 1, 2

If we then remove B,
   0     1     2
+-----+-----+-----+
|  A  | [X] |  C  |
+-----+-----+-----+
   |
   +-Neighbors = 1, 2

* It is replaced with a placeholder [X].
* A's neighbor table is not updated to reflect the loss.

If we then remove A,
   0     1     2
+-----+-----+-----+
| [X] | [X] | [C] |
+-----+-----+-----+
   |
   +-Neighbors = 2

* The table is rebalanced to promote A's lowest neighbor to the primary
  cell position.
* C from cell 2 remains cell 0's neighbor.

The bug manifests if [X] the placeholder value passes the null check set
out in MAP_TABLE_VALUE_NULL; that is, the placeholder is "effectively
null".

Looking up the key that matches C will first evaluate its base cell, the
one that collided with the key in the first place. Since that is
placeholder [X], and [X] is "effectively null", the lookup stops.

C is never retrieved from the hash table.

---
The expedient solution to this bug is to update cell 0's neighbors when
B is first removed, effectively skipping the hole:

If we remove B as above,
   0     1     2
+-----+-----+-----+
|  A  | [X] |  C  |
+-----+-----+-----+
   |
   +-Neighbors = 2 <<< HERE

but clear the neighbor bit for cell 1, the promotion that happens when A
is later removed places C in cell 0.

   0     1     2
+-----+-----+-----+
|  C  | [X] | [X] |
+-----+-----+-----+
   |
   +-Neighbors =
  • Loading branch information
DHowett committed Jan 22, 2018
1 parent f503a3d commit 038543f
Showing 1 changed file with 16 additions and 0 deletions.
16 changes: 16 additions & 0 deletions hash_table.h
Original file line number Diff line number Diff line change
Expand Up @@ -392,6 +392,22 @@ static void PREFIX(_remove)(PREFIX(_table) *table, void *key)
MAP_LOCK();
PREFIX(_table_cell) cell = PREFIX(_table_get_cell)(table, key);
if (NULL == cell) { return; }

uint32_t hash = MAP_TABLE_HASH_KEY(key);
PREFIX(_table_cell) baseCell = PREFIX(_table_lookup)(table, hash);
if (baseCell && baseCell <= cell && cell - baseCell <= 32)
{
uint32_t jump = 1 << (cell - baseCell - 1);
if ((baseCell->secondMaps & jump))
{
// If we are removing a cell stored adjacent to its base due to hash
// collision, we have to clear the base cell's neighbor bit.
// Otherwise, a later remove can move the new placeholder value to the head
// which will cause further chained lookups to fail.
baseCell->secondMaps &= ~jump;
}
}

// If the cell contains a value, set it to the placeholder and shuffle up
// everything
if (0 == cell->secondMaps)
Expand Down

0 comments on commit 038543f

Please sign in to comment.