Replicated cache deletion behavior when the 3rd node joins the cluster. #234

@rezigned

Problem

Let's say we have 3 nodes A, B and C with the following sequence of events:

  1. A starts and writes a cache entry: Cache.put(:a, 1)
  2. B joins the cluster. It can still see the entry: Cache.all() #=> [:a]
  3. C joins the cluster. All cache entries are gone (Cache.all() #=> []) on all nodes.

From what I understand after reading through the code, the current design of the generational cache keeps at most 2 generations, e.g. [new, old]. Each time a new node starts (or joins the cluster), it always creates a new generation first, before copying the data, as seen in the code below:

    with :ok <- maybe_run_on_nodes(adapter_meta, nodes, :new_generation),
         :ok <- copy_entries_from_nodes(adapter_meta, nodes),
         :ok <- maybe_run_on_nodes(adapter_meta, [node()], :new_generation) do
      maybe_run_on_nodes(adapter_meta, nodes, :reset_generation_timer)
    end

This seems to cause the issue I mentioned above. When A first starts, its list of generations is

[a1] # a1 holds the cache data `:a => 1`

When B joins, a new generation is created on A, so its list becomes

[a2, a1] # a2 is the new generation, a1 is the old one

Now, when C joins, another generation is created on A and the cached data is gone:

[a3, a2] # a1 is gone, including the data `:a => 1` (technically, it's in the `deprecated` table)
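
The rotation described above can be reproduced with a small toy model. This is only a sketch under stated assumptions: `GenSim` and all of its functions are hypothetical names made up for illustration, not Nebulex code. It models a node as a list of at most two maps (newest first), and it ignores details of the real adapter such as promoting entries to the newest generation on read.

```elixir
defmodule GenSim do
  # Hypothetical toy model, not Nebulex code.
  # A node is a list of generations, newest first; each generation is a map.
  @max_generations 2

  # A fresh node starts with a single empty generation.
  def new_node, do: [%{}]

  # Writes always go to the newest generation.
  def put([newest | older], key, value), do: [Map.put(newest, key, value) | older]

  # Push an empty generation and keep only the two newest ones.
  def new_generation(gens), do: Enum.take([%{} | gens], @max_generations)

  # Keys still reachable from the retained generations.
  def all(gens), do: gens |> Enum.flat_map(&Map.keys/1) |> Enum.uniq()

  # Copy every retained entry from the source nodes into the joining node.
  # Generations are walked oldest first so newer values win on conflict.
  def copy_entries(target, sources) do
    sources
    |> Enum.flat_map(fn gens ->
      gens |> Enum.reverse() |> Enum.flat_map(&Map.to_list/1)
    end)
    |> Enum.reduce(target, fn {k, v}, acc -> put(acc, k, v) end)
  end

  # Join in the order quoted above:
  # rotate the existing nodes, copy entries, then rotate the joining node.
  def join(cluster) do
    rotated = Enum.map(cluster, &new_generation/1)
    joiner = new_node() |> copy_entries(rotated) |> new_generation()
    rotated ++ [joiner]
  end
end

# A starts and writes :a
cluster = [GenSim.new_node() |> GenSim.put(:a, 1)]
IO.inspect(Enum.map(cluster, &GenSim.all/1)) # prints [[:a]]

# B joins: A rotates to [a2, a1], B copies :a and then rotates itself
cluster = GenSim.join(cluster)
IO.inspect(Enum.map(cluster, &GenSim.all/1)) # prints [[:a], [:a]]

# C joins: A and B rotate again, dropping the generation that held :a
cluster = GenSim.join(cluster)
IO.inspect(Enum.map(cluster, &GenSim.all/1)) # prints [[], [], []]
```

In this model the entry only ever lives in the generation that was newest when it was written or copied, so two joins in a row are enough to push it out on every node before the third node gets a chance to copy it.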

Solution/Suggestion?

I've modified the code above so that copy_entries_from_nodes runs first whenever a new node joins the cluster, and that seems to fix the issue. Is this the correct way to fix it?
