We want to distribute incoming key-value objects across a set of workers/nodes that will process them. This way each object can be stored, accessed, and managed individually.
- The Mapping Problem: Where Does the Data Go?
- The Dynamism Problem: How Do You Handle Change/Failure?
- The Distribution Problem: How Do We Distribute Work?
The initial approach consists of hashing the key of the object, producing a (hopefully unique) idx; this way we can compute idx % no_nodes, and the object will always get routed to the same worker.
That's OK, but if one node fails, the object keys need to be recalculated, since the placement is heavily dependent on the number of nodes.
So this won't work: even when we recalculate the keys, data from object-1 that was previously on node-x will need to move to node-y, since almost every key's placement is affected by the failure of a single node.
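A minimal sketch of that modulo placement, plus a quick count of how many keys change node when one of three nodes fails (the md5 hashing and the key names are just assumptions for illustration):

```python
import hashlib

def place(key: str, no_nodes: int) -> int:
    # Hash the key into a (hopefully unique) idx, then pick a worker with idx % no_nodes.
    idx = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return idx % no_nodes

keys = [f"object-{i}" for i in range(1000)]
before = {k: place(k, 3) for k in keys}   # 3 healthy nodes
after = {k: place(k, 2) for k in keys}    # one node fails

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved}/{len(keys)} keys changed node")   # roughly two thirds of them
```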
The goal is to have a ring with a fixed number of slots; this way we can solve the Dynamism Problem.
When some server/node changes, if we were distributing based on no_nodes, the keys would need to be recalculated, as discussed before.
In this approach the core information stays tied to the ring's slots, so when a server fails, its information (in this case lists) gets remapped to the neighbouring node slot on the ring.
For example:
We have a ring with 10 slots and 3 nodes, say node-1, node-2 and node-3. They get assigned to slots 3, 6 and 9 by hashing their ids.
If node-1 fails, all the tasks from node-1 get shifted to node-3. That partly solves the Dynamism Problem.
The Mapping Problem is also solved by clockwise distribution: the workload of one node runs from that node's idx until the next node's idx in the ring. Say, for example, we have slots 0 through 20 and nodes on slots 0, 10 and 20. If we get a hashing hit on slot 7, it will be handled by the node on slot 0, which owns [0,10[; node-2 will handle [10,20[; and node-3 will handle [20].
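A small sketch of such a ring, following the convention above (a key's slot is served by the node on the closest occupied slot at or before it, wrapping around the ring). The md5 hashing, slot counts and node names are assumptions for illustration, not a fixed design:

```python
import bisect
import hashlib

class HashRing:
    """Ring with a fixed number of slots; each node owns the range from
    its own slot up to (but not including) the next node's slot."""

    def __init__(self, no_slots=10):
        self.no_slots = no_slots
        self.slots = []    # sorted slot indices that have a node on them
        self.nodes = {}    # slot -> node id

    def _slot(self, value):
        # Hash any string (node id or object key) onto one of the fixed slots.
        return int(hashlib.md5(value.encode()).hexdigest(), 16) % self.no_slots

    def add_node(self, node_id, slot=None):
        slot = self._slot(node_id) if slot is None else slot
        bisect.insort(self.slots, slot)
        self.nodes[slot] = node_id

    def remove_node(self, node_id):
        # On failure only the failed node's range moves; every other key stays put.
        for slot, nid in list(self.nodes.items()):
            if nid == node_id:
                self.slots.remove(slot)
                del self.nodes[slot]

    def lookup(self, key):
        # Owner = node on the largest occupied slot <= the key's slot;
        # index -1 wraps around to the last node on the ring.
        i = bisect.bisect_right(self.slots, self._slot(key)) - 1
        return self.nodes[self.slots[i]]

# The 10-slot example: nodes pinned to slots 3, 6 and 9.
ring = HashRing(no_slots=10)
for node, slot in [("node-1", 3), ("node-2", 6), ("node-3", 9)]:
    ring.add_node(node, slot=slot)

owner = ring.lookup("object-1")   # always the same node while the ring is stable
ring.remove_node("node-1")        # slots 3..5 are now served by node-3 (slot 9, wrapping)
```

Only the keys that were on node-1 change owner; keys on node-2 and node-3 are untouched.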
Just by elaborating on the previous section, some straightforward problems come up.
- How do we distribute the servers around the ring gracefully? We can end up with one server carrying most of the load while another has almost no work; in the last example that was the server on slot 20 (the sketch below makes this visible).
- When a server fails, all of that server's work gets transferred to the next one, adding overhead to that single server.
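Reusing the HashRing sketch above with the slot layout from the example (nodes on slots 0, 10 and 20 of a 21-slot ring), the imbalance is easy to measure; the key names are arbitrary:

```python
from collections import Counter

ring = HashRing(no_slots=21)
for node, slot in [("node-1", 0), ("node-2", 10), ("node-3", 20)]:
    ring.add_node(node, slot=slot)

keys = [f"object-{i}" for i in range(1000)]
print(Counter(ring.lookup(k) for k in keys))
# node-1 and node-2 each own ~10 of the 21 slots, while node-3 owns only
# slot 20, so node-3 ends up with almost no work.
```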
This gets better when adding virtual nodes to the ring. They are called virtual since there is no real worker sitting on that slot; they exist only to distribute occupation better within the ring and prevent inefficient workload distribution.
So each physical node gets some virtual nodes assigned to it, say worker1-v1, worker1-v2, ...
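Reusing the same HashRing sketch, virtual nodes can be layered on top simply by registering each physical worker several times under different ids (the naming scheme and the choice of 8 virtual nodes per worker are assumptions):

```python
ring = HashRing(no_slots=1024)
workers = ["worker1", "worker2", "worker3"]
virtual_per_worker = 8   # assumed; more virtual nodes means a smoother spread

for worker in workers:
    for v in range(1, virtual_per_worker + 1):
        # worker1-v1, worker1-v2, ... each land on their own hashed slot.
        ring.add_node(f"{worker}-v{v}")

virtual_owner = ring.lookup("object-1")              # e.g. "worker2-v5"
physical_owner = virtual_owner.rsplit("-v", 1)[0]    # map back to "worker2"
```

When a physical worker fails, all of its virtual nodes are removed, so its load is split between several surviving workers instead of being dumped onto a single neighbour.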
- https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html
- https://jacobantony.substack.com/p/data-partitioning-in-dynamo