Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose NDC doc #2915

Merged
merged 10 commits into from
Dec 19, 2019
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
310 changes: 310 additions & 0 deletions docs/design/2290-cadence-ndc.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,310 @@
# Design doc: Cadence NDC

Last updated: Dec 2019

Reference: [#2290](https://github.com/uber/cadence/issues/2290)


## Abstract

Cadence Cross DC is a feature which asynchronously replicates workflows from active data center to other passive data centers, for reconstruction. When necessary, customer can failover to any of the data centers which has the backup for high availability.
wxing1292 marked this conversation as resolved.
Show resolved Hide resolved

Cadence Cross DC is an AP (in terms of CAP).

This doc explains the high level concepts about Cadence Cross DC (mainly the N data center version).


## Newly Introduced Components

1. Version
2. Version history
3. Workflow history conflict resolution
4. Zombie workflows
5. Workflow task processing

### Version

Version is a newly introduced concept in Cadence Cross DC which describes the chronological order of events (per customer domain).
wxing1292 marked this conversation as resolved.
Show resolved Hide resolved

Cadence Cross DC is AP, all domain change events & workflow history events are replicated asynchronously for high throughput. This means that data across data centers are not strongly consistent. To guarantee that domain data & workflow data will achieve eventual consistency (expecially when there is data conflict during a failover), version is introduced and attached to customers' domains. All workflow history events generated in a domain will also come with the version in that domain.
wxing1292 marked this conversation as resolved.
Show resolved Hide resolved

All participating data centers are pre-configured with a unique initial version, and a shared version increment:

`initial version < shared version increment`

When performing failover for one domain from one data center to another data center, the version attched to the domain will be changed by the following rule:
wxing1292 marked this conversation as resolved.
Show resolved Hide resolved

for all versions which follow `version % (shared version increment) == (active data centers' initial version)`
find the smallest version which has `version >= old version in domain`

When there is a data conflict, comparison will be made and workflow history events with the highest version will win.

When a data center is trying to mutate a workflow, version will be checked. A data center can mutate a workflow if and only if

1. version in the domain belongs to this data center, i.e.
`(vesion in domain) % (shared version increment) == (this data centers' initial version)`
2. the version of this workflow's last event is not less then version in domain, i.e.
`(last event's version) <= (version in domain)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>=

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if 1 is satisfied, what is the case that can lead to last event's version > version in domain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all replication tasks are handled asynchronously, meaning that domain replication (change of version) can experience delay.



### Version History

Version history is a newly introduced concept which provides high level summary about version information of workflow history.

Whenever there is a new workflow history event generated, the version from domain will be attached. Workflow mutable state will keep track of all hisory events & corresponding version.
wxing1292 marked this conversation as resolved.
Show resolved Hide resolved

Example, version history without data conflict:

T = 0: adding event with event ID == 1 & version == 1
```
| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | ------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 1 | 1 |
| -------- | ------------- | --------------- | ------- |
```

T = 1: adding event with event ID == 2 & version == 1
```
| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | ------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 2 | 1 |
| 2 | 1 | | |
| -------- | ------------- | --------------- | ------- |
```

T = 2: adding event with event ID == 3 & version == 1
```
| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | ------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 3 | 1 |
| 2 | 1 | | |
| 3 | 1 | | |
| -------- | ------------- | --------------- | ------- |
```

T = 3: domain failover triggered, domain version is now 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter whether this is active or passive? Or are they the same (eventually)?
Maybe worth calling out explicitly.

adding event with event ID == 4 & version == 2
```
| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | ------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 3 | 1 |
| 2 | 1 | 4 | 2 |
| 3 | 1 | | |
| 4 | 2 | | |
| -------- | ------------- | --------------- | ------- |
```

T = 4: adding event with event ID == 5 & version == 2
```
| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | ------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 3 | 1 |
| 2 | 1 | 5 | 2 |
| 3 | 1 | | |
| 4 | 2 | | |
wxing1292 marked this conversation as resolved.
Show resolved Hide resolved
| -------- | ------------- | --------------- | ------- |
```

Since Cadence is AP, during failover (change of active data center of a domain), there exist case that more than one data center can modify a workflow, causing divergence of workflow history. Below shows how version history will look like under such condition.

Example, version history with data conflict:

Below will show version history of the same workflow in 2 different data centers.

T = 0: existing version history in data center A & B
```
| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | ------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 2 | 1 |
wxing1292 marked this conversation as resolved.
Show resolved Hide resolved
| 2 | 1 | 3 | 2 |
| 3 | 2 | | |
| -------- | ------------- | --------------- | ------- |
```

T = 1: adding event with event ID == 4 & version == 2 in data center A
```
| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | ------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 2 | 1 |
| 2 | 1 | 4 | 2 |
| 3 | 2 | | |
| 4 | 2 | | |
| -------- | ------------- | --------------- | ------- |
```

T = 1: domain failover triggered, adding event with event ID == 4 & version == 3 in data center B
```
| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | ------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 2 | 1 |
| 2 | 1 | 3 | 2 |
| 3 | 2 | 4 | 3 |
| 4 | 3 | | |
| -------- | ------------- | --------------- | ------- |
```

T = 2: replication task from data center B arrives in data center A
Note: below are a tree structures
```
| -------- | ------------- |
| Events |
| -------- | ------------- |
| Event ID | Event Version |
| -------- | ------------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| -------- | ------------- |
|
| ------------- | ------------ |
| |
| -------- | ------------- | | -------- | ------------- |
| Event ID | Event Version | | Event ID | Event Version |
| -------- | ------------- | | -------- | ------------- |
| 4 | 2 | | 4 | 3 |
| -------- | ------------- | | -------- | ------------- |

| --------------- | ------- |
| Version History |
| --------------- | ------- |
| Event ID | Version |
| --------------- | ------- |
| 2 | 1 |
| --------------- | ------- |
|
| ------- | ------------------- |
| |
| --------------- | ------- | | --------------- | ------- |
| Event ID | Version | | Event ID | Version |
| --------------- | ------- | | --------------- | ------- |
| 4 | 2 | | 3 | 2 |
| --------------- | ------- | | 4 | 3 |
| --------------- | ------- |
```

T = 2: replication task from data center A arrives in data center B, same as above


### Workflow History Conflict Resolution

When a workflow encounters divergence of workflow history, proper conflict resolution should be applied.

In Cadence NDC, workflow history events are modeled as a tree, as shown in second example in [### Version History].

Workflows which encounters divergence will have more than one history branches. Among all history branches, the history branch with the highest version is considered as the `current branch` and workflow mutable state is a summary of the current branch. Whenever there is a switch between workflow history branches, a complete rebuild of workflow mutable state will occur.


### Zombie Workflows

There is an existing contract that for any domain & workflow ID combination, there can be at most one run (domain & workflow ID & run ID) open / running.

Cadence NDC aims to keep the workflow state as up-to-date as possible among all participating data centers.

Due to the nature of Cadence NDC, i.e. workflow history events are replicated asynchronously, different run (same domain & workflow ID) can arrive at target data center at different time, sometimes out of order, as shown below:

```
| ------------- | | ------------- | | ------------- |
| Data Center A | | Network Layer | | Data Center B |
| ------------- | | ------------- | | ------------- |
| | |
| Run 1 Replication Events | |
| -----------------------> | |
| | |
| Run 2 Replication Events | |
| -----------------------> | |
| | |
| | |
| | |
| | Run 2 Replication Events |
| | -----------------------> |
| | |
| | Run 1 Replication Events |
| | -----------------------> |
| | |
| ------------- | | ------------- | | ------------- |
| Data Center A | | Network Layer | | Data Center B |
| ------------- | | ------------- | | ------------- |
```

Since run 2 appears in data center B first, run 1 cannot be replicated as runnable due to rule `at most one run open` (see above), thus, `zombie` workflow state is introduced. Zombie state indicates a workflow which cannot be actively mutated by a data center (assuming corresponding domain is active in this data center). A zombie workflow can only be changed by replication task.

Run 1 will be replicated similar as run 2, except run 1's workflow state will be zombie before run 1 reaches completion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, where do we keep this state? In DB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all persisted to database.

a workflow can potentially turn from a zombie workflow to current running workflow (in this data center), if say there is a data center which only has run 1 (no run 2) & the run 1 is running with a higher version.



### Workflow Task Processing

In the context of Cadence NDC, workflow mutable state is an entity which tracks all pending tasks. Prior to the introduction of Cadence NDC, workflow history events are from a signle branch, and Cadence server will only append new events to workflow history.

After the introduction of Cadence NDC, it is possible that a workflow can have multiple workflow history branches. Tasks generated according to one history branch maybe invalidated by history branch switching during conflict resolution.

Example:

T = 0: task A is generated according to event ID: 4, version: 2
```
| -------- | ------------- |
| Events |
| -------- | ------------- |
| Event ID | Event Version |
| -------- | ------------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| -------- | ------------- |
|
|
| -------- | ------------- |
| Event ID | Event Version |
| -------- | ------------- |
| 4 | 2 | <-- task A belongs to this event
| -------- | ------------- |
```

T = 1: conflict resolution happens, workflow mutable state is rebuilt and history event ID: 4, version: 3 is written down to persistence
```
| -------- | ------------- |
| Events |
| -------- | ------------- |
| Event ID | Event Version |
| -------- | ------------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| -------- | ------------- |
|
| ------------- | -------------------------------------------- |
| |
| -------- | ------------- | | -------- | ------------- |
| Event ID | Event Version | | Event ID | Event Version |
| -------- | ------------- | | -------- | ------------- |
| 4 | 2 | <-- task A belongs to this event | 4 | 3 | <-- current branch / mutable state
| -------- | ------------- | | -------- | ------------- |
```

T = 2: task A is loaded.

At this time, due to the rebuilt of workflow mutable state (conflict resolution), task A is no longer relevant (task A's corresponding event belongs to non-current branch). Task processing logic will verify both the event ID and version of the task against corresponding workflow mutable state, then discard task A.