DOC-5348 RDI: info about at-least-once delivery and checkpointing #1724

Open: wants to merge 2 commits into base: main
27 changes: 25 additions & 2 deletions content/integrate/redis-data-integration/architecture.md
@@ -42,7 +42,8 @@ in sequence:
currently uses an open source collector called
[Debezium](https://debezium.io/) for this step.

1. The collector records the captured changes using Redis streams
1. The collector records the captured changes using
[Redis streams]({{< relref "/develop/data-types/streams" >}})
in the RDI database.

1. A *stream processor* reads data from the streams and applies
@@ -68,6 +69,28 @@ RDI automatically enters a second phase called *change streaming*, where
changes in the data are captured as they happen. Changes are usually
added to the target within a few seconds after capture.

## At-least-once delivery guarantee

RDI guarantees *at-least-once delivery* to the target. This means that
a given change will never be lost, but it might be added to the target
more than once. Apart from a slight performance overhead, adding a
change multiple times is harmless because the multiple writes
are [*idempotent*](https://en.wikipedia.org/wiki/Idempotence) (that is
to say that all writes after the first one make no change to the
overall state).
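
As an illustration of the idempotence claim in the paragraph above, here is a minimal Python sketch. It is not RDI code: the `apply_change` helper, the event layout, and the use of a plain dictionary as the "target" are all assumptions made for the example. It only shows why redelivering a change that writes the full state of a key leaves the target unchanged.

```python
def apply_change(target: dict, event: dict) -> None:
    """Apply one change event to the target (hypothetical helper, not RDI's API).

    The write replaces the whole value stored under the key, so applying
    the same event again produces exactly the same target state.
    """
    if event["op"] == "delete":
        # Deleting an already-deleted key is a no-op.
        target.pop(event["key"], None)
    else:
        target[event["key"]] = dict(event["value"])


target = {}
event = {"op": "upsert", "key": "customer:1", "value": {"name": "Ada"}}

apply_change(target, event)
state_after_first = {k: dict(v) for k, v in target.items()}

# Simulate at-least-once delivery: the same change arrives a second time.
apply_change(target, event)
state_after_second = {k: dict(v) for k, v in target.items()}

print(state_after_first == state_after_second)
```

By contrast, a write such as "increment a counter by one" would not be idempotent, because each redelivery would change the result.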
Contributor

I'd change this sentence to:

"Apart from a slight performance overhead, adding a change multiple times is harmless because all writes after the first one make no change to the overall state."


## Checkpointing

RDI uses Redis streams to store the sequence of change events
captured from the source. The events are then retrieved in order
from the streams, processed, and written to the target. The stream
processor uses a *checkpoint* mechanism to keep track of the last
event in the sequence that it has successfully processed and stored. If the processor fails
for any reason, it can restart from the last checkpoint and
re-process any events that might not have been written to the target.
This ensures that all changes are eventually recorded, even in the
face of failures.
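
The restart behavior described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not RDI's implementation: the stream is a plain list, the target is a dictionary, and `run_processor`, `SimulatedCrash`, and the `last_processed` checkpoint field are hypothetical names. The sketch assumes the checkpoint is advanced only after the write to the target succeeds, which is what makes a duplicate write possible after a crash.

```python
class SimulatedCrash(Exception):
    """Stands in for any failure of the stream processor."""


def run_processor(events, target, checkpoint, writes, crash_after_write_at=None):
    """Process events in order, starting just after the last checkpoint."""
    start = checkpoint["last_processed"] + 1
    for position in range(start, len(events)):
        key, value = events[position]
        target[key] = value      # 1. write the change to the target
        writes.append(key)       #    (record the write for inspection)
        if position == crash_after_write_at:
            # Fail after the write but before the checkpoint is updated.
            raise SimulatedCrash(f"crashed at position {position}")
        checkpoint["last_processed"] = position  # 2. advance the checkpoint


events = [("user:1", "Ada"), ("user:2", "Grace"), ("user:3", "Edsger")]
target, writes = {}, []
checkpoint = {"last_processed": -1}

try:
    run_processor(events, target, checkpoint, writes, crash_after_write_at=1)
except SimulatedCrash:
    pass

checkpoint_after_crash = checkpoint["last_processed"]

# Restart from the last checkpoint: position 1 is processed again.
run_processor(events, target, checkpoint, writes)

print(writes)
print(target)
```

In this model the event at position 1 is written twice, once before the crash and once after the restart, yet the final target holds each key exactly once. No event is skipped, which is the at-least-once behavior the checkpoint is meant to provide.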

## Backpressure mechanism

Sometimes, data records can get added to the streams faster than RDI can
@@ -85,7 +108,7 @@ an error, just an informative message to note that RDI has applied
the backpressure mechanism.
{{</note>}}

### Supported sources
## Supported sources

RDI supports the following database sources using [Debezium Server](https://debezium.io/documentation/reference/stable/operations/debezium-server.html) connectors:
