Skip to content

Conversation

avelanarius
Copy link
Contributor

Changes the strategy the CDC library uses to generate windows to read the CDC log. Previously, when a library was started, it would read queryTimeWindowSize large windows starting from |now - ttl|. The issue with that approach is that it requires a large number of windows to catch up, for example: 24 hours / 30 seconds * 256 * 8 = 5898240 for a 8-shard Scylla server.

A new approach just creates a single large window from |now - ttl| to |now - confidence window|.

public static TaskState createInitialFor(GenerationId generation, Timestamp now,
long confidenceWindowSizeMs, long queryTimeWindowSizeMs) {
// Start reading at generation start:
Timestamp windowStart = generation.getGenerationStart();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the conclusion of your tests @avelanarius? Does it make sense to take max(generation.getGenerationStart(), now - CDC ttl)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it made sense. The test was the following: create a table with small TTL (3 seconds), insert 10^6 rows, then do two types of selects: select of small window and select of a large window. Both of those queries returned no rows (3 second TTL...), but a select of small window was faster. Sometimes, select of a large window even caused timeouts (even though no rows were returned). We hypothesized that this is due to replicas sending "tombstones" to coordinator.

// queryTimeWindowSizeMs large (the consumer might need to wait a bit
// for the window to be ready for reading).
Timestamp windowEnd = now.plus(-confidenceWindowSizeMs, ChronoUnit.MILLIS);
if (windowEnd.compareTo(windowStart) < 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this enough? Wouldn't we want to adjust windowEnd also when windowEnd - windowStart < queryTimeWindowSizeMs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but this is only an initial window so it doesn't really matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could add this at the cost of additional complexity.


return this;
// Trim the start of the window with minimumWindowStart.
Timestamp newWindowStart = windowStart;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wasn't this needed before and now is? Or was it and it was a bug we weren't trimming the start?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously it only handled one case:

---- WINDOW ---     | TTL

->

                    ---- WINDOW ---

Now it also handles this case:

----      WINDOW           ---     
                    | TTL
->

                    - WINDOW -     
                    | TTL

Previously it didn't really matter to do the second type of trimming, because the windows were very small (default 30 seconds).

Copy link
Contributor

@haaawk haaawk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general

Changes the strategy the CDC library uses to generate windows to read
the CDC log. Previously, when a library was started, it would read
queryTimeWindowSize large windows starting from |now - ttl|. The issue
with that approach is that it requires a large number of windows to
catch up, for example: 24 hours / 30 seconds * 256 * 8 = 5898240 for
a 8-shard Scylla server.

A new approach just creates a single large window from |now - ttl| to
|now - confidence window|.
@avelanarius
Copy link
Contributor Author

v2: fixed flaky unit test. (off-by-one error)

@racevedoo
Copy link

Any news here? I'm pretty interested in this PR

@dkropachev dkropachev force-pushed the master branch 13 times, most recently from 3c4bf55 to 3318b32 Compare June 23, 2025 17:23
@dkropachev dkropachev force-pushed the master branch 8 times, most recently from 5031da9 to b5730b8 Compare August 2, 2025 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants