feat(airbyte-cdk): Add Global Parent State Cursor #39593

Merged
merged 33 commits into from
Sep 6, 2024
Changes from 1 commit
Commits
60d8c79
Add GlobalParentCursor
tolik0 Jun 19, 2024
9467198
Move `global_parent_cursor` to incremental sync
tolik0 Jun 19, 2024
2a5d5e5
Move last slice flag to StreamSlice
tolik0 Jun 19, 2024
f6fbd1d
Fix format
tolik0 Jun 19, 2024
d72b817
Fix docs
tolik0 Jun 19, 2024
af78dda
Add Slack changes
tolik0 Jul 9, 2024
f4f5b78
Small fix for Slack state migration
tolik0 Jul 9, 2024
72fc51f
Add Jira changes
tolik0 Jul 11, 2024
4f4461b
Add local filtering for Global Parent cursor
tolik0 Jul 11, 2024
2ff6f0e
Fix formatting
tolik0 Jul 12, 2024
8b3cf0b
Fix description
tolik0 Jul 12, 2024
bcd0645
Fix warnings
tolik0 Jul 16, 2024
5f15d19
Rename class and update the docs
tolik0 Jul 31, 2024
1d90e13
Fix mypy errors
tolik0 Jul 31, 2024
de255ce
Update docs
tolik0 Aug 1, 2024
85fe4cc
Add unit tests
tolik0 Aug 1, 2024
dd42be7
Delete connector changes
tolik0 Aug 1, 2024
b536e2c
Fix format
tolik0 Aug 1, 2024
23ea1ce
Delete Slack changes
tolik0 Aug 1, 2024
596b9c2
Add docs and fix small errors
tolik0 Aug 2, 2024
9f2d882
Update docs
tolik0 Aug 6, 2024
cdc0d6d
Add lookback window
tolik0 Aug 14, 2024
c2491ec
Update the docs with the lookback window
tolik0 Aug 14, 2024
6883e9b
Update incremental sync docs
tolik0 Aug 15, 2024
a2b210d
Update field description
tolik0 Aug 15, 2024
4cde7f0
Update class doc for GlobalSubstreamCursor
tolik0 Aug 15, 2024
1b60b3e
Refactor for concurrent CDK compatibility
tolik0 Aug 19, 2024
70bc219
Update docstring for stream_slices
tolik0 Aug 19, 2024
eed4423
Add comment with sequence for stream slices
tolik0 Aug 20, 2024
933af1e
Delete wrong change
tolik0 Aug 20, 2024
e4d5172
Fix Timer
tolik0 Sep 6, 2024
015114c
Fix tests
tolik0 Sep 6, 2024
95cd310
Fix formatting
tolik0 Sep 6, 2024
Update incremental sync docs
tolik0 committed Sep 6, 2024
commit 6883e9ba0d112b52cc2c40918f8c49119d72f0ae
@@ -209,8 +209,9 @@ The default state format is **per partition**, but there are options to enhance
```

#### Global Substream Cursor
-- **Description**: This option uses a single global cursor for all partitions, significantly reducing the state size. The child state is updated only at the end of the sync, so progress depends on the parent stream state when using the incremental dependency option.
+- **Description**: This option uses a single global cursor for all partitions, significantly reducing the state size. It enforces a minimal lookback window for substream based on the duration of the previous sync to avoid losing records. This lookback ensures that any records added or updated during the sync are captured in subsequent syncs.
- **When to Use**: Use this option if the number of partitions in the parent stream is significantly higher than the 10,000 partition limit (e.g., millions of records per sync). This prevents the inefficiency of reading most partitions in full refresh and avoids duplicates during the next sync.
+- **Operational Detail**: The global cursor's value is updated only at the end of the sync. If the sync fails, only the parent state is updated if the incremental dependency is enabled.
- **Example State**:
```json
[
@@ -221,7 +222,7 @@ The default state format is **per partition**, but there are options to enhance
### Summary
- **Per Partition**: Default, use for manageable partitions (<10k).
- **Incremental Dependency**: Use for incremental parent streams with a dependent child cursor. Ensure API updates parent cursor with child records.
-- **Global Substream Cursor**: Use for large-scale parent streams with many partitions.
+- **Global Substream Cursor**: Ideal for large-scale parent streams with many partitions to optimize performance.

Choose the option that best fits your data structure and sync requirements to optimize performance and data integrity.
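
The lookback behaviour described for the Global Substream Cursor can be illustrated with a small sketch. This is not the airbyte-cdk implementation or its API: the class `GlobalCursorSketch` and its methods are hypothetical names, and advancing the cursor to the latest observed record timestamp is an assumption made for illustration. The sketch only shows the two points the docs state: the single cursor is updated once, at the end of the sync, and the next sync starts earlier than the cursor by the duration of the previous sync.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class GlobalCursorSketch:
    """Hypothetical illustration; not the airbyte-cdk GlobalSubstreamCursor API."""

    cursor_value: Optional[datetime] = None  # one cursor shared by all partitions
    lookback: timedelta = timedelta(0)       # duration of the previous sync

    def start_for_next_sync(self) -> Optional[datetime]:
        # Re-read a window as long as the previous sync, so records added or
        # updated while that sync was running are picked up.
        if self.cursor_value is None:
            return None  # no state yet: read everything
        return self.cursor_value - self.lookback

    def close_sync(self, observed: Iterable[datetime], sync_duration: timedelta) -> None:
        # Called once, after every partition has been read.
        # Assumption: the cursor moves to the latest record timestamp seen.
        latest = max(observed, default=None)
        if latest is not None and (self.cursor_value is None or latest > self.cursor_value):
            self.cursor_value = latest
        self.lookback = sync_duration


cursor = GlobalCursorSketch()
cursor.close_sync(
    [datetime(2024, 8, 15, 12, 0), datetime(2024, 8, 15, 11, 0)],
    sync_duration=timedelta(minutes=30),
)
print(cursor.start_for_next_sync())  # 2024-08-15 11:30:00
```

Because the state holds one cursor value and one lookback instead of an entry per partition, its size stays constant no matter how many parent records exist. The cost is that records inside the lookback window are read again on the next sync.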
