Skip to content

fix: ensure most_recent_cursor_value is deserialized #436

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 21, 2025

Conversation

maxi297
Copy link
Contributor

@maxi297 maxi297 commented Mar 20, 2025

Following issue on source-salesforce: https://airbytehq-team.slack.com/archives/C08JX410FJM

The issue was that we re-use the input dictionary for the state but most_recent_cursor_value was not converted to a datetime which mean that when we were trying to serialize it, it would fail because of a typing issue (this field was still a string and not a datetime and hence would cause 'str' object has no attribute 'year' error).

In order to fix that, we just have to properly deserialize most_recent_cursor_value. This wasn't an issue for low-code connectors because they have sequential state. source-salesforce has a partitioned state which is different.

Summary by CodeRabbit

  • Refactor
    • Enhanced state handling to accurately process and convert recent record data during state updates.
  • Tests
    • Expanded test coverage to ensure consistent initialization and serialization of state, improving overall reliability.

@maxi297 maxi297 requested review from bazarnov and tolik0 March 20, 2025 20:26
@maxi297 maxi297 changed the title Ensure most_recent_cursor_value is deserialized fix: ensure most_recent_cursor_value is deserialized Mar 20, 2025
@maxi297
Copy link
Contributor Author

maxi297 commented Mar 20, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

@github-actions github-actions bot added bug Something isn't working security labels Mar 20, 2025
Copy link
Contributor

coderabbitai bot commented Mar 20, 2025

📝 Walkthrough

Walkthrough

This pull request enhances the state deserialization process in the stream state converter by adding a conditional check in the deserialize method to detect and transform the MOST_RECENT_RECORD_KEY using the _from_state_message method. Additionally, the related unit tests in the cursor test file have been updated to use a deep copy of the default state and a new test has been added to verify that the cursor state serialization correctly reflects the most recent cursor value. No changes were made to the public API declarations.

Changes

File(s) Change Summary
airbyte_cdk/.../abstract_stream_state_converter.py Added a conditional branch in deserialize to check for MOST_RECENT_RECORD_KEY and process its value with _from_state_message.
unit_tests/.../test_cursor.py Replaced empty dictionary initializations with a deep copy of _NO_STATE in multiple methods; added a new test to verify proper serialization of the most recent cursor value.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant ASC as AbstractStreamStateConverter
    participant F as _from_state_message

    C->>ASC: deserialize(state)
    ASC->>ASC: Check stream_slice for keys
    alt Contains MOST_RECENT_RECORD_KEY
         ASC->>F: Convert MOST_RECENT_RECORD_KEY value
         F-->>ASC: Return converted value
         ASC->>ASC: Update state with converted value
    else No MOST_RECENT_RECORD_KEY
         ASC->>ASC: Process other keys as usual
    end
    ASC-->>C: Return updated state
Loading

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • aaronsteers (wdyt?)
  • brianjlai (wdyt?)

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc7c98c and 0b2620d.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/streams/concurrent/state_converters/abstract_stream_state_converter.py (1 hunks)
  • unit_tests/sources/streams/concurrent/test_cursor.py (5 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
airbyte_cdk/sources/streams/concurrent/state_converters/abstract_stream_state_converter.py (2)
airbyte_cdk/sources/streams/concurrent/state_converters/incrementing_count_stream_state_converter.py (1) (1)
  • _from_state_message (15-16)
airbyte_cdk/sources/streams/concurrent/state_converters/datetime_stream_state_converter.py (1) (1)
  • _from_state_message (21-22)
unit_tests/sources/streams/concurrent/test_cursor.py (5)
airbyte_cdk/sources/streams/concurrent/cursor.py (4) (4)
  • ConcurrentCursor (127-481)
  • close_partition (63-67)
  • close_partition (110-111)
  • close_partition (218-226)
airbyte_cdk/sources/streams/concurrent/state_converters/abstract_stream_state_converter.py (1) (1)
  • ConcurrencyCompatibleStateType (13-15)
airbyte_cdk/sources/streams/concurrent/state_converters/datetime_stream_state_converter.py (2) (2)
  • EpochValueConcurrentStreamStateConverter (114-142)
  • get_end_provider (36-37)
airbyte_cdk/sources/types.py (3) (3)
  • StreamSlice (66-160)
  • partition (90-95)
  • cursor_slice (98-103)
airbyte_cdk/sources/connector_state_manager.py (1) (1)
  • update_state_for_stream (69-79)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
🔇 Additional comments (4)
airbyte_cdk/sources/streams/concurrent/state_converters/abstract_stream_state_converter.py (1)

74-77: Great addition to ensure proper deserialization of most_recent_cursor_value!

This change adds the missing deserialization logic for the MOST_RECENT_RECORD_KEY which nicely complements the existing serialization logic (lines 92-95). The code follows the same pattern used for deserializing START_KEY and END_KEY, ensuring consistency in the state handling approach.

This directly addresses the issue mentioned in the PR objectives where most_recent_cursor_value wasn't being converted to a datetime format, which was causing serialization errors when trying to access the year attribute on a string.

unit_tests/sources/streams/concurrent/test_cursor.py (3)

4-4: Good addition of deepcopy import to protect against state dictionary mutation.

This import enables the creation of independent copies of the state dictionaries, which prevents potential side effects from dictionary mutations during testing.


87-87: Smart use of deepcopy to avoid state sharing between tests.

Using deepcopy(_NO_STATE) instead of directly passing _NO_STATE prevents shared state between test cases that could lead to unexpected test interactions or failures, especially if _NO_STATE ever gets modified.

Also applies to: 102-102, 268-268


953-1006: Excellent new test case that verifies the state serialization fix!

This test properly verifies that when a cursor is initialized with a state containing most_recent_cursor_value (line 966), and a partition is closed, the cursor correctly preserves this value in the state when emitting it (line 999).

The test directly validates the fix implemented in the abstract_stream_state_converter.py file, ensuring that the deserialization and subsequent serialization of the most_recent_cursor_value work correctly.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@maxi297 maxi297 merged commit 837913f into main Mar 21, 2025
26 checks passed
@maxi297 maxi297 deleted the maxi297/fix-most-recent-cursor-value-parsing-issue branch March 21, 2025 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants