Skip to content

fix(connector-builder): fix property chunking in connector builder #567

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 2, 2025

Conversation

aldogonzalez8
Copy link
Contributor

@aldogonzalez8 aldogonzalez8 commented May 26, 2025

WHAT

SimpleRetrieverTestReadDecorator when self._emit_connector_builder_message is True is missing additional_query_properties param, causing:

QueryProperties component is defined but stream_partition does not contain query_properties. Please contact Airbyte Support

image

Relates to https://github.com/airbytehq/airbyte-internal-issues/issues/12775

HOW

We have replaced Retriever decorator by stream_slicer decorator in order to not have to maintain properties in two retrievers.

Summary by CodeRabbit

  • Tests
    • Added new tests to validate property chunking in declarative source connectors, ensuring correct handling of chunked HTTP requests and responses.
    • Introduced sample response files for testing currency exchange rate data with property chunking.
    • Updated tests to reflect the new stream slicer decorator usage for limiting slices during test reads.
    • Added tests verifying the behavior and type recognition of the new stream slicer decorator.
  • Bug Fixes
    • Improved argument handling for internal components to ensure correct parameter passing during test read operations.
  • Refactor
    • Simplified test read mode by removing slice-limiting retriever decorators and adopting a streamlined stream slicer decorator, centralizing slice limiting logic.
    • Enhanced type checking for stream slicers to recognize decorated instances transparently.
  • New Features
    • Exposed a new stream slicer decorator to limit the number of slices during test reads, enhancing control over test data volume and debugging.

@aldogonzalez8 aldogonzalez8 self-assigned this May 26, 2025
@github-actions github-actions bot added the bug Something isn't working label May 26, 2025
@aldogonzalez8 aldogonzalez8 changed the title fix(connector-builder): property chinking in connector builder fix(connector-builder): fix property chunking in connector builder May 26, 2025
Copy link
Contributor

coderabbitai bot commented May 26, 2025

📝 Walkthrough

Walkthrough

This change refactors argument passing in the Airbyte connector builder's declarative source and retriever instantiation for clarity by using explicit keyword arguments. It removes the SimpleRetrieverTestReadDecorator in favor of a new StreamSlicerTestReadDecorator to limit stream slices during test reads. Additionally, it adds a new test validating property chunking with supporting mock HTTP response files. It also introduces a metaclass for StreamSlicer to support isinstance checks on wrapped slicers.

Changes

File(s) Change Summary
airbyte_cdk/sources/declarative/manifest_declarative_source.py Changed ModelToComponentFactory instantiation to use explicit keyword argument emit_connector_builder_messages.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py Refactored retriever creation to remove SimpleRetrieverTestReadDecorator; introduced _should_limit_slices_fetched() and wrap slicer with StreamSlicerTestReadDecorator when limiting slices. Added _get_log_formatter helper.
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py Removed SimpleRetrieverTestReadDecorator class entirely.
airbyte_cdk/sources/declarative/retrievers/init.py Removed import and export of SimpleRetrieverTestReadDecorator.
airbyte_cdk/sources/declarative/stream_slicers/init.py Added import and export of new StreamSlicerTestReadDecorator.
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py Added new StreamSlicerTestReadDecorator class to limit the number of stream slices returned by a wrapped slicer.
airbyte_cdk/sources/streams/concurrent/partitions/stream_slicer.py Added StreamSlicerMeta metaclass to support isinstance checks on wrapped slicers; updated StreamSlicer to use this metaclass.
unit_tests/connector_builder/test_property_chunking.py Added new test validating property chunking with chunked HTTP requests and incremental sync using mocked responses.
unit_tests/resource/http/response/declarative/property_chunking/rates_one_two.json Added mock HTTP response file for chunk one/two in property chunking test.
unit_tests/resource/http/response/declarative/property_chunking/rates_three_four.json Added mock HTTP response file for chunk three/four in property chunking test.
unit_tests/sources/declarative/retrievers/test_simple_retriever.py Removed SimpleRetrieverTestReadDecorator usage and tests; replaced with StreamSlicerTestReadDecorator wrapping the stream slicer. Removed test for log message emission.
unit_tests/connector_builder/test_connector_builder_handler.py Removed assertions for SimpleRetrieverTestReadDecorator instances; replaced with assertions for SimpleRetriever.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py Updated tests to expect StreamSlicerTestReadDecorator wrapping stream slicers instead of direct slicers; replaced SimpleRetrieverTestReadDecorator with SimpleRetriever.
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py Added new tests for StreamSlicerTestReadDecorator verifying instance recognition and wrapping behavior.

Sequence Diagram(s)

sequenceDiagram
    participant TestRunner
    participant ConnectorBuilder
    participant HTTPMock
    participant ManifestDeclarativeSource
    participant ModelToComponentFactory
    participant StreamSlicerTestReadDecorator
    participant SimpleRetriever

    TestRunner->>ConnectorBuilder: handle_connector_builder_request(test_read, manifest, catalog, state)
    ConnectorBuilder->>ManifestDeclarativeSource: Instantiate with manifest
    ManifestDeclarativeSource->>ModelToComponentFactory: Instantiate with emit_connector_builder_messages (keyword arg)
    ModelToComponentFactory->>StreamSlicerTestReadDecorator: Wrap stream slicer if limiting slices
    ModelToComponentFactory->>SimpleRetriever: create_simple_retriever with wrapped slicer
    SimpleRetriever->>HTTPMock: Send chunked HTTP requests (rates_one_two.json, rates_three_four.json)
    HTTPMock-->>SimpleRetriever: Return mocked JSON responses
    SimpleRetriever-->>ConnectorBuilder: Return chunked records
    ConnectorBuilder-->>TestRunner: Return test read output
Loading

Possibly related PRs

  • airbytehq/airbyte-python-cdk#452: Enhances ModelToComponentFactory with new low-code features including QueryProperties and property chunking, related to factory instantiation and usage.
  • airbytehq/airbyte-python-cdk#201: Improves nested field handling in RequestOption and modifies ModelToComponentFactory methods, related to request options and factory usage.
  • airbytehq/airbyte-python-cdk#534: Adds log_formatter to create_simple_retriever and uses SimpleRetrieverTestReadDecorator; related to retriever decoration and logging behavior.

Suggested reviewers

  • maxi297
  • brianjlai

By the way, would you be interested in adding tests for edge cases in property chunking, like empty property lists or chunk sizes larger than the list, to further strengthen coverage? Wdyt?


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8aba62e and fa5e35f.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
unit_tests/resource/http/response/declarative/property_chunking/rates_three_four.json (1)

1-10: Consider adding a trailing newline at end-of-file for consistency with other fixtures. wdyt?

unit_tests/resource/http/response/declarative/property_chunking/rates_one_two.json (1)

1-10: Could we include an extra currency entry to simulate multi-currency scenarios, or is it preferable to keep the test payload minimal? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 155cdc8 and 925dc88.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
  • unit_tests/connector_builder/test_property_chunking.py (1 hunks)
  • unit_tests/resource/http/response/declarative/property_chunking/rates_one_two.json (1 hunks)
  • unit_tests/resource/http/response/declarative/property_chunking/rates_three_four.json (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (7)
unit_tests/resource/http/response/declarative/property_chunking/rates_three_four.json (1)

1-10: Approve addition of JSON mock for property chunking. This fixture correctly mirrors the structure of rates_one_two.json for the second chunk and provides all expected fields for the test.

unit_tests/resource/http/response/declarative/property_chunking/rates_one_two.json (1)

1-10: Approve addition of JSON mock for property chunking. The payload follows the established schema and sets up the first chunk response accurately.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

3315-3315: LGTM! Great fix for the missing parameter issue!

This change perfectly addresses the issue described in the PR objectives. The additional_query_properties=query_properties parameter was indeed missing from the SimpleRetrieverTestReadDecorator constructor, while it was already being passed to the regular SimpleRetriever constructor at line 3330. This ensures consistent behavior between both constructors and should resolve the "QueryProperties component is defined but stream_partition does not contain query_properties" error when self._emit_connector_builder_messages is True.

The fix is targeted, minimal, and maintains the existing logic flow. Nice work on identifying and fixing this inconsistency!

airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)

131-131: LGTM! Nice improvement to parameter clarity.

This change to explicitly pass emit_connector_builder_messages as a keyword argument is a good practice. It makes the code more readable and ensures the parameter is properly passed to the ModelToComponentFactory constructor, which should help resolve the property chunking issue mentioned in the PR. The explicit naming prevents potential parameter ordering issues, wdyt?

unit_tests/connector_builder/test_property_chunking.py (3)

34-69: Well-structured manifest configuration for property chunking testing.

The manifest setup looks comprehensive and correctly defines property chunking with a limit of 2 items per chunk. The QueryProperties configuration with PropertyChunking should properly test the feature that was fixed in this PR. Nice work on defining a realistic test scenario, wdyt?


197-243: Excellent end-to-end test coverage for property chunking.

This test effectively validates the property chunking functionality by:

  • Freezing time for consistent test execution
  • Mocking the expected chunked HTTP requests with URL-encoded property lists
  • Verifying exactly 2 pages are generated (matching the 2 chunks)
  • Ensuring no error logs are present

The assertions properly validate that the property chunking works as expected and doesn't generate the error mentioned in the PR objectives. The URL encoding check for comma-separated values (%2C) shows good attention to detail, wdyt?


208-210: Good practice using the flag that was fixed in the PR.

Setting emit_connector_builder_messages=True here directly tests the scenario that was causing the original issue. This ensures the fix for the missing additional_query_properties parameter is properly validated, wdyt?

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment that if you think is worth, maybe we could tackle now. I'm still approving because this fix improves things

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)

654-654: Consider clarifying the docstring wording.

The current wording suggests maximum_number_of_slices is required when test read is enabled, but the implementation actually makes it optional (defaulting to 0). Maybe update to something like "maximum_number_of_slices should be provided when slice limiting is desired in test read mode" - wdyt?


660-680: Clean implementation of conditional test read setup!

The logic nicely separates concerns - log formatter setup only happens in test read mode, and validation is appropriately conditional. One small optimization you might consider: the validation could check self.maximum_number_of_slices <= 0 instead of < 1 for slightly clearer intent, but both work perfectly fine. wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 925dc88 and d0e7c7d.

📒 Files selected for processing (3)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3 hunks)
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1 hunks)
  • unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (5)
airbyte_cdk/sources/http_logger.py (1)
  • format_http_message (12-51)
airbyte_cdk/sources/declarative/declarative_stream.py (3)
  • name (93-97)
  • name (100-102)
  • stream_slices (172-187)
airbyte_cdk/sources/streams/core.py (2)
  • name (139-143)
  • stream_slices (403-418)
airbyte_cdk/sources/declarative/retrievers/retriever.py (1)
  • stream_slices (33-34)
airbyte_cdk/sources/types.py (1)
  • StreamSlice (75-169)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)
🔇 Additional comments (5)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)

918-918: LGTM! Test parameter aligns with the new decorator behavior.

This addition correctly configures the test to enable test read mode, which is now required for the log formatter to be set up properly. Setting maximum_number_of_slices=1 is perfect for this test case since it's focused on log emission rather than slice processing.

airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (4)

657-658: Verify the default value change impact.

Changing the default maximum_number_of_slices from 5 to 0 effectively disables test read mode by default, which seems like the right approach for making the behavior more explicit. However, this could be a breaking change if existing code relied on the previous default behavior. Have you confirmed this won't affect existing implementations that didn't explicitly set this parameter?

The new emit_connector_builder_messages attribute nicely provides an alternative way to enable test read mode specifically for connector builder scenarios.


681-686: Excellent abstraction for test read mode detection!

This method cleanly encapsulates the logic and makes the code much more readable. The boolean conversion with bool() handles both conditions elegantly. The docstring clearly explains the purpose too.


690-693: Perfect conditional slice limiting logic!

The implementation correctly handles both scenarios - full slice iteration when test read is disabled, and limited slices when enabled. Using islice is the right approach here, and delegating to the parent when test read is disabled maintains clean separation of concerns.


657-693: Verify connection to PR objectives about query properties.

The changes look solid for improving test read mode control, but I noticed the PR objectives mention fixing a missing additional_query_properties parameter when emit_connector_builder_message is True. While I can see the new emit_connector_builder_messages attribute, I don't see the direct fix for the query properties issue in this file. Could you clarify how these changes connect to resolving that specific error message? Maybe the fix is in another file not shown in this review?

@aldogonzalez8 aldogonzalez8 requested a review from maxi297 May 27, 2025 19:12
…ecorator for scenarios wehere we want to limit the number of requests that are made to the backend source.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)

705-715: Nice architectural improvement - decorator moved to the correct abstraction level.

The test correctly adapts to the new pattern where the StreamSlicerTestReadDecorator wraps the stream slicer instead of the entire retriever. This is a cleaner approach since slice limiting is naturally a slicer concern rather than a retriever concern. The test functionality remains the same while the architecture is improved.

One quick verification - would it be worth adding a comment in the test to clarify that we're now limiting slices at the slicer level rather than retriever level? This could help future developers understand the architectural decision, wdyt?

airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (2)

14-22: Clean decorator pattern implementation!

The class structure looks great - proper inheritance from StreamSlicer and good use of @dataclass. The default of 5 slices seems reasonable for test scenarios.

Could we enhance the docstring to mention this is specifically for test reads? Something like "This decorator is used during test reads to prevent excessive API calls by limiting the number of stream slices processed." - wdyt?


27-77: Consistent delegation pattern implemented well!

All four delegate methods follow the same pattern correctly - keyword-only arguments, proper type hints, and clean delegation. The explicit implementation makes the interface clear.

I notice there's some repetition across these methods. Have you considered if there's a clean way to reduce the duplication while maintaining readability? Maybe a helper method or using __getattr__ more broadly? - wdyt? Though the current explicit approach is definitely clear and maintainable.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0e7c7d and 335981e.

📒 Files selected for processing (8)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5 hunks)
  • airbyte_cdk/sources/declarative/retrievers/__init__.py (0 hunks)
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (0 hunks)
  • airbyte_cdk/sources/declarative/stream_slicers/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (1 hunks)
  • unit_tests/connector_builder/test_connector_builder_handler.py (2 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (6 hunks)
  • unit_tests/sources/declarative/retrievers/test_simple_retriever.py (2 hunks)
💤 Files with no reviewable changes (2)
  • airbyte_cdk/sources/declarative/retrievers/init.py
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
✅ Files skipped from review due to trivial changes (1)
  • airbyte_cdk/sources/declarative/stream_slicers/init.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
unit_tests/connector_builder/test_connector_builder_handler.py (2)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
  • SimpleRetriever (54-625)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
  • SimpleRetriever (2700-2754)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (2)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (5)
  • SimpleRetriever (54-625)
  • name (107-115)
  • name (118-120)
  • primary_key (291-293)
  • primary_key (296-298)
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (1)
  • StreamSlicerTestReadDecorator (15-81)
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (3)
airbyte_cdk/sources/types.py (1)
  • StreamSlice (75-169)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
  • stream_slices (584-593)
airbyte_cdk/sources/declarative/async_job/job_orchestrator.py (1)
  • stream_slice (88-89)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (12)
unit_tests/connector_builder/test_connector_builder_handler.py (2)

1114-1114: LGTM! Test assertion correctly updated for the new architecture.

This change properly reflects the refactoring where SimpleRetrieverTestReadDecorator was removed in favor of StreamSlicerTestReadDecorator. The assertion now correctly checks for the underlying SimpleRetriever type, which makes sense since test limiting functionality has moved to the stream slicer level. Wdyt?


1160-1160: Consistent and correct assertion update.

This mirrors the previous change and maintains consistency across both test functions that verify retriever types. The assertion correctly checks for SimpleRetriever now that the test decorator approach has been refactored. Nice consistency!

unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)

43-44: LGTM! Clean refactoring of the import statement.

The import change reflects the architectural improvement where slice limiting functionality has been moved from the retriever level to the stream slicer level, which makes more sense from a separation of concerns perspective.

unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (6)

154-159: Import updates look good for the decorator refactor

The import changes correctly reflect the architectural shift from SimpleRetrieverTestReadDecorator to StreamSlicerTestReadDecorator. This aligns well with moving the decoration logic from the retriever level to the stream slicer level.


715-722: Test setup for request/response mocking looks appropriate

The addition of PreparedRequest and Response objects with proper setup (headers, URL, status code) provides good test isolation for the logging functionality. The mock objects have the essential attributes needed for the test.


734-739: Assertions correctly verify the new architecture

The test now properly checks that:

  1. The retriever is still a SimpleRetriever (not decorated)
  2. The log_formatter is properly set
  3. The formatter behavior matches the factory's implementation

This is consistent with the architectural change where decoration happens at the stream slicer level. Nice work on maintaining the logging functionality verification!


3089-3091: Stream slicer assertion updated correctly for datetime-based cursor

The assertion now correctly checks that the stream_slicer is wrapped with StreamSlicerTestReadDecorator and contains the expected DatetimeBasedCursor. This reflects the architectural change where test decoration happens at the slicer level rather than the retriever level.


3178-3180: Consistent assertion pattern for partition cursor

The test follows the same pattern as the datetime-based cursor test, verifying that the PerPartitionCursor is properly wrapped with StreamSlicerTestReadDecorator. The consistency across different cursor types is good to see!


3220-3222: Default behavior assertion updated appropriately

Even for the default case with SinglePartitionRouter, the test correctly verifies the new decorator pattern. This ensures that the test decoration is applied consistently across all stream slicer types, which is exactly what we'd want.

airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (3)

1-12: LGTM on imports and setup!

The imports are well-organized and all necessary for the decorator implementation. Good use of islice from itertools for the slice limiting functionality.


24-25: Perfect use of islice for slice limiting!

This is exactly the right approach - using islice keeps the implementation lazy and memory-efficient. The core functionality is clean and focused.


79-81: Excellent fallback delegation!

The __getattr__ method provides perfect transparency for the decorator - any future methods or attributes added to the StreamSlicer interface will automatically work through this decorator. Great forward-compatibility thinking!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (1)

46-71: ⚠️ Potential issue

Good test for metaclass behavior, but there are a couple of issues to address.

The test correctly validates that the metaclass enables isinstance checks for wrapped slicers. However:

  1. Line 48 issue: mocked_partition_router().stream_slices.return_value = [...] creates a mock but doesn't assign it to anything. This should probably be assigned to a variable that's then used in the GlobalSubstreamCursor constructor.

  2. Missing functionality test: While the isinstance checks are important, it would be valuable to also test that the StreamSlicerTestReadDecorator actually limits the number of slices as intended. Currently we only test the type checking behavior, wdyt?

Here's a potential fix for the partition router issue:

-    mocked_partition_router().stream_slices.return_value = [
+    partition_router = mocked_partition_router()
+    partition_router.stream_slices.return_value = [
         StreamSlice(
             partition=first_partition, cursor_slice={}, extra_fields={"extra_field": "extra_value"}
         ),
     ]
 
-    global_cursor = GlobalSubstreamCursor(cursor, mocked_partition_router)
+    global_cursor = GlobalSubstreamCursor(cursor, partition_router)
🧹 Nitpick comments (1)
airbyte_cdk/sources/streams/concurrent/partitions/stream_slicer.py (1)

9-25: Nice metaclass implementation! This elegantly solves the isinstance check problem for wrapped slicers.

The implementation handles the decorator pattern transparently and falls back gracefully to standard behavior. The docstring clearly explains the motivation and use case.

One question - have you considered what happens if there's a chain of wrapped slicers (e.g., a decorator wrapping another decorator)? The recursive isinstance(instance.wrapped_slicer, cls) should handle this correctly, but it might be worth adding a test case for that scenario, wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca07d1a and c1435f5.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/streams/concurrent/partitions/stream_slicer.py (1 hunks)
  • unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/streams/concurrent/partitions/stream_slicer.py (1)
airbyte_cdk/sources/types.py (1)
  • StreamSlice (75-169)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (5)
airbyte_cdk/sources/streams/concurrent/partitions/stream_slicer.py (2)

3-4: LGTM! Import additions support the metaclass implementation.

The new imports are necessary for the custom metaclass implementation and type annotations. Good practice to import exactly what's needed.


27-27: Perfect! Minimal change to enable the metaclass behavior.

Adding the metaclass to StreamSlicer enables the custom isinstance checks for all StreamSlicer instances and subclasses, which is exactly what's needed for the decorator pattern.

unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (3)

5-17: Comprehensive imports for testing the metaclass behavior.

The imports cover the key classes needed to test isinstance behavior with different stream slicer types and the new decorator. Good coverage of the relevant types.


22-40: Clean builder pattern for test mocks.

The MockedCursorBuilder provides a fluent interface for setting up test data, which makes the tests more readable. The mock setup correctly implements the expected DeclarativeCursor interface.


42-43: Simple helper function, but there's an issue with its usage in the test.

The function itself is fine, but looking at line 48 in the test, it seems like mocked_partition_router().stream_slices.return_value = ... doesn't actually assign the mock to anything. Should this be assigned to a variable first, or is the mock being configured but not used? wdyt?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (4)

116-116: Fix typo in function name - "aysnc" should be "async", wdyt?

There's a small typo in the function name that should be corrected for clarity.

-def test_isinstance_global_cursor_aysnc_job_partition_router():
+def test_isinstance_global_cursor_async_job_partition_router():

144-144: Fix typo in function name - "substrea" should be "substream", wdyt?

Another small typo in the function name that should be corrected.

-def test_isinstance_substrea_partition_router():
+def test_isinstance_substream_partition_router():

86-224: Consider adding tests for the core slice-limiting functionality, wdyt?

While these isinstance tests are valuable for verifying the metaclass behavior, it might be worth adding tests that verify the decorator actually limits the number of slices as intended. The current tests focus on type preservation but don't validate the primary purpose of StreamSlicerTestReadDecorator.

You could add a test like:

def test_slice_limiting_functionality():
    # Create a slicer that returns many slices
    mock_slicer = Mock(spec=StreamSlicer)
    mock_slicer.stream_slices.return_value = [
        StreamSlice(partition={f"key_{i}": f"value_{i}"}, cursor_slice={})
        for i in range(10)
    ]
    
    # Wrap with decorator limiting to 3 slices
    wrapped_slicer = StreamSlicerTestReadDecorator(
        wrapped_slicer=mock_slicer,
        maximum_number_of_slices=3,
    )
    
    # Verify only 3 slices are returned
    slices = list(wrapped_slicer.stream_slices())
    assert len(slices) == 3

145-159: Consider extracting common SubstreamPartitionRouter setup to reduce duplication, wdyt?

The setup for SubstreamPartitionRouter is duplicated between two tests. You could extract this into a helper function to improve maintainability.

def create_substream_partition_router():
    return SubstreamPartitionRouter(
        config={},
        parameters={},
        parent_stream_configs=[
            ParentStreamConfig(
                type="ParentStreamConfig",
                parent_key="id",
                partition_field="id",
                stream=DeclarativeStream(
                    type="DeclarativeStream",
                    retriever=CustomRetriever(type="CustomRetriever", class_name="a_class_name"),
                ),
            )
        ],
    )

Also applies to: 178-192

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee498b1 and b052a89.

📒 Files selected for processing (3)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5 hunks)
  • airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (1 hunks)
  • unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (10)
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (1)
  • CursorFactory (19-24)
airbyte_cdk/sources/declarative/incremental/global_substream_cursor.py (1)
  • GlobalSubstreamCursor (71-357)
airbyte_cdk/sources/declarative/incremental/per_partition_with_global.py (2)
  • PerPartitionWithGlobalCursor (20-200)
  • _get_active_cursor (83-84)
airbyte_cdk/sources/declarative/incremental/declarative_cursor.py (1)
  • DeclarativeCursor (9-13)
airbyte_cdk/sources/types.py (4)
  • StreamSlice (75-169)
  • partition (99-104)
  • cursor_slice (107-112)
  • extra_fields (115-117)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
  • InterpolatedString (13-79)
airbyte_cdk/sources/declarative/partition_routers/async_job_partition_router.py (1)
  • AsyncJobPartitionRouter (20-65)
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (2)
  • StreamSlicerTestReadDecorator (14-28)
  • stream_slices (23-24)
airbyte_cdk/sources/message/repository.py (1)
  • NoopMessageRepository (63-71)
unit_tests/sources/declarative/async_job/test_integration.py (1)
  • MockAsyncJobRepository (39-54)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)
🔇 Additional comments (1)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (1)

1-224: LGTM! Comprehensive test coverage for isinstance behavior.

This test file provides excellent coverage for verifying that the StreamSlicerTestReadDecorator preserves type information through the metaclass implementation. The tests cover all the major slicer types and ensure that the decorator pattern works correctly with isinstance checks, which is crucial for the refactoring from SimpleRetrieverTestReadDecorator to StreamSlicerTestReadDecorator.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (3)

137-137: Fix typo in function name, wdyt?

There's a typo in the function name - "aysnc" should be "async".

-def test_isinstance_global_cursor_aysnc_job_partition_router():
+def test_isinstance_global_cursor_async_job_partition_router():

165-165: Fix typo in function name, wdyt?

There's a typo in the function name - "substrea" should be "substream".

-def test_isinstance_substrea_partition_router():
+def test_isinstance_substream_partition_router():

184-217: Consider reducing reliance on private member access, wdyt?

While the test comprehensively validates PerPartitionWithGlobalCursor behavior, it heavily relies on accessing private members (_per_partition_cursor, _cursor_factory, _partition_router, _global_cursor, _stream_cursor). This makes the test brittle to internal implementation changes.

Could we verify the behavior through public APIs instead where possible? For example, testing the actual slice generation or cursor state methods rather than internal structure?

 # Instead of accessing private members, consider testing behavior:
-    assert wrapped_slicer._per_partition_cursor._cursor_factory == cursor_factory
-    assert wrapped_slicer._partition_router == partition_router
-    assert wrapped_slicer._global_cursor._stream_cursor == date_time_based_cursor
+    # Test actual behavior through public interface
+    # e.g., test slice generation, state management, etc.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3372590 and 8aba62e.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)
🔇 Additional comments (8)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (8)

48-66: Helper class looks well-designed, wdyt?

The MockedCursorBuilder follows a clean builder pattern and provides good separation of concerns for test setup. The mock configuration is clear and reusable across tests.


68-70: Simple and effective mock factory, wdyt?

Clean utility function for creating partition router mocks.


72-87: Good factory function for test data, wdyt?

The date_time_based_cursor_factory provides a well-configured DatetimeBasedCursor for testing purposes with reasonable defaults.


89-105: Comprehensive SubstreamPartitionRouter setup, wdyt?

The factory function creates a properly configured SubstreamPartitionRouter with all necessary components for testing.


107-135: Solid test coverage for GlobalSubstreamCursor type preservation, wdyt?

The test correctly verifies that the decorator preserves instanceof checks for GlobalSubstreamCursor while ensuring it doesn't incorrectly identify as other types.


138-163: Good test for AsyncJobPartitionRouter type preservation, wdyt?

The test logic correctly verifies that the decorator maintains isinstance behavior for AsyncJobPartitionRouter. The setup with JobTracker and AsyncJobOrchestrator looks appropriate.


166-182: Comprehensive test for SubstreamPartitionRouter, wdyt?

The test correctly validates type preservation for SubstreamPartitionRouter with proper negative assertions.


219-235: Excellent test for core functionality, wdyt?

This test effectively validates the primary purpose of the decorator - limiting the number of slices returned. The setup is clear and the assertion is straightforward.

@lmossman
Copy link
Contributor

lmossman commented May 30, 2025

@aldogonzalez8 any reason why we shouldn't merge this today / Monday morning?

@aldogonzalez8
Copy link
Contributor Author

@aldogonzalez8 any reason why we shouldn't merge this today / Monday morning?

No, this is good, we can merge early Monday Morning.

@aldogonzalez8 aldogonzalez8 merged commit 4343391 into main Jun 2, 2025
28 checks passed
@aldogonzalez8 aldogonzalez8 deleted the ac8/fix-property-chinking-in-connector-builder branch June 2, 2025 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants