-
Notifications
You must be signed in to change notification settings - Fork 24
fix(connector-builder): fix property chunking in connector builder #567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(connector-builder): fix property chunking in connector builder #567
Conversation
📝 WalkthroughWalkthroughThis change refactors argument passing in the Airbyte connector builder's declarative source and retriever instantiation for clarity by using explicit keyword arguments. It removes the Changes
Sequence Diagram(s)sequenceDiagram
participant TestRunner
participant ConnectorBuilder
participant HTTPMock
participant ManifestDeclarativeSource
participant ModelToComponentFactory
participant StreamSlicerTestReadDecorator
participant SimpleRetriever
TestRunner->>ConnectorBuilder: handle_connector_builder_request(test_read, manifest, catalog, state)
ConnectorBuilder->>ManifestDeclarativeSource: Instantiate with manifest
ManifestDeclarativeSource->>ModelToComponentFactory: Instantiate with emit_connector_builder_messages (keyword arg)
ModelToComponentFactory->>StreamSlicerTestReadDecorator: Wrap stream slicer if limiting slices
ModelToComponentFactory->>SimpleRetriever: create_simple_retriever with wrapped slicer
SimpleRetriever->>HTTPMock: Send chunked HTTP requests (rates_one_two.json, rates_three_four.json)
HTTPMock-->>SimpleRetriever: Return mocked JSON responses
SimpleRetriever-->>ConnectorBuilder: Return chunked records
ConnectorBuilder-->>TestRunner: Return test read output
Possibly related PRs
Suggested reviewers
By the way, would you be interested in adding tests for edge cases in property chunking, like empty property lists or chunk sizes larger than the list, to further strengthen coverage? Wdyt? 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (9)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
unit_tests/resource/http/response/declarative/property_chunking/rates_three_four.json (1)
1-10
: Consider adding a trailing newline at end-of-file for consistency with other fixtures. wdyt?unit_tests/resource/http/response/declarative/property_chunking/rates_one_two.json (1)
1-10
: Could we include an extra currency entry to simulate multi-currency scenarios, or is it preferable to keep the test payload minimal? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/manifest_declarative_source.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(1 hunks)unit_tests/connector_builder/test_property_chunking.py
(1 hunks)unit_tests/resource/http/response/declarative/property_chunking/rates_one_two.json
(1 hunks)unit_tests/resource/http/response/declarative/property_chunking/rates_three_four.json
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (7)
unit_tests/resource/http/response/declarative/property_chunking/rates_three_four.json (1)
1-10
: Approve addition of JSON mock for property chunking. This fixture correctly mirrors the structure ofrates_one_two.json
for the second chunk and provides all expected fields for the test.unit_tests/resource/http/response/declarative/property_chunking/rates_one_two.json (1)
1-10
: Approve addition of JSON mock for property chunking. The payload follows the established schema and sets up the first chunk response accurately.airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
3315-3315
: LGTM! Great fix for the missing parameter issue!This change perfectly addresses the issue described in the PR objectives. The
additional_query_properties=query_properties
parameter was indeed missing from theSimpleRetrieverTestReadDecorator
constructor, while it was already being passed to the regularSimpleRetriever
constructor at line 3330. This ensures consistent behavior between both constructors and should resolve the "QueryProperties component is defined but stream_partition does not contain query_properties" error whenself._emit_connector_builder_messages
is True.The fix is targeted, minimal, and maintains the existing logic flow. Nice work on identifying and fixing this inconsistency!
airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)
131-131
: LGTM! Nice improvement to parameter clarity.This change to explicitly pass
emit_connector_builder_messages
as a keyword argument is a good practice. It makes the code more readable and ensures the parameter is properly passed to theModelToComponentFactory
constructor, which should help resolve the property chunking issue mentioned in the PR. The explicit naming prevents potential parameter ordering issues, wdyt?unit_tests/connector_builder/test_property_chunking.py (3)
34-69
: Well-structured manifest configuration for property chunking testing.The manifest setup looks comprehensive and correctly defines property chunking with a limit of 2 items per chunk. The
QueryProperties
configuration withPropertyChunking
should properly test the feature that was fixed in this PR. Nice work on defining a realistic test scenario, wdyt?
197-243
: Excellent end-to-end test coverage for property chunking.This test effectively validates the property chunking functionality by:
- Freezing time for consistent test execution
- Mocking the expected chunked HTTP requests with URL-encoded property lists
- Verifying exactly 2 pages are generated (matching the 2 chunks)
- Ensuring no error logs are present
The assertions properly validate that the property chunking works as expected and doesn't generate the error mentioned in the PR objectives. The URL encoding check for comma-separated values (
%2C
) shows good attention to detail, wdyt?
208-210
: Good practice using the flag that was fixed in the PR.Setting
emit_connector_builder_messages=True
here directly tests the scenario that was causing the original issue. This ensures the fix for the missingadditional_query_properties
parameter is properly validated, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one comment that if you think is worth, maybe we could tackle now. I'm still approving because this fix improves things
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)
654-654
: Consider clarifying the docstring wording.The current wording suggests
maximum_number_of_slices
is required when test read is enabled, but the implementation actually makes it optional (defaulting to 0). Maybe update to something like "maximum_number_of_slices should be provided when slice limiting is desired in test read mode" - wdyt?
660-680
: Clean implementation of conditional test read setup!The logic nicely separates concerns - log formatter setup only happens in test read mode, and validation is appropriately conditional. One small optimization you might consider: the validation could check
self.maximum_number_of_slices <= 0
instead of< 1
for slightly clearer intent, but both work perfectly fine. wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(3 hunks)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
(1 hunks)unit_tests/sources/declarative/retrievers/test_simple_retriever.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (5)
airbyte_cdk/sources/http_logger.py (1)
format_http_message
(12-51)airbyte_cdk/sources/declarative/declarative_stream.py (3)
name
(93-97)name
(100-102)stream_slices
(172-187)airbyte_cdk/sources/streams/core.py (2)
name
(139-143)stream_slices
(403-418)airbyte_cdk/sources/declarative/retrievers/retriever.py (1)
stream_slices
(33-34)airbyte_cdk/sources/types.py (1)
StreamSlice
(75-169)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (5)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)
918-918
: LGTM! Test parameter aligns with the new decorator behavior.This addition correctly configures the test to enable test read mode, which is now required for the log formatter to be set up properly. Setting
maximum_number_of_slices=1
is perfect for this test case since it's focused on log emission rather than slice processing.airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (4)
657-658
: Verify the default value change impact.Changing the default
maximum_number_of_slices
from 5 to 0 effectively disables test read mode by default, which seems like the right approach for making the behavior more explicit. However, this could be a breaking change if existing code relied on the previous default behavior. Have you confirmed this won't affect existing implementations that didn't explicitly set this parameter?The new
emit_connector_builder_messages
attribute nicely provides an alternative way to enable test read mode specifically for connector builder scenarios.
681-686
: Excellent abstraction for test read mode detection!This method cleanly encapsulates the logic and makes the code much more readable. The boolean conversion with
bool()
handles both conditions elegantly. The docstring clearly explains the purpose too.
690-693
: Perfect conditional slice limiting logic!The implementation correctly handles both scenarios - full slice iteration when test read is disabled, and limited slices when enabled. Using
islice
is the right approach here, and delegating to the parent when test read is disabled maintains clean separation of concerns.
657-693
: Verify connection to PR objectives about query properties.The changes look solid for improving test read mode control, but I noticed the PR objectives mention fixing a missing
additional_query_properties
parameter whenemit_connector_builder_message
is True. While I can see the newemit_connector_builder_messages
attribute, I don't see the direct fix for the query properties issue in this file. Could you clarify how these changes connect to resolving that specific error message? Maybe the fix is in another file not shown in this review?
…ecorator for scenarios wehere we want to limit the number of requests that are made to the backend source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)
705-715
: Nice architectural improvement - decorator moved to the correct abstraction level.The test correctly adapts to the new pattern where the
StreamSlicerTestReadDecorator
wraps the stream slicer instead of the entire retriever. This is a cleaner approach since slice limiting is naturally a slicer concern rather than a retriever concern. The test functionality remains the same while the architecture is improved.One quick verification - would it be worth adding a comment in the test to clarify that we're now limiting slices at the slicer level rather than retriever level? This could help future developers understand the architectural decision, wdyt?
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (2)
14-22
: Clean decorator pattern implementation!The class structure looks great - proper inheritance from
StreamSlicer
and good use of@dataclass
. The default of 5 slices seems reasonable for test scenarios.Could we enhance the docstring to mention this is specifically for test reads? Something like "This decorator is used during test reads to prevent excessive API calls by limiting the number of stream slices processed." - wdyt?
27-77
: Consistent delegation pattern implemented well!All four delegate methods follow the same pattern correctly - keyword-only arguments, proper type hints, and clean delegation. The explicit implementation makes the interface clear.
I notice there's some repetition across these methods. Have you considered if there's a clean way to reduce the duplication while maintaining readability? Maybe a helper method or using
__getattr__
more broadly? - wdyt? Though the current explicit approach is definitely clear and maintainable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(5 hunks)airbyte_cdk/sources/declarative/retrievers/__init__.py
(0 hunks)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
(0 hunks)airbyte_cdk/sources/declarative/stream_slicers/__init__.py
(1 hunks)airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py
(1 hunks)unit_tests/connector_builder/test_connector_builder_handler.py
(2 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
(6 hunks)unit_tests/sources/declarative/retrievers/test_simple_retriever.py
(2 hunks)
💤 Files with no reviewable changes (2)
- airbyte_cdk/sources/declarative/retrievers/init.py
- airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
✅ Files skipped from review due to trivial changes (1)
- airbyte_cdk/sources/declarative/stream_slicers/init.py
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
unit_tests/connector_builder/test_connector_builder_handler.py (2)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
SimpleRetriever
(54-625)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
SimpleRetriever
(2700-2754)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (2)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (5)
SimpleRetriever
(54-625)name
(107-115)name
(118-120)primary_key
(291-293)primary_key
(296-298)airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (1)
StreamSlicerTestReadDecorator
(15-81)
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (3)
airbyte_cdk/sources/types.py (1)
StreamSlice
(75-169)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
stream_slices
(584-593)airbyte_cdk/sources/declarative/async_job/job_orchestrator.py (1)
stream_slice
(88-89)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (12)
unit_tests/connector_builder/test_connector_builder_handler.py (2)
1114-1114
: LGTM! Test assertion correctly updated for the new architecture.This change properly reflects the refactoring where
SimpleRetrieverTestReadDecorator
was removed in favor ofStreamSlicerTestReadDecorator
. The assertion now correctly checks for the underlyingSimpleRetriever
type, which makes sense since test limiting functionality has moved to the stream slicer level. Wdyt?
1160-1160
: Consistent and correct assertion update.This mirrors the previous change and maintains consistency across both test functions that verify retriever types. The assertion correctly checks for
SimpleRetriever
now that the test decorator approach has been refactored. Nice consistency!unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)
43-44
: LGTM! Clean refactoring of the import statement.The import change reflects the architectural improvement where slice limiting functionality has been moved from the retriever level to the stream slicer level, which makes more sense from a separation of concerns perspective.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (6)
154-159
: Import updates look good for the decorator refactorThe import changes correctly reflect the architectural shift from
SimpleRetrieverTestReadDecorator
toStreamSlicerTestReadDecorator
. This aligns well with moving the decoration logic from the retriever level to the stream slicer level.
715-722
: Test setup for request/response mocking looks appropriateThe addition of
PreparedRequest
andResponse
objects with proper setup (headers, URL, status code) provides good test isolation for the logging functionality. The mock objects have the essential attributes needed for the test.
734-739
: Assertions correctly verify the new architectureThe test now properly checks that:
- The retriever is still a
SimpleRetriever
(not decorated)- The
log_formatter
is properly set- The formatter behavior matches the factory's implementation
This is consistent with the architectural change where decoration happens at the stream slicer level. Nice work on maintaining the logging functionality verification!
3089-3091
: Stream slicer assertion updated correctly for datetime-based cursorThe assertion now correctly checks that the
stream_slicer
is wrapped withStreamSlicerTestReadDecorator
and contains the expectedDatetimeBasedCursor
. This reflects the architectural change where test decoration happens at the slicer level rather than the retriever level.
3178-3180
: Consistent assertion pattern for partition cursorThe test follows the same pattern as the datetime-based cursor test, verifying that the
PerPartitionCursor
is properly wrapped withStreamSlicerTestReadDecorator
. The consistency across different cursor types is good to see!
3220-3222
: Default behavior assertion updated appropriatelyEven for the default case with
SinglePartitionRouter
, the test correctly verifies the new decorator pattern. This ensures that the test decoration is applied consistently across all stream slicer types, which is exactly what we'd want.airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (3)
1-12
: LGTM on imports and setup!The imports are well-organized and all necessary for the decorator implementation. Good use of
islice
from itertools for the slice limiting functionality.
24-25
: Perfect use of islice for slice limiting!This is exactly the right approach - using
islice
keeps the implementation lazy and memory-efficient. The core functionality is clean and focused.
79-81
: Excellent fallback delegation!The
__getattr__
method provides perfect transparency for the decorator - any future methods or attributes added to theStreamSlicer
interface will automatically work through this decorator. Great forward-compatibility thinking!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (1)
46-71
:⚠️ Potential issueGood test for metaclass behavior, but there are a couple of issues to address.
The test correctly validates that the metaclass enables isinstance checks for wrapped slicers. However:
Line 48 issue:
mocked_partition_router().stream_slices.return_value = [...]
creates a mock but doesn't assign it to anything. This should probably be assigned to a variable that's then used in the GlobalSubstreamCursor constructor.Missing functionality test: While the isinstance checks are important, it would be valuable to also test that the StreamSlicerTestReadDecorator actually limits the number of slices as intended. Currently we only test the type checking behavior, wdyt?
Here's a potential fix for the partition router issue:
- mocked_partition_router().stream_slices.return_value = [ + partition_router = mocked_partition_router() + partition_router.stream_slices.return_value = [ StreamSlice( partition=first_partition, cursor_slice={}, extra_fields={"extra_field": "extra_value"} ), ] - global_cursor = GlobalSubstreamCursor(cursor, mocked_partition_router) + global_cursor = GlobalSubstreamCursor(cursor, partition_router)
🧹 Nitpick comments (1)
airbyte_cdk/sources/streams/concurrent/partitions/stream_slicer.py (1)
9-25
: Nice metaclass implementation! This elegantly solves the isinstance check problem for wrapped slicers.The implementation handles the decorator pattern transparently and falls back gracefully to standard behavior. The docstring clearly explains the motivation and use case.
One question - have you considered what happens if there's a chain of wrapped slicers (e.g., a decorator wrapping another decorator)? The recursive
isinstance(instance.wrapped_slicer, cls)
should handle this correctly, but it might be worth adding a test case for that scenario, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/streams/concurrent/partitions/stream_slicer.py
(1 hunks)unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/streams/concurrent/partitions/stream_slicer.py (1)
airbyte_cdk/sources/types.py (1)
StreamSlice
(75-169)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (5)
airbyte_cdk/sources/streams/concurrent/partitions/stream_slicer.py (2)
3-4
: LGTM! Import additions support the metaclass implementation.The new imports are necessary for the custom metaclass implementation and type annotations. Good practice to import exactly what's needed.
27-27
: Perfect! Minimal change to enable the metaclass behavior.Adding the metaclass to StreamSlicer enables the custom isinstance checks for all StreamSlicer instances and subclasses, which is exactly what's needed for the decorator pattern.
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (3)
5-17
: Comprehensive imports for testing the metaclass behavior.The imports cover the key classes needed to test isinstance behavior with different stream slicer types and the new decorator. Good coverage of the relevant types.
22-40
: Clean builder pattern for test mocks.The MockedCursorBuilder provides a fluent interface for setting up test data, which makes the tests more readable. The mock setup correctly implements the expected DeclarativeCursor interface.
42-43
: Simple helper function, but there's an issue with its usage in the test.The function itself is fine, but looking at line 48 in the test, it seems like
mocked_partition_router().stream_slices.return_value = ...
doesn't actually assign the mock to anything. Should this be assigned to a variable first, or is the mock being configured but not used? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (4)
116-116
: Fix typo in function name - "aysnc" should be "async", wdyt?There's a small typo in the function name that should be corrected for clarity.
-def test_isinstance_global_cursor_aysnc_job_partition_router(): +def test_isinstance_global_cursor_async_job_partition_router():
144-144
: Fix typo in function name - "substrea" should be "substream", wdyt?Another small typo in the function name that should be corrected.
-def test_isinstance_substrea_partition_router(): +def test_isinstance_substream_partition_router():
86-224
: Consider adding tests for the core slice-limiting functionality, wdyt?While these isinstance tests are valuable for verifying the metaclass behavior, it might be worth adding tests that verify the decorator actually limits the number of slices as intended. The current tests focus on type preservation but don't validate the primary purpose of
StreamSlicerTestReadDecorator
.You could add a test like:
def test_slice_limiting_functionality(): # Create a slicer that returns many slices mock_slicer = Mock(spec=StreamSlicer) mock_slicer.stream_slices.return_value = [ StreamSlice(partition={f"key_{i}": f"value_{i}"}, cursor_slice={}) for i in range(10) ] # Wrap with decorator limiting to 3 slices wrapped_slicer = StreamSlicerTestReadDecorator( wrapped_slicer=mock_slicer, maximum_number_of_slices=3, ) # Verify only 3 slices are returned slices = list(wrapped_slicer.stream_slices()) assert len(slices) == 3
145-159
: Consider extracting common SubstreamPartitionRouter setup to reduce duplication, wdyt?The setup for
SubstreamPartitionRouter
is duplicated between two tests. You could extract this into a helper function to improve maintainability.def create_substream_partition_router(): return SubstreamPartitionRouter( config={}, parameters={}, parent_stream_configs=[ ParentStreamConfig( type="ParentStreamConfig", parent_key="id", partition_field="id", stream=DeclarativeStream( type="DeclarativeStream", retriever=CustomRetriever(type="CustomRetriever", class_name="a_class_name"), ), ) ], )Also applies to: 178-192
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(5 hunks)airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py
(1 hunks)unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (10)
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (1)
CursorFactory
(19-24)airbyte_cdk/sources/declarative/incremental/global_substream_cursor.py (1)
GlobalSubstreamCursor
(71-357)airbyte_cdk/sources/declarative/incremental/per_partition_with_global.py (2)
PerPartitionWithGlobalCursor
(20-200)_get_active_cursor
(83-84)airbyte_cdk/sources/declarative/incremental/declarative_cursor.py (1)
DeclarativeCursor
(9-13)airbyte_cdk/sources/types.py (4)
StreamSlice
(75-169)partition
(99-104)cursor_slice
(107-112)extra_fields
(115-117)airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
InterpolatedString
(13-79)airbyte_cdk/sources/declarative/partition_routers/async_job_partition_router.py (1)
AsyncJobPartitionRouter
(20-65)airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (2)
StreamSlicerTestReadDecorator
(14-28)stream_slices
(23-24)airbyte_cdk/sources/message/repository.py (1)
NoopMessageRepository
(63-71)unit_tests/sources/declarative/async_job/test_integration.py (1)
MockAsyncJobRepository
(39-54)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (1)
1-224
: LGTM! Comprehensive test coverage for isinstance behavior.This test file provides excellent coverage for verifying that the
StreamSlicerTestReadDecorator
preserves type information through the metaclass implementation. The tests cover all the major slicer types and ensure that the decorator pattern works correctly withisinstance
checks, which is crucial for the refactoring fromSimpleRetrieverTestReadDecorator
toStreamSlicerTestReadDecorator
.
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (3)
137-137
: Fix typo in function name, wdyt?There's a typo in the function name - "aysnc" should be "async".
-def test_isinstance_global_cursor_aysnc_job_partition_router(): +def test_isinstance_global_cursor_async_job_partition_router():
165-165
: Fix typo in function name, wdyt?There's a typo in the function name - "substrea" should be "substream".
-def test_isinstance_substrea_partition_router(): +def test_isinstance_substream_partition_router():
184-217
: Consider reducing reliance on private member access, wdyt?While the test comprehensively validates PerPartitionWithGlobalCursor behavior, it heavily relies on accessing private members (
_per_partition_cursor
,_cursor_factory
,_partition_router
,_global_cursor
,_stream_cursor
). This makes the test brittle to internal implementation changes.Could we verify the behavior through public APIs instead where possible? For example, testing the actual slice generation or cursor state methods rather than internal structure?
# Instead of accessing private members, consider testing behavior: - assert wrapped_slicer._per_partition_cursor._cursor_factory == cursor_factory - assert wrapped_slicer._partition_router == partition_router - assert wrapped_slicer._global_cursor._stream_cursor == date_time_based_cursor + # Test actual behavior through public interface + # e.g., test slice generation, state management, etc.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (8)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (8)
48-66
: Helper class looks well-designed, wdyt?The
MockedCursorBuilder
follows a clean builder pattern and provides good separation of concerns for test setup. The mock configuration is clear and reusable across tests.
68-70
: Simple and effective mock factory, wdyt?Clean utility function for creating partition router mocks.
72-87
: Good factory function for test data, wdyt?The
date_time_based_cursor_factory
provides a well-configured DatetimeBasedCursor for testing purposes with reasonable defaults.
89-105
: Comprehensive SubstreamPartitionRouter setup, wdyt?The factory function creates a properly configured SubstreamPartitionRouter with all necessary components for testing.
107-135
: Solid test coverage for GlobalSubstreamCursor type preservation, wdyt?The test correctly verifies that the decorator preserves instanceof checks for GlobalSubstreamCursor while ensuring it doesn't incorrectly identify as other types.
138-163
: Good test for AsyncJobPartitionRouter type preservation, wdyt?The test logic correctly verifies that the decorator maintains isinstance behavior for AsyncJobPartitionRouter. The setup with JobTracker and AsyncJobOrchestrator looks appropriate.
166-182
: Comprehensive test for SubstreamPartitionRouter, wdyt?The test correctly validates type preservation for SubstreamPartitionRouter with proper negative assertions.
219-235
: Excellent test for core functionality, wdyt?This test effectively validates the primary purpose of the decorator - limiting the number of slices returned. The setup is clear and the assertion is straightforward.
@aldogonzalez8 any reason why we shouldn't merge this today / Monday morning? |
No, this is good, we can merge early Monday Morning. |
WHAT
SimpleRetrieverTestReadDecorator
whenself._emit_connector_builder_message
is True is missingadditional_query_properties
param, causing:Relates to https://github.com/airbytehq/airbyte-internal-issues/issues/12775
HOW
We have replaced Retriever decorator by stream_slicer decorator in order to not have to maintain properties in two retrievers.
Summary by CodeRabbit