-
Notifications
You must be signed in to change notification settings - Fork 24
feat: Removes stream_state
interpolation from CDK
#320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…stream_state` to jinja interpolation
This comment was marked as off-topic.
This comment was marked as off-topic.
stream_state
interpolation)stream_state
interpolation from CDK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/interpolation/jinja.py (2)
41-43
: Great error message! Consider adding docstring?The mapping and error message are clear and helpful. What do you think about adding a brief docstring to explain the purpose of this mapping and how to extend it for future unsupported variables? wdyt?
_UNSUPPORTED_INTERPOLATION_VARIABLES: Mapping[str, str] = { + # Maps variable names that are no longer supported for interpolation to their corresponding error messages. + # Add new entries here when deprecating interpolation variables. "stream_state": "`stream_state` is no longer supported for interpolation. We recommend using `stream_interval` instead. Please reference the CDK Migration Guide for more information.", }
104-110
: Consider using AST for more accurate variable detection?The current implementation checks for variable names using string containment, which might have false positives (e.g., if "stream_state" appears in a string literal). Since we already have
_find_undeclared_variables
, what do you think about using it for a more robust check? wdyt?- for variable_name in _UNSUPPORTED_INTERPOLATION_VARIABLES: - if variable_name in input_str: + if isinstance(input_str, str): + undeclared = self._find_undeclared_variables(input_str) + for variable_name in _UNSUPPORTED_INTERPOLATION_VARIABLES: + if variable_name in undeclared: raise AirbyteTracedException( message=_UNSUPPORTED_INTERPOLATION_VARIABLES[variable_name], internal_message=_UNSUPPORTED_INTERPOLATION_VARIABLES[variable_name], failure_type=FailureType.config_error, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py
(0 hunks)airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(0 hunks)airbyte_cdk/sources/declarative/interpolation/jinja.py
(3 hunks)airbyte_cdk/sources/declarative/requesters/http_requester.py
(0 hunks)unit_tests/sources/declarative/test_concurrent_declarative_source.py
(0 hunks)
💤 Files with no reviewable changes (4)
- unit_tests/sources/declarative/test_concurrent_declarative_source.py
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py
- airbyte_cdk/sources/declarative/requesters/http_requester.py
- airbyte_cdk/sources/declarative/declarative_component_schema.yaml
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/interpolation/jinja.py (1)
14-14
: LGTM! Good error handling setup.The new imports for
FailureType
andAirbyteTracedException
are well-organized and align with the PR's objective to handle unsupported interpolation variables gracefully.Also applies to: 19-19
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code change looks good, and I for once understood it. Didn't run tests locally, and see they're failing — good to go on green CI?
..._cdk/sources/declarative/requesters/request_options/interpolated_request_options_provider.py
Show resolved
Hide resolved
@pnilan lol you got got:
|
@natikgadzhi Just went through and updated all the tests to either convert stream_state interpolation to stream_interval equivalent or remove outright if no longer relevant. CI should be green now. @brianjlai Can you review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
unit_tests/sources/declarative/requesters/request_options/test_interpolated_request_options_provider.py (1)
64-66
: 💡 Verification agent🧩 Analysis chain
Consider updating method signature to reflect new interpolation context.
The test is using
stream_interval
in templates but still passesstream_state
to the method. Should we update the method signature to better reflect the new design, wdyt?
🏁 Script executed:
#!/bin/bash # Search for other method signatures that might need similar updates rg -A 2 "def get_request_params"Length of output: 5962
Could you please provide the output from this script?
🏁 Script executed:
#!/bin/bash # Let's extract the method signature from interpolated_request_options_provider.py to check if it uses stream_state or stream_interval. rg -n "def get_request_params" airbyte_cdk/sources/declarative/requesters/request_options/interpolated_request_options_provider.pyLength of output: 163
Consistency Check: Update Parameter Naming in get_request_params?
There appears to be an inconsistency between the new interpolation context and the method signature. In your test (lines 64–66 of
unit_tests/sources/declarative/requesters/request_options/test_interpolated_request_options_provider.py
), you callget_request_params
withstream_state
, yet elsewhere in the updated interpolation logic the termstream_interval
is being introduced.
- The method signature in
airbyte_cdk/sources/declarative/requesters/request_options/interpolated_request_options_provider.py
(line 72) has not been updated to reflect this naming change.- Would you consider updating the parameter name in the method signature (or alternatively, aligning the test call) so that it consistently reflects the new context using
stream_interval
? wdyt?
🧹 Nitpick comments (5)
unit_tests/sources/declarative/extractors/test_record_selector.py (1)
26-26
: LGTM! Consider adding a comment about the migration.The change from
stream_state
tostream_interval.extra_fields
aligns with the PR objectives. Would it be helpful to add a comment explaining this migration for future reference, wdyt? 🤔+ # Note: Using stream_interval.extra_fields instead of stream_state as part of the stream_state interpolation removal "{{ record['created_at'] > stream_interval.extra_fields['created_at'] }}",
unit_tests/sources/declarative/datetime/test_min_max_datetime.py (1)
25-25
: Consider standardizing the stream_slice key naming?I notice we're using both
start_date
andnewer
as keys instream_slice
across different test cases. For consistency and to avoid confusion, should we standardize on one key name throughout the test file? wdyt?Also applies to: 31-31, 45-45, 54-54, 59-60, 68-68, 80-80
unit_tests/sources/declarative/requesters/request_options/test_interpolated_request_options_provider.py (1)
11-12
: Consider updating test data setup.The test uses
state
variable but templates usestream_interval
. Should we rename the variable to match the new context, wdyt?-state = {"date": "2021-01-01"} +stream_interval = {"start_date": "2021-01-01"}unit_tests/sources/declarative/extractors/test_record_filter.py (2)
59-60
: Verify the impact of changing comparison operator.The comparison operator changed from
>
to>=
. This means records with exact matching dates will now be included. Is this intentional? If so, should we add test cases for the edge case wherecreated_at
exactly equalsstream_interval.extra_fields['created_at']
, wdyt?Would you like me to generate additional test cases to cover this edge case?
64-65
: Consider updating test data to include edge cases.The test data doesn't include a case where
created_at
exactly equals the comparison value. Should we add such a case to verify the new>=
behavior, wdyt?[ {"id": 1, "created_at": "06-06-21"}, {"id": 2, "created_at": "06-07-21"}, + {"id": 4, "created_at": "06-07-21"}, # Edge case: exact match {"id": 3, "created_at": "06-08-21"}, ], - [{"id": 2, "created_at": "06-07-21"}, {"id": 3, "created_at": "06-08-21"}], + [{"id": 2, "created_at": "06-07-21"}, {"id": 4, "created_at": "06-07-21"}, {"id": 3, "created_at": "06-08-21"}],
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
unit_tests/sources/declarative/datetime/test_min_max_datetime.py
(6 hunks)unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py
(1 hunks)unit_tests/sources/declarative/extractors/test_record_filter.py
(1 hunks)unit_tests/sources/declarative/extractors/test_record_selector.py
(2 hunks)unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py
(2 hunks)unit_tests/sources/declarative/partition_routers/test_parent_state_stream.py
(2 hunks)unit_tests/sources/declarative/requesters/request_options/test_interpolated_request_options_provider.py
(5 hunks)unit_tests/sources/declarative/transformations/test_add_fields.py
(0 hunks)
💤 Files with no reviewable changes (1)
- unit_tests/sources/declarative/transformations/test_add_fields.py
✅ Files skipped from review due to trivial changes (1)
- unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (7)
unit_tests/sources/declarative/extractors/test_record_selector.py (1)
83-87
: LGTM! Consider adding more test cases.The addition of
extra_fields
toStreamSlice
looks good. Would you like to add more test cases to verify edge cases, wdyt? For example:
- Missing
extra_fields
- Empty
extra_fields
- Different field names in
extra_fields
unit_tests/sources/declarative/partition_routers/test_parent_state_stream.py (2)
148-148
: LGTM! The change fromstream_state
tostream_interval
aligns with the CDK migration.The update correctly replaces the deprecated
stream_state
interpolation withstream_interval
in the record filter condition. The test cases thoroughly validate this change across various scenarios.
261-1074
: Comprehensive test coverage for the migration, nice work!The test cases thoroughly validate the migration from
stream_state
tostream_interval
by:
- Testing basic functionality with different state configurations
- Verifying state persistence and updates
- Handling edge cases with empty responses
- Testing state migration scenarios
unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py (2)
146-146
: The change fromstream_state
tostream_interval.extra_fields
aligns with the PR objectives.The condition has been updated to use
stream_interval.extra_fields
instead ofstream_state
, which is consistent with the PR's goal of removingstream_state
interpolation. The test case validates that the new approach works correctly.
2546-2546
: Similar change in the request options manifest looks good.The condition in
SUBSTREAM_REQUEST_OPTIONS_MANIFEST
has been updated to usestream_interval['extra_fields']
instead ofstream_state
, maintaining consistency with the earlier change.unit_tests/sources/declarative/datetime/test_min_max_datetime.py (2)
89-89
: LGTM! Clean transition from stream_state to stream_sliceThe test setup and get_datetime call have been updated correctly to use stream_slice instead of stream_state.
Also applies to: 95-95
102-102
: Test coverage looks good! One small suggestion...The custom format tests have been updated correctly to use stream_slice. However, should we add a test case to verify behavior when stream_slice is None or missing the expected key? This would help ensure robust error handling, wdyt?
Also applies to: 108-108, 111-111, 120-120, 126-126, 129-129
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few suggestions here and there, but overall this is looking good.
One other area we might be able to cleanup, is simple_retriever.py._get_request_options()
. Although its probably too annoying to remove stream_state
from the method signature, we can stop passing it into the subsequent paginator_method
and stream_slicer
methods since at that point stream_state
defaults to None. And from there it all becomes the kwargs
you already removed. Might be worth trying to remove and see if the tests still pass and then we know it wasn't needed
unit_tests/sources/declarative/datetime/test_min_max_datetime.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
(0 hunks)airbyte_cdk/sources/declarative/transformations/add_fields.py
(2 hunks)unit_tests/sources/declarative/datetime/test_min_max_datetime.py
(6 hunks)unit_tests/sources/declarative/interpolation/test_jinja.py
(2 hunks)unit_tests/sources/declarative/retrievers/test_simple_retriever.py
(1 hunks)
💤 Files with no reviewable changes (1)
- airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
🚧 Files skipped from review as they are similar to previous changes (2)
- unit_tests/sources/declarative/datetime/test_min_max_datetime.py
- airbyte_cdk/sources/declarative/transformations/add_fields.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)
132-132
:❓ Verification inconclusive
LGTM! The test has been updated to match the new method signature.
The change aligns with the PR objectives of removing
stream_state
interpolation from the codebase. The test now correctly verifies that_request_params
accepts two parameters instead of three while maintaining the same assertion logic.Let's verify that this change is consistent across the codebase:
🏁 Script executed:
#!/bin/bash # Description: Check for any remaining instances of _request_params with three parameters # This helps ensure the change has been applied consistently ast-grep --pattern $'_request_params\(\$_, \$_, \$_\)'Length of output: 55
LGTM! The test for
_request_params
now properly reflects the updated signature.
The assertion on line 132 correctly verifies that calling_request_params
with two parameters returns an empty dictionary, which aligns with the PR’s objective of removingstream_state
interpolation. I didn't find any evidence of a three-parameter call using our initial verification command, although the lack of output suggests that you might want to manually confirm or run additional checks to be 100% sure. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me! coderabbit had a thing and one more adjustment to the schema file, but no need to block pending the fix.
Have you functionally tested it against one of our existing connectors that uses stream_state
?
Confirmed w/
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
cdk-migrations.md (2)
3-18
: Great Introduction to the Breaking Change Section!
You've done a clear job outlining the removal ofstream_state
and recommending the update tostream_interval
in the "Upgrading to 6.X.X" section. Would you consider adding a brief note (for example, after line 17) to remind developers to re-test their connectors after these changes or to reference specific sections of the migration guide for further details? wdyt?
21-31
: Clear YAML Example Demonstrating the Migration!
The before-and-after YAML snippet provides an excellent illustration of how to update the RecordFilter condition. Would you consider adding a short comment within the code block (or immediately after) to alert users about potential pitfalls when migrating fromstream_state
tostream_interval
? This might help reinforce best practices as they update their configuration. wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cdk-migrations.md
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
* main: fix: update cryptography package to latest version to address CVE (airbytehq#377) fix: (CDK) (HttpRequester) - Make the `HttpRequester.path` optional (airbytehq#370) feat: improved custom components handling (airbytehq#350) feat: add microseconds timestamp format (airbytehq#373) fix: Replace Unidecode with anyascii for permissive license (airbytehq#367) feat: add IncrementingCountCursor (airbytehq#346) feat: (low-code cdk) datetime format with milliseconds (airbytehq#369) fix: (CDK) (AsyncRetriever) - Improve UX on variable naming and interpolation (airbytehq#368) fix: (CDK) (AsyncRetriever) - Add the `request` and `response` to each `async` operations (airbytehq#356) fix: (CDK) (ConnectorBuilder) - Add `auxiliary requests` to slice; support `TestRead` for AsyncRetriever (part 1/2) (airbytehq#355) feat(concurrent perpartition cursor): Add parent state updates (airbytehq#343) fix: update csv parser for builder compatibility (airbytehq#364) feat(low-code cdk): add interpolation for limit field in Rate (airbytehq#353) feat(low-code cdk): add AbstractStreamFacade processing as concurrent streams in declarative source (airbytehq#347) fix: (CDK) (CsvParser) - Fix the `\\` escaping when passing the `delimiter` from Builder's UI (airbytehq#358) feat: expose `str_to_datetime` jinja macro (airbytehq#351) fix: update CDK migration for 6.34.0 (airbytehq#348) feat: Removes `stream_state` interpolation from CDK (airbytehq#320) fix(declarative): Pass `extra_fields` in `global_substream_cursor` (airbytehq#195) feat(concurrent perpartition cursor): Refactor ConcurrentPerPartitionCursor (airbytehq#331) feat(HttpMocker): adding support for PUT requests and bytes responses (airbytehq#342) chore: use certified source for manifest-only test (airbytehq#338) feat: check for request_option mapping conflicts in individual components (airbytehq#328) feat(file-based): sync file acl permissions and identities (airbytehq#260) fix: (CDK) (Connector Builder) - refactor the `MessageGrouper` > `TestRead` (airbytehq#332) fix(low code): Fix missing cursor for ClientSideIncrementalRecordFilterDecorator (airbytehq#334) feat(low-code): Add API Budget (airbytehq#314) chore(decoder): clean decoders and make csvdecoder available (airbytehq#326)
What
stream_state
interpolation frominterpolation_context
indeclarative_component_schema
stream_state
to theHttpRequester
,AddFields
, andRequestOptionsProvider
, andRecordFilter
components.stream_state
inaccessible as a interpolated variable.JinjaInterpolation
class to raiseAirbyteTracedException
when connector attempts to evaluate a jinja expression that containsstream_state
.DefaultStreams
(concurrent) ifstream_state
interpolation was used.CDK Migration Guide
Recommended Reading Order
declarative_component_schema.yaml
http_requester.py
interpolated_requests_input_provider.py
interpolated_nested_request_input_provider.py
add_fields.py
interpolated_requests_options_provider.py
jinja.py
cdk_migrations.md
Summary by CodeRabbit
stream_state
parameter with a new interval-based approach across request handling and transformations.