-
Notifications
You must be signed in to change notification settings - Fork 24
feat(property chunking): Allow PropertiesFromEndpoint to be defined on HttpRequester #507
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… allow for interpolation on other request options
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
📝 WalkthroughWalkthroughThis change introduces a new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Manifest
participant ModelToComponentFactory
participant HttpRequester
participant QueryProperties
User->>Manifest: Defines HttpRequester with fetch_properties_from_endpoint
Manifest->>ModelToComponentFactory: Passes manifest for component creation
ModelToComponentFactory->>HttpRequester: Instantiates HttpRequester
alt fetch_properties_from_endpoint defined
ModelToComponentFactory->>QueryProperties: Dynamically creates QueryProperties from endpoint
end
ModelToComponentFactory-->>User: Returns configured retriever
Possibly related PRs
Suggested labels
Suggested reviewers
Would you like to consider adding an integration test that exercises a full end-to-end flow with the new ✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1981-1985
: Introducefetch_properties_from_endpoint
on HttpRequester
The new optional field correctly referencesPropertiesFromEndpoint
and matches the PR objective of enabling dynamic property injection. Would it be helpful to:
- Add a schema‐level
oneOf
ornot
constraint to disallow defining bothfetch_properties_from_endpoint
andQueryProperties
inrequest_parameters
at validation time, rather than waiting for the factory error?- Include an example under
HttpRequester
’sexamples
to demonstrate howfetch_properties_from_endpoint
is used in practice?Wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
2948-2964
: Explicit clash detection is great – consider simplifying the second scan?Love that we now surface an error when both
request_parameters
andfetch_properties_from_endpoint
define aPropertiesFromEndpoint
– much clearer 👏.Today we (1) call
_query_properties_in_request_parameters
to know a clash may exist, and (2) iterate a second time overrequest_parameters
to collectQueryPropertiesModel
instances. That means two scans of the same dict.Might it be cleaner to have
_query_properties_in_request_parameters
return the (key, model) of the firstQueryPropertiesModel
it finds (orNone
) and at the same time detect duplicates?
This would:
- avoid the second loop,
- centralise the “only one allowed” validation,
- let the caller directly obtain
query_properties_key
and the model.wdyt?
- if self._query_properties_in_request_parameters(model.requester): + qp_result = self._query_properties_in_request_parameters(model.requester) + if qp_result: ... - query_properties_definitions = [] - for key, request_parameter in model.requester.request_parameters.items(): - if isinstance(request_parameter, QueryPropertiesModel): - query_properties_key = key - query_properties_definitions.append(request_parameter) + query_properties_key, qp_model = qp_result + query_properties = self._create_component_from_model( + model=qp_model, config=config + )(Fully understand if you prefer to keep the current shape, just throwing the idea out there 😊)
2975-2986
: Shouldfetch_properties_from_endpoint
expose more knobs?When
fetch_properties_from_endpoint
is used we create a syntheticQueryPropertiesModel
withalways_include_properties=None
andproperty_chunking=None
. That mirrors the “no-chunking” requirement, but it also silently prevents a user from opting intoalways_include_properties
.Would it make sense to copy over (or allow in the schema) extra fields so callers can still control
always_include_properties
while leavingproperty_chunking
disabled? Something like:QueryPropertiesModel( type="QueryProperties", property_list=model.requester.fetch_properties_from_endpoint, always_include_properties=getattr(model.requester, "always_include_properties", None), property_chunking=None, )No pressure, just wondering if you foresee a need for that flexibility down the road – wdyt?
3108-3119
: Minor typing nit: prefercollections.abc.Mapping
overtyping.Mapping
forisinstance
typing.Mapping
works today (it aliases tocollections.abc.Mapping
in ≥3.9), but using it withisinstance
was never the intended public API and could become brittle.Switching to
from collections.abc import Mapping
(or importing as an alias) would make the intent explicit and future-proof the runtime check – wdyt?-from typing import Mapping +from collections.abc import Mapping ... - if request_parameters and isinstance(request_parameters, Mapping): + if request_parameters and isinstance(request_parameters, Mapping):unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
4445-4526
: Good validation test for conflicting property definitions!This test ensures we fail early when a developer accidentally defines
PropertiesFromEndpoint
both directly on the requester and within request parameters. This kind of validation prevents subtle runtime issues and makes the configuration errors more apparent.I notice you've set up a comprehensive test case that attempts to use both approaches simultaneously. Have you considered adding a more detailed error message in the actual implementation? The validation is correct, but wdyt about making the error message specify exactly what's conflicting?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(2 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(3 hunks)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py
(0 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
(3 hunks)
💤 Files with no reviewable changes (1)
- airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)
PropertiesFromEndpoint
(14-40)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
2257-2261
: LGTM! This is a nice addition for flexibility in the HTTP requester.The new field allows dynamic property retrieval from endpoints that can be injected into outbound requests via
stream_partition.extra_fields
. This seems especially useful for cases like CRMSearch streams where properties need to be referenced in POST request bodies. The implementation reuses the existingQueryProperties
component while addressing the chunking limitation, which is a clean approach.airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1467-1467
: Formatting cleanup approved: removed extraneous spaces inside the enum forFileUploader
to align with the other component definitions.
2377-2377
: Formatting cleanup approved: removed extraneous spaces inside the enum forKeyTransformation
for consistency with other enums.unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
4219-4299
: Great addition of a test for the newfetch_properties_from_endpoint
feature!This test appropriately verifies that
PropertiesFromEndpoint
can be defined directly on the requester rather than nested inside request parameters. I especially like how you validate that property chunking is properly disabled when using this approach (lines 4282-4283).The test comprehensively checks all aspects of the feature:
- Correct object instantiation
- Property paths set correctly
- Underlying retriever configuration
- No property chunking by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for enabling this for request body too
Anatolii tested the feature w/ |
Closes https://github.com/airbytehq/airbyte-internal-issues/issues/12646
While implementing CRMSearch streams for Hubspot, we found a gap where we need to be able to reference the properties retrieved dynamically to inject them into other request inputs. Specifically JSON body for POST requests.
Within these other request inputs, property chunking itself doesn't really make sense because there aren't input limits unlike request parameters.
This implementation does a clever trick where under the hood we reuse the
QueryProperties
runtime component without any property chunking enabled. And by doing so, nearly all of the rest of the flow can be reused. And the interpolation context for the request inputs should still have access to the stream slice. For example:{{ stream_partition.extra_fields['query_properties'] }}
Summary by CodeRabbit