Skip to content

feat(property chunking): Allow PropertiesFromEndpoint to be defined on HttpRequester #507

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 28, 2025

Conversation

brianjlai
Copy link
Contributor

@brianjlai brianjlai commented Apr 25, 2025

Closes https://github.com/airbytehq/airbyte-internal-issues/issues/12646

While implementing CRMSearch streams for Hubspot, we found a gap where we need to be able to reference the properties retrieved dynamically to inject them into other request inputs. Specifically JSON body for POST requests.

Within these other request inputs, property chunking itself doesn't really make sense because there aren't input limits unlike request parameters.

This implementation does a clever trick where under the hood we reuse the QueryProperties runtime component without any property chunking enabled. And by doing so, nearly all of the rest of the flow can be reused. And the interpolation context for the request inputs should still have access to the stream slice. For example: {{ stream_partition.extra_fields['query_properties'] }}

Summary by CodeRabbit

  • New Features
    • Added support for dynamically fetching properties from an API endpoint and injecting them into outbound requests.
  • Bug Fixes
    • Improved validation to prevent conflicting definitions of dynamic property fetching in configurations.
  • Tests
    • Enhanced test coverage for dynamic property fetching scenarios and validation of conflicting configurations.
  • Refactor
    • Streamlined internal logic for detecting and handling dynamic property fetching configurations.
  • Style
    • Cleaned up formatting in configuration schema for improved readability.

@brianjlai brianjlai changed the title draft: Allow PropertiesFromEndpoint to be defined on HttpRequester feat(property chunking): Allow PropertiesFromEndpoint to be defined on HttpRequester Apr 26, 2025
@brianjlai brianjlai marked this pull request as ready for review April 26, 2025 06:03
Copy link
Contributor

coderabbitai bot commented Apr 26, 2025

📝 Walkthrough

Walkthrough

This change introduces a new fetch_properties_from_endpoint property to the HttpRequester schema and model, enabling dynamic retrieval of properties from an API endpoint for use in outbound requests. The factory logic is updated to handle this new property, including validation to prevent conflicting definitions. Associated tests are added and updated to cover these scenarios, including error handling when multiple property-fetching mechanisms are specified. The QueryProperties class has a redundant method removed, and the YAML schema is cleaned up for enum formatting.

Changes

File(s) Change Summary
airbyte_cdk/sources/declarative/declarative_component_schema.yaml Added fetch_properties_from_endpoint property to HttpRequester definition, cleaned up enum formatting.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py Added optional fetch_properties_from_endpoint field to HttpRequester model.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py Refactored detection of QueryProperties in request_parameters into a static method. Added handling for fetch_properties_from_endpoint, including validation to prevent multiple conflicting definitions. Modified create_simple_retriever logic accordingly.
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py Removed the has_multiple_chunks method from the QueryProperties class.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py Renamed and updated test for PropertiesFromEndpoint in request_parameters. Added tests for fetch_properties_from_endpoint in requester and for error raising when both mechanisms are defined.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Manifest
    participant ModelToComponentFactory
    participant HttpRequester
    participant QueryProperties

    User->>Manifest: Defines HttpRequester with fetch_properties_from_endpoint
    Manifest->>ModelToComponentFactory: Passes manifest for component creation
    ModelToComponentFactory->>HttpRequester: Instantiates HttpRequester
    alt fetch_properties_from_endpoint defined
        ModelToComponentFactory->>QueryProperties: Dynamically creates QueryProperties from endpoint
    end
    ModelToComponentFactory-->>User: Returns configured retriever
Loading

Possibly related PRs

  • airbytehq/airbyte-python-cdk#452: Introduces PropertiesFromEndpoint, QueryProperties, and integrates them into the declarative schema and factory methods for dynamic property fetching and chunking, which is directly related to the handling and validation added in this PR.

Suggested labels

enhancement

Suggested reviewers

  • darynaishchenko
  • maxi297

Would you like to consider adding an integration test that exercises a full end-to-end flow with the new fetch_properties_from_endpoint property, just to ensure nothing is missed in real usage? Wdyt?

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

1981-1985: Introduce fetch_properties_from_endpoint on HttpRequester
The new optional field correctly references PropertiesFromEndpoint and matches the PR objective of enabling dynamic property injection. Would it be helpful to:

  1. Add a schema‐level oneOf or not constraint to disallow defining both fetch_properties_from_endpoint and QueryProperties in request_parameters at validation time, rather than waiting for the factory error?
  2. Include an example under HttpRequester’s examples to demonstrate how fetch_properties_from_endpoint is used in practice?

Wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

2948-2964: Explicit clash detection is great – consider simplifying the second scan?

Love that we now surface an error when both request_parameters and fetch_properties_from_endpoint define a PropertiesFromEndpoint – much clearer 👏.

Today we (1) call _query_properties_in_request_parameters to know a clash may exist, and (2) iterate a second time over request_parameters to collect QueryPropertiesModel instances. That means two scans of the same dict.

Might it be cleaner to have _query_properties_in_request_parameters return the (key, model) of the first QueryPropertiesModel it finds (or None) and at the same time detect duplicates?
This would:

  • avoid the second loop,
  • centralise the “only one allowed” validation,
  • let the caller directly obtain query_properties_key and the model.

wdyt?

-        if self._query_properties_in_request_parameters(model.requester):
+        qp_result = self._query_properties_in_request_parameters(model.requester)
+        if qp_result:
             ...
-            query_properties_definitions = []
-            for key, request_parameter in model.requester.request_parameters.items():
-                if isinstance(request_parameter, QueryPropertiesModel):
-                    query_properties_key = key
-                    query_properties_definitions.append(request_parameter)
+            query_properties_key, qp_model = qp_result
+            query_properties = self._create_component_from_model(
+                model=qp_model, config=config
+            )

(Fully understand if you prefer to keep the current shape, just throwing the idea out there 😊)


2975-2986: Should fetch_properties_from_endpoint expose more knobs?

When fetch_properties_from_endpoint is used we create a synthetic QueryPropertiesModel with always_include_properties=None and property_chunking=None. That mirrors the “no-chunking” requirement, but it also silently prevents a user from opting into always_include_properties.

Would it make sense to copy over (or allow in the schema) extra fields so callers can still control always_include_properties while leaving property_chunking disabled? Something like:

QueryPropertiesModel(
    type="QueryProperties",
    property_list=model.requester.fetch_properties_from_endpoint,
    always_include_properties=getattr(model.requester, "always_include_properties", None),
    property_chunking=None,
)

No pressure, just wondering if you foresee a need for that flexibility down the road – wdyt?


3108-3119: Minor typing nit: prefer collections.abc.Mapping over typing.Mapping for isinstance

typing.Mapping works today (it aliases to collections.abc.Mapping in ≥3.9), but using it with isinstance was never the intended public API and could become brittle.

Switching to from collections.abc import Mapping (or importing as an alias) would make the intent explicit and future-proof the runtime check – wdyt?

-from typing import Mapping
+from collections.abc import Mapping
...
-        if request_parameters and isinstance(request_parameters, Mapping):
+        if request_parameters and isinstance(request_parameters, Mapping):
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

4445-4526: Good validation test for conflicting property definitions!

This test ensures we fail early when a developer accidentally defines PropertiesFromEndpoint both directly on the requester and within request parameters. This kind of validation prevents subtle runtime issues and makes the configuration errors more apparent.

I notice you've set up a comprehensive test case that attempts to use both approaches simultaneously. Have you considered adding a more detailed error message in the actual implementation? The validation is correct, but wdyt about making the error message specify exactly what's conflicting?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4bd6048 and 81e918b.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3 hunks)
  • airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (0 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (3 hunks)
💤 Files with no reviewable changes (1)
  • airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)
  • PropertiesFromEndpoint (14-40)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

2257-2261: LGTM! This is a nice addition for flexibility in the HTTP requester.

The new field allows dynamic property retrieval from endpoints that can be injected into outbound requests via stream_partition.extra_fields. This seems especially useful for cases like CRMSearch streams where properties need to be referenced in POST request bodies. The implementation reuses the existing QueryProperties component while addressing the chunking limitation, which is a clean approach.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

1467-1467: Formatting cleanup approved: removed extraneous spaces inside the enum for FileUploader to align with the other component definitions.


2377-2377: Formatting cleanup approved: removed extraneous spaces inside the enum for KeyTransformation for consistency with other enums.

unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

4219-4299: Great addition of a test for the new fetch_properties_from_endpoint feature!

This test appropriately verifies that PropertiesFromEndpoint can be defined directly on the requester rather than nested inside request parameters. I especially like how you validate that property chunking is properly disabled when using this approach (lines 4282-4283).

The test comprehensively checks all aspects of the feature:

  • Correct object instantiation
  • Property paths set correctly
  • Underlying retriever configuration
  • No property chunking by default

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for enabling this for request body too

@brianjlai
Copy link
Contributor Author

Anatolii tested the feature w/ associations stream. Appears to work so we should be set to merge

@brianjlai brianjlai merged commit 0006ceb into main Apr 28, 2025
33 of 35 checks passed
@brianjlai brianjlai deleted the brian/properties_from_endpoint_to_requester branch April 28, 2025 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants