Skip to content

fix: (CDK) (HttpRequester) - Make the HttpRequester.path optional #370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 4, 2025

Conversation

bazarnov
Copy link
Contributor

@bazarnov bazarnov commented Feb 27, 2025

What

Resolving:

How

  • For the airbyte_cdk/sources/declarative/declarative_component_schema.yaml:

    • Removed the path as a required property.
    • Added new interpolation contexts such as next_page_token, stream_interval, stream_partition, etc.
    • Updated examples to include new interpolation contexts in URL paths.
  • For the airbyte_cdk/sources/declarative/models/declarative_component_schema.py:

    • Made the path property optional in the HttpRequester class.
    • Added new examples for the path property.
  • For the airbyte_cdk/sources/declarative/requesters/http_requester.py:

    • Added EmptyString and made path optional in the HttpRequester class.
    • Implemented a method to handle empty string interpolation.
    • Added detailed docstrings and methods to handle URL base and path interpolation.
  • For the airbyte_cdk/sources/declarative/requesters/requester.py:

    • Updated get_url_base method to accept and use interpolation contexts.
  • For the airbyte_cdk/sources/types.py:

    • Introduced EmptyString as a type.
  • For the unit_tests/sources/declarative/requesters/test_http_requester.py:

    • Adjusted tests to match new URL handling, ensuring no trailing slashes.

User Impact

This update introduces the Requester interface change, which adds the ability for get_url_base() to utilize the interpolation context passed, such as stream_state, stream_slice / stream_interval, and next_page_token.

Sources that reuse the interface may need to update the interface for the get_url_base() method if it has been overridden. Otherwise, we anticipate no breaking changes from this CDK update.

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features
    • Introduced dynamic API configuration options by supporting additional interpolation values (e.g., pagination tokens, stream intervals, partitions, and slices) for constructing URLs.
  • Enhancements
    • Improved URL formation for better consistency by allowing optional endpoint paths and more reliable handling of trailing slashes.
  • Tests
    • Added new tests to validate the behavior of the DefaultPaginator class and ensure correct path generation with various conditions.

@bazarnov bazarnov self-assigned this Feb 27, 2025
@github-actions github-actions bot added bug Something isn't working security labels Feb 27, 2025
Copy link
Contributor

coderabbitai bot commented Feb 27, 2025

📝 Walkthrough

Walkthrough

This PR updates the HTTP request construction logic. In the declarative component schema and its corresponding model, the path attribute is transitioned from a required field to an optional one and enhanced with additional interpolation contexts. In the requesters’ implementation, method signatures are updated to accept optional parameters (stream_state, stream_slice, next_page_token), a new _get_interpolation_context method is introduced, and URL joining logic is refined (e.g., handling trailing slashes). Additionally, a new variable (EmptyString) is added, and unit tests have been updated to reflect these changes.

Changes

File(s) Change Summary
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
airbyte_cdk/sources/declarative/models/declarative_component_schema.py
Removed the required path property (in YAML) and changed it to an optional attribute (in Python) with a default of None; added new interpolation contexts (next_page_token, stream_interval, stream_partition, stream_slice, creation_response, polling_response, download_target) to both url_base and path; updated examples.
airbyte_cdk/sources/declarative/requesters/http_requester.py
airbyte_cdk/sources/declarative/requesters/requester.py
Updated method signatures to include optional parameters (stream_state, stream_slice, next_page_token); introduced _get_interpolation_context for creating interpolation data; improved URL construction logic (including handling of trailing and duplicate slashes).
airbyte_cdk/sources/types.py Added new variable EmptyString initialized as an empty string.
unit_tests/sources/declarative/requesters/test_http_requester.py Updated test assertions by removing the trailing slash in the expected URL output.
airbyte_cdk/sources/declarative/requesters/paginators/default_paginator.py
airbyte_cdk/sources/declarative/requesters/paginators/no_pagination.py
airbyte_cdk/sources/declarative/requesters/paginators/paginator.py
Updated path method signatures to include optional parameters (stream_state, stream_slice); enhanced pagination logic to utilize these parameters.
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py Updated _paginator_path and _fetch_next_page methods to accept and pass new parameters (stream_state, stream_slice).
airbyte_cdk/utils/mapping_helpers.py Added get_interpolation_context function to create a context for interpolation based on new parameters.
unit_tests/sources/declarative/requesters/paginators/test_default_paginator.py Introduced new tests for DefaultPaginator to verify path method behavior with new parameters.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant HR as HttpRequester
    participant R as Requester
    C->>HR: Call get_url_base(stream_state, stream_slice, next_page_token)
    HR->>HR: Build interpolation context (_get_interpolation_context)
    HR->>HR: Format 'url_base' with context
    HR->>HR: Evaluate optional 'path'
    HR->>HR: Call _join_url to merge base and path
    HR-->>C: Return constructed URL
    C->>R: Call get_url_base with new parameters
    R-->>C: Return URL
Loading

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • maxi297 – wdyt?
✨ Finishing Touches
  • 📝 Generate Docstrings

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/requesters/http_requester.py (1)

118-134: Good refactoring with _get_interpolation_context method.

Extracting the interpolation context creation logic into a separate method reduces duplication and centralizes this functionality. I like how you're handling the extra_fields from stream_slice too.

One minor suggestion: would it make sense to also include stream_state in the returned context since it's passed as a parameter? wdyt?

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

1769-1775: Enhance URL base interpolation contexts.
I see you've added additional interpolation keys (next_page_token, stream_interval, stream_partition, stream_slice, creation_response, polling_response, and download_target) to the url_base property. This should provide extra flexibility for dynamic URL construction. Have you verified that all downstream components correctly support these additional contexts? wdyt?


1781-1798: Make the path argument optional.
I noticed that the path property now mirrors many aspects of the url_base (including multiple interpolation contexts) and is no longer listed as a required property. This change aligns with the PR objective to make the path argument optional. Would you consider adding a brief note in the description explicitly stating that this field is now optional to help guide implementers? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 406542d and 0235a1c.

📒 Files selected for processing (6)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1 hunks)
  • airbyte_cdk/sources/declarative/requesters/http_requester.py (6 hunks)
  • airbyte_cdk/sources/declarative/requesters/requester.py (1 hunks)
  • airbyte_cdk/sources/types.py (1 hunks)
  • unit_tests/sources/declarative/requesters/test_http_requester.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • airbyte_cdk/sources/types.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (11)
unit_tests/sources/declarative/requesters/test_http_requester.py (1)

828-828: This change aligns with the URL construction updates.

The test now verifies that trailing slashes from paths are correctly removed in the final URL, which makes the URL construction more consistent with common practices.

airbyte_cdk/sources/declarative/requesters/requester.py (1)

38-44: Good enhancement of the get_url_base method signature.

The method now accepts additional context parameters (stream_state, stream_slice, next_page_token), making it consistent with other methods like get_path. This allows for dynamic URL construction based on runtime context, enabling more flexible API integration patterns. This is the core change that addresses the PR objective.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

2051-2053: Enhanced examples for url_base with interpolation contexts.

These new examples effectively demonstrate how to use the new capabilities for dynamic URL construction, including stream partition IDs and next page tokens. Great additions that will help users understand how to use the feature.


2057-2058: Making path optional is the key change in this PR.

By changing path from required to optional with a default value of None, you've successfully implemented the core requirement of the PR. This provides more flexibility for URL construction while maintaining backward compatibility.

airbyte_cdk/sources/declarative/requesters/http_requester.py (6)

28-28: Nice addition of EmptyString to import statement.

I see you've added EmptyString to the imports. This will be used to handle the new optional path parameter, making the code more robust when no path is provided.


55-55: Good design choice making path optional.

Making the path parameter optional aligns well with the PR objective of allowing more flexibility in URL construction, especially when only the base URL might be sufficient.


70-72: Nice handling of optional path value.

Using EmptyString as a fallback when path is None is a clean solution. This ensures that the interpolation logic works consistently whether a path is provided or not.


135-148: Great enhancement to get_url_base.

The method now accepts the same parameters as get_path, providing consistency in the interface. Using the new interpolation context method ensures that dynamic values from stream_slice and next_page_token can now be used in the URL base construction.

The use of os.path.join with EmptyString is clever to ensure we get a properly formatted URL.


149-162: Nicely updated get_path method.

Making the parameters optional with default values of None maintains backward compatibility while adding new functionality. The method now leverages the shared interpolation context logic, making the code more maintainable.


359-372: Excellent docstring addition to _join_url.

The detailed docstring clearly explains the purpose and behavior of this method, particularly regarding trailing slash handling. This kind of documentation is very helpful for both users and maintainers of the code.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

1778-1780: Update URL base examples for dynamic interpolation.
The new examples now clearly illustrate the dynamic generation of the URL base (using expressions like {{ stream_partition['id'] }} and {{ next_page_token['id'] }}). Do these updated examples fully capture your use cases for various API configurations? wdyt?

@bazarnov bazarnov changed the title fix: (CDK) (HttpRequester) - Make the path arg optional fix: (CDK) (HttpRequester) - Make the HttpRequester.path optional Feb 27, 2025
@bazarnov bazarnov requested review from bnchrch, maxi297 and lmossman and removed request for bnchrch February 27, 2025 16:09
Copy link
Contributor

@lmossman lmossman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YAML changes look good to me! Thanks Baz

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shared a concern which I think is blocking for this PR

@bazarnov bazarnov requested a review from maxi297 February 28, 2025 14:03
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
airbyte_cdk/sources/declarative/requesters/paginators/paginator.py (1)

52-57: Updated abstract method signature to support additional context.

The path method signature now includes optional stream_state and stream_slice parameters, which enhances the flexibility of URL path generation. This is a good change that supports the PR objective of making the HTTP request construction more flexible.

Just curious, should we update the method docstring as well to reflect these new parameters? wdyt?

airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)

243-247: Update the docstring to reflect new parameters?

The docstring hasn't been updated to include descriptions for the new stream_state and stream_slice parameters. Would you consider enhancing it for better developer experience? wdyt?

    """
    If the paginator points to a path, follow it, else return nothing so the requester is used.
    :param next_page_token:
+   :param stream_state: The current state of the stream
+   :param stream_slice: The slice of the stream to process
    :return:
    """
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

1807-1814: Enhanced Interpolation Context in url_base
You've expanded the interpolation_context for the url_base property to include new variables such as next_page_token, stream_interval, stream_partition, stream_slice, creation_response, polling_response, and download_target. This update broadens the flexibility of URL construction. Could you please confirm that all components using this context always supply appropriate values for these variables? wdyt?


1824-1832: Updated path Property Interpolation and Optionality
The path property now includes an expanded interpolation_context similar to url_base. Since the PR objective was to make the path optional, please ensure that downstream processing correctly handles cases where path is null or missing. Would you consider adding or updating tests to validate that behavior? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0235a1c and a4fb3eb.

📒 Files selected for processing (9)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3 hunks)
  • airbyte_cdk/sources/declarative/requesters/http_requester.py (6 hunks)
  • airbyte_cdk/sources/declarative/requesters/paginators/default_paginator.py (3 hunks)
  • airbyte_cdk/sources/declarative/requesters/paginators/no_pagination.py (1 hunks)
  • airbyte_cdk/sources/declarative/requesters/paginators/paginator.py (2 hunks)
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (3 hunks)
  • airbyte_cdk/utils/mapping_helpers.py (2 hunks)
  • unit_tests/sources/declarative/requesters/paginators/test_default_paginator.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte_cdk/sources/declarative/requesters/http_requester.py
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (14)
airbyte_cdk/sources/declarative/requesters/paginators/no_pagination.py (1)

22-27: LGTM! Method signature updated to match abstract method.

The signature has been correctly updated to match the new abstract method signature in the Paginator class. This change allows for additional context via stream_state and stream_slice to be provided, which aligns with the PR objective of enhancing URL path handling.

unit_tests/sources/declarative/requesters/paginators/test_default_paginator.py (3)

478-490: Good test coverage for edge case when token is None.

This test ensures the path method returns None when no token is provided, which is an important edge case to verify.


492-511: LGTM! Testing RequestOption vs RequestPath behavior.

This test validates the behavior when using a different option type (RequestOption instead of RequestPath), ensuring correct handling of different page token option types.


513-549: Comprehensive test for interpolation context with stream_slice.

This test thoroughly validates that the path method correctly uses the additional interpolation context from stream_slice and properly handles token values. The test covers a complex StreamSlice object with partition, cursor_slice, and extra_fields, which is a good representation of real-world scenarios.

airbyte_cdk/sources/declarative/requesters/paginators/paginator.py (1)

14-14: LGTM! Added StreamSlice import.

The StreamSlice import is correctly added to support the updated method signature.

airbyte_cdk/utils/mapping_helpers.py (2)

13-13: Updated imports to include StreamSlice and StreamState.

The new imports support the type hinting for the new get_interpolation_context function.


148-162: Well-implemented utility function for creating interpolation context.

This new function consolidates logic for creating an interpolation context with stream state, stream slice, and next page token. The implementation correctly handles extra fields from the stream slice when available.

I like how this centralizes the context creation logic that can be reused across different components. It's a clean implementation that handles the edge case when stream_slice is None or doesn't have extra_fields.

airbyte_cdk/sources/declarative/requesters/paginators/default_paginator.py (2)

154-172: Good enhancement to the path method!

This change adds support for additional context (stream_state, stream_slice) when constructing the path for pagination. The addition of the interpolation context using these parameters will make URL construction more flexible.

One thought: Have you considered updating the docstring for this method to reflect the new parameters? This would help future developers understand the expanded functionality. wdyt?


273-284: The decorator update looks correct!

The PaginatorTestReadDecorator's path method has been properly updated to match the new signature and pass all parameters to the decorated paginator.

airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (4)

237-242: Good enhancement of the paginator path method

The expansion of the method signature to include stream_state and stream_slice aligns perfectly with the PR objectives. This allows for more dynamic URL construction using additional contextual information.


248-252: Clean implementation passing context to paginator

The implementation correctly passes all parameters to the paginator's path method, which ensures consistent behavior with the enhanced functionality.


311-315: Consistent context propagation

This change properly propagates the stream state and slice contexts to the paginator, ensuring that the URL construction has access to all relevant data.


586-590: Test decorator also properly updated

Good job ensuring that the test decorator class also passes the same parameters, maintaining consistency between the main implementation and the test helper.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

1817-1819: Updated url_base Examples
The examples now show dynamic usage of the new interpolation contexts (e.g. conditional expressions, using stream_partition['id'] and next_page_token['id']). Are these examples reflective of actual usage scenarios in production? Perhaps a quick run with sample configurations would be helpful. wdyt?

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to confirm that things are fine with this test. I'm good with this change. Thanks for the fix @bazarnov

@bazarnov bazarnov merged commit 0edebbe into main Mar 4, 2025
24 checks passed
@bazarnov bazarnov deleted the baz/cdk/http-requester-make-path-optional branch March 4, 2025 16:58
rpopov added a commit to rpopov/airbyte-python-cdk that referenced this pull request Mar 5, 2025
* main:
  fix: update cryptography package to latest version to address CVE (airbytehq#377)
  fix: (CDK) (HttpRequester) - Make the `HttpRequester.path` optional (airbytehq#370)
  feat: improved custom components handling (airbytehq#350)
  feat: add microseconds timestamp format (airbytehq#373)
  fix: Replace Unidecode with anyascii for permissive license (airbytehq#367)
  feat: add IncrementingCountCursor (airbytehq#346)
  feat: (low-code cdk)  datetime format with milliseconds (airbytehq#369)
  fix: (CDK) (AsyncRetriever) - Improve UX on variable naming and interpolation (airbytehq#368)
  fix: (CDK) (AsyncRetriever) - Add the `request` and `response` to each `async` operations (airbytehq#356)
  fix: (CDK) (ConnectorBuilder) - Add `auxiliary requests` to slice; support `TestRead` for AsyncRetriever (part 1/2) (airbytehq#355)
  feat(concurrent perpartition cursor): Add parent state updates (airbytehq#343)
  fix: update csv parser for builder compatibility (airbytehq#364)
  feat(low-code cdk): add interpolation for limit field in Rate (airbytehq#353)
  feat(low-code cdk): add AbstractStreamFacade processing as concurrent streams in declarative source (airbytehq#347)
  fix: (CDK) (CsvParser) - Fix the `\\` escaping when passing the `delimiter` from Builder's UI (airbytehq#358)
  feat: expose `str_to_datetime` jinja macro (airbytehq#351)
  fix: update CDK migration for 6.34.0 (airbytehq#348)
  feat: Removes `stream_state` interpolation from CDK (airbytehq#320)
  fix(declarative): Pass `extra_fields` in `global_substream_cursor` (airbytehq#195)
  feat(concurrent perpartition cursor): Refactor ConcurrentPerPartitionCursor (airbytehq#331)
  feat(HttpMocker): adding support for PUT requests and bytes responses (airbytehq#342)
  chore: use certified source for manifest-only test (airbytehq#338)
  feat: check for request_option mapping conflicts in individual components (airbytehq#328)
  feat(file-based): sync file acl permissions and identities (airbytehq#260)
  fix: (CDK) (Connector Builder) - refactor the `MessageGrouper` > `TestRead` (airbytehq#332)
  fix(low code): Fix missing cursor for ClientSideIncrementalRecordFilterDecorator (airbytehq#334)
  feat(low-code): Add API Budget (airbytehq#314)
  chore(decoder): clean decoders and make csvdecoder available (airbytehq#326)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants