Skip to content

feat(low-code cdk): add dynamic schema loader #104

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Dec 12, 2024

Conversation

lazebnyi
Copy link
Contributor

@lazebnyi lazebnyi commented Dec 3, 2024

What

This PR add dynamic schema loader that can dynamically generate schema based on response.|
Fixed: https://github.com/airbytehq/airbyte-internal-issues/issues/10782

How

Described schema, models and data classes for DynamicSchemaLoader, SchemaTypeIdentifier and TypesPair. Defined behavior for schema resolving. Integrated the dynamic schema handling into the existing framework.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced three new components for enhanced dynamic schema handling: TypesMap, SchemaTypeIdentifier, and DynamicSchemaLoader.
    • Expanded type mappings in the manifest to support the new components.
  • Bug Fixes

    • Improved error handling in the DynamicSchemaLoader for invalid keys and types.
  • Tests

    • Added comprehensive unit tests for the DynamicSchemaLoader to validate functionality and error handling.

@github-actions github-actions bot added the enhancement New feature or request label Dec 3, 2024
@lazebnyi lazebnyi marked this pull request as ready for review December 3, 2024 03:50
@lazebnyi
Copy link
Contributor Author

lazebnyi commented Dec 3, 2024

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

Copy link
Contributor

coderabbitai bot commented Dec 3, 2024

Warning

Rate limit exceeded

@lazebnyi has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 2 minutes and 59 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between d7e4873 and 0dbea5d.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1 hunks)
📝 Walkthrough
📝 Walkthrough
📝 Walkthrough

Walkthrough

This pull request introduces three new components—TypesMap, SchemaTypeIdentifier, and DynamicSchemaLoader—to the declarative_component_schema.yaml and corresponding classes in declarative_component_schema.py. It also updates the DEFAULT_MODEL_TYPES mapping in manifest_component_transformer.py and enhances the ModelToComponentFactory with methods for creating these new components. Minor formatting changes were made to existing components for clarity. Additionally, a new file for unit tests of the DynamicSchemaLoader was added, ensuring robust testing of its functionality.

Changes

File Change Summary
airbyte_cdk/sources/declarative/declarative_component_schema.yaml Added definitions for TypesMap, SchemaTypeIdentifier, and DynamicSchemaLoader.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py Introduced new classes: TypesMap, SchemaTypeIdentifier, and DynamicSchemaLoader. Minor updates to OAuthConfigSpecification descriptions.
airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py Updated DEFAULT_MODEL_TYPES with new mappings for DynamicSchemaLoader.retriever and SchemaTypeIdentifier.types_map.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py Added methods: create_dynamic_schema_loader, create_schema_type_identifier, and create_types_map. Refactored _merge_stream_slicers.
airbyte_cdk/sources/declarative/schema/__init__.py Added imports for DynamicSchemaLoader, TypesMap, and SchemaTypeIdentifier. Updated __all__ list.
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py Introduced classes and methods for dynamic schema processing, including get_json_schema and various helper methods.
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py Created unit tests for DynamicSchemaLoader, including tests for valid and invalid scenarios.

Possibly related PRs

Suggested reviewers

  • natikgadzhi
  • maxi297

Would you like to consider any additional reviewers or related PRs? Wdyt?


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (7)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

1575-1581: Adjust default values for name and primary_key fields

Would you consider setting the default values of name and primary_key to None instead of an empty string? This can help avoid confusion with optional fields and ensure consistency. Wdyt?

Proposed change:

name: Optional[str] = Field(
-   "", description="The stream name.", example=["Users"], title="Name"
+   None, description="The stream name.", example="Users", title="Name"
)
primary_key: Optional[PrimaryKey] = Field(
-   "", description="The primary key of the stream.", title="Primary Key"
+   None, description="The primary key of the stream.", title="Primary Key"
)

654-679: Suggestion on is_nullable field in SchemaTypeIdentifier

Would it be clearer to use is_nullable: bool = Field(True, ...) instead of Optional[bool] since it has a default value? This can improve type clarity. Wdyt?

Proposed change:

- is_nullable: Optional[bool] = Field(
+ is_nullable: bool = Field(
    True,
    description="Add null to defined field type. This field is automatically set by the CDK.",
    title="Is Nullable",
)
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (2)

12-48: Consider adding docstrings to fixtures?

The fixtures are well-structured, but adding docstrings would make their purpose and setup more clear to other developers. Here's a suggested format, wdyt?

 @pytest.fixture
 def mock_retriever():
+    """
+    Creates a mock retriever that returns predefined schema records.
+    Returns:
+        MagicMock: A mock retriever with read_records method returning sample schema data.
+    """
     retriever = MagicMock()

51-128: The test cases look great! Consider adding a docstring?

The parameterized test cases provide excellent coverage. Adding a docstring to explain the test's purpose and parameters would enhance documentation, wdyt?

 def test_dynamic_schema_loader(dynamic_schema_loader, retriever_data, expected_schema):
+    """
+    Tests DynamicSchemaLoader's schema generation for various input scenarios.
+    
+    Args:
+        dynamic_schema_loader: Fixture providing the loader instance
+        retriever_data: Input data simulating API responses
+        expected_schema: Expected JSON schema output
+    """
     dynamic_schema_loader.retriever.read_records.return_value = retriever_data
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

1666-1686: Consider adding examples for TypesPair?

The schema is well-defined, but adding examples would help users understand the expected format, wdyt?

       current_type:
         anyOf:
           - type: string
           - type: array
             items:
               type: string
+        examples:
+          - "string"
+          - ["string", "integer"]

1687-1732: Consider adding pattern validation for pointers?

The schema pointers could benefit from regex patterns to ensure valid paths, wdyt?

       schema_pointer:
         title: Schema Path
         description: List of potentially nested fields describing the full path of the field to extract.
         type: array
         items:
           - type: string
+          pattern: "^[a-zA-Z_][a-zA-Z0-9_]*$"

1733-1756: Consider enhancing experimental status documentation?

The experimental status warning could be more detailed about potential risks and limitations, wdyt?

-    description: (This component is experimental. Use at your own risk.) Loads a schema by extracting data from retrieved records.
+    description: |
+      (This component is experimental. Use at your own risk.) 
+      Loads a schema by extracting data from retrieved records.
+      
+      Note: As an experimental feature:
+      - The interface may change in non-backward compatible ways
+      - Support is limited
+      - Not recommended for production use without thorough testing
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9587d4e and edb52e7.

📒 Files selected for processing (8)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (5 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (11 hunks)
  • airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (9 hunks)
  • airbyte_cdk/sources/declarative/partition_routers/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/schema/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (1 hunks)
  • unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1 hunks)
🔇 Additional comments (6)
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (1)

106-128: Overall implementation looks solid

The get_json_schema method effectively constructs the JSON Schema dynamically. Good job on handling the iterations and data extraction properly.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

650-653: Definition of TypesPair looks good

The TypesPair class correctly models the mapping between current and target types. Nice addition!

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

1570-1629: Implementation of new factory methods

The methods create_types_pair, create_schema_type_identifier, and create_dynamic_schema_loader are well-implemented and integrate seamlessly with the factory pattern. Great work!

airbyte_cdk/sources/declarative/schema/__init__.py (1)

9-11: Exporting new schema components

Including DynamicSchemaLoader, TypesPair, and SchemaTypeIdentifier in __all__ makes sense. This ensures they are available for import elsewhere.

airbyte_cdk/sources/declarative/partition_routers/__init__.py (1)

9-11: Adding PartitionRouter to exports

Adding PartitionRouter to the __all__ list appropriately exposes it as a public interface. Good job!

airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1)

61-64: LGTM! The new mappings look well-structured. 🎯

The additions follow the existing pattern nicely and align with the dynamic schema loader feature. Just to double-check our integration - would it make sense to verify that these components are properly registered in their respective definition files? wdyt?

✅ Verification successful

Looking good! The component mappings are correctly aligned with their implementations. 👍

The verification shows that both mappings in manifest_component_transformer.py are valid:

  • DynamicSchemaLoader.retriever maps to SimpleRetriever, which is a valid option in the component schema
  • SchemaTypeIdentifier.types_map maps to TypesPair, which is correctly defined and used in the schema type identifier class

Both components are properly defined in the codebase with their respective dependencies, and the mappings follow the established patterns. The experimental nature of these components is also consistently documented.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the component definitions exist and match the mapped types

# Check for DynamicSchemaLoader definition and SimpleRetriever type
echo "Checking DynamicSchemaLoader definition..."
rg -l "class DynamicSchemaLoader.*SimpleRetriever" 

# Check for SchemaTypeIdentifier definition and TypesPair type
echo "Checking SchemaTypeIdentifier definition..."
rg -l "class SchemaTypeIdentifier.*TypesPair"

Length of output: 292


Script:

#!/bin/bash
# Let's try a different approach with ast-grep to find these class definitions
# and their relationships with the mapped types

echo "Checking DynamicSchemaLoader definition..."
ast-grep --pattern 'class DynamicSchemaLoader {
  $$$
}'

echo -e "\nChecking SchemaTypeIdentifier definition..."
ast-grep --pattern 'class SchemaTypeIdentifier {
  $$$
}'

echo -e "\nChecking for TypesPair usage..."
rg "TypesPair" -A 2 -B 2

echo -e "\nChecking for SimpleRetriever usage..."
rg "SimpleRetriever" -A 2 -B 2

Length of output: 66058


Script:

#!/bin/bash
# Let's check for the DynamicSchemaLoader and SchemaTypeIdentifier class definitions
# and their relationship with the mapped types in the schema files

echo "Checking DynamicSchemaLoader in schema files..."
rg "DynamicSchemaLoader" airbyte_cdk/sources/declarative/schema/ -A 5 -B 5

echo -e "\nChecking SchemaTypeIdentifier in schema files..."
rg "SchemaTypeIdentifier" airbyte_cdk/sources/declarative/schema/ -A 5 -B 5

echo -e "\nChecking component schema definitions..."
rg "DynamicSchemaLoader|SchemaTypeIdentifier" airbyte_cdk/sources/declarative/declarative_component_schema.yaml -A 5 -B 5

Length of output: 7591

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

851-851: Minor grammatical corrections in descriptions.

Consider decapitalizing "Specific" in the descriptions and refining wording for clarity. For example, changing "The DeclarativeOAuth Specific list..." to "The DeclarativeOAuth specific list...". WDYT?

You could apply these diffs:

For line 851:

-        description="The DeclarativeOAuth Specific list of strings to indicate which keys should be extracted and returned back to the input config.",
+        description="The DeclarativeOAuth specific list of strings indicating which keys should be extracted and returned to the input config.",

For line 857:

-        description="The DeclarativeOAuth Specific object to provide the criteria of how the `state` query param should be constructed,\nincluding length and complexity.",
+        description="The DeclarativeOAuth specific object providing the criteria for how the `state` query parameter should be constructed, including length and complexity.",

For line 881:

-        description="The DeclarativeOAuth Specific optional override to provide the custom `state` key name, if required by data-provider.",
+        description="The DeclarativeOAuth specific optional override to provide the custom `state` key name, if required by the data provider.",

For line 887:

-        description="The DeclarativeOAuth Specific optional override to provide the custom `code` key name to something like `auth_code` or `custom_auth_code`, if required by data-provider.",
+        description="The DeclarativeOAuth specific optional override to provide the custom `code` key name, such as `auth_code` or `custom_auth_code`, if required by the data provider.",

Also applies to: 857-857, 881-881, 887-887

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between edb52e7 and 7387131.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (5 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

648-650: LGTM!

The TypesPair class definition looks correct and follows code conventions.

…tehq/airbyte-python-cdk into lazebnyi/add-dynamic-schema-loader
@lazebnyi lazebnyi requested a review from maxi297 December 3, 2024 04:45
Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments from sync discussion

@aldogonzalez8 can you have a look a this? You'll have to use this for google sheets

@lazebnyi lazebnyi requested a review from maxi297 December 3, 2024 17:20
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

1585-1610: Consider simplifying the pointer list comprehensions.

The list comprehensions for schema_pointer, key_pointer, and type_pointer could be made more readable. WDYT about:

-        model_schema_pointer: List[Union[InterpolatedString, str]] = (
-            [x for x in model.schema_pointer] if model.schema_pointer else []
-        )
-        model_key_pointer: List[Union[InterpolatedString, str]] = [x for x in model.key_pointer]
-        model_type_pointer: Optional[List[Union[InterpolatedString, str]]] = (
-            [x for x in model.type_pointer] if model.type_pointer else None
-        )
+        model_schema_pointer = list(model.schema_pointer) if model.schema_pointer else []
+        model_key_pointer = list(model.key_pointer)
+        model_type_pointer = list(model.type_pointer) if model.type_pointer else None
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

653-656: Consider adding docstrings for better documentation?

The TypesMap class looks good, but would it be helpful to add docstrings explaining the purpose of target_type and current_type fields and their relationship? This could help other developers understand how to use this mapping. WDYT?

airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (3)

112-121: Consider handling the case when no records are retrieved.

Currently, if retrieved_record is None, raw_schema is set to an empty list, resulting in an empty properties in the generated JSON Schema. Would you consider adding error handling or logging to inform users when no records are retrieved? Wdyt?

Possible code change:

 def get_json_schema(self) -> Mapping[str, Any]:
     """
     Constructs a JSON Schema based on retrieved data.
     """
     properties = {}
     retrieved_record = next(self.retriever.read_records({}), None)

+    if retrieved_record is None:
+        # Log a warning or raise an exception
+        raise ValueError("No records retrieved. Cannot generate schema.")

     raw_schema = (
         self._extract_data(
             retrieved_record,
             self.schema_type_identifier.schema_pointer,
         )
         if retrieved_record
         else []
     )

175-177: Consider improving the error message for invalid data types.

The error message might be unclear to users. Would you consider rephrasing it to be more descriptive? For example:

raise ValueError(
-    f"Invalid data type. Available string or two items list of string. Got {mapped_field_type}."
+    f"Invalid data type '{mapped_field_type}'. Expected a string or a list of two strings."
)

Wdyt?


199-199: Question about the necessity of deepcopy.

In _get_airbyte_type, we are performing a deepcopy on the AIRBYTE_DATA_TYPES entry. If the returned mapping is not modified elsewhere, would it be acceptable to return the original mapping to improve performance? Wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fd44be1 and 05e4f74.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (5 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (5 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (9 hunks)
  • airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (1 hunks)
🔇 Additional comments (9)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

Line range hint 1303-1382: LGTM! Nice refactoring of the stream slicer logic.

The decomposition into smaller, focused methods improves readability and maintainability. The separation of concerns between partition router and paginator logic is particularly clean.


1581-1583: LGTM! Simple and straightforward implementation.

The create_types_map method is clean and follows the factory pattern well.


1612-1636: LGTM! Well-structured dynamic schema loader creation.

The method follows the established pattern for retriever-based components and properly handles the stream slicer creation.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

658-677: LGTM! Well-structured class with clear field definitions.

The SchemaTypeIdentifier class has appropriate field types and good defaults. The optional fields provide flexibility while maintaining required functionality through the mandatory key_pointer field.


1734-1743: LGTM! Clean implementation of the dynamic schema loader.

The DynamicSchemaLoader follows the established pattern of other schema loaders while introducing new dynamic schema loading capabilities. The required fields ensure all necessary components are provided.


1928-1928: LGTM! Proper handling of forward references.

The addition of DynamicSchemaLoader to the forward references ensures proper model resolution.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

1679-1699: LGTM! Clear schema definition with appropriate experimental warning.

The schema for TypesMap is well-defined and matches the Python implementation. The experimental warning is a good practice for managing user expectations.


1700-1741: LGTM! Comprehensive schema definition with clear field descriptions.

The schema for SchemaTypeIdentifier is well-documented with clear descriptions for each field. The experimental warning and field requirements match the Python implementation.


1742-1764: LGTM! Well-defined schema with appropriate requirements.

The schema for DynamicSchemaLoader properly defines the component's structure with clear requirements and descriptions. The experimental warning is consistently applied.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

1613-1637: Should we provide a more descriptive name for the retriever?

In the create_dynamic_schema_loader method, the name parameter for the retriever is set to an empty string. Providing a descriptive name might help with debugging and logging. Perhaps we could use something like "dynamic_schema_retriever"? WDYT?


1303-1337: Consider refactoring duplicate logic in slicer methods?

I noticed that _build_stream_slicer_from_partition_router and _build_resumable_cursor_from_paginator have similar patterns and are used together when building stream slicers. Do you think it might be beneficial to refactor this logic into a shared helper function to reduce duplication and improve maintainability? WDYT?

Also applies to: 1382-1382

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 05e4f74 and d41f54c.

📒 Files selected for processing (3)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (9 hunks)
  • airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (1 hunks)
  • unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
  • airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

1581-1637: Great addition of the dynamic schema components!

The implementation of create_types_map, create_schema_type_identifier, and create_dynamic_schema_loader looks solid and aligns well with the existing factory patterns. This should enhance the schema management capabilities effectively.


1615-1618: Is combined_slicers correctly assigned?

In create_dynamic_schema_loader, after calling _build_resumable_cursor_from_paginator, combined_slicers might end up being None if there's no paginator. Should we handle this case to ensure that the stream_slicer passed to the retriever is valid? Perhaps adding a check or default value could prevent potential issues. WDYT?

Copy link
Contributor

@aldogonzalez8 aldogonzalez8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVED

@lazebnyi
Copy link
Contributor Author

lazebnyi commented Dec 12, 2024

@aldogonzalez8 poetry run poe build will re-generate models based on yaml schema

@lazebnyi lazebnyi merged commit 5801cd8 into main Dec 12, 2024
17 of 19 checks passed
@lazebnyi lazebnyi deleted the lazebnyi/add-dynamic-schema-loader branch December 12, 2024 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants