-
Notifications
You must be signed in to change notification settings - Fork 24
feat(low-code cdk): add dynamic schema loader #104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/autofix
|
Warning Rate limit exceeded@lazebnyi has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 2 minutes and 59 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
📝 Walkthrough📝 Walkthrough📝 WalkthroughWalkthroughThis pull request introduces three new components— Changes
Possibly related PRs
Suggested reviewers
Would you like to consider any additional reviewers or related PRs? Wdyt? Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (7)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
1575-1581
: Adjust default values forname
andprimary_key
fieldsWould you consider setting the default values of
name
andprimary_key
toNone
instead of an empty string? This can help avoid confusion with optional fields and ensure consistency. Wdyt?Proposed change:
name: Optional[str] = Field( - "", description="The stream name.", example=["Users"], title="Name" + None, description="The stream name.", example="Users", title="Name" ) primary_key: Optional[PrimaryKey] = Field( - "", description="The primary key of the stream.", title="Primary Key" + None, description="The primary key of the stream.", title="Primary Key" )
654-679
: Suggestion onis_nullable
field inSchemaTypeIdentifier
Would it be clearer to use
is_nullable: bool = Field(True, ...)
instead ofOptional[bool]
since it has a default value? This can improve type clarity. Wdyt?Proposed change:
- is_nullable: Optional[bool] = Field( + is_nullable: bool = Field( True, description="Add null to defined field type. This field is automatically set by the CDK.", title="Is Nullable", )unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (2)
12-48
: Consider adding docstrings to fixtures?The fixtures are well-structured, but adding docstrings would make their purpose and setup more clear to other developers. Here's a suggested format, wdyt?
@pytest.fixture def mock_retriever(): + """ + Creates a mock retriever that returns predefined schema records. + Returns: + MagicMock: A mock retriever with read_records method returning sample schema data. + """ retriever = MagicMock()
51-128
: The test cases look great! Consider adding a docstring?The parameterized test cases provide excellent coverage. Adding a docstring to explain the test's purpose and parameters would enhance documentation, wdyt?
def test_dynamic_schema_loader(dynamic_schema_loader, retriever_data, expected_schema): + """ + Tests DynamicSchemaLoader's schema generation for various input scenarios. + + Args: + dynamic_schema_loader: Fixture providing the loader instance + retriever_data: Input data simulating API responses + expected_schema: Expected JSON schema output + """ dynamic_schema_loader.retriever.read_records.return_value = retriever_dataairbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
1666-1686
: Consider adding examples for TypesPair?The schema is well-defined, but adding examples would help users understand the expected format, wdyt?
current_type: anyOf: - type: string - type: array items: type: string + examples: + - "string" + - ["string", "integer"]
1687-1732
: Consider adding pattern validation for pointers?The schema pointers could benefit from regex patterns to ensure valid paths, wdyt?
schema_pointer: title: Schema Path description: List of potentially nested fields describing the full path of the field to extract. type: array items: - type: string + pattern: "^[a-zA-Z_][a-zA-Z0-9_]*$"
1733-1756
: Consider enhancing experimental status documentation?The experimental status warning could be more detailed about potential risks and limitations, wdyt?
- description: (This component is experimental. Use at your own risk.) Loads a schema by extracting data from retrieved records. + description: | + (This component is experimental. Use at your own risk.) + Loads a schema by extracting data from retrieved records. + + Note: As an experimental feature: + - The interface may change in non-backward compatible ways + - Support is limited + - Not recommended for production use without thorough testing
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(5 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(11 hunks)airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(9 hunks)airbyte_cdk/sources/declarative/partition_routers/__init__.py
(1 hunks)airbyte_cdk/sources/declarative/schema/__init__.py
(1 hunks)airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
(1 hunks)unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
(1 hunks)
🔇 Additional comments (6)
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (1)
106-128
: Overall implementation looks solid
The get_json_schema
method effectively constructs the JSON Schema dynamically. Good job on handling the iterations and data extraction properly.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
650-653
: Definition of TypesPair
looks good
The TypesPair
class correctly models the mapping between current and target types. Nice addition!
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1570-1629
: Implementation of new factory methods
The methods create_types_pair
, create_schema_type_identifier
, and create_dynamic_schema_loader
are well-implemented and integrate seamlessly with the factory pattern. Great work!
airbyte_cdk/sources/declarative/schema/__init__.py (1)
9-11
: Exporting new schema components
Including DynamicSchemaLoader
, TypesPair
, and SchemaTypeIdentifier
in __all__
makes sense. This ensures they are available for import elsewhere.
airbyte_cdk/sources/declarative/partition_routers/__init__.py (1)
9-11
: Adding PartitionRouter
to exports
Adding PartitionRouter
to the __all__
list appropriately exposes it as a public interface. Good job!
airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1)
61-64
: LGTM! The new mappings look well-structured. 🎯
The additions follow the existing pattern nicely and align with the dynamic schema loader feature. Just to double-check our integration - would it make sense to verify that these components are properly registered in their respective definition files? wdyt?
✅ Verification successful
Looking good! The component mappings are correctly aligned with their implementations. 👍
The verification shows that both mappings in manifest_component_transformer.py
are valid:
DynamicSchemaLoader.retriever
maps toSimpleRetriever
, which is a valid option in the component schemaSchemaTypeIdentifier.types_map
maps toTypesPair
, which is correctly defined and used in the schema type identifier class
Both components are properly defined in the codebase with their respective dependencies, and the mappings follow the established patterns. The experimental nature of these components is also consistently documented.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the component definitions exist and match the mapped types
# Check for DynamicSchemaLoader definition and SimpleRetriever type
echo "Checking DynamicSchemaLoader definition..."
rg -l "class DynamicSchemaLoader.*SimpleRetriever"
# Check for SchemaTypeIdentifier definition and TypesPair type
echo "Checking SchemaTypeIdentifier definition..."
rg -l "class SchemaTypeIdentifier.*TypesPair"
Length of output: 292
Script:
#!/bin/bash
# Let's try a different approach with ast-grep to find these class definitions
# and their relationships with the mapped types
echo "Checking DynamicSchemaLoader definition..."
ast-grep --pattern 'class DynamicSchemaLoader {
$$$
}'
echo -e "\nChecking SchemaTypeIdentifier definition..."
ast-grep --pattern 'class SchemaTypeIdentifier {
$$$
}'
echo -e "\nChecking for TypesPair usage..."
rg "TypesPair" -A 2 -B 2
echo -e "\nChecking for SimpleRetriever usage..."
rg "SimpleRetriever" -A 2 -B 2
Length of output: 66058
Script:
#!/bin/bash
# Let's check for the DynamicSchemaLoader and SchemaTypeIdentifier class definitions
# and their relationship with the mapped types in the schema files
echo "Checking DynamicSchemaLoader in schema files..."
rg "DynamicSchemaLoader" airbyte_cdk/sources/declarative/schema/ -A 5 -B 5
echo -e "\nChecking SchemaTypeIdentifier in schema files..."
rg "SchemaTypeIdentifier" airbyte_cdk/sources/declarative/schema/ -A 5 -B 5
echo -e "\nChecking component schema definitions..."
rg "DynamicSchemaLoader|SchemaTypeIdentifier" airbyte_cdk/sources/declarative/declarative_component_schema.yaml -A 5 -B 5
Length of output: 7591
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
851-851
: Minor grammatical corrections in descriptions.Consider decapitalizing "Specific" in the descriptions and refining wording for clarity. For example, changing "The DeclarativeOAuth Specific list..." to "The DeclarativeOAuth specific list...". WDYT?
You could apply these diffs:
For line 851:
- description="The DeclarativeOAuth Specific list of strings to indicate which keys should be extracted and returned back to the input config.", + description="The DeclarativeOAuth specific list of strings indicating which keys should be extracted and returned to the input config.",For line 857:
- description="The DeclarativeOAuth Specific object to provide the criteria of how the `state` query param should be constructed,\nincluding length and complexity.", + description="The DeclarativeOAuth specific object providing the criteria for how the `state` query parameter should be constructed, including length and complexity.",For line 881:
- description="The DeclarativeOAuth Specific optional override to provide the custom `state` key name, if required by data-provider.", + description="The DeclarativeOAuth specific optional override to provide the custom `state` key name, if required by the data provider.",For line 887:
- description="The DeclarativeOAuth Specific optional override to provide the custom `code` key name to something like `auth_code` or `custom_auth_code`, if required by data-provider.", + description="The DeclarativeOAuth specific optional override to provide the custom `code` key name, such as `auth_code` or `custom_auth_code`, if required by the data provider.",Also applies to: 857-857, 881-881, 887-887
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(5 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
648-650
: LGTM!
The TypesPair
class definition looks correct and follows code conventions.
…tehq/airbyte-python-cdk into lazebnyi/add-dynamic-schema-loader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments from sync discussion
@aldogonzalez8 can you have a look a this? You'll have to use this for google sheets
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
…tehq/airbyte-python-cdk into lazebnyi/add-dynamic-schema-loader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1585-1610
: Consider simplifying the pointer list comprehensions.The list comprehensions for schema_pointer, key_pointer, and type_pointer could be made more readable. WDYT about:
- model_schema_pointer: List[Union[InterpolatedString, str]] = ( - [x for x in model.schema_pointer] if model.schema_pointer else [] - ) - model_key_pointer: List[Union[InterpolatedString, str]] = [x for x in model.key_pointer] - model_type_pointer: Optional[List[Union[InterpolatedString, str]]] = ( - [x for x in model.type_pointer] if model.type_pointer else None - ) + model_schema_pointer = list(model.schema_pointer) if model.schema_pointer else [] + model_key_pointer = list(model.key_pointer) + model_type_pointer = list(model.type_pointer) if model.type_pointer else Noneairbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
653-656
: Consider adding docstrings for better documentation?The
TypesMap
class looks good, but would it be helpful to add docstrings explaining the purpose oftarget_type
andcurrent_type
fields and their relationship? This could help other developers understand how to use this mapping. WDYT?airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (3)
112-121
: Consider handling the case when no records are retrieved.Currently, if
retrieved_record
isNone
,raw_schema
is set to an empty list, resulting in an emptyproperties
in the generated JSON Schema. Would you consider adding error handling or logging to inform users when no records are retrieved? Wdyt?Possible code change:
def get_json_schema(self) -> Mapping[str, Any]: """ Constructs a JSON Schema based on retrieved data. """ properties = {} retrieved_record = next(self.retriever.read_records({}), None) + if retrieved_record is None: + # Log a warning or raise an exception + raise ValueError("No records retrieved. Cannot generate schema.") raw_schema = ( self._extract_data( retrieved_record, self.schema_type_identifier.schema_pointer, ) if retrieved_record else [] )
175-177
: Consider improving the error message for invalid data types.The error message might be unclear to users. Would you consider rephrasing it to be more descriptive? For example:
raise ValueError( - f"Invalid data type. Available string or two items list of string. Got {mapped_field_type}." + f"Invalid data type '{mapped_field_type}'. Expected a string or a list of two strings." )Wdyt?
199-199
: Question about the necessity ofdeepcopy
.In
_get_airbyte_type
, we are performing adeepcopy
on theAIRBYTE_DATA_TYPES
entry. If the returned mapping is not modified elsewhere, would it be acceptable to return the original mapping to improve performance? Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(5 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(5 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(9 hunks)airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
(1 hunks)
🔇 Additional comments (9)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
Line range hint 1303-1382
: LGTM! Nice refactoring of the stream slicer logic.
The decomposition into smaller, focused methods improves readability and maintainability. The separation of concerns between partition router and paginator logic is particularly clean.
1581-1583
: LGTM! Simple and straightforward implementation.
The create_types_map method is clean and follows the factory pattern well.
1612-1636
: LGTM! Well-structured dynamic schema loader creation.
The method follows the established pattern for retriever-based components and properly handles the stream slicer creation.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
658-677
: LGTM! Well-structured class with clear field definitions.
The SchemaTypeIdentifier
class has appropriate field types and good defaults. The optional fields provide flexibility while maintaining required functionality through the mandatory key_pointer
field.
1734-1743
: LGTM! Clean implementation of the dynamic schema loader.
The DynamicSchemaLoader
follows the established pattern of other schema loaders while introducing new dynamic schema loading capabilities. The required fields ensure all necessary components are provided.
1928-1928
: LGTM! Proper handling of forward references.
The addition of DynamicSchemaLoader
to the forward references ensures proper model resolution.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
1679-1699
: LGTM! Clear schema definition with appropriate experimental warning.
The schema for TypesMap
is well-defined and matches the Python implementation. The experimental warning is a good practice for managing user expectations.
1700-1741
: LGTM! Comprehensive schema definition with clear field descriptions.
The schema for SchemaTypeIdentifier
is well-documented with clear descriptions for each field. The experimental warning and field requirements match the Python implementation.
1742-1764
: LGTM! Well-defined schema with appropriate requirements.
The schema for DynamicSchemaLoader
properly defines the component's structure with clear requirements and descriptions. The experimental warning is consistently applied.
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
1613-1637
: Should we provide a more descriptive name for the retriever?In the
create_dynamic_schema_loader
method, thename
parameter for the retriever is set to an empty string. Providing a descriptive name might help with debugging and logging. Perhaps we could use something like"dynamic_schema_retriever"
? WDYT?
1303-1337
: Consider refactoring duplicate logic in slicer methods?I noticed that
_build_stream_slicer_from_partition_router
and_build_resumable_cursor_from_paginator
have similar patterns and are used together when building stream slicers. Do you think it might be beneficial to refactor this logic into a shared helper function to reduce duplication and improve maintainability? WDYT?Also applies to: 1382-1382
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(9 hunks)airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
(1 hunks)unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
- airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
1581-1637
: Great addition of the dynamic schema components!
The implementation of create_types_map
, create_schema_type_identifier
, and create_dynamic_schema_loader
looks solid and aligns well with the existing factory patterns. This should enhance the schema management capabilities effectively.
1615-1618
: Is combined_slicers
correctly assigned?
In create_dynamic_schema_loader
, after calling _build_resumable_cursor_from_paginator
, combined_slicers
might end up being None
if there's no paginator. Should we handle this case to ensure that the stream_slicer
passed to the retriever is valid? Perhaps adding a check or default value could prevent potential issues. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APPROVED
@aldogonzalez8 |
…tehq/airbyte-python-cdk into lazebnyi/add-dynamic-schema-loader
What
This PR add dynamic schema loader that can dynamically generate schema based on response.|
Fixed: https://github.com/airbytehq/airbyte-internal-issues/issues/10782
How
Described schema, models and data classes for
DynamicSchemaLoader
,SchemaTypeIdentifier
andTypesPair
. Defined behavior for schema resolving. Integrated the dynamic schema handling into the existing framework.Summary by CodeRabbit
Release Notes
New Features
TypesMap
,SchemaTypeIdentifier
, andDynamicSchemaLoader
.Bug Fixes
DynamicSchemaLoader
for invalid keys and types.Tests
DynamicSchemaLoader
to validate functionality and error handling.