[low code connectors] perform schema validation of the input config against the declarative language schema #15543
Conversation
```python
# Leaving this test here to document a limitation of the validator. Decoder has no meaningful fields to validate on so it accepts
# the MinMaxDatetime despite being the wrong type
def test_validation_wrong_object_type():
```
This is a limitation of the validator itself
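For context, here is a minimal jsonschema illustration of why this slips through (the schema below is illustrative, not the actual generated Decoder schema): an object schema with no constrained properties accepts any mapping.

```python
import jsonschema

# An object schema with no required or typed properties constrains nothing,
# so any mapping validates against it. This is why a component whose generated
# schema has no meaningful fields accepts a definition of a different type.
empty_component_schema = {"type": "object", "properties": {}}

# Passes even though this looks like a MinMaxDatetime, not a Decoder:
jsonschema.validate(instance={"min_datetime": "2021-01-01"}, schema=empty_component_schema)
```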
```python
# This test should fail because the extractor doesn't match the Array of resolved classes. However, despite the schema being correct
# validation passes. Leaving this here to document it and revisit at another time. This is another validator limitation.
def test_validate_types_nested_in_list():
```
I've actually tested this by explicitly replacing the Interface in our code with a union of relevant types, but interestingly the validator still couldn't pick up on it.
```python
@dataclass
class DeclarativeAuthenticator(ABC):
```
This abstraction is needed so that we only attempt to expand certain authenticators during validation. It is a bit of a concern that schema generation is coupled so closely to our exact object model, which could get unwieldy in the future.
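A rough sketch of how such a marker class can gate expansion; needs_expansion is a hypothetical helper, not the PR's code:

```python
from abc import ABC
from dataclasses import dataclass

@dataclass
class DeclarativeAuthenticator(ABC):
    """Marker base: only authenticators that opt in by subclassing this
    are expanded into a union of concrete types during schema generation."""

# Hypothetical gate used while walking a component's field annotations:
def needs_expansion(field_type) -> bool:
    return isinstance(field_type, type) and issubclass(field_type, DeclarativeAuthenticator)
```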
```python
else:
    # Because the component's data fields definitions use interfaces, we need to resolve the underlying types into the
    # concrete classes that implement the interface before generating the schema
    DeclarativeComponentFactory._transform_interface_to_union(class_)
```
Interestingly, even though we make a copy of the class and replace those annotations, it does seem to modify the incoming class argument. I tried both returning the relevant class and the approach currently shown, with the same expansion effect either way. However, if I comment this line out, the expansion does not go through (which is expected).
This is almost certainly because the fields don't get copied. Looking at dataclasses.fields, it seems like they are in the __dataclass_fields__ field of the class' __dict__. They don't get copied because we don't do a deepcopy of the dict here. That being said, I don't know if there's an easy way to copy them, because mappingproxys can't be serialized. I doubt this matters much...
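A self-contained demonstration of the sharing described here (class and names are hypothetical, not the PR's code):

```python
import dataclasses

@dataclasses.dataclass
class Component:
    name: str

# Rebuild the class from a shallow copy of its __dict__ (dropping the two
# descriptors that type() recreates). The outer mapping is copied, but the
# value stored under __dataclass_fields__ is the same dict object, so the
# Field objects are shared with the original class.
namespace = {k: v for k, v in vars(Component).items() if k not in ("__dict__", "__weakref__")}
copy_cls = type("ComponentCopy", Component.__bases__, namespace)

assert copy_cls.__dataclass_fields__ is Component.__dataclass_fields__

# Mutating a field on the "copy" is therefore visible on the original:
dataclasses.fields(copy_cls)[0].type = int
print(dataclasses.fields(Component)[0].type)  # <class 'int'>
```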
```python
copy_cls.__annotations__[field.name] = unpacked_field_types
```

```python
@staticmethod
def unpack(field_type: type):
```
This is where it gets particularly fun. This whole recursive function is necessary to retain the existing structure of any declarative components that must be expanded, without losing the shape of the container they are in. We basically retrieve the underlying type and use that in the next recursive call, and once we have unpacked the underlying classes, we package them back into their original container. I added a handful of tests to verify this, since this was where the subclass replacement became less than trivial.
Happy to discuss this part more since this was something I realized we needed midway through.
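A minimal sketch of that recursive unpacking, under the assumption that interfaces are resolved through some mapping to their concrete classes (IMPLEMENTATIONS and the function body are illustrative, not the PR's exact code; re-subscripting builtin containers needs Python 3.9+):

```python
import typing
from typing import Union

IMPLEMENTATIONS = {}  # hypothetical: interface -> tuple of concrete classes implementing it

def unpack(field_type):
    origin = typing.get_origin(field_type)
    if origin is None:
        # Leaf type: replace an interface with the Union of its implementations.
        concrete = IMPLEMENTATIONS.get(field_type)
        return Union[concrete] if concrete else field_type
    # Container (List, Optional/Union, ...): recurse into each type argument,
    # then re-package the results into the original container so the outer
    # shape (list-ness, optionality) is preserved.
    unpacked = tuple(unpack(arg) for arg in typing.get_args(field_type))
    return origin[unpacked]
```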
@brianjlai could you show some examples of the error message associated with an invalid YAML?
yup i'll add some sample test cases and the validation message that comes back!
```python
if self.request_options_provider is None:
    self._request_options_provider = InterpolatedRequestOptionsProvider(config=self.config, options=options)
elif isinstance(self.request_options_provider, dict):
    self._request_options_provider = InterpolatedRequestOptionsProvider(config=self.config, **self.request_options_provider)
else:
    self._request_options_provider = self.request_options_provider
```

```diff
- self.authenticator = self.authenticator or NoAuth()
+ self.authenticator = self.authenticator or BasicHttpAuthenticator("", config=self.config, options={})
```
why is BasicHttp the default?
yeah that's a good point. I think it makes sense to just define a declarative NoAuth that implements the methods and returns empty strings. Will fix!
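A rough sketch of what that declarative NoAuth could look like, assuming the authenticator interface exposes auth_header and token properties (the real class in the CDK may differ):

```python
from dataclasses import dataclass

# DeclarativeAuthenticator is the marker base class shown earlier in this PR.
@dataclass
class NoAuth(DeclarativeAuthenticator):
    @property
    def auth_header(self) -> str:
        # No authentication: contribute no header name...
        return ""

    @property
    def token(self) -> str:
        # ...and no token value.
        return ""
```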
```python
def _read_and_parse_yaml_file(self, path_to_yaml_file):
    with open(path_to_yaml_file, "r") as f:
        config_content = f.read()
        return YamlParser().parse(config_content)
```
```python
def _validate_source(self):
    full_config = {
```
minor: can we also check the "version" field we added last week? Confirming its presence is enough for now; no need for additional validation.
ah yes will add!
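A sketch of the presence-only check being agreed on here (attribute and message are assumptions; only the existence of the "version" field is verified):

```python
from jsonschema.exceptions import ValidationError

def _validate_source(self):
    # Fail fast if the declarative YAML omits the top-level "version" field.
    if "version" not in self._source_config:
        raise ValidationError("The declarative source YAML must include a 'version' field")
```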
/publish-cdk dry-run=true
/publish-cdk dry-run=false
/publish-cdk dry-run=true
/publish-cdk dry-run=false
[low code connectors] perform schema validation of the input config against the declarative language schema (#15543)
* draft: first pass at complete schema language generation and factory validator
* actually a working validator and fixes to the schema that went uncaught
* remove extra spike file
* fix formatting file
* pr feedback and a little bit of refactoring
* fix some types that were erroneously marked as invalid schema
* some comments
* add jsonschemamixin to interfaces
* update changelog
* bump version
What
This PR adds schema validation of a connector's YAML file implementation against the declarative language schema. At a high level, we perform the schema validation at each level of component creation: schema generation for a given class and the validation are integrated into the creation flow, controlled by a boolean instantiate. For validation we pass False; for the actual existing creation flow, we pass True.
How
Because we're leveraging the existing create() flow, we do not have to perform a complex traversal of object relationships. Instead we traverse the object as we already do during creation; however, we have an additional path in factory.py, taken when instantiate is False, that performs the validation. We do this for each component of the YAML as well as at the top level, which doesn't use the factory.
The general flow consists of expanding a component's interfaces into concrete classes, generating the JSON schema, and validating the component definition's arguments against it. A condensed sketch of that flow follows.
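In this sketch, only _transform_interface_to_union and the JsonSchemaMixin-provided json_schema come from the PR; the method name, signature, and bodies are illustrative:

```python
import jsonschema

def build(self, class_, definition, config, instantiate=True):
    if instantiate:
        # Existing path: construct the concrete component from its definition.
        return class_(config=config, **definition)
    else:
        # Validation path: expand interface annotations into unions of their
        # concrete implementations, generate the JSON schema, and validate the
        # component definition's arguments against it.
        DeclarativeComponentFactory._transform_interface_to_union(class_)
        schema = class_.json_schema()
        jsonschema.validate(instance=definition, schema=schema)
        return True
```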
The more interesting bit is how we perform this expansion of interfaces, which is critical to validating correctness.
Note: this accounts for Optional (which is a union of the type and None), lists, and Unions. So far in our schema I didn't see any cases of mappings that contained internal interfaces needing expansion.
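For reference, the typing machinery underlying that note: Optional is just a Union with None, so the same unpacking path covers both, and get_origin/get_args expose a container's shape and contents.

```python
import typing
from typing import List, Optional, Union

assert Optional[int] == Union[int, None]
assert typing.get_origin(Optional[int]) is Union
assert typing.get_args(Optional[int]) == (int, type(None))
assert typing.get_origin(List[int]) is list
assert typing.get_args(List[int]) == (int,)
```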
Recommended reading order
1. factory.py
2. yaml_declarative_source.py
3. declarative_authenticator.py
4. test_factory.py
5. test_yaml_declarative_source.py
Example Error Validations:
A different type than expected:
Error:
Missing a required field:
Error:
Wrong type passed for a reference type (this one is interesting because the validator doesn't know how to compare literal types, but it identifies that the passed-in type is missing a field (cursor_value) that is expected on classes that implement pagination_strategy):
Error:
The errors are a bit more verbose.
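The sample configs and error outputs did not survive in this transcript. As a stand-in, and assuming the validator is jsonschema, a wrong-type failure looks roughly like this (the field name is hypothetical):

```python
import jsonschema

schema = {
    "type": "object",
    "properties": {"page_size": {"type": "integer"}},
    "required": ["page_size"],
}

try:
    jsonschema.validate(instance={"page_size": "ten"}, schema=schema)
except jsonschema.ValidationError as error:
    print(error.message)  # 'ten' is not of type 'integer'
```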