Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate does not raise errors in a predictable manner #953

Closed
ssbarnea opened this issue May 20, 2022 · 13 comments
Closed

validate does not raise errors in a predictable manner #953

ssbarnea opened this issue May 20, 2022 · 13 comments

Comments

@ssbarnea
Copy link
Contributor

Based on #481 I was able to build a simple reproducer for this bug:

from jsonschema import validate

schema = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "environment": {
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    }
  }
}

instance = {
  "environment": {
    "a": False,
    "b": False,
    "c": True,
    "d": "foo"
  }
}

for i in range(1, 20):
    try:
        validate(instance=instance, schema=schema)
    except Exception as exc:
        print(exc)

Desired behavior

Subsequent runs with the same inputs (schema + instance) should produce the same exceptions, not randomly returning different results.

@Julian
Copy link
Member

Julian commented May 20, 2022

The desired behavior isn't a guarantee validate makes today, and likely not one it can/will make -- if you want the same exception you can use best_match or sort the errors yourself, but I can have a look to see if there's any room for improvement.

@ssbarnea
Copy link
Contributor Author

ssbarnea commented May 20, 2022

To be honest, I wanted to use best_match but I was not able to find a simple way to use it similar to validate. As you probably seen calling validate is strait simple: giving only the schema and instance as inputs. Still the examples I seen with best_match required me to already know which schema draft version was, instantiate it,... Maybe I failed to read the right example?

I wonder if it would be possible to adapt validate to attach the list of errors to the exception, so consumer could dig into it.

Now I realise why one of my collegues implemented his own validator at https://github.com/ansible/ansible-navigator/blob/main/src/ansible_navigator/utils/json_schema.py#L72-L118 as that one returns all errors -- I wonder if we could contribute something similar to the library, so it would be easier to get more consistent errors?

Update: apparently his implementation does produce the same result each time, probably due to the use of sorted. I could try to move his implementation into a shared library that we used across our projects, but ideally we should be able to get this library upgraded. I bet others would be interested about the same kind of behavior. WDYT?

@Julian
Copy link
Member

Julian commented May 20, 2022

Docs definitely could be better (as usual :( ), sorry. There's a distinct lack of examples unfortunately.

(n.b. now that I ran your code, here at least, it always gives the same exception)

I'll try to come back in a bit with more guidance, but:

required me to already know which schema draft version was

This is jsonschema.validators.validator_for, which e.g. for your schema gives:

>>> import jsonschema.validators
>>> jsonschema.validators.validator_for(schema)
<class 'jsonschema.validators.Draft7Validator'>

so if you have an arbitrary schema, you'd use e.g.:

validator = jsonschema.validators.validator_for(schema)(schema)

and then when you validate, e.g. jsonschema.exceptions.best_match(validator.iter_errors(instance)) or

sorted(validator.iter_errors(instance), key=jsonschema.exceptions.relevance)

where relevance will sort using the same heuristics as best_match.

I'll look at the code you linked in a bit, but I suspect it basically needs #728 and #698 implemented to give clearer errors, neither of which should be too terrible.

@ssbarnea
Copy link
Contributor Author

What is your feeling about me making a pull request to add a new method called validate_iter that receives the same a parameters as current validate() but returns a sorted list of errors using the logic you mentioned?

@Julian
Copy link
Member

Julian commented May 20, 2022

Not really for it honestly -- I think the current world (where there's jsonschema.validate for simplistic use cases, and the draft-specific validators for normal or performance critical ones) already confuses users who don't realize they shouldn't call validate in a loop with the same schema over and over again.

Open to be convinced, but it's 3 lines to do with the existing API, or even one as just sorted(validator_for(schema)(schema).iter_errors(instance), key=jsonschema.exceptions.by_relevance), so it'd have to be a pretty good argument.

@Julian
Copy link
Member

Julian commented May 20, 2022

I'm slightly more amenable to adding a relevant_errors() method to validator classes which returns the sorted list and would chop a line off, turning it into validator_for(schema)(schema).relevant_errors()

@ssbarnea
Copy link
Contributor Author

You might be surprised to find that even the exceptions.relevance sort function produce random results with the same I gave in this report.

Before doing any optimizations we should probably look into ensuring that the library behavior is reproducible / predictable. Nobody will like a tool that returns different results, even between two identical calls as that will make it unreliable.

@Julian
Copy link
Member

Julian commented May 20, 2022

With all due respect the library is already quite popular, so saying things like "nobody will like a tool that returns different results" isn't really a welcome way to provide feedback.

As I say, I'm happy to look at specific examples and improvements. You're however again expecting a guarantee that isn't provided today, and as I said, I don't yet view as essential -- providing exceptions in a fixed order across runs isn't a current goal. It can undoubtedly be done.

@Julian
Copy link
Member

Julian commented May 24, 2022

Seems like you've addressed this downstream. As I say, as of today, it's not really a goal of the library to guarantee it produces the same exceptions in the same order across program runs, so users are indeed encouraged to call sorted() if that's what you want. We can revisit in the future if that's something a significant number of people have a need for.

@Julian Julian closed this as completed May 24, 2022
@cognifloyd
Copy link

I think keeping the order of output consistent across runs would be very beneficial.

I wonder what the source of the varied output is. Are the errors getting emitted from a set?

@Julian
Copy link
Member

Julian commented May 27, 2022

No, they are iterated over normally -- PRs to address are certainly welcome, but I'd probably suspect it's due to hash randomization of the dictionary that contains which validators to validate (i.e. the metaschema)?

If you fix a hash seed, does that help?

But in short it just doesn't seem feasible to guarantee -- don't forget this library is not just an implementation of the spec, it's also a way to create extensions to the spec, so one has to deal with a possibly infinite list of additional behavior, and allowing those to configure their ordering.

@cognifloyd
Copy link

Weird. I thought iterating over a dictionary with .items() always yielded the same order.

But, oh! I see what you're saying. The order of a dict is consistent within a run, not across runs. Hmm.

@ssbarnea
Copy link
Contributor Author

IMHO, this bug should be reopened. The only place where random behaviour is desirable in programming, is random function generators, and even these do often have a configurable seed that makes the behaviour reproducible.

The fact that current implementation has unpredictable behaviour which may prove tricky to fix should not be the reason for fixing it.

I also suspect that testing coverage of the code may prove to be partial as that would be one of the first places where this would be uncovered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants