Description
After trying out atheris, based on your example (it is awesome, I'd say!), I found an interesting bug in caching that comes from the following fact:
>>> hash(-2)
-2
>>> hash(-1)
-2
From PEP-456:
The internal interface code between the hash function and the tp_hash slots implements special cases for zero length input and a return value of -1. An input of length 0 is mapped to hash value 0. The output -1 is mapped to -2.
This leads to wrong canonicalisation. For example, if {'exclusiveMaximum': 1, 'exclusiveMinimum': -1, 'type': 'number'} was cached first, then canonicalising {'exclusiveMaximum': 1, 'exclusiveMinimum': -2, 'type': 'number'} will return the cached {'exclusiveMaximum': 1, 'exclusiveMinimum': -1, 'type': 'number'} :(
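A minimal sketch of how such a collision can arise, assuming CPython and a hypothetical cache that is keyed only on a hash of the schema's items. The names `naive_key` and `cached_by_hash` are made up for illustration and are not the library's actual code; the `dict(schema)` call is just a stand-in for real canonicalisation:

```python
_cache = {}


def naive_key(schema):
    # Hypothetical key: only the hash of the sorted items is kept, not the
    # items themselves, so two schemas with equal hashes share one entry.
    return hash(tuple(sorted(schema.items())))


def cached_by_hash(schema):
    key = naive_key(schema)
    if key not in _cache:
        _cache[key] = dict(schema)  # stand-in for the real canonicalisation
    return _cache[key]


first = {"exclusiveMaximum": 1, "exclusiveMinimum": -1, "type": "number"}
second = {"exclusiveMaximum": 1, "exclusiveMinimum": -2, "type": "number"}

# A tuple's hash is built from its elements' hashes, and on CPython
# hash(-1) == hash(-2), so both schemas get the same key.
print(naive_key(first) == naive_key(second))  # True on CPython

cached_by_hash(first)            # caches the schema with -1
result = cached_by_hash(second)  # collides and returns the cached entry
print(result["exclusiveMinimum"])  # -1 on CPython, although -2 was passed in
```

A plain dict keyed on the tuple itself would not have this problem, because dict lookup falls back to an equality check after the hashes match. The collision only becomes visible when the hash value is the sole thing compared.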
-1 is quite common, and these cache collisions make me question the current implementation. I am not completely sure how to implement caching efficiently enough. However, in #69, after reducing how many schemas are inlined, the performance improved dramatically, and I am not sure if this caching layer is worth having (at least in the current implementation).
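For what it's worth, one possible direction is to key the cache on the full serialised schema instead of on a hash value. This is only a sketch under that assumption: `encode_key` and `cached_by_json` are hypothetical names, the `canonicalise` parameter is a stand-in for the real function, and I have not measured whether the serialisation cost eats the benefit of caching:

```python
import json

_json_cache = {}


def encode_key(schema):
    # The key is the whole JSON text with sorted keys, so two schemas share
    # an entry only if they serialise identically. -1 and -2 serialise
    # differently, and so do 1, 1.0 and true.
    return json.dumps(schema, sort_keys=True)


def cached_by_json(schema, canonicalise=dict):
    # `canonicalise` defaults to a plain copy here purely for illustration.
    key = encode_key(schema)
    if key not in _json_cache:
        _json_cache[key] = canonicalise(schema)
    return _json_cache[key]


first = {"exclusiveMaximum": 1, "exclusiveMinimum": -1, "type": "number"}
second = {"exclusiveMaximum": 1, "exclusiveMinimum": -2, "type": "number"}

cached_by_json(first)
print(cached_by_json(second)["exclusiveMinimum"])  # -2
```

The string keys are still hashed by the dict internally, but a lookup also compares the strings for equality, so a hash collision alone can no longer return the wrong schema.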
What do you think?