Bug in the caching implementation #71

Closed
@Stranger6667

Description

After trying out atheris, based on your example (it is awesome, I'd say!) I found an interesting bug in caching that comes from the following fact:

>>> hash(-2)
-2
>>> hash(-1)
-2

From PEP-456:

The internal interface code between the hash function and the tp_hash slots implements special cases for zero length input and a return value of -1. An input of length 0 is mapped to hash value 0. The output -1 is mapped to -2.

It leads to incorrect canonicalisation: e.g. if {'exclusiveMaximum': 1, 'exclusiveMinimum': -1, 'type': 'number'} was cached first, then applying canonicalisation to {'exclusiveMaximum': 1, 'exclusiveMinimum': -2, 'type': 'number'} hits the same cache entry and will return {'exclusiveMaximum': 1, 'exclusiveMinimum': -1, 'type': 'number'} :(
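A minimal sketch of how such a collision can arise, assuming a cache that is keyed on a hash of the schema's items. The names `naive_key` and `cached_canonicalise` are hypothetical and only illustrate the failure mode; they are not the library's actual code, and the "canonicalisation" here is just a copy:

```python
def naive_key(schema):
    # Hypothetical cache key: a hash of the sorted (key, value) pairs.
    # Assumes a flat schema whose values are all hashable.
    return hash(tuple(sorted(schema.items())))


_cache = {}


def cached_canonicalise(schema):
    # Stand-in for the real canonicalisation: it only copies the schema,
    # which is enough to show which entry the cache hands back.
    key = naive_key(schema)
    if key not in _cache:
        _cache[key] = dict(schema)
    return _cache[key]


first = {'exclusiveMaximum': 1, 'exclusiveMinimum': -1, 'type': 'number'}
second = {'exclusiveMaximum': 1, 'exclusiveMinimum': -2, 'type': 'number'}

cached_canonicalise(first)            # populates the cache
result = cached_canonicalise(second)  # same key on CPython, so a cache hit

# On CPython, hash(-1) == hash(-2), so both schemas get the same key
# and `result` is the entry stored for `first`.
print(naive_key(first) == naive_key(second))
print(result)
```

The two schemas differ only in a value whose hashes coincide, so any key built purely from hashes cannot tell them apart.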

-1 is quite common, and these cache collisions make me question the current implementation. I am not completely sure how to implement caching efficiently enough. However, in #69, after reducing how many schemas are inlined, the performance improved dramatically, and I am not sure this caching layer is worth having (at least in its current implementation).
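For comparison, one possible collision-free variant is sketched below, assuming schemas are JSON-serialisable. It keys the cache on the serialised schema itself rather than on a hash value, so the dict compares keys for equality and a hash collision costs only time, not correctness. The names `exact_key` and `cached_canonicalise_exact` are hypothetical, and whether the serialisation cost is acceptable is exactly the efficiency question raised above:

```python
import json

_exact_cache = {}


def exact_key(schema):
    # Hypothetical key: the schema serialised with sorted keys.
    # -1 and -2 serialise differently, as do 1, 1.0 and true.
    return json.dumps(schema, sort_keys=True)


def cached_canonicalise_exact(schema):
    # Stand-in for the real canonicalisation: it only copies the schema.
    key = exact_key(schema)
    if key not in _exact_cache:
        _exact_cache[key] = dict(schema)
    return _exact_cache[key]


a = {'exclusiveMaximum': 1, 'exclusiveMinimum': -1, 'type': 'number'}
b = {'exclusiveMaximum': 1, 'exclusiveMinimum': -2, 'type': 'number'}

cached_canonicalise_exact(a)
print(cached_canonicalise_exact(b))  # b gets its own cache entry
```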

What do you think?
