Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Built-in hash() method is not deterministic, which breaks cachier since #17 #18

Closed
louismartin opened this issue Feb 6, 2020 · 7 comments
Assignees
Labels

Comments

@louismartin
Copy link
Contributor

louismartin commented Feb 6, 2020

Cachier is broken for me since #17 for string arguments (probably for other built-in objects as well).
This comes from the fact that the built-in hash() method is not deterministic between runs in python.
Therefore calling it on the arguments of the function for caching purpose does not work between multiple runs because it will hash to different values.

The problematic call to hash() is here.

Repro

Run this script twice in a row and it will re-execute the method test() each time without relying on the cache:

import time
from cachier import cachier

@cachier(pickle_reload=False)
def test(text):
    time.sleep(1)
    print(text)
    print(hash(text))
    return 0

start_time = time.time()
text = 'foo'
print(hash(text))
test(text)
print(time.time() - start_time)
@aleksejkozin
Copy link

@louismartin encountered the same issue yesterday, took me like 5 hours to get that v 1.3.0 broke my code 😄 1.2.8 works fine

Hope it will be fixed

Here is a github action that reproduces the issue
https://github.com/flowbrew/cachier-1.3.0-bug/blob/master/.github/workflows/main.yaml

Here is the github action run logs
https://github.com/flowbrew/cachier-1.3.0-bug/runs/431907332?check_suite_focus=true

@louismartin
Copy link
Contributor Author

It took me a few hours to narrow it down to the hash function as well :)

@shaypal5 shaypal5 added the bug label Feb 8, 2020
@shaypal5
Copy link
Collaborator

shaypal5 commented Feb 8, 2020

On it. I've added a test that reproduces this on my machine. I now want to make I reproduce this on every Travis test configuration, and then I can move forward with test cross-kernel stable hash functions.

@shaypal5 shaypal5 self-assigned this Feb 8, 2020
@shaypal5
Copy link
Collaborator

shaypal5 commented Feb 8, 2020

Done. Now let's see if zlib.adler32() will solve this, as suggested here:
https://stackoverflow.com/questions/27954892/deterministic-hashing-in-python-3

@shaypal5
Copy link
Collaborator

shaypal5 commented Feb 8, 2020

Should be solved by release v1.3.3.
Please check and let me know how it goes; you should definitely reopen this issue if this doesn't fix the issue.

@shaypal5 shaypal5 closed this as completed Feb 8, 2020
@louismartin
Copy link
Contributor Author

Thanks for fixing this so quickly!

@shaypal5
Copy link
Collaborator

shaypal5 commented Feb 8, 2020

Sure thing! Any pickle-able object should now be a valid argument to a cache-wrapper function - with special treatment given to pandas DataFrames and numpy arrays - so feel free to try things out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants