Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid deadlock in SyncedCollection #530

Merged
merged 1 commit into from
Mar 14, 2021
Merged

Conversation

joaander
Copy link
Member

Description

Add locks needed to avoid deadlocks in SyncedCollection

Motivation and Context

This MWE deadlocks because SycedCollections._locks[lock_id] gets overwritten while it is being used:

import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from signac.synced_collections.backends.collection_json import JSONAttrDict
def update_status(data):
    s, filename = data
    d = JSONAttrDict(filename=filename)
    d.update(dict(a=1, b=s-1))
    return False
if __name__ == "__main__":
    N = 1
    n_tasks = 20
    with tempfile.TemporaryDirectory() as d:
        for s in range(N):
            with open(f'{d}/{s}.json', 'w') as f:
                json.dump(dict(a=s, b=0), f)
        tasks = []
        for s in range(N):
            for t in range(n_tasks):
                tasks.append((s, f'{d}/{s}.json'))
        with ThreadPoolExecutor() as e:
            e.map(update_status, tasks)
        print(JSONAttrDict._locks)

Types of Changes

  • Documentation update
  • Bug fix
  • New feature
  • Breaking change1

1The change breaks (or has the potential to break) existing functionality.

Checklist:

If necessary:

  • I have updated the API documentation as part of the package doc-strings.
  • I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
  • I have updated the changelog and added all related issue and pull request numbers for future reference (if applicable). See example below.

@joaander joaander requested review from a team as code owners March 12, 2021 19:32
@joaander joaander requested review from bdice and cbkerr and removed request for a team March 12, 2021 19:32
@codecov
Copy link

codecov bot commented Mar 12, 2021

Codecov Report

Merging #530 (21f11de) into master (786d75f) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #530   +/-   ##
=======================================
  Coverage   78.16%   78.17%           
=======================================
  Files          65       65           
  Lines        7036     7039    +3     
  Branches     1310     1311    +1     
=======================================
+ Hits         5500     5503    +3     
  Misses       1231     1231           
  Partials      305      305           
Impacted Files Coverage Δ
...synced_collections/data_types/synced_collection.py 92.14% <100.00%> (+0.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 786d75f...21f11de. Read the comment docs.

@bdice bdice requested a review from vyasr March 12, 2021 19:34
Copy link
Member

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks again for your help, @joaander. If you can think of a test that would fail (but not deadlock), that would be a good addition. I don't know how I would construct such a case unless there is a way to force a timeout.

@joaander
Copy link
Member Author

Looks good to me. Thanks again for your help, @joaander. If you can think of a test that would fail (but not deadlock), that would be a good addition. I don't know how I would construct such a case unless there is a way to force a timeout.

I tried to get failing behavior in my first attempt on the way to this MWE but wan't successful. I had tried to trigger conflicting writes to the file.

If inspection of private variables is allowed in the test, a test that would fail before this change would construct two JSONAttrDicts with the same file and assert that the identify of the lock JSONAttrDict._locks[filename] stays the same.

Copy link
Member

@cbkerr cbkerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With help from @bdice, I was able to reproduce the problem and fix even though I didn't experience a deadlock by printing threading.get_ident() and "done" inside the update_status function.

@vyasr vyasr merged commit 5709232 into master Mar 14, 2021
@vyasr vyasr deleted the fix-syncedcollection-deadlock branch March 14, 2021 15:22
@vyasr
Copy link
Contributor

vyasr commented Mar 14, 2021

In the interest of getting things merged I'll look into constructing a test later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Incoherent job statepoint access in subprocess
5 participants