
Bug: Incoherent job statepoint access in subprocess #528

Closed
@bdice

Description

I was encountering a strange bug in signac-flow's tests and traced it to an issue in signac.

I ran git bisect, and the issue appears to have been introduced in PR #497, but I can't tell why.
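
Because the failure is intermittent, a single run per commit can mislabel a bad commit as good. The following is a minimal sketch of a driver for `git bisect run`. It assumes the minimal example from the "To reproduce" section is saved as `repro.py` and that the checked-out signac tree is the one being imported (for example, through an editable install). The names `bisect_check.py`, `repro.py`, `RUNS`, and `exit_code_for` are hypothetical. The script relies on git's documented convention for `git bisect run`: exit code 0 marks a commit good, 125 skips it, and any other code from 1 to 127 marks it bad.

```python
# bisect_check.py (hypothetical name); usage:
#   git bisect run python bisect_check.py repro.py
import re
import subprocess
import sys

RUNS = 10  # the race is intermittent, so run the example several times per commit


def exit_code_for(output: str) -> int:
    """Map one run's stdout to a git-bisect exit code (0 good, 1 bad, 125 skip)."""
    match = re.search(r"Total failures: (\d+)", output)
    if match is None:
        return 125  # the example did not finish normally; let git skip this commit
    return 0 if int(match.group(1)) == 0 else 1


def main(repro_path: str) -> int:
    for _ in range(RUNS):
        proc = subprocess.run(
            [sys.executable, repro_path], capture_output=True, text=True
        )
        code = exit_code_for(proc.stdout)
        if code != 0:
            return code  # the first bad (or unusable) run decides the verdict
    return 0


# Only run the check when invoked with the path to the example script.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Raising `RUNS` lowers the chance of marking a bad commit as good, but each bisect step takes longer.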

Some kind of race condition causes job.sp to return {} even though the state point should contain data.
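
The report does not identify the mechanism. The sketch below is only a generic illustration of how a race can make a reader see `{}`. It uses a hypothetical `LazyRecord` class, which is not signac's `Job` and makes no claim about what PR #497 changed. The loader in this class publishes an empty dict before filling it. Without synchronization, threads that check the cache during that window return the empty dict. Performing the check and the load under one lock removes the window.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LazyRecord:
    """Hypothetical lazily loaded record; NOT signac's Job implementation."""

    def __init__(self, source, use_lock):
        self._source = source      # stands in for data read from disk
        self._data = None          # None means "not loaded yet"
        self._use_lock = use_lock
        self._lock = threading.Lock()

    def _load(self):
        # Publish an empty container first, then fill it. Another thread that
        # checks `self._data is None` in between sees {} and returns it.
        self._data = {}
        time.sleep(0.001)          # widen the race window for the demo
        self._data.update(self._source)

    @property
    def data(self):
        if self._use_lock:
            # Check-then-act under one lock: readers wait until loading finishes.
            with self._lock:
                if self._data is None:
                    self._load()
        elif self._data is None:   # unsynchronized check-then-act
            self._load()
        return self._data


def count_empty_reads(use_lock, n_threads=8):
    """Read one record from many threads at once; count reads that saw {}."""
    record = LazyRecord({"b": 1}, use_lock)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        snapshots = list(pool.map(lambda _: dict(record.data), range(n_threads)))
    return sum(1 for snap in snapshots if not snap)


print("without lock:", count_empty_reads(use_lock=False))  # may be > 0; varies per run
print("with lock:   ", count_empty_reads(use_lock=True))
```

The locked variant always returns a populated dict. The unlocked variant's count depends on thread scheduling, which mirrors the intermittent failures in the report.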

This happens both with and without buffering.

The job is opened by state point, so no lazy state point access is involved.

To reproduce

Here's a minimal failing example.

```python
import signac
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def b_is_positive(job):
    sp = job.sp
    try:
        return sp.b >= 0
    except AttributeError:
        print("Incorrect state point:", sp)
        return False

def compute_status(data):
    job, func = data
    result = func(job)
    return job.id, func.__name__, result

if __name__ == "__main__":
    if len(sys.argv) == 1:
        # Launch the main process
        with signac.TemporaryProject() as project:
            for b in range(5):
                project.open_job({"b": b}).init()
            # Re-run this script in a fresh interpreter (same Python executable),
            # passing the project path as a separate argument
            cmd = [sys.executable, __file__, project.root_directory()]
            output = subprocess.check_output(cmd, text=True)
            print(output)
    else:
        # Launch the subprocess
        print("Launched subprocess.")
        project_dir = sys.argv[1]
        project = signac.get_project(project_dir)
        tasks = []
        for job in project:
            # Launching more concurrent tasks causes a higher failure rate
            for func in [b_is_positive, b_is_positive, b_is_positive]:
                tasks.append((job, func))
        with ThreadPoolExecutor() as e:
            results = list(e.map(compute_status, tasks))
        failures = 0
        for r in results:
            if r[1] == 'b_is_positive' and not r[2]:
                print("FAIL", r)
                failures += 1
        print("Total failures:", failures)
```
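
The comment in the example notes that launching more concurrent tasks raises the failure rate. The following is a minimal sketch for measuring that trend. It assumes a hypothetical per-task predicate `check(i)` that returns True on success, for example a wrapper that calls `b_is_positive` on the i-th job inside the subprocess. `failure_rate` is also a hypothetical name. The sketch runs in a single process, so it does not reproduce the subprocess launch, which may matter for this bug. If the failure only occurs on the first state point access, `check` may also need to open fresh job handles on each trial.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def failure_rate(check: Callable[[int], bool], n_tasks: int, trials: int = 20) -> float:
    """Fraction of False results when n_tasks checks run concurrently, over several rounds."""
    failures = 0
    for _ in range(trials):
        # A fresh pool per trial, sized so all tasks can run at once.
        with ThreadPoolExecutor(max_workers=n_tasks) as pool:
            failures += sum(1 for ok in pool.map(check, range(n_tasks)) if not ok)
    return failures / (n_tasks * trials)


# Placeholder predicate that always succeeds; replace it with a hypothetical
# wrapper such as `lambda i: b_is_positive(jobs[i % len(jobs)])`.
for n in (1, 5, 15, 45):
    print(f"{n:>3} concurrent tasks -> failure rate {failure_rate(lambda i: True, n):.2%}")
```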

System configuration


  • Operating system: Ubuntu 20.04 (WSL)
  • Python version: 3.9
  • signac version: commit e5b8057 or newer (master currently points to 786d75f)
