Issue compiling pytensor functions in multiple processes #6818

Open
PL-EduardoSanchez opened this issue Jul 5, 2023 · 17 comments
Comments

@PL-EduardoSanchez

Describe the issue:

Given a table with three columns X1, X2, and Y, I am trying to fit an independent Bayesian model for the variable Y for each combination of X1 and X2 values. To do this, I parallelise the MCMC sampling with Parallel and delayed from joblib (see code below). The workflow mostly runs smoothly, but at some point, seemingly at random, the code crashes (see the error message below).

For the record, I am running the process as a Python script from my terminal like this:

conda activate my-conda-env
python my_script.py # see the code below

If you need more details, please let me know.

Thanks in advance!

Best,

Eduardo

Reproducible code example:

import pymc as pm
from joblib import Parallel, delayed
import multiprocessing
import pandas as pd

# code that reads a table into a pandas DataFrame, df, with the columns
# X1 (str), X2 (str), and Y (int) goes here; I cannot share its contents
# for confidentiality reasons

# the goal is to get a different model for each value of X1 and X2,
# so we build a list of the combinations of X1 and X2 with at least
# one row
df_X1_X2_counts = (
    df[["X1", "X2"]].
    value_counts().
    reset_index().
    drop(columns="count")
)
list_X1_X2 = [tuple(x) for x in df_X1_X2_counts.to_numpy()]

# define function to get models
def bayes_independent_model(random_variable, draws, tune):
    with pm.Model() as model:
        # Prior distributions: Half-normals
        
        # alpha
        alpha = pm.HalfNormal(
            name='alpha',
            sigma=10,
            )
        # theta
        theta = pm.HalfNormal(
            name='theta',
            sigma=10,
            )

        # Likelihood: Gamma
        likelihood = pm.Gamma('likelihood',
                              alpha=alpha,
                              beta=1/theta,
                              observed=random_variable)
    
    # MCMC simulations
    with model:
        trace = pm.sample(
            draws=draws,
            tune=tune,
            chains=2,
            cores=1,
            progressbar=False,
            )
        
    # Then, sample the posterior
    with model:
        trace = pm.sample_posterior_predictive(trace=trace,
                                               extend_inferencedata=True,
                                               progressbar=False)
    

    return trace

# use all cores but one to parallelise, and never fewer than one
# (n_jobs=0 is not a valid value for joblib.Parallel)
n_cores = max(1, multiprocessing.cpu_count() - 1)

# parallelisation; this is where the code crashes
traces = Parallel(n_jobs=n_cores,
                  verbose=10)(delayed(bayes_independent_model)(
                          random_variable=df.loc[(df.X1 == x1) & (df.X2 == x2), "Y"],
                          draws=2000,
                          tune=1000,) for x1, x2 in list_X1_X2
                          )
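
Because the original table is confidential, a synthetic stand-in for the data-loading step can be sketched as follows. Everything in it is invented for illustration: the group labels, the group sizes, and the distribution used to draw Y are assumptions, not properties of the real data. It only guarantees the stated layout (X1 and X2 as strings, Y as strictly positive integers, which the Gamma likelihood requires).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical group labels; the real X1/X2 values are not public.
x1_levels = ["a", "b", "c"]
x2_levels = ["u", "v"]

rows = []
for x1 in x1_levels:
    for x2 in x2_levels:
        # Strictly positive integers, since a Gamma likelihood needs Y > 0.
        y = rng.poisson(lam=40, size=50) + 1
        rows.append(pd.DataFrame({"X1": x1, "X2": x2, "Y": y}))

df = pd.concat(rows, ignore_index=True)

# Same list of (X1, X2) pairs as in the script above, built with
# drop_duplicates() so it does not depend on the value_counts() column name.
list_X1_X2 = list(
    df[["X1", "X2"]].drop_duplicates().itertuples(index=False, name=None)
)
```

With only six small groups this is unlikely to trigger the failure on its own; raising the number of levels brings it closer to the roughly 750 combinations mentioned in the thread.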

Error message:

<details>
ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
</details>

PyMC version information:

pymc: 5.5.0
pytensor: 2.12.3
python: 3.11.4
OS: Windows 10
Installation: With conda (see code below)

conda create -c conda-forge -n optiwaste-hierarchical-models "pymc>=5"

Context for the issue:

I think it would be very convenient to be able to run a batch of different Bayesian models in parallel to save computation time.

@welcome

welcome bot commented Jul 5, 2023

🎉 Welcome to PyMC! 🎉 We're really excited to have your input into the project! 💖

If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct.

@ricardoV94
Member

Is this caused by the parallelization? Does it work otherwise?

@PL-EduardoSanchez
Author

Otherwise it works, so it most likely has to do with the parallelisation. I have roughly 750 combinations, and it typically crashes at around the 600th.

By the way, for the record, funnily enough, I just ran the code from VS Code in interactive mode and it worked...
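
If the crashes come from worker processes contending for PyTensor's shared compilation cache (an assumption; nothing at this point in the thread establishes the cause), one thing to try is giving each process its own compile directory. Below is a minimal sketch, assuming PyTensor's `PYTENSOR_FLAGS` environment variable and its `base_compiledir` option behave as documented. The helper name `use_private_compiledir` and the directory naming scheme are hypothetical.

```python
import os
import tempfile


def use_private_compiledir(tag=None):
    """Point PyTensor at a compile directory used only by this process.

    Hypothetical helper: it only edits the PYTENSOR_FLAGS environment
    variable, so it must run before pytensor or pymc is imported.
    """
    # Assumed naming scheme: one directory per process id (or custom tag).
    tag = os.getpid() if tag is None else tag
    compiledir = os.path.join(
        tempfile.gettempdir(), f"pytensor_compiledir_{tag}"
    )
    os.makedirs(compiledir, exist_ok=True)

    # Keep any flags already set, replacing a previous base_compiledir.
    flags = [f for f in os.environ.get("PYTENSOR_FLAGS", "").split(",") if f]
    flags = [f for f in flags if not f.startswith("base_compiledir=")]
    flags.append(f"base_compiledir={compiledir}")
    os.environ["PYTENSOR_FLAGS"] = ",".join(flags)
    return compiledir
```

In a joblib worker this would have to be the first thing the worker function does, with `import pymc as pm` moved inside the function after the call, since the flags are read when PyTensor is first imported. The trade-off is that processes no longer share compiled modules, so each one pays the compilation cost itself.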

@ricardoV94
Member

Can you report the whole traceback?

@PL-EduardoSanchez
Author

PL-EduardoSanchez commented Jul 5, 2023

Sure. It's a little messy; my guess is that, because I'm parallelising, several pm.sample() calls are running at the same time when the code crashes, so the errors from the different processes are interleaved. Anyway, here it is:

ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: Sum{axes=None}([3.5835189 ... .98898405])
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: Log(likelihood{[ 15. 74. ... 47. 65.]})
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1914, in process_node
replacements = node_rewriter.transform(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1074, in transform
return self.fn(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\tensor\rewriting\basic.py", line 1138, in constant_folding
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 131, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 96, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1200, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.compile(
^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1120, in compile
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1644, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1206, in module_from_key
module = self._get_from_hash(module_hash, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1109, in _get_from_hash
with lock_ctx():
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\compile\compilelock.py", line 74, in lock_ctx
fl.acquire(timeout=timeout)
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\filelock\_api.py", line 222, in acquire
raise Timeout(lock_filename) # noqa: TRY301
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
filelock._error.Timeout: The file lock 'C:\Users\EDUARDO\AppData\Local\PyTensor\compiledir_Windows-10-10.0.19045-SP0-Intel64_Family_6_Model_158_Stepping_13_GenuineIntel-3.11.4-64.lock' could not be acquired.

ERROR (pytensor.graph.rewriting.basic): node: Mul([-1.], likelihood{[ 77. 14. ... 25. 52.]}, [1.])
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): node: Mul([2.6390573 ... .73766962], [1.])
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1914, in process_node
replacements = node_rewriter.transform(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1074, in transform
return self.fn(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\tensor\rewriting\basic.py", line 1138, in constant_folding
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 131, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 96, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1200, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.compile(
^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1120, in compile
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1644, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1206, in module_from_key
module = self._get_from_hash(module_hash, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1109, in _get_from_hash
with lock_ctx():
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\compile\compilelock.py", line 74, in lock_ctx
fl.acquire(timeout=timeout)
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\filelock\_api.py", line 222, in acquire
raise Timeout(lock_filename) # noqa: TRY301
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
filelock._error.Timeout: The file lock 'C:\Users\EDUARDO\AppData\Local\PyTensor\compiledir_Windows-10-10.0.19045-SP0-Intel64_Family_6_Model_158_Stepping_13_GenuineIntel-3.11.4-64.lock' could not be acquired.

ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1914, in process_node
replacements = node_rewriter.transform(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1074, in transform
return self.fn(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\tensor\rewriting\basic.py", line 1138, in constant_folding
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 131, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 96, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1200, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.compile(
^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1120, in compile
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1644, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1206, in module_from_key
module = self._get_from_hash(module_hash, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1109, in _get_from_hash
with lock_ctx():
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\compile\compilelock.py", line 74, in lock_ctx
fl.acquire(timeout=timeout)
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\filelock\_api.py", line 222, in acquire
raise Timeout(lock_filename) # noqa: TRY301
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
filelock._error.Timeout: The file lock 'C:\Users\EDUARDO\AppData\Local\PyTensor\compiledir_Windows-10-10.0.19045-SP0-Intel64_Family_6_Model_158_Stepping_13_GenuineIntel-3.11.4-64.lock' could not be acquired.

Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 672 seconds.
Sequential sampling (2 chains in 1 job)
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1914, in process_node
replacements = node_rewriter.transform(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1074, in transform
return self.fn(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\tensor\rewriting\basic.py", line 1138, in constant_folding
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 131, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 96, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1200, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.compile(
^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1120, in compile
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1644, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1206, in module_from_key
module = self._get_from_hash(module_hash, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1109, in _get_from_hash
with lock_ctx():
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\compile\compilelock.py", line 74, in lock_ctx
fl.acquire(timeout=timeout)
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\filelock\_api.py", line 222, in acquire
raise Timeout(lock_filename) # noqa: TRY301
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
filelock._error.Timeout: The file lock 'C:\Users\EDUARDO\AppData\Local\PyTensor\compiledir_Windows-10-10.0.19045-SP0-Intel64_Family_6_Model_158_Stepping_13_GenuineIntel-3.11.4-64.lock' could not be acquired.

NUTS: [alpha, theta]
ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: Mul([-1.], likelihood{[ 14. 35. ... 56. 42.]}, [1.])
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1914, in process_node
replacements = node_rewriter.transform(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1074, in transform
return self.fn(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\tensor\rewriting\basic.py", line 1138, in constant_folding
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 131, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 96, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1200, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.compile(
^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1120, in compile
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1644, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1206, in module_from_key
module = self._get_from_hash(module_hash, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1120, in _get_from_hash
self.check_key(key, key_data.key_pkl)
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1290, in check_key
key_data = pickle.load(f)
^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, '3'.

We recommend running at least 4 chains for robust computation of convergence diagnostics
Sampling: [likelihood]
ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: Mul([-1.], likelihood{[ 14. 35. ... 56. 42.]}, [1.])
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1914, in process_node
replacements = node_rewriter.transform(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1074, in transform
return self.fn(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\tensor\rewriting\basic.py", line 1138, in constant_folding
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 131, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 96, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1200, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.compile(
^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1120, in compile
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1644, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1206, in module_from_key
module = self._get_from_hash(module_hash, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1111, in _get_from_hash
key_data.add_key(key, save_pkl=bool(key[0]))
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 544, in add_key
assert key not in self.keys
^^^^^^^^^^^^^^^^^^^^
AssertionError

ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: Sum{axes=None}([-36. -29. ... -64. -54.])
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1914, in process_node
replacements = node_rewriter.transform(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1074, in transform
return self.fn(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\tensor\rewriting\basic.py", line 1138, in constant_folding
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 131, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 96, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1200, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.compile(
^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1120, in compile
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1644, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1206, in module_from_key
module = self._get_from_hash(module_hash, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1120, in _get_from_hash
self.check_key(key, key_data.key_pkl)
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1290, in check_key
key_data = pickle.load(f)
^^^^^^^^^^^^^^
_pickle.UnpicklingError: state is not a dictionary

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\joblib\externals\loky\process_executor.py", line 463, in _process_worker
r = call_item()
^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\joblib\externals\loky\process_executor.py", line 291, in __call__
return self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\joblib\parallel.py", line 588, in __call__
return [func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\joblib\parallel.py", line 588, in <listcomp>
return [func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\Documents\repos\myclient-myproject\scripts\lib\bayes_independent_model.py", line 57, in bayes_independent_model
trace = pm.sample(
^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pymc\sampling\mcmc.py", line 653, in sample
step = assign_step_methods(model, step, methods=pm.STEP_METHODS, step_kwargs=kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pymc\sampling\mcmc.py", line 233, in assign_step_methods
return instantiate_steppers(model, steps, selected_steps, step_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pymc\sampling\mcmc.py", line 134, in instantiate_steppers
step = step_class(vars=vars, model=model, **args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pymc\step_methods\hmc\nuts.py", line 180, in __init__
super().__init__(vars, **kwargs)
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pymc\step_methods\hmc\base_hmc.py", line 109, in __init__
super().__init__(vars, blocked=blocked, model=self._model, dtype=dtype, **pytensor_kwargs)
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pymc\step_methods\arraystep.py", line 164, in __init__
func = model.logp_dlogp_function(vars, dtype=dtype, **pytensor_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pymc\model.py", line 655, in logp_dlogp_function
return ValueGradFunction(costs, grad_vars, extra_vars_and_values, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pymc\model.py", line 394, in __init__
self.pytensor_function = compile_pymc(inputs, outputs, givens=givens, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pymc\pytensorf.py", line 1196, in compile_pymc
pytensor_function = pytensor.function(
^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\compile\function\__init__.py", line 315, in function
fn = pfunc(
^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\compile\function\pfunc.py", line 367, in pfunc
return orig_function(
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\compile\function\types.py", line 1744, in orig_function
m = Maker(
^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\compile\function\types.py", line 1518, in __init__
self.prepare_fgraph(inputs, outputs, found_updates, fgraph, mode, profile)
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\compile\function\types.py", line 1411, in prepare_fgraph
rewriter_profile = rewriter(fgraph)
^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 125, in __call__
return self.rewrite(fgraph)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 121, in rewrite
return self.apply(fgraph, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 292, in apply
sub_prof = rewriter.apply(fgraph)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 2450, in apply
sub_prof = grewrite.apply(fgraph)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 2032, in apply
nb += self.process_node(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1917, in process_node
self.failure_callback(
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1770, in warn_inplace
return cls.warn(exc, nav, repl_pairs, node_rewriter, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1758, in warn
raise exc
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1914, in process_node
replacements = node_rewriter.transform(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1074, in transform
return self.fn(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\tensor\rewriting\basic.py", line 1138, in constant_folding
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 131, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\op.py", line 96, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1200, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.compile(
^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1120, in compile
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\basic.py", line 1644, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1206, in module_from_key
module = self._get_from_hash(module_hash, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 1111, in _get_from_hash
key_data.add_key(key, save_pkl=bool(key[0]))
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\pytensor\link\c\cmodule.py", line 544, in add_key
assert key not in self.keys
^^^^^^^^^^^^^^^^^^^^
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\EDUARDO\Documents\repos\myclient-myproject\dev\hierarchical_models.py", line 159, in <module>
traces = Parallel(n_jobs=n_cores,
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\joblib\parallel.py", line 1944, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\joblib\parallel.py", line 1587, in _get_outputs
yield from self._retrieve()
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\joblib\parallel.py", line 1691, in _retrieve
self._raise_error_fast()
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\joblib\parallel.py", line 1726, in _raise_error_fast
error_job.get_result(self.timeout)
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\joblib\parallel.py", line 735, in get_result
return self._return_or_raise()
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\EDUARDO\anaconda3\envs\myproject-hierarchical-models\Lib\site-packages\joblib\parallel.py", line 753, in _return_or_raise
raise self._result
AssertionError

@ricardoV94
Member

Looks like a timeout due to the locking mechanism of the C compilation in PyTensor.

Pinging @ferrine @lucianopaz to see if they have any advice. I think this is a known limitation of PyTensor.

Taking a step back, what are you trying to achieve exactly? Run the same model on different datasets?

@PL-EduardoSanchez
Author

> Taking a step back, what are you trying to achieve exactly? Run the same model on different datasets?

Indeed, this is what I'm doing.

@ferrine
Member

ferrine commented Jul 6, 2023

This is a known limitation (compile/cache lock). I suggest using https://github.com/pymc-devs/nutpie, specifically compiled_model.with_data(...).

@PL-EduardoSanchez
Author

Thank you very much, @ferrine. I'll take a look at this solution.

@ricardoV94 changed the title from "BUG: ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding" to "Issue compiling pytensor functions in multiple processes" on Jul 12, 2023
@kshhhv

kshhhv commented Sep 14, 2023

@PL-EduardoSanchez, did you find a solution to this issue? I am getting the same error when running the model on different datasets in parallel.

I also tried increasing PyTensor's compile_timeout, as mentioned here, but it didn't work.

@PL-EduardoSanchez
Author

Hi @kshhhv. I tried "nutpie" with compiled_model(), but I ran into the same kinds of problems.

@isilber

isilber commented Sep 19, 2023

@PL-EduardoSanchez @kshhhv, I came across the same issue when using PyMC in multiple processes.
After quite some trial and error, including several interesting solutions I found here and in other related threads (compile_timeout, reinstalling packages using special directives, etc.), I eventually found a working solution: deleting the temp directory that interrupts PyTensor compilation.
So, in @PL-EduardoSanchez's case, given the error:
filelock._error.Timeout: The file lock 'C:\Users\EDUARDO\AppData\Local\PyTensor\compiledir_Windows-10-10.0.19045-SP0-Intel64_Family_6_Model_158_Stepping_13_GenuineIntel-3.11.4-64.lock' could not be acquired.
try deleting the 'compiledir_Windows-10-10.0.19045-SP0-Intel64_Family_6_Model_158_Stepping_13_GenuineIntel-3.11.4-64' sub-directory. I'm not sure whether this will solve your issue, but it's worth a try.

@PL-EduardoSanchez
Author

Thank you very much, @isilber. Currently I'm not actively working on the project where I used "pymc", but I'll give it a try in the future.

@philpatton

I was having the same issue until trying the solution proposed by @isilber. So far, that seems to have worked. Thank you @isilber, lifesaver!!

@isilber

isilber commented Sep 27, 2023

Thanks, @philpatton.
A quick update. This method started to fail once I spawned 100 processes simultaneously, but I came up with a robust automated workaround: specify a different PyTensor compilation directory for each process (the shared compilation directory is the source of this issue), and delete that directory if it already exists.
As part of a bash script, this would look something like the following:

for i in {1..100}
do
  RUN_NUM=$i
  COMP_DIR="/path/to/.pytensor_compiles"
  COMP_FORM="compiledir_${RUN_NUM}_mcmc"  # compilation directory name format
  if [ -d "$COMP_DIR/$COMP_FORM" ]
  then
    # delete the zombie directory
    rm -r "$COMP_DIR/$COMP_FORM"
  fi
  # set the PyTensor flags for this run only and launch it in the background
  PYTENSOR_FLAGS="compiledir_format=${COMP_FORM},base_compiledir=${COMP_DIR}" \
    python pymc_script_name.py "$RUN_NUM" &
done

@Hope2925

Any ideas on how you would apply this solution when the multiprocessing is done in Python rather than bash? I have some pre-fitting data in Python dictionary format that is identical for all runs, to the point where running each of them separately in bash doesn't make sense.

The relevant multiprocessing code is below:

import multiprocessing as mp

pool = mp.Pool(cpu_num)
res = pool.starmap(
    fit_routine,
    [(i, config, pad_dict) for i in mpargs.items()]
)
pool.close()

fit_routine is the fitting function, and [(i, config, pad_dict) for i in mpargs.items()] builds the arguments for the fitting function.
Thank you for any guidance!

@brendan-m-murphy

brendan-m-murphy commented Apr 27, 2024

@Hope2925 you could write a Python script that loads the dictionary and then samples, and call that script from bash. This is roughly how I run things, except I need to use SLURM on our cluster. But all the Python code goes in a Python script that's called from a bash script. (You need to write the trace to a file in the Python script.)

There's probably a way to set environment variables in multiprocessing, but I'm not familiar with it. Maybe the function you pass to the worker could use 'os.environ' to set the environment variables. Basically, you want to set the PYTENSOR_FLAGS value from PYT_FLG as an environment variable.
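
A minimal sketch of the 'os.environ' idea, assuming PyTensor reads PYTENSOR_FLAGS when it is first imported, so the variable has to be set in each worker before pytensor or pymc is imported there. The helper names (pytensor_flags_for, init_worker), the per-process directory naming, and the /path/to/.pytensor_compiles location are illustrative only, and fit_routine here is a stand-in for the real fitting function. The "spawn" start method is used so that workers start as fresh interpreters; this only helps if the main script does not import pymc at module level.

```python
import multiprocessing as mp
import os

BASE_COMPILEDIR = "/path/to/.pytensor_compiles"  # hypothetical location


def pytensor_flags_for(worker_id, base_dir):
    """Build a PYTENSOR_FLAGS value that gives one worker its own compile dir."""
    return (
        f"compiledir_format=compiledir_{worker_id}_mcmc,"
        f"base_compiledir={base_dir}"
    )


def init_worker(base_dir):
    """Pool initializer: runs once in each worker process before any task."""
    # Assumption: pytensor has not been imported in this process yet.
    os.environ["PYTENSOR_FLAGS"] = pytensor_flags_for(os.getpid(), base_dir)


def fit_routine(item, config, pad_dict):
    """Stand-in for the real fitting function."""
    # Import pymc here, inside the worker, so it sees the flags set above:
    # import pymc as pm
    return os.environ["PYTENSOR_FLAGS"]


if __name__ == "__main__":
    mpargs = {"a": 1, "b": 2}  # placeholder inputs
    config, pad_dict = {}, {}
    ctx = mp.get_context("spawn")  # workers start as fresh interpreters
    with ctx.Pool(2, initializer=init_worker, initargs=(BASE_COMPILEDIR,)) as pool:
        res = pool.starmap(
            fit_routine, [(i, config, pad_dict) for i in mpargs.items()]
        )
    print(res)
```

Because process IDs change between runs, this leaves a new compilation directory behind each time. Using a fixed worker index and deleting any existing directory first, as in the bash script above, would avoid that.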
