Skip to content

Add example for get_cache_data() #625

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 29, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 10 additions & 2 deletions executorlib/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,11 +9,19 @@
SlurmJobExecutor,
)

__version__ = _get_versions()["version"]
__all__: list = [
__all__: list[str] = [
"FluxJobExecutor",
"FluxClusterExecutor",
"SingleNodeExecutor",
"SlurmJobExecutor",
"SlurmClusterExecutor",
]

try:
from executorlib.standalone.hdf import get_cache_data
except ImportError:
pass

Check warning on line 23 in executorlib/__init__.py

View check run for this annotation

Codecov / codecov/patch

executorlib/__init__.py#L22-L23

Added lines #L22 - L23 were not covered by tests
else:
__all__ += ["get_cache_data"]

__version__ = _get_versions()["version"]
187 changes: 136 additions & 51 deletions notebooks/1-single-node.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -26,13 +26,17 @@
"id": "b1907f12-7378-423b-9b83-1b65fc0a20f5",
"metadata": {},
"outputs": [],
"source": "from executorlib import SingleNodeExecutor"
"source": [
"from executorlib import SingleNodeExecutor"
]
},
{
"cell_type": "markdown",
"id": "1654679f-38b3-4699-9bfe-b48cbde0b2db",
"metadata": {},
"source": "It is recommended to use the `SingleNodeExecutor` class in combination with a `with`-statement. This guarantees the processes created by the `SingleNodeExecutor` class to evaluate the Python functions are afterward closed and do not remain ghost processes. A function is then submitted using the `submit(fn, /, *args, **kwargs)` function which executes a given function `fn` as `fn(*args, **kwargs)`. The `submit()` function returns a [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object, as defined by the Python Standard Library. As a first example we submit the function `sum()` to calculate the sum of the list `[1, 1]`:"
"source": [
"It is recommended to use the `SingleNodeExecutor` class in combination with a `with`-statement. This guarantees the processes created by the `SingleNodeExecutor` class to evaluate the Python functions are afterward closed and do not remain ghost processes. A function is then submitted using the `submit(fn, /, *args, **kwargs)` function which executes a given function `fn` as `fn(*args, **kwargs)`. The `submit()` function returns a [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object, as defined by the Python Standard Library. As a first example we submit the function `sum()` to calculate the sum of the list `[1, 1]`:"
]
},
{
"cell_type": "code",
Expand All @@ -45,8 +49,8 @@
"output_type": "stream",
"text": [
"2\n",
"CPU times: user 100 ms, sys: 70.7 ms, total: 171 ms\n",
"Wall time: 1.94 s\n"
"CPU times: user 84.4 ms, sys: 59.3 ms, total: 144 ms\n",
"Wall time: 482 ms\n"
]
}
],
Expand All @@ -61,7 +65,9 @@
"cell_type": "markdown",
"id": "a1109584-9db2-4f9d-b3ed-494d96241396",
"metadata": {},
"source": "As expected the result of the summation `sum([1, 1])` is `2`. The same result is retrieved from the [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object received from the submission of the `sum()` as it is printed here `print(future.result())`. For most Python functions and especially the `sum()` function it is computationally not efficient to initialize the `SingleNodeExecutor` class only for the execution of a single function call, rather it is more computationally efficient to initialize the `SingleNodeExecutor` class once and then submit a number of functions. This can be achieved with a loop. For example the sum of the pairs `[2, 2]`, `[3, 3]` and `[4, 4]` can be achieved with a for-loop inside the context of the `SingleNodeExecutor()` class as provided by the `with`-statement."
"source": [
"As expected the result of the summation `sum([1, 1])` is `2`. The same result is retrieved from the [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object received from the submission of the `sum()` as it is printed here `print(future.result())`. For most Python functions and especially the `sum()` function it is computationally not efficient to initialize the `SingleNodeExecutor` class only for the execution of a single function call, rather it is more computationally efficient to initialize the `SingleNodeExecutor` class once and then submit a number of functions. This can be achieved with a loop. For example the sum of the pairs `[2, 2]`, `[3, 3]` and `[4, 4]` can be achieved with a for-loop inside the context of the `SingleNodeExecutor()` class as provided by the `with`-statement."
]
},
{
"cell_type": "code",
Expand All @@ -74,8 +80,8 @@
"output_type": "stream",
"text": [
"[4, 6, 8]\n",
"CPU times: user 49.4 ms, sys: 29.2 ms, total: 78.7 ms\n",
"Wall time: 1.75 s\n"
"CPU times: user 39.7 ms, sys: 26.8 ms, total: 66.5 ms\n",
"Wall time: 524 ms\n"
]
}
],
Expand Down Expand Up @@ -105,8 +111,8 @@
"output_type": "stream",
"text": [
"[10, 12, 14]\n",
"CPU times: user 40.5 ms, sys: 28.1 ms, total: 68.6 ms\n",
"Wall time: 1.09 s\n"
"CPU times: user 28 ms, sys: 23.1 ms, total: 51.1 ms\n",
"Wall time: 517 ms\n"
]
}
],
Expand All @@ -121,7 +127,9 @@
"cell_type": "markdown",
"id": "ac86bf47-4eb6-4d7c-acae-760b880803a8",
"metadata": {},
"source": "These three examples cover the general functionality of the `SingleNodeExecutor` class. Following the [Executor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor) interface as it is defined in the Python standard library."
"source": [
"These three examples cover the general functionality of the `SingleNodeExecutor` class. Following the [Executor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor) interface as it is defined in the Python standard library."
]
},
{
"cell_type": "markdown",
Expand Down Expand Up @@ -349,8 +357,8 @@
"output_type": "stream",
"text": [
"2\n",
"CPU times: user 37.1 ms, sys: 21.8 ms, total: 58.9 ms\n",
"Wall time: 1.09 s\n"
"CPU times: user 31.1 ms, sys: 19.1 ms, total: 50.1 ms\n",
"Wall time: 394 ms\n"
]
}
],
Expand Down Expand Up @@ -388,7 +396,9 @@
"cell_type": "markdown",
"id": "9e1212c4-e3fb-4e21-be43-0a4f0a08b856",
"metadata": {},
"source": "Still the resource dictionary parameter can still be set during the initialisation of the `SingleNodeExecutor` class. Internally, this groups the created Python processes in fixed allocations and afterwards submit Python functions to these allocations."
"source": [
"Still the resource dictionary parameter can still be set during the initialisation of the `SingleNodeExecutor` class. Internally, this groups the created Python processes in fixed allocations and afterwards submit Python functions to these allocations."
]
},
{
"cell_type": "code",
Expand All @@ -413,34 +423,7 @@
"experience performance degradation.\n",
"\n",
" Local host: MacBook-Pro.local\n",
" System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22031/1/vader_segment.MacBook-Pro.501.17620001.1\n",
" Error: No such file or directory (errno 2)\n",
"--------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------\n",
"A system call failed during shared memory initialization that should\n",
"not have. It is likely that your MPI job will now either abort or\n",
"experience performance degradation.\n",
"\n",
" Local host: MacBook-Pro.local\n",
" System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22028/1/vader_segment.MacBook-Pro.501.17610001.1\n",
" Error: No such file or directory (errno 2)\n",
"--------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------\n",
"A system call failed during shared memory initialization that should\n",
"not have. It is likely that your MPI job will now either abort or\n",
"experience performance degradation.\n",
"\n",
" Local host: MacBook-Pro.local\n",
" System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22030/1/vader_segment.MacBook-Pro.501.17630001.1\n",
" Error: No such file or directory (errno 2)\n",
"--------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------\n",
"A system call failed during shared memory initialization that should\n",
"not have. It is likely that your MPI job will now either abort or\n",
"experience performance degradation.\n",
"\n",
" Local host: MacBook-Pro.local\n",
" System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.22029/1/vader_segment.MacBook-Pro.501.17600001.1\n",
" System call: unlink(2) /var/folders/z7/3vhrmssx60v240x_ndq448h80000gn/T//ompi.MacBook-Pro.501/pid.55070/1/vader_segment.MacBook-Pro.501.96730001.1\n",
" Error: No such file or directory (errno 2)\n",
"--------------------------------------------------------------------------\n"
]
Expand Down Expand Up @@ -486,7 +469,9 @@
"cell_type": "markdown",
"id": "d07cf107-3627-4cb0-906c-647497d6e0d2",
"metadata": {},
"source": "The function `calc_with_preload()` requires three inputs `i`, `j` and `k`. But when the function is submitted to the executor only two inputs are provided `fs = exe.submit(calc, 2, j=5)`. In this case the first input parameter is mapped to `i=2`, the second input parameter is specified explicitly `j=5` but the third input parameter `k` is not provided. So the `SingleNodeExecutor` automatically checks the keys set in the `init_function()` function. In this case the returned dictionary `{\"j\": 4, \"k\": 3, \"l\": 2}` defines `j=4`, `k=3` and `l=2`. For this specific call of the `calc_with_preload()` function, `i` and `j` are already provided so `j` is not required, but `k=3` is used from the `init_function()` and as the `calc_with_preload()` function does not define the `l` parameter this one is also ignored."
"source": [
"The function `calc_with_preload()` requires three inputs `i`, `j` and `k`. But when the function is submitted to the executor only two inputs are provided `fs = exe.submit(calc, 2, j=5)`. In this case the first input parameter is mapped to `i=2`, the second input parameter is specified explicitly `j=5` but the third input parameter `k` is not provided. So the `SingleNodeExecutor` automatically checks the keys set in the `init_function()` function. In this case the returned dictionary `{\"j\": 4, \"k\": 3, \"l\": 2}` defines `j=4`, `k=3` and `l=2`. For this specific call of the `calc_with_preload()` function, `i` and `j` are already provided so `j` is not required, but `k=3` is used from the `init_function()` and as the `calc_with_preload()` function does not define the `l` parameter this one is also ignored."
]
},
{
"cell_type": "code",
Expand Down Expand Up @@ -538,8 +523,8 @@
"output_type": "stream",
"text": [
"[2, 4, 6]\n",
"CPU times: user 547 ms, sys: 161 ms, total: 708 ms\n",
"Wall time: 1.33 s\n"
"CPU times: user 512 ms, sys: 138 ms, total: 650 ms\n",
"Wall time: 865 ms\n"
]
}
],
Expand Down Expand Up @@ -571,8 +556,8 @@
"output_type": "stream",
"text": [
"[2, 4, 6]\n",
"CPU times: user 52.1 ms, sys: 41.1 ms, total: 93.2 ms\n",
"Wall time: 1.13 s\n"
"CPU times: user 56.7 ms, sys: 32.5 ms, total: 89.2 ms\n",
"Wall time: 620 ms\n"
]
}
],
Expand All @@ -583,6 +568,106 @@
" print([f.result() for f in future_lst])"
]
},
{
"cell_type": "markdown",
"id": "5144a035-633e-4e60-a362-f3b15b28848b",
"metadata": {},
"source": [
"An additional advantage of the cache is the option to gather the results of previously submitted functions. Using the `get_cache_data()` function the results of each Python function is converted to a dictionary. This list of dictionaries can be converted to a `pandas.DataFrame` for further processing:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "f574b9e1-de55-4e38-aef7-a4bed540e040",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>function</th>\n",
" <th>input_args</th>\n",
" <th>input_kwargs</th>\n",
" <th>output</th>\n",
" <th>runtime</th>\n",
" <th>filename</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>&lt;built-in function sum&gt;</td>\n",
" <td>([1, 1],)</td>\n",
" <td>{}</td>\n",
" <td>2</td>\n",
" <td>0.001686</td>\n",
" <td>sum0d968285d17368d1c34ea7392309bcc5.h5out</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>&lt;built-in function sum&gt;</td>\n",
" <td>([3, 3],)</td>\n",
" <td>{}</td>\n",
" <td>6</td>\n",
" <td>0.136151</td>\n",
" <td>sum0102e33bb2921ae07a3bbe3db5d3dec9.h5out</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>&lt;built-in function sum&gt;</td>\n",
" <td>([2, 2],)</td>\n",
" <td>{}</td>\n",
" <td>4</td>\n",
" <td>0.136006</td>\n",
" <td>sum6270955d7c8022a0c1027aafaee64439.h5out</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" function input_args input_kwargs output runtime \\\n",
"0 <built-in function sum> ([1, 1],) {} 2 0.001686 \n",
"1 <built-in function sum> ([3, 3],) {} 6 0.136151 \n",
"2 <built-in function sum> ([2, 2],) {} 4 0.136006 \n",
"\n",
" filename \n",
"0 sum0d968285d17368d1c34ea7392309bcc5.h5out \n",
"1 sum0102e33bb2921ae07a3bbe3db5d3dec9.h5out \n",
"2 sum6270955d7c8022a0c1027aafaee64439.h5out "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas\n",
"from executorlib import get_cache_data\n",
"\n",
"df = pandas.DataFrame(get_cache_data(cache_directory=\"./cache\"))\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "68092479-e846-494a-9ac9-d9638b102bd8",
Expand All @@ -593,15 +678,15 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 20,
"id": "34a9316d-577f-4a63-af14-736fb4e6b219",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['sumb6a5053f96b7031239c2e8d0e7563ce4.h5out', 'sum5171356dfe527405c606081cfbd2dffe.h5out', 'sumd1bf4ee658f1ac42924a2e4690e797f4.h5out']\n"
"['sum0d968285d17368d1c34ea7392309bcc5.h5out', 'sum0102e33bb2921ae07a3bbe3db5d3dec9.h5out', 'sum6270955d7c8022a0c1027aafaee64439.h5out']\n"
]
}
],
Expand Down Expand Up @@ -637,7 +722,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 21,
"id": "d8b75a26-479d-405e-8895-a8d56b3f0f4b",
"metadata": {},
"outputs": [],
Expand All @@ -658,7 +743,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 22,
"id": "35fd5747-c57d-4926-8d83-d5c55a130ad6",
"metadata": {},
"outputs": [
Expand Down Expand Up @@ -692,7 +777,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 23,
"id": "f67470b5-af1d-4add-9de8-7f259ca67324",
"metadata": {},
"outputs": [
Expand Down
2 changes: 1 addition & 1 deletion tests/test_singlenodeexecutor_cache.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
from executorlib.standalone.serialize import cloudpickle_register

try:
from executorlib.standalone.hdf import get_cache_data
from executorlib import get_cache_data

skip_h5py_test = False
except ImportError:
Expand Down
Loading