docs/source/usage/database_credential_file.rst (1 addition, 1 deletion)
@@ -19,7 +19,7 @@ Below is an example of a database credential file that connects to a server with …
 server: example.mysqlserver.com

 However, for security reasons, databases might only be accessible from a specific IP address. In these cases, one can use an SSH jumphost. This means that ``PyExperimenter`` will first connect to the SSH server
-that has access to the database and then connect to the database server from there. This is done by adding an additional ``Ssh`` section to the database credential file.
+that has access to the database and then connect to the database server from there. This is done by adding an additional ``Ssh`` section to the database credential file, and can be activated either by a ``PyExperimenter`` keyword argument or in the :ref:`experimenter configuration file <experiment_configuration_file>`.

 The following example shows how to connect to a database server using an SSH server with the address ``ssh_hostname`` and the port ``optional_ssh_port``.
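The example file itself is elided from this diff view. The sketch below shows a plausible shape for such a credential file; apart from ``server``, ``ssh_hostname``, and ``optional_ssh_port``, the field names are assumptions rather than the project's verbatim schema:

.. code-block:: yaml

    CREDENTIALS:
      Database:
        user: example_user            # assumed field name
        password: example_password    # assumed field name
      Connection:
        Standard:
          server: example.mysqlserver.com
        Ssh:
          address: ssh_hostname       # SSH jumphost to connect through
          port: optional_ssh_port     # may be omitted if the default applies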
docs/source/usage/… (new file)

To distribute the execution of experiments across multiple machines, you can follow the standard :ref:`procedure of using PyExperimenter <execution>`, with the following additional considerations.

--------------
Database Setup
--------------

You need a shared database that is accessible to all machines and supports concurrent access. ``SQLite`` is therefore not a good choice for this purpose, which is why we recommend using a ``MySQL`` database instead.
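Concretely, this means pointing the ``Database`` section of the :ref:`experiment configuration file <experiment_configuration_file>` at MySQL. A minimal fragment, mirroring the configuration excerpt further down in this diff:

.. code-block:: yaml

    Database:
      provider: mysql          # instead of the default sqlite
      database: py_experimenter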
--------
Workflow
--------

While it is theoretically possible for multiple jobs to create new experiments, this introduces the possibility of creating the same experiment multiple times. To prevent this, we recommend the following workflow, in which a process is either the ``database handler``, i.e. responsible for creating/resetting experiments, or an ``experiment executer``, which actually executes experiments.
.. note::
    Make sure to use the same :ref:`experiment configuration file <experiment_configuration_file>` and :ref:`database credential file <database_credential_file>` for both process types.
-----------------
Database Handling
-----------------

The ``database handler`` process creates/resets the experiments and stores them in the database once in advance.

.. code-block:: python

    from py_experimenter.experimenter import PyExperimenter
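    # The diff view elides the rest of this block; what follows is a speculative
    # completion, assuming the standard PyExperimenter entry points and using
    # placeholder file names.
    experimenter = PyExperimenter(
        experiment_configuration_file_path="config/experiment_configuration.yml",
        database_credential_file_path="config/database_credentials.yml",
    )

    # Create all open experiments from the keyfield values defined in the
    # experiment configuration file; run once, before any executer starts.
    experimenter.fill_table_from_config()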
Multiple ``experiment executer`` processes execute the experiments in parallel on different machines, all using the same code. In a typical HPC context, each job starts a single ``experiment executer`` process on a different node.

.. code-block:: python

    from py_experimenter.experimenter import PyExperimenter
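    # Again elided in the diff view; a speculative completion, where
    # ``run_experiment`` stands in for the user-defined experiment function.
    experimenter = PyExperimenter(
        experiment_configuration_file_path="config/experiment_configuration.yml",
        database_credential_file_path="config/database_credentials.yml",
    )

    # Pull and execute open experiments until none are left.
    experimenter.execute(run_experiment, max_experiments=-1)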
When executing jobs on clusters, one might want to use `hydra combined with submitit <hydra_submitit_>`_ or similar software that configures different jobs. If so, it makes sense to create the database initially.
docs/source/usage/execution.rst (76 additions, 1 deletion)
@@ -109,6 +109,78 @@ An experiment can be executed easily with the following call:
- ``max_experiments`` determines how many experiments will be executed by this ``PyExperimenter``. If set to ``-1``, it will execute experiments sequentially until no more open experiments are available.
- ``random_order`` determines whether the experiments are executed in a random order. By default, the parameter is set to ``False``, meaning that experiments are executed ordered by their ``id``.
.. _add_experiment_and_execute:

--------------------------
Add Experiment and Execute
--------------------------

Instead of filling the database table with rows and then executing the experiments, it is also possible to add an experiment and execute it directly. This can be done with the following call:
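The call itself is elided from this diff view; the sketch below assumes the method mirrors the section title and takes the keyfield values together with the experiment function (parameter names and keyfields are assumptions):

.. code-block:: python

    # Hypothetical invocation; keyfield names are illustrative.
    experimenter.add_experiment_and_execute(
        keyfield_values={"dataset": "iris", "seed": 42},
        experiment_function=run_experiment,
    )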
This function may be useful in the case of dependencies, where the result of one experiment is needed to configure the next one, or if the experiments are supposed to be configured with software such as `Hydra <hydra_>`_.
.. _attach:

----------------------------
Attach to Running Experiment
----------------------------

In multiprocessing settings, where the ``experiment_function`` contains a main job that runs multiple additional workers in other processes (possibly on different machines), it is inconvenient to log all information through the main job. Therefore, we allow these workers to also attach to the database and log their information about the same experiment.

First, a worker experiment function wrapper has to be defined, which handles the parallel execution in a different process. The actual worker experiment function is defined inside the wrapper. The worker function is then attached to the experiment and logs its information on its own. If more arguments are needed within the worker function, they can be passed to the wrapper function as keyword arguments.
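The wrapper itself is elided from this diff view; below is a sketch of the described structure, assuming ``experimenter.attach`` takes the worker function and the ``experiment_id``, and that logging goes through the usual ``ResultProcessor`` (the log table and columns are illustrative):

.. code-block:: python

    from py_experimenter.result_processor import ResultProcessor

    def worker_experiment_function_wrapper(experiment_id: int, **kwargs):
        # Defined inside the wrapper so it can close over ``kwargs``.
        def worker_experiment_function(result_processor: ResultProcessor):
            # ... actual work of this worker ...
            result_processor.process_logs({"worker_log": {"status": "done"}})
            return "worker_result"

        # Attach to the already running experiment and log independently.
        return experimenter.attach(worker_experiment_function, experiment_id)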
The ``experimenter.attach`` function returns the result of ``worker_experiment_function``.
Second, the main experiment function has to be defined, which calls the wrapper created above, provides it with the ``experiment_id``, and starts it in a different process:
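The corresponding snippet is likewise elided; a sketch under the assumption that the experiment id is exposed by the result processor (how it is actually obtained may differ):

.. code-block:: python

    from multiprocessing import Process

    def main_experiment_function(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
        # Assumption: the running experiment's id is available on the result processor.
        worker = Process(
            target=worker_experiment_function_wrapper,
            args=(result_processor.experiment_id,),
        )
        worker.start()
        # ... the main job's own work and logging ...
        worker.join()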
docs/source/usage/experiment_configuration_file.rst (2 additions, 0 deletions)
@@ -14,6 +14,7 @@ The experiment configuration file is primarily used to define the database backend
 Database:
   provider: sqlite
   database: py_experimenter
+  use_ssh_tunnel: False
   table:
     name: example_general_usage
     keyfields:
@@ -69,6 +70,7 @@ The ``Database`` section defines the database and its structure.
- ``provider``: The provider of the database connection. Currently, ``sqlite`` and ``mysql`` are supported. In the case of ``mysql``, an additional :ref:`database credential file <database_credential_file>` has to be created.
- ``database``: The name of the database to create or connect to.
- ``use_ssh_tunnel`` (added): Flag deciding whether the database is connected to via SSH, as defined in the :ref:`database credential file <database_credential_file>`. This is ignored if ``sqlite`` is chosen as provider. Optional parameter; defaults to ``False``. Both ways of activating the tunnel are sketched after this list.
- ``table``: Defines the structure and predefined values for the experiment table.
- ``name``: The name of the experiment table to create or connect to.
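As noted earlier, the tunnel can be activated either in the configuration file (``use_ssh_tunnel: True``) or via a ``PyExperimenter`` keyword argument; the sketch below assumes the keyword argument shares the config option's name:

.. code-block:: python

    from py_experimenter.experimenter import PyExperimenter

    # Assumption: the keyword argument mirrors the ``use_ssh_tunnel`` config option.
    experimenter = PyExperimenter(
        experiment_configuration_file_path="config/experiment_configuration.yml",
        database_credential_file_path="config/database_credentials.yml",
        use_ssh_tunnel=True,
    )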