Commit 9851b1b

Document scikit-learn develop guide
1 parent a82300c commit 9851b1b

1 file changed, +5 -0 lines changed

docs/dev.rst

Lines changed: 5 additions & 0 deletions
@@ -26,6 +26,8 @@ translated to a pipeline configuration, fitted and saved to disc using the funct
reads a dataset from disk, fits a pipeline, and collects the performance result, which is communicated back to the main process via a Queue. This worker manages
resources using `Pynisher <https://github.com/automl/pynisher>`_, and it usually does so by creating a new process.

+The scikit-learn pipeline inherits from `BaseEstimator <https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html>`_, which means that we have to honor the `scikit-learn development guidelines <https://scikit-learn.org/stable/developers/develop.html>`_. Of particular interest is that any estimator must store the arguments its class constructor receives as attributes of the same name (see ``get_params`` and ``set_params`` in the documentation linked above).
+
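The constructor convention can be illustrated with a small stdlib-only sketch. It is a simplified imitation of the ``get_params``/``set_params`` contract, not scikit-learn's actual implementation, and ``ParamsMixin`` and ``ToyPipeline`` are hypothetical names used only for illustration.

```python
import inspect


class ParamsMixin:
    """Simplified, stdlib-only imitation of the get_params/set_params contract."""

    def get_params(self):
        # Read the constructor's argument names, then look each one up
        # as an attribute of the same name on the instance.
        names = [
            name
            for name in inspect.signature(type(self).__init__).parameters
            if name != "self"
        ]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self


class ToyPipeline(ParamsMixin):  # hypothetical estimator, for illustration only
    def __init__(self, n_layers=2, dropout=0.5):
        # Store every constructor argument unchanged under the same name.
        self.n_layers = n_layers
        self.dropout = dropout


pipeline = ToyPipeline(n_layers=3)
params = pipeline.get_params()
clone = ToyPipeline(**params)  # rebuilding from params only works if the convention holds
```

If ``__init__`` stored ``n_layers`` under a different attribute name, ``get_params`` in this sketch would raise an ``AttributeError``, which is why the convention matters for anything that copies or reconfigures an estimator.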
Regarding multiprocessing, AutoPyTorch and SMAC work with `Dask.distributed <https://distributed.dask.org/en/latest/>`_. We only submit jobs to Dask up to the number of
workers, and wait for a worker to become available before continuing.
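The throttling pattern described above (never more jobs in flight than workers) can be sketched as follows. This is a minimal illustration that uses the standard library's ``concurrent.futures`` as a stand-in for Dask, so it runs without a cluster; ``run_throttled`` and ``evaluate`` are hypothetical names, and the real scheduling code in AutoPyTorch and SMAC may differ.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_throttled(func, inputs, n_workers):
    """Keep at most n_workers jobs in flight; block until a slot frees up."""
    results = []
    pending = set()
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        for item in inputs:
            if len(pending) >= n_workers:
                # All workers are busy: wait for at least one job to finish.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                results.extend(future.result() for future in done)
            pending.add(executor.submit(func, item))
        # Collect whatever is still running once every job was submitted.
        done, _ = wait(pending)
        results.extend(future.result() for future in done)
    return results


def evaluate(config_id):  # hypothetical stand-in for fitting one pipeline
    return config_id * config_id


scores = run_throttled(evaluate, range(6), n_workers=2)
```

Results arrive in completion order rather than submission order, so a caller that needs to match results to jobs should return an identifier alongside each result.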

@@ -46,3 +48,6 @@ The AutoML Part

The ensemble builder and the individual model constructions are both regulated by the `BaseTask`. This entity fundamentally calls the aforementioned task and waits until
the time resource is exhausted.
+
+We also rely on the `ConfigSpace <https://automl.github.io/ConfigSpace/master/index.html>`_ package to build a configuration space and sample configurations from it. A configuration, in this context, determines the content of a pipeline (for example, that the final estimator will be an MLP, or that it will use PCA for preprocessing). The set of valid configurations is determined by the configuration space. The configuration space is built using the dataset characteristics, such as the type
+of features (categorical, numerical) or the target type (classification, regression).
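To make the relationship between dataset characteristics, configuration space, and sampled configuration concrete, here is a toy stdlib-only sketch. It deliberately does not use the ConfigSpace API; the function names and the listed choices (other than MLP and PCA, which the text mentions) are assumptions made for illustration.

```python
import random


def build_search_space(feature_types, target_type):
    """Toy configuration space derived from dataset characteristics."""
    space = {
        "estimator": ["MLP", "ResNet"],   # hypothetical choices
        "preprocessor": ["none", "PCA"],
    }
    if "categorical" in feature_types:
        # An encoder is only a valid choice when categorical features exist.
        space["encoder"] = ["one_hot", "ordinal"]
    if target_type == "classification":
        space["loss"] = ["cross_entropy"]
    else:
        space["loss"] = ["mse", "l1"]
    return space


def sample_configuration(space, rng):
    """Draw one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(choices) for name, choices in space.items()}


def is_valid(space, config):
    """A configuration is valid if it assigns every hyperparameter an allowed value."""
    return config.keys() == space.keys() and all(
        config[name] in space[name] for name in space
    )


rng = random.Random(0)
space = build_search_space({"numerical"}, "regression")
config = sample_configuration(space, rng)
```

The real package additionally supports numerical ranges, conditions, and forbidden combinations, which this sketch leaves out.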
