Enable Optimizer state and Multi-Fidelity passthrough in SMAC #751
@@ -72,6 +72,7 @@
 "sklearn",
 "skopt",
 "smac",
+"SOBOL",
 "sqlalchemy",
 "srcpaths",
 "subcmd",
@@ -0,0 +1,27 @@
# Optimizers

This directory contains wrappers that integrate different optimizers into MLOS.
Each wrapper is implemented as a subclass of the `BaseOptimizer` class defined in `optimizer.py`.

The main goal of these optimizers is to `suggest` configurations, possibly based on prior trial data, in order to find an optimum for some objective(s).
Callers drive this process through the `register` and `suggest` interfaces.

The following definitions are useful for understanding the implementation:
- `configuration`: a vector representation of a configuration of a system to be evaluated.
- `score`: the objective(s) associated with a configuration.
- `metadata`: additional information about the evaluation, such as the runtime budget used during evaluation.
- `context`: additional (static) information about the evaluation, used to extend the internal model used for suggesting samples.
  For instance, a descriptor of the VM size (vCore count and GB of RAM) and a descriptor of the workload.
  The intent is to allow either sharing or indexing of trial info between "similar" experiments, in order to make the optimization process more efficient for new scenarios.
  > Note: This is not yet implemented.

The interface for these classes can be described as follows:

- `register`: takes a configuration, a score, and optionally metadata about the evaluation, and uses them to update the model for future suggestions.
- `suggest`: returns a new configuration for evaluation.

  Some optimizers also return additional metadata for the evaluation, which should be passed back during the `register` phase.
  This function can also optionally take a context (not yet implemented) and an argument that forces it to return the default configuration.
- `register_pending`: registers a configuration and metadata pair as pending with the optimizer.
- `get_observations`: returns all observations reported to the optimizer as a tuple of DataFrames (config, score, context, metadata).
- `get_best_observations`: returns the best observation(s) as a tuple of (config, score, context, metadata) DataFrames.
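As a rough illustration of the suggest/register round trip described above, the sketch below uses a hypothetical `ToyOptimizer` (not part of MLOS) with plain dicts in place of the pandas DataFrames the real interface uses, and an assumed `"budget"` metadata key. It shows the metadata returned by `suggest` being handed back, unchanged, to `register`, and the `defaults` flag returning the default configuration first.

```python
import random
from typing import Dict, List, Optional, Tuple

Config = Dict[str, float]
Metadata = Optional[Dict[str, float]]


class ToyOptimizer:
    """Hypothetical stand-in for a BaseOptimizer subclass; dicts replace DataFrames."""

    def __init__(self, default: Config, seed: int = 42) -> None:
        self._default = default
        self._rng = random.Random(seed)
        self._observations: List[Tuple[Config, float, Metadata]] = []

    def suggest(self, defaults: bool = False) -> Tuple[Config, Metadata]:
        # Return a configuration plus optional metadata (here, an assumed "budget") for register().
        if defaults:
            return dict(self._default), None
        return {"x": self._rng.uniform(-5.0, 5.0)}, {"budget": 1.0}

    def register(self, config: Config, score: float, metadata: Metadata = None) -> None:
        self._observations.append((config, score, metadata))

    def get_best_observation(self) -> Tuple[Config, float, Metadata]:
        # Lower is better, matching the "smallest score wins" convention of get_best_observations.
        return min(self._observations, key=lambda obs: obs[1])


def objective(config: Config) -> float:
    return (config["x"] - 1.0) ** 2


opt = ToyOptimizer(default={"x": 0.0})
config, metadata = opt.suggest(defaults=True)
opt.register(config, objective(config), metadata)
for _ in range(20):
    config, metadata = opt.suggest()
    # Hand the metadata back unchanged so the optimizer can match the result to its own state.
    opt.register(config, objective(config), metadata)

best_config, best_score, best_metadata = opt.get_best_observation()
```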
@@ -56,9 +56,11 @@ def __init__(self, *,
             raise ValueError("Number of weights must match the number of optimization targets")

         self._space_adapter: Optional[BaseSpaceAdapter] = space_adapter
-        self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]] = []
+        self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]] = []
Review comment: this gets a bit unwieldy. Should we have a named tuple for … e.g.

    class PendingObservation(NamedTuple):
        """A named tuple representing a pending observation."""
        configurations: pd.DataFrame
        context: Optional[pd.DataFrame]
        meta: Optional[pd.DataFrame]

and do the same for …
Review comment: or, maybe, have …
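The named-tuple suggestion above can be sketched with `typing.NamedTuple`. The `Observation` type is hypothetical, as is the `Frame` alias standing in for `pd.DataFrame` so the snippet runs without pandas; the review draft calls the last field `meta`, while this sketch uses the PR's `metadata` parameter name. Named fields let a comprehension read `obs.metadata` instead of positional unpacking such as `for _, _, _, metadata in ...`, which has to change every time a field is added.

```python
from typing import List, NamedTuple, Optional

# Stand-in for pd.DataFrame: one dict per row (assumption, so the sketch runs without pandas).
Frame = List[dict]


class Observation(NamedTuple):
    """Hypothetical named tuple for one register() call."""
    configurations: Frame
    scores: Frame
    context: Optional[Frame] = None
    metadata: Optional[Frame] = None


class PendingObservation(NamedTuple):
    """A named tuple representing a pending observation (no scores yet)."""
    configurations: Frame
    context: Optional[Frame] = None
    metadata: Optional[Frame] = None


observations: List[Observation] = [
    Observation([{"x": 1}], [{"score": 0.5}]),
    Observation([{"x": 2}], [{"score": 0.2}], metadata=[{"budget": 3}]),
]
pending = [PendingObservation([{"x": 3}], metadata=[{"budget": 9}])]

# Access by field name: adding a new field later does not break these comprehensions.
configs = [row for obs in observations for row in obs.configurations]
metadatas = [obs.metadata for obs in observations]
```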
         self._has_context: Optional[bool] = None
-        self._pending_observations: List[Tuple[pd.DataFrame, Optional[pd.DataFrame]]] = []
+        self._pending_observations: List[Tuple[pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]] = []
+        self.delayed_config: Optional[pd.DataFrame] = None
+        self.delayed_metadata: Optional[pd.DataFrame] = None
Review comment on lines +62 to +63 (suggested change): probably should be private, right?

     def __repr__(self) -> str:
         return f"{self.__class__.__name__}(space_adapter={self.space_adapter})"
@@ -69,7 +71,7 @@ def space_adapter(self) -> Optional[BaseSpaceAdapter]:
         return self._space_adapter

     def register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
-                 context: Optional[pd.DataFrame] = None) -> None:
+                 context: Optional[pd.DataFrame] = None, metadata: Optional[pd.DataFrame] = None) -> None:
         """Wrapper method, which employs the space adapter (if any), before registering the configurations and scores.

         Parameters
@@ -78,34 +80,37 @@ def register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
             Dataframe of configurations / parameters. The columns are parameter names and the rows are the configurations.
         scores : pd.DataFrame
             Scores from running the configurations. The index is the same as the index of the configurations.
-
         context : pd.DataFrame
-            Not Yet Implemented.
+            Not implemented yet.
+        metadata : pd.DataFrame
+            Implementation depends on instance (e.g., saved optimizer state to return).
         """
         # Do some input validation.
-        assert set(scores.columns) == set(self._optimization_targets), \
-            "Mismatched optimization targets."
+        assert set(scores.columns) >= set(self._optimization_targets), "Mismatched optimization targets."
         assert self._has_context is None or self._has_context ^ (context is None), \
             "Context must always be added or never be added."
         assert len(configurations) == len(scores), \
             "Mismatched number of configurations and scores."
         if context is not None:
             assert len(configurations) == len(context), \
                 "Mismatched number of configurations and context."
+        if metadata is not None:
+            assert len(configurations) == len(metadata), \
+                "Mismatched number of configurations and metadata."
         assert configurations.shape[1] == len(self.parameter_space.values()), \
             "Mismatched configuration shape."
-        self._observations.append((configurations, scores, context))
+        self._observations.append((configurations, scores, context, metadata))
         self._has_context = context is not None

         if self._space_adapter:
             configurations = self._space_adapter.inverse_transform(configurations)
             assert configurations.shape[1] == len(self.optimizer_parameter_space.values()), \
                 "Mismatched configuration shape after inverse transform."
-        return self._register(configurations, scores, context)
+        return self._register(configurations, scores, context, metadata)

     @abstractmethod
     def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
Review comment (suggested change): Can force the args to be named to help avoid param ordering mistakes.
Review comment: Same for elsewhere (e.g., public methods and suggest), though this might be a larger API change that needs its own PR first in preparation for this one, since callers will also be affected.
Review comment: Agree. Let's fix
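The keyword-only suggestion relies on a bare `*` in the parameter list. A minimal sketch, using a hypothetical `KeywordOnlyOptimizer` and lists of row dicts instead of DataFrames: since `context` and `metadata` are both optional parameters of the same type, a positional call could silently swap them, whereas a keyword-only signature turns any positional call into a `TypeError`.

```python
from typing import List, Optional

Frame = List[dict]  # stand-in for pd.DataFrame (assumption)


class KeywordOnlyOptimizer:
    """Hypothetical sketch: the bare `*` makes every later parameter keyword-only."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def register(self, *, configurations: Frame, scores: Frame,
                 context: Optional[Frame] = None, metadata: Optional[Frame] = None) -> None:
        self.calls.append((configurations, scores, context, metadata))


opt = KeywordOnlyOptimizer()
opt.register(configurations=[{"x": 1}], scores=[{"score": 0.3}], metadata=[{"budget": 9}])

try:
    # Positional arguments are rejected, so context and metadata cannot be swapped by accident.
    opt.register([{"x": 1}], [{"score": 0.3}], None, [{"budget": 9}])
    rejected = False
except TypeError:
    rejected = True
```

As the review notes, every caller would also have to switch to keyword arguments, which is why the change may belong in its own PR.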
-                  context: Optional[pd.DataFrame] = None) -> None:
+                  context: Optional[pd.DataFrame] = None, metadata: Optional[pd.DataFrame] = None) -> None:
         """Registers the given configurations and scores.

         Parameters
@@ -114,13 +119,16 @@ def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
             Dataframe of configurations / parameters. The columns are parameter names and the rows are the configurations.
         scores : pd.DataFrame
             Scores from running the configurations. The index is the same as the index of the configurations.
-
         context : pd.DataFrame
-            Not Yet Implemented.
+            Not implemented yet.
+        metadata : pd.DataFrame
+            Implementation depends on instance.
         """
         pass # pylint: disable=unnecessary-pass # pragma: no cover

-    def suggest(self, context: Optional[pd.DataFrame] = None, defaults: bool = False) -> pd.DataFrame:
+    def suggest(
+        self, context: Optional[pd.DataFrame] = None, defaults: bool = False
+    ) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
         """
         Wrapper method, which employs the space adapter (if any), after suggesting a new configuration.
@@ -136,13 +144,25 @@ def suggest(self, context: Optional[pd.DataFrame] = None, defaults: bool = False
         -------
         configuration : pd.DataFrame
             Pandas dataframe with a single row. Column names are the parameter names.
+        metadata : pd.DataFrame
+            Pandas dataframe with a single row containing the metadata.
+            Column names are the budget, seed, and instance of the evaluation, if valid.
         """
         if defaults:
-            configuration = config_to_dataframe(self.parameter_space.get_default_configuration())
+            self.delayed_config, self.delayed_metadata = self._suggest(context)
+
+            configuration: pd.DataFrame = config_to_dataframe(
+                self.parameter_space.get_default_configuration()
+            )
+            metadata = self.delayed_metadata
Review comment: nit: when creating PRs, try to keep your changes smaller. It's easier to review and debug.
             if self.space_adapter is not None:
                 configuration = self.space_adapter.inverse_transform(configuration)
         else:
-            configuration = self._suggest(context)
+            if self.delayed_config is None:
+                configuration, metadata = self._suggest(context)
+            else:
+                configuration, metadata = self.delayed_config, self.delayed_metadata
+                self.delayed_config, self.delayed_metadata = None, None
         assert len(configuration) == 1, \
             "Suggest must return a single configuration."
         assert set(configuration.columns).issubset(set(self.optimizer_parameter_space)), \
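The `defaults=True` path above asks the backend optimizer for a suggestion anyway, returns the default configuration together with that suggestion's metadata, and caches the backend's configuration for the next non-default call. The sketch below mirrors that control flow in plain Python; `DelayedSuggestSketch` and its counter-based `_suggest` are hypothetical, the cache attributes are private as the review proposes, and the backend is called with `context`, since `metadata` is not yet bound at that point in the `else` branch. As written in the diff, the default configuration and the following suggestion carry the same metadata.

```python
from typing import Dict, Optional, Tuple

Config = Dict[str, float]
Meta = Optional[Dict[str, int]]


class DelayedSuggestSketch:
    """Hypothetical mirror of the delayed_config / delayed_metadata flow in suggest()."""

    def __init__(self, default: Config) -> None:
        self._default = default
        self._calls = 0
        self._delayed_config: Optional[Config] = None
        self._delayed_metadata: Meta = None

    def _suggest(self, context: Optional[dict] = None) -> Tuple[Config, Meta]:
        # Stand-in backend: every call yields a new config and a new metadata record.
        self._calls += 1
        return {"x": float(self._calls)}, {"trial": self._calls}

    def suggest(self, context: Optional[dict] = None, defaults: bool = False) -> Tuple[Config, Meta]:
        if defaults:
            # Query the backend anyway and cache its suggestion for the next call.
            self._delayed_config, self._delayed_metadata = self._suggest(context)
            return dict(self._default), self._delayed_metadata
        if self._delayed_config is None:
            return self._suggest(context)
        # Serve the cached suggestion once, then clear the cache.
        config, metadata = self._delayed_config, self._delayed_metadata
        self._delayed_config, self._delayed_metadata = None, None
        return config, metadata


opt = DelayedSuggestSketch(default={"x": 0.0})
first = opt.suggest(defaults=True)   # default config, metadata of backend call 1
second = opt.suggest()               # cached backend config 1, same metadata
third = opt.suggest()                # fresh backend call 2
```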
@@ -151,10 +171,12 @@ def suggest(self, context: Optional[pd.DataFrame] = None, defaults: bool = False
             configuration = self._space_adapter.transform(configuration)
             assert set(configuration.columns).issubset(set(self.parameter_space)), \
                 "Space adapter produced a configuration that does not match the expected parameter space."
-        return configuration
+        return configuration, metadata

     @abstractmethod
-    def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
+    def _suggest(
+        self, context: Optional[pd.DataFrame] = None
+    ) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
         """Suggests a new configuration.

         Parameters
@@ -166,12 +188,16 @@ def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
         -------
         configuration : pd.DataFrame
             Pandas dataframe with a single row. Column names are the parameter names.
+
+        metadata : pd.DataFrame
+            Pandas dataframe with a single row containing the metadata.
+            Column names are the budget, seed, and instance of the evaluation, if valid.
         """
         pass # pylint: disable=unnecessary-pass # pragma: no cover

     @abstractmethod
     def register_pending(self, configurations: pd.DataFrame,
-                         context: Optional[pd.DataFrame] = None) -> None:
+                         context: Optional[pd.DataFrame] = None, metadata: Optional[pd.DataFrame] = None) -> None:
         """Registers the given configurations as "pending".
         That is to say, it has been suggested by the optimizer, and an experiment trial has been started.
         This can be useful for executing multiple trials in parallel, retry logic, etc.
@@ -181,30 +207,34 @@ def register_pending(self, configurations: pd.DataFrame,
         configurations : pd.DataFrame
             Dataframe of configurations / parameters. The columns are parameter names and the rows are the configurations.
         context : pd.DataFrame
-            Not Yet Implemented.
+            Not implemented yet.
+        metadata : pd.DataFrame
+            Implementation depends on instance.
         """
         pass # pylint: disable=unnecessary-pass # pragma: no cover

-    def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
+    def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
         """
-        Returns the observations as a triplet of DataFrames (config, score, context).
+        Returns the observations as a tuple of DataFrames (config, score, context, metadata).

         Returns
         -------
         observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
-            A triplet of (config, score, context) DataFrames of observations.
+            A tuple of (config, score, context, metadata) DataFrames of observations.
Review comment (suggested change): (or, better yet, a
         """
         if len(self._observations) == 0:
             raise ValueError("No observations registered yet.")
-        configs = pd.concat([config for config, _, _ in self._observations]).reset_index(drop=True)
-        scores = pd.concat([score for _, score, _ in self._observations]).reset_index(drop=True)
+        configs = pd.concat([config for config, _, _, _ in self._observations]).reset_index(drop=True)
+        scores = pd.concat([score for _, score, _, _ in self._observations]).reset_index(drop=True)
Review comment: Yeah, let's have a
         contexts = pd.concat([pd.DataFrame() if context is None else context
-                              for _, _, context in self._observations]).reset_index(drop=True)
-        return (configs, scores, contexts if len(contexts.columns) > 0 else None)
+                              for _, _, context, _ in self._observations]).reset_index(drop=True)
+        metadatas = pd.concat([pd.DataFrame() if metadata is None else metadata
+                               for _, _, _, metadata in self._observations]).reset_index(drop=True)
+        return (configs, scores, contexts if len(contexts.columns) > 0 else None, metadatas if len(metadatas.columns) > 0 else None)

-    def get_best_observations(self, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
+    def get_best_observations(self, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
         """
-        Get the N best observations so far as a triplet of DataFrames (config, score, context).
+        Get the N best observations so far as a tuple of DataFrames (config, score, context, metadata).
         Default is N=1. The columns are ordered in ASCENDING order of the optimization targets.
         The function uses `pandas.DataFrame.nsmallest(..., keep="first")` method under the hood.

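In `get_observations`, the `len(... .columns) > 0` check turns an optional field that was never supplied into `None` instead of an empty frame, and `get_best_observations` relies on that `None` before indexing with `.loc[idx]`; the check therefore needs to cover `contexts` and `metadatas` alike. A rough stdlib analogue, with lists of row dicts standing in for DataFrames (an assumption) and a hypothetical `concat_optional` helper that tests for rows rather than columns:

```python
from typing import List, Optional, Tuple

Rows = List[dict]  # stand-in for a DataFrame: one dict per row (assumption)
Record = Tuple[Rows, Rows, Optional[Rows], Optional[Rows]]


def concat_optional(blocks: List[Optional[Rows]]) -> Optional[Rows]:
    """Hypothetical helper: concatenate per-call blocks, or return None if none were supplied."""
    merged = [row for block in blocks if block is not None for row in block]
    return merged if merged else None


def get_observations(observations: List[Record]) -> Record:
    if not observations:
        raise ValueError("No observations registered yet.")
    configs = [row for config, _, _, _ in observations for row in config]
    scores = [row for _, score, _, _ in observations for row in score]
    # Apply the same None handling to both optional fields.
    contexts = concat_optional([context for _, _, context, _ in observations])
    metadatas = concat_optional([metadata for _, _, _, metadata in observations])
    return configs, scores, contexts, metadatas


result = get_observations([
    ([{"x": 1}], [{"score": 0.4}], None, [{"budget": 1}]),
    ([{"x": 2}], [{"score": 0.1}], None, [{"budget": 3}]),
])
```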
@@ -215,15 +245,15 @@ def get_best_observations(self, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFr

         Returns
         -------
-        observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
-            A triplet of best (config, score, context) DataFrames of best observations.
+        observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]
+            A tuple of (config, score, context, metadata) DataFrames of the best observations.
         """
         if len(self._observations) == 0:
             raise ValueError("No observations registered yet.")
-        (configs, scores, contexts) = self.get_observations()
+        (configs, scores, contexts, metadatas) = self.get_observations()
         idx = scores.nsmallest(n_max, columns=self._optimization_targets, keep="first").index
         return (configs.loc[idx], scores.loc[idx],
-                None if contexts is None else contexts.loc[idx])
+                None if contexts is None else contexts.loc[idx], None if metadatas is None else metadatas.loc[idx])

     def cleanup(self) -> None:
         """