focus-opt is a Python package for performing multi-fidelity hyperparameter optimization on machine learning models. It implements optimization algorithms such as Hill Climbing and Genetic Algorithms with support for multi-fidelity evaluations. This allows for efficient exploration of hyperparameter spaces by evaluating configurations at varying levels of fidelity, balancing computational cost and optimization accuracy.
The package is designed to be flexible and extensible, enabling users to define custom hyperparameter spaces and evaluation functions for different machine learning models. In this guide, we'll demonstrate how to install and use focus-opt with a Decision Tree Classifier on the Breast Cancer Wisconsin dataset.
You can install focus-opt directly from PyPI:
```bash
pip install focus-opt
```

Alternatively, if you want to work with the latest version from the repository:

```bash
git clone https://github.com/eliottkalfon/focus_opt.git
cd focus_opt
pip install .
```

It's recommended to use a virtual environment to manage dependencies.
Create a virtual environment using venv:

```bash
python -m venv venv
```

Activate the virtual environment:

- On Windows:

  ```bash
  venv\Scripts\activate
  ```

- On Unix or Linux:

  ```bash
  source venv/bin/activate
  ```
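To confirm the installation, you can try importing the package (the module is named `focus_opt`, as in the examples below):

```bash
python -c "import focus_opt"
```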
Below is an example of how to use focus-opt to perform hyperparameter optimization on a Decision Tree Classifier using both Hill Climbing and Genetic Algorithms.

```python
import logging
from typing import Dict, Any
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
# Import classes from focus_opt package
from focus_opt.hp_space import (
HyperParameterSpace,
CategoricalHyperParameter,
OrdinalHyperParameter,
ContinuousHyperParameter,
)
from focus_opt.optimizers import HillClimbingOptimizer, GeneticAlgorithmOptimizer
# Set up logging
logging.basicConfig(level=logging.INFO)
# Define the hyperparameter space for the Decision Tree Classifier
hp_space = HyperParameterSpace("Decision Tree Hyperparameter Space")
hp_space.add_hp(CategoricalHyperParameter(name="criterion", values=["gini", "entropy"]))
hp_space.add_hp(CategoricalHyperParameter(name="splitter", values=["best", "random"]))
hp_space.add_hp(
OrdinalHyperParameter(name="max_depth", values=[None] + list(range(1, 21)))
)
hp_space.add_hp(
ContinuousHyperParameter(
name="min_samples_split", min_value=2, max_value=20, is_int=True
)
)
hp_space.add_hp(
ContinuousHyperParameter(
name="min_samples_leaf", min_value=1, max_value=20, is_int=True
)
)
# sklearn expects a float max_features in (0.0, 1.0], so keep the lower bound above zero
hp_space.add_hp(
    ContinuousHyperParameter(name="max_features", min_value=0.1, max_value=1.0)
)
# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Define the evaluation function
def dt_evaluation(config: Dict[str, Any], fidelity: int) -> float:
"""
Evaluation function for a Decision Tree Classifier with cross-validation.
Args:
config (Dict[str, Any]): Hyperparameter configuration.
fidelity (int): Fidelity level (index of the cross-validation fold).
Returns:
float: Accuracy for the specified cross-validation fold.
"""
logging.info(f"Evaluating config: {config} at fidelity level: {fidelity}")
# Initialize the classifier with the given hyperparameters
clf = DecisionTreeClassifier(random_state=42, **config)
# Stratified K-Fold Cross-Validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Get the train and test indices for the specified fold
for fold_index, (train_index, test_index) in enumerate(skf.split(X, y)):
if fold_index + 1 == fidelity:
X_train, X_test = X[train_index], X[test_index]
y_train, y_test = y[train_index], y[test_index]
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
logging.info(f"Score for config {config} at fold {fidelity}: {score}")
return score
raise ValueError(f"Invalid fidelity level: {fidelity}")
# Instantiate the Hill Climbing Optimizer
hill_climbing_optimizer = HillClimbingOptimizer(
hp_space=hp_space,
evaluation_function=dt_evaluation,
max_fidelity=5, # Number of cross-validation folds
maximize=True, # We aim to maximize accuracy
log_results=True,
warm_start=20, # Number of initial configurations to explore
random_restarts=5, # Number of random restarts to avoid local optima
)
# Run the Hill Climbing optimization
best_candidate_hill_climbing = hill_climbing_optimizer.optimize(budget=500)
print(
f"Best candidate from Hill Climbing: {best_candidate_hill_climbing.config} "
f"with score: {best_candidate_hill_climbing.evaluation_score}"
)
# Instantiate the Genetic Algorithm Optimizer
ga_optimizer = GeneticAlgorithmOptimizer(
hp_space=hp_space,
evaluation_function=dt_evaluation,
max_fidelity=5, # Number of cross-validation folds
maximize=True, # We aim to maximize accuracy
population_size=20, # Size of the population in each generation
crossover_rate=0.8, # Probability of crossover between parents
mutation_rate=0.1, # Probability of mutation in offspring
elitism=1, # Number of top individuals to carry over to the next generation
tournament_size=3, # Number of individuals competing in tournament selection
min_population_size=5, # Minimum population size to maintain diversity
log_results=True,
)
# Run the Genetic Algorithm optimization
best_candidate_ga = ga_optimizer.optimize(budget=500)
print(
f"Best candidate from Genetic Algorithm: {best_candidate_ga.config} "
f"with score: {best_candidate_ga.evaluation_score}"
)
```

- Importing `focus_opt`: We import the necessary classes from the `focus_opt` package.
- Hyperparameter Space Definition: We define a hyperparameter space that includes parameters such as `criterion`, `splitter`, `max_depth`, `min_samples_split`, `min_samples_leaf`, and `max_features`.
- Evaluation Function: The `dt_evaluation` function evaluates a given hyperparameter configuration using cross-validation. The `fidelity` parameter corresponds to the cross-validation fold index, enabling multi-fidelity optimization.
- Optimizers: We use both `HillClimbingOptimizer` and `GeneticAlgorithmOptimizer` from `focus_opt.optimizers` to search for the best hyperparameter configuration within the defined budget.
- Running the Optimization: We specify a computational budget (e.g., `budget=500`), which limits the total number of evaluations performed during the optimization process.
focus_opt allows you to define a hyperparameter space by creating instances of different hyperparameter types:
- `CategoricalHyperParameter`: For hyperparameters that take on a set of discrete categories.
- `OrdinalHyperParameter`: For hyperparameters whose discrete values have an inherent order.
- `ContinuousHyperParameter`: For numeric hyperparameters sampled from a range; set `is_int=True` to restrict sampling to integers.
Example:

```python
from focus_opt.hp_space import (
HyperParameterSpace,
CategoricalHyperParameter,
ContinuousHyperParameter
)
hp_space = HyperParameterSpace("Model Hyperparameters")
hp_space.add_hp(
CategoricalHyperParameter(
name="activation_function",
values=["relu", "tanh", "sigmoid"]
)
)
hp_space.add_hp(
ContinuousHyperParameter(
name="learning_rate",
min_value=0.0001,
max_value=0.1
)
)
```

Your evaluation function should accept a hyperparameter configuration and a fidelity level, then return a performance score. Here's a template:

```python
from typing import Dict, Any
def evaluation_function(config: Dict[str, Any], fidelity: int) -> float:
"""
Custom evaluation function.
Args:
config (Dict[str, Any]): Hyperparameter configuration.
fidelity (int): Fidelity level (e.g., amount of data or number of epochs).
Returns:
float: Performance score.
"""
# Implement your model training and evaluation logic here
    pass
```

Once your hyperparameter space and evaluation function are defined, you can run the Hill Climbing optimizer:

```python
from focus_opt.optimizers import HillClimbingOptimizer
optimizer = HillClimbingOptimizer(
hp_space=hp_space,
evaluation_function=evaluation_function,
max_fidelity=10, # Adjust based on your fidelity levels
maximize=True, # Set to False if minimizing
log_results=True,
warm_start=10, # Initial random configurations
random_restarts=3, # Number of restarts to avoid local optima
)
best_candidate = optimizer.optimize(budget=100)
print(f"Best configuration: {best_candidate.config}")
print(f"Best score: {best_candidate.evaluation_score}")from focus_opt.optimizers import GeneticAlgorithmOptimizer
optimizer = GeneticAlgorithmOptimizer(
hp_space=hp_space,
evaluation_function=evaluation_function,
max_fidelity=10,
maximize=True,
population_size=50,
crossover_rate=0.7,
mutation_rate=0.1,
elitism=2,
tournament_size=5,
log_results=True,
)
best_candidate = optimizer.optimize(budget=500)
print(f"Best configuration: {best_candidate.config}")
print(f"Best score: {best_candidate.evaluation_score}")You can adjust various parameters of the optimizers to suit your needs:
- For `HillClimbingOptimizer`:
  - `warm_start`: Number of random initial configurations.
  - `random_restarts`: Number of times the optimizer restarts from a new random position.
  - `neighbor_selection`: Strategy for selecting neighboring configurations.
- For `GeneticAlgorithmOptimizer`:
  - `population_size`: Number of configurations in each generation.
  - `crossover_rate`: Probability of crossover between parent configurations.
  - `mutation_rate`: Probability of mutation in offspring configurations.
  - `elitism`: Number of top configurations to carry over to the next generation.
  - `tournament_size`: Number of configurations competing during selection.
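For example, if Hill Climbing keeps stalling in local optima, you can widen the initial sampling and add more restarts. Here is a minimal sketch reusing the `hp_space` and `evaluation_function` defined earlier; the specific values are illustrative, not recommendations:

```python
from focus_opt.optimizers import HillClimbingOptimizer

# A more exploratory Hill Climbing setup: broader warm start, more restarts.
optimizer = HillClimbingOptimizer(
    hp_space=hp_space,
    evaluation_function=evaluation_function,
    max_fidelity=10,
    maximize=True,
    warm_start=50,       # sample 50 random configurations before climbing
    random_restarts=10,  # restart from a new random point 10 times
    log_results=True,
)
best_candidate = optimizer.optimize(budget=1000)  # larger budget to cover the restarts
```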
focus_opt enables multi-fidelity optimization by allowing you to specify varying levels of fidelity in your evaluation function. This can help reduce computational costs by evaluating more configurations at lower fidelities and fewer configurations at higher fidelities.
You can define your own fidelity scheduling within your evaluation function or rely on the built-in mechanisms:

```python
def evaluation_function(config: Dict[str, Any], fidelity: int) -> float:
    # Use 'fidelity' to adjust the evaluation, such as training epochs or data size
    pass
```
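As a concrete illustration, the fidelity level can control how much training data is used, so cheap low-fidelity evaluations screen many configurations and only promising ones are re-evaluated on more data. Below is a minimal sketch; the linear subsampling schedule and the logistic regression model are illustrative choices, not part of focus_opt:

```python
from typing import Dict, Any

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

MAX_FIDELITY = 5  # would also be passed to the optimizer as max_fidelity=5

def subsample_evaluation(config: Dict[str, Any], fidelity: int) -> float:
    """Train on a data fraction proportional to the fidelity level."""
    fraction = fidelity / MAX_FIDELITY  # fidelity 1 -> 20% of the data, 5 -> 100%
    n_samples = max(1, int(len(X_train) * fraction))
    model = LogisticRegression(max_iter=1000, **config)
    model.fit(X_train[:n_samples], y_train[:n_samples])
    return model.score(X_test, y_test)  # held-out accuracy at this fidelity
```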
Ensure you have the following packages installed:

- Python: 3.8 or higher
- numpy: 1.26

These dependencies are installed automatically when you install focus-opt using pip.
Contributions are welcome! If you find a bug or have an idea for a new feature, please open an issue or submit a pull request.
To contribute:
- Fork the repository.
- Create a new branch (`git checkout -b feature/YourFeature`).
- Commit your changes (`git commit -am 'Add YourFeature'`).
- Push to the branch (`git push origin feature/YourFeature`).
- Open a Pull Request.
Please ensure your code adheres to the existing style standards and includes appropriate tests.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Author: Eliott Kalfon
Feel free to reach out if you have any questions or need further assistance!
Additional Notes
- Documentation: Comprehensive documentation is available on Read the Docs.
- Continuous Integration: The project uses GitHub Actions for automated testing and code quality checks.
- Code Style: The codebase follows the Black code style for consistency.