asreview-optuna

This repository provides tools for testing and optimizing the models found in ASReview via the Optuna package. It includes a Makita template that will generate a folder infrastructure with a jobs file to run the optimizations for a specific model.

Installation

To get started, clone this repository:

git clone https://github.com/asreview/asreview-optuna.git

Then, from inside the cloned directory, install the dependencies:

pip install -r requirements.txt

Run Local

Simply execute the main.py file:

python ./src/main.py

To view the results, start the dashboard:

optuna-dashboard sqlite:///src/db.sqlite3

Run on Exoscale

There are two options:

- A hosted, centralized DB
- A local DB

Exoscale Hosted DB

To set up a DB

Create a PostgreSQL DB
Get the full URI using the exo CLI on a local machine: exo dbaas -z [DB ZONE] show [DB NAME] --uri
Add the IP addresses of your study and dashboard servers to the IP filter
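
The full URI returned by the exo CLI includes the database password, so it is worth masking it before it ends up in logs or shell output. The following is a minimal sketch using only the Python standard library; the function name redact_db_uri and the example URI (user, host, port, database name) are hypothetical and not part of this repository.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_db_uri(uri: str) -> str:
    """Return the URI with its password replaced by '***' (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.password is None:
        # Nothing to hide, e.g. a local sqlite:/// URI.
        return uri
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Example with a made-up URI; substitute the one printed by the exo CLI.
print(redact_db_uri("postgres://user:secret@example-host:21699/defaultdb?sslmode=require"))
```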

To start the optuna-dashboard Docker container

Create a new instance (e.g., Ubuntu 24.04 Standard->Small, 50 GiB)
- Make sure to set your own SSH key
Update and reboot: sudo apt update && sudo apt upgrade, then sudo reboot
Install Docker using the official Docker install instructions
Check the installation: docker compose version
Create a directory and move into it: mkdir optuna-dashboard && cd optuna-dashboard/
Create nginx.conf: nano nginx.conf (example in dashboard/nginx.conf)
Install dependencies: sudo apt install -y apache2-utils
Create the htpasswd file: sudo htpasswd -c ./htpasswd admin
Create docker-compose.yml: nano docker-compose.yml (example in deployment/dashboard/docker-compose.yml; make sure to fill in the DB URI)
Start the containers defined in docker-compose.yml: docker compose up -d

To Start a Study

Create a new instance (e.g., Ubuntu 24.04 CPU->Mega, 50 GiB)
- Make sure to set your own SSH key
- Make sure to set the asreview-and-optuna-dashboard security group
Update and reboot: sudo apt update && sudo apt upgrade, then sudo reboot
Clone this repo: git clone https://github.com/asreview/asreview-optuna.git
Move into the directory: cd asreview-optuna
Pull and check out the correct study branch: git pull && git checkout [BRANCH_NAME]
Install venv: sudo apt install python3.12-venv
Create a Python venv: python3 -m venv .venv
Activate the venv: source .venv/bin/activate
Install the Python packages: pip3 install -r requirements.txt
Create the dataset pickles: python3 ./src/feature_matrix_scripts/tfidf.py (takes about 1.5 minutes)
Set the DB_URI environment variable: export DB_URI=[FULL DB URI]
Create a tmux session so Optuna keeps running after you close the connection: tmux new -s optuna
In the optuna tmux session, run the following commands to start the study:
1. source .venv/bin/activate
2. python3 src/main.py
3. Detach from the tmux session using Ctrl+b followed by d (you can always reattach using tmux attach -t optuna)
You are all set! Check the dashboard on your local machine through a browser: [Exoscale instance ip]:8080
You can monitor CPU usage with htop
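
The steps above export DB_URI before starting the study. As an illustration of how a script could pick up that variable, here is a minimal sketch that falls back to the local SQLite file when DB_URI is unset. The function name resolve_storage is hypothetical, and whether src/main.py resolves its storage this way is an assumption. The scheme rewrite is included because SQLAlchemy 1.4 and later (which Optuna's RDB storage builds on) only accept postgresql://, not postgres://.

```python
import os
from typing import Mapping, Optional

# Same SQLite location as used by the local dashboard command in this README.
DEFAULT_STORAGE = "sqlite:///src/db.sqlite3"


def resolve_storage(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the storage URI from DB_URI, else use the local SQLite file (hypothetical helper)."""
    env = os.environ if env is None else env
    uri = env.get("DB_URI", "").strip()
    if not uri:
        return DEFAULT_STORAGE
    # SQLAlchemy 1.4+ rejects the legacy "postgres://" scheme name.
    if uri.startswith("postgres://"):
        uri = "postgresql://" + uri[len("postgres://"):]
    return uri


# Report only which backend was selected, so no credentials are printed.
print("Storage backend:", resolve_storage().split(":", 1)[0])
```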

Local DB

Create a new instance (e.g., Ubuntu 24.04 CPU->Mega, 50 GiB)
- Make sure to set your own SSH key
- Make sure to set the asreview-and-optuna-dashboard security group
Update and reboot: sudo apt update && sudo apt upgrade, then sudo reboot
Clone this repo: git clone https://github.com/asreview/asreview-optuna.git
Move into the directory: cd asreview-optuna
Install venv: sudo apt install python3.12-venv
Create a Python venv: python3 -m venv .venv
Activate the venv: source .venv/bin/activate
Install the Python packages: pip3 install -r requirements.txt
Create the dataset pickles: python3 ./src/feature_matrix_scripts/tfidf.py (takes about 1.5 minutes)
Set your simulation parameters in main.py using a CLI editor, for example: nano src/main.py
Create a tmux session so Optuna keeps running after you close the connection: tmux new -s optuna
In the optuna tmux session, run the following commands to start the study:
1. source .venv/bin/activate
2. python3 src/main.py
3. Detach from the tmux session using Ctrl+b followed by d (you can always reattach using tmux attach -t optuna)
Create a tmux session for the dashboard: tmux new -s dashboard
In the dashboard tmux session, run the following commands to start the dashboard:
1. optuna-dashboard sqlite:///src/db.sqlite3 --host 0.0.0.0
2. Detach from the tmux session using Ctrl+b followed by d (you can always reattach using tmux attach -t dashboard)
You are all set! Check the dashboard on your local machine through a browser: [Exoscale instance ip]:8080
You can monitor CPU usage with htop

Variables

Study variables

VERSION = Version number, reflected in the study name
METRIC = The metric used for optimization. Options:
- "loss": Loss
- "ndcg": Gain
STUDY_SET = The set of prior combinations to use for the SYNERGY datasets. Options:
- "demo": 2 prior combinations per synergy dataset (14 * 2 simulations)
- "full": 10 prior combinations per synergy dataset (24 * 10 simulations)
CLASSIFIER_TYPE = The ASReview2 classifier to use. Options:
- "nb": Naive-Bayes
- "log": Logistic Classifier
- "svm": SVM
- "rf": Random Forest
FEATURE_EXTRACTOR_TYPE = The ASReview2 feature extractor to use. Options:
- "tfidf": TF-IDF
- "onehot": One-hot encoding
PICKLE_FOLDER_PATH = Path to optional preprocessed feature matrices, which can be created using src/feature_matrix_scripts/tfidf.py
PRE_PROCESSED_FMS = Flag to decide whether or not to use these preprocessed FMs, or to generate them on the fly.
PARALLELIZE_OBJECTIVE = Flag to decide whether to parallelize the objective function
AUTO_SHUTDOWN = Flag to decide whether or not to shut down after finishing a study (useful for exoscale)
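
Because several of the study variables only accept the fixed options listed above, a typo is easy to miss until a study is already running. The following is a minimal sketch of a pre-flight check; the ALLOWED table mirrors the options in this README, while the function name validate_settings and the idea of passing the settings as a dict are assumptions, not how src/main.py is necessarily organized.

```python
# Allowed values, copied from the option lists in this README.
ALLOWED = {
    "METRIC": {"loss", "ndcg"},
    "STUDY_SET": {"demo", "full"},
    "CLASSIFIER_TYPE": {"nb", "log", "svm", "rf"},
    "FEATURE_EXTRACTOR_TYPE": {"tfidf", "onehot"},
}


def validate_settings(settings: dict) -> list[str]:
    """Return one error message per variable whose value is missing or not allowed."""
    errors = []
    for name, allowed in ALLOWED.items():
        value = settings.get(name)
        if value not in allowed:
            errors.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return errors


# Example: "svc" is a typo for "svm", so one error is reported.
settings = {
    "METRIC": "loss",
    "STUDY_SET": "demo",
    "CLASSIFIER_TYPE": "svc",
    "FEATURE_EXTRACTOR_TYPE": "tfidf",
}
for message in validate_settings(settings):
    print(message)
```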

Optuna variables

OPTUNA_N_TRIALS = Number of trials Optuna should run
OPTUNA_TIMEOUT = Time in seconds, after which the current trial is cleanly finished and the study is wrapped up
OPTUNA_N_JOBS = Number of Optuna trials to run in parallel (currently decided by PARALLELIZE_OBJECTIVE)
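
These three variables correspond to the n_trials, timeout, and n_jobs arguments of Optuna's Study.optimize. The sketch below shows one way OPTUNA_N_JOBS could be derived from PARALLELIZE_OBJECTIVE. The rule used here (one trial at a time when the objective parallelizes itself, otherwise one trial per CPU) is an assumption for illustration and may differ from what src/main.py does; the function name optimize_kwargs is hypothetical.

```python
import os
from typing import Optional


def optimize_kwargs(n_trials: int, timeout: Optional[float],
                    parallelize_objective: bool,
                    cpu_count: Optional[int] = None) -> dict:
    """Build keyword arguments for Study.optimize (hypothetical helper)."""
    cpus = cpu_count or os.cpu_count() or 1
    # Assumption: avoid oversubscribing CPUs by parallelizing at one level only.
    n_jobs = 1 if parallelize_objective else cpus
    return {"n_trials": n_trials, "timeout": timeout, "n_jobs": n_jobs}


# With an existing study and objective, usage would look like:
#   study.optimize(objective, **optimize_kwargs(500, 3600, parallelize_objective=True))
print(optimize_kwargs(500, 3600, parallelize_objective=True))
```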

Early stopping condition variables

MIN_TRIALS = Number of trials before the stopping condition will be checked
- If curr_trial >= MIN_TRIALS -> check stopping condition
N_HISTORY = Number of most recent trials the stopping condition looks back over
STOPPING_THRESHOLD = Threshold for checking whether to stop the study or not
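
To make the interaction between the three variables concrete, here is a minimal sketch of one possible stopping rule: stop when the best value has improved by less than STOPPING_THRESHOLD over the last N_HISTORY trials. The exact rule used in this repository is not spelled out here, so both the comparison and the function name should_stop are assumptions; the sketch also assumes a metric that is minimized, such as loss.

```python
def should_stop(best_values: list[float], min_trials: int,
                n_history: int, threshold: float) -> bool:
    """Decide whether to stop the study (hypothetical rule, minimization assumed).

    best_values[i] is the best objective value seen up to and including trial i.
    """
    n = len(best_values)
    # Only check once enough trials have run and a full look-back window exists.
    if n < min_trials or n <= n_history:
        return False
    improvement = best_values[-1 - n_history] - best_values[-1]
    return improvement < threshold


# The best value has been flat for the last three trials, so the study stops.
print(should_stop([1.0, 0.5, 0.4, 0.4, 0.4, 0.4], min_trials=5, n_history=3, threshold=0.01))
```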

Questions and Contributions

If you have any questions or would like to contribute, please open an issue in the repository's issues section.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Authors

The ASReview team.

This extension is part of the ASReview project (asreview.ai). It is maintained by the maintainers of ASReview LAB. See ASReview LAB for contact information and more resources.
