This repository provides tools for testing and optimizing the models available in ASReview using the Optuna package. It includes a Makita template that generates a folder structure with a jobs file to run the optimizations for a specific model.
To get started, clone this repository:

```sh
git clone https://github.com/asreview/asreview-optuna.git
```

Make sure you have all dependencies installed:

```sh
pip install -r requirements.txt
```

Then simply execute the `main.py` file:

```sh
python ./src/main.py
```

To see the results, start up the dashboard:

```sh
optuna-dashboard sqlite:///src/db.sqlite3
```
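Under the hood, `main.py` runs an Optuna study that repeatedly suggests hyperparameters, scores them with a simulation, and records the results in the database the dashboard reads. As a conceptual, stdlib-only stand-in (the parameter name `C` and the toy objective are illustrative; the real objective and search space live in `src/main.py`):

```python
import random

def objective(params):
    # Hypothetical stand-in for an ASReview simulation score
    # (lower is better); pretend the optimum for "C" is near 1.0.
    return (params["C"] - 1.0) ** 2

def run_study(n_trials=50, seed=42):
    """Random-search sketch of the suggest/evaluate loop Optuna automates."""
    rng = random.Random(seed)
    best_params, best_value = None, float("inf")
    for _ in range(n_trials):
        # Analogue of trial.suggest_float("C", 0.01, 10.0)
        params = {"C": rng.uniform(0.01, 10.0)}
        value = objective(params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value

best_params, best_value = run_study()
print(best_params, best_value)
```

Optuna improves on this sketch by sampling new trials based on the history of earlier ones, rather than uniformly at random.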
There are two options for the Optuna storage database:

- A hosted, centralized DB
- A local DB

To set up a hosted DB:

- Create a PostgreSQL DB
- Get the full URI using the `exo` CLI on a local machine:
  ```sh
  exo dbaas -z [DB ZONE] show [DB NAME] --uri
  ```
- Add the IP addresses of your study and dashboard servers to the IP filter
To set up the dashboard server:

- Create a new instance (e.g., Ubuntu 24.04 Standard->Small, 50 GiB)
- Make sure to set your own SSH key
- Update and reboot:
  ```sh
  sudo apt update && sudo apt upgrade
  sudo reboot
  ```
- Install Docker using the official Docker install instructions
- Check the installation:
  ```sh
  docker compose version
  ```
- Create a dir and move into it:
  ```sh
  mkdir optuna-dashboard && cd optuna-dashboard/
  ```
- Create `nginx.conf` (example in `dashboard/nginx.conf`):
  ```sh
  nano nginx.conf
  ```
- Install deps:
  ```sh
  sudo apt install -y apache2-utils
  ```
- Create the htpasswd file:
  ```sh
  sudo htpasswd -c ./htpasswd admin
  ```
- Create `docker-compose.yml` (example in `deployment/dashboard/docker-compose.yml`; make sure to fill in the DB URI):
  ```sh
  nano docker-compose.yml
  ```
- Start the containers using the docker-compose.yml:
  ```sh
  docker-compose up -d
  ```
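The htpasswd file puts the dashboard behind HTTP basic auth via nginx. If you later script against the protected dashboard, the client only needs to send a base64-encoded `Authorization` header; a stdlib sketch (the `admin`/`secret` credentials are placeholders, not real values):

```python
import base64

def basic_auth_header(user, password):
    """Build the HTTP Basic Authorization header nginx's auth_basic expects."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

print(basic_auth_header("admin", "secret"))
```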
To set up a study server that uses the hosted DB:

- Create a new instance (e.g., Ubuntu 24.04 CPU->Mega, 50 GiB)
- Make sure to set your own SSH key
- Make sure to set the `asreview-and-optuna-dashboard` security group
- Update and reboot:
  ```sh
  sudo apt update && sudo apt upgrade
  sudo reboot
  ```
- Clone this repo:
  ```sh
  git clone https://github.com/asreview/asreview-optuna.git
  ```
- Move into the dir:
  ```sh
  cd asreview-optuna
  ```
- Pull and check out the correct study branch:
  ```sh
  git pull && git checkout [BRANCH_NAME]
  ```
- Install venv:
  ```sh
  sudo apt install python3.12-venv
  ```
- Create a Python venv:
  ```sh
  python3 -m venv .venv
  ```
- Activate the venv:
  ```sh
  source .venv/bin/activate
  ```
- Install the Python packages:
  ```sh
  pip3 install -r requirements.txt
  ```
- Create the dataset pickles (± 1.5 minutes):
  ```sh
  python3 ./src/feature_matrix_scripts/tfidf.py
  ```
- Set the `DB_URI` environment variable:
  ```sh
  export DB_URI=[FULL DB URI]
  ```
- Create a tmux environment so Optuna keeps running when you close the connection:
  ```sh
  tmux new -s optuna
  ```
  In the `optuna` tmux env, run the following commands to start the study:
  ```sh
  source .venv/bin/activate
  python3 src/main.py
  ```
- Detach from the tmux environment using `ctrl`+`b` followed by `d` (you can always reattach using `tmux attach -t optuna`)
- You are all set! Check the dashboard on your local machine through a browser: `[Exoscale instance ip]:8080`
- You can see CPU usage using `htop`
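The study process picks up its storage location from the `DB_URI` environment variable. A hedged sketch of how such a lookup can work (the helper name is illustrative, not the actual code in `src/main.py`; the fallback mirrors the local `sqlite:///src/db.sqlite3` storage):

```python
import os

def storage_uri(default="sqlite:///src/db.sqlite3"):
    """Return the Optuna storage URI: hosted DB if DB_URI is set, else local SQLite."""
    return os.environ.get("DB_URI", default)

# As set on the study server via `export DB_URI=...` (placeholder value):
os.environ["DB_URI"] = "postgresql://user:pass@host:5432/optuna"
print(storage_uri())
```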
To run a study with a local DB and the dashboard on the same instance:

- Create a new instance (e.g., Ubuntu 24.04 CPU->Mega, 50 GiB)
- Make sure to set your own SSH key
- Make sure to set the `asreview-and-optuna-dashboard` security group
- Update and reboot:
  ```sh
  sudo apt update && sudo apt upgrade
  sudo reboot
  ```
- Clone this repo:
  ```sh
  git clone https://github.com/asreview/asreview-optuna.git
  ```
- Move into the dir:
  ```sh
  cd asreview-optuna
  ```
- Install venv:
  ```sh
  sudo apt install python3.12-venv
  ```
- Create a Python venv:
  ```sh
  python3 -m venv .venv
  ```
- Activate the venv:
  ```sh
  source .venv/bin/activate
  ```
- Install the Python packages:
  ```sh
  pip3 install -r requirements.txt
  ```
- Create the dataset pickles (± 1.5 minutes):
  ```sh
  python3 ./src/feature_matrix_scripts/tfidf.py
  ```
- Set your simulation parameters in `main.py` using a CLI editor such as nano:
  ```sh
  nano main.py
  ```
- Create a tmux environment so Optuna keeps running when you close the connection:
  ```sh
  tmux new -s optuna
  ```
  In the `optuna` tmux env, run the following commands to start the study:
  ```sh
  source .venv/bin/activate
  python3 src/main.py
  ```
- Detach from the tmux environment using `ctrl`+`b` followed by `d` (you can always reattach using `tmux attach -t optuna`)
- Create a tmux environment for the dashboard:
  ```sh
  tmux new -s dashboard
  ```
  In the `dashboard` tmux env, run the following command to start the dashboard:
  ```sh
  optuna-dashboard sqlite:///src/db.sqlite3 --host 0.0.0.0
  ```
- Detach from the tmux environment using `ctrl`+`b` followed by `d` (you can always reattach using `tmux attach -t dashboard`)
- You are all set! Check the dashboard on your local machine through a browser: `[Exoscale instance ip]:8080`
- You can see CPU usage using `htop`
The following parameters can be set in `main.py`:

- `VERSION` = Version number, reflected in the study name
- `METRIC` = The metric used for optimization. Options:
  - `"loss"`: Loss
  - `"ndcg"`: NDCG (gain)
- `STUDY_SET` = The prior combinations for the Synergy datasets. Options:
  - `"demo"`: 2 prior combinations per Synergy dataset (14 * 2 simulations)
  - `"full"`: 10 prior combinations per Synergy dataset (24 * 10 simulations)
- `CLASSIFIER_TYPE` = The ASReview2 classifier to use. Options:
  - `"nb"`: Naive Bayes
  - `"log"`: Logistic Classifier
  - `"svm"`: SVM
  - `"rf"`: Random Forest
- `FEATURE_EXTRACTOR_TYPE` = The ASReview2 feature extractor to use. Options:
  - `"tfidf"`: tfidf
  - `"onehot"`: onehot
- `PICKLE_FOLDER_PATH` = Path to optional preprocessed feature matrices, which can be created using `optuna/feature_matrix_scripts/tfidf.py`
- `PRE_PROCESSED_FMS` = Flag to decide whether to use these preprocessed FMs or to generate them on the fly
- `PARALLELIZE_OBJECTIVE` = Flag to decide whether to parallelize the objective function
- `AUTO_SHUTDOWN` = Flag to decide whether to shut down after finishing a study (useful for Exoscale)
- `OPTUNA_N_TRIALS` = Number of trials Optuna should run
- `OPTUNA_TIMEOUT` = Time in seconds, after which the current trial is cleanly finished and the study is wrapped up
- `OPTUNA_N_JOBS` = Number of Optuna trials to run in parallel (currently decided by `PARALLELIZE_OBJECTIVE`)
- `MIN_TRIALS` = Number of trials before the stopping condition is checked (if `curr_trial` >= `MIN_TRIALS`, check the stopping condition)
- `N_HISTORY` = How far back the stopping condition looks
- `STOPPING_THRESHOLD` = Threshold for deciding whether to stop the study
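A hedged sketch of how these last three settings can interact (illustrative only; the actual condition lives in `src/main.py`): after `MIN_TRIALS` trials, compare the overall best value against the best value seen before the last `N_HISTORY` trials, and stop once the recent improvement drops below `STOPPING_THRESHOLD`.

```python
MIN_TRIALS = 10            # trials before the condition is checked at all
N_HISTORY = 5              # how far back the condition looks
STOPPING_THRESHOLD = 0.01  # minimum recent improvement required to keep going

def should_stop(values):
    """Illustrative early-stopping check over trial values (lower is better)."""
    if len(values) < MIN_TRIALS:
        return False
    best_overall = min(values)
    best_before_window = min(values[:-N_HISTORY])
    # Stop when the last N_HISTORY trials improved less than the threshold.
    return best_before_window - best_overall < STOPPING_THRESHOLD

# A study that plateaued over its last five trials:
print(should_stop([0.9, 0.8, 0.5, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]))  # True
```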
If you have any questions or would like to contribute, please open an issue in the repository's issues section.
This project is licensed under the MIT License. See the LICENSE file for details.
The ASReview team.
This extension is part of the ASReview project (asreview.ai). It is maintained by the maintainers of ASReview LAB. See ASReview LAB for contact information and more resources.