libreoffice-ci

GSoC Project: LibreOffice CI Test Selection with Machine Learning

The goal of this project is to select unit tests for a patch by scoring (patch, test) pairs. Three models (testlabelselect, testfailure, testoverall) are trained to predict unit test results for a given patch at different levels.

The work is based on Mozilla's bugbug and rust-code-analysis.

Models

The testlabelselect model predicts the failure probability of each unit test for a given patch.

	Fail (Predicted)	Pass (Predicted)
Fail (Actual)	3860	203
Pass (Actual)	191593	1109768

The testfailure model predicts the overall failure probability of a patch based on patch features only.

	Fail (Predicted)	Pass (Predicted)
Fail (Actual)	614	527
Pass (Actual)	2155	4863

The testoverall model improves upon testfailure by using testlabelselect predictions to predict whether a patch will fail any unit test.

	Fail (Predicted)	Pass (Predicted)
Fail (Actual)	810	331
Pass (Actual)	2413	4605

A smart inference procedure is built on the testlabelselect and testoverall predictions. By setting a threshold on the number of unit tests predicted to fail, it captures 91% of failures while reducing computation by 57%.

	Fail (Predicted)	Pass (Predicted)
Fail (Actual)	10617	1054
Pass (Actual)	30103	39815
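The headline figures can be checked against the confusion matrices above. The sketch below recomputes recall (share of actual failures predicted as failures) and true-negative rate (share of actual passes predicted as passes) for each table; the dictionary layout and the `rates` helper are names chosen for illustration and are not part of the repository.

```python
# Confusion matrices copied from the tables above, as (TP, FN, FP, TN),
# where "positive" means the test or patch failed.
MATRICES = {
    "testlabelselect": (3860, 203, 191593, 1109768),
    "testfailure": (614, 527, 2155, 4863),
    "testoverall": (810, 331, 2413, 4605),
    "smart inference": (10617, 1054, 30103, 39815),
}


def rates(tp, fn, fp, tn):
    """Return (recall, true_negative_rate) for one confusion matrix."""
    recall = tp / (tp + fn)  # actual failures that were predicted to fail
    tnr = tn / (tn + fp)     # actual passes that were predicted to pass
    return recall, tnr


for name, cells in MATRICES.items():
    recall, tnr = rates(*cells)
    print(f"{name:16s} recall={recall:.1%}  true-negative rate={tnr:.1%}")
```

For the smart inference table this gives a recall of about 91%, matching the figure quoted above. Its true-negative rate is about 57%, which coincides with the quoted computation saving; this suggests, though the README does not state it, that the saving is measured as the share of passing cases correctly predicted to pass.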

The smart inference is currently integrated into Jenkins to save computation. If a patch is likely to fail any unit test, the sequential fast track is run, because the patch is expected to fail some unit tests and there is no need to run everything. If the patch is likely to pass, the normal track is run to ensure code correctness.
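The track decision can be pictured as a small rule over the two models' outputs. The following is only a sketch under stated assumptions: the function name, the parameter names, the 0.5 cutoff, and the way the two predictions are combined (a logical "and") are all hypothetical, since the README does not spell out the exact rule used in Jenkins.

```python
def choose_track(test_fail_probs, overall_fail_prob,
                 prob_cutoff=0.5, min_failed_tests=1):
    """Return "fast" or "normal" for a patch (hypothetical decision rule).

    test_fail_probs: dict mapping unit test name -> failure probability,
        standing in for testlabelselect output.
    overall_fail_prob: probability that the patch fails any unit test,
        standing in for testoverall output.
    """
    # Count the unit tests that the per-test model flags as likely to fail.
    predicted_failures = sum(
        prob >= prob_cutoff for prob in test_fail_probs.values()
    )
    # Assumption: the patch is treated as "likely to fail" only when both
    # signals agree; the real combination rule may differ.
    likely_to_fail = (
        overall_fail_prob >= prob_cutoff
        and predicted_failures >= min_failed_tests
    )
    return "fast" if likely_to_fail else "normal"
```

Raising `min_failed_tests` in this sketch makes the rule more conservative about choosing the fast track, which is the kind of trade-off the threshold on the number of failed unit tests controls.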

testlabelselect is not used directly to select unit tests because it cannot capture all failures: about 5% of failures would escape, which could cause severe problems.

Environment

Install build-essential and zstd:

sudo apt install build-essential
sudo apt install zstd

Clone libreoffice:

git clone https://gerrit.libreoffice.org/core libreoffice

Install rust:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
export PATH="$HOME/.cargo/bin:$PATH"

Install rust-code-analysis:

cargo install rust-code-analysis-cli rust-code-analysis-web

Install conda:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Clone libreoffice-ci:

git clone https://github.com/baolef/libreoffice-ci.git
cd libreoffice-ci

Install Python dependencies:

conda env create -f environment.yml
conda activate libreoffice-ci

Data

To extract features for past Gerrit pushes, first decompress data/jenkinsfullstats.csv.xz to data/jenkinsfullstats.csv, and then run:

python dataset/mining.py --path ../libreoffice
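The decompression step before the mining command is not shown as a command. One way to do it without extra tools is Python's standard `lzma` module; the helper name `decompress_xz` below is illustrative and not part of the repository.

```python
import lzma
import shutil


def decompress_xz(src, dst):
    """Stream-decompress the .xz file at src into the file dst."""
    # Copy in chunks so a large CSV never has to fit in memory at once.
    with lzma.open(src, "rb") as compressed, open(dst, "wb") as out:
        shutil.copyfileobj(compressed, out)
```

Calling `decompress_xz("data/jenkinsfullstats.csv.xz", "data/jenkinsfullstats.csv")` from the repository root should leave the compressed file in place and write the CSV next to it.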

To extract all unit tests, first extract the push features (data/commits.json), and then run:

python dataset/mapping.py

To extract features for unit tests, first extract data/commits.json (push features) and data/tests.json, and then run:

python dataset/test_history.py --path data/commits.json

To convert a database from one format (e.g. data/commits.json) to another (e.g. data/commits.pickle.zstd):

python dataset/convert.py data/commits.json data/commits.pickle.zstd

Training

To train a model (e.g. testlabelselect or testoverall) after extracting the necessary data:

python train.py testlabelselect
python train.py testoverall

Training a model on the full dataset can be time- and memory-consuming. The --limit argument can be used to train on a subset:

python train.py testlabelselect --limit 16384

Detailed training scripts are available for ungrouped data (scripts/train.sh) and grouped data (scripts/train_group.sh).

Inference

To run inference with a model (e.g. testlabelselect) on a commit hash (e.g. a772976f047882918d5386a3ef9226c4aa2aa118), after training the necessary models (e.g. testlabelselect, testoverall):

python test.py testlabelselect --revision a772976f047882918d5386a3ef9226c4aa2aa118

If no commit hash is specified, inference is performed on the last commit.
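The fallback to the last commit can be illustrated as follows. This is a sketch only: it assumes "last commit" means HEAD of the local LibreOffice clone, and the function name and the default `repo_path` are hypothetical rather than taken from test.py.

```python
import subprocess


def resolve_revision(revision=None, repo_path="../libreoffice"):
    """Return the given commit hash, or HEAD of repo_path if none is given."""
    if revision:
        return revision
    # Assumption: the "last commit" is the current HEAD of the local clone.
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

With an explicit hash the function returns it unchanged; without one it asks git for the current HEAD and raises `subprocess.CalledProcessError` if `repo_path` is not a git repository.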

A detailed inference script is available in scripts/test.sh.
