A Python library for benchmarking gene function prediction.
Latest PyPI release
pip install geneeval
From source
# Install poetry for your system: https://python-poetry.org/docs/#installation
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
# Clone and move into the repo
git clone https://github.com/BaderLab/GeneEval.git
cd GeneEval
# Install the package with poetry
poetry install
If you plan on evaluating fixed-length feature vectors (see Usage), install with pip install "geneeval[features]" (or poetry install -E "features" if installing from source).
First, download the benchmark with the prepare command:
geneeval prepare "./benchmark.json"
There are two ways to run the evaluation, depending on your method.
If your method produces a fixed-length feature vector for each gene ID in the benchmark, collect these in a comma-separated file, e.g.
Q8W5R2, 0.2343, -0.1242, 0.5431, -0.3475, 0.9373
Q99732, -0.9323, 0.2212, -0.4331, -0.8634, 0.8373
P83774, 0.5633, -0.6242, 0.3723, -0.2375, -0.1673
Q1ENB6, 0.1433, -0.3242, 0.5323, -0.9975, -0.4573
Q9XF19, 0.5621, -0.4272, 0.9743, -0.1373, -0.2173
You can prepare a .csv, .tsv, .txt (separated by spaces), or .json file (where the vectors are keyed by gene IDs). The file is parsed according to its file extension.
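As a rough sketch of how such a file might be produced, the Python below writes a dictionary of vectors to either a .csv or a .json file. The helper names (check_fixed_length, write_features_csv, write_features_json) and the sample values are hypothetical and are not part of geneeval. The JSON layout assumes a flat object mapping each gene ID to its vector, which is one reading of "keyed by gene IDs".

```python
import csv
import json

# Hypothetical vectors; in practice these come from your own method.
features = {
    "Q8W5R2": [0.2343, -0.1242, 0.5431, -0.3475, 0.9373],
    "Q99732": [-0.9323, 0.2212, -0.4331, -0.8634, 0.8373],
}


def check_fixed_length(features):
    """Raise ValueError unless every vector has the same length."""
    lengths = {len(vector) for vector in features.values()}
    if len(lengths) > 1:
        raise ValueError(f"vectors have differing lengths: {sorted(lengths)}")


def write_features_csv(features, path):
    """Write one row per gene: the gene ID followed by its vector."""
    check_fixed_length(features)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for gene_id, vector in features.items():
            writer.writerow([gene_id, *vector])


def write_features_json(features, path):
    """Write the vectors as a JSON object keyed by gene ID (assumed layout)."""
    check_fixed_length(features)
    with open(path, "w") as f:
        json.dump(features, f, indent=2)
```

Calling write_features_csv(features, "./features.csv") would then give a file in the comma-separated shape shown above.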
Then call the evaluate features command:
geneeval evaluate features "./features.csv"
These features will be used as input to simple classifiers, which will be evaluated with a grid search over the benchmark tasks.
For all other methods, you only need to produce predictions for each task in the benchmark that you wish to evaluate on. Predictions should be collected in a .json file keyed by task name, data partition, and gene ID, e.g.
{
"subcellular_localization": {
"train": {
"Q8W5R2": "M",
"Q99732": "M",
"P83774": "S"
},
"valid": {
"Q1ENB6": "S"
},
"test": {
"Q9XF19": "S"
}
}
}
We make no assumptions about how these predictions are obtained.
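One way to assemble this nested structure is sketched below. The build_predictions helper, the splits mapping, and the constant predict function are all hypothetical stand-ins for your own model and data handling; none of them are part of geneeval.

```python
import json


def build_predictions(task, splits, predict):
    """Return {task: {partition: {gene_id: label}}} for a single task.

    splits maps a partition name ("train", "valid", "test") to gene IDs;
    predict is any callable that maps a gene ID to a predicted label.
    """
    return {
        task: {
            partition: {gene_id: predict(gene_id) for gene_id in gene_ids}
            for partition, gene_ids in splits.items()
        }
    }


# Hypothetical partitions and a placeholder model that always predicts "S".
splits = {
    "train": ["Q8W5R2", "Q99732", "P83774"],
    "valid": ["Q1ENB6"],
    "test": ["Q9XF19"],
}
predictions = build_predictions("subcellular_localization", splits, lambda gene_id: "S")

# Serialize to the JSON text that would be saved as predictions.json.
predictions_json = json.dumps(predictions, indent=2)
```

Writing predictions_json to ./predictions.json gives a file with the same task, partition, and gene ID nesting as the example above.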
Then, the evaluate predictions command can be used to obtain a score on the tasks:
geneeval evaluate predictions "./predictions.json"