AIEdit
Alignment-free genome assembly polisher with an ML model trained on spaced seed hit/miss patterns.
- C++ compiler with C++17 support
- Python 3.10+
- cmake
- btllib
- libtorch for CPU
- pybind11
- ntStat
- ntCard
- ntEdit
If you would like to train new models:
For development, install pybind11-stubgen so the aiedit/core.pyi file will be updated in case of changes in the C++ bindings.
AIEdit is available on Bioconda:
conda install bioconda::aiedit
This will make the aiedit command available in the environment.
Build AIEdit in the build folder by running the following in the project's root folder:
cmake -S . -B build
cmake --build build
This will put a core*.so file in the aiedit package, which can now be used by adding the project root to $PYTHONPATH and running:
python -m aiedit
Running cmake --install build will install AIEdit to your Python environment's site-packages, making python -m aiedit available without requiring changes to $PYTHONPATH.
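For the uninstalled case, where the project root has to be on $PYTHONPATH, the following is a minimal sketch of how a wrapper script might prepare the environment before launching python -m aiedit. The helper name env_with_project_root is hypothetical and not part of AIEdit; it only prepends a directory to PYTHONPATH.

```python
import os
from pathlib import Path


def env_with_project_root(project_root, base_env=None):
    """Return a copy of the environment with project_root first on PYTHONPATH.

    Hypothetical helper for illustration; AIEdit does not ship this function.
    """
    env = dict(os.environ if base_env is None else base_env)
    root = str(Path(project_root).resolve())
    existing = env.get("PYTHONPATH", "")
    # Keep any existing entries, but make sure the project root is searched first.
    env["PYTHONPATH"] = root + os.pathsep + existing if existing else root
    return env


# Example use (not executed here):
#   import subprocess, sys
#   subprocess.run([sys.executable, "-m", "aiedit"], env=env_with_project_root("."))
```

After cmake --install build, this step should no longer be needed.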
If PyTorch/libtorch are installed in a conda environment, you might have to update the CMAKE_PREFIX_PATH environment variable. To find PyTorch's CMake prefix path, run:
python -c "import torch; print(torch.utils.cmake_prefix_path)"
Then, pass the result to CMake:
cmake -DCMAKE_PREFIX_PATH=<TORCH_PREFIX_PATH> -S . -B build
cmake --build build
AIEdit will run all required polishing stages given a set of reads READS and an assembly ASSEMBLY. Results will be stored in the output path specified by -o, which is the current working directory by default:
aiedit polish -r READS -a ASSEMBLY
Run aiedit polish --help for more details on the input parameters.
For polishing assemblies with ONT reads, we suggest setting -y 10 -p 0.8.
AIEdit uses half of the available CPUs on the machine by default. This can be adjusted with the -t parameter.
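As an illustration of how the options above combine, here is a small sketch that assembles an aiedit polish command line from Python. It uses only the flags mentioned in this README (-r, -a, -o, -t, -y, -p). The function names build_polish_command and half_of_cpus are hypothetical, and half_of_cpus only approximates the documented default; how AIEdit counts CPUs internally is an assumption.

```python
import os


def half_of_cpus():
    """Approximate the documented default of half the available CPUs.

    ASSUMPTION: AIEdit's own CPU counting may differ (e.g. under cgroups).
    """
    return max(1, (os.cpu_count() or 1) // 2)


def build_polish_command(reads, assembly, out_dir=None, threads=None, ont=False):
    """Build the argument list for `aiedit polish` (hypothetical helper)."""
    cmd = ["aiedit", "polish", "-r", str(reads), "-a", str(assembly)]
    if out_dir is not None:
        cmd += ["-o", str(out_dir)]  # default is the current working directory
    if threads is not None:
        cmd += ["-t", str(threads)]  # default is half of the available CPUs
    if ont:
        cmd += ["-y", "10", "-p", "0.8"]  # settings suggested above for ONT reads
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True), for example build_polish_command("READS", "ASSEMBLY", threads=half_of_cpus(), ont=True).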
To list available pretrained models with their configurations, run:
aiedit list_models
The default model supports 5 bp edit windows using 3 spaced seeds (aiedit/pretrained/s3m5i5.pt). More models are available in the pretrained directory. Additionally, new models can be trained using the aiedit train command. We recommend the default model for a balance of computational performance and polishing accuracy; feel free to train and experiment with other models.
The following files are created in the output folder (specified by -o). <input_file> is replaced by the draft assembly file's name:
- <input_file>-aiedit_edited.fa, polished assembly in FASTA format
- <input_file>-aiedit_variants.vcf, list of AIEdit's changes
- <input_file>-ntedit_variants.vcf, list of ntEdit's changes
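The naming scheme above can be sketched in Python, for example to check that a run produced all three files. The helpers expected_outputs and missing_outputs are hypothetical. The sketch assumes <input_file> is the assembly's full file name including its extension; if AIEdit strips the extension, Path(assembly).stem would be the right choice instead.

```python
from pathlib import Path


def expected_outputs(assembly, out_dir="."):
    """Map each documented output to its expected path (hypothetical helper)."""
    # ASSUMPTION: <input_file> keeps the extension, e.g. "draft.fa".
    name = Path(assembly).name
    out = Path(out_dir)
    return {
        "polished_fasta": out / f"{name}-aiedit_edited.fa",
        "aiedit_vcf": out / f"{name}-aiedit_variants.vcf",
        "ntedit_vcf": out / f"{name}-ntedit_variants.vcf",
    }


def missing_outputs(assembly, out_dir="."):
    """Return the expected output paths that do not exist on disk."""
    return [p for p in expected_outputs(assembly, out_dir).values() if not p.exists()]
```

An empty list from missing_outputs("ASSEMBLY", "OUT_DIR") would indicate that all three files are present under the assumed naming.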
After compiling the project manually in build, run:
ctest --test-dir build/tests
AIEdit Copyright (c) 2025-present British Columbia Cancer Agency Branch. All rights reserved.
AIEdit is released under the GNU General Public License v3.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
For commercial licensing options, please contact Patrick Rebstein (prebstein@bccancer.bc.ca).