Machine learning on spaced seed hit patterns for assembly polishing

bcgsc/AIEdit

           _____ ______    _ _ _            
     /\   |_   _|  ____|  | /_\ |          
    /  \    | | | |__   __| | | |_         
   / /\ \   | | |  __| / _` | | __|       
  / ____ \ _| |_| |___| (_| | | |_         
 /_/    \_\_____|______\__,_|_|\__|

Alignment-free genome assembly polisher with an ML model trained on spaced seed hit/miss patterns.

Requirements

If you would like to train new models:

For development, install pybind11-stubgen so that the aiedit/core.pyi file is regenerated whenever the C++ bindings change.

Installation

Using conda (recommended)

AIEdit is available on Bioconda:

conda install bioconda::aiedit

This will make the aiedit command available in the environment.

Manually

Build AIEdit in the build folder by running the following in the project's root folder:

cmake -S . -B build
cmake --build build

This will put a core*.so file in the aiedit package, which can now be used by adding the project root to $PYTHONPATH and running:

python -m aiedit
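
As a rough illustration of the $PYTHONPATH step, the sketch below prepends the project root to $PYTHONPATH for a child process instead of changing the shell environment. The helper name env_with_project_root and the paths are assumptions made for illustration; they are not part of AIEdit.

```python
import os


def env_with_project_root(project_root, base_env=None):
    """Return a copy of the environment with project_root prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # os.pathsep is ":" on Linux/macOS and ";" on Windows.
    env["PYTHONPATH"] = project_root + (os.pathsep + existing if existing else "")
    return env


# Hypothetical checkout location; replace it with your own project root.
env = env_with_project_root("/path/to/AIEdit", base_env={"PYTHONPATH": "/opt/libs"})
print(env["PYTHONPATH"])
```

The returned mapping can be passed to subprocess.run([sys.executable, "-m", "aiedit", "polish", "--help"], env=env) to launch the manually built package.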

Running cmake --install build will install AIEdit to your Python environment's site-packages, making python -m aiedit available without requiring changes to $PYTHONPATH.

If PyTorch/libtorch are installed in a conda environment, you might have to update the CMAKE_PREFIX_PATH environment variable. To find PyTorch's CMake prefix path, run:

python -c "import torch; print(torch.utils.cmake_prefix_path)"

Then, pass the result to CMake:

cmake -DCMAKE_PREFIX_PATH=<TORCH_PREFIX_PATH> -S . -B build
cmake --build build
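
The prefix lookup and the two CMake calls can also be chained in one script. This is a minimal sketch, assuming cmake is on the PATH and PyTorch is importable from the same Python interpreter. The function names torch_cmake_prefix, build_commands and run_build are hypothetical helpers, not part of AIEdit.

```python
import subprocess


def torch_cmake_prefix():
    """Ask the installed PyTorch for its CMake prefix path (requires torch)."""
    import torch  # imported lazily so the helpers below work without PyTorch

    return torch.utils.cmake_prefix_path


def build_commands(prefix_path=None, source_dir=".", build_dir="build"):
    """Return the configure and build commands as argument lists."""
    configure = ["cmake"]
    if prefix_path:
        configure.append(f"-DCMAKE_PREFIX_PATH={prefix_path}")
    configure += ["-S", source_dir, "-B", build_dir]
    return [configure, ["cmake", "--build", build_dir]]


def run_build(prefix_path=None):
    """Run both commands in order, stopping at the first failure."""
    for cmd in build_commands(prefix_path):
        subprocess.run(cmd, check=True)
```

Calling run_build(torch_cmake_prefix()) from the project's root folder would then configure and build in one go.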

Usage

AIEdit will run all required polishing stages given a set of reads READS and an assembly ASSEMBLY. Results will be stored in the output path specified by -o, which is the current working directory by default:

aiedit polish -r READS -a ASSEMBLY

Run aiedit polish --help for more details on the input parameters.

For polishing assemblies with ONT reads, we suggest setting -y 10 -p 0.8.

AIEdit uses half of the available CPUs on the machine by default. This can be adjusted with the -t parameter.
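
When AIEdit is called from a pipeline, the options described above can be assembled programmatically. The sketch below uses only the flags mentioned in this README (-r, -a, -o, -y, -p, -t). The function polish_command and the file names are assumptions made for illustration.

```python
def polish_command(reads, assembly, out_dir=None, ont=False, threads=None):
    """Build an `aiedit polish` argument list from the documented flags."""
    cmd = ["aiedit", "polish", "-r", reads, "-a", assembly]
    if out_dir is not None:
        cmd += ["-o", out_dir]  # otherwise AIEdit writes to the working directory
    if ont:
        cmd += ["-y", "10", "-p", "0.8"]  # settings suggested above for ONT reads
    if threads is not None:
        cmd += ["-t", str(threads)]  # otherwise AIEdit uses half of the CPUs
    return cmd


# Hypothetical input file names.
print(" ".join(polish_command("reads.fq", "draft.fa", ont=True, threads=8)))
```

The resulting list can be passed directly to subprocess.run(cmd, check=True).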

Models

To list available pretrained models with their configurations, run:

aiedit list_models

The default model (aiedit/pretrained/s3m5i5.pt) supports 5 bp edit windows using 3 spaced seeds. More models are available in the pretrained directory, and new models can be trained with the aiedit train command. We recommend the default model for a balance of computational performance and polishing accuracy, but feel free to train and experiment with other models.

Output Files

The following files are created in the output folder (specified by -o). <input_file> is replaced by the draft assembly file's name:

  • <input_file>-aiedit_edited.fa, polished assembly in FASTA format
  • <input_file>-aiedit_variants.vcf, list of AIEdit's changes
  • <input_file>-ntedit_variants.vcf, list of ntEdit's changes
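
A downstream script may want to locate these files after a run. The sketch below matches only the file name suffixes listed above, so it does not assume whether <input_file> keeps the assembly's extension. The names OUTPUT_SUFFIXES and find_outputs, and the "draft" prefix in the demo, are hypothetical.

```python
import tempfile
from pathlib import Path

# Suffixes taken from the list of output files above.
OUTPUT_SUFFIXES = {
    "polished_assembly": "-aiedit_edited.fa",
    "aiedit_variants": "-aiedit_variants.vcf",
    "ntedit_variants": "-ntedit_variants.vcf",
}


def find_outputs(out_dir="."):
    """Map each output kind to the sorted matching file names in out_dir."""
    out = Path(out_dir)
    return {
        kind: sorted(p.name for p in out.glob(f"*{suffix}"))
        for kind, suffix in OUTPUT_SUFFIXES.items()
    }


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("draft-aiedit_edited.fa", "draft-aiedit_variants.vcf"):
        (Path(tmp) / name).touch()
    print(find_outputs(tmp))
```

An empty list for any kind indicates that the corresponding file was not found in the output folder.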

Running Tests

After compiling the project manually in build, run:

ctest --testdir build/tests

License

AIEdit Copyright (c) 2025-present British Columbia Cancer Agency Branch. All rights reserved.

AIEdit is released under the GNU General Public License v3.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

For commercial licensing options, please contact Patrick Rebstein (prebstein@bccancer.bc.ca).
