This repository is a suite of tools and models for predicting antimicrobial resistance (AMR) with transformer-based architectures. The project uses state-of-the-art natural language processing (NLP) techniques to analyze genetic sequences and infer resistance profiles.
- Implementation of transformer and traditional architectures tailored for AMR prediction.
- Data preprocessing pipelines for genetic sequences.
- Automatic mixed precision (AMP) training for improved performance (see the sketch after this list).
- Support for various datasets and easy integration of new data sources.
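For reference, a typical PyTorch mixed-precision training step looks like the following. This is a generic sketch with a toy model and synthetic data (and assumes a CUDA device), not this repository's actual training loop:

```python
import torch
import torch.nn as nn

# Generic AMP sketch; the toy model and data are placeholders, not TARP's pipeline.
device = "cuda"
model = nn.Linear(128, 2).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.amp.GradScaler(device)

inputs = torch.randn(32, 128, device=device)
targets = torch.randint(0, 2, (32,), device=device)

optimizer.zero_grad()
with torch.amp.autocast(device):                  # forward pass runs in mixed precision
    loss = nn.functional.cross_entropy(model(inputs), targets)
scaler.scale(loss).backward()                     # scale loss to avoid fp16 gradient underflow
scaler.step(optimizer)                            # unscales gradients, then steps the optimizer
scaler.update()                                   # adapts the loss-scale factor for the next step
```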
- Clone the repository:

  ```bash
  git clone https://github.com/debugst1ck/TARP.git
  ```

- Navigate to the project directory:

  ```bash
  cd TARP
  ```

- (Optional, Recommended) Create and activate a virtual environment:

  For Windows PowerShell:

  ```powershell
  Set-ExecutionPolicy Unrestricted -Scope Process
  python -m venv .venv
  .venv\Scripts\activate
  ```

  For Unix:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  ```

- Install the required dependencies (use the correct index URL for your CUDA version):

  ```bash
  pip install -e . --extra-index-url https://download.pytorch.org/whl/cu128  # For CUDA 12.8
  ```
- Prepare your dataset in the required format (FASTA files with corresponding labels); a loading sketch follows these steps.
- Run the training script with your dataset:

  ```bash
  tarp
  ```
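As a rough illustration of pairing FASTA sequences with labels, here is a minimal sketch. The file names, the CSV layout, and the `resistant` column are hypothetical placeholders, not TARP's documented schema:

```python
# Hypothetical illustration: pair FASTA sequences with a CSV of labels.
# "genes.fasta", "labels.csv", and the "resistant" column are assumptions,
# not this repository's required format.
import pandas as pd
from Bio import SeqIO  # pip install biopython

labels = pd.read_csv("labels.csv", index_col="sequence_id")

records = []
for record in SeqIO.parse("genes.fasta", "fasta"):
    if record.id in labels.index:
        records.append((record.id, str(record.seq), int(labels.loc[record.id, "resistant"])))

print(f"Loaded {len(records)} labelled sequences")
```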
The codebase is structured for easy experimentation with different transformer architectures and hyperparameters. The main components are data preprocessing, model training, evaluation, and visualization of results.
In an attention mask, a value of 1 or True indicates that the model should attend to that position, which holds actual input content. A value of 0 or False indicates that the model should not attend to that position, typically because it is padding.
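As a generic illustration (the pad token ID of 0 is an assumption, not necessarily what this repository's tokenizer uses), an attention mask can be derived directly from a padded batch:

```python
import torch

# Sketch: build an attention mask for a padded batch.
# pad_id = 0 is an assumption; the real preprocessing may use a different ID.
pad_id = 0
batch = torch.tensor([
    [5, 8, 2, 9, 0, 0],   # sequence of length 4, padded with two zeros
    [7, 3, 0, 0, 0, 0],   # sequence of length 2, padded with four zeros
])

attention_mask = (batch != pad_id).long()  # 1 = attend (real token), 0 = ignore (padding)
print(attention_mask)
# tensor([[1, 1, 1, 1, 0, 0],
#         [1, 1, 0, 0, 0, 0]])
```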
Class weights are calculated to address class imbalance in the dataset. The weights are inversely proportional to the frequency of each class, ensuring that the model pays more attention to minority classes during training.
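A minimal sketch of the general inverse-frequency technique (the labels are illustrative and the normalization shown is one common convention; the exact formula used here may differ):

```python
import numpy as np
import torch

# Sketch: inverse-frequency class weights for an imbalanced label set.
labels = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 2])  # heavily imbalanced toy labels
counts = np.bincount(labels)                        # [7, 2, 1]

weights = len(labels) / (len(counts) * counts)      # "balanced" inverse-frequency weighting
print(weights)                                      # [0.476..., 1.666..., 3.333...]

# Pass to a weighted loss so minority classes contribute more per example.
criterion = torch.nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))
```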