
🦠 TARP: Transformers for Antimicrobial Resistance Prediction

TARP is a suite of tools and models for predicting antimicrobial resistance (AMR) with transformer-based architectures. It applies sequence-modeling techniques from natural language processing (NLP) to genetic sequences in order to predict resistance profiles.

✨ Features

  • Implementation of transformer and traditional architectures tailored for AMR prediction.
  • Data preprocessing pipelines for genetic sequences.
  • Automatic mixed precision (AMP) training for faster training and lower GPU memory use.
  • Support for various datasets and easy integration of new data sources.

🚀 Getting Started

  1. Clone the repository:
    git clone https://github.com/debugst1ck/TARP.git
  2. Navigate to the project directory:
    cd TARP
  3. (Optional, recommended) Create and activate a virtual environment.
    For Windows PowerShell:
    Set-ExecutionPolicy Unrestricted -Scope Process
    python -m venv .venv
    .venv\Scripts\activate
    For Unix:
    python -m venv .venv
    source .venv/bin/activate
  4. Install the required dependencies (use the correct index URL for your CUDA version):
    pip install -e . --extra-index-url https://download.pytorch.org/whl/cu128 # For CUDA 12.8
  5. Prepare your dataset in the required format (FASTA files with corresponding labels).
  6. Run the training script with your dataset:
    tarp
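
For step 5, the exact label layout TARP expects is not specified here, so the sketch below covers only the FASTA half: it reads records into (header, sequence) pairs using the standard library. The function name `read_fasta` and the example records are hypothetical and not part of the TARP codebase.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream.

    Hypothetical helper for illustration; not TARP's own loader.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence data may be wrapped across several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up example data: two records, the first wrapped over two lines.
example = io.StringIO(">seq1\nATGC\nGGTA\n>seq2\nTTAA\n")
records = list(read_fasta(example))
```

With a real file, the same function can be called on an open file handle, for example `with open(path) as handle: records = list(read_fasta(handle))`.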

👨‍💻 Developer's notes

The codebase is structured to facilitate easy experimentation with different transformer architectures and hyperparameters. The main components include data preprocessing, model training, evaluation, and visualization of results.

🧠 Attention Mask

A value of 1 (or True) marks a position the model should attend to, i.e. actual input content. A value of 0 (or False) marks a position the model should ignore, typically because it is padding.
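
As a minimal sketch of this convention, the snippet below pads a batch of variable-length token sequences and builds the matching mask with plain Python lists. The padding id `PAD_ID = 0` and the helper name `pad_and_mask` are assumptions made for illustration; the real tokenizer and padding id in TARP may differ.

```python
PAD_ID = 0  # assumed padding token id, for illustration only


def pad_and_mask(sequences, pad_id=PAD_ID):
    """Right-pad each sequence to the batch maximum and build its attention mask.

    Hypothetical helper: mask is 1 for real tokens and 0 for padding.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Two made-up tokenized sequences of different lengths.
batch = [[5, 6, 7, 8], [5, 6]]
ids, mask = pad_and_mask(batch)
# ids  -> [[5, 6, 7, 8], [5, 6, 0, 0]]
# mask -> [[1, 1, 1, 1], [1, 1, 0, 0]]
```

The second sequence is two tokens shorter than the first, so its last two positions are padding and carry a mask value of 0.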

🏷️ Class Weights

Class weights are calculated to address class imbalance in the dataset. The weights are inversely proportional to the frequency of each class, ensuring that the model pays more attention to minority classes during training.

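$$ \text{weight}_i = \frac{N}{C \cdot n_i} $$

Here $N$ is the total number of samples, $C$ is the number of classes, and $n_i$ is the number of samples in class $i$.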
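
A worked example may make the formula concrete. The sketch below reads N as the total number of samples, C as the number of distinct classes, and n_i as the number of samples in class i, and it assumes one label per sample. The function name `class_weights` and the label values are hypothetical, not taken from the TARP codebase.

```python
from collections import Counter


def class_weights(labels):
    """Return weight_i = N / (C * n_i) for each class found in `labels`.

    Hypothetical helper; assumes exactly one label per sample.
    """
    counts = Counter(labels)      # n_i for each class
    n_samples = len(labels)       # N
    n_classes = len(counts)       # C
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Made-up imbalanced dataset: 2 resistant samples, 8 susceptible samples.
labels = ["resistant"] * 2 + ["susceptible"] * 8
weights = class_weights(labels)
# resistant:   10 / (2 * 2) = 2.5
# susceptible: 10 / (2 * 8) = 0.625
```

The minority class receives a weight four times larger than the majority class, matching the 1:4 ratio of their sample counts. In a perfectly balanced dataset every weight would equal 1.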
