This repo provides a PyTorch re-implementation of the Basenji2 model published in "Cross-species regulatory sequence activity prediction" by David Kelley. This implementation was checked by verifying that the Tensorflow and PyTorch version yielded the same output on random data. Small deviations were found, likely due to differences in the underlying algorithms used by Tensorflow and PyTorch (e.g. different matrix multiplication algorithms). In addition, Qixiu Du kindly computed evaluation metrics and found that the PyTorch re-implementation achieves competitive performance on real data, further validating the port.
pip install basenji2-pytorch
import torch
from basenji2_pytorch import Basenji2, basenji2_params, basenji2_weights # or PLBasenji2 to also use training parameters from Kelley et al. 2020
# to use a headless model e.g. for transfer learning
# basenji2_params["model"].pop("head_human", None)
basenji2 = Basenji2(basenji2_params["model"])
basenji2.load_state_dict(torch.load(basenji2_weights()))basenji2_paramsis a dictionary of both training and model parameters matching the implementation in Kelley et al. 2020basenji2_weightsis a function that uses pooch to download weights from Zenodo and return the path as a string.Basenji2is a PyTorch nn.Module that can be initialized from the model parameters ofbasenji2_paramsPLBasenji2is a PyTorch Lightning module that can be initialized frombasenji2_paramsto match both the training and architectural parameters of Basenji2