SQUID (Surrogate Quantitative Interpretability for Deepnets) is a Python suite to interpret sequence-based deep learning models for regulatory genomics data with domain-specific surrogate models. For installation instructions, tutorials, and documentation, please refer to the SQUID website, https://squid-nn.readthedocs.io/. For an extended discussion of this approach and its applications, please refer to our paper:
- Seitz, E.E., McCandlish, D.M., Kinney, J.B., and Koo P.K. Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models. Nat Mach Intell (2024). https://doi.org/10.1038/s42256-024-00851-5
With Anaconda sourced, create a new environment via the command line:
conda create --name squid python==3.7.2
Next, activate this environment via conda activate squid, and install the following packages:
pip install squid-nn
pip install logomaker
pip install mavenn --upgrade
Finally, when you are done using the environment, always exit via conda deactivate.
SQUID has been tested on Mac and Linux operating systems. Typical installation time on a normal computer is less than 5 minutes.
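As a quick sanity check after installation, the following minimal sketch reports which of the packages above cannot be located in the active environment. The import names `squid`, `logomaker`, and `mavenn` are assumptions about how the pip packages are exposed to Python, and `missing_packages` is a hypothetical helper, not part of SQUID.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that cannot be found in this environment."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the three pip packages listed above.
    required = ["squid", "logomaker", "mavenn"]
    missing = missing_packages(required)
    if missing:
        print("Not found in this environment:", ", ".join(missing))
    else:
        print("All packages found.")
```

Run it inside the activated squid environment; an empty result only means the modules can be located, not that they are compatible with each other.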
If you have any issues installing SQUID, please see:
- https://squid-nn.readthedocs.io/en/latest/installation.html
- https://github.com/evanseitz/squid-nn/issues
For issues installing MAVE-NN, please see:
Older DNNs may require inference via TensorFlow 1.x or related packages not supported by MAVE-NN. In that case, users will need to run SQUID piecewise within separate environments:
- TensorFlow 1.x environment for generating in silico MAVE data
- TensorFlow 2.x and Python>=3.7.2 environment for training MAVE-NN surrogate models
An example of this workflow using BPNet is provided in the examples/ folder.
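One way to pass data between the two environments is to write the in silico MAVE arrays to disk in the first and read them back in the second. The sketch below assumes the dataset is a pair of NumPy arrays (one-hot sequences and oracle predictions); the names `save_mave`, `load_mave`, `x_mut`, and `y_mut` are hypothetical, and the BPNet example in the examples/ folder may organize this step differently.

```python
import os
import tempfile

import numpy as np


def save_mave(path, x_mut, y_mut):
    """Write one-hot sequences and oracle predictions to a compressed .npz file."""
    np.savez_compressed(path, x_mut=x_mut, y_mut=y_mut)


def load_mave(path):
    """Read back the arrays written by save_mave."""
    with np.load(path) as data:
        return data["x_mut"], data["y_mut"]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder data: 100 one-hot sequences of length 20 with random "predictions".
    idx = rng.integers(0, 4, size=(100, 20))
    x_mut = np.eye(4, dtype=np.uint8)[idx]  # shape (100, 20, 4)
    y_mut = rng.normal(size=100)

    path = os.path.join(tempfile.mkdtemp(), "mave.npz")
    save_mave(path, x_mut, y_mut)     # would run in the TensorFlow 1.x environment
    x_back, y_back = load_mave(path)  # would run in the TensorFlow 2.x environment
    print(x_back.shape, y_back.shape)
```

Plain NumPy files are used here because they can be read in both environments without either one needing the other's deep learning framework.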
SQUID provides a simple interface that takes as input a sequence-based deep-learning model (e.g., a DNN), which is used as an oracle to generate an in silico MAVE dataset representing a localized region of sequence space. The MAVE dataset can then be fit using a domain-specific surrogate model, with the resulting parameters visualized to reveal the cis-regulatory mechanisms driving model performance.
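The three steps described above (oracle, in silico MAVE, surrogate fit) can be illustrated with a toy NumPy sketch. It deliberately does not use SQUID's API: the oracle is a made-up function standing in for a DNN, the mutagenesis is a simplified random scheme, and the surrogate is a ridge-regularized additive model rather than a MAVE-NN model. All function names here are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot array."""
    idx = np.array([ALPHABET.index(c) for c in seq])
    return np.eye(4)[idx]


def random_mutagenesis(x_wt, mut_rate, n_seqs, rng):
    """Sample n_seqs variants of x_wt, mutating each position with probability mut_rate."""
    seq_len = x_wt.shape[0]
    idx = np.tile(x_wt.argmax(axis=1), (n_seqs, 1))
    mask = rng.random((n_seqs, seq_len)) < mut_rate
    # Shift by 1..3 (mod 4) so that a mutated position always changes base.
    shifts = rng.integers(1, 4, size=(n_seqs, seq_len))
    idx = np.where(mask, (idx + shifts) % 4, idx)
    return np.eye(4)[idx]  # shape (n_seqs, L, 4)


def fit_additive_surrogate(x_mut, y_mut, ridge=1e-3):
    """Fit y ~ sum(theta * x) + intercept by ridge-regularized least squares."""
    n, seq_len, n_chars = x_mut.shape
    X = np.hstack([x_mut.reshape(n, seq_len * n_chars), np.ones((n, 1))])
    # The small ridge term (applied to the intercept too) keeps the system solvable,
    # since one-hot columns are linearly dependent.
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y_mut)
    return w[:-1].reshape(seq_len, n_chars), w[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_wt = one_hot("ACGTTGACCA")

    # Toy oracle (assumption): hidden additive effects passed through a nonlinearity.
    theta_true = rng.normal(size=x_wt.shape)

    def oracle(x):
        return np.tanh(0.3 * np.einsum("nla,la->n", x, theta_true))

    x_mut = random_mutagenesis(x_wt, mut_rate=0.1, n_seqs=2000, rng=rng)
    y_mut = oracle(x_mut)                      # in silico MAVE dataset
    theta, intercept = fit_additive_surrogate(x_mut, y_mut)

    y_hat = np.einsum("nla,la->n", x_mut, theta) + intercept
    print("Pearson r between oracle and surrogate:", np.corrcoef(y_mut, y_hat)[0, 1])
```

The fitted `theta` matrix plays the role of the surrogate parameters that would be visualized, for example as a sequence logo, to inspect which positions and bases drive the oracle's predictions near the wild-type sequence.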
Google Colab examples for applying SQUID on previously-published deep learning models are available at the following links:
- Additive (local) surrogate modeling with DeepSTARR
- Pairwise (local) surrogate modeling with ResidualBind-32
- Variant effect (local) prediction with DeepSTARR–Kipoi
- Basic linear modeling using MAVE-NN, LIME and RidgeCV
Python script examples are provided in the examples/ folder for running SQUID locally and exporting outputs to file. These examples may require additional dependencies, which are outlined at the top of each script. Examples include:
- Variant effect (local) prediction with DeepSTARR–Kipoi
- Additive (global) surrogate modeling with BPNet–Kipoi
- Pairwise (global) surrogate modeling with BPNet–Kipoi
In addition, the squid-manuscript repository contains examples to reproduce results in the manuscript, including the application of SQUID on other DNNs such as ENFORMER.
Expected run time for the "Variant effect (local) prediction with DeepSTARR–Kipoi" demo (above) is about 4 minutes on a Google Colab V100 GPU.
If this code is useful in your work, please cite our paper.
@article{seitz2023_squid,
author = {Evan E Seitz and David M McCandlish and Justin B Kinney and Peter K Koo},
title = {Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models},
year = {2024},
doi = {10.1038/s42256-024-00851-5},
URL = {https://doi.org/10.1038/s42256-024-00851-5},
journal = {Nature Machine Intelligence}
}
Copyright (C) 2022–2023 Evan Seitz, David McCandlish, Justin Kinney, Peter Koo
The software, code samples, and their documentation made available on this website could include technical or other mistakes, inaccuracies, or typographical errors. We may make changes to the software or documentation made available on this website at any time without prior notice. We assume no responsibility for errors or omissions in the software or documentation available from this website. For further details, please see the LICENSE file.