
BioinformaticsToolsmith/EnhancerMatcher


EnhancerMatcher

Copyright (C) 2025 Luis M. Solis, William L. Melendez, Shantanu Hemantrao Fuke, Sayantain Paul, Anthony B. Garza, Rolando Garcia, Mark S. Halfon, and Hani Z. Girgis

Academic use: Affero General Public License version 1.

For-profit or non-academic use: an alternative commercial license is required.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Please contact Dr. Hani Z. Girgis (hzgirgis@buffalo.edu) if you need more information.

EnhancerMatcher is a computational tool for assessing the similarity among three sequences, taking into account their enhancer activities. This repository contains the Python notebook and scripts that implement the tool. EnhancerMatcher uses a deep convolutional neural network classifier trained on the CATlas Project dataset (snATAC-seq transcribed human enhancers).
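Since the output reports each sequence's similarity to the two reference enhancers, the three-sequence comparison can be pictured as pairing every candidate with those two enhancers. The sketch below illustrates that idea under stated assumptions: the one-hot encoding, the A, C, G, T channel order and the function names `one_hot` and `make_triplets` are hypothetical and are not taken from EnhancerMatcher's code.

```python
import numpy as np

# Hypothetical channel order; EnhancerMatcher's actual encoding may differ.
ALPHABET = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """One-hot encode a DNA string into a (len(seq), 4) float array.

    Characters outside ACGT (for example N) become all-zero rows.
    """
    seq = seq.upper()
    encoded = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for i, base in enumerate(seq):
        j = ALPHABET.find(base)
        if j >= 0:
            encoded[i, j] = 1.0
    return encoded


def make_triplets(enhancer_a: str, enhancer_b: str, candidates: list) -> list:
    """Pair every candidate with the two reference enhancers (assumed layout).

    Returns a list of (reference_a, reference_b, candidate) encoded arrays.
    """
    ref_a, ref_b = one_hot(enhancer_a), one_hot(enhancer_b)
    return [(ref_a, ref_b, one_hot(c)) for c in candidates]
```

A real model input would also need a fixed sequence length (padding or trimming), which this sketch leaves out because the README does not specify it.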

EnhancerMatcher is an updated approach to our EnhancerTracker tool.

Files:

Models: This folder contains the network models used by EnhancerMatcher. The conv_model is used for enhancer predictions, while both cam_model and class_model are used to generate class activation maps.

Output: This folder stores the outputs generated by EnhancerMatcher, including enhancer predictions and class activation maps.

Test_Input: This folder contains test input files for EnhancerMatcher. These files demonstrate the required input format and can be used to check that the tool is working correctly. input1.fasta contains, in FASTA format, two enhancers that are active in the same cell type. input2.fasta contains the sequences, also in FASTA format, to be evaluated by EnhancerMatcher.

EnhancerMatcher.ipynb: This Jupyter notebook contains the code to run EnhancerMatcher. If you are comfortable with Jupyter notebooks, you can edit the parameters in the first few cells and run the notebook to produce EnhancerMatcher's evaluation.
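Both inputs described under Test_Input are ordinary FASTA files, and the first one is expected to hold the two enhancers. Before a run, you might want to check your own files the same way. The sketch below uses a small standard-library parser so it runs without extra packages; Biopython's `Bio.SeqIO.parse(path, "fasta")` does the same job. The helper names `read_fasta` and `check_inputs` are hypothetical, and the "exactly two records" rule is inferred from this README rather than from EnhancerMatcher's code.

```python
from __future__ import annotations

from pathlib import Path


def read_fasta(path: str | Path) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs from a FASTA file.

    Blank lines and any text before the first '>' header are ignored.
    """
    records: list[tuple[str, str]] = []
    header: str | None = None
    chunks: list[str] = []
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_inputs(similar_path: str | Path, sequences_path: str | Path) -> tuple[int, int]:
    """Validate the two inputs (assumed rules) and return their record counts."""
    similar = read_fasta(similar_path)
    if len(similar) != 2:
        raise ValueError(f"{similar_path}: expected 2 enhancers, found {len(similar)}")
    candidates = read_fasta(sequences_path)
    if not candidates:
        raise ValueError(f"{sequences_path}: no sequences found")
    return len(similar), len(candidates)
```

Given the description of the test files, `check_inputs("Test_Input/input1.fasta", "Test_Input/input2.fasta")` should return `(2, 10)`.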

Tool:

EnhancerMatcher.py: This Python script runs EnhancerMatcher from the terminal. It takes two FASTA files as input, plus an optional --cam flag that controls whether class activation maps are generated.
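The interface described above (two positional FASTA paths plus an optional `--cam` switch, as used in the run instructions) can be mirrored with the standard `argparse` module, which is handy if you wrap the tool in your own scripts. This is a hypothetical sketch; the argument handling inside EnhancerMatcher.py may differ, and the destination names `similar_enhancers` and `sequences` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser matching the documented EnhancerMatcher.py usage."""
    parser = argparse.ArgumentParser(prog="EnhancerMatcher.py")
    parser.add_argument(
        "similar_enhancers",
        help="FASTA file with the two enhancers active in the same cell type",
    )
    parser.add_argument(
        "sequences",
        help="FASTA file with the sequences to evaluate",
    )
    parser.add_argument(
        "--cam",
        action="store_true",  # absent -> False, present -> True
        help="also write class activation maps (PDF) to the Output folder",
    )
    return parser
```

For example, `build_parser().parse_args(["two_similar_enhancers.fa", "sequences.fa", "--cam"])` yields a namespace with `cam=True`.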

Requirements:

EnhancerMatcher was developed with the following Python version and libraries:

Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]

TensorFlow version: 2.13.0

Biopython version: 1.83

NumPy version: 1.24.3

Matplotlib version: 3.8.3

NOTE: These are the versions the program was developed with; newer versions of these libraries may or may not work.
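Because newer library versions are not guaranteed to work, it can help to compare your environment against the list above. The sketch below uses the standard `importlib.metadata.version` function; the PyPI distribution names (`tensorflow`, `biopython`, `numpy`, `matplotlib`) are assumptions about how the libraries were installed (for example, some platforms use a differently named TensorFlow package), and `compare_versions` is a hypothetical helper.

```python
from __future__ import annotations

import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README; other versions may or may not work.
TESTED = {
    "tensorflow": "2.13.0",
    "biopython": "1.83",
    "numpy": "1.24.3",
    "matplotlib": "3.8.3",
}


def compare_versions() -> dict[str, tuple[str, str | None]]:
    """Map each package to (tested version, installed version or None)."""
    report = {}
    for package, tested in TESTED.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (tested, installed)
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(tested: 3.10.13)")
    for package, (tested, installed) in compare_versions().items():
        flag = "" if installed == tested else "  <- differs"
        print(f"{package:<11} tested {tested:<8} installed {installed or 'missing'}{flag}")
```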

To Run Tool:

  1. Clone the EnhancerMatcher repository and change into the cloned directory.

  2. Make sure EnhancerMatcher is unzipped.

  3. Run EnhancerMatcher.py:

    python EnhancerMatcher.py two_similar_enhancers.fa sequences.fa

  4. The results are written to Model_Output.txt in the Output folder.

    It lists each given sequence and its percentage of similarity to the two similar enhancers in the first input file.

  5. To also generate a class activation map (CAM) for each input sequence, run:

    python EnhancerMatcher.py two_similar_enhancers.fa sequences.fa --cam

  6. The generated CAM is saved in the Output folder as sequence_CAM.pdf.

    It shows a heatmap for the given sequence; the dark red regions mark the areas that most influenced EnhancerMatcher's final decision. Please read the main paper for more details.
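The README says only that cam_model and class_model together produce these heatmaps. For readers new to the technique, the sketch below shows the original CAM formulation (Zhou et al., 2016) adapted to a 1-D sequence: the last convolutional layer's feature maps are weighted by the target class's output weights and summed over filters, then min-max scaled for display, so the highest values correspond to the "dark red" positions. It is a generic illustration; the array shapes and the function name `class_activation_map` are assumptions, not EnhancerMatcher's implementation.

```python
import numpy as np


def class_activation_map(feature_maps: np.ndarray, class_weights: np.ndarray) -> np.ndarray:
    """Classic CAM for a 1-D input.

    feature_maps:  (length, n_filters) activations of the last conv layer.
    class_weights: (n_filters,) weights linking each filter to the target class.
    Returns a (length,) map min-max scaled to [0, 1]; higher values mark
    positions that contributed more to the class score.
    """
    cam = feature_maps @ class_weights  # weighted sum over filters at each position
    cam = cam - cam.min()               # shift so the minimum is 0
    span = cam.max()
    return cam / span if span > 0 else cam  # avoid dividing by zero for flat maps
```

With `feature_maps = [[1, 0], [0, 1], [1, 1]]` and `class_weights = [2, 1]`, the raw scores are `[2, 1, 3]`, which scale to `[0.5, 0.0, 1.0]`; the third position would be drawn darkest.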

If you wish to run EnhancerMatcher via the Jupyter notebook:

  1. Open EnhancerMatcher.ipynb

  2. In the third cell, edit the following parameters:

    similar_sequences_file = "directory_of_input/two_similar_enhancers.fa"

    all_sequences_file = "directory_of_input/sequences.fa"

  3. To generate CAMs, locate output_cam_pdf in the second cell and set it to:

    output_cam_pdf = True

  4. To change the output directory, set output_dir to the path you want.

  5. Once you have edited the parameters, run the entire notebook; the outputs will be generated in the output directory.
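Before running the whole notebook, an extra cell can catch path typos early. The sketch below is hypothetical: it reuses only the parameter names documented above (similar_sequences_file, all_sequences_file, output_dir), and it creates the output directory if it is missing, which the notebook itself may or may not do.

```python
from pathlib import Path


def check_notebook_params(similar_sequences_file: str,
                          all_sequences_file: str,
                          output_dir: str) -> list:
    """Return a list of problems; an empty list means the paths look usable."""
    problems = []
    for name, value in [("similar_sequences_file", similar_sequences_file),
                        ("all_sequences_file", all_sequences_file)]:
        if not Path(value).is_file():
            problems.append(f"{name}: file not found: {value}")
    # Create the output directory (and any parents) so results have somewhere to go.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return problems
```

In the notebook, a cell such as `print(check_notebook_params(similar_sequences_file, all_sequences_file, output_dir) or "paths OK")` would report any missing input before the model runs.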

To Run our Tests:

  1. The Test_Input folder contains two FASTA files:

    input1.fasta: two similar human enhancers.

    input2.fasta: ten sequences; the first five are similar enhancers, while the last five are non-enhancers.

  2. From the main directory, run EnhancerMatcher.py:

    python EnhancerMatcher.py Test_Input/input1.fasta Test_Input/input2.fasta

    To also generate a CAM for each sequence, run:

    python EnhancerMatcher.py Test_Input/input1.fasta Test_Input/input2.fasta --cam

  3. The Output folder will then contain the results and, if requested, the CAMs for each given sequence.

  4. To use the Jupyter notebook instead, open EnhancerMatcher.ipynb.

  5. By default, similar_sequences_file and all_sequences_file should already point to the test input files.

  6. Set output_cam_pdf to True if you want to generate the CAMs.

  7. Run the entire notebook and the outputs will be generated in the Output folder.
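The README states the expected composition of input2.fasta (five similar enhancers followed by five non-enhancers), so the scores from a test run can be checked against those labels. The sketch below assumes you have already copied the ten similarity percentages from Model_Output.txt into a list, in input2.fasta order; the 50% cut-off and the function name `score_test_run` are hypothetical choices, not part of EnhancerMatcher.

```python
# Expected labels from the README: first five similar enhancers (1), last five non-enhancers (0).
EXPECTED = [1] * 5 + [0] * 5


def score_test_run(percentages: list, threshold: float = 50.0) -> float:
    """Fraction of test sequences whose thresholded score matches the expected label."""
    if len(percentages) != len(EXPECTED):
        raise ValueError(f"expected {len(EXPECTED)} scores, got {len(percentages)}")
    predicted = [1 if p >= threshold else 0 for p in percentages]
    correct = sum(p == e for p, e in zip(predicted, EXPECTED))
    return correct / len(EXPECTED)
```

A value of 1.0 means every test sequence fell on the expected side of the chosen cut-off; the cut-off itself is a judgment call you may want to adjust.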
