A common biological question when analyzing ChIP-seq data is how to identify transcription factor binding sites. This problem is called motif discovery. Motif discovery algorithms (e.g., MEME) produce a large number of motifs but do not evaluate them systematically. Therefore, several methods (e.g., PMID: 24335146) have been developed to select high-confidence motifs.
The motif selection problem seeks to identify a minimal set of regulatory motifs that characterize sequences of interest (e.g. ChIP-seq binding regions). The output motifs represent putative binding sites for primary transcription factors (ChIP-ed factors) and co-factors.
Please see the documentation for more details.
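
The motif selection problem described above can be viewed as a set cover problem: each motif "covers" the sequences it occurs in, and the goal is a small set of motifs that together cover the sequences of interest. The sketch below is only an illustration using the classic greedy heuristic, not the method implemented in this repository. It ignores background sequences entirely, and the function name ``greedy_motif_cover`` and the toy data are hypothetical.

.. code:: python

    def greedy_motif_cover(motif_hits, sequences):
        """Greedily pick motifs until every coverable sequence is covered.

        motif_hits: dict mapping motif name -> set of sequence IDs it occurs in.
        sequences:  iterable of foreground sequence IDs to cover.
        Returns (selected motifs in pick order, sequences left uncovered).
        """
        uncovered = set(sequences)
        selected = []
        while uncovered and motif_hits:
            # Pick the motif covering the most still-uncovered sequences;
            # iterating in sorted order makes tie-breaking deterministic.
            best = max(sorted(motif_hits),
                       key=lambda m: len(motif_hits[m] & uncovered))
            gain = motif_hits[best] & uncovered
            if not gain:
                break  # the remaining sequences contain none of the motifs
            selected.append(best)
            uncovered -= gain
        return selected, uncovered


    # Hypothetical toy data: three motifs and five foreground sequences.
    hits = {
        "motifA": {"s1", "s2", "s3"},
        "motifB": {"s3", "s4"},
        "motifC": {"s4", "s5"},
    }
    selected, uncovered = greedy_motif_cover(hits, ["s1", "s2", "s3", "s4", "s5"])
    print(selected, sorted(uncovered))

In this toy case, ``motifA`` and ``motifC`` together cover all five sequences, so ``motifB`` is redundant. A greedy choice is not guaranteed to be minimal in general, which is one reason exact or relaxed optimization formulations are used for this problem.
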
The following example has been tested on Ubuntu x64.
Step 0. Download the GLPK library
https://www.gnu.org/software/glpk/
Step 1. Compile the C++ code

.. code:: bash

    cd RILP
    g++ -std=c++11 RIPL_simplex2_07_31.cc -o msdc -L/PATH/glpk/lib -I/PATH/glpk/include -lglpk

Step 2. Run the RILP method

.. code:: bash

    ./msdc foreground.list background.list motif_mapping.fimo

Output
The output is a list of selected motifs (i.e., subset of the input motifs) defined in motif_mapping.fimo.
This folder contains all raw inputs (i.e., binary matrices) and the Fisher's exact test p-values for every motif.
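
As a rough illustration of how a per-motif Fisher's exact test p-value can be derived from binary matrices, here is a minimal sketch. It assumes rows are sequences and columns are motifs, and that the test is one-sided (enrichment in the foreground); the actual matrix layout and test settings used for the provided files are not stated here, so treat both as assumptions. The function names and toy matrices are hypothetical, and ``math.comb`` requires Python 3.8 or later.

.. code:: python

    from math import comb  # available in Python 3.8+


    def fisher_enrichment_pvalue(fg_hits, fg_total, bg_hits, bg_total):
        """One-sided Fisher's exact test: P(X >= fg_hits) under the
        hypergeometric null of no association between motif and group."""
        total = fg_total + bg_total
        hits = fg_hits + bg_hits
        upper = min(hits, fg_total)
        # Sum the hypergeometric probabilities of tables at least as extreme.
        tail = sum(
            comb(hits, k) * comb(total - hits, fg_total - k)
            for k in range(fg_hits, upper + 1)
        )
        return tail / comb(total, fg_total)


    def motif_pvalues(fg_matrix, bg_matrix):
        """Compute one p-value per motif (column) from two binary matrices."""
        pvalues = []
        for j in range(len(fg_matrix[0])):
            fg_hits = sum(row[j] for row in fg_matrix)
            bg_hits = sum(row[j] for row in bg_matrix)
            pvalues.append(
                fisher_enrichment_pvalue(
                    fg_hits, len(fg_matrix), bg_hits, len(bg_matrix)
                )
            )
        return pvalues


    # Hypothetical toy matrices: 4 sequences x 2 motifs each.
    fg = [[1, 0], [1, 1], [1, 0], [0, 1]]
    bg = [[0, 1], [0, 1], [0, 0], [1, 1]]
    pvalues = motif_pvalues(fg, bg)
    print(pvalues)

Here the first motif occurs in 3 of 4 foreground and 1 of 4 background sequences, giving a p-value of 17/70; the second motif is more frequent in the background and gets a p-value of 65/70.
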
Please use the detailed installation steps below.

Install Anaconda:

https://www.anaconda.com/distribution/

Create and activate a conda environment:

.. code:: bash

    conda create -n set_cover python=2.7
    conda activate set_cover  # or: source activate set_cover

Install the required packages (Pandas, Joblib, scikit-learn):

.. code:: bash

    conda install -c anaconda pandas
    conda install -c anaconda joblib
    conda install -c anaconda scikit-learn


.. code:: bash

    cd Evams

Note that Evams imports ``conf.py`` from the same directory. You can add Evams to your ``PATH`` with ``export PATH=$PATH:/path_to_Evams_folder/``.

.. code:: bash

    Evams2 -h
    Evams2 -jid test -confFile example.conf

In ``example.conf``, please change the input entries to point to the individual files in the Data folder.

Configuration files (.conf) used in this paper are provided in the conf_paper folder. Run logs, including CPU time and memory usage, are provided in the same folder.

``-jid`` is the job ID. An output folder is created using the job ID, and all results are written to it.