Motif Enrichment Positional Profiling (MEPP) quantifies a positional profile of motif enrichment along the length of DNA sequences centered on e.g. transcription start sites or transcription factor binding motifs.
To install MEPP, use pip:
pip install git+https://github.com/npdeloss/mepp@main
Or, if you only have user privileges:
pip install git+https://github.com/npdeloss/mepp@main --user
You may need to append the following to your ~/.bashrc:
export PATH="$HOME/.local/bin:$PATH"
Motif files for use with this program can be found in the data subdirectory. These are motifs from the HOMER suite in data/homer.motifs.txt, as well as a reduced-redundancy version with similar motifs clustered, allowing a faster analysis. The file data/ohler_motifs.txt contains Drosophila core promoter motifs from Ohler et al.. To get started, see our walkthrough notebook.
Command line help:
Usage: mepp [OPTIONS] Profile positional enrichment of motifs in a list of scored sequences. Generated MEPP (Motif Enrichment Positional Profile) plots. Options: --fa TEXT Path to a scored fasta file, where sequence headers are of the form: ">sequence_name sequence_score". [required] --motifs TEXT Path to a motif matrices file in JASPAR format. As a start, one can be obtained through the JASPAR website at: http://jaspar.genereg.net/downloads/ [required] --out TEXT Create this directory and write output to it. [required] --center INTEGER 0-based offset from the start of the sequence to center plots on. Default: Set the center to half the sequence length, rounded down --dgt INTEGER Percentage of sequence that can be degenerate (Not A, C, G, or T) before being rejected from the analysis. Useful for filtering out repeats. Default: 100 --perms INTEGER Number of permutations for permutation testing and confidence intervals. Can lead to significant GPU memory usage. Default: 1000 --batch INTEGER Size of batches for Tensorflow datasets. Default: 1000 --jobs INTEGER Number of jobs for CPU multiprocessing. Default: Use all cores --keepdata Set this flag to keep the Tensorflow dataset after MEPP has finished. Default: Delete the dataset after MEPP has finished. --orientations TEXT Comma-separated list of motif orientations to analyze for CPU multiprocessing. Values in list are limited to "+" (Match motif forward orientation), "-" (Match motif reverse orientation), "+/-" (Match to forward or reverse). Default: +,+/- --margin INTEGER Number of bases along either side of motif to "blur" motif matches for smoothing. Default: 2 --pcount FLOAT Pseudocount for setting motif match threshold via MOODS. Default: 0.0001 --pval FLOAT P-value for setting motif match threshold via MOODS. Default: 0.0001 --bg FLOATS Background DNA composition, for setting motif match threshold via MOODS, represented as a series of 4 floats. Default: 0.25 0.25 0.25 0.25 --ci FLOAT Confidence interval for positional profile, expressed as a percentage. Default: 95.0 --sigma FLOAT Adaptive scale for brightness of motif matches in motif heatmaps. Maximum brightness is achieved at sigma * std, where std is the standard deviation of nonzero motif match scores. Set lower for brighter pixels. Must be a positive value. Default: 0.5 --cmap TEXT Name of a matplotlib colormap. Used to color the central MEPP motif heatmap. Possible values can be viewed using matplotlib.pylot.colormaps() or at https://m atplotlib.org/stable/tutorials/colors/colorm aps.html . Default: gray_r. Set to gray to invert colors (black background). --smoothing INTEGER Factor by which to smooth motif density along ranks for visualization. This is multiplicative to smoothing that already occurs dependent on figure pixel resolution. Default: 5 --width INTEGER Width of generated MEPP plot, in inches. Default: 10 --height INTEGER Height of generated MEPP plot, in inches. Default: 10 --formats TEXT Comma-separated list of image formats for MEPP plots. Possible formats are png and svg. Default: png,svg --dpi INTEGER DPI of generated MEPP plot. Default: 300 --gjobs INTEGER Number of jobs for GPU multiprocessing. NOTE: Set this carefully to avoid jobs crowding each other out of GPU memory, causing profile generation to fail. If setting --nogpu, this will be the number of jobs used to process motifs in parallel. Default: 1 --nogpu Disable use of GPU. If setting --nogpu, --gjobs will be the number of jobs used to process motifs in parallel. --attempts INTEGER Number of attempts to retry making a plot. Default: 10 --minwait FLOAT Minimum wait between attempts to make a plot, in seconds. Default: 1.0 --maxwait FLOAT Maximum wait between attempts to make a plot, in seconds. Default: 1.0 --cmethod METHOD Clustering method for clustering MEPP profiles. For details, see "method" parameter of scipy.cluster.hierarchy.linkage. Default: average --cmetric METRIC Clustering metric for clustering MEPP profiles. For details, see "metric" parameter of scipy.cluster.hierarchy.linkage. Default: correlation --tdpi INTEGER DPI of inline plots for clustering table. Default: 100 --tformat [png|svg] Format of inline plots for clustering table. Use png for speed, svg for publication quality. Default: png --mtmethod METHOD Multiple testing method for adjusting p-values of positional correlations listed in the clustering table.For details, see "method" parameter of statsmodels.stats.multitest.multipletests. Default: fdr_by --mtalpha FLOAT Alpha (FWER, family-wise error rate) for adjusting p-values of positional correlations listed in the clustering table.For details, see "alpha" parameter of statsmodels.stats.multitest.multipletests. Default: 0.01 --thoroughmt Enables thorough multiple testing of positional correlation p-values: All p-values for all motifs at all positions will be adjusted simultaneously.Default: Thorough multiple testing is enabled --non-thoroughmt Disables thorough multiple testing of positional correlation p-values: Only extreme p-values will be adjusted for.Default: Thorough multiple testing is enabled --help Show this message and exit.
Command line help:
Usage: python -m mepp.learn_motifs [OPTIONS] Options: --fa TEXT Path to a scored fasta file, where sequence headers are of the form: ">sequence_name sequence_score". [required] --out TEXT Create this directory and write output to it. [required] --dgt FLOAT Percentage of sequence that can be degenerate (Not A, C, G, or T) before being rejected from the analysis. Useful for filtering out repeats. Default: 100 --batch INTEGER Size of batches for Tensorflow datasets. Default: 1000 --val FLOAT Fraction of data used for validation. Default: 0.10 --motifs INTEGER Number of motifs to learn. Default: 320 --length INTEGER Length of motifs to learn. Default: 8 --motif-prefix TEXT Prefix motif names with this string.Default: denovo_motif_ --model [deepbindlike|simpleconv] Type of network to use for learning motifs. Default: deepbindlike --seed INTEGER Random seed for shuffling and initialization. Default: 1000 --epochs INTEGER Maximum number of epochs for training. Default: 1000 --no-early-stopping Disable early stopping of training, to train for the maximum number of epochs. Default: Enable early stopping. --patience INTEGER Number of epochs to wait for early stopping. Default: 1000 --mindelta FLOAT Minimum delta for early stopping. Default: 0 --jobs INTEGER Number of jobs for CPU multiprocessing. Default: Use all cores --nogpu Disable use of GPU. --quiet Do not write combined motifs to stdout. Default: Write combined motifs to stdout. --help Show this message and exit.
Command line help:
Usage: python -m mepp.compare_motifs [OPTIONS] Options: --motifs TEXT Path to a motif matrices file in JASPAR format. Preferably a denovo motif matrices file. if --known- motifs is not specified, this will be compared against itself. As a start, one can be obtained through the JASPAR website at: http://jaspar.genereg.net/downloads/ [required] --out TEXT Create this directory and write output to it. [required] --known-motifs TEXT Path to a known motif matrices file in JASPAR format.As a start, one can be obtained through the JASPAR website at: http://jaspar.genereg.net/downloads/ Default: None --overlap INTEGER Minimum overlap for correlated motifs. Default: 6 --corrcoef FLOAT Minimum correlation for correlated motifs. Default: 0.6 --combine Combine motifs. Default: Do not combine motifs. --motif-prefix TEXT Prefix motif names with this string.Default: combined_motif_ --no-logos Do not render logos. Default: Render logos. --jobs INTEGER Number of jobs for CPU multiprocessing. Default: Use all cores --quiet Do not write combined motifs to stdout. Default: Write combined motifs to stdout. --help Show this message and exit.
- Free software: MIT license
- This package was developed in the lab of Christopher Benner at UCSD.
- This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.