Commit 5fa189d

Document advanced features
1 parent dce8341 commit 5fa189d

17 files changed

+83
-16
lines changed

deeptools/heatmapper.py

+1
Original file line numberDiff line numberDiff line change
@@ -70,6 +70,7 @@ def chopRegionsFromMiddle(exonsInput, left=0, right=0):
7070
the center point of the exons.
7171
7272
The steps are as follows:
73+
7374
1) Find the center point of the set of exons (e.g., [(0, 200), (300, 400), (800, 900)] would be centered at 200)
7475
* If a given exon spans the center point then the exon is split
7576
2) The given number of bases at the end of the left-of-center list are extracted

docs/content/advanced_features.rst

+10
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
Advanced features
2+
=================
3+
4+
Some of the features of deepTools are not self-explanatory. Below, we provide links to longer expositions on these more advanced features:
5+
6+
* :doc:`feature/blacklist`
7+
* :doc:`feature/metagene`
8+
* :doc:`feature/read_extension`
9+
* :doc:`feature/unscaled_regions`
10+
* :doc:`feature/read_offsets`

docs/content/feature/blacklist.rst

+12
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
Blacklist Regions
2+
=================
3+
4+
There are many sources of bias in ChIPseq experiments. Among the most prevalent of these is signal arising from "blacklist" regions (see `Carroll et al. <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3989762/>`__ and the references therein for historical context). Blacklisted regions show notably enriched signal across many ChIP experiment types (e.g., regardless of what is being IPed or the experimental conditions). Including these regions can lead not only to false-positive peaks, but can also throw off between-sample normalization. An example of this is found below:
5+
6+
.. image:: ../../images/feature-blacklist0.png
7+
8+
The region on chromosome 9 starting around position 3 million marks the start of an annotated satellite repeat. As this region contains vastly more reads than expected, slight differences in enrichment here between samples can cause errors in between-sample scaling, thereby masking signal in non-repetitive regions. This can be seen in the IGV screenshot below, where the blacklisted region is just off the side of the screen.
9+
10+
.. image:: ../../images/feature-blacklist1.png
11+
12+
Note that the signal outside of the blacklisted region is slightly depressed due to the blacklisted region. Such regions can be excluded by using the `--blackListFileName` option, which is available throughout deepTools. The subtraction of these regions is accounted for in all normalizations.
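
To make the effect on between-sample scaling concrete, here is a minimal sketch with made-up read counts. It is not deepTools code: the function name ``scale_factor``, the region names and all numbers are hypothetical. Two samples have identical signal outside a satellite repeat but differ in enrichment within it.

```python
def scale_factor(counts, target_total, exclude=()):
    """Return the factor that scales the summed counts to target_total.

    counts  -- dict mapping a region name to its read count
    exclude -- region names left out of the total (e.g. blacklisted ones)
    """
    total = sum(n for region, n in counts.items() if region not in exclude)
    return target_total / total


# Hypothetical read counts: identical signal in the two genes,
# different enrichment in the satellite repeat.
sample_a = {"gene_1": 500, "gene_2": 500, "satellite": 9000}
sample_b = {"gene_1": 500, "gene_2": 500, "satellite": 11000}

# Scaling both samples to 1000 reads with the repeat included
# gives different factors, so the genes no longer look equal.
naive_a = scale_factor(sample_a, 1000)
naive_b = scale_factor(sample_b, 1000)

# Excluding the repeat gives the same factor for both samples.
clean_a = scale_factor(sample_a, 1000, exclude={"satellite"})
clean_b = scale_factor(sample_b, 1000, exclude={"satellite"})

print(naive_a, naive_b, clean_a, clean_b)
```

With the repeat included, sample B receives a smaller scaling factor than sample A, so its genes appear depleted even though their raw counts are identical.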

docs/content/feature/metagene.rst

+12
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
Metagene analyses
2+
=================
3+
4+
By default, `computeMatrix` uses the signal over entire contiguous regions (e.g., transcripts) for computing its output. While this is typically quite useful, in cases such as RNAseq the results are less than ideal. Take, for example, the gene model and coverage profile below:
5+
6+
.. image:: ../../images/feature-metagene0.png
7+
8+
If clustering were done using such blocky coverage, the results would be biased by the number of exons and their positions. It is normally preferable to ignore intronic regions and use only the signal in exons (denoted by blocks in the gene model). This can be accomplished by using the `--metagene` option in `computeMatrix` and supplying a BED12 or GTF file as a set of regions:
9+
10+
.. image:: ../../images/feature-metagene1.png
11+
12+
Note that for GTF files the regions used to define exons can be easily modified. For example, for RiboSeq samples it's preferable to use annotated coding regions, which can be done by specifying `--exonID CDS`. Likewise, entire genes can be used rather than transcripts by specifying `--transcriptID gene --transcript_id_designator gene_id`.
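
The idea of using only exonic signal can be sketched as follows. This is an illustration under simplifying assumptions (per-base coverage held in a plain list, half-open exon intervals), not the `computeMatrix` implementation; ``exon_signal`` and the example data are hypothetical.

```python
def exon_signal(coverage, region_start, exons):
    """Concatenate per-base coverage over exons, skipping introns.

    coverage     -- list of per-base values covering the whole transcript
    region_start -- genomic coordinate of coverage[0]
    exons        -- list of (start, end) half-open intervals, sorted by start
    """
    signal = []
    for start, end in exons:
        signal.extend(coverage[start - region_start:end - region_start])
    return signal


# Hypothetical transcript spanning 100-120: two 5-base exons
# separated by a 10-base intron with no coverage.
coverage = [5] * 5 + [0] * 10 + [7] * 5
exons = [(100, 105), (115, 120)]
metagene = exon_signal(coverage, 100, exons)
print(metagene)
```

The resulting profile contains only the ten exonic values, so the empty intron no longer contributes to downstream scaling or clustering.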

docs/content/feature/read_extension.rst

+30
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
Read extension
2+
==============
3+
4+
In the majority of NGS experiments, DNA (or RNA) is fragmented into small stretches and only the ends of these fragments are sequenced. For many applications, it's desirable to quantify coverage of the entire original fragments over the genome. Consequently, there is an `--extendReads` option present throughout deepTools. This works as follows:
5+
6+
Paired-end reads
7+
----------------
8+
9+
1. Regions of the genome are sampled to determine the median fragment/read length.
10+
2. The genome is subdivided into disjoint regions. Each of these regions comprises one or more bins of some desired size (specified by `-bs`).
11+
3. For each region, all alignments overlapping it are gathered. In addition, all alignments within 2000 bases are gathered, as 2000 bases is the maximum allowed fragment size.
12+
4. The resulting collection of alignments are all extended according to their fragment length, which for paired-end reads is indicated in BAM files.
13+
14+
- For singletons, the expected fragment length from step 1 is used.
15+
16+
5. For each of the extended reads, the count in each bin that it overlaps is incremented.
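
Step 4 above can be sketched as follows. This is a simplified illustration rather than the deepTools code: the function ``fragment_interval`` and its arguments are hypothetical, intervals are half-open, and the choice never to shrink a read that is longer than the fragment estimate is an assumption.

```python
def fragment_interval(start, end, is_reverse, template_length, default_length):
    """Return the (start, end) half-open interval of the extended read.

    start, end      -- aligned interval of the read itself
    is_reverse      -- True if the read aligned to the reverse strand
    template_length -- absolute fragment length recorded for a pair,
                       or 0 for a singleton
    default_length  -- fallback fragment length (from step 1) for singletons
    """
    length = template_length if template_length > 0 else default_length
    # Assumption: a read is never made shorter than its aligned length.
    length = max(length, end - start)
    if is_reverse:
        # Reverse-strand reads extend leftwards from their rightmost base.
        return (max(0, end - length), end)
    return (start, start + length)


print(fragment_interval(100, 150, False, 300, 200))  # paired, forward
print(fragment_interval(500, 550, True, 300, 200))   # paired, reverse
print(fragment_interval(100, 150, False, 0, 200))    # singleton
```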
17+
18+
Single-end reads
19+
----------------
20+
21+
1. An extension length, L, is specified.
22+
2. The genome is subdivided into disjoint regions. Each of these regions comprises one or more bins of some desired size (specified by `-bs`).
23+
3. For each region, all alignments overlapping it are gathered. In addition, all alignments within 2000 bases are gathered, as 2000 bases is the maximum allowed fragment size.
24+
4. The resulting collection of alignments are all extended to length L.
25+
5. For each of the extended reads, the count in each bin that it overlaps is incremented.
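
Steps 4 and 5 for single-end reads can be sketched as below. This is a minimal illustration, not the deepTools implementation: ``bin_counts`` and the example reads are hypothetical, and the gathering of nearby alignments from step 3 is assumed to have happened already.

```python
def bin_counts(reads, region_start, region_end, bin_size, extension):
    """Count extended reads per bin within [region_start, region_end).

    reads -- iterable of (start, end, is_reverse) alignments
    """
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start, end, is_reverse in reads:
        # Extend to length `extension` in the direction of the read.
        if is_reverse:
            start = max(0, end - extension)
        else:
            end = start + extension
        # Clip to the region, then increment every overlapped bin.
        lo = max(start, region_start)
        hi = min(end, region_end)
        if lo >= hi:
            continue
        first = (lo - region_start) // bin_size
        last = (hi - 1 - region_start) // bin_size
        for i in range(first, last + 1):
            counts[i] += 1
    return counts


# One forward and one reverse read, extended to 200 bases, in 100-base bins.
reads = [(0, 50, False), (260, 310, True)]
counts = bin_counts(reads, 0, 400, 100, 200)
print(counts)
```

The forward read covers bases 0 to 199 after extension and the reverse read covers bases 110 to 309, so the second bin is counted twice.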
26+
27+
Blacklisted regions
28+
-------------------
29+
30+
The question likely arises as to how alignments originating inside of blacklisted regions are handled. In short, any alignment contained completely within a blacklisted region is ignored, regardless of whether it would extend into a non-blacklisted region or not. Alignments only partially overlapping blacklisted regions are treated as normal, as are pairs of reads that span over a blacklisted region. This is primarily for the sake of performance, as otherwise each extended read would need to be checked to see if it overlaps a blacklisted region.
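
The rule described above amounts to a containment test on the original, unextended alignment. A minimal sketch, with the hypothetical helper ``is_ignored`` and half-open intervals assumed:

```python
def is_ignored(read_start, read_end, blacklist):
    """True if the alignment lies completely inside a blacklisted interval.

    blacklist -- iterable of (start, end) half-open intervals
    """
    return any(
        b_start <= read_start and read_end <= b_end
        for b_start, b_end in blacklist
    )


blacklist = [(1000, 2000)]
print(is_ignored(1200, 1250, blacklist))  # fully contained
print(is_ignored(1980, 2030, blacklist))  # partial overlap
print(is_ignored(900, 950, blacklist))    # outside
```

Because the test is applied before extension, a read at 1200-1250 is dropped even if its extended fragment would reach past position 2000.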

docs/content/feature/read_offsets.rst

+8
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
Offsetting signal to a given position
2+
=====================================
3+
4+
A growing number of experiment types need to be analyzed by focusing the signal from each alignment at a single point. As an example, RiboSeq alignments tend to be offset such that the signal pause is centered around the translation start site, an offset of around 12. Alternatively, in GROseq experiments, the pause around the TSS becomes centered by using the 1st base of each read. This can be accomplished within `bamCoverage` using the `--Offset` option. A visual example is below:
5+
6+
.. image:: ../../images/feature-offset0.png
7+
8+
The alignments shown above overlap a transcript, denoted as a blue box, which in this case represents only the coding sequence. If the alignments are from a RiboSeq experiment then the signal from each alignment should be set at the ~12th base of each alignment. The section on the right denotes the resulting signal intensity, with the expected large peak at the translation start site.
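
Reducing each alignment to a single position can be sketched as follows. This illustration is not taken from `bamCoverage`: ``offset_position`` is a hypothetical helper, alignments are treated as ungapped half-open intervals, and counting the offset from the 5' end of reverse-strand reads is an assumption.

```python
def offset_position(start, end, is_reverse, offset):
    """Return the 0-based genomic position of the offset-th base of a read.

    offset is 1-based and counted from the 5' end of the read, so for
    reverse-strand alignments it is counted back from the rightmost base.
    Returns None if the read is shorter than the offset.
    """
    if offset < 1 or offset > end - start:
        return None
    if is_reverse:
        return end - offset
    return start + offset - 1


print(offset_position(100, 130, False, 12))  # RiboSeq-like offset, forward
print(offset_position(100, 130, True, 12))   # same offset, reverse strand
print(offset_position(100, 130, False, 1))   # first base, as for GROseq
```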

docs/content/feature/unscaled_regions.rst

+7
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
Unscaled regions
2+
================
3+
4+
Some experiments aim to quantify the distribution of pausing of factors, such as PolII, throughout gene or transcript bodies. PolII, and many other factors, show pausing (i.e., accumulation of signal) near the start/end of transcripts. As scaling is normally performed to make all regions the same length, the breadth of the paused region could be scaled differently in each transcript. This would, in turn, cause biases during clustering or other analyses. In such cases, the `--unscaled5prime` and `--unscaled3prime` options in `computeMatrix` can be used. These exclude regions at one or both ends of transcripts (or other regions) from scaling, thereby allowing raw signal profiles to be compared across transcripts. An example of this from `Ferrari et al. 2013 <http://www.sciencedirect.com/science/article/pii/S2211124713005603>`__ is shown below:
5+
6+
.. image:: ../../images/feature-unscaled0.png
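
The effect of leaving the ends unscaled can be sketched as follows. This is a simplified illustration, not the `computeMatrix` implementation: ``scale_with_unscaled_ends`` is hypothetical, the signal is per-base and oriented 5' to 3', and the body is scaled by averaging into evenly spaced bins.

```python
def scale_with_unscaled_ends(signal, unscaled5, unscaled3, body_bins):
    """Return raw 5' values + body averaged into body_bins + raw 3' values."""
    body_end = len(signal) - unscaled3
    head = signal[:unscaled5]
    body = signal[unscaled5:body_end]
    tail = signal[body_end:]
    if len(body) < body_bins:
        raise ValueError("region body is shorter than the number of bins")
    scaled = []
    for i in range(body_bins):
        # Bin edges are spread evenly across the body.
        lo = i * len(body) // body_bins
        hi = (i + 1) * len(body) // body_bins
        scaled.append(sum(body[lo:hi]) / (hi - lo))
    return head + scaled + tail


# Two hypothetical transcripts of different length with the same
# 2-base paused region at each end.
short = [9, 9] + [1] * 4 + [3] * 4 + [8, 8]
long_ = [9, 9] + [1] * 8 + [3] * 8 + [8, 8]
print(scale_with_unscaled_ends(short, 2, 2, 2))
print(scale_with_unscaled_ends(long_, 2, 2, 2))
```

Both transcripts yield the same profile: the paused ends keep their raw per-base values, and only the bodies are compressed to a common length.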
7+

docs/content/list_of_tools.rst

+1-5
Original file line numberDiff line numberDiff line change
@@ -100,11 +100,7 @@ We offer several ways to filter those BAM files on the fly so that you don't nee
100100

101101
These parameters are optional and available throughout deepTools.
102102

103-
.. note:: In version 2.3 we introduced a sampling method to correct the effect of filtering when normalizing using
104-
``bamCoverage`` or ``bamCompare``. For previous versions, if you know that your files will be strongly affected by
105-
the filtering of duplicates or reads of low quality then consider removing
106-
those reads *before* using ``bamCoverage`` or ``bamCompare``, as the filtering
107-
by deepTools is done *after* the scaling factors are calculated!
103+
.. note:: In version 2.3 we introduced a sampling method to correct the effect of filtering when normalizing using ``bamCoverage`` or ``bamCompare``. For previous versions, if you know that your files will be strongly affected by the filtering of duplicates or reads of low quality then consider removing those reads *before* using ``bamCoverage`` or ``bamCompare``, as the filtering by deepTools is done *after* the scaling factors are calculated!
108104

109105

110106
Tools for BAM and bigWig file processing

docs/content/tools/plotHeatmap.rst

+1-2
Original file line numberDiff line numberDiff line change
@@ -118,5 +118,4 @@ we combine different colormap colors, different scales and the new `--boxAround
118118
119119
.. image:: ../../images/test_plots/ExampleHeatmap4.png
120120

121-
.. tip:: **More examples** can be found in our
122-
`Gallery <http://deeptools.readthedocs.org/en/latest/content/example_gallery.html#normalized-chip-seq-signals-and-peak-regions>`_.
121+
.. tip:: **More examples** can be found in our `Gallery <http://deeptools.readthedocs.org/en/latest/content/example_gallery.html#normalized-chip-seq-signals-and-peak-regions>`_.

docs/images/feature-blacklist0.png

4.84 KB

docs/images/feature-blacklist1.png

4.63 KB

docs/images/feature-metagene0.png

67.3 KB

docs/images/feature-metagene1.png

18.6 KB

docs/images/feature-offset0.png

1.29 KB

docs/images/feature-unscaled0.png

50.3 KB

docs/index.rst

+1
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,7 @@ Contents:
2929

3030
content/installation
3131
content/list_of_tools
32+
content/advanced_features
3233
content/example_usage
3334
content/changelog
3435
content/help_galaxy_intro

docs/source/deeptools.rst

-9
Original file line numberDiff line numberDiff line change
@@ -100,15 +100,6 @@ deeptools.mapReduce module
100100
:undoc-members:
101101
:show-inheritance:
102102

103-
104-
deeptools.readBed module
105-
------------------------
106-
107-
.. automodule:: deeptools.readBed
108-
:members:
109-
:undoc-members:
110-
:show-inheritance:
111-
112103
deeptools.utilities module
113104
--------------------------
114105
