Commit 86976be

Merge pull request #38 from cokelaer/main
add rnaseqc container and update rseqc
2 parents 2bf0d10 + f959c55 commit 86976be

10 files changed: +296 -243 lines changed

.pre-commit-config.yaml

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+
+files: '\.(py|rst|sh)$'
+fail_fast: false
+
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v3.2.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+      #- id: check-executables-have-shebangs
+      - id: check-ast
+
+  - repo: https://github.com/pycqa/flake8
+    rev: 6.1.0
+    hooks:
+      - id: flake8
+        args: ["-j8", "--ignore=E203,E501,W503,E722", "--max-line-length=120", "--exit-zero"]
+
+  - repo: https://github.com/psf/black
+    rev: 22.10.0
+    hooks:
+      - id: black
+        args: ["--line-length=120"]
+        exclude: E501
+
+  - repo: https://github.com/pycqa/isort
+    rev: 5.12.0
+    hooks:
+      - id: isort
+        args: ["--profile", "black"] # solves conflicts between black and isort
+
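
The top-level `files` key above is a regular expression that pre-commit applies to each candidate path. As a rough illustration of which files it selects, here is a minimal Python sketch. It assumes the pattern is matched with `re.search` semantics, and every path in `candidates` except `README.rst` and `rnaseq.sh` is a hypothetical example rather than a file confirmed by this commit.

```python
import re

# Pattern copied verbatim from the `files:` key of the config above.
FILES_PATTERN = re.compile(r"\.(py|rst|sh)$")


def is_checked(path: str) -> bool:
    """Return True if `path` would pass the top-level `files` filter."""
    return FILES_PATTERN.search(path) is not None


# Illustrative paths only (assumed, not taken from the repository listing).
candidates = [
    "README.rst",
    "sequana_pipelines/rnaseq/main.py",
    "rnaseq.sh",
    "sequana_pipelines/rnaseq/config.yaml",
    "pyproject.toml",
]

selected = [path for path in candidates if is_checked(path)]
print(selected)
```

Under this reading, YAML and TOML files fall outside the filter, so hooks such as `check-yaml` would only see files that also end in `.py`, `.rst` or `.sh`.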

README.rst

Lines changed: 44 additions & 41 deletions
@@ -8,7 +8,7 @@
     :alt: JOSS (journal of open source software) DOI

 .. image:: https://github.com/sequana/rnaseq/actions/workflows/main.yml/badge.svg
-    :target: https://github.com/sequana/rnaseq/actions/workflows/main.yaml
+    :target: https://github.com/sequana/rnaseq/actions/workflows/main.yaml



@@ -17,7 +17,7 @@ This is is the **RNA-seq** pipeline from the `Sequana <https://sequana.readthedo
 :Overview: RNASeq analysis from raw data to feature counts
 :Input: A set of Fastq Files and genome reference and annotation.
 :Output: MultiQC and HTML reports, BAM and bigwig files, feature Counts, script to launch differential analysis
-:Status: Production.
+:Status: Production.
 :Citation(sequana): Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352
 :Citation(pipeline):
     .. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.4047837.svg
@@ -40,13 +40,13 @@ Usage
     sequana_rnaseq --help
     sequana_rnaseq --input-directory DATAPATH --genome-directory genome --aligner star

-This creates a directory with the pipeline and configuration file. You will then need
+This creates a directory with the pipeline and configuration file. You will then need
 to execute the pipeline::

     cd rnaseq
     sh rnaseq.sh  # for a local run

-This launch a snakemake pipeline. If you are familiar with snakemake, you can
+This launch a snakemake pipeline. If you are familiar with snakemake, you can
 retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters::

     snakemake -s rnaseq.rules -c config.yaml --cores 4 --stats stats.txt
@@ -80,7 +80,7 @@ Or use the conda.yaml file available in this repository. If you start a new
 environment from scratch, those commands will create the environment and install
 all dependencies for you::

-    conda create --name sequana_env python 3.7.3
+    conda create --name sequana_env python 3.7.3
     conda activate sequana_env
     conda install -c anaconda qt pyqt>5
     pip install sequana
@@ -100,22 +100,22 @@ To use apptainer, initialise the pipeline with the --use-singularity option and
 Details
 ~~~~~~~~~

-This pipeline runs a **RNA-seq** analysis of sequencing data. It runs in
-parallel on a set of input FastQ files (paired or not).
+This pipeline runs a **RNA-seq** analysis of sequencing data. It runs in
+parallel on a set of input FastQ files (paired or not).
 A brief HTML report is produced together with a MultiQC report.

 This pipeline is complex and requires some expertise for the interpretation.
-Many online-resources are available and should help you deciphering the output.
+Many online-resources are available and should help you deciphering the output.

 Yet, it should be quite straigtforward to execute it as shown above. The
-pipeline uses bowtie1 to look for ribosomal contamination (rRNA). Then,
+pipeline uses bowtie1 to look for ribosomal contamination (rRNA). Then,
 it cleans the data with cutapdat if you say so (your data may already be
-pre-processed). If no adapters are provided (default), reads are
-trimmed for low quality bases only. Then, mapping is performed with standard mappers such as
+pre-processed). If no adapters are provided (default), reads are
+trimmed for low quality bases only. Then, mapping is performed with standard mappers such as
 star or bowtie2 (--aligner option). Finally,
 feature counts are extracted from the previously generated BAM files. We guess
 the strand and save the feature counts into the directoy
-./rnadiff/feature_counts.
+./rnadiff/feature_counts.

 The pipelines stops there. However, RNA-seq analysis are followed by a different
 analysis (DGE hereafter). Although the DGE is not part of the pipeline, you can
@@ -138,7 +138,7 @@ Rules and configuration details
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Here is the `latest documented configuration file <https://raw.githubusercontent.com/sequana/sequana_rnaseq/main/sequana_pipelines/rnaseq/config.yaml>`_
-to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.
+to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.


 .. warning:: the RNAseQC rule is switch off and is not currently functional in
@@ -158,6 +158,9 @@ Changelog
 ========= ====================================================================
 Version   Description
 ========= ====================================================================
+0.19.1    * add rnaseqc container.
+          * Update rseqc rules (redirection)
+          * cleanup onsuccess rule
 0.19.0    * Refactorisation to use click
 0.18.1    * fastp multiqc regression. Fixed missing sample names by updating
             multiqc_config and adding sample names in the output filename
@@ -166,28 +169,28 @@ Version Description
           * BUG: Fix missing params (options) in star_mapping rule not taken
             into account
 0.17.1    * use new rulegraph / graphviz apptainer
-0.17.0    * fastp step changed to use sequana-wrappers. Slight change in
+0.17.0    * fastp step changed to use sequana-wrappers. Slight change in
             config file. The reverse and forward adapter options called
             rev and fwd have been dropped in favor of a single adapters option.
-            v0.17.0 config and schema are not compatible with previous
+            v0.17.0 config and schema are not compatible with previous
             versions.
           * Update singularity containers and add new one for fastp
-0.16.1    * fix bug in feature counts automatic strand balance detection. Was
+0.16.1    * fix bug in feature counts automatic strand balance detection. Was
             always using the stranded case (2).
           * add singularity workflow for testing
           * fix documentation in config.yaml
-0.16.0    * star, salmon, bam_coverage are now in sequana wrappers, updated
+0.16.0    * star, salmon, bam_coverage are now in sequana wrappers, updated
             the pipeline accordingly
-          * updated config file and schema to include resources inside the
+          * updated config file and schema to include resources inside the
             config file (so as to use new --profile option)
           * set singularity images in all rules
-          * star wrappers has changed significantly to use star
+          * star wrappers has changed significantly to use star
             recommandation. To keep using previous way, a legacy option
             is available and set to True in this version.
           * bamCoverage renamed in bam_coverage in the config file
           * multiqc_config removed redundant information and ordered
             the output in a coherent way (QC and then analysis)
-0.15.2    * Fix bowtie2 rule to use new wrappers. Use wrappers in
+0.15.2    * Fix bowtie2 rule to use new wrappers. Use wrappers in
             add_read_group and mark_duplicates
 0.15.1    * Adapt to new bowtie2 align wrapper
 0.15.0    * fix typo reported in https://github.com/sequana/rnaseq/issues/12
@@ -199,7 +202,7 @@ Version Description
             same genome directory.
           * Ribosomal is now estimated on the first 100,000 reads to speed up
             analysis
-          * --indexing and --force-indexing options not required anymore.
+          * --indexing and --force-indexing options not required anymore.
             Indexing will be done automatically and not redone if present.
           * Use of the new sequana-wrappers repository
 0.13.0    * Major update to use the new sequana version and the RNADiff tools.
@@ -210,7 +213,7 @@ Version Description
           * user interface has now a --skip-gff-check option. Better handling of
             input gff with more meaningful messages
           * integration of rseqc tool
-0.12.1    * indexing was always set to True in the config after 0.9.16 update.
+0.12.1    * indexing was always set to True in the config after 0.9.16 update.
 0.12.0    * BUG fix: Switch mark_duplicates correctly beore feature counts
 0.11.0    * rnadiff one factor is simplified
           * When initiating the pipeline, provide information about the GFF
@@ -226,7 +229,7 @@ Version Description
             created and used
           * fix the --do-igvtools and --do-bam-coverage with better doc
 0.10.0    * 9/12/2020
-          * Fixed bug in sequana/star_indexing for small genomes (v0.9.7).
+          * Fixed bug in sequana/star_indexing for small genomes (v0.9.7).
             Changed the rnaseq requirements to benefit from this bug-fix that
             could lead to seg fault with star aligner for small genomes.
           * Report improved with strand guess and plot
@@ -235,32 +238,32 @@ Version Description
           * In config file, bowtie section 'do' option is removed. This is now
             set automatically if rRNA_feature or rRNA_file is provided. This
             allows us to skip the rRNA mapping entirely if needed.
-          * fastq_screen should be functional. Default behaviour is off. If
+          * fastq_screen should be functional. Default behaviour is off. If
             set only phiX174 will be search for. Users should build their own
             configuration file.
-          * star/bowtie1/bowtie2 have now their own sub-directories in the
-            genome directory.
+          * star/bowtie1/bowtie2 have now their own sub-directories in the
+            genome directory.
           * added --run option to start pipeline automatically (if you know
             what you are doing)
           * rnadiff option has now a default value (one_factor)
           * add strandness plot in the HTML summary page
-0.9.19    * Remove the try/except around tolerance (guess of strandness) to
+0.9.19    * Remove the try/except around tolerance (guess of strandness) to
             make sure this is provided by the user. Final onsuccess benefits
             from faster GFF function (sequana 0.9.4)
-0.9.18    * Fix typo (regression bug) + add tolerance in schema + generic
+0.9.18    * Fix typo (regression bug) + add tolerance in schema + generic
             title in multiqc_config. (oct 2020)
 0.9.17    * add the *tolerance* parameter in the feature_counts rule as a user
-            parameter (config and pipeline).
-0.9.16    * Best feature_counts is now saved into rnadiff/feature_counts
+            parameter (config and pipeline).
+0.9.16    * Best feature_counts is now saved into rnadiff/feature_counts
             directory and rnadiff scripts have been updated accordingly
           * the most probable feature count option is now computed more
             effectivily and incorporated inside the Snakemake pipeline (not in
-            the onsuccess) so that multiqc picks the best one (not the 3
+            the onsuccess) so that multiqc picks the best one (not the 3
             results)
           * the target.txt file can be generated inside the pipeline if user
             fill the rnadiff/conditions section in the config file
           * indexing options are filled automatically when calling
-            sequana_rnaseq based on the presence/absence of the index
+            sequana_rnaseq based on the presence/absence of the index
             of the aligner being used.
           * salmon now integrated and feature counts created (still WIP in
             sequana)
@@ -283,13 +286,13 @@ Version Description
             analysis
 0.9.11    * Automatic guessing of the strandness of the experiment
 0.9.10    * Fix multiqc for RNAseQC rule
-0.9.9     * Fix RNAseQC rule, which is now available.
+0.9.9     * Fix RNAseQC rule, which is now available.
           * Fix ability to use existing rRNA file as input
 0.9.8     * Fix indexing for bowtie1 to not be done if aligner is different
           * add new options: --feature-counts-options and --do-rnaseq-qc,
             --rRNA-feature
           * Based on the input GFF, we now check the validity of the rRNA
-            feature and feature counts options to check whether the feature
+            feature and feature counts options to check whether the feature
             exists in the GFF
           * schema is now used to check the config file values
           * add a data test for testing and documentation
@@ -298,25 +301,25 @@ Version Description
           * Possiblity to switch off cutadapt section
           * Fixing bowtie2 rule in sequana and update the pipeline accordingly
           * Include a schema file
-          * output-directory parameter renamed into output_directory (multiqc
+          * output-directory parameter renamed into output_directory (multiqc
             section)
           * handle stdout correctly in fastqc, bowtie1, bowtie2 rules
 0.9.5     * Fixed https://github.com/sequana/sequana/issues/571
           * More cutadapt commands and sanity checks
           * Fixed bowtie2 options import in rnaseq.rules
-0.9.4
-0.9.3     if a fastq_screen.conf is provided, we switch the fastqc_screen
+0.9.4
+0.9.3     if a fastq_screen.conf is provided, we switch the fastqc_screen
           section ON automatically
 0.9.0     **Major refactorisation.**

-          * remove sartools, kraken rules.
+          * remove sartools, kraken rules.
           * Indexing is now optional and can be set in the configuration.
           * Configuration file is simplified with a general section to enter
-            the genome location and aligner.
+            the genome location and aligner.
           * Fixed rules in sequana (0.8.0) that were not up-to-date with
             several executables used in the pipeline including picard,
             fastq_screen, etc. See Sequana Changelog for details with respect
-            to rules changes.
-          * Copying the feature counts in main directory ready to use for
+            to rules changes.
+          * Copying the feature counts in main directory ready to use for
             a differential analysis.
 ========= ====================================================================

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

 [tool.poetry]
 name = "sequana-rnaseq"
-version = "0.19.0"
+version = "0.19.1"
 description = "A RNAseq pipeline from raw reads to feature counts"
 authors = ["Sequana Team"]
 license = "BSD-3"
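
This commit bumps the version in `pyproject.toml` and adds a matching `0.19.1` row at the top of the README changelog. A small consistency check along the following lines could confirm the two stay in step. This is only a sketch: the two snippets are abbreviated copies of the hunks shown in this commit, and the helper names `pyproject_version` and `latest_changelog_version` are hypothetical, not part of the repository.

```python
import re

# Abbreviated excerpts of the two files as they appear after this commit.
PYPROJECT_SNIPPET = """
[tool.poetry]
name = "sequana-rnaseq"
version = "0.19.1"
"""

CHANGELOG_SNIPPET = """
========= ==========================================
Version   Description
========= ==========================================
0.19.1    * add rnaseqc container.
          * Update rseqc rules (redirection)
          * cleanup onsuccess rule
0.19.0    * Refactorisation to use click
========= ==========================================
"""


def pyproject_version(text):
    """Return the first `version = "..."` value found, or None."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', text, flags=re.MULTILINE)
    return match.group(1) if match else None


def latest_changelog_version(text):
    """Return the first X.Y.Z version that starts a changelog row, or None."""
    for line in text.splitlines():
        match = re.match(r"(\d+\.\d+\.\d+)\s", line)
        if match:
            return match.group(1)
    return None


declared = pyproject_version(PYPROJECT_SNIPPET)
documented = latest_changelog_version(CHANGELOG_SNIPPET)
print(declared, documented, declared == documented)
```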

sequana_pipelines/rnaseq/cluster_config.json

Lines changed: 0 additions & 34 deletions
This file was deleted.

sequana_pipelines/rnaseq/config.yaml

Lines changed: 3 additions & 1 deletion
@@ -26,7 +26,7 @@ apptainers:
     igvtools: "https://zenodo.org/record/7022635/files/igvtools_2.12.0.img"
     graphviz: "https://zenodo.org/record/7928262/files/graphviz_7.0.5.img"
     multiqc: "https://zenodo.org/record/10205070/files/multiqc_1.16.0.img"
-
+    rnaseqc: "https://zenodo.org/record/5799564/files/rnaseqc_2.35.0.img"

 # =========================================== Sections for the users

@@ -370,6 +370,8 @@ rnaseqc:
     do: false
     gtf_file:
     options: --coverage
+    resources:
+        mem: 8G


 # if be_file not provided, try to create one on the fly
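
The new `resources` entry stores the memory request as a string (`8G`). How the pipeline consumes this value is not shown in this commit. The sketch below only illustrates one way such a string could be turned into a number of megabytes: the helper `mem_to_mb` is hypothetical, binary units (1G = 1024M) are an assumption, and the `rnaseqc_section` dictionary is a hand-written stand-in for the parsed YAML.

```python
import re

# Multipliers to megabytes, assuming binary units (an assumption of this sketch).
_UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def mem_to_mb(value: str) -> int:
    """Convert a string such as '8G' or '512M' to whole megabytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT])B?\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised memory value: {value!r}")
    number = float(match.group(1))
    unit = match.group(2).upper()
    return int(number * _UNIT_TO_MB[unit])


# Stand-in for the `rnaseqc` section of config.yaml after this commit.
rnaseqc_section = {
    "do": False,
    "gtf_file": None,
    "options": "--coverage",
    "resources": {"mem": "8G"},
}

mem_mb = mem_to_mb(rnaseqc_section["resources"]["mem"])
print(mem_mb)  # 8192
```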
