Commit
Solve StrandPhaseR docker image installation (minor)
1 parent 278f52c, commit fb04b0a
Showing 115 changed files with 16,416 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
name: Tests | ||
|
||
on: | ||
push: | ||
branches: | ||
- smk_workflow_catalog | ||
# paths: | ||
# - "github-actions-runner/Dockerfile" | ||
|
||
jobs: | ||
build_container: | ||
name: Build and push image | ||
runs-on: ubuntu-20.04 | ||
env: | ||
IMAGE_NAME: mosaicatcher-pipeline | ||
|
||
if: github.ref == 'refs/heads/master' | ||
steps: | ||
- uses: actions/checkout@v2 | ||
|
||
- name: Read upstream tag without version | ||
id: gettag | ||
run: echo "::set-output name=tag::$(head -n 1 github-actions-runner/Dockerfile | awk -F':' '{print $2}' | awk -F'-' 'BEGIN { OFS="-" } {$NF=""; print $0}')" | ||
|
||
- name: Read internal update version | ||
id: getversion | ||
run: echo "::set-output name=version::$(grep 'ARG RUNNER_VERSION' github-actions-runner/Dockerfile | awk -F'=' '{print $2}')" | ||
|
||
- name: Build Image | ||
id: build-image | ||
uses: redhat-actions/buildah-build@v2 | ||
with: | ||
image: ${{ env.IMAGE_NAME }} | ||
tags: latest dev 1.3 | ||
dockerfiles: | | ||
./github-actions-runner/Dockerfile | ||
- name: Push To DockerHub | ||
id: push-to-dockerhub | ||
uses: redhat-actions/push-to-registry@v2 | ||
with: | ||
image: ${{ steps.build-image.outputs.image }} | ||
tags: ${{ steps.build-image.outputs.tags }} | ||
registry: docker.io/weber8thomas | ||
username: ${{ secrets.DOCKER_USERNAME }} | ||
password: ${{ secrets.DOCKER_TOKEN }} | ||
|
||
- name: Use the image | ||
        run: echo "New image has been pushed to ${{ steps.push-to-dockerhub.outputs.registry-paths }}" | ||
# jobs: | ||
test_workflow: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/checkout@v1 | ||
# - name: Setup Snakemake environment | ||
# run: | | ||
# export PATH="/usr/share/miniconda/bin:$PATH" | ||
# conda config --set channel_priority strict | ||
# conda install -c conda-forge -q mamba | ||
# # ensure that mamba is happy to write into the cache | ||
# sudo chown -R runner:docker /usr/share/miniconda/pkgs/cache | ||
# # additionally add singularity | ||
# # TODO remove version constraint: needed because 3.8.7 fails with missing libz: | ||
# # bin/unsquashfs: error while loading shared libraries: libz.so.1: cannot open shared object file: No such file or directory | ||
# # mamba create -y -n mosaicatcher_env -c conda-forge -c bioconda snakemake pandas pysam tqdm imagemagick "singularity<=3.8.6" | ||
# # source activate mosaicatcher_env | ||
# # conda list | ||
# # which python | ||
# # python -c 'import pysam; print(pysam)' | ||
- name: Downloading data | ||
uses: snakemake/snakemake-github-action@v1.22.0 | ||
with: | ||
directory: .test | ||
snakefile: Snakefile | ||
stagein: "mamba env remove -n snakemake && mamba create -y -n snakemake -c conda-forge -c bioconda unzip snakemake pandas pysam tqdm imagemagick && source activate snakemake" | ||
args: "--cores 1 --config mode=download_data dl_external_files=True dl_bam_example=True input_bam_location=TEST_EXAMPLE_DATA/" | ||
- name: Test data | ||
uses: snakemake/snakemake-github-action@v1.22.0 | ||
with: | ||
directory: .test | ||
snakefile: Snakefile | ||
stagein: 'mamba env remove -n snakemake && mamba create -y -n snakemake -c conda-forge -c bioconda snakemake pandas pysam tqdm imagemagick "singularity<=3.8.6" && source activate snakemake && ls -lh' | ||
args: "--cores 1 --config plot=True input_bam_location=TEST_EXAMPLE_DATA/ output_location=TEST_OUTPUT/ --use-conda --use-singularity" | ||
|
||
formatting: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/checkout@v1 | ||
- name: Formatting | ||
uses: github/super-linter@v3.16.1 | ||
env: | ||
VALIDATE_ALL_CODEBASE: false | ||
DEFAULT_BRANCH: smk_workflow_catalog | ||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
VALIDATE_SNAKEMAKE_SNAKEFMT: true | ||
|
||
linting: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/checkout@v1 | ||
- name: Downloading data | ||
uses: snakemake/snakemake-github-action@v1.22.0 | ||
with: | ||
directory: .test | ||
snakefile: Snakefile | ||
stagein: "mamba env remove -n snakemake && mamba create -y -n snakemake -c conda-forge -c bioconda unzip snakemake pandas pysam tqdm imagemagick && source activate snakemake && ls -l && pwd" | ||
args: "--cores 1 --config mode=download_data dl_external_files=True dl_bam_example=True input_bam_location=TEST_EXAMPLE_DATA/ --touch" | ||
- name: Linting | ||
uses: snakemake/snakemake-github-action@v1.22.0 | ||
with: | ||
directory: ".test" | ||
snakefile: Snakefile | ||
stagein: "mamba env remove -n snakemake && mamba create -y -n snakemake -c conda-forge -c bioconda unzip snakemake pandas pysam tqdm imagemagick && source activate snakemake && ls -l && pwd" | ||
args: "--lint" |
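The two `run` steps of the `build_container` job that parse `github-actions-runner/Dockerfile` rely on chained `awk` calls, which are easy to misread. The following is a minimal Python sketch of the same string handling, for illustration only. The function names and the example Dockerfile lines are hypothetical and not part of the repository. Note that blanking the last `-`-separated field keeps the trailing separator.

```python
def upstream_tag_without_version(first_line: str) -> str:
    """Mirror: awk -F':' '{print $2}' | awk -F'-' 'BEGIN { OFS="-" } {$NF=""; print $0}'."""
    fields = first_line.rstrip("\n").split(":")
    # awk prints an empty string when the second field does not exist
    tag = fields[1] if len(fields) > 1 else ""
    parts = tag.split("-")
    # $NF="" blanks the last field; rejoining with OFS keeps a trailing "-"
    parts[-1] = ""
    return "-".join(parts)


def runner_version(dockerfile_text: str) -> str:
    """Mirror: grep 'ARG RUNNER_VERSION' | awk -F'=' '{print $2}'."""
    values = []
    for line in dockerfile_text.splitlines():
        if "ARG RUNNER_VERSION" in line:
            fields = line.split("=")
            values.append(fields[1] if len(fields) > 1 else "")
    return "\n".join(values)


# Hypothetical Dockerfile content, used only to exercise the helpers
example = "FROM example/runner:ubuntu-20.04-2.285\nARG RUNNER_VERSION=2.285.1\n"
print(upstream_tag_without_version(example.splitlines()[0]))  # -> ubuntu-20.04-
print(runner_version(example))  # -> 2.285.1
```

If the trailing `-` is not wanted in the tag, it would have to be stripped in a further step. The workflow as committed does not do this.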
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
# Hidden folders & files | ||
.DS_Store | ||
.vscode/ | ||
.snakemake/ | ||
.panoptes.db | ||
._.DS_Store | ||
.pytest_cache/ | ||
.condarc | ||
files.txt | ||
workflow/.conda/ | ||
|
||
# Tmp files & execution outputs | ||
*.pyc | ||
*.zip | ||
*.gz | ||
*.db | ||
*.png | ||
*.svg | ||
*.tsv | ||
*.csv | ||
|
||
# Links | ||
*@ | ||
|
||
# Mosaicatcher folders | ||
chroms/ | ||
counts/ | ||
log/ | ||
plots/ | ||
segmentation/ | ||
segmentation2/ | ||
snv_calls/ | ||
strand_states/ | ||
sv_probabilities/ | ||
workflow/config/config_df.tsv | ||
workflow/config/exclude_file.txt | ||
workflow/config/exclude_file | ||
|
||
# Docs | ||
docs/build/ | ||
build/ | ||
*.html | ||
workflow/static/ | ||
|
||
# Python | ||
__pycache__ | ||
workflow/scripts/__pycache__ | ||
|
||
# Zenodo | ||
workflow/sandbox.zenodo.org/ | ||
sandbox.zenodo.org/ | ||
TEST_OUTPUT/ | ||
TEST_OUTPUT | ||
workflow/test.txt | ||
.snakemake | ||
|
||
# Exceptions | ||
!docs/images/*.png | ||
!workflow/data/segdups/segDups_hg38_UCSCtrack.bed.gz | ||
!workflow/data/bin_200kb_all.bed | ||
!config/config.yaml | ||
!config/* | ||
|
||
# Dev | ||
discover_big_files_git.sh | ||
builds/ | ||
workflow/report_TALL/ | ||
*.bam | ||
*.bai | ||
workflow/TEST_EXAMPLE_DATA/ | ||
TEST_EXAMPLE_DATA | ||
TEST_EXAMPLE_DATA/ | ||
workflow/logs/ | ||
workflow/errors/ | ||
|
||
# git | ||
|
||
## Others | ||
# afac/ | ||
# workflow/.snakemake | ||
# bam/ | ||
|
||
## Personal note: files/folders specific to dev branch | ||
# .gitlab-ci.yml // to use with LFS example data in dev branch | ||
# singularity/ folder | ||
# afac/ debugging & dev folder |
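The ignore file above combines broad patterns such as `*.png` and `*.gz` with later `!` exceptions. A simplified sketch of the "last matching pattern wins" rule may help when reasoning about which files end up tracked. It is only an approximation: it uses Python's `fnmatch`, does not handle directory patterns ending in `/`, and does not reproduce every rule git applies. The helper name `is_ignored` and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list) -> bool:
    """Approximate gitignore evaluation: the last pattern that matches decides."""
    ignored = False
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        # Patterns without a slash are compared to the file name only,
        # patterns with a slash to the whole relative path (simplification).
        target = path if "/" in pattern else path.rsplit("/", 1)[-1]
        if fnmatchcase(target, pattern):
            ignored = not negated
    return ignored


# A few patterns copied from the file above
patterns = [
    "*.png",
    "*.gz",
    "!docs/images/*.png",
    "!workflow/data/segdups/segDups_hg38_UCSCtrack.bed.gz",
]
print(is_ignored("plots/cell1.png", patterns))       # hypothetical path
print(is_ignored("docs/images/logo.png", patterns))  # hypothetical path
```

For an authoritative answer on a real checkout, `git check-ignore -v <path>` reports the pattern that decided the outcome.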
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# configuration of display in snakemake workflow catalog: https://snakemake.github.io/snakemake-workflow-catalog | ||
|
||
usage: | ||
mandatory-flags: # optional definition of additional flags | ||
desc: # describe your flags here in a few sentences (they will be inserted below the example commands) | ||
flags: | ||
- "snakemake" | ||
- "mosaicatcher" | ||
- "single-cell-genomics" | ||
- "strand-seq" | ||
- "structural-variants" | ||
- "sv-calling" | ||
# put your flags here | ||
software-stack-deployment: # definition of software deployment method (at least one of conda, singularity, or singularity+conda) | ||
conda: false # whether pipeline works with --use-conda | ||
singularity: false # whether pipeline works with --use-singularity | ||
singularity+conda: true # whether pipeline works with --use-singularity --use-conda | ||
report: true # add this to confirm that the workflow allows to use 'snakemake --report report.zip' to generate a report containing all results and explanations |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
## 1.3 (2022-06-02) | ||
|
||
* Check if SM tag are corresponding to folder name [View](https://git.embl.de/tweber/mosaicatcher-update/-/commit/a4611b70a03675ee5db7816728b28eb9a9875e5c) | ||
|
||
|
||
|
||
## 1.2.3 (2022-05-18) | ||
|
||
* Correct issue [View](https://git.embl.de/tweber/mosaicatcher-update/-/commit/932d2529815cc31a57f60ca860fadf65212738f4) | ||
* Small correction [View](https://git.embl.de/tweber/mosaicatcher-update/-/commit/5ed61ec9d20692d4e14394baea5636a21ae9dfc1) | ||
* Correct SMK download BAM example files [View](https://git.embl.de/tweber/mosaicatcher-update/-/commit/d84904a5c1ec9f4901f7dc69d7b879692c1266c6) | ||
* Update README.md [View](https://git.embl.de/tweber/mosaicatcher-update/-/commit/881c6612b31e74efbb854b85d4e5328e300e7c2e) | ||
|
||
|
||
## 1.2.2 (2022-05-18) | ||
|
||
|
||
* Handle of multi samples in the same folder now Change the way to retrieve the selected cell list [View](https://git.embl.de/tweber/mosaicatcher-update/-/commit/3f0dc28ec22d88def269c215ef551800b8b1f7e5) | ||
|
||
|
||
## 1.2.1 (2022-05-17) | ||
|
||
* Removing files .gitattributes .gitlab-ci.yml [View](https://git.embl.de/tweber/mosaicatcher-update/-/commit/b6a46ff3d7dd8978743be9c4ee801535aac03eab) | ||
* Download example & external data Implemented rules based on snakemake.remote.HTTP function that can be called through config.yaml / CLI arguments Update config.yaml file Update rules/examples.smk Update Snakefile Update README.md [View](https://git.embl.de/tweber/mosaicatcher-update/-/commit/a835f79928bf6ec5c5b93678bd89bc54c59e3206) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2022 Thomas Weber (thomas.weber@embl.de) | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
[Image: MosaiCatcher] | ||
|
||
|
||
A [Snakemake](https://github.com/snakemake/snakemake) pipeline for structural variant calling from single-cell Strand-seq data. | ||
|
||
|
||
# Overview of this workflow | ||
|
||
This workflow uses [Snakemake](https://github.com/snakemake/snakemake) to | ||
execute all steps of MosaiCatcher in order. The starting point is a set of single-cell | ||
BAM files from Strand-seq experiments, and the final output is a set of SV predictions in | ||
tabular format as well as a graphical representation. To get to this point, | ||
the workflow goes through the following steps: | ||
|
||
1. Binning of sequencing reads in genomic windows of 100kb via [mosaic](https://github.com/friendsofstrandseq/mosaicatcher) | ||
2. Strand state detection | ||
3. [Optional] Normalization of coverage with respect to a reference sample | ||
4. Multi-variate segmentation of cells ([mosaic](https://github.com/friendsofstrandseq/mosaicatcher)) | ||
5. Haplotype resolution via [StrandPhaseR](https://github.com/daewoooo/StrandPhaseR) | ||
6. Bayesian classification of segmentation to find SVs using MosaiClassifier | ||
7. Visualization of results using custom R plots | ||
|
||
|
||
# Quick Start | ||
|
||
1. Install [Singularity](https://www.sylabs.io/guides/3.0/user-guide/) | ||
2. Set strict channel priority to prevent conda channel errors | ||
``` | ||
conda config --set channel_priority strict | ||
``` | ||
3. Create a dedicated conda environment | ||
``` | ||
conda create -n mosaicatcher_env -c conda-forge -c bioconda snakemake pandas pysam imagemagick tqdm && conda activate mosaicatcher_env | ||
``` | ||
4. Clone the repository | ||
``` | ||
git clone https://github.com/friendsofstrandseq/mosaicatcher-pipeline.git && cd mosaicatcher-pipeline | ||
``` | ||
5. Download test and reference data | ||
``` | ||
snakemake -c1 --config mode=download_data dl_external_files=True dl_bam_example=True input_bam_location=TEST_EXAMPLE_DATA/ | ||
``` | ||
6. Run the pipeline on the example data, restricted to one small chromosome (replace `/<disk>` with your disk mount point, for example `/g` or `/scratch` at EMBL) | ||
``` | ||
snakemake --cores 12 --config mode=mosaiclassifier plot=True input_bam_location=TEST_EXAMPLE_DATA/ output_location=TEST_OUTPUT/ chromosomes="[chr21]" --use-conda --use-singularity --singularity-args "-B /<disk>:/<disk>" --latency-wait 60 | ||
``` | ||
|
||
7. Generate report on example data | ||
``` | ||
snakemake --cores 12 --config mode=mosaiclassifier plot=True input_bam_location=TEST_EXAMPLE_DATA/ output_location=TEST_OUTPUT/ chromosomes="[chr21]" --use-conda --use-singularity --singularity-args "-B /<disk>:/<disk>" --latency-wait 60 --report <REPORT.zip> | ||
``` | ||
|
||
|
||
8. Start running your own analysis | ||
``` | ||
snakemake --cores 12 --config mode=mosaiclassifier plot=True input_bam_location=<INPUT_DATA_FOLDER> output_location=<OUTPUT_DATA_FOLDER> --use-conda --use-singularity --singularity-args "-B /<disk>:/<disk>" --latency-wait 60 | ||
``` | ||
9. Generate report | ||
``` | ||
snakemake --cores 12 --config mode=mosaiclassifier plot=True input_bam_location=<INPUT_DATA_FOLDER> output_location=<OUTPUT_DATA_FOLDER> --use-conda --use-singularity --singularity-args "-B /<disk>:/<disk>" --latency-wait 60 --report <REPORT.zip> | ||
``` | ||
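Steps 6 to 9 repeat the same long command with small variations (chromosome restriction, bind path, report file). A minimal sketch of how such a command line could be assembled programmatically follows. The helper `build_snakemake_command` and its parameter names are hypothetical, while the flags and config keys are copied from the commands above.

```python
import shlex


def build_snakemake_command(input_dir, output_dir, cores=12,
                            chromosomes=None, bind_path=None, report=None):
    """Assemble the argument list used in the Quick Start commands."""
    config = [
        "mode=mosaiclassifier",
        "plot=True",
        f"input_bam_location={input_dir}",
        f"output_location={output_dir}",
    ]
    if chromosomes:
        # Same form as chromosomes="[chr21]" once the shell has removed the quotes
        config.append("chromosomes=[" + ",".join(chromosomes) + "]")
    cmd = ["snakemake", "--cores", str(cores), "--config", *config,
           "--use-conda", "--use-singularity"]
    if bind_path:
        # Bind the disk into the container, as in -B /<disk>:/<disk>
        cmd += ["--singularity-args", f"-B {bind_path}:{bind_path}"]
    cmd += ["--latency-wait", "60"]
    if report:
        cmd += ["--report", report]
    return cmd


# Example run on the test data; "/scratch" is an assumed mount point
cmd = build_snakemake_command("TEST_EXAMPLE_DATA/", "TEST_OUTPUT/",
                              chromosomes=["chr21"], bind_path="/scratch")
print(shlex.join(cmd))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting. `shlex.join` (Python 3.8+) is used only to display a copy-pastable command.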
|
||
|
||
|
||
|
||
# Documentation | ||
|
||
* [Usage](docs/Usage.md) | ||
* [Parameters & input](docs/Parameters.md) | ||
* [Output](docs/Output.md) (#TODO) | ||
|
||
|
||
|
||
# 📆 Roadmap | ||
|
||
## Technical-related features | ||
|
||
- [x] Zenodo automatic download of external files + indexes ([1.2.1](https://github.com/friendsofstrandseq/mosaicatcher-pipeline/releases/tag/1.2.1)) | ||
- [x] Multiple samples in the parent folder ([1.2.2](https://github.com/friendsofstrandseq/mosaicatcher-pipeline/releases/tag/1.2.2)) | ||
- [x] Automatic testing of BAM SM tag compared to sample folder name ([1.2.3](https://github.com/friendsofstrandseq/mosaicatcher-pipeline/releases/tag/1.2.3)) | ||
- [x] On-error/success e-mail ([1.3](https://github.com/friendsofstrandseq/mosaicatcher-pipeline/releases/tag/1.3)) | ||
- [x] HPC execution (slurm profile for the moment) ([1.3](https://github.com/friendsofstrandseq/mosaicatcher-pipeline/releases/tag/1.3)) | ||
- [ ] Plotting options (enable/disable segmentation back colors) | ||
- [ ] Full singularity image with preinstalled conda envs | ||
- [ ] Portable Encapsulated Project compliant | ||
- [ ] Single BAM folder with side config file | ||
## Bioinformatic-related features | ||
|
||
- [ ] Change of reference genome (currently only GRCh38) | ||
- [ ] Upstream QC pipeline and FASTQ handling | ||
- [ ] Pooling samples | ||
- [ ] Self-handling of low-coverage cells | ||
|
||
## Small issues to fix | ||
|
||
- [ ] Move pysam / SM tag comparison script to snakemake rule | ||
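The SM tag comparison mentioned in the roadmap can be sketched as a small pure-Python check. This is not the repository's script, only an illustration under stated assumptions: the header is a dictionary shaped like the one pysam returns from `AlignmentFile.header.to_dict()`, and the function names and the sample name `SAMPLE_A` are hypothetical.

```python
from pathlib import Path


def sm_tags(header: dict) -> set:
    """Collect the SM values of all @RG lines in a BAM header dictionary."""
    return {rg["SM"] for rg in header.get("RG", []) if "SM" in rg}


def sm_tag_matches_folder(header: dict, sample_dir: str) -> bool:
    """True when the header carries exactly one SM tag, equal to the folder name."""
    return sm_tags(header) == {Path(sample_dir).name}


# With pysam installed, the header could be obtained along these lines:
#   header = pysam.AlignmentFile("cell.bam", "rb").header.to_dict()
header = {"RG": [{"ID": "cell1", "SM": "SAMPLE_A"}]}  # hypothetical header
print(sm_tag_matches_folder(header, "TEST_EXAMPLE_DATA/SAMPLE_A/"))
print(sm_tag_matches_folder(header, "TEST_EXAMPLE_DATA/SAMPLE_B/"))
```

Keeping the comparison free of file access makes it straightforward to call from a Snakemake rule or to unit-test on its own.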
|
||
|
||
# 🛑 Troubleshooting & Current limitations | ||
|
||
- Do not change the structure of your input folder after running the pipeline: the first execution builds a config dataframe file (`OUTPUT_DIRECTORY/config/config.tsv`) that contains the list of cells and their associated paths | ||
- Do not change the list of chromosomes after a first execution (e.g. a first execution in `count` mode on `chr21` followed by a second execution in `segmentation` mode on all chromosomes) | ||
- ~~Pipeline is unstable on **male** samples (LCL sample for example) for the moment due to the impossibility to run strandphaser (only one haplotype for the X chrom)~~ This was solved thanks to the work of [Hufsah Ashraf](https://github.com/orgs/friendsofstrandseq/people/Hufsah-Ashraf) and [Wolfram Höps](https://github.com/orgs/friendsofstrandseq/people/WHops), which determines the sample sex automatically and uses a [snakemake checkpoint](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#data-dependent-conditional-execution) for data-dependent conditional execution. The initial list of chromosomes is now updated according to the sample sex, so that chrX and chrY are bypassed for male samples, as both are present as a single haplotype. | ||
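The chromosome adjustment described in the last item can be illustrated with a short sketch. In the pipeline this decision is made through a Snakemake checkpoint after the sample sex has been determined. The function below shows only the filtering step, and its name and the `"M"`/`"F"` labels are assumptions made for the example.

```python
def chromosomes_for_sex(chromosomes, sex):
    """Drop chrX and chrY for male samples, where both exist as a single haplotype."""
    if sex == "M":
        return [c for c in chromosomes if c not in {"chrX", "chrY"}]
    # Other samples keep the list unchanged in this sketch
    return list(chromosomes)


all_chroms = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
print(len(chromosomes_for_sex(all_chroms, "M")))  # autosomes only
print(len(chromosomes_for_sex(all_chroms, "F")))  # list left as provided
```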
|
||
|
||
# 📕 References | ||
|
||
|
||
> Strand-seq publication: Falconer, E., Hills, M., Naumann, U. et al. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat Methods 9, 1107–1112 (2012). https://doi.org/10.1038/nmeth.2206 | ||
> scTRIP/MosaiCatcher original publication: Sanders, A.D., Meiers, S., Ghareghani, M. et al. Single-cell analysis of structural variations and complex rearrangements with tri-channel processing. Nat Biotechnol 38, 343–354 (2020). https://doi.org/10.1038/s41587-019-0366-x | ||
|