From a68b1275cadecd61323b51a93123e8b777e037c4 Mon Sep 17 00:00:00 2001
From: Dongqing Sun
Date: Mon, 13 Apr 2020 01:47:13 +0800
Subject: [PATCH 1/4] Update ATAC and RNA tutorial

---
 .../ATAC_infrastructure_10x.md | 24 ++++++-------
 .../RNA_infrastructure_10x.md  | 34 +++++++++----------
 2 files changed, 27 insertions(+), 31 deletions(-)

diff --git a/example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md b/example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md
index 7d499ef..1724306 100644
--- a/example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md
+++ b/example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md
@@ -20,22 +20,21 @@ $ conda activate MAESTRO
 Initialize the MAESTRO scATAC-seq workflow using `MAESTRO scATAC-init` command. This will install a Snakefile and a config file in this directory.
 ```bash
 $ MAESTRO scatac-init --platform 10x-genomics --species GRCh38 \
---fastq-dir /home1/wangchenfei/Project/SingleCell/scATAC/Analysis/MAESTRO_tutorial/Data/atac_v1_pbmc_10k_fastqs --fastq-prefix atac_v1_pbmc_10k \
+--fastq-dir Data/atac_v1_pbmc_10k_fastqs --fastq-prefix atac_v1_pbmc_10k \
 --cores 8 --directory 10X_PBMC_10k_MAESTRO_V110 --outprefix 10X_PBMC_10k \
 --peak-cutoff 100 --count-cutoff 1000 --frip-cutoff 0.2 --cell-cutoff 50 \
---giggleannotation /home1/wangchenfei/annotations/MAESTRO/giggle \
---fasta /home1/wangchenfei/annotations/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0/GRCh38_genome.fa \
---whitelist /home1/wangchenfei/Tool/cellranger-atac-1.1.0/cellranger-atac-cs/1.1.0/lib/python/barcodes/737K-cratac-v1.txt
+--giggleannotation annotations/MAESTRO/giggle \
+--fasta annotations/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0/GRCh38_genome.fa \
+--whitelist Data/barcodes/737K-cratac-v1.txt
 ```
 
 To get a full description of command-line options, please use the command `MAESTRO scatac-init -h`.
 ```bash
 usage: MAESTRO scatac-init [-h]
                            [--platform {10x-genomics,sci-ATAC-seq,microfluidic}]
-                           [--fastq-dir FASTQ_DIR]
-                           [--fastq-prefix FASTQ_PREFIX]
+                           --fastq-dir FASTQ_DIR [--fastq-prefix FASTQ_PREFIX]
                            [--species {GRCh38,GRCm38}] [--cores CORES]
-                           [-d DIRECTORY] [--outprefix OUTPREFIX]
+                           [--directory DIRECTORY] [--outprefix OUTPREFIX]
                            [--peak-cutoff PEAK_CUTOFF]
                            [--count-cutoff COUNT_CUTOFF]
                            [--frip-cutoff FRIP_CUTOFF]
@@ -43,8 +42,8 @@ usage: MAESTRO scatac-init [-h]
                            GIGGLEANNOTATION --fasta FASTA
                            [--whitelist WHITELIST] [--custompeak]
                            [--custompeak-file CUSTOMPEAK_FILE] [--shortpeak]
-                           [--genedistance GENEDISTANCE] [--signature]
-                           [--signature-file SIGNATURE_FILE]
+                           [--genedistance GENEDISTANCE]
+                           [--signature SIGNATURE]
 ```
 
 Here we list all the arguments and their description.
@@ -56,7 +55,7 @@ Arguments | Description
 `--platform` | {10x-genomics,Dropseq,Smartseq2} Platform of single cell RNA-seq. DEFAULT: 10x-genomics.
 `--fastq-dir` | Directory where fastq files are stored.
 `--fastq-prefix` | Sample name of fastq file (required for the platform of '10x-genomics' or 'sci-ATAC-seq'). When the platform is '10x-genomics', if there is a file named pbmc_1k_v2_S1_L001_I1_001.fastq.gz, the prefix is 'pbmc_1k_v2'. If the platform is 'sci-ATAC-seq', there are two ways to provide fastq files. The first is to provide pair-end sequencing results that contain two fastq files -- prefix_1.fastq and prefix_2.fastq. If in this way, the barcode for each read needs to be included in the reads ID (the first line of each read) in the format of '@ReadName:CellBarcode:OtherInformation'. For example, @rd.1:TCTCCCGCCGAGGCTGACTGCATAAGGCGAAT:SHEN-MISEQ02:1:1101:15311:1341. The other way is to provide 10x-like fastq files which should contain three fastq files -- prefix_R1.fastq, prefix_R2.fastq and prefix_R3.fastq. In this way, read1, barcode and read2 are associated with R1, R2, R3, respectively.
-`--species` | {GRCh38,GRCm38} Species (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.
+`--species` | {GRCh38,GRCm38} Specify the genome assembly (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.
 
 **Running and output arguments:**
 
@@ -80,7 +79,7 @@ Arguments | Description
 Arguments | Description
 --------- | -----------
 `--giggleannotation` | Path of the giggle annotation file required for regulator identification. Please download the annotation file from [here](http://cistrome.org/~chenfei/MAESTRO/giggle.tar.gz) and decompress it.
-`--fasta` | Genome fasta file for minimap2. Users can just download the fasta file from [here](http://cistrome.org/~chenfei/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0.tar.gz) and decompress it. For example, `--fasta Refdata_scATAC_MAESTR O_GRCh38_1.1.0/GRCh38_genome.fa`.
+`--fasta` | Genome fasta file for minimap2. Users can just download the fasta file for [human](http://cistrome.org/~chenfei/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0.tar.gz) and [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scATAC_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromDB and decompress them. For example, `--fasta Refdata_scATAC_MAESTR O_GRCh38_1.1.0/GRCh38_genome.fa`.
 
 **Barcode library arguments, only for the platform of 'sci-ATAC-seq':**
 
@@ -106,8 +105,7 @@ Arguments | Description
 
 Arguments | Description
 --------- | -----------
-`--signature` | Whether or not to provide custom cell signatures. If set, users need to provide the file location of cell signatures through `--signature-file`. By default (not set), the pipeline will use the built-in immune cell signature adapted from CIBERSORT.
-`--signature-file` | If `--signature` is set, please provide the file location of custom cell signatures. The signature file is tab-separated without header. The first column is the cell type, and the second column is the signature gene.
+`--signature` | Cell signature file used to annotate cell types. MAESTRO provides several sets of built-in cell signatures. Users can choose from ['human.immune.CIBERSORT', 'mouce.brain.ALLEN', 'mouse.all.facs.TabulaMuris', 'mouse.all.droplet.TabulaMuris']. Custom cell signatures are also supported. In this situation, users need to provide the file location of cell signatures, and the signature file is tab-seperated without header. The first column is cell type, and the second column is signature gene. DEFAULT: human.immune.CIBERSORT.
 
 ### Step 2. Run MAESTRO
 
diff --git a/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md b/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md
index 8988883..5bfaf43 100644
--- a/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md
+++ b/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md
@@ -20,22 +20,22 @@ $ conda activate MAESTRO
 Initialize the MAESTRO scRNA-seq workflow using `MAESTRO scrna-init` command. This will install a Snakefile and a config file in this directory.
 ```bash
 $ MAESTRO scrna-init --platform 10x-genomics --species GRCh38 \
---fastq-dir /home1/wangchenfei/Project/SingleCell/scRNA/Analysis/MAESTRO_tutorial/Data/fastqs --fastq-prefix pbmc8k \
---cores 8 --rseqc --directory 10X_PBMC_8k_MAESTRO_V110 --outprefix 10X_PBMC_8k \
---mapindex /home1/wangchenfei/annotations/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a \
---whitelist /home1/wangchenfei/Tool/cellranger-3.1.0/cellranger-cs/3.1.0/lib/python/cellranger/barcodes/737K-august-2016.txt \
---umi-length 10 --method LISA --lisamode local --lisaenv lisa_update --condadir /home1/wangchenfei/miniconda3
+--fastq-dir Data/10X_PBMC_8k/fastqs --fastq-prefix pbmc8k \
+--cores 8 --rseqc --directory Analysis/10X_PBMC_8k_MAESTRO_V110 --outprefix 10X_PBMC_8k \
+--mapindex annotations/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a \
+--whitelist Data/barcodes/737K-august-2016.txt \
+--umi-length 10 --method LISA --lisamode local --lisaenv lisa --condadir /home1/user/miniconda3 --signature human.immune.CIBERSORT
 ```
 
 To get a full description of command-line options, please use the command `MAESTRO scrna-init -h`.
 ```bash
 usage: MAESTRO scrna-init [-h] [--platform {10x-genomics,Dropseq,Smartseq2}]
-                          [--fastq-dir FASTQ_DIR]
-                          [--fastq-prefix FASTQ_PREFIX]
+                          --fastq-dir FASTQ_DIR [--fastq-prefix FASTQ_PREFIX]
                           [--fastq-barcode FASTQ_BARCODE]
                           [--fastq-transcript FASTQ_TRANSCRIPT]
                           [--species {GRCh38,GRCm38}] [--cores CORES]
-                          [--rseqc] [-d DIRECTORY] [--outprefix OUTPREFIX]
+                          [--rseqc] [--directory DIRECTORY]
+                          [--outprefix OUTPREFIX]
                           [--count-cutoff COUNT_CUTOFF]
                           [--gene-cutoff GENE_CUTOFF]
                           [--cell-cutoff CELL_CUTOFF] --mapindex MAPINDEX
@@ -43,10 +43,9 @@ usage: MAESTRO scrna-init [-h] [--platform {10x-genomics,Dropseq,Smartseq2}]
                           [--barcode-start BARCODE_START]
                           [--barcode-length BARCODE_LENGTH]
                           [--umi-start UMI_START] [--umi-length UMI_LENGTH]
-                          [--method {RABIT,LISA}] [--rabitlib RABITLIB]
-                          [--lisamode {local,web}] [--lisaenv LISAENV]
-                          [--condadir CONDADIR] [--signature]
-                          [--signature-file SIGNATURE_FILE]
+                          [--method {LISA}] [--lisamode {local,web}]
+                          [--lisaenv LISAENV] --condadir CONDADIR
+                          [--signature SIGNATURE]
 ```
 
 Here we list all the arguments and their description.
@@ -60,7 +59,7 @@ Arguments | Description
 `--fastq-prefix` | Sample name of fastq file, only for the platform of '10x-genomics'. If there is a file named pbmc_1k_v2_S1_L001_I1_001.fastq.gz, the prefix is 'pbmc_1k_v2'.
 `--fastq-barcode` | Specify the barcode fastq file, only for the platform of 'Dropseq'. If there are multiple pairs of fastq, please provide a comma-separated list of barcode fastq files. For example, `--fastq-barcode test1_1.fastq,test2_1.fastq`.
 `--fastq-transcript` | Specify the transcript fastq file, only for the platform of 'Dropseq'.
-`--species` | {GRCh38,GRCm38} Species (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.
+`--species` | {GRCh38,GRCm38} Specify the genome assembly (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.
 
 **Running and output arguments:**
 
@@ -84,7 +83,7 @@ Arguments | Description
 Arguments | Description
 --------- | -----------
 `--mapindex` | Genome index directory for STAR. Users can just download the index file from [here](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and decompress it. Then specify the index directory for STAR, for example, `--mapindex Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a`.
-`--rsem` | The prefix of transcript references for RSEM used by rsem-prepare-reference (Only required when the platform is Smartseq2). Users can directly download the annotation file from [here](http://cistrome.org/~chenfei/MAESTRO/giggle.tar.gz) and decompress it. Then specify the prefix for RSEM, for example, `--rsem Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_RSEM_1.3.2/GRCh38`.
+`--rsem` | The prefix of transcript references for RSEM used by rsem-prepare-reference (Only required when the platform is Smartseq2). Users can directly download the reference file for [huamn](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromeDB and decompress them. Then specify the prefix for RSEM, for example, `--rsem Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_RSEM_1.3.2/GRCh38`.
 
 **Barcode arguments, for platform of 'Dropseq' or '10x-genomics':**
 
@@ -109,8 +108,7 @@ Arguments | Description
 
 Arguments | Description
 --------- | -----------
-`--signature` | Whether or not to provide custom cell signatures. If set, users need to provide the file location of cell signatures through `--signature-file`. By default (not set), the pipeline will use the built-in immune cell signature adapted from CIBERSORT.
-`--signature-file` | If `--signature` is set, please provide the file location of custom cell signatures. The signature file is tab-separated without header. The first column is the cell type, and the second column is the signature gene.
+`--signature` | Cell signature file used to annotate cell types. MAESTRO provides several sets of built-in cell signatures. Users can choose from ['human.immune.CIBERSORT', 'mouce.brain.ALLEN', 'mouse.all.facs.TabulaMuris', 'mouse.all.droplet.TabulaMuris']. Custom cell signatures are also supported. In this situation, users need to provide the file location of cell signatures, and the signature file is tab-seperated without header. The first column is cell type, and the second column is signature gene. DEFAULT: human.immune.CIBERSORT.
 
 ### Step 2. Run MAESTRO
 
@@ -289,8 +287,8 @@ To identify enriched transcription regulators is crucial to understanding gene r
     project = pbmc.RNA.res$RNA@project.name,
     method = "LISA",
     lisa.mode = "local",
-    conda.dir = "/home1/wangchenfei/miniconda3",
-    lisa.envname = "lisa_update",
+    conda.dir = "/home1/user/miniconda3",
+    lisa.envname = "lisa",
     organism = "GRCh38",
     top.tf = 10)
 > pbmc.RNA.tfs[["0"]]

From 23578b2e45fd80cf4789eca2c09a7391d3062550 Mon Sep 17 00:00:00 2001
From: Dongqing Sun
Date: Mon, 13 Apr 2020 01:56:02 +0800
Subject: [PATCH 2/4] Update RNA and integration tutorials

---
 example/Integration/Integration.md                       | 8 ++++----
 example/RNA_infrastructure_10x/RNA_infrastructure_10x.md | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/example/Integration/Integration.md b/example/Integration/Integration.md
index dfe7ebd..0729e6f 100644
--- a/example/Integration/Integration.md
+++ b/example/Integration/Integration.md
@@ -13,8 +13,8 @@ $ conda activate MAESTRO
 ### Step 1. Configure the MAESTRO workflow
 Initialize the MAESTRO integration workflow using `MAESTRO scATAC-init` command. This will install a Snakefile and a config file in this directory.
 ```bash
-$ MAESTRO integrate-init --rna-object /home1/wangchenfei/Project/SingleCell/scRNA/Analysis/MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/Result/Analysis/10X_PBMC_8k_scRNA_Object.rds \
---atac-object /home1/wangchenfei/Project/SingleCell/scATAC/Analysis/MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/Result/Analysis/10X_PBMC_10k_scATAC_Object.rds \
+$ MAESTRO integrate-init --rna-object MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/Result/Analysis/10X_PBMC_8k_scRNA_Object.rds \
+--atac-object MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/Result/Analysis/10X_PBMC_10k_scATAC_Object.rds \
 --directory 10X_PBMC_8kRNA_10kATAC_MAESTRO_V110 --outprefix 10X_PBMC_8kRNA_10kATAC
 ```
 
@@ -109,8 +109,8 @@ By default, MAESTRO will label the top 10 regulators using TF enrichment from GI
     cluster.2 = 0,
     type = "Integrated",
     SeuratObj = pbmc.RNA.res$RNA,
-    LISA.table = "/home1/wangchenfei/Project/SingleCell/scRNA/Analysis/MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/10X_PBMC_8k_lisa.txt",
-    GIGGLE.table = "/home1/wangchenfei/Project/SingleCell/scATAC/Analysis/MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/10X_PBMC_10k_giggle.txt",
+    LISA.table = "MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/10X_PBMC_8k_lisa.txt",
+    GIGGLE.table = "MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/10X_PBMC_10k_giggle.txt",
     visual.totalnumber = 100,
     name = "10X_PBMC_integrated_Monocyte_top")
 ```
diff --git a/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md b/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md
index 5bfaf43..b33cdc5 100644
--- a/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md
+++ b/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md
@@ -83,7 +83,7 @@ Arguments | Description
 Arguments | Description
 --------- | -----------
 `--mapindex` | Genome index directory for STAR. Users can just download the index file from [here](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and decompress it. Then specify the index directory for STAR, for example, `--mapindex Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a`.
-`--rsem` | The prefix of transcript references for RSEM used by rsem-prepare-reference (Only required when the platform is Smartseq2). Users can directly download the reference file for [huamn](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromeDB and decompress them. Then specify the prefix for RSEM, for example, `--rsem Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_RSEM_1.3.2/GRCh38`.
+`--rsem` | The prefix of transcript references for RSEM used by rsem-prepare-reference (Only required when the platform is Smartseq2). Users can directly download the reference file for [human](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromeDB and decompress them. Then specify the prefix for RSEM, for example, `--rsem Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_RSEM_1.3.2/GRCh38`.
 
 **Barcode arguments, for platform of 'Dropseq' or '10x-genomics':**
 

From 66f5d1a72681dc30801a4c8e1dda988508c71c49 Mon Sep 17 00:00:00 2001
From: Dongqing Sun
Date: Mon, 13 Apr 2020 01:58:06 +0800
Subject: [PATCH 3/4] Update RNA tutorial

---
 example/RNA_infrastructure_10x/RNA_infrastructure_10x.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md b/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md
index b33cdc5..b72e89d 100644
--- a/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md
+++ b/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md
@@ -82,7 +82,7 @@ Arguments | Description
 
 Arguments | Description
 --------- | -----------
-`--mapindex` | Genome index directory for STAR. Users can just download the index file from [here](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and decompress it. Then specify the index directory for STAR, for example, `--mapindex Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a`.
+`--mapindex` | Genome index directory for STAR. Users can just download the index file for [human](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromeDB and decompress them. Then specify the index directory for STAR, for example, `--mapindex Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a`.
 `--rsem` | The prefix of transcript references for RSEM used by rsem-prepare-reference (Only required when the platform is Smartseq2). Users can directly download the reference file for [human](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromeDB and decompress them. Then specify the prefix for RSEM, for example, `--rsem Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_RSEM_1.3.2/GRCh38`.
 
 **Barcode arguments, for platform of 'Dropseq' or '10x-genomics':**

From 80e06be0e726d9bc7b2df2890dbabbd2af1bb9d1 Mon Sep 17 00:00:00 2001
From: Dongqing Sun
Date: Mon, 13 Apr 2020 02:08:12 +0800
Subject: [PATCH 4/4] Update ATAC tutorials

---
 example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md b/example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md
index 1724306..2f505d8 100644
--- a/example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md
+++ b/example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md
@@ -25,7 +25,7 @@ $ MAESTRO scatac-init --platform 10x-genomics --species GRCh38 \
 --peak-cutoff 100 --count-cutoff 1000 --frip-cutoff 0.2 --cell-cutoff 50 \
 --giggleannotation annotations/MAESTRO/giggle \
 --fasta annotations/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0/GRCh38_genome.fa \
---whitelist Data/barcodes/737K-cratac-v1.txt
+--whitelist Data/barcodes/737K-cratac-v1.txt --signature human.immune.CIBERSORT
 ```
 
 To get a full description of command-line options, please use the command `MAESTRO scatac-init -h`.
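
Note on the reworked `--signature` option: the new description in this series says a custom signature file is tab-separated, has no header, and carries the cell type in the first column and the signature gene in the second. A minimal shell sketch of preparing and sanity-checking such a file might look like the following. The file name `custom_signatures.txt` and the marker genes are illustrative assumptions and are not part of the patches; the MAESTRO call is left as a comment because its remaining arguments depend on the local setup.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name and marker genes, chosen only for illustration.
sig_file="custom_signatures.txt"

# Two tab-separated columns, no header: cell type, then signature gene.
printf '%s\t%s\n' \
  "Bcell" "CD79A" \
  "Bcell" "MS4A1" \
  "Tcell" "CD3D" \
  "Tcell" "CD3E" > "$sig_file"

# Count lines that do not have exactly two non-empty tab-separated fields.
bad_lines=$(awk -F'\t' 'NF != 2 || $1 == "" || $2 == "" { n++ } END { print n + 0 }' "$sig_file")
echo "rows: $(wc -l < "$sig_file" | tr -d ' '), malformed: $bad_lines"

# Assumed usage, following the option description in the patch: pass the
# file path where a built-in signature name would otherwise go, e.g.
#   MAESTRO scrna-init ... --signature custom_signatures.txt
```

Checking the column count before initializing the workflow is only a convenience; whether MAESTRO itself validates the file is not stated in these patches.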