Merge pull request #36 from liulab-dfci/doc/tutorial-1

Doc/tutorial 1
liulab-dfci · Apr 14, 2020 · 15c3818
2 parents 844a828 + 80e06be
commit 15c3818

Showing 3 changed files with 32 additions and 36 deletions.
diff --git a/example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md b/example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md
@@ -20,31 +20,30 @@ $ conda activate MAESTRO
 Initialize the MAESTRO scATAC-seq workflow using the `MAESTRO scatac-init` command. This will install a Snakefile and a config file in this directory.
 ```bash
 $ MAESTRO scatac-init --platform 10x-genomics --species GRCh38 \
---fastq-dir /home1/wangchenfei/Project/SingleCell/scATAC/Analysis/MAESTRO_tutorial/Data/atac_v1_pbmc_10k_fastqs --fastq-prefix atac_v1_pbmc_10k \
+--fastq-dir Data/atac_v1_pbmc_10k_fastqs --fastq-prefix atac_v1_pbmc_10k \
 --cores 8 --directory 10X_PBMC_10k_MAESTRO_V110 --outprefix 10X_PBMC_10k \
 --peak-cutoff 100 --count-cutoff 1000 --frip-cutoff 0.2 --cell-cutoff 50 \
---giggleannotation /home1/wangchenfei/annotations/MAESTRO/giggle \
---fasta /home1/wangchenfei/annotations/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0/GRCh38_genome.fa \
---whitelist /home1/wangchenfei/Tool/cellranger-atac-1.1.0/cellranger-atac-cs/1.1.0/lib/python/barcodes/737K-cratac-v1.txt
+--giggleannotation annotations/MAESTRO/giggle \
+--fasta annotations/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0/GRCh38_genome.fa \
+--whitelist Data/barcodes/737K-cratac-v1.txt --signature human.immune.CIBERSORT
 ```
 
 To get a full description of command-line options, please use the command `MAESTRO scatac-init -h`.
 ```bash
 usage: MAESTRO scatac-init [-h]
                            [--platform {10x-genomics,sci-ATAC-seq,microfluidic}]
-                           [--fastq-dir FASTQ_DIR]
-                           [--fastq-prefix FASTQ_PREFIX]
+                           --fastq-dir FASTQ_DIR [--fastq-prefix FASTQ_PREFIX]
                            [--species {GRCh38,GRCm38}] [--cores CORES]
-                           [-d DIRECTORY] [--outprefix OUTPREFIX]
+                           [--directory DIRECTORY] [--outprefix OUTPREFIX]
                            [--peak-cutoff PEAK_CUTOFF]
                            [--count-cutoff COUNT_CUTOFF]
                            [--frip-cutoff FRIP_CUTOFF]
                            [--cell-cutoff CELL_CUTOFF] --giggleannotation
                            GIGGLEANNOTATION --fasta FASTA
                            [--whitelist WHITELIST] [--custompeak]
                            [--custompeak-file CUSTOMPEAK_FILE] [--shortpeak]
-                           [--genedistance GENEDISTANCE] [--signature]
-                           [--signature-file SIGNATURE_FILE]
+                           [--genedistance GENEDISTANCE]
+                           [--signature SIGNATURE]
 ```
 
 Here we list all the arguments and their descriptions.
@@ -56,7 +55,7 @@ Arguments  |  Description
 `--platform` | {10x-genomics,sci-ATAC-seq,microfluidic} Platform of single-cell ATAC-seq. DEFAULT: 10x-genomics.
 `--fastq-dir` | Directory where fastq files are stored.
 `--fastq-prefix` | Sample name of the fastq files (required for the platform of '10x-genomics' or 'sci-ATAC-seq'). When the platform is '10x-genomics', if there is a file named pbmc_1k_v2_S1_L001_I1_001.fastq.gz, the prefix is 'pbmc_1k_v2'. If the platform is 'sci-ATAC-seq', there are two ways to provide fastq files. The first is to provide paired-end sequencing results as two fastq files: prefix_1.fastq and prefix_2.fastq. In this case, the barcode of each read needs to be included in the read ID (the first line of each read) in the format '@ReadName:CellBarcode:OtherInformation'. For example, @rd.1:TCTCCCGCCGAGGCTGACTGCATAAGGCGAAT:SHEN-MISEQ02:1:1101:15311:1341. The other way is to provide 10x-like fastq files, i.e. three fastq files: prefix_R1.fastq, prefix_R2.fastq and prefix_R3.fastq. In this case, read1, the barcode and read2 correspond to R1, R2 and R3, respectively.
-`--species` | {GRCh38,GRCm38} Species (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.
+`--species` | {GRCh38,GRCm38} Specify the genome assembly (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.
 
 **Running and output arguments:**
 
@@ -80,7 +79,7 @@ Arguments  |  Description
 Arguments  |  Description
 ---------  |  -----------
 `--giggleannotation` | Path of the giggle annotation file required for regulator identification. Please download the annotation file from [here](http://cistrome.org/~chenfei/MAESTRO/giggle.tar.gz) and decompress it.
-`--fasta` | Genome fasta file for minimap2. Users can just download the fasta file from [here](http://cistrome.org/~chenfei/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0.tar.gz) and decompress it. For example, `--fasta Refdata_scATAC_MAESTR O_GRCh38_1.1.0/GRCh38_genome.fa`.
+`--fasta` | Genome FASTA file for minimap2. Users can download the FASTA file for [human](http://cistrome.org/~chenfei/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0.tar.gz) or [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scATAC_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromeDB and decompress it. For example, `--fasta Refdata_scATAC_MAESTRO_GRCh38_1.1.0/GRCh38_genome.fa`.
 
 **Barcode library arguments, only for the platform of 'sci-ATAC-seq':**
 
@@ -106,8 +105,7 @@ Arguments  |  Description
 
 Arguments  |  Description
 ---------  |  -----------
-`--signature` | Whether or not to provide custom cell signatures. If set, users need to provide the file location of cell signatures through `--signature-file`. By default (not set), the pipeline will use the built-in immune cell signature adapted from CIBERSORT.
-`--signature-file` | If `--signature` is set, please provide the file location of custom cell signatures. The signature file is tab-separated without header. The first column is the cell type, and the second column is the signature gene.
+`--signature` | Cell signature file used to annotate cell types. MAESTRO provides several built-in sets of cell signatures; users can choose from ['human.immune.CIBERSORT', 'mouse.brain.ALLEN', 'mouse.all.facs.TabulaMuris', 'mouse.all.droplet.TabulaMuris']. Custom cell signatures are also supported. In this case, users need to provide the path to the signature file, which is tab-separated without a header. The first column is the cell type, and the second column is the signature gene. DEFAULT: human.immune.CIBERSORT.
 
 
 ### Step 2. Run MAESTRO
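The `--fastq-prefix` rules described in the scATAC-seq diff above can be checked before running `MAESTRO scatac-init`. The following is a minimal bash sketch, not part of MAESTRO: the helper names `fastq_prefix` and `read_barcode` are hypothetical, and the prefix logic assumes Illumina bcl2fastq-style file names of the form `<prefix>_S<n>_L<lane>_<I1|R1|R2|R3>_001.fastq.gz`, as in the `pbmc_1k_v2_S1_L001_I1_001.fastq.gz` example.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a MAESTRO command): derive the value for
# --fastq-prefix from a 10x-style FASTQ file name.
# Assumes bcl2fastq naming: <prefix>_S<n>_L<lane>_<I1|R1|R2|R3>_001.fastq(.gz)
fastq_prefix() {
  local name
  name=$(basename "$1")
  # Strip the trailing _S<n>_L<lane>_<read>_001.fastq(.gz) suffix.
  printf '%s\n' "$name" | sed -E 's/_S[0-9]+_L[0-9]{3}_[IR][0-9]_001\.fastq(\.gz)?$//'
}

# Hypothetical helper: extract the cell barcode from a sci-ATAC-seq read ID
# written as '@ReadName:CellBarcode:OtherInformation' (second ':' field).
read_barcode() {
  printf '%s\n' "$1" | cut -d: -f2
}

# Example calls using the names from the argument table:
fastq_prefix pbmc_1k_v2_S1_L001_I1_001.fastq.gz   # prints pbmc_1k_v2
read_barcode '@rd.1:TCTCCCGCCGAGGCTGACTGCATAAGGCGAAT:SHEN-MISEQ02:1:1101:15311:1341'
```

Running `fastq_prefix` on any one file in the directory passed to `--fastq-dir` should give the same prefix for all of its I1/R1/R2/R3 files; if it does not, the files probably do not follow the assumed naming scheme.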

diff --git a/example/Integration/Integration.md b/example/Integration/Integration.md
@@ -13,8 +13,8 @@ $ conda activate MAESTRO
 ### Step 1. Configure the MAESTRO workflow
 Initialize the MAESTRO integration workflow using the `MAESTRO integrate-init` command. This will install a Snakefile and a config file in this directory.
 ```bash
-$ MAESTRO integrate-init --rna-object /home1/wangchenfei/Project/SingleCell/scRNA/Analysis/MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/Result/Analysis/10X_PBMC_8k_scRNA_Object.rds \
---atac-object /home1/wangchenfei/Project/SingleCell/scATAC/Analysis/MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/Result/Analysis/10X_PBMC_10k_scATAC_Object.rds \
+$ MAESTRO integrate-init --rna-object MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/Result/Analysis/10X_PBMC_8k_scRNA_Object.rds \
+--atac-object MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/Result/Analysis/10X_PBMC_10k_scATAC_Object.rds \
 --directory 10X_PBMC_8kRNA_10kATAC_MAESTRO_V110 --outprefix 10X_PBMC_8kRNA_10kATAC
 ```
 
@@ -109,8 +109,8 @@ By default, MAESTRO will label the top 10 regulators using TF enrichment from GI
                              cluster.2 = 0,
                              type = "Integrated", 
                              SeuratObj = pbmc.RNA.res$RNA, 
-                             LISA.table = "/home1/wangchenfei/Project/SingleCell/scRNA/Analysis/MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/10X_PBMC_8k_lisa.txt",
-                             GIGGLE.table = "/home1/wangchenfei/Project/SingleCell/scATAC/Analysis/MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/10X_PBMC_10k_giggle.txt",
+                             LISA.table = "MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/10X_PBMC_8k_lisa.txt",
+                             GIGGLE.table = "MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/10X_PBMC_10k_giggle.txt",
                              visual.totalnumber = 100, 
                              name = "10X_PBMC_integrated_Monocyte_top") 
 ```

diff --git a/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md b/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md
@@ -20,33 +20,32 @@ $ conda activate MAESTRO
 Initialize the MAESTRO scRNA-seq workflow using the `MAESTRO scrna-init` command. This will install a Snakefile and a config file in this directory.
 ```bash
 $ MAESTRO scrna-init --platform 10x-genomics --species GRCh38 \
---fastq-dir /home1/wangchenfei/Project/SingleCell/scRNA/Analysis/MAESTRO_tutorial/Data/fastqs --fastq-prefix pbmc8k \
---cores 8 --rseqc --directory 10X_PBMC_8k_MAESTRO_V110 --outprefix 10X_PBMC_8k \
---mapindex /home1/wangchenfei/annotations/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a \
---whitelist /home1/wangchenfei/Tool/cellranger-3.1.0/cellranger-cs/3.1.0/lib/python/cellranger/barcodes/737K-august-2016.txt \
---umi-length 10 --method LISA --lisamode local --lisaenv lisa_update --condadir /home1/wangchenfei/miniconda3
+--fastq-dir Data/10X_PBMC_8k/fastqs --fastq-prefix pbmc8k \
+--cores 8 --rseqc --directory Analysis/10X_PBMC_8k_MAESTRO_V110 --outprefix 10X_PBMC_8k \
+--mapindex annotations/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a \
+--whitelist Data/barcodes/737K-august-2016.txt \
+--umi-length 10 --method LISA --lisamode local --lisaenv lisa --condadir /home1/user/miniconda3 --signature human.immune.CIBERSORT
 ```
 
 To get a full description of command-line options, please use the command `MAESTRO scrna-init -h`.
 ```bash
 usage: MAESTRO scrna-init [-h] [--platform {10x-genomics,Dropseq,Smartseq2}]
-                          [--fastq-dir FASTQ_DIR]
-                          [--fastq-prefix FASTQ_PREFIX]
+                          --fastq-dir FASTQ_DIR [--fastq-prefix FASTQ_PREFIX]
                           [--fastq-barcode FASTQ_BARCODE]
                           [--fastq-transcript FASTQ_TRANSCRIPT]
                           [--species {GRCh38,GRCm38}] [--cores CORES]
-                          [--rseqc] [-d DIRECTORY] [--outprefix OUTPREFIX]
+                          [--rseqc] [--directory DIRECTORY]
+                          [--outprefix OUTPREFIX]
                           [--count-cutoff COUNT_CUTOFF]
                           [--gene-cutoff GENE_CUTOFF]
                           [--cell-cutoff CELL_CUTOFF] --mapindex MAPINDEX
                           [--rsem RSEM] [--whitelist WHITELIST]
                           [--barcode-start BARCODE_START]
                           [--barcode-length BARCODE_LENGTH]
                           [--umi-start UMI_START] [--umi-length UMI_LENGTH]
-                          [--method {RABIT,LISA}] [--rabitlib RABITLIB]
-                          [--lisamode {local,web}] [--lisaenv LISAENV]
-                          [--condadir CONDADIR] [--signature]
-                          [--signature-file SIGNATURE_FILE]
+                          [--method {LISA}] [--lisamode {local,web}]
+                          [--lisaenv LISAENV] --condadir CONDADIR
+                          [--signature SIGNATURE]
 ```
 
 Here we list all the arguments and their descriptions.
@@ -60,7 +59,7 @@ Arguments  |  Description
 `--fastq-prefix` | Sample name of fastq file, only for the platform of '10x-genomics'. If there is a file named pbmc_1k_v2_S1_L001_I1_001.fastq.gz, the prefix is 'pbmc_1k_v2'.
 `--fastq-barcode` | Specify the barcode fastq file, only for the platform of 'Dropseq'. If there are multiple pairs of fastq, please provide a comma-separated list of barcode fastq files. For example, `--fastq-barcode test1_1.fastq,test2_1.fastq`.
 `--fastq-transcript` | Specify the transcript fastq file, only for the platform of 'Dropseq'.
-`--species` | {GRCh38,GRCm38} Species (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.
+`--species` | {GRCh38,GRCm38} Specify the genome assembly (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.
 
 **Running and output arguments:**
 
@@ -83,8 +82,8 @@ Arguments  |  Description
 
 Arguments  |  Description
 ---------  |  -----------
-`--mapindex` | Genome index directory for STAR. Users can just download the index file from [here](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and decompress it. Then specify the index directory for STAR, for example, `--mapindex Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a`.
-`--rsem` | The prefix of transcript references for RSEM used by rsem-prepare-reference (Only required when the platform is Smartseq2). Users can directly download the annotation file from [here](http://cistrome.org/~chenfei/MAESTRO/giggle.tar.gz) and decompress it. Then specify the prefix for RSEM, for example, `--rsem Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_RSEM_1.3.2/GRCh38`.
+`--mapindex` | Genome index directory for STAR. Users can just download the index file for [human](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromeDB and decompress them. Then specify the index directory for STAR, for example, `--mapindex Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a`.
+`--rsem` | The prefix of transcript references for RSEM used by rsem-prepare-reference (Only required when the platform is Smartseq2). Users can directly download the reference file for [human](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromeDB and decompress them. Then specify the prefix for RSEM, for example, `--rsem Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_RSEM_1.3.2/GRCh38`.
 
 **Barcode arguments, for platform of 'Dropseq' or '10x-genomics':**
 
@@ -109,8 +108,7 @@ Arguments  |  Description
 
 Arguments  |  Description
 ---------  |  -----------
-`--signature` | Whether or not to provide custom cell signatures. If set, users need to provide the file location of cell signatures through `--signature-file`. By default (not set), the pipeline will use the built-in immune cell signature adapted from CIBERSORT.
-`--signature-file` | If `--signature` is set, please provide the file location of custom cell signatures. The signature file is tab-separated without header. The first column is the cell type, and the second column is the signature gene.
+`--signature` | Cell signature file used to annotate cell types. MAESTRO provides several built-in sets of cell signatures; users can choose from ['human.immune.CIBERSORT', 'mouse.brain.ALLEN', 'mouse.all.facs.TabulaMuris', 'mouse.all.droplet.TabulaMuris']. Custom cell signatures are also supported. In this case, users need to provide the path to the signature file, which is tab-separated without a header. The first column is the cell type, and the second column is the signature gene. DEFAULT: human.immune.CIBERSORT.
 
 
 ### Step 2. Run MAESTRO
@@ -289,8 +287,8 @@ To identify enriched transcription regulators is crucial to understanding gene r
                                                  project = pbmc.RNA.res$RNA@project.name,
                                                  method = "LISA",
                                                  lisa.mode = "local",
-                                                 conda.dir = "/home1/wangchenfei/miniconda3",
-                                                 lisa.envname = "lisa_update",
+                                                 conda.dir = "/home1/user/miniconda3",
+                                                 lisa.envname = "lisa",
                                                  organism = "GRCh38",
                                                  top.tf = 10)
 > pbmc.RNA.tfs[["0"]]
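For the custom-signature case of `--signature` described in both tutorials, the expected input is a two-column, tab-separated table without a header (cell type, then signature gene). The bash sketch below writes such a file and checks its shape. The file name `custom_signatures.tsv`, the cell-type labels and the marker genes are illustrative assumptions, not one of MAESTRO's built-in sets.

```bash
#!/usr/bin/env bash
# Sketch: build a custom cell-signature file in the documented format
# (tab-separated, no header; column 1 = cell type, column 2 = signature gene).
# File name, cell-type labels and genes are illustrative only.
sig_file=${1:-custom_signatures.tsv}

# printf reuses its format string for each remaining pair of arguments,
# producing one "<cell type>\t<gene>" line per pair.
printf '%s\t%s\n' \
  T_cell CD3D \
  T_cell CD3E \
  B_cell MS4A1 \
  Monocyte CD14 > "$sig_file"

# Check that every line has exactly two non-empty tab-separated fields.
if awk -F'\t' 'NF != 2 || $1 == "" || $2 == "" { bad = 1 } END { exit bad + 0 }' "$sig_file"; then
  echo "OK: $(awk 'END { print NR }' "$sig_file") signature rows in $sig_file"
else
  echo "Malformed signature file: $sig_file" >&2
  exit 1
fi
```

The resulting path would then be passed in place of a built-in set name, e.g. `--signature custom_signatures.tsv` for `MAESTRO scrna-init` or `MAESTRO scatac-init`. The check cannot detect an accidental header row, so that still has to be verified by eye.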