Commit

Merge pull request #36 from liulab-dfci/doc/tutorial-1
Doc/tutorial 1
DongqingSun96 authored Apr 14, 2020
2 parents 844a828 + 80e06be commit 15c3818
Showing 3 changed files with 32 additions and 36 deletions.
24 changes: 11 additions & 13 deletions example/ATAC_infrastructure_10x/ATAC_infrastructure_10x.md
@@ -20,31 +20,30 @@ $ conda activate MAESTRO
Initialize the MAESTRO scATAC-seq workflow using the `MAESTRO scatac-init` command. This will install a Snakefile and a config file in this directory.
```bash
$ MAESTRO scatac-init --platform 10x-genomics --species GRCh38 \
--fastq-dir /home1/wangchenfei/Project/SingleCell/scATAC/Analysis/MAESTRO_tutorial/Data/atac_v1_pbmc_10k_fastqs --fastq-prefix atac_v1_pbmc_10k \
--fastq-dir Data/atac_v1_pbmc_10k_fastqs --fastq-prefix atac_v1_pbmc_10k \
--cores 8 --directory 10X_PBMC_10k_MAESTRO_V110 --outprefix 10X_PBMC_10k \
--peak-cutoff 100 --count-cutoff 1000 --frip-cutoff 0.2 --cell-cutoff 50 \
--giggleannotation /home1/wangchenfei/annotations/MAESTRO/giggle \
--fasta /home1/wangchenfei/annotations/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0/GRCh38_genome.fa \
--whitelist /home1/wangchenfei/Tool/cellranger-atac-1.1.0/cellranger-atac-cs/1.1.0/lib/python/barcodes/737K-cratac-v1.txt
--giggleannotation annotations/MAESTRO/giggle \
--fasta annotations/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0/GRCh38_genome.fa \
--whitelist Data/barcodes/737K-cratac-v1.txt --signature human.immune.CIBERSORT
```

To get a full description of command-line options, please use the command `MAESTRO scatac-init -h`.
```bash
usage: MAESTRO scatac-init [-h]
[--platform {10x-genomics,sci-ATAC-seq,microfluidic}]
[--fastq-dir FASTQ_DIR]
[--fastq-prefix FASTQ_PREFIX]
--fastq-dir FASTQ_DIR [--fastq-prefix FASTQ_PREFIX]
[--species {GRCh38,GRCm38}] [--cores CORES]
[-d DIRECTORY] [--outprefix OUTPREFIX]
[--directory DIRECTORY] [--outprefix OUTPREFIX]
[--peak-cutoff PEAK_CUTOFF]
[--count-cutoff COUNT_CUTOFF]
[--frip-cutoff FRIP_CUTOFF]
[--cell-cutoff CELL_CUTOFF] --giggleannotation
GIGGLEANNOTATION --fasta FASTA
[--whitelist WHITELIST] [--custompeak]
[--custompeak-file CUSTOMPEAK_FILE] [--shortpeak]
[--genedistance GENEDISTANCE] [--signature]
[--signature-file SIGNATURE_FILE]
[--genedistance GENEDISTANCE]
[--signature SIGNATURE]
```

Here we list all the arguments and their description.
@@ -56,7 +55,7 @@ Arguments | Description
`--platform` | {10x-genomics,sci-ATAC-seq,microfluidic} Platform of single cell ATAC-seq. DEFAULT: 10x-genomics.
`--fastq-dir` | Directory where fastq files are stored.
`--fastq-prefix` | Sample name of the fastq files (required for the '10x-genomics' and 'sci-ATAC-seq' platforms). For '10x-genomics', if there is a file named pbmc_1k_v2_S1_L001_I1_001.fastq.gz, the prefix is 'pbmc_1k_v2'. For 'sci-ATAC-seq', there are two ways to provide fastq files. The first is to provide paired-end sequencing results as two fastq files, prefix_1.fastq and prefix_2.fastq. In this case, the barcode for each read must be included in the read ID (the first line of each read) in the format '@ReadName:CellBarcode:OtherInformation', for example, @rd.1:TCTCCCGCCGAGGCTGACTGCATAAGGCGAAT:SHEN-MISEQ02:1:1101:15311:1341. The second is to provide 10x-like fastq files, which should comprise three fastq files: prefix_R1.fastq, prefix_R2.fastq and prefix_R3.fastq. In this case, read1, barcode and read2 correspond to R1, R2 and R3, respectively.
`--species` | {GRCh38,GRCm38} Species (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.
`--species` | {GRCh38,GRCm38} Specify the genome assembly (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.
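
The `--fastq-prefix` conventions described in this table can be checked by hand before running `scatac-init`. The following is a minimal sketch, not part of MAESTRO: the helper names `fastq_prefix` and `read_barcode` are hypothetical, and the suffix pattern assumes the usual 10x naming scheme (`_S<n>_L<lane>_<I|R><n>_<chunk>.fastq.gz`) seen in the example file name.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; MAESTRO does not ship them.

# Derive the value for --fastq-prefix from a 10x-style fastq file name
# by stripping the trailing _S<n>_L<lane>_<I|R><n>_<chunk>.fastq(.gz) suffix.
fastq_prefix() {
    local name
    name=$(basename "$1")
    printf '%s\n' "$name" \
        | sed -E 's/_S[0-9]+_L[0-9]+_[IR][0-9]+_[0-9]+\.fastq(\.gz)?$//'
}

# Extract the cell barcode from a sci-ATAC-seq read ID of the form
# @ReadName:CellBarcode:OtherInformation (barcode is the second field).
read_barcode() {
    local id=${1#@}          # drop the leading '@'
    local name barcode rest
    IFS=: read -r name barcode rest <<< "$id"
    printf '%s\n' "$barcode"
}

fastq_prefix pbmc_1k_v2_S1_L001_I1_001.fastq.gz
read_barcode '@rd.1:TCTCCCGCCGAGGCTGACTGCATAAGGCGAAT:SHEN-MISEQ02:1:1101:15311:1341'
```

Run on the two examples from the table, the first call prints `pbmc_1k_v2` and the second prints the 32-base barcode `TCTCCCGCCGAGGCTGACTGCATAAGGCGAAT`.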

**Running and output arguments:**

@@ -80,7 +79,7 @@ Arguments | Description
Arguments | Description
--------- | -----------
`--giggleannotation` | Path of the giggle annotation file required for regulator identification. Please download the annotation file from [here](http://cistrome.org/~chenfei/MAESTRO/giggle.tar.gz) and decompress it.
`--fasta` | Genome fasta file for minimap2. Users can just download the fasta file from [here](http://cistrome.org/~chenfei/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0.tar.gz) and decompress it. For example, `--fasta Refdata_scATAC_MAESTRO_GRCh38_1.1.0/GRCh38_genome.fa`.
`--fasta` | Genome fasta file for minimap2. Users can just download the fasta file for [human](http://cistrome.org/~chenfei/MAESTRO/Refdata_scATAC_MAESTRO_GRCh38_1.1.0.tar.gz) and [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scATAC_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromeDB and decompress them. For example, `--fasta Refdata_scATAC_MAESTRO_GRCh38_1.1.0/GRCh38_genome.fa`.

**Barcode library arguments, only for the platform of 'sci-ATAC-seq':**

@@ -106,8 +105,7 @@ Arguments | Description

Arguments | Description
--------- | -----------
`--signature` | Whether or not to provide custom cell signatures. If set, users need to provide the file location of cell signatures through `--signature-file`. By default (not set), the pipeline will use the built-in immune cell signature adapted from CIBERSORT.
`--signature-file` | If `--signature` is set, please provide the file location of custom cell signatures. The signature file is tab-separated without header. The first column is the cell type, and the second column is the signature gene.
`--signature` | Cell signature file used to annotate cell types. MAESTRO provides several sets of built-in cell signatures. Users can choose from ['human.immune.CIBERSORT', 'mouse.brain.ALLEN', 'mouse.all.facs.TabulaMuris', 'mouse.all.droplet.TabulaMuris']. Custom cell signatures are also supported. In this case, users need to provide the path to the signature file, which must be tab-separated without a header. The first column is the cell type, and the second column is the signature gene. DEFAULT: human.immune.CIBERSORT.
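
A custom signature file in the layout described for `--signature` (two tab-separated columns, no header, cell type first and signature gene second) can be sanity-checked before it is passed to MAESTRO. The sketch below is only an illustration under that assumption: `check_signature` is a hypothetical helper, and the cell types and genes in the sample file are made-up example rows, not MAESTRO's built-in signatures.

```bash
#!/usr/bin/env bash
# Hypothetical checker for illustration only; not part of MAESTRO.

# Succeeds only if every line has exactly two non-empty tab-separated fields.
check_signature() {
    awk -F'\t' '
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d: expected <cell type><TAB><gene>\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Write a small example file: one "<cell type><TAB><gene>" pair per line.
sig=$(mktemp)
printf 'Monocyte\tCD14\nMonocyte\tLYZ\nBcell\tMS4A1\n' > "$sig"

check_signature "$sig" && echo "signature file OK"
```

The resulting path would then be given as `--signature "$sig"` in place of a built-in name such as `human.immune.CIBERSORT`.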


### Step 2. Run MAESTRO
8 changes: 4 additions & 4 deletions example/Integration/Integration.md
@@ -13,8 +13,8 @@ $ conda activate MAESTRO
### Step 1. Configure the MAESTRO workflow
Initialize the MAESTRO integration workflow using the `MAESTRO integrate-init` command. This will install a Snakefile and a config file in this directory.
```bash
$ MAESTRO integrate-init --rna-object /home1/wangchenfei/Project/SingleCell/scRNA/Analysis/MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/Result/Analysis/10X_PBMC_8k_scRNA_Object.rds \
--atac-object /home1/wangchenfei/Project/SingleCell/scATAC/Analysis/MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/Result/Analysis/10X_PBMC_10k_scATAC_Object.rds \
$ MAESTRO integrate-init --rna-object MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/Result/Analysis/10X_PBMC_8k_scRNA_Object.rds \
--atac-object MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/Result/Analysis/10X_PBMC_10k_scATAC_Object.rds \
--directory 10X_PBMC_8kRNA_10kATAC_MAESTRO_V110 --outprefix 10X_PBMC_8kRNA_10kATAC
```

@@ -109,8 +109,8 @@ By default, MAESTRO will label the top 10 regulators using TF enrichment from GI
cluster.2 = 0,
type = "Integrated",
SeuratObj = pbmc.RNA.res$RNA,
LISA.table = "/home1/wangchenfei/Project/SingleCell/scRNA/Analysis/MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/10X_PBMC_8k_lisa.txt",
GIGGLE.table = "/home1/wangchenfei/Project/SingleCell/scATAC/Analysis/MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/10X_PBMC_10k_giggle.txt",
LISA.table = "MAESTRO_tutorial/10X_PBMC_8k_MAESTRO_V110/10X_PBMC_8k_lisa.txt",
GIGGLE.table = "MAESTRO_tutorial/10X_PBMC_10k_MAESTRO_V110/10X_PBMC_10k_giggle.txt",
visual.totalnumber = 100,
name = "10X_PBMC_integrated_Monocyte_top")
```
36 changes: 17 additions & 19 deletions example/RNA_infrastructure_10x/RNA_infrastructure_10x.md
@@ -20,33 +20,32 @@ $ conda activate MAESTRO
Initialize the MAESTRO scRNA-seq workflow using the `MAESTRO scrna-init` command. This will install a Snakefile and a config file in this directory.
```bash
$ MAESTRO scrna-init --platform 10x-genomics --species GRCh38 \
--fastq-dir /home1/wangchenfei/Project/SingleCell/scRNA/Analysis/MAESTRO_tutorial/Data/fastqs --fastq-prefix pbmc8k \
--cores 8 --rseqc --directory 10X_PBMC_8k_MAESTRO_V110 --outprefix 10X_PBMC_8k \
--mapindex /home1/wangchenfei/annotations/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a \
--whitelist /home1/wangchenfei/Tool/cellranger-3.1.0/cellranger-cs/3.1.0/lib/python/cellranger/barcodes/737K-august-2016.txt \
--umi-length 10 --method LISA --lisamode local --lisaenv lisa_update --condadir /home1/wangchenfei/miniconda3
--fastq-dir Data/10X_PBMC_8k/fastqs --fastq-prefix pbmc8k \
--cores 8 --rseqc --directory Analysis/10X_PBMC_8k_MAESTRO_V110 --outprefix 10X_PBMC_8k \
--mapindex annotations/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a \
--whitelist Data/barcodes/737K-august-2016.txt \
--umi-length 10 --method LISA --lisamode local --lisaenv lisa --condadir /home1/user/miniconda3 --signature human.immune.CIBERSORT
```

To get a full description of command-line options, please use the command `MAESTRO scrna-init -h`.
```bash
usage: MAESTRO scrna-init [-h] [--platform {10x-genomics,Dropseq,Smartseq2}]
[--fastq-dir FASTQ_DIR]
[--fastq-prefix FASTQ_PREFIX]
--fastq-dir FASTQ_DIR [--fastq-prefix FASTQ_PREFIX]
[--fastq-barcode FASTQ_BARCODE]
[--fastq-transcript FASTQ_TRANSCRIPT]
[--species {GRCh38,GRCm38}] [--cores CORES]
[--rseqc] [-d DIRECTORY] [--outprefix OUTPREFIX]
[--rseqc] [--directory DIRECTORY]
[--outprefix OUTPREFIX]
[--count-cutoff COUNT_CUTOFF]
[--gene-cutoff GENE_CUTOFF]
[--cell-cutoff CELL_CUTOFF] --mapindex MAPINDEX
[--rsem RSEM] [--whitelist WHITELIST]
[--barcode-start BARCODE_START]
[--barcode-length BARCODE_LENGTH]
[--umi-start UMI_START] [--umi-length UMI_LENGTH]
[--method {RABIT,LISA}] [--rabitlib RABITLIB]
[--lisamode {local,web}] [--lisaenv LISAENV]
[--condadir CONDADIR] [--signature]
[--signature-file SIGNATURE_FILE]
[--method {LISA}] [--lisamode {local,web}]
[--lisaenv LISAENV] --condadir CONDADIR
[--signature SIGNATURE]
```

Here we list all the arguments and their description.
@@ -60,7 +59,7 @@ Arguments | Description
`--fastq-prefix` | Sample name of fastq file, only for the platform of '10x-genomics'. If there is a file named pbmc_1k_v2_S1_L001_I1_001.fastq.gz, the prefix is 'pbmc_1k_v2'.
`--fastq-barcode` | Specify the barcode fastq file, only for the platform of 'Dropseq'. If there are multiple pairs of fastq, please provide a comma-separated list of barcode fastq files. For example, `--fastq-barcode test1_1.fastq,test2_1.fastq`.
`--fastq-transcript` | Specify the transcript fastq file, only for the platform of 'Dropseq'.
`--species` | {GRCh38,GRCm38} Species (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.
`--species` | {GRCh38,GRCm38} Specify the genome assembly (GRCh38 for human and GRCm38 for mouse). DEFAULT: GRCh38.

**Running and output arguments:**

@@ -83,8 +82,8 @@ Arguments | Description

Arguments | Description
--------- | -----------
`--mapindex` | Genome index directory for STAR. Users can just download the index file from [here](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and decompress it. Then specify the index directory for STAR, for example, `--mapindex Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a`.
`--rsem` | The prefix of transcript references for RSEM used by rsem-prepare-reference (Only required when the platform is Smartseq2). Users can directly download the annotation file from [here](http://cistrome.org/~chenfei/MAESTRO/giggle.tar.gz) and decompress it. Then specify the prefix for RSEM, for example, `--rsem Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_RSEM_1.3.2/GRCh38`.
`--mapindex` | Genome index directory for STAR. Users can just download the index file for [human](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromeDB and decompress them. Then specify the index directory for STAR, for example, `--mapindex Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a`.
`--rsem` | The prefix of transcript references for RSEM used by rsem-prepare-reference (Only required when the platform is Smartseq2). Users can directly download the reference file for [human](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCh38_1.1.0.tar.gz) and [mouse](http://cistrome.org/~chenfei/MAESTRO/Refdata_scRNA_MAESTRO_GRCm38_1.1.0.tar.gz) from CistromeDB and decompress them. Then specify the prefix for RSEM, for example, `--rsem Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_RSEM_1.3.2/GRCh38`.
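
After decompressing the reference archive, it can help to confirm that the paths intended for `--mapindex` and `--rsem` actually point at a STAR index directory and an RSEM reference prefix. The following is a rough sketch, not something MAESTRO provides: `check_refs` is a hypothetical helper, and it relies on the assumption that a STAR index directory contains `genomeParameters.txt` and `SA`, and that `rsem-prepare-reference` wrote a `<prefix>.grp` file.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for illustration only; not part of MAESTRO.

# $1: directory intended for --mapindex, $2: prefix intended for --rsem.
check_refs() {
    local star_dir=$1 rsem_prefix=$2 rc=0 f
    # Files assumed to be present in any STAR genome index directory.
    for f in genomeParameters.txt SA; do
        [ -e "$star_dir/$f" ] || { echo "missing $star_dir/$f" >&2; rc=1; }
    done
    # File assumed to be written by rsem-prepare-reference for this prefix.
    [ -e "$rsem_prefix.grp" ] || { echo "missing $rsem_prefix.grp" >&2; rc=1; }
    return "$rc"
}

# Paths taken from the examples in the table; adjust to the local layout.
if check_refs Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_STAR_2.7.3a \
              Refdata_scRNA_MAESTRO_GRCh38_1.1.0/GRCh38_RSEM_1.3.2/GRCh38; then
    echo "reference paths look usable"
else
    echo "reference paths need attention"
fi
```

The RSEM check only matters for the Smartseq2 platform, since `--rsem` is not required otherwise.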

**Barcode arguments, for platform of 'Dropseq' or '10x-genomics':**

@@ -109,8 +108,7 @@ Arguments | Description

Arguments | Description
--------- | -----------
`--signature` | Whether or not to provide custom cell signatures. If set, users need to provide the file location of cell signatures through `--signature-file`. By default (not set), the pipeline will use the built-in immune cell signature adapted from CIBERSORT.
`--signature-file` | If `--signature` is set, please provide the file location of custom cell signatures. The signature file is tab-separated without header. The first column is the cell type, and the second column is the signature gene.
`--signature` | Cell signature file used to annotate cell types. MAESTRO provides several sets of built-in cell signatures. Users can choose from ['human.immune.CIBERSORT', 'mouse.brain.ALLEN', 'mouse.all.facs.TabulaMuris', 'mouse.all.droplet.TabulaMuris']. Custom cell signatures are also supported. In this case, users need to provide the path to the signature file, which must be tab-separated without a header. The first column is the cell type, and the second column is the signature gene. DEFAULT: human.immune.CIBERSORT.


### Step 2. Run MAESTRO
@@ -289,8 +287,8 @@ To identify enriched transcription regulators is crucial to understanding gene r
project = pbmc.RNA.res$RNA@project.name,
method = "LISA",
lisa.mode = "local",
conda.dir = "/home1/wangchenfei/miniconda3",
lisa.envname = "lisa_update",
conda.dir = "/home1/user/miniconda3",
lisa.envname = "lisa",
organism = "GRCh38",
top.tf = 10)
> pbmc.RNA.tfs[["0"]]