- Reference genome sequence file
- Reference genome annotation file (in .gtf format)
- RNA-seq fastq files
Gene expression count matrix.
├── script
│ └── snake_pipeline
├── raw_data
├── genome_index
└── logs

Please store your sequencing data in the raw_data/ folder and the genome files in the genome_index/ folder. Script files, pipeline files and configuration files can be stored however you prefer.
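The layout above can be created in one step. This is a minimal sketch: the default location `./rnaseq_project` is a hypothetical placeholder, so set `working_dir` to your own path first.

```shell
# Hypothetical default location; export working_dir beforehand to override it.
working_dir="${working_dir:-./rnaseq_project}"

# Create the folders shown in the directory tree (-p also creates parent folders).
mkdir -p "${working_dir}/script/snake_pipeline" \
         "${working_dir}/raw_data" \
         "${working_dir}/genome_index" \
         "${working_dir}/logs"

# List the result to confirm the layout.
ls "${working_dir}"
```

Because `mkdir -p` leaves existing folders untouched, it should be safe to rerun this in a working directory that already holds data.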
The config file must be in the same folder as the Snakefile.
# Absolute path to the genome fasta file
ref: "/workingdir/genome_index/genome.fasta"

2.2 Fastq file names may end with .fastq.gz or .fq.gz; specify the suffix of your fastq files if necessary.
# Fastq file suffix
fastq_suffix: " " # Default value is ".fq.gz"

# Sample list; sample names should start with a letter.
sample:
- "sample1"
- "sample2"
- "sample3"
- "sample4"
- ...
- "samplen"

You can use the following command to add a sample list to the config file if you have a sample list text file (for example, sample.list):
# sample.list
sample1
sample2
sample3
sample4
# Add samples to the config file:
awk '{print " - \"" $0 "\""}' sample.list >> ${working_dir}/RNAseq_config_featureCounts.yaml

For example:
snakemake \
--snakefile ${snakefile} \
-d ${working_dir} \
--cores ${cores_num}
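Since sample names in the config should start with a letter, it may help to check the sample list before appending it to the config file and launching the pipeline. The sketch below runs on a small demo list written to a temporary file; the names in it, including the deliberately invalid `3sample`, are hypothetical. Point `grep` at your own sample.list instead.

```shell
# Write a small demo list to a temporary file (hypothetical sample names).
demo_list="$(mktemp)"
printf 'sample1\nsample2\n3sample\n' > "${demo_list}"

# Collect every name that does NOT start with a letter.
# "|| true" keeps the script going when grep finds nothing to report.
bad_names="$(grep -v '^[A-Za-z]' "${demo_list}" || true)"

if [ -n "${bad_names}" ]; then
    echo "Sample names that do not start with a letter:"
    echo "${bad_names}"
else
    echo "All sample names start with a letter."
fi

# Remove the temporary demo file.
rm -f "${demo_list}"
```

If any names are reported, rename those samples (and the matching fastq files) before adding them to the config.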