nf-core/test-datasets

Test data to be used for automated testing with the nf-core pipelines

This branch contains test data to be used for automated testing with the nf-core/fastquorum pipeline.

Content of this repository

references/: reference sequence and auxiliary index files for chromosome 17 of the Homo sapiens hg38 assembly.

testdata/fastqs/: raw FASTQs from Illumina paired-end duplex-sequencing experiments.

The full contents are shown below:

.
├── CITATION.cff
├── LICENSE
├── README.md
├── docs
│   ├── ADD_NEW_DATA.md
│   ├── USE_EXISTING_DATA.md
│   └── images
│       ├── test-datasets_logo.png
│       └── test-datasets_logo.svg
├── references
│   ├── chr17.dict
│   ├── chr17.fa
│   ├── chr17.fa.amb
│   ├── chr17.fa.ann
│   ├── chr17.fa.bwt
│   ├── chr17.fa.fai
│   ├── chr17.fa.pac
│   └── chr17.fa.sa
└── testdata
    ├── fastqs
    │   ├── full
    │   │   ├── SRR6109255_1.fastq.gz
    │   │   ├── SRR6109255_2.fastq.gz
    │   │   ├── SRR6109273_1.fastq.gz
    │   │   └── SRR6109273_2.fastq.gz
    │   └── tiny
    │       ├── SRR6109255_1.fastq.gz
    │       ├── SRR6109255_2.fastq.gz
    │       ├── SRR6109255_3.fastq.gz
    │       ├── SRR6109255_4.fastq.gz
    │       └── lanes
    │           ├── SRR6109255_S1_L001_R1_001.fastq.gz
    │           ├── SRR6109255_S1_L001_R2_001.fastq.gz
    │           ├── SRR6109255_S1_L002_R1_001.fastq.gz
    │           ├── SRR6109255_S1_L002_R2_001.fastq.gz
    │           ├── SRR6109255_S1_L003_R1_001.fastq.gz
    │           └── SRR6109255_S1_L003_R2_001.fastq.gz
    └── samplesheets
        ├── samplesheet.full.csv
        ├── samplesheet.multi_fastq.csv
        ├── samplesheet.multi_lanes.csv
        ├── samplesheet.single_fastq.csv
        └── samplesheet.tiny.csv
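
Individual files from this tree can be fetched over HTTPS without cloning the repository. The following is a minimal sketch, assuming the data sits on a branch named `fastquorum` (an assumption based on the pipeline name, not stated above) and using GitHub's raw-content URL scheme. `test_data_url` is a hypothetical helper, not part of this repository.

```shell
# Assumption: the test data lives on a branch called "fastquorum".
BRANCH="fastquorum"   # hypothetical branch name; adjust if it differs
BASE_URL="https://raw.githubusercontent.com/nf-core/test-datasets/${BRANCH}"

# Print the raw-download URL for a path relative to the repository root.
# A leading "/" on the path is dropped so the URL has no double slash.
test_data_url() {
    printf '%s/%s\n' "${BASE_URL}" "${1#/}"
}

# Example: URL of the tiny samplesheet shown in the tree.
test_data_url testdata/samplesheets/samplesheet.tiny.csv
```

The printed URL can be passed to `curl -L -O` or used directly as a pipeline input path.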

Sample Information

| Run Accession | Experiment Accession | Experiment Title | Citation |
|---|---|---|---|
| SRR6109255 | SRX3224128 | Illumina MiSeq sequencing: CRISPR-DS Sequencing of TP53 | [1] |
| SRR6109273 | SRX3224110 | Illumina MiSeq sequencing: CRISPR-DS Sequencing of TP53 | [1] |

Citations:

  1. Nachmanson D, Lian S, Schmidt EK, Hipp MJ, Baker KT, Zhang Y, Tretiakova M, Loubet-Senear K, Kohrn BF, Salk JJ, Kennedy SR, Risques RA. Targeted genome fragmentation with CRISPR/Cas9 enables fast and efficient enrichment of small genomic regions and ultra-accurate sequencing with low DNA input (CRISPR-DS). Genome Res. 2018 Oct;28(10):1589-1599. doi: 10.1101/gr.235291.118. Epub 2018 Sep 19. PMID: 30232196; PMCID: PMC6169890.

For SRR6109255, we sub-sampled the reads to 65% so that each file fits under GitHub's 100 MB file-size limit:

seqtk sample -s 42 SRR6109255_1.fastq.gz 0.65 | gzip -c > full/SRR6109255_1.fastq.gz
seqtk sample -s 42 SRR6109255_2.fastq.gz 0.65 | gzip -c > full/SRR6109255_2.fastq.gz

We also sub-sampled SRR6109255 to create a tiny dataset for rapid testing:

seqtk sample -s 42 SRR6109255_1.fastq.gz 0.01 | gzip -c > tiny/SRR6109255_1.fastq.gz
seqtk sample -s 42 SRR6109255_2.fastq.gz 0.01 | gzip -c > tiny/SRR6109255_2.fastq.gz
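
Both mates are sampled with the same seed (`-s 42`), which is what keeps the R1 and R2 files paired after sub-sampling. The following is a minimal sketch for checking that two sub-sampled files still list the same reads in the same order. It assumes standard four-line FASTQ records; `read_names` and `pairs_in_sync` are hypothetical helpers, not part of this repository.

```shell
# Print the read name of every record in a gzipped FASTQ:
# the first field of each header line, without "@" or a trailing /1 or /2.
read_names() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { sub(/^@/, "", $1); sub(/\/[12]$/, "", $1); print $1 }'
}

# Succeed only if both FASTQs contain the same read names in the same order.
pairs_in_sync() {
    [ "$(read_names "$1")" = "$(read_names "$2")" ]
}

# Example (paths as in the commands above):
#   pairs_in_sync tiny/SRR6109255_1.fastq.gz tiny/SRR6109255_2.fastq.gz && echo "pairs in sync"
```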

To simulate sample barcode (index) reads (e.g. index1/i7 and index2/i5), we generated a FASTQ containing only the UMI sequences by keeping the first 10 bp of each read in the R1 FASTQ file with fastx_trimmer from the fastx_toolkit. The second index file is a symbolic link to the first:

gunzip -dc testdata/duplex-seq/tiny/SRR6109255_1.fastq.gz | fastx_trimmer -f 1 -l 10 -z -o testdata/duplex-seq/tiny/SRR6109255_3.fastq.gz
ln -s SRR6109255_3.fastq.gz SRR6109255_4.fastq.gz
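
For readers without fastx_toolkit installed, the same slicing (keep bases 1 to 10 of both the sequence and the quality line) can be sketched with `gzip` and `awk`. This is an illustrative alternative, not the command used to build the files above. It assumes four-line FASTQ records, and `slice_reads` is a hypothetical helper.

```shell
# Keep only bases 1..LEN of every read in a gzipped four-line FASTQ.
# Sequence (line 2) and quality (line 4) are truncated; header and "+" lines pass through.
# Usage: slice_reads IN.fastq.gz LEN > OUT.fastq.gz
slice_reads() {
    gzip -dc "$1" | awk -v len="$2" '
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, 1, len); next }
        { print }
    ' | gzip -c
}

# Example:
#   slice_reads tiny/SRR6109255_1.fastq.gz 10 > tiny/SRR6109255_3.fastq.gz
```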

To simulate multiple lanes, we split the tiny FASTQs into the three lane file pairs under tiny/lanes using seqkit split2 --by-part.
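
The lane files in the tree follow the Illumina naming pattern `<sample>_S1_L<lane>_R<read>_001.fastq.gz`. The exact split and rename commands are not recorded here, so the following is only a sketch of how the target names could be generated for each part. `lane_fastq_name` is a hypothetical helper, and the `S1` sample index is taken from the file names shown above.

```shell
# Build an Illumina-style lane FASTQ name.
# Arguments: sample name, lane number (plain integer), read number (1 or 2).
lane_fastq_name() {
    printf '%s_S1_L%03d_R%d_001.fastq.gz\n' "$1" "$2" "$3"
}

# Print the six names expected under tiny/lanes (three lanes, two reads each).
for lane in 1 2 3; do
    for read in 1 2; do
        lane_fastq_name SRR6109255 "$lane" "$read"
    done
done
```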

| Tool | Version |
|---|---|
| fastx_toolkit | 0.0.14 |
| seqtk | 1.4-r122 |
| seqkit | 2.8.1 |

Reference Information

We used the following commands to prepare auxiliary reference information:

samtools faidx chr17.fa
samtools dict -u https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr17.fa.gz -a hg38 -s "Homo sapiens" chr17.fa > chr17.dict
bwa index chr17.fa

| Tool | Version |
|---|---|
| samtools | 1.17 |
| bwa | 0.7.17-r1188 |
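
The three commands above produce the `.fai` index, the `.dict` sequence dictionary, and the five BWA index files listed under references/. The following is a minimal sketch for confirming that all of them exist and are non-empty after regenerating the reference. `missing_reference_files` is a hypothetical helper, not part of this repository.

```shell
# Report auxiliary reference files that are missing or empty for a given FASTA.
# Prints one line per problem; returns 0 when everything is present, 1 otherwise.
missing_reference_files() {
    fasta="$1"
    missing=0
    for f in "${fasta}.fai" "${fasta%.fa}.dict" \
             "${fasta}.amb" "${fasta}.ann" "${fasta}.bwt" "${fasta}.pac" "${fasta}.sa"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   missing_reference_files references/chr17.fa && echo "reference is complete"
```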