Maintains Read Pairs: Each R1 read and its R2 mate are kept or discarded together, so the two output files stay in sync.
Configurable Subsampling: You can specify any subsampling fraction from 1% to 100%.
Random Seed Control: Allows setting a specific random seed for reproducible results.
Efficient Processing: Reads and writes FASTQ records in blocks of 4 lines.
Error Handling: Checks for file-open failures and for unpaired reads left over when one input file ends before the other.
Verbose Output: Optionally reports statistics about the subsampling process.
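The pairing guarantee comes down to making one random draw per read pair and applying it to both files. Below is a minimal sketch of that loop, not the program's actual source: the names `subsample_pairs`, `read_record`, `write_record`, and `REC_LINE_LEN` are hypothetical, plain stdio streams stand in for the zlib readers so the example compiles without `-lz`, and `rand()` is used only for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed maximum line length; longer lines would break the 4-line framing. */
#define REC_LINE_LEN 4096

/* Read one 4-line FASTQ record.
 * Returns 1 on success, 0 on clean end of file, -1 on a truncated record. */
static int read_record(FILE *in, char rec[4][REC_LINE_LEN]) {
    for (int i = 0; i < 4; i++) {
        if (!fgets(rec[i], REC_LINE_LEN, in))
            return i == 0 ? 0 : -1;
    }
    return 1;
}

/* Write the 4 lines of a record unchanged. */
static void write_record(FILE *out, char rec[4][REC_LINE_LEN]) {
    for (int i = 0; i < 4; i++)
        fputs(rec[i], out);
}

/* Keep roughly `percent` percent of read pairs.
 * Returns the number of pairs kept, or -1 if the inputs are unpaired
 * (one file ends early) or a record is truncated. */
long subsample_pairs(FILE *in1, FILE *in2, FILE *out1, FILE *out2,
                     double percent, unsigned seed) {
    assert(percent >= 1.0 && percent <= 100.0);
    srand(seed);  /* fixed seed gives the same selection on the same platform */

    char r1[4][REC_LINE_LEN], r2[4][REC_LINE_LEN];
    long kept = 0;

    for (;;) {
        int s1 = read_record(in1, r1);
        int s2 = read_record(in2, r2);
        if (s1 == 0 && s2 == 0)
            return kept;            /* both files ended together */
        if (s1 != 1 || s2 != 1)
            return -1;              /* leftover or truncated read */

        /* One draw in [0, 100) decides the fate of BOTH mates. */
        double u = (double)rand() / ((double)RAND_MAX + 1.0) * 100.0;
        if (u < percent) {
            write_record(out1, r1);
            write_record(out2, r2);
            kept++;
        }
    }
}
```

A caller would open the four streams, pass the value given to `-f` as `percent`, and treat a return of -1 as the unpaired-reads error. Because each pair is drawn independently, the kept fraction is approximate and not an exact count.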
Compilation (paired-end version):
You'll need to link with zlib:
gcc -o subsample_paired_fastq subsample_paired_fastq.c -lz
Usage:
./subsample_paired_fastq -a input_R1.fastq.gz -b input_R2.fastq.gz -x output_R1.fastq.gz -y output_R2.fastq.gz -f 10 -z
Compilation (single-end version):
You'll need to link with zlib:
gcc -o subsample_single_end_fastq subsample_single_end_fastq.c -lz
Usage:
# Subsample 20% with gzip-compressed output and verbose logs
./subsample_single_end_fastq -i input.fastq -o subsampled.fastq.gz -f 20 -z -v
Handles gzipped input files
Reports a clear error if the R1 and R2 files are not correctly paired
Allows compressed output with -z
Prints more detailed messages when something goes wrong
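One common way to detect wrongly paired files is to compare the read IDs on the two header lines of each pair. The sketch below illustrates that idea only; the text above does not say how the program performs its check, and the helper names `id_length` and `headers_match` are hypothetical. It assumes the ID is everything up to the first whitespace, with an optional trailing `/1` or `/2` ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the read ID in a FASTQ header line: everything up to the
 * first whitespace, minus an optional trailing "/1" or "/2" mate suffix. */
static size_t id_length(const char *header) {
    size_t n = strcspn(header, " \t\r\n");
    if (n >= 2 && header[n - 2] == '/' &&
        (header[n - 1] == '1' || header[n - 1] == '2'))
        n -= 2;
    return n;
}

/* Returns 1 if both lines are FASTQ headers carrying the same read ID,
 * 0 otherwise. */
int headers_match(const char *h1, const char *h2) {
    assert(h1 != NULL && h2 != NULL);
    if (h1[0] != '@' || h2[0] != '@')
        return 0;  /* not header lines at all */
    size_t n1 = id_length(h1);
    size_t n2 = id_length(h2);
    return n1 == n2 && strncmp(h1, h2, n1) == 0;
}
```

Calling `headers_match` on the first line of each R1/R2 record before the keep-or-discard decision lets a tool stop at the first mismatched pair and report the record where it occurred, instead of silently writing out-of-sync files.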