Hyphenated sample names cause downstream error #1364

@mniederhuber

Description of the bug

Ran into an error in the summarized experiment process:

Process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SE_TRANSCRIPT (all_samples)`

Error message is from R:

Error in findColumnWithAllEntries(ids, metadata) : 
No column contains all vector entries

Tracked it down to the parse_metadata function in the R script.

metadata_id_col <- findColumnWithAllEntries(ids, metadata)

I had used hyphens in my sample names, but the ids passed to findColumnWithAllEntries have every hyphen replaced with '.',
e.g. "D10-D_Na-R1" becomes "D10.D_Na.R1"

Looks like this is happening with the output from salmon: the column names of salmon.merged.transcript_counts.tsv, which are used to set the ids variable in the R script, carry the altered sample names.
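To see in advance which sample names would be affected, the substitution can be approximated in the shell. This is only a sketch: the helper name `to_dotted_name` is made up for illustration, and it assumes the renaming follows the same rule as R's `make.names()` (which `read.table()`/`read.csv()` apply to column headers when `check.names` is left at its default), i.e. every character that is not a letter, digit, '.' or '_' becomes '.'. Whether that is where the renaming actually happens in the pipeline is not confirmed here, and other `make.names()` behaviour (such as prefixing names that start with a digit) is not covered.

```shell
#!/bin/bash
# Hypothetical helper: approximate how a sample name would look after
# invalid characters are replaced with '.', as R's make.names() does.
# Assumption: only letters, digits, '.' and '_' survive unchanged.
to_dotted_name() {
    printf '%s\n' "$1" | sed 's/[^A-Za-z0-9._]/./g'
}

# Example: compare the original name with its predicted dotted form
# to_dotted_name "D10-D_Na-R1"
```

If the predicted name differs from the name in the sample sheet, the metadata lookup by sample id can be expected to fail the way it did above.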

The easy workaround is to remove the hyphens from the sample names in the sample sheet.

But it might be useful to add another check when the sample sheet is first parsed, to catch this right out of the gate.
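A check like the one suggested above could look roughly like this when run before launching the pipeline. It is a sketch, not the pipeline's own validation: the function name `check_sample_names` is hypothetical, and it assumes a plain comma-separated sample sheet with a header row, the sample name in the first column, and no quoted fields.

```shell
#!/bin/bash
# Hypothetical pre-flight check: report sample names (column 1 of a CSV
# sample sheet) containing characters other than letters, digits, '.'
# and '_'. Assumes a header row and unquoted fields.
check_sample_names() {
    local sheet="$1"
    awk -F',' '
        # NR > 1 skips the header; empty names are ignored here
        NR > 1 && $1 != "" && $1 !~ /^[A-Za-z0-9._]+$/ {
            printf "line %d: sample name \"%s\" may be renamed downstream\n", NR, $1
            bad = 1
        }
        # Exit status 1 if any name was flagged, 0 otherwise
        END { exit (bad ? 1 : 0) }
    ' "$sheet"
}

# Example usage (samplesheet.csv is a placeholder path):
# check_sample_names samplesheet.csv || exit 1
```

Calling it before `nextflow run` and stopping on a non-zero exit status would surface the problem before any quantification is done, instead of at the summarized experiment step.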

Command used and terminal output

#!/bin/bash
#SBATCH --job-name=fashe
#SBATCH -p barc
#SBATCH -t 12:00:00
#SBATCH --mem=8G
#SBATCH -o log/rna-%j.out
#SBATCH -e log/rna-%j.err

if [ ! -d log ]; then
    mkdir log
fi

module load nextflow

# using the dev branch because of gzip bug that's been fixed
nextflow run nf-core/rnaseq \
    -profile unc_longleaf \
    -params-file conf/rnaseq_params.yaml \
    -r dev

Relevant files

No response

System information

Nextflow 24.04.2
HPC
Slurm
Singularity
RHEL 8
nf-core/rnaseq dev branch
