Fix sample name mangling in tximport #9407

pinin4fjords · 2025-11-14T14:54:23Z

Description

This PR fixes an issue where R's data.frame() function was automatically modifying sample names, causing downstream errors when trying to match sample IDs between count matrices and samplesheet metadata.

The Problem

R's data.frame() (and read.csv()) automatically rewrite column names with make.names() when check.names=TRUE, which is the default for both:

Sample names starting with a digit get an "X" prepended: 1A2 → X1A2
Hyphens get converted to dots: D10-D → D10.D
Other characters that are not valid in R names, such as spaces, are also converted to dots
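The renaming can be reproduced in a few lines of R. This is a minimal sketch: the sample IDs are the examples above, and the one-row matrix is a hypothetical stand-in for a count matrix, not code from tximport.r.

```r
# Sample IDs from the examples above
ids <- c("1A2", "D10-D", "sample-1")

# make.names() is the renaming rule data.frame() applies when check.names = TRUE
renamed <- make.names(ids, unique = TRUE)
# renamed is c("X1A2", "D10.D", "sample.1")

# Hypothetical stand-in for a count matrix: one column per sample
counts <- matrix(0, nrow = 1, ncol = length(ids), dimnames = list(NULL, ids))

# With the default check.names = TRUE, the column names are rewritten the same way
default_df <- data.frame(counts)
print(colnames(default_df))
```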

This caused the SUMMARIZEDEXPERIMENT process to fail with:

Error in findColumnWithAllEntries(ids, metadata) : 
  No column contains all vector entries
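The sketch below shows why a lookup of this kind fails once names have been rewritten. The samplesheet table and its sample column are hypothetical, and the membership check is a simplified stand-in, not the actual findColumnWithAllEntries() implementation.

```r
# Hypothetical samplesheet holding the original sample IDs
samplesheet <- data.frame(sample = c("1A2", "D10-D"), stringsAsFactors = FALSE)

# Count table built with the default check.names = TRUE
counts <- data.frame(matrix(0, nrow = 1, ncol = 2,
                            dimnames = list(NULL, samplesheet$sample)))
count_ids <- colnames(counts)

# IDs present in the counts but absent from the samplesheet column
unmatched <- setdiff(count_ids, samplesheet$sample)
print(unmatched)

# A check that requires every ID to be found therefore fails
all_found <- all(count_ids %in% samplesheet$sample)
print(all_found)
```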

Root Cause

While PR #6638 partially fixed this by adding check.names = FALSE to the build_table() function, it missed three additional locations where data.frame() and read.csv() calls were made without this parameter.

The most critical one is at line 134, where coldata is created: it sets the sample names that become the column names of every output matrix.

Changes Made

Added check.names = FALSE to three function calls in tximport.r:

Line 76: read.csv() when reading transcript info
Line 79: data.frame() when creating extra transcript info rows
Line 134: data.frame() when creating coldata (main bug fix)
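The change at each call site amounts to passing check.names = FALSE. A sketch of the before/after behaviour for both functions follows; the CSV text, the tx column and the matrix are illustrative placeholders, not the actual transcript-info file or the arguments used in tximport.r.

```r
# Illustrative CSV whose header contains sample-like names
csv_text <- "tx,1A2,D10-D\nENST0001,5,7\n"

# Before: read.csv() defaults to check.names = TRUE and renames the header
before_read <- read.csv(text = csv_text)
# names(before_read) is c("tx", "X1A2", "D10.D")

# After: check.names = FALSE keeps the header exactly as written
after_read <- read.csv(text = csv_text, check.names = FALSE)
# names(after_read) is c("tx", "1A2", "D10-D")

# The same argument on data.frame() preserves sample IDs used as column names
ids <- c("1A2", "D10-D")
after_df <- data.frame(matrix(0, nrow = 1, ncol = 2, dimnames = list(NULL, ids)),
                       check.names = FALSE)
print(colnames(after_df))
```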

Testing

This fix ensures that sample names are preserved exactly as provided in the input, preventing mismatches downstream. Users can now safely use:

Sample names starting with numbers (e.g., 1A2, 5B2)
Sample names with hyphens (e.g., sample-1, D10-D)
Other sample names that make.names() would otherwise rewrite
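A round-trip check along these lines could serve as a regression test for the name formats listed above. It is a sketch only: the count table and its gene_id column are hypothetical, and it is not part of the module's nf-test suite. The reader also has to pass check.names = FALSE, otherwise the names are rewritten again on the way back in.

```r
# Name formats from the list above, plus a plain control name
ids <- c("1A2", "5B2", "sample-1", "D10-D", "control")

# Hypothetical count table: a gene ID column followed by one column per sample
counts <- data.frame(gene_id = "g1",
                     matrix(1, nrow = 1, ncol = length(ids), dimnames = list(NULL, ids)),
                     check.names = FALSE)

# Write to a temporary TSV, as a downstream step might consume it
path <- tempfile(fileext = ".tsv")
write.table(counts, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Reading back with check.names = FALSE preserves every sample ID
round_trip <- read.delim(path, check.names = FALSE)
stopifnot(identical(colnames(round_trip)[-1], ids))

# The default reader would rename several of them again
default_read <- read.delim(path)
stopifnot(!identical(colnames(default_read)[-1], ids))

unlink(path)
```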

PR checklist

This comment contains a description of changes (with reason)
If you've fixed a bug or added code that should be tested, add tests!
Ensure the test suite passes (nf-test test path/to/test.nf.test)
Usage Documentation in docs/usage.md is updated
Output Documentation in docs/output.md is updated
CHANGELOG.md is updated
README.md is updated (including new tool citations and authors/contributors)

Addresses nf-core/rnaseq#1445

R's data.frame() function automatically modifies column names when check.names=TRUE (the default), which causes issues with sample names that:
- Start with numbers (prepends "X": "1A2" -> "X1A2")
- Contain special characters like hyphens (converts to dots: "D10-D" -> "D10.D")

This caused the downstream summarizedexperiment script to fail when trying to match sample IDs from count matrices against the samplesheet metadata, as the names no longer matched.

PR #6638 partially fixed this issue by adding check.names=FALSE to the build_table() function, but missed three additional data.frame() and read.csv() calls that also needed this parameter.

This commit adds check.names=FALSE to:
1. Line 76: read.csv() when reading transcript info
2. Line 79: data.frame() when creating extra transcript info rows
3. Line 134: data.frame() when creating coldata (the main bug)

The coldata fix (line 134) is the most critical as it directly affects sample names that become column names in the output matrices.

JoseEspinosa

LGTM

pinin4fjords · 2025-11-14T14:57:38Z

Thanks @JoseEspinosa !

SPPearce

Is this actually going to fix the problem downstream?
Any other R package used is likely to fail in this manner.

pinin4fjords · 2025-11-14T15:00:37Z

Is this actually going to fix the problem downstream? Any other R package used is likely to fail in this manner.

I've already spent some time nobbling other instances, hoping this will catch the remainder

JoseEspinosa approved these changes Nov 14, 2025

pinin4fjords added this pull request to the merge queue Nov 14, 2025

SPPearce approved these changes Nov 14, 2025

Merged via the queue into master with commit d205ebc Nov 14, 2025
14 checks passed

pinin4fjords deleted the fix-tximport-sample-names branch November 14, 2025 15:00

This was referenced Nov 14, 2025

Update tximeta/tximport module to fix sample name mangling nf-core/rnaseq#1622

Closed

Update tximeta/tximport module to fix sample name mangling nf-core/rnaseq#1623

Merged

4 participants