Fix sample name mangling in tximport #9407
Merged
Description
Fixes nf-core/rnaseq#1445
This PR fixes an issue where R's `data.frame()` function was automatically modifying sample names, causing downstream errors when trying to match sample IDs between count matrices and samplesheet metadata.

The Problem
R's `data.frame()` automatically modifies column/row names when `check.names=TRUE` (the default):
- `1A2` → `X1A2`
- `D10-D` → `D10.D`

This caused the `SUMMARIZEDEXPERIMENT` process to fail.

Root Cause
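The renaming can be reproduced outside the pipeline. The following is a minimal sketch using a made-up count matrix; the object names and values are illustrative assumptions and are not taken from `tximport.r`.

```r
# Hypothetical count matrix whose column names are sample IDs
# like the ones mentioned in this PR.
counts <- matrix(
  1:4,
  ncol = 2,
  dimnames = list(c("tx1", "tx2"), c("1A2", "D10-D"))
)

# Default behaviour: check.names = TRUE passes the names through make.names()
mangled <- data.frame(counts)

# With check.names = FALSE the names are kept exactly as given
preserved <- data.frame(counts, check.names = FALSE)

colnames(mangled)    # "X1A2"  "D10.D"
colnames(preserved)  # "1A2"   "D10-D"
```

Any later lookup by the original sample ID fails against `mangled`, because neither `1A2` nor `D10-D` is among its column names any more.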
While PR #6638 partially fixed this by adding `check.names = FALSE` to the `build_table()` function, it missed three additional locations where `data.frame()` and `read.csv()` calls were made without this parameter.

The most critical one was at line 134, where `coldata` is created: this directly sets the sample names that become column names in all output matrices.

Changes Made
Added `check.names = FALSE` to three function calls in `tximport.r`:
- `read.csv()` when reading transcript info
- `data.frame()` when creating extra transcript info rows
- `data.frame()` when creating coldata (main bug fix)

Testing
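One way to check the behaviour locally is a small round trip through a CSV file. This is only a sketch under stated assumptions: the sample IDs, the temporary file, and the `samplesheet` data frame are hypothetical stand-ins, not the pipeline's actual test data or code.

```r
# Hypothetical sample IDs covering leading digits and hyphens
samples <- c("1A2", "5B2", "sample-1", "D10-D")

# Write a small count table whose header row holds the sample IDs
counts <- matrix(
  seq_len(8),
  nrow = 2,
  dimnames = list(c("tx1", "tx2"), samples)
)
tmp <- tempfile(fileext = ".csv")
write.csv(counts, tmp)

# Default read.csv() rewrites the header; check.names = FALSE keeps it
default_read <- read.csv(tmp, row.names = 1)
fixed_read   <- read.csv(tmp, row.names = 1, check.names = FALSE)
unlink(tmp)

# Samplesheet-style metadata keyed by the original IDs (illustrative only)
samplesheet <- data.frame(
  sample    = samples,
  condition = c("a", "a", "b", "b")
)

# Matching against the samplesheet only succeeds when names were preserved
default_match <- all(samplesheet$sample %in% colnames(default_read))
fixed_match   <- all(samplesheet$sample %in% colnames(fixed_read))
```

Here `default_match` is `FALSE` and `fixed_match` is `TRUE`, which mirrors the mismatch between count matrices and samplesheet metadata that this PR addresses.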
This fix ensures that sample names are preserved exactly as provided in the input, preventing mismatches downstream. Users can now safely use:
- Sample names that start with a digit (`1A2`, `5B2`)
- Sample names that contain hyphens (`sample-1`, `D10-D`)

PR checklist
- (... `nf-test test path/to/test.nf.test`)
- `docs/usage.md` is updated
- `docs/output.md` is updated
- `CHANGELOG.md` is updated
- `README.md` is updated (including new tool citations and authors/contributors)