Skip to content

Commit

Permalink
fix variable with melt(measure.vars=list), na.rm=T/F consistency (#4723)
Browse files Browse the repository at this point in the history
  • Loading branch information
tdhock authored May 7, 2021
1 parent 7b4bd7d commit ebc5bc3
Show file tree
Hide file tree
Showing 4 changed files with 11 additions and 15 deletions.
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,8 @@
12. `as.data.table(table(NULL))` now returns `data.table(NULL)` rather than error `attempt to set an attribute on NULL`, [#4179](https://github.com/Rdatatable/data.table/issues/4179). The result differs slightly to `as.data.frame(table(NULL))` (0-row, 1-column) because 0-column works better with other `data.table` functions like `rbindlist()`. Thanks to Michael Chirico for the report and fix.
13. `melt` with a list for `measure.vars` would output `variable` inconsistently between `na.rm=TRUE` and `FALSE`, [#4455](https://github.com/Rdatatable/data.table/issues/4455). Thanks to @tdhock for reporting and fixing.
## NOTES
1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.<type>()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example :
Expand Down
7 changes: 5 additions & 2 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -3014,6 +3014,9 @@ test(1034, as.data.table(x<-as.character(sample(letters, 5))), data.table(V1=x))
error="Unknown 'id.vars' type raw")
test(1035.012, melt(DT, id.vars=1:3, measure.vars=as.raw(0)),
error="Unknown 'measure.vars' type raw")
test(1035.013, melt(data.table(a=1, b=1), id.vars=c(1,1)), data.table(a=1, a.1=1, variable=factor("b"), value=1))
test(1035.014, melt(data.table(a1=1, b1=1, b2=2), na.rm=TRUE, measure.vars=list(a="a1", b=c("b1","b2"))), data.table(variable=factor(1,c("1","2")), a=1, b=1))
test(1035.015, melt(data.table(a=1+2i, b=1), id.vars="a"), error="Unknown column type 'complex' for column 'a' in 'data'")

ans1 = cbind(DT[, c(1,2,8), with=FALSE], variable=factor("l_1"))
ans1[, value := DT$l_1]
Expand Down Expand Up @@ -3175,9 +3178,9 @@ Sep,33.5,19.4,15.7,11.9,0,100.8,100.8,0,12.7,12.7,0,174.1")
x[, c("y1","z1"):=NA]
test(1037.405, dim(melt(x, measure.vars=patterns("^y", "^z"))), INT(4,5))
test(1037.406, dim(ans<-melt(x, measure.vars=patterns("^y", "^z"), na.rm=TRUE)), INT(2,5))
test(1037.407, ans$variable, factor(c("1","1")))
test(1037.407, ans$variable, factor(c("2","2"), c("1", "2")))
test(1037.408, dim(ans<-melt(x, measure.vars=patterns("^y", "^z"), na.rm=TRUE, variable.factor=FALSE)), INT(2,5))
test(1037.409, ans$variable, c("1","1"))
test(1037.409, ans$variable, c("2","2"))

test(1037.410, melt(data.table(NULL), verbose=TRUE), data.table(NULL),
output="ncol(data) is 0. Nothing to melt")
Expand Down
2 changes: 1 addition & 1 deletion man/melt.data.table.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ non-measure columns will be assigned to it. If integer, must be positive; see De
}
For convenience/clarity in the case of multiple \code{melt}ed columns, resulting column names can be supplied as names to the elements \code{measure.vars} (in the \code{list} and \code{patterns} usages). See also \code{Examples}. }
\item{variable.name}{name for the measured variable names column. The default name is \code{'variable'}.}
\item{variable.name}{name (default \code{'variable'}) of output column containing information about which input column(s) were melted. If \code{measure.vars} is an integer/character vector, then each entry of this column contains the name of a melted column from \code{data}. If \code{measure.vars} is a list of integer/character vectors, then each entry of this column contains an integer indicating an index/position in each of those vectors.}
\item{value.name}{name for the molten data values column(s). The default name is \code{'value'}. Multiple names can be provided here for the case when \code{measure.vars} is a \code{list}, though note well that the names provided in \code{measure.vars} take precedence. }
\item{na.rm}{If \code{TRUE}, \code{NA} values will be removed from the molten
data.}
Expand Down
15 changes: 3 additions & 12 deletions src/fmelt.c
Original file line number Diff line number Diff line change
Expand Up @@ -538,19 +538,18 @@ SEXP getvarcols(SEXP DT, SEXP dtnames, Rboolean varfactor, Rboolean verbose, str
} else {
for (int j=0, ansloc=0, level=1; j<data->lmax; ++j) {
const int thislen = data->narm ? length(VECTOR_ELT(data->naidx, j)) : data->nrow;
if (thislen==0) continue; // so as not to bump level
char buff[20];
snprintf(buff, 20, "%d", level++);
SEXP str = PROTECT(mkChar(buff));
for (int k=0; k<thislen; ++k) SET_STRING_ELT(target, ansloc++, str);
UNPROTECT(1);
}
}
} else {
} else {// varfactor==TRUE
SET_VECTOR_ELT(ansvars, 0, target=allocVector(INTSXP, data->totlen));
SEXP levels;
int *td = INTEGER(target);
if (data->lvalues == 1) {
if (data->lvalues == 1) {//single output column.
SEXP thisvaluecols = VECTOR_ELT(data->valuecols, 0);
int len = length(thisvaluecols);
levels = PROTECT(allocVector(STRSXP, len)); protecti++;
Expand All @@ -573,24 +572,16 @@ SEXP getvarcols(SEXP DT, SEXP dtnames, Rboolean varfactor, Rboolean verbose, str
const int thislen = data->narm ? length(VECTOR_ELT(data->naidx, j)) : data->nrow;
for (int k=0; k<thislen; ++k) td[ansloc++] = md[j];
}
} else {
} else {//multiple output columns.
int nlevel=0;
levels = PROTECT(allocVector(STRSXP, data->lmax)); protecti++;
for (int j=0, ansloc=0; j<data->lmax; ++j) {
const int thislen = data->narm ? length(VECTOR_ELT(data->naidx, j)) : data->nrow;
if (thislen==0) continue; // so as not to bump level
char buff[20];
snprintf(buff, 20, "%d", nlevel+1);
SET_STRING_ELT(levels, nlevel++, mkChar(buff)); // generate levels = 1:nlevels
for (int k=0; k<thislen; ++k) td[ansloc++] = nlevel;
}
if (nlevel < data->lmax) {
// data->narm is true and there are some all-NA items causing at least one 'if (thislen==0) continue' above
// shrink the levels
SEXP newlevels = PROTECT(allocVector(STRSXP, nlevel)); protecti++;
for (int i=0; i<nlevel; ++i) SET_STRING_ELT(newlevels, i, STRING_ELT(levels, i));
levels = newlevels;
}
}
setAttrib(target, R_LevelsSymbol, levels);
setAttrib(target, R_ClassSymbol, ScalarString(char_factor));
Expand Down

0 comments on commit ebc5bc3

Please sign in to comment.