setDF deletes the index attribute #4893

Merged on Jul 27, 2021 (14 commits)
2 changes: 1 addition & 1 deletion NEWS.md
@@ -149,7 +149,7 @@

26. `melt()` now outputs scalar logical `NA` instead of `NULL` in rows corresponding to missing list columns, for consistency with non-list columns when using `na.rm=TRUE`, [#5053](https://github.com/Rdatatable/data.table/pull/5053). Thanks to Toby Dylan Hocking for the PR.

-27. `as.data.frame(DT)` now removes any indices in addition to removing any key, [#5042](https://github.com/Rdatatable/data.table/issues/5042). When indices were left intact, a subsequent subset or reorder of the `data.frame` would not update the indices since they are treated just like any other `data.frame` attribute, causing incorrect results if the result is later converted back to `data.table` again.
+27. `as.data.frame(DT)`, `setDF(DT)` and `as.list(DT)` now remove the `"index"` attribute which contains any indices (a.k.a. secondary keys), as they already did for other `data.table`-only attributes such as the primary key stored in the `"sorted"` attribute. When indices were left intact, a subsequent subset or reorder of the `data.frame` by `data.frame`-code in base R or other packages would not update the indices, causing incorrect results if then converted back to `data.table`, [#4889](https://github.com/Rdatatable/data.table/issues/4889) [#5042](https://github.com/Rdatatable/data.table/issues/5042). Thanks @OfekShilon for the report and the PR.

## NOTES

4 changes: 3 additions & 1 deletion R/data.table.R
@@ -2140,7 +2140,7 @@ as.data.frame.data.table = function(x, ...)
setattr(ans,"row.names",.set_row_names(nrow(x))) # since R 2.4.0, data.frames can have non-character row names
setattr(ans,"class","data.frame")
setattr(ans,"sorted",NULL) # remove so if you convert to df, do something, and convert back, it is not sorted
-setattr(ans,"index",NULL) #5042
+setattr(ans,"index",NULL) #4889 #5042
setattr(ans,".internal.selfref",NULL)
# leave tl intact, no harm,
ans
@@ -2157,6 +2157,7 @@ as.list.data.table = function(x, ...) {
setattr(ans, "class", NULL)
setattr(ans, "row.names", NULL)
setattr(ans, "sorted", NULL)
+setattr(ans, "index", NULL) #4889 #5042
setattr(ans,".internal.selfref", NULL) # needed to pass S4 tests for example
ans
}
@@ -2716,6 +2717,7 @@ setDF = function(x, rownames=NULL) {
setattr(x, "row.names", rn)
setattr(x, "class", "data.frame")
setattr(x, "sorted", NULL)
+setattr(x, "index", NULL) #4889 #5042
setattr(x, ".internal.selfref", NULL)
} else if (is.data.frame(x)) {
if (!is.null(rownames)) {
8 changes: 8 additions & 0 deletions inst/tests/tests.Rraw
@@ -17802,3 +17802,11 @@ test(2199.1, as.data.table(as.list(1:2))[, .SD,.SDcols=(-1L)], data.table(V2=2L))
test(2199.2, as.data.table(as.list(1:2))[, .SD,.SDcols=(-(1L))], data.table(V2=2L))
test(2199.3, as.data.table(as.list(1:3))[, .SD,.SDcols=(-1L)], data.table(V2=2L, V3=3L))
test(2199.4, data.table(V1=-1L, V2=-2L, V3=-3L)[,.SD,.SDcols=-V2:-V1], error="not found")
+
+# setDF now drops index attributes, #4889
+d = data.table(a=1:100, b=1:100)
+setindex(d, a)
+setDF(d)
+d[1:50, "a"] = d[51:100, "a"]
+setDT(d)
+test(2200, nrow(d[a==99]), 2L)
2 changes: 1 addition & 1 deletion man/setDF.Rd
@@ -15,7 +15,7 @@ setDF(x, rownames=NULL)
}

\details{
-All \code{data.table} attributes including any keys of the input data.table are stripped off.
+All \code{data.table} attributes including any keys and indices of the input data.table are stripped off.

When using \code{rownames}, recall that the row names of a \code{data.frame} must be unique. By default, the assigned set of row names is simply the sequence 1, \ldots, \code{nrow(x)} (or \code{length(x)} for \code{list}s).
}
6 changes: 3 additions & 3 deletions src/utils.c
@@ -381,11 +381,11 @@ SEXP coerceAs(SEXP x, SEXP as, SEXP copyArg) {
#include <zlib.h>
#endif
SEXP dt_zlib_version() {
-char out[51];
+char out[71];
#ifndef NOZLIB
-snprintf(out, 50, "zlibVersion()==%s ZLIB_VERSION==%s", zlibVersion(), ZLIB_VERSION);
+snprintf(out, 70, "zlibVersion()==%s ZLIB_VERSION==%s", zlibVersion(), ZLIB_VERSION);
#else
-snprintf(out, 50, _("zlib header files were not found when data.table was compiled"));
+snprintf(out, 70, _("zlib header files were not found when data.table was compiled"));
#endif
return ScalarString(mkChar(out));
}
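
The `src/utils.c` hunk above enlarges the buffer passed to `snprintf`. The sketch below shows why the size matters, using the untranslated 61-character "zlib header files were not found" message from the diff. The helper `format_checked` is hypothetical and is not part of data.table; it relies only on the standard behaviour that `snprintf` writes at most `cap - 1` characters plus a terminating NUL and returns the length the full string would have needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of data.table): copies `msg` into `out`,
   which has room for `cap` bytes, and returns 1 if the text was truncated,
   0 otherwise. */
int format_checked(char *out, size_t cap, const char *msg) {
    assert(out != NULL && msg != NULL && cap > 0);
    /* snprintf never writes more than `cap` bytes including the NUL;
       its return value is the untruncated length of the formatted text. */
    int needed = snprintf(out, cap, "%s", msg);
    return needed >= 0 && (size_t)needed >= cap;
}
```

Called with a limit of 50, as in the old code, the helper reports truncation for that message; with a limit of 70, as in the new code, the message fits. Translated versions of the message may be longer, so the new limit is not guaranteed to cover every locale.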