NEWS.md: 2 additions & 0 deletions
@@ -8,6 +8,8 @@
1. `nafill()` now applies `fill=` to the front/back of the vector when `type="locf|nocb"`, [#3594](https://github.com/Rdatatable/data.table/issues/3594). Thanks to @ben519 for the feature request. It also now returns a named object based on the input names. Note that if you are considering joining and then using `nafill(...,type='locf|nocb')` afterwards, please review `roll=`/`rollends=` which should achieve the same result in one step more efficiently. `nafill()` is for when filling-while-joining (i.e. `roll=`/`rollends=`/`nomatch=`) cannot be applied.

+ 2. `mean(na.rm=TRUE)` by group is now GForce optimized, [#4849](https://github.com/Rdatatable/data.table/issues/4849). Thanks to the [h2oai/db-benchmark](https://github.com/h2oai/db-benchmark) project for spotting this issue. The 1 billion row example in the issue shows 48s reduced to 14s. The optimization also applies to type `integer64` resulting in a difference to the `bit64::mean.integer64` method: `data.table` returns a `double` result whereas `bit64` rounds the mean to the nearest integer.
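The `integer64` difference described in item 2 can be illustrated with a small C sketch. This is not code from `data.table` or `bit64`: the function names `mean_as_double` and `mean_rounded` are hypothetical, and the rounding rule (half away from zero) is an assumption chosen only to show the contrast between a `double` result and a rounded integer result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mean of 64-bit integers returned as a double,
   analogous to the "returns a double result" behaviour in the NEWS item. */
double mean_as_double(const int64_t *x, size_t n) {
  assert(x != NULL && n > 0);
  long double sum = 0.0L;  /* wider accumulator to limit rounding error */
  for (size_t i = 0; i < n; i++) sum += (long double)x[i];
  return (double)(sum / (long double)n);
}

/* Hypothetical helper: the same mean rounded to the nearest integer.
   Assumption: ties round away from zero; the real bit64 rule may differ. */
int64_t mean_rounded(const int64_t *x, size_t n) {
  double m = mean_as_double(x, n);
  return (int64_t)(m >= 0 ? m + 0.5 : m - 0.5);
}
```

For the values 1, 2 and 4 the first helper gives 7/3 as a `double`, while the second gives the integer 2, so the fractional part is lost in the rounded variant.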
//Rprintf(_("gsum int took %.3f\n"), wallclock()-started);
if (overflow) {
  UNPROTECT(1); // discard the result with overflow
-   if (warnOverflow) warning(_("The sum of an integer column for a group was more than type 'integer' can hold so the result has been coerced to 'numeric' automatically for convenience."));
+   warning(_("The sum of an integer column for a group was more than type 'integer' can hold so the result has been coerced to 'numeric' automatically for convenience."));
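The warning text in this hunk describes a fallback: when an integer group sum does not fit in type `integer`, the result is coerced to `numeric`. A minimal sketch of that idea in plain C follows. It is not the `gsum` implementation; `sum_int_checked` and `sum_as_double` are hypothetical names, and R's special treatment of `INT_MIN` as `NA_integer_` is deliberately left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: sum n ints in a 64-bit accumulator.
   Returns true and stores the sum in *out if it fits in an int,
   otherwise returns false and leaves *out untouched.
   Assumes n is below 2^32 so the accumulator itself cannot overflow. */
bool sum_int_checked(const int *x, size_t n, int *out) {
  assert(x != NULL && out != NULL);
  long long acc = 0;
  for (size_t i = 0; i < n; i++) acc += x[i];
  if (acc > INT_MAX || acc < INT_MIN) return false;  /* overflow detected */
  *out = (int)acc;
  return true;
}

/* Hypothetical fallback: redo the sum in double, which is what
   "coerced to numeric" amounts to in this sketch. */
double sum_as_double(const int *x, size_t n) {
  double acc = 0.0;
  for (size_t i = 0; i < n; i++) acc += (double)x[i];
  return acc;
}
```

A caller would try `sum_int_checked` first and, if it reports overflow, emit the warning and return the value from `sum_as_double` instead.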
if (!isLogical(narm) || LENGTH(narm)!=1 || LOGICAL(narm)[0]==NA_LOGICAL) error(_("na.rm must be TRUE or FALSE"));
- if (!isVectorAtomic(x)) error(_("GForce mean can only be applied to columns, not .SD or similar. Likely you're looking for 'DT[,lapply(.SD,mean),by=,.SDcols=]'. See ?data.table."));
if (inherits(x, "factor")) error(_("mean is not meaningful for factors."));
free(s); free(c); // # nocov because it already stops at gsum, remove nocov if gmean will support a type that gsum wont
- error(_("Type '%s' not supported by GForce mean (gmean) na.rm=TRUE. Either add the prefix base::mean(.) or turn off GForce optimization using options(datatable.optimize=1)"), type2char(TYPEOF(x))); // # nocov
- }
- switch(TYPEOF(x)) {
- case LGLSXP: case INTSXP: case REALSXP: {
-   ans = PROTECT(allocVector(REALSXP, ngrp));
-   double *ansd = REAL(ans);
-   for (int i=0; i<ngrp; i++) {
-     if (c[i]==0) { ansd[i] = R_NaN; continue; } // NaN to follow base::mean
#pragma omp parallel for num_threads(getDTthreads(ngrp, true))
+ for (int i=0; i<ngrp; i++) {
+   ansp[i].r /= nna_counts_r[i];
+   ansp[i].i /= nna_counts_i[i];
+ }
+ free(nna_counts_r);
+ free(nna_counts_i);
}
} break;
default:
- error(_("Internal error: unsupported type at the end of gmean")); // # nocov
+ error(_("Type '%s' not supported by GForce mean (gmean). Either add the prefix base::mean(.) or turn off GForce optimization using options(datatable.optimize=1)"), type2char(TYPEOF(x)));
}
- free(s); free(si); free(c);
copyMostAttrib(x, ans);
- // Rprintf(_("this gmean na.rm=TRUE took %8.3f\n"), 1.0*(clock()-start)/CLOCKS_PER_SEC);
- UNPROTECT(1);
+ if (verbose) { Rprintf(_("%.3fs\n"), wallclock()-started); }
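The hunks above divide per-group sums by per-group counts of non-missing values, and the removed code set empty groups to `NaN` to follow `base::mean`. A minimal sketch of that pattern for a plain `double` column is shown below. It is not the `gmean` code: the function name `group_mean_narm` and its signature are hypothetical, `NaN` stands in for R's `NA`, and the sketch assumes IEEE 754 arithmetic so that `0.0/0` yields `NaN`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: per-group mean that skips missing values.
   x      : n input values, NaN marks a missing value (stand-in for NA)
   grp    : n group ids, each in [0, ngrp)
   ans    : ngrp outputs, the mean of the non-missing values per group
   counts : ngrp scratch counters of non-missing values per group */
void group_mean_narm(const double *x, const int *grp, size_t n,
                     int ngrp, double *ans, int *counts) {
  assert(ngrp >= 0);
  for (int g = 0; g < ngrp; g++) { ans[g] = 0.0; counts[g] = 0; }
  for (size_t i = 0; i < n; i++) {
    if (x[i] != x[i]) continue;  /* NaN is the only value unequal to itself */
    ans[grp[i]] += x[i];
    counts[grp[i]]++;
  }
  /* A group with no non-missing values computes 0.0/0, which is NaN
     under IEEE 754, so no separate empty-group branch is needed here. */
  for (int g = 0; g < ngrp; g++) ans[g] /= counts[g];
}
```

For a complex column the same division would be applied to the real and imaginary parts separately, each with its own counter, as the added `nna_counts_r` and `nna_counts_i` lines in the diff suggest.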