Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support missing values in measure.vars arg to melt #4720

Merged
merged 15 commits into from
May 9, 2021
Merged

Conversation

tdhock
Copy link
Member

@tdhock tdhock commented Sep 26, 2020

Closes #4027
hi @vspinu @Henrik-P @jangorecki @mrdwab tagging you because you commented on the related issue #4027 which I hope this PR will close. In current master I get these errors because NA is unsupported in the measure.vars argument to melt:

> DT.wide = data.table(a2=2, b1=1, b2=2)
> melt(DT.wide, measure.vars=list(a=c(NA,1), b=2:3))
Error in melt.data.table(DT.wide, measure.vars = list(a = c(NA, 1), b = 2:3)) : 
  One or more values in 'measure.vars' is invalid.
> melt(DT.wide, measure.vars=list(a=c(NA,"a2"), b=c("b1","b2")))
Error in melt.data.table(DT.wide, measure.vars = list(a = c(NA, "a2"),  : 
  One or more values in 'measure.vars' is invalid.

I added a couple of simple tests which explain the new behavior implemented in this PR:

> DT.wide = data.table(a2=2, b1=1, b2=2)
> melt(DT.wide, measure.vars=list(a=c(NA,1), b=2:3))
   variable  a b
1:        1 NA 1
2:        2  2 2
> melt(DT.wide, measure.vars=list(a=c(NA,"a2"), b=c("b1","b2")))
   variable  a b
1:        1 NA 1
2:        2  2 2

It took me a while to read through and understand the fmelt code, and I added some comments which explain the meaning of the different variables/functions. If those comments are not welcome I can delete them.
All tests pass on my computer (Ubuntu 18.04, R-4.0.2).
@MichaelChirico can you code review and maybe suggest additional tests?

@tdhock
Copy link
Member Author

tdhock commented Sep 26, 2020

EDIT: was a reminder to implement the same logic for id.vars not null, now done.

@tdhock
Copy link
Member Author

tdhock commented Sep 26, 2020

by the way did you notice the funny coincidence that the original issue number #4027 anagrams to this PR number #4720, how often does that happen!?

@codecov
Copy link

codecov bot commented Sep 26, 2020

Codecov Report

Merging #4720 (1f50aaf) into master (ebc5bc3) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4720   +/-   ##
=======================================
  Coverage   99.43%   99.44%           
=======================================
  Files          73       73           
  Lines       14460    14469    +9     
=======================================
+ Hits        14379    14388    +9     
  Misses         81       81           
Impacted Files Coverage Δ
src/chmatch.c 100.00% <100.00%> (ø)
src/fmelt.c 99.43% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ebc5bc3...1f50aaf. Read the comment docs.

src/fmelt.c Outdated Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
@vspinu
Copy link

vspinu commented Sep 30, 2020

@tdhock I don't see how this PR fixes #4027. Could you please use it with the original example?

@tdhock
Copy link
Member Author

tdhock commented Sep 30, 2020

remotes::install_github("Rdatatable/data.table@fix4027")
#> Skipping install of 'data.table' from a github remote, the SHA1 (4c5810c2) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(data.table)
dt <- data.table(id = 1, a.1 = 1, a.3 = 3, b.1 = 1, b.2 = 2, b.3 = 3)
melt(
  dt, variable.name="ix",
  measure.vars=list(a=c("a.1", NA, "a.3"), b=c("b.1", "b.2", "b.3")))
#>    id ix  a b
#> 1:  1  1  1 1
#> 2:  1  2 NA 2
#> 3:  1  3  3 3

@tdhock
Copy link
Member Author

tdhock commented Oct 3, 2020

should the new tests be moved to the "melt section" starting at test number 1035?

@tdhock tdhock added the reshape dcast melt label Oct 3, 2020
@mattdowle mattdowle added this to the 1.14.1 milestone May 7, 2021
@@ -105,7 +105,7 @@ SEXP measurelist(SEXP measure, SEXP dtnames) {
SEXP x = VECTOR_ELT(measure, i);
switch(TYPEOF(x)) {
case STRSXP :
SET_VECTOR_ELT(ans, i, chmatch(x, dtnames, 0));
SET_VECTOR_ELT(ans, i, chmatch(x, dtnames, NA_INTEGER));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new chmatch_na function in this PR that I just removed I initially thought was necessary because this PR needed NAs in x not to match to NAs in table. So I started to add bool matchNAtoNA argument to chmatchMain. Then it turned out that just passing nomatch=NA_INTEGER on this line (the only line where chmatch_na was used) instead of nomatch=0, seemed to work and passed the new tests. I left matchNAtoNA as a comment in chmatchMain in case we need to come back to that in future.
@tdhock Please check and let me know if I got anything wrong here. If so, a new test is probably needed please.

UNPROTECT(protecti);
return(ans);
}

struct processData {
SEXP RCHK; // a 2 item list holding vars (result of checkVars) and naidx. PROTECTed up in fmelt so that preprocess() doesn't need to PROTECT. To pass rchk, #2865
SEXP idcols, valuecols, naidx; // convenience pointers into RCHK[0][0], RCHK[0][1] and RCHK[1] respectively
int lids, lvalues, lmax, lmin, totlen, nrow;
Copy link
Member

@mattdowle mattdowle May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that one item (lmin) has been removed from processData. But it looks like you've reviewed the items and commented them which is great (so you know this code better than I do). And no existing tests are changed, just new tests added. So I'm just logging as a comment that I spotted that removed item, and why it looks good to me.

…protect from i,j being used accidentally outside those loops, not really for speed reasons)
@mattdowle mattdowle merged commit ebc14ce into master May 9, 2021
@mattdowle mattdowle deleted the fix4027 branch May 9, 2021 06:09
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reshape dcast melt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to explicitly pass index names in melt.data.table measure.vars argument
4 participants