-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
problem with uncode() #273
Comments
Can you provide a sample dataset / query / function call where this is an issue? The current version of |
I think this mostly works... but I see what you mean about if you library(soilDB)
x <- SDA_query("SELECT DISTINCT nhiforsoigrp FROM mapunit")
#> single result set, returning a data.frame
x
#> nhiforsoigrp
#> 1 Group IB
#> 2 Group IIB
#> 3 Group IA
#> 4 <NA>
#> 5 Group IIA
#> 6 Group IC
#> 7 NC
x2 <- code(x)
x2
#> nhiforsoigrp
#> 1 2
#> 2 5
#> 3 1
#> 4 NA
#> 5 4
#> 6 3
#> 7 6
x3 <- uncode(x2)
x3
#> nhiforsoigrp
#> 1 IB
#> 2 IIB
#> 3 IA
#> 4 <NA>
#> 5 IIA
#> 6 IC
#> 7 NC |
test <- SDA_query("SELECT TOP 10000 * FROM mapunit WHERE nhiforsoigrp IS NOT NULL") |
I think you are right that the logic should be simplified at least for the I took a crack at it and it seems to work for the following case (a mix of mapunits from NH and MA). Before change I get: library(soilDB)
test <- SDA_query("SELECT TOP 10000 * FROM mapunit
INNER JOIN legend ON legend.lkey = mapunit.lkey
WHERE areasymbol LIKE 'NH%' OR areasymbol LIKE 'MA%'")
#> single result set, returning a data.frame
sum(is.na(test$nhiforsoigrp))
#> [1] 1582
test2 <- uncode(test)
#> Warning in `[<-.factor`(`*tmp*`, is.na(nc), value = structure(c(NA, NA, :
#> invalid factor level, NA generated
table(test2$nhiforsoigrp, useNA = "ifany")
#>
#> NC <NA>
#> 181 3273 After a change (c6f0f56) I get: library(soilDB)
test <- SDA_query("SELECT TOP 10000 * FROM mapunit
INNER JOIN legend ON legend.lkey = mapunit.lkey
WHERE areasymbol LIKE 'NH%' OR areasymbol LIKE 'MA%'")
#> single result set, returning a data.frame
sum(is.na(test$nhiforsoigrp))
#> [1] 1582
test2 <- uncode(test)
table(test2$nhiforsoigrp, useNA = "ifany")
#>
#> Group IA Group IB Group IC Group IIA Group IIB NC <NA>
#> 857 268 210 192 164 181 1582 I'll leave this issue open, please make/suggest any additional changes you see fit as long as existing tests of uncode/code pass. If you have certain things you want to become the normal expectation for these functions, add to tests/test-uncode.R. I think the main thing is we wanted out of the recent changes (source of this problem) was for |
- do not omit NA values - properly order on ChoiceSequence when domain is ranked and ordered=TRUE
This issue also alerted me to a problem with some of the new functions to help with explicitly setting factor levels. Again the overlap between ChoiceName and ChoiceLabel causes issues, thanks for bringing this up. Before change: library(soilDB)
test <- SDA_query("SELECT TOP 10000 * FROM mapunit WHERE nhiforsoigrp IS NOT NULL")
#> single result set, returning a data.frame
x1 <- factor(test$nhiforsoigrp, levels = get_NASIS_column_metadata("nhiforsoigrp")$ChoiceLabel)
x2 <- NASISChoiceList(list(nhiforsoigrp=test$nhiforsoigrp), choice = "ChoiceLabel")
#> Error in na.omit(as.numeric(apply(do.call("cbind", lapply(y[c("ChoiceValue", : 'list' object cannot be coerced to type 'double'
all.equal(x1, x2)
#> Error in all.equal.factor(x1, x2): object 'x2' not found After change (2985b37): library(soilDB)
test <- SDA_query("SELECT TOP 10000 * FROM mapunit WHERE nhiforsoigrp IS NOT NULL")
#> single result set, returning a data.frame
x1 <- factor(test$nhiforsoigrp, levels = get_NASIS_column_metadata("nhiforsoigrp")$ChoiceLabel)
x2 <- NASISChoiceList(list(nhiforsoigrp=test$nhiforsoigrp), choice = "ChoiceLabel")
all.equal(x1, x2)
#> [1] TRUE |
I am going to close this issue since I think the problems that were found have been fixed for some time. Create a new issue if more bugs or unexpected results are found. |
The following line doesn't work when the ChoiceName != ChoiceLabel (e.g. nhiforsoigrp column in the mapunit table), which is the case for ~25% of the metadata lookup table. In these instances, line 118 converts the values to NA because they don't match the levels in the other factor. This is complicated by the fact that SOME but not ALL of factor levels maybe different, thus the preceding if statements logic overwrites SOME but not ALL of them to NA. I haven't fully determined what the best fix would be, but wanted to log this before I logged off for the day. Values from SDA and GDB (need to check LIMS) appear to return values coded as ChoiceLabel, thus it may not be necessary to use an if statement on line 116.
Cheers
The text was updated successfully, but these errors were encountered: