Multibytes colnames in non-native encoding cause errors in complex j expression with keyby #3722
Comments
Works on my Mac and Linux, I guess it's a Windows thing? |
An example that reproduces the issue on Mac or Linux, using the latin1 encoding:
library(data.table)
utf8 = c("\u00e7ile", "\u00de")
latin1 = iconv(utf8, from = "UTF-8", to = "latin1")
tbl <- as.data.table(setNames(list(1, 2), utf8))
tbl[]
#> çile Þ
#> 1: 1 2
Encoding(colnames(tbl))
#> [1] "UTF-8" "UTF-8"
tbl[, .(a = sum(`çile`)), keyby = `Þ`]
#> Þ a
#> 1: 2 1
tbl[, .(a = sum(sort(`çile`))), keyby = `Þ`]
#> Þ a
#> 1: 2 1
setnames(tbl, colnames(tbl), latin1)
Encoding(colnames(tbl))
#> [1] "latin1" "latin1"
tbl[, .(a = sum(`çile`)), keyby = `Þ`]
#> Þ a
#> 1: 2 1
tbl[, .(a = sum(sort(`çile`))), keyby = `Þ`]
#> Error in sort(çile): object 'çile' not found
Created on 2019-08-04 by the reprex package (v0.2.1) |
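To see why the two sets of column names can behave differently, it may help to compare them at the byte level. The following base-R sketch (no data.table needed; the variable names are only illustrative) shows that the UTF-8 and latin1 versions compare as the same text, yet are stored as different bytes. That is presumably what trips up any lookup that matches names on raw bytes rather than on translated text.

```r
# Same text in two encodings (built with \u escapes so the source stays ASCII).
utf8   <- c("\u00e7ile", "\u00de")
latin1 <- iconv(utf8, from = "UTF-8", to = "latin1")

# iconv() marks the converted strings as latin1.
enc_latin1 <- Encoding(latin1)

# identical() treats strings as equal if they agree once translated to UTF-8 ...
same_text <- identical(utf8, latin1)

# ... but the stored bytes differ (two bytes vs one byte for the c-cedilla).
bytes_utf8   <- charToRaw(utf8[1])
bytes_latin1 <- charToRaw(latin1[1])
same_bytes   <- identical(bytes_utf8, bytes_latin1)

# Re-encoding the latin1 string to UTF-8 makes the bytes line up again.
bytes_roundtrip <- charToRaw(enc2utf8(latin1[1]))

print(enc_latin1)
print(c(same_text = same_text, same_bytes = same_bytes))
print(bytes_utf8)
print(bytes_latin1)
print(bytes_roundtrip)
```

So a name can be "equal" at the R level while still failing a byte-wise match somewhere lower down.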
The error is thrown from Line 258 in a8e0230
I don't know how to debug C code that involves R's language or environment types. I failed to find them in R-internals or R-exts... If anybody can show me how to print out the information related to those objects, it would be very much appreciated. |
@shrektan maybe just modify |
Possibly related (though I think not): #1726 |
Difference on these two is
Are you sure |
The Mac/Linux example is also not working for me (Mac). @shrektan, following Jan's suggestion, could you try running this?
|
So you are saying you can't reproduce the code below on macOS?
library(data.table)
utf8 = c("\u00e7ile", "\u00de")
latin1 = iconv(utf8, from = "UTF-8", to = "latin1")
tbl <- as.data.table(setNames(list(1, 2), latin1))
tbl[, .(a = sum(`çile`)), keyby = `Þ`]
#> Þ a
#> 1: 2 1
tbl[, .(a = sum(sort(`çile`))), keyby = `Þ`]
#> Error in sort(çile): object 'çile' not found
tbl[, .(a = {print(ls()); print(names(.SD)); print(Encoding(names(.SD))); print(.BY); sum(sort(`çile`))}), keyby = `Þ`]
#> [1] "\xe7ile" "Cfastmean" "print" "strptime" "Þ"
#> [1] "çile"
#> [1] "latin1"
#> $Þ
#> [1] 2
#> Error in sort(çile): object 'çile' not found
Created on 2019-09-06 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.1 (2019-07-05)
#> os macOS Mojave 10.14.5
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Asia/Shanghai
#> date 2019-09-06
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
#> callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
#> cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> data.table * 1.12.3 2019-08-24 [1] local
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.1.0 2019-07-06 [1] CRAN (R 3.6.0)
#> digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
#> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.0)
#> knitr 1.24 2019-08-08 [1] CRAN (R 3.6.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> pkgbuild 1.0.4 2019-08-05 [1] CRAN (R 3.6.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
#> R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
#> Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
#> rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)
#> rmarkdown 1.15 2019-08-21 [1] CRAN (R 3.6.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.0)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
#> xfun 0.9 2019-08-21 [1] CRAN (R 3.6.0)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library |
I see, the first
I think R will always parse the quotation into the parse tree in the native encoding (see the code below). If the column name of the data.table is not natively encoded and is evaluated via
env <- new.env(parent = emptyenv())
utf8 = c("\u00e7ile", "\u00de")
latin1 = iconv(utf8, from = "UTF-8", to = "latin1")
assign(latin1[1], 1, pos = env)
ls(env)
#> [1] "çile"
Encoding(ls(env))
#> [1] "unknown"
assign(utf8[1], 1, pos = env)
ls(env)
#> [1] "çile"
Encoding(ls(env))
#> [1] "unknown"
Created on 2019-09-06 by the reprex package (v0.3.0)
In conclusion, the fix should be when preparing the |
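Given the reasoning above, one user-level workaround would be to re-encode the column names before using them in `j`. This is sketched here as an assumption, not a fix confirmed in this thread. The helper `names_to_utf8()` below is hypothetical and works on any named object. On a UTF-8 locale such as the macOS session above, UTF-8 is the native encoding, and the first example in this thread shows that UTF-8 names do not trigger the error there. On a non-UTF-8 locale (for example the Windows CP936 session), `enc2native()` may be the more appropriate call, and names that cannot be represented natively would presumably still be a problem.

```r
# Hypothetical helper: re-encode the names of any named object to UTF-8.
names_to_utf8 <- function(x) {
  names(x) <- enc2utf8(names(x))
  x
}

utf8   <- c("\u00e7ile", "\u00de")
latin1 <- iconv(utf8, from = "UTF-8", to = "latin1")

# A plain named list stands in for the table's columns here.
cols  <- setNames(list(1, 2), latin1)   # names are marked as latin1
fixed <- names_to_utf8(cols)            # same text, now marked as UTF-8

print(Encoding(names(cols)))
print(Encoding(names(fixed)))
```

With data.table, the same idea would be something like `setnames(tbl, colnames(tbl), enc2utf8(colnames(tbl)))`, mirroring the `setnames()` call used earlier in this thread to switch encodings. Whether that is enough on a given platform is untested here.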
The below simple example illustrates the issue quite well. I believe it's a specific issue on Windows.
session info
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.3 usethis_1.4.0
loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4 fs_1.2.7 glue_1.3.1 yaml_2.2.0 Rcpp_1.0.1