file path issues when testing on windows #6116

Open
tdhock opened this issue May 1, 2024 · 9 comments

Comments

@tdhock
Member

tdhock commented May 1, 2024

Hi! I expected the tests to pass on Windows, but I am observing two failures, related to #1668 (same test numbers, but not the same failure).

th798@cmp2986 MINGW64 ~/R/data.table (fread-key-length-0)
$ R -e 'library(data.table);test.data.table()'

R Under development (unstable) (2024-01-23 r85822 ucrt) -- "Unsuffered Consequences"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Loading required namespace: BiocManager
> library(data.table);test.data.table()
getDTthreads(verbose=TRUE):
  OpenMP version (_OPENMP)       201511
  omp_get_num_procs()            12
  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
  R_DATATABLE_NUM_THREADS        unset
  R_DATATABLE_THROTTLE           unset (default 1024)
  omp_get_thread_limit()         2147483647
  omp_get_max_threads()          12
  OMP_THREAD_LIMIT               unset
  OMP_NUM_THREADS                unset
  RestoreAfterFork               true
  data.table is using 6 threads with throttle==1024. See ?setDTthreads.
test.data.table() running: C:/Program Files/R/R-devel/library/data.table/tests/tests.Rraw
'file:' is not recognized as an internal or external command,
operable program or batch file.
Test 1378.2 produced 2 warnings but expected 0
Expected: 
Observed: '(file://C:/Program Files/R/R-devel/library/data.table/tests/russellCRLF.csv) > C:\Users\th798\AppData\Local\Temp\RtmpGMzTle\file1680248362b7' execution failed with error code 1
 Test 1378.2 produced 2 warnings but expected 0
Expected: 
Observed: File 'C:\Users\th798\AppData\Local\Temp\RtmpGMzTle\file1680248362b7' has size 0. Returning a NULL data.table.
Test 1378.2 produced 1 errors but expected 0
Expected: 
Observed: j (the 2nd argument inside [...]) is a single symbol but column name 'Value With Dividends' is not found. If you intended to select columns using a variable in calling scope, please try DT[, ..Value With Dividends]. The .. prefix conveys one-level-up similar to a file system path.
Test 1378.2 produced 1 messages but expected 0
Expected: 
Observed: Taking input= as a system command because it contains a space ('file://C:/Program Files/R/R-devel/library/data.table/tests/russellCRLF.csv'). If it's a filename please remove the space, or use file= explicitly. A variable is being passed to input= and when this is taken as a system command there is a security concern if you are creating an app, the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message.

'file:' is not recognized as an internal or external command,
operable program or batch file.
Test 1378.3 produced 2 warnings but expected 0
Expected: 
Observed: '(file://C:/Program Files/R/R-devel/library/data.table/tests/russellCRCRLF.csv) > C:\Users\th798\AppData\Local\Temp\RtmpGMzTle\file168024827834' execution failed with error code 1
 Test 1378.3 produced 2 warnings but expected 0
Expected: 
Observed: File 'C:\Users\th798\AppData\Local\Temp\RtmpGMzTle\file168024827834' has size 0. Returning a NULL data.table.
Test 1378.3 produced 1 errors but expected 0
Expected: 
Observed: j (the 2nd argument inside [...]) is a single symbol but column name 'Value With Dividends' is not found. If you intended to select columns using a variable in calling scope, please try DT[, ..Value With Dividends]. The .. prefix conveys one-level-up similar to a file system path.
Test 1378.3 produced 1 messages but expected 0
Expected: 
Observed: Taking input= as a system command because it contains a space ('file://C:/Program Files/R/R-devel/library/data.table/tests/russellCRCRLF.csv'). If it's a filename please remove the space, or use file= explicitly. A variable is being passed to input= and when this is taken as a system command there is a security concern if you are creating an app, the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message.


Wed May  1 14:07:01 2024  endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ==unset, Sys.timezone()=='America/Phoenix', Sys.getlocale()=='LC_COLLATE=English_United States.utf8;LC_CTYPE=English_United States.utf8;LC_MONETARY=English_United States.utf8;LC_NUMERIC=C;LC_TIME=English_United States.utf8', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codepage=65001; system.codepage=65001', getDTthreads()=='OpenMP version (_OPENMP)==201511; omp_get_num_procs()==12; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==2147483647; omp_get_max_threads()==12; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 6 threads with throttle==1024. See ?setDTthreads.', zlibVersion()==1.3 ZLIB_VERSION==1.3
Error in stopf("%d error(s) out of %d. Search %s for test number(s) %s. Duration: %s.",  : 
  2 error(s) out of 11188. Search tests/tests.Rraw for test number(s) 1378.2, 1378.3. Duration: 00:01:05 elapsed (34.8s cpu).
Calls: test.data.table -> stopf -> raise_condition -> signal
In addition: Warning message:
In (function (category = "LC_ALL", locale = "")  :
  using locale code page other than 65001 ("UTF-8") may cause problems
Execution halted
(base)
th798@cmp2986 MINGW64 ~/R/data.table (fread-key-length-0)
$ 

The error message above says that the problem is that a path containing "Program Files" is being treated as a system command, so I thought the fix would be to change fread(f) to fread(file=f). But it looks like that is already done in another test (see below), so I wonder what the fix should be?
Here is the related code in tests.Rraw:

https://github.com/Rdatatable/data.table/blob/65fbc2af1e7ce368d798d25612e1c16d336780d0/inst/tests/tests.Rraw#L6009-L6028
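For readers following along, the difference between fread(f) and fread(file=f) mentioned above can be sketched as follows. This is only a minimal illustration, not the code in tests.Rraw; the directory name "dir with space", the file name, and the file contents are made up for the example.

```r
library(data.table)

# Hypothetical setup: a CSV inside a directory whose name contains a space.
d <- file.path(tempdir(), "dir with space")
dir.create(d, showWarnings = FALSE)
f <- file.path(d, "example.csv")
writeLines(c("a,b", "1,2", "3,4"), f)

# file= states explicitly that the argument is a path, so fread does not
# have to guess between "file name" and "system command".
dt <- fread(file = f)
print(dt)
```

Passing the path by name sidesteps the guessing done on the first argument (input=), which is what the message in the log recommends.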

@tdhock
Member Author

tdhock commented May 2, 2024

Is quoting needed somewhere?

@MichaelChirico
Member

MichaelChirico commented May 2, 2024

The key looks to be here:

'file:' is not recognized as an internal or external command,
operable program or batch file.

As noted in nearby comments, the test is about simulating download.file() without needing internet access by passing file:// paths.

What does this do on the machine in question?

writeLines(letters, tmp<-tempfile())
download.file(paste0("file://", tmp), tmp2<-tempfile(), method="internal")
readLines(tmp2)

@tdhock
Member Author

tdhock commented May 2, 2024

> writeLines(letters, tmp<-tempfile())
> download.file(paste0("file://", tmp), tmp2<-tempfile(), method="internal")
> readLines(tmp2)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"

So download.file() itself works here, and I think the failure has something to do with the fread logic "if there is a space, it must be a system command".
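The suspected rule can be modelled in a few lines of base R. This is a simplified sketch under stated assumptions, not data.table's source: guess_input_kind is a hypothetical helper, and the real fread performs more checks (and possibly in a different order) than shown here.

```r
# Hypothetical helper, NOT data.table's implementation: a simplified model of
# "an input that contains a space and is not an existing file is a command".
guess_input_kind <- function(input) {
  stopifnot(is.character(input), length(input) == 1L)
  if (grepl("[\r\n]", input)) {
    "text"     # embedded line break: treat the string as literal data
  } else if (grepl(" ", input, fixed = TRUE) && !file.exists(input)) {
    "command"  # contains a space and no such file exists on disk
  } else {
    "file"     # everything else in this simplified model
  }
}

guess_input_kind("a,b\n1,2")
guess_input_kind("file://C:/Program Files/R/R-devel/library/data.table/tests/russellCRLF.csv")
guess_input_kind("grep -v foo some_file.csv")
```

Under this model, a file:// URL that points into "Program Files" is classified as a command, because file.exists() is FALSE for the URL string and the string contains a space. That would match the "Taking input= as a system command" message in the log.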

@D3SL

D3SL commented Jun 13, 2024

@tdhock that's exactly what it is. Despite the error message referring to v1.11.6, this seems like a very recent change in behavior; I never had this issue before.

This also makes fread awkward to use with map()-like functions: you need an anonymous function so that each element x is passed to file= rather than to the default first argument, input=.
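A minimal sketch of that workaround with base R's lapply(), assuming two made-up CSV files whose names contain spaces (the same pattern applies to purrr-style mappers):

```r
library(data.table)

# Hypothetical input: two small CSV files whose names contain spaces.
paths <- file.path(tempdir(), c("part one.csv", "part two.csv"))
for (p in paths) writeLines(c("x,y", "1,2"), p)

# The anonymous function routes each element to file= instead of letting it
# land in fread's first argument, input=.
tables <- lapply(paths, function(p) fread(file = p))

# Stack the per-file tables into one.
combined <- rbindlist(tables)
print(combined)
```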

@tdhock
Member Author

tdhock commented Aug 7, 2024

Closing because I no longer have access to the Windows computer where I had this issue, and it does not happen on my new Windows computer.

@tdhock tdhock closed this as completed Aug 7, 2024
@D3SL

D3SL commented Aug 14, 2024

I still have this on my Windows 10 work machine with data.table 1.15.4. Any attempt to load a file with a space in the name results in data.table trying to treat it as a system command. What has changed in your environment with the new Windows computer: R version, IDE? Is your environment contaminated with a different version/branch of data.table's code or functions?
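A small script along these lines could make the report easier to compare across machines. It is only a sketch: the file name "name with space.csv" is made up, and what ends up in seen depends on the installed data.table version and the platform.

```r
library(data.table)

# Hypothetical test file whose name contains a space.
f <- file.path(tempdir(), "name with space.csv")
writeLines(c("a,b", "1,2"), f)

# Record messages, warnings, and errors instead of letting them stop the script.
seen <- character()
res <- withCallingHandlers(
  tryCatch(
    fread(f),  # path passed positionally, i.e. as input=
    error = function(e) {
      seen <<- c(seen, paste("error:", conditionMessage(e)))
      NULL
    }
  ),
  message = function(m) {
    seen <<- c(seen, paste("message:", conditionMessage(m)))
    invokeRestart("muffleMessage")
  },
  warning = function(w) {
    seen <<- c(seen, paste("warning:", conditionMessage(w)))
    invokeRestart("muffleWarning")
  }
)

print(packageVersion("data.table"))
print(seen)  # empty if the path was read as a file without complaint
print(res)
```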

@tdhock tdhock reopened this Aug 14, 2024
@ben-schwen
Member

@D3SL can you try it with the dev version? data.table::update_dev_pkg()

@tdhock
Member Author

tdhock commented Aug 14, 2024

Also, running sessionInfo() may be helpful. Can you please share that with us, @D3SL?

@D3SL

D3SL commented Aug 14, 2024

Here's my sessionInfo(). I'll have to try updating to the latest dev version tomorrow.

R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_World.utf8  LC_CTYPE=English_World.utf8    LC_MONETARY=English_World.utf8 LC_NUMERIC=C                  
[5] LC_TIME=English_World.utf8    

time zone: Asia/Jerusalem
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tidyxl_1.0.10        XXXX        mongolite_2.8.0      data.table_1.15.4    collapse_2.0.13      arrow_15.0.1        
 [7] fedmatch_2.0.5       stringi_1.8.4        readxl_1.4.3         openxlsx2_1.6        xml2_1.3.6           data.validator_0.2.1
[13] jsonlite_1.8.8       assertr_3.0.1        lubridate_1.9.3      magrittr_2.0.3       stringr_1.5.1        future.callr_0.8.2  
[19] blastula_0.3.5       xtable_1.8-4         furrr_0.3.1          future_1.33.2        purrr_1.0.2          readr_2.1.5         
[25] tibble_3.2.1         tidyr_1.3.1          dplyr_1.1.4         

loaded via a namespace (and not attached):
 [1] utf8_1.2.4        generics_0.1.3    listenv_0.9.1     hms_1.1.3         digest_0.6.35     evaluate_0.23     timechange_0.3.0 
 [8] fastmap_1.2.0     cellranger_1.1.0  processx_3.8.4    zip_2.3.1         ps_1.7.6          fansi_1.0.6       codetools_0.2-20 
[15] cli_3.6.2         rlang_1.1.3       parallelly_1.37.1 bit64_4.0.5       withr_3.0.0       tools_4.4.0       parallel_4.4.0   
[22] tzdb_0.4.0        globals_0.16.3    assertthat_0.2.1  vctrs_0.6.5       R6_2.5.1          lifecycle_1.0.4   bit_4.0.5        
[29] pkgconfig_2.0.3   callr_3.7.6       pillar_1.9.0      glue_1.7.0        Rcpp_1.0.12       xfun_0.44         tidyselect_1.2.1 
[36] rstudioapi_0.16.0 knitr_1.46        htmltools_0.5.8.1 rmarkdown_2.27    compiler_4.4.0    askpass_1.2.0     openssl_2.2.0   
