fread may fail to parse valid file when dec=',' #2750
Awesome package! Thank you.
library(data.table)
DT = data.table(A = rep("20,1", 1e4))
fwrite(DT, "DT.csv", quote = FALSE)
classA = character(1e3)
for (i in seq_along(classA)) {
DT = fread("DT.csv", sep = ";", dec = ",", colClasses = "numeric")
classA[i] = DT[, class(A)]
}
table(classA)
This gives me:
Warning message:
In fread("DT.csv", sep = ";", dec = ",", colClasses = "numeric", :
Bumped column 1 to type character on data row 387, field contains '20,1'. Coercing previously read values in this column from logical, integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Please note that column type detection uses a sample of 1,000 rows (100 rows at 10 points) so hopefully this message should be very rare. If reporting to datatable-help, please rerun and include the output from verbose=TRUE.
table(classA)
# classA
# character numeric
# 1 999
which(classA == "character")
# [1] 9
So, sometimes the column is read as character, and the row index reported in the warning differs between runs.
And the sessionInfo():
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.10.4-3
loaded via a namespace (and not attached):
[1] tools_3.3.2 yaml_2.1.18
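On an affected version, one possible workaround follows the suggestion in the warning itself: read the column as character and convert it afterwards. This is only a sketch. The helper name `comma_to_numeric` is hypothetical (it is not part of data.table), and the file is the same single-column example as in the report, written to a temporary path.

```r
# Hypothetical helper (not part of data.table): turn comma-decimal strings
# such as "20,1" into numeric by swapping the first comma for a dot.
comma_to_numeric <- function(x) {
  as.numeric(sub(",", ".", x, fixed = TRUE))
}

# Base R only: works on any character vector.
comma_to_numeric(c("20,1", "3,75", "42"))

# With data.table, if it is installed: read as character, then convert.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  tmp <- tempfile(fileext = ".csv")
  fwrite(data.table(A = rep("20,1", 1e4)), tmp, quote = FALSE)

  # colClasses = "character" sidesteps type detection for the column.
  DT <- fread(tmp, sep = ";", colClasses = "character")
  DT[, A := comma_to_numeric(A)]
  print(class(DT$A))

  unlink(tmp)
}
```

Because the conversion runs after reading, the column type no longer depends on which rows the type detection happens to sample.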
Please make sure to update to the development version of data.table and test again. There have been a lot of improvements to `fread` since the last release.
Indeed... Ran the same example a couple of times with 1.10.5 and it worked fine.
Confirming that this was fixed with #4495
Thanks for checking!
This happens because the float parser greedily consumes `1,2` as a single token, whereas without quotes it must be parsed as 2 separate fields.
In addition, the "Details" section in the documentation has the following information (which has long been outdated):
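As an aside on the greedy-parsing point (this is an illustration, not the documentation text referred to above): the ambiguity can be sketched in base R, assuming the field separator and the decimal mark are both `,`. The snippet does not use `fread` internals. It only contrasts the two possible readings of the same unquoted input.

```r
line <- "1,2"

# Reading 1: split on the separator first, then parse each field.
# This is the reading the comment above says is required for unquoted input.
field_first <- as.numeric(strsplit(line, ",", fixed = TRUE)[[1]])

# Reading 2: a greedy float parser treats the comma as the decimal mark
# and consumes the whole text as a single number.
greedy <- as.numeric(sub(",", ".", line, fixed = TRUE))

length(field_first)  # two fields
length(greedy)       # one field
```

The two readings disagree on the number of fields, which is why a parser has to decide between separator and decimal mark before consuming digits.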