fread's strip.white=FALSE behavior #2492

Closed
st-pasha opened this issue Nov 21, 2017 · 1 comment · Fixed by #6159

st-pasha (Contributor) commented Nov 21, 2017

The documentation for this parameter says the following:

strip.white: default is ‘TRUE’. Strips leading and trailing whitespaces
          of unquoted fields. If ‘FALSE’, only header trailing spaces
          are removed.

So, according to the documentation, when the flag is false the whitespace should be left intact EXCEPT for the trailing whitespace in the headers (and it is unclear whether the documentation means all fields in the header, or just the last one). I'm not sure why such an exception is warranted, but I tried to see how it works:

(1) It appears that the trailing whitespace in the header isn't removed after all (and neither is the leading whitespace):

> data.table::fread("A  ,B  \n1,2\n3,4\n", strip.white=F) -> f
> colnames(f)
[1] "A  " "B  "
> data.table::fread("  A,  B\n1,2\n3,4\n", strip.white=F) -> f
> colnames(f)
[1] "  A" "  B"

I think this behavior is good; only the documentation needs to be corrected.

(2) Now what about the data? It appears the flag is not respected when the data is numeric:

> data.table::fread("A,B,C,D\n  1.0  ,  2  ,  x , true \n  3.7  ,  4  ,  y\t, false   \n", strip.white=F) -> f
> str(f)
Classes ‘data.table’ and 'data.frame':	2 obs. of  4 variables:
 $ A: num  1 3.7
 $ B: int  2 4
 $ C: chr  "  x " "  y\t"
 $ D: logi  TRUE FALSE
 - attr(*, ".internal.selfref")=<externalptr> 

Thus, it appears the flag only applies to character fields and is ignored for all others. I'm not sure whether this is the intentional behavior or not, but the documentation doesn't mention it at all...

(Update) Cross-checking with the documentation of read.csv, they mention the following:

strip.white: logical. Used only when ‘sep’ has been specified, and
          allows the stripping of leading and trailing white space from
          unquoted ‘character’ fields (‘numeric’ fields are always
          stripped).

So it appears that the behavior of fread is (almost) consistent with that of read.csv, and this is just a documentation issue. The only discrepancy in behavior is that read.csv strips both spaces and tabs, while fread strips only spaces.
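
For comparison, here is a minimal sketch of the read.csv side of this, using base R only. The sample data and the variable names (`csv_text`, `kept`, `stripped`) are made up for illustration, and the result rests on the read.csv documentation quoted above rather than on any fread internals:

```r
# Illustrative data: numeric column A padded with spaces,
# character column B padded with spaces and a trailing tab.
csv_text <- "A,B\n  1.5  ,  x \t\n  3.7  ,  y\t\n"

# strip.white = FALSE: character fields keep their padding,
# while the numeric column is still parsed as numbers.
kept <- read.csv(text = csv_text, strip.white = FALSE,
                 stringsAsFactors = FALSE)

# strip.white = TRUE: leading/trailing spaces and tabs are removed
# from the unquoted character fields as well.
stripped <- read.csv(text = csv_text, strip.white = TRUE,
                     stringsAsFactors = FALSE)

str(kept)
str(stripped)
```

Running the same `csv_text` through `data.table::fread` with both settings of `strip.white` should make any remaining difference in tab handling visible, though the result may depend on the data.table version installed.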

st-pasha added the fread label Nov 21, 2017
st-pasha added this to the v1.10.6 milestone Nov 21, 2017
HughParsonage (Member) commented

Related? #2376
