Why is SA2-by-DJZ-2011 re-reading? #2509

Open
mattdowle opened this issue Dec 7, 2017 · 2 comments

Comments

@mattdowle
Member

In #2481, one aside to revisit was why a reread happens here. Row 0 is in the sample, so it shouldn't need a reread. This is likely due to the (needed) header=FALSE.

fread("~/Downloads/SA2-by-DJZ-2011.csv", verbose=TRUE, header=FALSE)
...
  Type counts:
         1 : bool8     '1'
         1 : int32     '5'
         2 : string    'A'
=============================
   0.000s (  0%) Memory map 0.341GB file
   0.002s (  0%) sep=',' ncol=4 and header detection
   0.000s (  0%) Column type detection using 10027 sample rows
   0.196s ( 10%) Allocation of 25164895 rows x 4 cols (0.469GB) of which 22885380 ( 91%) rows used
   1.829s ( 90%) Reading 352 chunks of 0.993MB (64991 rows) using 8 threads
   =    0.000s (  0%) Finding first non-embedded \n after each jump
   +    0.358s ( 18%) Parse to row-major thread buffers (grown 0 times)
   +    0.713s ( 35%) Transpose
   +    0.757s ( 37%) Waiting
   0.592s ( 29%) Rereading 1 columns due to out-of-sample type exceptions
   2.027s        Total
Column 1 ("") bumped from 'bool8' to 'string' due to <<"Goulburn">> on row 0
mattdowle added this to the v1.10.6 milestone Dec 7, 2017
mattdowle changed the title from "Why is SA2-by-DJZ-2011 reading?" to "Why is SA2-by-DJZ-2011 re-reading?" Feb 28, 2018
@mattdowle
Member Author

No re-read is happening now, given recent improvements. Will add test and close.

@st-pasha
Contributor

st-pasha commented Mar 1, 2018

Re-read is still happening with header=T and header=NA, though not with header=F:

> fread("~/Downloads/SA2-by-DJZ-2011.csv", verbose=T, header=F) -> f0
... (no re-read)

> fread("~/Downloads/SA2-by-DJZ-2011.csv", verbose=T, header=T) -> f0
...
Column 1 ("Goulburn") bumped from 'bool8' to 'string' due to <<"Bega-Eden Hinterland">> on row 1202

> fread("~/Downloads/SA2-by-DJZ-2011.csv", verbose=T, header=NA) -> f0
...
Column 1 ("") bumped from 'bool8' to 'string' due to <<"Goulburn">> on row 0

mattdowle reopened this Apr 22, 2018
mattdowle modified the milestones: v1.11.0, v1.11.2 Apr 29, 2018
mattdowle modified the milestones: 1.12.0, 1.12.2 Jan 6, 2019
jangorecki modified the milestones: 1.12.2, 1.12.4 Jan 24, 2019
jangorecki modified the milestones: 1.12.4, 1.13.0 Sep 17, 2019
mattdowle modified the milestones: 1.12.7, 1.12.9 Dec 8, 2019
mattdowle modified the milestones: 1.13.1, 1.13.3 Oct 17, 2020
jangorecki modified the milestones: 1.14.3, 1.14.5 Jul 19, 2022
jangorecki modified the milestones: 1.14.11, 1.15.1 Oct 29, 2023
jangorecki removed this from the 1.16.0 milestone Nov 6, 2023