fread error: Sampling jump point is before the last jump ended #2173

Closed
markdanese opened this issue May 18, 2017 · 3 comments


markdanese commented May 18, 2017

RxTerms201704.txt
I am reading a pipe-delimited file in a fresh session and get the error shown below. This is on the development version (1.10.5); the same file reads fine with 1.10.4. The problematic file is attached. Running macOS Sierra 10.12.4 with RStudio 1.0.143.

As always, thanks for all your work on data.table.

R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(data.table)
data.table 1.10.5 IN DEVELOPMENT built 2017-05-15 22:57:07 UTC; travis
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com
> rxterms <- fread("./data/base/RxTerms201704/RxTerms201704.txt", sep = "|", verbose = TRUE)
Input contains no \n. Taking this to be a filename to open
NAstrings = [<<NA>>]
None of the NAstrings are numeric (such as '-9999').
`filename` argument given, attempting to open a file with such name
File opened, size 0.005821 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Positioned on line 1 starting: <<RXCUI|GENERIC_RXCUI|TTY|FULL_N>>
Using supplied sep '|'
  sep=='|'(ascii 124)  with 100 lines of 18 fields using quote rule 0
Detected 18 columns on line 1. This line is either column names or first data row (first 30 chars): <<RXCUI|GENERIC_RXCUI|TTY|FULL_N>>
All the fields on line 1 are character fields. Treating as the column names.
Number of sampling jump points = 101 because 6250296 bytes from row 1 to eof / (2 * 30729 jump0size) == 101
Type codes (jump 000)    : 226666666666613666  Quote rule 0
Error in fread("./data/base/RxTerms201704/RxTerms201704.txt", sep = "|",  : 
  Internal error: Sampling jump point 79 is before the last jump ended
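
The jump-point count reported in the verbose output above can be checked by hand. This is only a sketch of the arithmetic printed in the log, assuming the division is truncated to an integer (which the `== 101` in the log suggests); the variable names are illustrative and are not taken from the fread source.

```r
# Values copied from the verbose log line
bytes_row1_to_eof <- 6250296  # "6250296 bytes from row 1 to eof"
jump0size <- 30729            # "30729 jump0size"

# Assumption: integer (truncating) division, matching the "== 101" in the log
n_jump_points <- bytes_row1_to_eof %/% (2 * jump0size)
print(n_jump_points)
```

The failing jump, number 79, is therefore one of roughly a hundred sampled positions spread through the 6MB file, which fits the later remark that a large file is needed to trigger the condition.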

catlyst commented Jun 9, 2017

I'm facing the same issue on R 3.4.0 on macOS 10.12 (16A323) on RStudio 1.0.143

mattdowle (Member) commented Jul 20, 2017

This was fixed recently. That error message no longer appears in the source and the file works now.
We just need to add this file as a test in the test suite and then close.
The file is fairly big at 6MB though (necessarily so, to generate the condition), which is too big for CRAN. We need a separate environment for big/long-running tests.
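
A local check along these lines could stand in until such an environment exists. It is a minimal sketch, not code from the data.table test suite: `check_pipe_file` is a hypothetical helper, and the file path assumes the attachment from this report has been saved to the working directory. The expected column count of 18 comes from the verbose output in the original report.

```r
library(data.table)

# Hypothetical helper (not part of data.table): read a pipe-delimited file
# and fail loudly if the column count is not what we expect.
check_pipe_file <- function(path, expected_cols) {
  dt <- fread(path, sep = "|")
  stopifnot(ncol(dt) == expected_cols)
  invisible(dt)
}

# Assumed local copy of the attachment; skipped when the file is absent.
path <- "RxTerms201704.txt"
if (file.exists(path)) {
  check_pipe_file(path, expected_cols = 18L)
}
```

Because the check is skipped when the file is missing, it can sit alongside small tests without forcing the 6MB file into the package.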

st-pasha (Contributor) commented

Added the file to H₂O's private test environment -- it will be tested with all changes to datatable, and if at any point fread stops being able to read the file, we'll flag this as an issue.

It appears there is nothing else to do for this report, so closing it.
