[R-Forge #2605] add filtering option to fread so it can load less than all rows #583
Comments
I'm pretty sure this is the same as what I had in mind recently, but let me elaborate with an example:
I'm working with
This is inefficient because I need to read all of An approach like this:
Would only require 1) read |
Hello. I could do it in two steps: first read the whole file, then filter. But this is slower, and I could run into problems if the file doesn't fit in memory. I don't know whether we are talking about the same thing or whether I misunderstood.
Any update for those stuck with Windows :D? Update: My bad, Cygwin works perfectly on Windows, as said above. To avoid treating the header as a data line while still getting the column names, you can write something like this:
Thanks to @thoera for the help! Regards.
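The command from this comment was not preserved in the thread, so the following is only a minimal sketch of the idea it describes: print the header line untouched, then apply the pattern to the data lines only. The file name, the sample rows, and the helper name `keep_header_grep` are hypothetical and used purely for illustration.

```shell
# Hypothetical sample file standing in for a large CSV.
printf 'Sepal.Length,Species\n5.1,setosa\n7.0,versicolor\n6.3,virginica\n' > iris_sample.csv

# keep_header_grep PATTERN FILE  (hypothetical helper)
# Prints the header line unchanged, then only the data lines matching PATTERN.
keep_header_grep() {
  head -n 1 "$2"                   # the header is always kept
  tail -n +2 "$2" | grep -- "$1"   # the pattern is applied to data lines only
}

# Write the filtered rows, header included, to a smaller file.
keep_header_grep setosa iris_sample.csv > iris_setosa.csv
```

The smaller file can then be read as usual. A command of this kind can also be handed to fread through its `cmd` argument, although whether a multi-step command behaves the same way on every platform is the open question in this thread.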
UPDATE 2: After some tests, I found that the solution I proposed wasn't working in R. The issue can be reproduced with the iris dataset and the following code:
You can also try with
Does fread handle pipes and command lines more complicated than a single instruction? Thanks, Vincent.
To be updated |
For those who are bumping this issue, be sure to upvote the first post here as well. AFAIK nobody is currently working on implementing this. If anyone would like to, we would be happy to assign them to this issue. Regarding the FR itself: I don't think it makes sense to introduce a new mechanism for filtering CSV files directly. It is basically a lot of effort and maintenance, where now
See here: https://stackoverflow.com/a/62240442/3576984
One issue I don't see raised yet that's a shortcoming of many
Would it be worth adding an argument to |
How about using The only overlap with current usage is for one-column files; it should be safe to check Related: #4029, #4686, |
It's interesting that neither Python nor Stata nor other R functions like readr's |
Submitted by: stat quant; Assigned to: Nobody; R-Forge link
Discussed in data.table list.
fread(input, chunk.nrows=10000, chunk.filter = <anything acceptable to i of DT[i]>), where the filter could be grep() or any expression of column names.
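The request is for rows to be filtered while the file is being read, rather than after everything is loaded. As a rough approximation outside R, and not the proposed fread interface, a streaming pre-filter can select rows by a named column without holding the file in memory. The helper name `filter_by_column`, the file, and its columns are hypothetical, and the sketch assumes a plain comma-separated file with no quoted fields containing commas.

```shell
# Hypothetical sample file standing in for a large CSV.
printf 'id,species,value\n1,setosa,5.1\n2,virginica,6.3\n3,setosa,4.9\n' > sample.csv

# filter_by_column FILE COLNAME VALUE  (hypothetical helper)
# Streams FILE line by line, keeps the header, and keeps only rows whose
# COLNAME field equals VALUE. Assumes no quoted fields with embedded commas.
filter_by_column() {
  awk -F',' -v col="$2" -v val="$3" '
    NR == 1 {
      # Find the position of the requested column in the header.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      print
      next
    }
    # Print a data row only if the column was found and its value matches.
    idx && $idx == val
  ' "$1"
}

filter_by_column sample.csv species setosa > sample_setosa.csv
```

Because awk handles one line at a time, memory use stays flat regardless of file size. If the named column is not found, only the header is printed.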