select via pattern in fread #2066

Open
MichaelChirico opened this issue Mar 19, 2017 · 1 comment

Comments

@MichaelChirico
Member

I'm trying to read some files where the column names (for the most part) are followed by two digits indicating the year. I'd like to select only a few of the more than 100 columns in each file, but this currently takes a lot more legwork than it should: although the column-name pattern is the same across files, the full names are not.

Specifically, I'm reading some csv-ified files from here.

I might want the SCHNAM (school name) column for several years; they would be stored as, e.g., SCHNAM06 for 2006-07, SCHNAM07 for 2007-08, SCHNAM08 for 2008-09, and so on.
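Because the year suffix is always two digits, a single regular expression should match every year's version of the column. A minimal base-R sketch, using a made-up header vector (every name other than the SCHNAM ones is a hypothetical placeholder):

```r
# Hypothetical header: SCHNAM columns from three years plus unrelated columns
hdr <- c("NCESSCH", "SCHNAM06", "LEAID", "SCHNAM07", "SCHNAM_OLD", "SCHNAM08")

# Anchor the pattern so only "SCHNAM" followed by exactly two digits matches
# (SCHNAM_OLD is excluded)
schnam_cols <- grep("^SCHNAM[0-9]{2}$", hdr, value = TRUE)
print(schnam_cols)
# [1] "SCHNAM06" "SCHNAM07" "SCHNAM08"
```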

It would be great to be able to do fread(file_name, select = grep('SCHNAM', .)). Instead, I first have to do something along the lines of readLines(file_name, n = 1L) %>% strsplit(',') %>% el %>% grep('SCHNAM', .) to get the matching column indices.
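Until select accepts a pattern, the two steps can be wrapped in a helper. The sketch below is a hypothetical function, `fread_grep`, which is not part of data.table. It reads only the header with `fread(file, nrows = 0)` rather than splitting the first line on commas, so quoted names and the file's detected separator are handled by fread itself. Extra arguments in `...` are passed only to the second, full read.

```r
library(data.table)

# Hypothetical helper, not part of data.table: approximates the requested
# fread(file, select = grep(pattern, .)) by reading the header first.
fread_grep <- function(file, pattern, ...) {
  # nrows = 0 returns the column names without reading any data rows
  header <- names(fread(file, nrows = 0))
  keep <- grep(pattern, header, value = TRUE)
  if (length(keep) == 0L) stop("no column names match '", pattern, "'")
  fread(file, select = keep, ...)
}

# Usage on a small made-up file
tmp <- tempfile(fileext = ".csv")
writeLines(c("NCESSCH,SCHNAM06,LEAID",
             "1,Alpha High,10",
             "2,Beta Elementary,20"), tmp)
dt <- fread_grep(tmp, "^SCHNAM")
print(names(dt))
```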

@lmullany

lmullany commented Jan 27, 2023

Agreed, this would be most helpful. This is my current workaround:

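```r
library(data.table)

# Return the names of the columns in the file at `pth` that match regex `re`;
# nrows = 0 reads only the header, so this stays cheap for wide files.
find_cols <- function(pth, re, ...) {
  cols = names(fread(pth, nrows = 0))
  cols[grepl(re, cols, ...)]
}

pth = "dataset_with_hundreds_of_columns.csv"
df = fread(pth, select = find_cols(pth, "<my_awesome_regex>"))
```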
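For the multi-year case in the original request, the same header-first idea can be applied to each file, with the year suffix stripped afterwards so the tables stack. A minimal sketch, assuming one CSV per school year. The file names, the NCESSCH ID column, and the contents below are all made up for illustration:

```r
library(data.table)

# Hypothetical setup: one small CSV per school year, each with a
# year-suffixed SCHNAM column (names and contents are made up).
data_dir <- tempfile("ccd_")
dir.create(data_dir)
writeLines(c("NCESSCH,SCHNAM06", "1,Alpha High"), file.path(data_dir, "sc06.csv"))
writeLines(c("NCESSCH,SCHNAM07", "1,Alpha High School"), file.path(data_dir, "sc07.csv"))

files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

read_schnam <- function(f) {
  # Read only the header, then select the ID and SCHNAMyy columns by pattern
  cols <- names(fread(f, nrows = 0))
  keep <- grep("^(NCESSCH|SCHNAM[0-9]{2})$", cols, value = TRUE)
  dt <- fread(f, select = keep)
  # Strip the two-digit year suffix so the per-year tables can be stacked
  setnames(dt, sub("^SCHNAM[0-9]{2}$", "SCHNAM", names(dt)))
  dt[, file := basename(f)]
  dt
}

schnam <- rbindlist(lapply(files, read_schnam))
print(schnam)
```

Renaming before stacking keeps rbindlist's default by-position binding aligned. Keeping the source file name preserves which year each row came from.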
