ENH: read_sas only selected columns (and rows) of a large data file #37088

Open
@paris0120

Description

Is your feature request related to a problem?

Reading a large data file can exhaust system memory, yet most of the time only a subset of the data is needed. It would be convenient to be able to read just that subset.

Describe the solution you'd like

Read only the needed data through iteration: iterate over the file in chunks and keep only the selected columns (and rows).

API breaking implications

One additional function for each file type.

Describe alternatives you've considered

Row-filtering conditions could be supported as well, which would be convenient; see the sketch below.
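
A minimal sketch of what per-chunk row filtering could look like on top of the existing iterator API (the file name and the filter condition are hypothetical):

import pandas as pd

# Keep only the rows matching a condition, one chunk at a time,
# so non-matching rows never accumulate in memory.
chunks = []
for df in pd.read_sas("big_file.sas7bdat", iterator=True, chunksize=10_000):
    chunks.append(df[df["year"] >= 2010])  # hypothetical condition
data = pd.concat(chunks, ignore_index=True)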

Additional context

import pandas as pd

def read_sas_by_columns(file_path, keys, chunksize=100, charset="utf-8"):
    # Read the file in chunks, keeping only the requested columns so the
    # full file never has to fit in memory at once.
    chunks = []
    for df in pd.read_sas(file_path, iterator=True, chunksize=chunksize):
        chunks.append(df[keys])
    data = pd.concat(chunks, ignore_index=True)
    # SAS string columns are returned as bytes; decode them to str.
    for c in data.select_dtypes(include=["object"]).columns:
        data[c] = data[c].str.decode(charset)
    return data
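
For example, to pull just two columns out of a large file (the file name and column names here are hypothetical):

df = read_sas_by_columns("large_survey.sas7bdat", keys=["ID", "AGE"], chunksize=10_000)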
