Description
Is your feature request related to a problem?
Reading a large data file can exhaust system memory. However, most of the time we don't need all of the data in that file. It would be convenient to be able to read only the data that is needed.
Describe the solution you'd like
Read only the needed data by iterating over the file in chunks and keeping just the required columns.
API breaking implications
No breaking changes; this would add one more function for each file type.
Describe alternatives you've considered
Row-filtering conditions could be supported as well, which would be useful; see the sketch below.
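A minimal sketch of what chunk-wise filtering could look like; the `read_sas_filtered` name and the `condition` callable are hypothetical illustrations, not an existing pandas API:

```python
import pandas as pd

def read_sas_filtered(file_path, keys, condition, chunksize=100):
    # Hypothetical sketch: apply the predicate to each chunk so that
    # non-matching rows are discarded immediately instead of
    # accumulating in memory.
    parts = []
    for chunk in pd.read_sas(file_path, iterator=True, chunksize=chunksize):
        subset = chunk[keys]
        parts.append(subset[condition(subset)])
    return pd.concat(parts, ignore_index=True)

# Example: keep only rows where AGE exceeds 30 (column names illustrative).
# df = read_sas_filtered("demo.sas7bdat", ["AGE", "SEX"], lambda d: d["AGE"] > 30)
```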
Additional context
```python
import pandas as pd

def read_sas_by_columns(file_path, keys, chunksize=100, charset='utf-8'):
    # Read the file in chunks, keeping only the requested columns.
    parts = [chunk[keys] for chunk in pd.read_sas(file_path, iterator=True, chunksize=chunksize)]
    data = pd.concat(parts, ignore_index=True)
    # SAS string columns come back as bytes; decode them to the given charset.
    for c in data.select_dtypes(include=['object']).columns:
        data[c] = data[c].str.decode(charset)
    return data
```
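Usage would then look like this (the file path and column names are illustrative):

```python
# Read two columns from a large SAS file, 10,000 rows at a time.
df = read_sas_by_columns("large_file.sas7bdat", ["ID", "NAME"], chunksize=10_000)
```

Note that `pd.read_sas` also accepts an `encoding` argument, which could replace the manual decode loop.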