Modest performance, address #12647 #12656

Closed
wants to merge 19 commits into from
Decouple decoding of data from decoding of metadata, e.g. column names
kshedden committed Apr 21, 2016
commit 23bdf7a49c80341c7cfa83f5c146e4ceac6dbfa7
10 changes: 7 additions & 3 deletions pandas/io/sas/sas7bdat.py
@@ -53,17 +53,21 @@ class SAS7BDATReader(BaseIterator):
         Return SAS7BDATReader object for iterations, returns chunks
         with given number of lines.
     encoding : string, defaults to None
-        String encoding.  If None, text variables are left as raw bytes.
+        String encoding.
+    convert_text : bool, defaults to True
+        If False, text variables are left as raw bytes.
     """

     def __init__(self, path_or_buf, index=None, convert_dates=True,
-                 blank_missing=True, chunksize=None, encoding=None):
+                 blank_missing=True, chunksize=None, encoding=None,
+                 convert_text=True):

         self.index = index
         self.convert_dates = convert_dates
         self.blank_missing = blank_missing
         self.chunksize = chunksize
         self.encoding = encoding
+        self.convert_text = convert_text

         self.compression = ""
         self.column_names_strings = []
@@ -611,7 +615,7 @@ def _chunk_to_dataframe(self):
             elif self.column_types[j] == b's':
                 rslt[name] = self._string_chunk[js, :]
                 rslt[name] = rslt[name].apply(lambda x: x.rstrip(b'\x00 '))
-                if self.encoding is not None:
+                if self.convert_text and (self.encoding is not None):
                     rslt[name] = rslt[name].apply(
                         lambda x: x.decode(encoding=self.encoding))
                 if self.blank_missing:
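
For readers skimming the diff, the effect of the new guard can be sketched in isolation. The following is a minimal, standalone illustration and not the reader's actual code: `finish_text_column` is a hypothetical helper, and it works on a plain list of bytes rather than a pandas column. It only mirrors the order of operations in the hunk above (strip padding first, then decode only when `convert_text` is true and an `encoding` is given).

```python
def finish_text_column(raw_values, encoding=None, convert_text=True):
    """Hypothetical helper mirroring the guarded decode in the diff.

    raw_values is assumed to be a list of fixed-width byte strings
    padded with NUL bytes or spaces.
    """
    # Padding is always stripped, whether or not decoding happens.
    stripped = [value.rstrip(b"\x00 ") for value in raw_values]

    # Decode only when the caller asked for text AND supplied an encoding.
    if convert_text and encoding is not None:
        return [value.decode(encoding) for value in stripped]

    # Otherwise the values stay as raw bytes.
    return stripped


raw = [b"alpha\x00\x00", b"beta   ", b"caf\xe9 "]

decoded = finish_text_column(raw, encoding="latin-1")
as_bytes = finish_text_column(raw, encoding="latin-1", convert_text=False)
no_encoding = finish_text_column(raw)
```

With `convert_text=False` the encoding stays available for other uses, such as the column names mentioned in the commit message, while the data values themselves remain bytes. Leaving `encoding` as `None` also yields bytes, as before the change.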