First, I want to thank you for the amazing package!
I was unable to read a huge (1.15 TB) sas7bdat file using `read_sas7bdat` (see the error message at the end). The dataset contains both character and numeric columns. My guess is that either 1) the file is corrupted, or 2) some internal limit in pyreadstat prevents it from reading such a large file.
The SAS file was downloaded using rsync, so errors during transmission should be unlikely. Before I redownload the data, which is very painful, could you advise me on any possible solutions?
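One way I could test hypothesis 1 without redownloading is to compare a SHA-256 checksum of my local copy against one computed on the source host. Below is a minimal sketch, assuming I can run an equivalent hash on the remote side (for example with `sha256sum`). The helper name `file_sha256` and the chunk size are my own choices for illustration and are not part of pyreadstat.

```python
import hashlib


def file_sha256(path, chunk_size=16 * 1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks.

    Reading in chunks keeps memory use constant, so this also works for
    files far larger than RAM (hypothetical helper, not a pyreadstat API).
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # f.read() returns b"" at end of file, which stops the iterator.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest of the local `constituents.sas7bdat` matches the one computed on the server, corruption during transfer can be ruled out, which would point toward hypothesis 2. Hashing 1.15 TB will take a while, but it should still be much cheaper than a second download.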
Thank you!
---------------------------------------------------------------------------
ReadstatError                             Traceback (most recent call last)
Cell In[8], line 4
      1 extra_date_formats = ["YYMMDDN8"]
      2 extra_datetime_formats = ["DATETIME25"]
----> 4 df, meta = pyreadstat.read_sas7bdat(
      5     '/home/yu/chaoyang/local-wrds/etfg/sasdata/constituents/constituents.sas7bdat',
      6     encoding="latin1",
      7     disable_datetime_conversion=False,
      8     dates_as_pandas_datetime=False,
      9     extra_date_formats=extra_date_formats,
     10     extra_datetime_formats=extra_datetime_formats,
     11     # row_offset=1,
     12     row_limit=1
     13 )
File pyreadstat/pyreadstat.pyx:129, in pyreadstat.pyreadstat.read_sas7bdat()
File pyreadstat/_readstat_parser.pyx:1137, in pyreadstat._readstat_parser.run_conversion()
File pyreadstat/_readstat_parser.pyx:882, in pyreadstat._readstat_parser.run_readstat_parser()
File pyreadstat/_readstat_parser.pyx:804, in pyreadstat._readstat_parser.check_exit_status()
ReadstatError: Invalid file, or file has unsupported features