Closed
Opened on Nov 23, 2022
haven::read_sas cannot read a SAS data file with a page size of 16 MiB (16,777,216 bytes). Some data files with page sizes slightly under 16 MiB also fail to read.
I would expect the attached sas7bdat files (zipped to keep the attachment under 10 MB) to be read by haven::read_sas,
each giving a 10,000-row tibble with a single column, empty,
consisting only of empty strings.
20221123 - haven bug report.zip
haven::read_sas('test_16766976.sas7bdat') # Succeeds
# # A tibble: 10,000 x 1
# empty
# <chr>
# 1 ""
# 2 ""
# 3 ""
# 4 ""
# 5 ""
# 6 ""
# 7 ""
# 8 ""
# 9 ""
# 10 ""
# # ... with 9,990 more rows
haven::read_sas('test_16776192.sas7bdat') # Fails
# Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding, catalog_encoding = catalog_encoding, :
# Failed to parse <snip>/test_16776192.sas7bdat: Unable to allocate memory.
haven::read_sas('test_16777216.sas7bdat') # Fails
# Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding, catalog_encoding = catalog_encoding, :
# Failed to parse <snip>/test_16777216.sas7bdat: Unable to allocate memory.
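For reference, the arithmetic below (a quick sketch in Python, using the page sizes from the file names above) shows how close the failing sizes sit to the 16 MiB boundary — note that 16776192 is still 1024 bytes under it, yet fails:

```python
# The three test files differ only in bufsize (the sas7bdat page size).
# Pure arithmetic on the reported values, to show how close the failing
# page sizes sit to the 16 MiB boundary.
MIB = 1024 * 1024

page_sizes = {
    "test_16766976.sas7bdat": 16766976,  # succeeds
    "test_16776192.sas7bdat": 16776192,  # fails
    "test_16777216.sas7bdat": 16777216,  # fails
}

for name, size in page_sizes.items():
    print(f"{name}: {size} bytes = 16 MiB - {16 * MIB - size} bytes")
# test_16766976.sas7bdat: 16766976 bytes = 16 MiB - 10240 bytes
# test_16776192.sas7bdat: 16776192 bytes = 16 MiB - 1024 bytes
# test_16777216.sas7bdat: 16777216 bytes = 16 MiB - 0 bytes
```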
I generated these files in SAS using
libname out "appropriate/path/here";
data out.test_16777216 (bufsize=16777216 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
* PROC CONTENTS to verify page size;
proc contents data=out.test_16777216 varnum; run;
data out.test_16776192 (bufsize=16776192 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
proc contents data=out.test_16776192 varnum; run;
data out.test_16766976 (bufsize=16766976 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
proc contents data=out.test_16766976 varnum; run;
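To see where the page size bufsize ends up in the file itself, here is a minimal Python sketch of reading that field from a sas7bdat header. The byte offset (200, little-endian int32 for 32-bit-layout files) is an assumption taken from the community reverse-engineered format description, not an official SAS spec, and 64-bit files shift some offsets; to stay self-contained, the sketch parses a synthetic buffer rather than a real file:

```python
import struct

# Assumption: per the reverse-engineered sas7bdat format notes, the page
# size is a little-endian int32 at byte offset 200 in 32-bit-layout
# files (64-bit files shift some offsets; big-endian files also exist).
PAGE_SIZE_OFFSET = 200

# Build a synthetic 1 KiB "header" with a 16 MiB page size written in.
header = bytearray(1024)
struct.pack_into("<i", header, PAGE_SIZE_OFFSET, 16777216)

def read_page_size(buf: bytes) -> int:
    """Read the page-size field from a (synthetic) sas7bdat header."""
    return struct.unpack_from("<i", buf, PAGE_SIZE_OFFSET)[0]

print(read_page_size(header))  # → 16777216
```

Against the real attached files, the same read on the first few hundred bytes would let you confirm which page size a given sas7bdat carries without opening it in SAS.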
Workaround: set the default page size in SAS to 8 MB with the system option -BUFSIZE 8M,
or set BUFSIZE on a case-by-case basis in the DATA step. The default page size in my operating environment is 16M.