haven::read_sas unable to allocate memory for 16MB SAS page sizes #697

Closed

Description

haven::read_sas cannot read a SAS data file with a page size of 16 MiB (16777216 bytes). Some data files with page sizes slightly under 16 MiB also fail to read.

I would expect each of the attached sas7bdat files (zipped to keep the file size under 10MB) to be read by haven::read_sas and give a 10,000-row tibble with a single column, empty, consisting only of empty strings.

20221123 - haven bug report.zip

haven::read_sas('test_16766976.sas7bdat') # Succeeds
# # A tibble: 10,000 x 1
#    empty
#    <chr>
#  1 ""
#  2 ""
#  3 ""
#  4 ""
#  5 ""
#  6 ""
#  7 ""
#  8 ""
#  9 ""
# 10 ""
# # ... with 9,990 more rows

haven::read_sas('test_16776192.sas7bdat') # Fails
# Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding, catalog_encoding = catalog_encoding,  :
#   Failed to parse <snip>/test_16776192.sas7bdat: Unable to allocate memory.

haven::read_sas('test_16777216.sas7bdat') # Fails
# Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding, catalog_encoding = catalog_encoding,  :
#   Failed to parse <snip>/test_16777216.sas7bdat: Unable to allocate memory.

I generated these files in SAS using:

* libname out "appropriate/path/here";

data out.test_16777216 (bufsize=16777216 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
* PROC CONTENTS to verify page size;
proc contents data=out.test_16777216 varnum; run;

data out.test_16776192 (bufsize=16776192 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
proc contents data=out.test_16776192 varnum; run;

data out.test_16766976 (bufsize=16766976 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
proc contents data=out.test_16766976 varnum; run;

Workaround: Set the default page size in SAS to 8MB with the -BUFSIZE 8M system option, or set a smaller page size on a case-by-case basis with the bufsize= data set option. The default page size in my operating environment is 16M.
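
A minimal sketch of the case-by-case workaround, assuming the out libref from the code above is assigned and that SAS accepts 8M as a page size in your operating environment. The name test_8m is hypothetical and is not one of the attached files.

```sas
* Rewrite an existing data set with an 8 MiB page size (test_8m is a hypothetical name);
data out.test_8m (bufsize=8M compress=no);
set out.test_16777216;
run;

* PROC CONTENTS to verify the page size of the rewritten copy;
proc contents data=out.test_8m varnum; run;

* Alternatively, lower the default for the rest of the session;
options bufsize=8M;
```

If the rewritten copy reports an 8 MiB page size, haven::read_sas should be able to read it, given that the file with the smaller 16766976-byte page size above reads successfully.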
