[GCC 8.5.0 20210514 (Red Hat 8.5.0-24)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyreadstat as prs
>>> d,m=prs.read_dta("test.dta")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyreadstat/pyreadstat.pyx", line 296, in pyreadstat.pyreadstat.read_dta
  File "pyreadstat/_readstat_parser.pyx", line 1282, in pyreadstat._readstat_parser.run_conversion
  File "pyreadstat/_readstat_parser.pyx", line 955, in pyreadstat._readstat_parser.run_readstat_parser
  File "pyreadstat/_readstat_parser.pyx", line 877, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to allocate memory
>>>

From my investigation, the issue originates in the dta_read_strls() function in readstat_dta_read.c. At L451, it calls malloc() separately for each string inside a while loop. Later, the allocation at L445 fails to obtain a large contiguous chunk of memory because the heap has become heavily fragmented.
With the reproducible example at https://www.dropbox.com/scl/fi/sx9cz7vjekvud3ail9ph3/test.dta?rlkey=7e5qmwl9tbuoa0967kq3uq65f&st=g3wxulnc&dl=0, the malloc() at L451 (one call per string) runs approximately 1.6 million times. After that, the allocation at L445 fails to obtain 26 MB of contiguous heap memory.