BUG: Memory usage increases with subsequent reads to same data #207
Comments
Thanks for the report and good example @joostmeulenbeld

I can reproduce on macOS 12.5 / M1, and also if I use

What is even more fun, I can reproduce this without reading any geometries or columns:

    from pyogrio import read_dataframe

    for _ in range(1_000):
        tmp = read_dataframe("points.gpkg", columns=[], read_geometry=False)

Likewise, I can reproduce it with reading only bounds:

    from pyogrio import read_bounds

    for _ in range(1_000):
        tmp = read_bounds("points.gpkg")

However, using Arrow I/O, it works without appearing to increase memory, though there is no columnar data other than geometry here:

    for _ in range(1_000):
        tmp = read_dataframe("points.gpkg", read_geometry=False, use_arrow=True)

(it fails reading geometry with WKB error, need to investigate that separately)

This at least narrows down a little bit where the issue may be coming from.
I get the same error, and that's because have all empty bytes (
I ran the That points to
I have a fix at #209
RAM usage keeps going up when loading the same geospatial file in a loop.
The example script below creates a geopackage of about 10MB and reads it many times into a geodataframe using pyogrio. RAM usage goes up every iteration, even though each iteration the loaded geodataframe goes out of scope. After about 500 reads, memory usage is ~10GB and keeps rising.
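For anyone who wants to quantify the growth, here is a minimal sketch of a measurement helper. It is not the script from the report. It assumes a Unix system (it uses the standard-library `resource` module), and the names `peak_rss_mib` and `track_memory` are made up for illustration. Note that it samples peak RSS, which never decreases: a flat series means no growth, while a steadily rising one is consistent with the behaviour described above.

```python
import gc
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kibibytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def track_memory(read_once, n_iter=100, report_every=10):
    """Call read_once() n_iter times and sample peak RSS every report_every calls."""
    samples = []
    for i in range(1, n_iter + 1):
        result = read_once()
        del result  # drop the only reference, mirroring the loop in the report
        if i % report_every == 0:
            gc.collect()  # rule out uncollected Python garbage before sampling
            samples.append((i, peak_rss_mib()))
    return samples


# Hypothetical usage (requires pyogrio and a local "points.gpkg"):
# from pyogrio import read_dataframe
# for i, mib in track_memory(lambda: read_dataframe("points.gpkg"), n_iter=500):
#     print(f"{i:4d} reads: peak RSS {mib:.0f} MiB")
```

Passing the read call in as a callable makes it easy to swap in the other variants mentioned in this thread (for example `read_bounds`, or `read_dataframe` with `use_arrow=True`) and compare the resulting series.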
The following lines did not yield increasing RAM usage over time, from which I conclude it's not shapely or geopandas itself, and fiona does not have the same problem:

Also, adding gc.collect() inside the loop does not make a difference.

The actual use case I have problems with reads a file too large to load in memory, one bounding box at a time in a loop, and it shows the same problem.
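To make the bounding-box use case concrete, here is a rough sketch of how such a loop might be structured. It is an assumption about the general pattern, not the reporter's actual code: `iter_tiles` is a hypothetical helper, and the file name and extent in the usage comment are placeholders. The only pyogrio feature relied on is the `bbox` argument of `read_dataframe`.

```python
def iter_tiles(bounds, nx, ny):
    """Yield (xmin, ymin, xmax, ymax) tiles covering bounds in an nx-by-ny grid."""
    xmin, ymin, xmax, ymax = bounds
    dx = (xmax - xmin) / nx  # tile width
    dy = (ymax - ymin) / ny  # tile height
    for ix in range(nx):
        for iy in range(ny):
            yield (
                xmin + ix * dx,
                ymin + iy * dy,
                xmin + (ix + 1) * dx,
                ymin + (iy + 1) * dy,
            )


# Hypothetical usage (requires pyogrio; "large.gpkg" and the extent are placeholders):
# from pyogrio import read_dataframe
# for tile in iter_tiles((0.0, 0.0, 100.0, 100.0), nx=10, ny=10):
#     gdf = read_dataframe("large.gpkg", bbox=tile)
#     ...  # process gdf; it is rebound (and the old one released) on the next iteration
```

Each iteration only holds one tile's worth of data, so in principle memory should stay bounded. A feature that straddles a tile edge may be returned for more than one tile, so downstream processing may need to de-duplicate.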
Environment
pyogrio 0.5.0 (from PyPI)
shapely 2.0.0 (from PyPI)
geopandas 0.12.2 (from PyPI)