Task import-heat-demand fails with --dataset-boundary=Everything #204

Closed
ClaraBuettner opened this issue Apr 9, 2021 · 6 comments · Fixed by #213
@ClaraBuettner
Contributor

The heat demand import fails with the following log. The log file is ~4 GB, so I wasn't even able to open it completely, but this screenshot shows the most important lines (only the huge list of values is missing).
log_import-heat-demand

It looks like these lines need too much memory:

    import_rasters = subprocess.run(
        ["raster2pgsql", "-e", "-s", "3035", "-I", "-C", "-F", "-a"]
        + sources
        + [f"{rasters}"],
        text=True,
    ).stdout
    with engine.begin() as connection:
        connection.execute(
            f'CREATE TEMPORARY TABLE "{rasters}"'
            ' ("rid" serial PRIMARY KEY,"rast" raster,"filename" text);'
        )
        connection.execute(import_rasters)

I don't think we can change anything in these lines to reduce the memory usage, because the data is already inserted directly into the database.
But according to htop, the maximum RAM used is ~20 GB, and the server has much more.
@gnn Is there a limit for Airflow or the Docker container, and could we try to adjust it?

@ClaraBuettner ClaraBuettner added the 🐛 bug Something isn't working label Apr 9, 2021
@EvaWie
Contributor

EvaWie commented Apr 9, 2021

I do not have a deep understanding of the reported problem. I just want to tell you that I successfully ran the heat demand data import for Germany on my laptop before we implemented the test case and the new Docker container (and a few heat demand table related changes, e.g. version numbering). Maybe this hint helps.

@gnn
Collaborator

gnn commented Apr 12, 2021

Thanks for the hint, @EvaWie. This is strange. It seems that the culprit is the line importing the rasters. It executes an SQL script which inserts a row that is simply too large for PostgreSQL. The whole raster_data is saved in one row, and single objects like row entries are only allowed to be up to 1 GB (= 1073741824 bytes) in size, which the value in the log messages exceeds. So we either have to find a way of splitting the rasters into more than one row, or we'll have to find a way of storing the rasters as "large objects". When implementing the raster import, I actually experimented a bit with storing the rasters in more than one row, so I might be able to dig up some code which could help here.
The only strange thing is that it worked previously. Did we get more rasters with heat demand data for some reason?

@ClaraBuettner
Contributor Author

I remember that we changed the datatype from int to float. If that was done after Eva's run, it could explain the difference and also the new issue. I will check whether importing the values as integers works.
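
A rough back-of-the-envelope sketch of why a wider pixel type alone could push a single-row raster over the 1 GB limit mentioned above. The raster dimensions and the byte widths (2 bytes for an integer pixel, 4 bytes for a float pixel) are assumptions chosen for illustration, not the actual heat demand data, and the estimate ignores raster headers and the text encoding used in the generated SQL.

```python
# 1 GB limit for a single field value, as quoted earlier in this thread.
POSTGRES_FIELD_LIMIT = 1_073_741_824


def raster_bytes(width, height, bytes_per_pixel, bands=1):
    """Approximate raw pixel payload of a raster, ignoring header overhead."""
    return width * height * bytes_per_pixel * bands


def fits_in_one_row(width, height, bytes_per_pixel, bands=1):
    """Return True if the raw pixel payload stays within the field limit."""
    return raster_bytes(width, height, bytes_per_pixel, bands) <= POSTGRES_FIELD_LIMIT


if __name__ == "__main__":
    # Hypothetical raster size, not taken from the real dataset.
    width, height = 20_000, 20_000
    print("2-byte pixels fit:", fits_in_one_row(width, height, 2))
    print("4-byte pixels fit:", fits_in_one_row(width, height, 4))
```

With these assumed numbers, the same raster fits with 2-byte pixels but not with 4-byte pixels, which would be consistent with the import only failing after the datatype change.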

@EvaWie
Contributor

EvaWie commented Apr 12, 2021

> I remember that we changed the datatype from int to float. If that was done after Eva's run, it could explain the difference and also the new issue. I will check whether importing the values as integers works.

That is a very good point!

@ClaraBuettner
Contributor Author

I tried to run it with integer values and it worked. So at least we know why this error didn't occur before.
@gnn: It would be great if you could find the old code snippets for storing the raster in more than one row. If you send them to me I can try to implement it.

@gnn
Collaborator

gnn commented Apr 12, 2021

> I tried to run it with integer values and it worked. So at least we know why this error didn't occur before.

Nice.

> @gnn: It would be great if you could find the old code snippets for storing the raster in more than one row. If you send them to me I can try to implement it.

It's not much. I think it would be enough to use the -t switch described in raster2pgsql's documentation, so something like adding ["-t", "auto"] to line 479. The hard part is testing that this actually creates the same raster, so that it doesn't modify the results obtained from using it.
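
A minimal sketch of what adding the tiling switch might look like, assuming the argument list quoted at the top of this issue. The helper name `raster2pgsql_command` and its parameters are hypothetical and not part of the project; the function only builds the argument list and does not run raster2pgsql.

```python
def raster2pgsql_command(sources, table, tile_size="auto"):
    """Build a raster2pgsql argument list, optionally with tiling enabled.

    `tile_size` is passed to the -t switch, which cuts each raster into
    tiles that are inserted one per table row. Pass None to keep the
    single-row behaviour quoted in the issue.
    """
    command = ["raster2pgsql", "-e", "-s", "3035", "-I", "-C", "-F", "-a"]
    if tile_size is not None:
        command += ["-t", tile_size]
    return command + list(sources) + [table]


if __name__ == "__main__":
    # Hypothetical file and table names, for illustration only.
    print(raster2pgsql_command(["heat_demand.tif"], "heat_demand_rasters"))
```

Keeping `tile_size` optional makes it possible to generate both the tiled and the untiled import, which could help when checking that the two produce the same raster.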
