-
Notifications
You must be signed in to change notification settings - Fork 150
Description
We've been talking a lot about meta properties in CLIMADA Hazard and Centroids objects, how they have changed over time, and how it's now causing bugs. This issue brings it all together.
Previous conversations are in #340, #341 and #343.
The issue
Historically the meta property of CLIMADA classes was used to store raster metadata. So if you wanted to check that an object had a defined raster transformation you could just check for the existence of its meta property.
This check shows all over the place in CLIMADA. But as the use of meta has expanded and changed it's no longer a reliable check:
- The
metaproperty is sometimes set with no raster information. e.g.Exposures.metasometimes stores CRS information and nothing else, and after appending or union-ing Centroids and Hazards the meta is set to an empty dictionary. There areifstatements in CLIMADA that assume the existence of ametais equivalent to full raster metadata, and the code then falls over when that metadata isn't available. - Earlier CLIMADA functionality allowed you to describe your geographic data with either
latandlonproperties or themetaproperty, with the idea that they were equivalent and interchangeable for raster operations. So there are methods that assume that if ametais defined, it contains all the relevant information about an object. For example,Centroids.equalassumes two Centroids objects are identical if they have the samemetaproperties. Some methods, such asCentroids.from_raster_fileread a raster file and only set themetaproperty, notlatandlon. This isn't really how we use CLIMADA any more. - A consequence of allowing two ways of describing geographic properties is that
metas can get out of sync with lat-lon data, so checks on a meta don't necessarily give accurate information about lat and lon coordinates, and vice versa.
In summary, the meta can no longer be used to infer anything about a CLIMADA class.
Quick fixes failed
I spent some of last week attempting a quick fix, which was an is_raster method for the Centroids and Exposures classes. The idea was if x.is_raster() would replace existing if x.meta checks.
I ran into issues about what the 'source of truth' is for some of these checks, however, and decided that I needed to discuss things with you all.
For example,
is_rastercan be calculated without looking at an object'smeta. So should the method care if themetaisn't defined? Or if it's defined but empty? Or if it's defined and disagrees with the lat-lon data?- On the other hand, if a
metais defined but there's no lat-lon data, what should happen (and should this functionality be allowed any more?) - Should there different new check(s) depending on what the
if x.metacheck was looking for? e.g. checking the data are on a regular grid, versus checking they're defined at every grid point of a regular grid? Do we really want several new methods? - The new checks would probably break old code, is that ok?
This was enough to confuse me, and I think we've reached the point where a bad fix might be worse than no fix. So I want to ask about what to do next.
What to do next?
@chahank and others had alreadty told me that quick fixes weren't going to help much. Since nothing is horribly broken right now, we don't have to patch this immediately, but it will eventually cause serious bugs.
After conversations in the other pull requests and issues, I think we might need to:
- Officially retire the
metaas an "alternate" way to describe a CLIMADA object's geographic properties, make lat-lon properties required and to make the raster checks depend on them. - In particular, we should no longer be able to initialise objects with a
metaand without lat-lon data. - We could continue to have a
metaproperty – it's very useful – but it should be recalculated whenever it is accessed. - CLIMADA's raster functionality should be limited to reading, writing and regridding data at the start and end of an analysis, with everything in between working on centroids. This is how most people use CLIMADA anyway. (There's a possible exception for
coordinates.assign_centroidswhere large gridded datasets are processed a lot faster with raster methods.)
Unfortunately this will be an annoying amount of work, and bad for backward compatibility.
Thoughts?