Skip to content

Reviewing how meta works in CLIMADA #348

@ChrisFairless

Description

@ChrisFairless

We've been talking a lot about meta properties in CLIMADA Hazard and Centroids objects, how they have changed over time, and how it's now causing bugs. This issue brings it all together.

Previous conversations are in #340, #341 and #343.

The issue

Historically the meta property of CLIMADA classes was used to store raster metadata. So if you wanted to check that an object had a defined raster transformation you could just check for the existence of its meta property.

This check shows all over the place in CLIMADA. But as the use of meta has expanded and changed it's no longer a reliable check:

  • The meta property is sometimes set with no raster information. e.g. Exposures.meta sometimes stores CRS information and nothing else, and after appending or union-ing Centroids and Hazards the meta is set to an empty dictionary. There are if statements in CLIMADA that assume the existence of a meta is equivalent to full raster metadata, and the code then falls over when that metadata isn't available.
  • Earlier CLIMADA functionality allowed you to describe your geographic data with either lat and lon properties or the meta property, with the idea that they were equivalent and interchangeable for raster operations. So there are methods that assume that if a meta is defined, it contains all the relevant information about an object. For example, Centroids.equal assumes two Centroids objects are identical if they have the same meta properties. Some methods, such as Centroids.from_raster_file read a raster file and only set the meta property, not lat and lon. This isn't really how we use CLIMADA any more.
  • A consequence of allowing two ways of describing geographic properties is that metas can get out of sync with lat-lon data, so checks on a meta don't necessarily give accurate information about lat and lon coordinates, and vice versa.

In summary, the meta can no longer be used to infer anything about a CLIMADA class.

Quick fixes failed

I spent some of last week attempting a quick fix, which was an is_raster method for the Centroids and Exposures classes. The idea was if x.is_raster() would replace existing if x.meta checks.

I ran into issues about what the 'source of truth' is for some of these checks, however, and decided that I needed to discuss things with you all.

For example,

  • is_raster can be calculated without looking at an object's meta. So should the method care if the meta isn't defined? Or if it's defined but empty? Or if it's defined and disagrees with the lat-lon data?
  • On the other hand, if a meta is defined but there's no lat-lon data, what should happen (and should this functionality be allowed any more?)
  • Should there different new check(s) depending on what the if x.meta check was looking for? e.g. checking the data are on a regular grid, versus checking they're defined at every grid point of a regular grid? Do we really want several new methods?
  • The new checks would probably break old code, is that ok?

This was enough to confuse me, and I think we've reached the point where a bad fix might be worse than no fix. So I want to ask about what to do next.

What to do next?

@chahank and others had alreadty told me that quick fixes weren't going to help much. Since nothing is horribly broken right now, we don't have to patch this immediately, but it will eventually cause serious bugs.

After conversations in the other pull requests and issues, I think we might need to:

  • Officially retire the meta as an "alternate" way to describe a CLIMADA object's geographic properties, make lat-lon properties required and to make the raster checks depend on them.
  • In particular, we should no longer be able to initialise objects with a meta and without lat-lon data.
  • We could continue to have a meta property – it's very useful – but it should be recalculated whenever it is accessed.
  • CLIMADA's raster functionality should be limited to reading, writing and regridding data at the start and end of an analysis, with everything in between working on centroids. This is how most people use CLIMADA anyway. (There's a possible exception for coordinates.assign_centroids where large gridded datasets are processed a lot faster with raster methods.)

Unfortunately this will be an annoying amount of work, and bad for backward compatibility.

Thoughts?

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions