-
Notifications
You must be signed in to change notification settings - Fork 150
Code Review: Methods to handle lines and polygons with CLIMADA exposures #221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
55 commits
Select commit
Hold shift + click to select a range
f2cbc72
New version consistent with develop
Evelyn-M fe1677e
new version consistent with develop
Evelyn-M a765270
try removing circular import
Evelyn-M 10103b7
merge interpolation into coordinates for circular import error
Evelyn-M c9b18e8
adjust unit tests after merging interpolation & coordinates
Evelyn-M 49a1846
Merge branch 'develop' of https://github.com/CLIMADA-project/climada_…
Evelyn-M cc7d016
rename double dist_approx function from interpolation module
Evelyn-M f7dce04
remove duplicate dist_approx, adjust tests
Evelyn-M 034a312
add placeholder unittests for lp_handler and interpolation funcs
Evelyn-M 3824e7f
change name of dist_great_circle_allgeoms
Evelyn-M ba48731
update interpolate_polygons
Evelyn-M 946d4f1
update docstring to numpy format
Evelyn-M 1ad8e47
Merge commit '67e1b9093852650186377445ac412061eea4a9d4' into feature/…
emanuel-schmid ead0d05
Merge branch 'feature/fix_tc_synth_decay' into feature/lines_polygons…
emanuel-schmid b06e205
Merge branch 'develop' of https://github.com/CLIMADA-project/climada_…
Evelyn-M ec19a1c
Merge branch 'develop' of https://github.com/CLIMADA-project/climada_…
Evelyn-M a4ceae1
refactor polygons lines interpolation
Evelyn-M e892d39
replace geom_on_land func
Evelyn-M 658f9b4
move disaggregation to exposure base
Evelyn-M eb7a359
update tutorial
Evelyn-M 2d49afe
demo data line exposure
Evelyn-M a4c7eba
Merge branch 'develop' of https://github.com/CLIMADA-project/climada_…
Evelyn-M 77ca7ac
include litpop interpolation method for polys
Evelyn-M 41fbaff
correct litpop disaggregation, update tutorial
Evelyn-M 7051ca4
add m2/m value to constant valuation method
Evelyn-M 31ed202
bug-fixing litpop disaggregation
Evelyn-M a4b82e5
Merge branch 'develop' of https://github.com/CLIMADA-project/climada_…
Evelyn-M 316da1a
pushing before switching
Evelyn-M 6ee7bcf
Update spacings
fad8b7b
Add notes on lines polyongs
2e6480e
Refactor polygon utils (part1)
f54387a
Merge branch 'develop' into feature/lines_polygons_exp
e9397a1
Allow any geo CRS for geodesics
128a6d5
Move inteMove interpolate lines polyongs to line_polys_handler
8b8ad34
Rewrite poly to points and line to points
9ab2c69
Remove old methods
c9d37a3
Add helper methods geometry impact
c1daf07
Remove old methods
94ff972
Add line impact
5fc09ad
Remove old set geom/line methods
e2c4bda
Add imp matrix sub-routines
8553ef1
Add small example
492d6ca
Make impact agg method and add geom_orig to imp
ed85226
Add example imp agg
959c580
Add impact calc for exp pnt
7d3d6a0
rename method
f420842
Add select hazard by extent
0832451
Allow select by reg_id and extent
5cf7874
Add select tight_centroids_bounds
53779b6
Add docstring
49b3274
Optimize aggregate impact
8c79980
Remove useless empty line
a591028
Add kdtree nearest neighbour search
f21193c
User parallel computing by default
20f0f5d
Change matrix multiplication to avoid an expensive todense() call
ChrisFairless File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -34,28 +34,32 @@ | |
| from fiona.crs import from_epsg | ||
| import geopandas as gpd | ||
| import numpy as np | ||
| from numba import jit | ||
| import pandas as pd | ||
| import scipy as sp | ||
|
|
||
| import pyproj | ||
| import pycountry | ||
| import rasterio | ||
| import rasterio.crs | ||
| import rasterio.features | ||
| import rasterio.mask | ||
| import rasterio.warp | ||
| import scipy.interpolate | ||
| from shapely.geometry import Polygon, MultiPolygon, Point, box | ||
| from shapely.geometry import Polygon, MultiPolygon, MultiPoint, Point, box | ||
| import shapely.ops | ||
| import shapely.vectorized | ||
| import shapefile | ||
| from sklearn.neighbors import BallTree | ||
|
|
||
| from climada.util.constants import (DEF_CRS, SYSTEM_DIR, ONE_LAT_KM, | ||
| NATEARTH_CENTROIDS, | ||
| from climada.util.constants import (DEF_CRS,EARTH_RADIUS_KM, SYSTEM_DIR, | ||
| ONE_LAT_KM, NATEARTH_CENTROIDS, | ||
| ISIMIP_GPWV3_NATID_150AS, | ||
| ISIMIP_NATID_TO_ISO, | ||
| NONISO_REGIONS, | ||
| RIVER_FLOOD_REGIONS_CSV) | ||
| from climada.util.files_handler import download_file | ||
| import climada.util.hdf5_handler as u_hdf5 | ||
| import climada.util.interpolation as u_interp | ||
|
|
||
| pd.options.mode.chained_assignment = None | ||
|
|
||
|
|
@@ -76,6 +80,206 @@ | |
| MAX_DEM_TILES_DOWN = 300 | ||
| """Maximum DEM tiles to dowload""" | ||
|
|
||
| DIST_DEF = ['approx', 'haversine', 'euclidian'] | ||
| """Distances""" | ||
|
|
||
| METHOD = ['NN'] | ||
| """Interpolation methods""" | ||
|
|
||
| THRESHOLD = 100 | ||
| """Distance threshold in km. Nearest neighbors with greater distances are | ||
| not considered.""" | ||
|
|
||
|
|
||
| @jit(nopython=True, parallel=True) | ||
| def _dist_sqr_approx(lats1, lons1, cos_lats1, lats2, lons2): | ||
| """ | ||
| Compute squared equirectangular approximation distance. Values need | ||
| to be sqrt and multiplicated by ONE_LAT_KM to obtain distance in km. | ||
| """ | ||
| d_lon = lons1 - lons2 | ||
| d_lat = lats1 - lats2 | ||
| return d_lon * d_lon * cos_lats1 * cos_lats1 + d_lat * d_lat | ||
|
|
||
| def interpol_index(centroids, coordinates, method=METHOD[0], | ||
| distance=DIST_DEF[1], threshold=THRESHOLD): | ||
| """Returns for each coordinate the centroids indexes used for | ||
tovogt marked this conversation as resolved.
Show resolved
Hide resolved
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This docstring is not fully clear. Could it be made from explicit? |
||
| interpolation. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| centroids : 2d array | ||
| First column contains latitude, second column contains longitude. Each | ||
| row is a geographic point | ||
| coordinates : 2d array | ||
| First column contains latitude, second column contains longitude. Each | ||
| row is a geographic point | ||
| method : str, optional | ||
| interpolation method to use. NN default. | ||
| distance : str, optional | ||
| distance to use. Haversine default | ||
| threshold : float | ||
| distance threshold in km over which no neighbor will be found. Those | ||
| are assigned with a -1 index | ||
|
|
||
| Returns | ||
| ------- | ||
| interp : numpy array | ||
| with so many rows as coordinates containing the centroids indexes | ||
| """ | ||
| if (method == METHOD[0]) & (distance == DIST_DEF[0]): | ||
| # Compute for each coordinate the closest centroid | ||
| interp = index_nn_aprox(centroids, coordinates, threshold) | ||
| elif (method == METHOD[0]) & (distance == DIST_DEF[1]): | ||
| # Compute the nearest centroid for each coordinate using the | ||
| # haversine formula. This is done with a Ball tree. | ||
| interp = index_nn_haversine(centroids, coordinates, threshold) | ||
| elif (method == METHOD[0]) & (distance == DIST_DEF[2]): | ||
| interp = index_nn_euclidian(centroids, coordinates, threshold) | ||
| else: | ||
| LOGGER.error('Interpolation using %s with distance %s is not ' | ||
| 'supported.', method, distance) | ||
| interp = np.array([]) | ||
| return interp | ||
|
|
||
| def index_nn_aprox(centroids, coordinates, threshold=THRESHOLD): | ||
| """ | ||
| Compute the nearest centroid for each coordinate using the | ||
| euclidian distance d = ((dlon)cos(lat))^2+(dlat)^2. For distant points | ||
| (e.g. more than 100km apart) use the haversine distance. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| centroids : 2d array | ||
| First column contains latitude, second column contains longitude. Each | ||
| row is a geographic point | ||
| coordinates : 2d array | ||
| First column contains latitude, second column contains longitude. Each | ||
| row is a geographic point | ||
| threshold : float | ||
| distance threshold in km over which no neighbor will be found. Those | ||
| are assigned with a -1 index | ||
|
|
||
| Returns | ||
| ------- | ||
| assigend: array | ||
| with so many rows as coordinates containing the centroids indexes | ||
| """ | ||
|
|
||
| # Compute only for the unique coordinates. Copy the results for the | ||
| # not unique coordinates | ||
| _, idx, inv = np.unique(coordinates, axis=0, return_index=True, | ||
| return_inverse=True) | ||
| # Compute cos(lat) for all centroids | ||
| centr_cos_lat = np.cos(np.radians(centroids[:, 0])) | ||
| assigned = np.zeros(coordinates.shape[0], int) | ||
| num_warn = 0 | ||
| for icoord, iidx in enumerate(idx): | ||
| dist = _dist_sqr_approx(centroids[:, 0], centroids[:, 1], | ||
| centr_cos_lat, coordinates[iidx, 0], | ||
| coordinates[iidx, 1]) | ||
| min_idx = dist.argmin() | ||
| # Raise a warning if the minimum distance is greater than the | ||
| # threshold and set an unvalid index -1 | ||
| if np.sqrt(dist.min()) * ONE_LAT_KM > threshold: | ||
| num_warn += 1 | ||
| min_idx = -1 | ||
|
|
||
| # Assign found centroid index to all the same coordinates | ||
| assigned[inv == icoord] = min_idx | ||
|
|
||
| if num_warn: | ||
| LOGGER.warning('Distance to closest centroid is greater than %s' | ||
| 'km for %s coordinates.', threshold, num_warn) | ||
|
|
||
| return assigned | ||
|
|
||
| def index_nn_haversine(centroids, coordinates, threshold=THRESHOLD): | ||
| """Compute the neareast centroid for each coordinate using a Ball | ||
| tree with haversine distance. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| centroids : 2d array | ||
| First column contains latitude, second column contains longitude. | ||
| Each row is a geographic point | ||
| coordinates : 2d array | ||
| First column contains latitude, second column contains longitude. Each | ||
| row is a geographic point | ||
| threshold : float | ||
| distance threshold in km over which no neighbor will be found. Those | ||
| are assigned with a -1 index | ||
|
|
||
| Returns | ||
| ------- | ||
| np.array | ||
| with so many rows as coordinates containing the centroids indexes | ||
| """ | ||
| # Construct tree from centroids | ||
| tree = BallTree(np.radians(centroids), metric='haversine') | ||
| # Select unique exposures coordinates | ||
| _, idx, inv = np.unique(coordinates, axis=0, return_index=True, | ||
| return_inverse=True) | ||
|
|
||
| # query the k closest points of the n_points using dual tree | ||
| dist, assigned = tree.query(np.radians(coordinates[idx]), k=1, | ||
| return_distance=True, dualtree=True, | ||
| breadth_first=False) | ||
|
|
||
| # Raise a warning if the minimum distance is greater than the | ||
| # threshold and set an unvalid index -1 | ||
| num_warn = np.sum(dist * EARTH_RADIUS_KM > threshold) | ||
| if num_warn: | ||
| LOGGER.warning('Distance to closest centroid is greater than %s' | ||
| 'km for %s coordinates.', threshold, num_warn) | ||
| assigned[dist * EARTH_RADIUS_KM > threshold] = -1 | ||
|
|
||
| # Copy result to all exposures and return value | ||
| return np.squeeze(assigned[inv]) | ||
|
|
||
|
|
||
| def index_nn_euclidian(centroids, coordinates, threshold=THRESHOLD): | ||
| """Compute the neareast centroid for each coordinate using a Ball | ||
| tree with haversine distance. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| centroids : 2d array | ||
| First column contains latitude, second column contains longitude. | ||
| Each row is a geographic point | ||
| coordinates : 2d array | ||
| First column contains latitude, second column contains longitude. Each | ||
| row is a geographic point | ||
| threshold : float | ||
| distance threshold in km over which no neighbor will be found. Those | ||
| are assigned with a -1 index | ||
|
|
||
| Returns | ||
| ------- | ||
| np.array | ||
| with so many rows as coordinates containing the centroids indexes | ||
| """ | ||
| # Construct tree from centroids | ||
| tree = sp.spatial.cKDTree(np.radians(centroids)) | ||
| # Select unique exposures coordinates | ||
| _, idx, inv = np.unique(coordinates, axis=0, return_index=True, | ||
| return_inverse=True) | ||
|
|
||
| # query the k closest points of the n_points using dual tree | ||
| dist, assigned = tree.query(np.radians(coordinates[idx]), k=1, p=2, workers=-1) | ||
|
|
||
| # Raise a warning if the minimum distance is greater than the | ||
| # threshold and set an unvalid index -1 | ||
| num_warn = np.sum(dist * EARTH_RADIUS_KM > threshold) | ||
| if num_warn: | ||
| LOGGER.warning('Distance to closest centroid is greater than %s' | ||
| 'km for %s coordinates.', threshold, num_warn) | ||
| assigned[dist * EARTH_RADIUS_KM > threshold] = -1 | ||
|
|
||
| # Copy result to all exposures and return value | ||
| return np.squeeze(assigned[inv]) | ||
|
|
||
|
|
||
| def latlon_to_geosph_vector(lat, lon, rad=False, basis=False): | ||
| """Convert lat/lon coodinates to radial vectors (on geosphere) | ||
|
|
||
|
|
@@ -318,6 +522,40 @@ def dist_approx(lat1, lon1, lat2, lon2, log=False, normalize=True, | |
| raise KeyError("Unknown distance approximation method: %s" % method) | ||
| return (dist, vtan) if log else dist | ||
|
|
||
| def compute_geodesic_lengths(gdf): | ||
| """Calculate the great circle (geodesic / spherical) lengths along any | ||
| (complicated) line geometry object, based on the pyproj.Geod implementation. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| gdf : gpd.GeoDataframe with geometrical shapes of which to compute the length | ||
|
|
||
| Returns | ||
| ------- | ||
| series : a pandas series (column) with the great circle lengths of the | ||
| objects in metres. | ||
|
|
||
| See also | ||
| -------- | ||
| * dist_approx() which also offers haversine distance calculation options | ||
| between specific points (not along any geometries however). | ||
| * interpolation.interpolate_lines() | ||
|
|
||
| Note | ||
| ---- | ||
| This implementation relies on non-projected (i.e. geographic coordinate | ||
| systems that span the entire globe) crs only, which results in | ||
| sea-level distances and hence a certain (minor) level of distortion; cf. | ||
| https://gis.stackexchange.com/questions/176442/what-is-the-real-distance-between-positions | ||
| """ | ||
| # convert to non-projected crs if needed | ||
| gdf_tmp = gdf.to_crs(DEF_CRS) if not gdf.crs.is_geographic else gdf.copy() | ||
| geod = gdf_tmp.crs.get_geod() | ||
|
|
||
| return gdf_tmp.apply(lambda row: geod.geometry_length( | ||
| row.geometry), axis=1) | ||
|
|
||
|
|
||
| def get_gridcellarea(lat, resolution=0.5, unit='km2'): | ||
| """The area covered by a grid cell is calculated depending on the latitude | ||
| 1 degree = ONE_LAT_KM (111.12km at the equator) | ||
|
|
@@ -923,7 +1161,7 @@ def assign_coordinates(coords, coords_to_assign, method="NN", distance="haversin | |
| # assign remaining coordsinates to their geographically nearest neighbor | ||
| if threshold > 0 and exact_assign_idx.size != coords_view.size: | ||
| not_assigned_idx_mask = (assigned_idx == -1) | ||
| assigned_idx[not_assigned_idx_mask] = u_interp.interpol_index( | ||
| assigned_idx[not_assigned_idx_mask] = interpol_index( | ||
| coords_to_assign, coords[not_assigned_idx_mask], | ||
| method=method, distance=distance, threshold=threshold) | ||
| return assigned_idx | ||
|
|
||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new dependency. Is it absolutely necessary? Has this been discussed with @emanuel-schmid . Please, as a general convention, before adding new dependencies in a pull request discuss it openly to see whether a) it is really needed b) it is compatible with the environment c) does not add many extra implicit dependencies.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pyprojis not a new dependency for CLIMADA. It has been included in the requirements as an indirect dependency (through geopandas) for a long time and is used in several places in the code base. But maybe it's true that we should explicitly list it in the requirements file for the hypothetical case that geopandas might drop this dependency at some point in the future.Still, it's not a big thing that @Evelyn-M imports it here and we should probably have this discussion in a different place.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a full list of packages that are explicitly imported in CLIMADA, but not listed in the requirements files:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points! It seems to me that some of these requirements should really be explicitly listed in the requirements file.
I agree that the import in general does not necessarily need to be discussed here, but this is still exactly the goal of pull requests of this type: discuss the general architecture which includes the discussion about external dependencies.