Update gshhs/gshhg data to GMT 5.x and GSHHG 2.3.5 #311
Previously, only readboundaries_shp.py actually wrote data in a format that basemap could read. It looks like readboundaries.py has not actually been used to produce boundary data in a _very_ long time. Instead, it's been produced from the GSHHS shapefiles.
New GSHHS versions simplify long national/state borders that are perfectly linear in geographic space. However, basemap assumes that these lines are already subsampled, and does the interpolation in between the two points in geographic space. This leads to extreme errors for long borders with only two points (e.g. the US/Canada border). Old GSHHS versions stored interpolated versions, so this is an issue that didn't come up in the past. To get around this, we'll interpolate long linear segments and store the interpolated versions.
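The workaround described above can be sketched roughly as follows. This is a minimal illustration of the idea, not basemap's actual code; the function name `densify_segment`, the `spacing` parameter, and the sample coordinates are all assumptions:

```python
import numpy as np

def densify_segment(lon0, lat0, lon1, lat1, spacing):
    """Subsample a long two-point segment linearly in geographic space.

    Assumes the segment has nonzero length. Interpolated points are
    stored so basemap never has to bridge a long straight border with
    only its two endpoints.
    """
    dist = np.hypot(lon1 - lon0, lat1 - lat0)
    # Fractional positions along the segment (endpoint handled below).
    t = np.arange(0, dist, spacing) / dist
    lons = lon0 + t * (lon1 - lon0)
    lats = lat0 + t * (lat1 - lat0)
    # Append the true endpoint so the segment terminates exactly.
    return np.append(lons, lon1), np.append(lats, lat1)

# e.g. a long straight border stored as only two points:
lons, lats = densify_segment(-123.0, 49.0, -95.0, 49.0, spacing=1.0)
```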
A note on filesize: Previously the … Also, the diffs are rather hard to look at, as they include the data files, but only files in …
This is great work, Joe! I'll give it all a spin once I figure out the travel paperwork. I'll owe you a beer at the next SciPy!
The script …
Similar error in … and …
At first glance it looks like these are being caused by lines with identical start and end points (and only two points in the line). I'll add a bit of logic to filter those out and update things tonight. Sorry that slipped through!
run …
Seems to fix all of the ones that were throwing exceptions, but …
Yeah, I'm not sure about that one... I'll dig into it more here in a bit. Definitely seems odd!
Joe, at this point, I am ready to simply accept this as-is because it is a major improvement from before. I am also thinking that the issues we are seeing in hires.py aren't a regression, but may have been there before. Could you do a squash rebase on this PR before I merge, please?
Just did a test, and the spurious lines are not a regression. I also tried running hires.py with resolution set to 'h' to see if the spurious lines persist, and I get an exception: … In fact, this error is raised for any resolution besides 'f'... weird.
```
1) Download the shapefile formatted versions of GSHHS and extract them in the
   "utils" directory. The data is available at:
   https://www.soest.hawaii.edu/pwessel/gshhg/
   (Last updated with: ftp://ftp.soest.hawaii.edu/gshhg/gshhg-shp-2.3.5-1.zip)
```
Just noticed a new version of this file on the ftp server. I am going to try using this new process and see if I can get updated data files to work.
```python
x = [0, dist[i]]
new_x = np.arange(0, dist[i], spacing)
out_lon.extend(np.interp(new_x, x, lons[i:i+2]))
out_lat.extend(np.interp(new_x, x, lats[i:i+2]))
```
I am wondering if the logic here is faulty. The arange() call will never include dist[i] in its output, so then it will have a fairly significant jump to the next spot.
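The endpoint behavior in question is easy to demonstrate. This is a standalone illustration of the concern (the `dist` and `spacing` values are made up), not a patch to the script:

```python
import numpy as np

# np.arange excludes the stop value, so the samples stop short of the
# segment endpoint.
dist = 10.0    # assumed segment length
spacing = 3.0  # assumed sample spacing

new_x = np.arange(0, dist, spacing)
print(new_x)   # [0. 3. 6. 9.] -- 10.0 itself is never included

# The gap between the last sample (9.0) and the endpoint (10.0) can be
# anywhere from ~0 up to a full `spacing`, hence the "jump" noted above.
# One common alternative is np.linspace, which includes both endpoints
# and spreads the samples evenly:
n = int(np.ceil(dist / spacing)) + 1
even_x = np.linspace(0, dist, n)
print(even_x)  # [ 0.   2.5  5.   7.5 10. ]
```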
```python
out_lat.extend(np.interp(new_x, x, lats[i:i+2]))
# ...
out_lon.append(lons[-1])
out_lat.append(lats[-1])
```
Is it ok that in some cases, the line will have the last point twice at the end (no interpolation), and in other cases, it will have an interpolated point and then the last point?
On a lark, I turned off drawing of rivers in …
```python
# not including them here...
for level in range(1, 6):
    poly, polymeta = get_wdb_boundaries(resolution, level, rivers=True)
    offset = 0
```
Sneaky little bugger... Apparently this script has never been used for processing and packaging basemap's data. The offset keeps getting reset for each level in this script; that isn't a problem in the other readboundary.py script because it doesn't have different river levels in the dump.
If one just moves this offset above the loop, the data produced by readboundary_shp.py works perfectly for examples/hires.py at all resolutions!
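The shape of that fix can be sketched as follows. The real script writes binary dumps; here a stub stands in for `get_wdb_boundaries` just to show the offset bookkeeping, and all the surrounding names (`index`, the stub's return values) are illustrative:

```python
# Stub: pretend each river level has two polylines of 3 and 5 points.
def get_wdb_boundaries(resolution, level, rivers=True):
    return [[(0, 0)] * 3, [(0, 0)] * 5], [{"level": level}] * 2

index = []
offset = 0                      # initialized ONCE, before the level loop
for level in range(1, 6):
    poly, polymeta = get_wdb_boundaries("h", level, rivers=True)
    for meta, pts in zip(polymeta, poly):
        # Record where this polyline starts in the concatenated dump.
        index.append((meta["level"], offset, len(pts)))
        offset += len(pts)

# Offsets now accumulate across all 5 levels instead of resetting to 0
# at the top of each level, which corrupted the index.
print(offset)  # 40  (5 levels * (3 + 5) points)
```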
I'll create a new branch that has all the code work from @joferkington in a commit and credited to him as well, and a redo of the data packaging in a separate commit.
PR #320 has been merged, so I'll close this one.
This PR updates the coastline, river, and political boundary data to the latest version of GSHHS/GSHHG.
A few notes:
Based on the metadata that was stored, it's obvious that the previous data was from the shapefile versions of GSHHS and not the "raw" GMT version. Therefore, I've generated this version from the shapefiles as well, and changed the README to note that that's the preferred route. However, I've also fixed the GMT scripts so that GMT can now be used to generate the data as well.
One side effect of this is that GMT is no longer required to update the data in most cases. Only the shapefiles (a separate download) are necessary. GMT is still required to update the landmasks, but they do not change frequently at the resolution they're stored at and should only need very infrequent updates.
@WeatherGod - Finally managed to get all of this working. I've tried to keep the edits as minimal as possible, so the scripts are still a bit crufty, but they work now.