Skip to content

Conversation

@joferkington
Copy link
Contributor

This PR updates the coastline, river, and political boundary data to the latest version of GSHHS/GSHHG.

A few notes:

Based on the metadata that was stored, it's obvious that the previous data was from the shapefile versions of GSHHS and not the "raw" GMT version. Therefore, I've generated this version from the shapefiles as well, and changed the README to note that that's the preferred route. However, I've also fixed the GMT scripts so that GMT can now be used to generate the data as well.

One side effect of this is that GMT is no longer required to update the data in most cases. Only the shapefiles (a separate download) are necessary. GMT is still required to update the landmasks, but they do not change frequently at the resolution they're stored at and should only need very infrequent updates.

@WeatherGod - Finally managed to get all of this working. I've tried to keep it to as minimal edits as possible, so the scripts are still a bit crufty, but they work now.

Previously, only readboundaries_shp.py actually wrote data in a format
that basemap could read.  It looks like readboundaries.py has not
actually been used to produce boundary data in a _very_ long time.
Instead, it's been produced from the GSHHS shapefiles.
New GSHHS versions simplify long national/state borders that are
perfectly linear in geographic space.  However, basemap assumes that
these lines are already subsampled, and does the interpolation in
between the two points in geographic space.  This leads to extreme
errors for long borders with only two points (e.g. the US/Canada
border). Old GSHHS versions stored interpolated versions, so this is an
issue that didn't come up in the past.

To get around this, we'll interpolate long linear segments and store the
interpolated versions.
@joferkington
Copy link
Contributor Author

joferkington commented Jul 18, 2016

A note on filesize: Previously the data folder was ~186MB. After these updates, it's ~189MB. Overall, the data is 3MB larger, though I think that just reflects new data/detail added to GSHHG.

Also, the diffs are rather hard to look at, as they include the data files, but only files in utils and lib/mpl_toolkits/basemap/data have changed (i.e. only the data itself and the scripts to update the data).

@WeatherGod
Copy link
Member

This is great work, Joe! I'll give it all a spin once I figure out the travel paperwork. I'll owe you a beer at the next SciPy!

@WeatherGod
Copy link
Member

The script examples/nytolondon.py fails with the following:

plotting Great Circle from New York to London (Mercator)
plotting Great Circle from New York to London (Gnomonic)
Traceback (most recent call last):
  File "/rd22/scratch/broot/Programs/basemap/examples/nytolondon.py", line 78, in <module>
    m.drawcountries(color='#e0e0e0')
  File "/rd22/scratch/broot/centos6/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 1880, in drawcountries
    self.cntrysegs, types = self._readboundarydata('countries')
  File "/rd22/scratch/broot/centos6/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 1379, in _readboundarydata
    if not poly.is_valid(): poly=poly.fix()
  File "_geoslib.pyx", line 242, in _geoslib.BaseGeometry.fix (src/_geoslib.c:2279)
  File "_geoslib.pyx", line 346, in _geoslib.Polygon.__init__ (src/_geoslib.c:3609)
IndexError: index -1 is out of bounds for axis 0 with size 0

@WeatherGod
Copy link
Member

Similar error in examples/test.py:

min/max etopo20 data:
-9026.625 6228.8125
/nas/home/broot/centos6/lib/python2.7/site-packages/numpy-1.12.0.dev0+e62aa19-py2.7-linux-x86_64.egg/numpy/ma/core.py:3113: FutureWarning: Currently, slicing will try to return a view of the data, but will return a copy of the mask. In the future, it will try to return both as views.
  FutureWarning
plotting Cylindrical Equidistant example ...
+a=6378137.0 +b=6356752.31425 +lon_0=0.0 +proj=eqc +units=m 
plotting Miller Cylindrical example ...
+lon_0=0.0 +y_0=14675034.4036 +R=6370997.0 +proj=mill +x_0=-0.0 +units=m 
plotting Gall Stereographic Cylindrical example ...
+lon_0=0.0 +y_0=10875972.1816 +R=6370997.0 +proj=gall +x_0=-0.0 +units=m 
plotting Cylindrical Equal Area example ...
+lon_0=0.0 +lat_ts=30.0 +R=6370997.0 +proj=cea +x_0=17333565.4622 +units=m +y_0=7356593.66591 
plotting Mercator example ...
+lon_0=0.0 +lat_ts=20.0 +R=6370997.0 +proj=merc +x_0=-0.0 +units=m +y_0=12138729.5033 
plotting Cassini-Soldner example ...
+lon_0=-2.0 +y_0=548282.112852 +R=6370997.0 +proj=cass +x_0=291666.49891 +units=m +lat_0=54.0 
Traceback (most recent call last):
  File "/rd22/scratch/broot/Programs/basemap/examples/test.py", line 145, in <module>
    m.drawcountries()
  File "/rd22/scratch/broot/centos6/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 1880, in drawcountries
    self.cntrysegs, types = self._readboundarydata('countries')
  File "/rd22/scratch/broot/centos6/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 1379, in _readboundarydata
    if not poly.is_valid(): poly=poly.fix()
  File "_geoslib.pyx", line 242, in _geoslib.BaseGeometry.fix (src/_geoslib.c:2279)
  File "_geoslib.pyx", line 346, in _geoslib.Polygon.__init__ (src/_geoslib.c:3609)
IndexError: index -1 is out of bounds for axis 0 with size 0

@WeatherGod
Copy link
Member

and examples/setwh.py:

Traceback (most recent call last):
  File "/rd22/scratch/broot/Programs/basemap/examples/setwh.py", line 29, in <module>
    m.drawcountries()
  File "/rd22/scratch/broot/centos6/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 1880, in drawcountries
    self.cntrysegs, types = self._readboundarydata('countries')
  File "/rd22/scratch/broot/centos6/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 1379, in _readboundarydata
    if not poly.is_valid(): poly=poly.fix()
  File "_geoslib.pyx", line 242, in _geoslib.BaseGeometry.fix (src/_geoslib.c:2279)
  File "_geoslib.pyx", line 346, in _geoslib.Polygon.__init__ (src/_geoslib.c:3609)
IndexError: index -1 is out of bounds for axis 0 with size 0

@joferkington
Copy link
Contributor Author

At first glance it looks like these are being caused by lines with identical start and end points (and only two points in the line). I'll add a bit of logic to filter those out and update things tonight. Sorry that slipped through!

@WeatherGod
Copy link
Member

run examples/hires.py. There seems to be spurious lines or something?

@WeatherGod
Copy link
Member

Seems to fix all of the ones that were throwing exceptions, but hires.py still looks messed up.

@joferkington
Copy link
Contributor Author

Yeah, I'm not sure about that one... I'll dig into more here in a bit. Definitely seems odd!

@WeatherGod
Copy link
Member

Joe, at this point, I am ready to simply accept this as-is because it is a major improvement from before. I am also thinking that the issues we are seeing in hires.py isn't a regression, but may have been there before. Could you please do a squash rebase on this PR before I merge, please?

@WeatherGod
Copy link
Member

Just did a test, and the spurious lines are not a regression. I also tried running hires.py with resolution set to 'h' to see if the spurious lines persists, and I get an exception:

[centos6] [broot@rd22 basemap]$ python examples/hires.py
Traceback (most recent call last):
  File "examples/hires.py", line 18, in <module>
    m.drawrivers()
  File "/nas/home/broot/centos6/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 2010, in drawrivers
    self.riversegs, types = self._readboundarydata('rivers')
  File "/nas/home/broot/centos6/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 1379, in _readboundarydata
    if not poly.is_valid(): poly=poly.fix()
  File "_geoslib.pyx", line 242, in _geoslib.BaseGeometry.fix (src/_geoslib.c:2279)
  File "_geoslib.pyx", line 346, in _geoslib.Polygon.__init__ (src/_geoslib.c:3609)
IndexError: index -1 is out of bounds for axis 0 with size 0

In fact, this error is raised for any resolution besides 'f'... weird.

1) Download the shapefile formatted versions of GSHHS and extract them in the
"utils" directory. The data is available at:
https://www.soest.hawaii.edu/pwessel/gshhg/
(Last updated with: ftp://ftp.soest.hawaii.edu/gshhg/gshhg-shp-2.3.5-1.zip)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed a new version of this file on the ftp server. I am going to try using this new process and see if I can get updated data files to work.

x = [0, dist[i]]
new_x = np.arange(0, dist[i], spacing)
out_lon.extend(np.interp(new_x, x, lons[i:i+2]))
out_lat.extend(np.interp(new_x, x, lats[i:i+2]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if the logic here is faulty. The arange() call will never include dist[i] in its output, so then it will have a fairly significant jump to the next spot.

out_lat.extend(np.interp(new_x, x, lats[i:i+2]))

out_lon.append(lons[-1])
out_lat.append(lats[-1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok that in some cases, the line will have the last point twice at the end (no interpolation), and in other cases, it will have an interpolated point and then the last point?

@WeatherGod
Copy link
Member

On a lark, I turned off drawing of rivers in hires.py, and the artifacts are completely gone! So, the faulty data is strictly the river data. When left alone (full resolution), they have weird spurious data which is probably getting filtered out at the lower resolutions, but the filtering might be introducing data corruption, perhaps?

# not including them here...
for level in range(1, 6):
poly, polymeta = get_wdb_boundaries(resolution,level,rivers=True)
offset = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sneaky little bugger... Apparently this script has never been used for processing and packaging basemap's data. The offset keeps getting reset for each level in this script, but it doesn't do that for the other readboundary.py script because it doesn't have different river levels in the dump.

If one just moves this offset to above the loop, the data produced by readboundary_shp.py works perfectly for examples/hires.py at all resolutions!

I'll create a new branch that has all the code work from @joferkington in a commit and credited to him as well, and a redo of the data packaging in a separate commit.

@WeatherGod WeatherGod mentioned this pull request Sep 23, 2016
@WeatherGod
Copy link
Member

PR #320 has been merged, so I'll close this one.

@WeatherGod WeatherGod closed this Sep 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants