-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Optional Read-Only RasterIO backend #790
Comments
👍 Rasterio shines at reading georeferencing metadata out of any file, and I guess it would be no big deal to treat the various info as attributes in an xarray dataset. It is also possible to do lazy reading out of rasterio files. (example with a geotiff file: https://github.com/fmaussion/salem/blob/master/salem/datasets.py#L263) |
Thanks @fmaussion. This was a helpful illustration of how this could be done. The Pros:
Cons:
|
Hi @jhamman , I tend to agree with your doubts. I'll still comment on your cons: To (1): I also think that xarray should avoid opening the projection can of worms. But the minimum things that xarray could do with rasterio is to read corner coordinates, dx and dy and define the two coordinates "x" and "y" out of it, without taking care of whether these are meters, degrees of arc or whatever. As long at the other rasterio file attributes are available as attribute of the To (2): some geotiffs files also have more than one band. I don't know if these bands are named or have metadata, so maybe xarray will have to take decisions about these names too (most probably 1, 2, 3...). I'll add a (3): rasterio depends on GDAL, which is huge and every now and then causes trouble on conda. This might also cause troubles to the continuous integration of xarray Altogether this might be more complicated than worth it, but maybe the rasterio folks have interest in this and might provide more support. If the idea for xarray accessors is implemented (#706 (comment)) this will allow more specific libraries like mine to do their own rasterio support at low cost. |
As for (1), I like your idea of leaving out the projection of the coordinates. That certainly makes things easier from the perspective of the backend. A I'm not concerned about the GDAL dependency (3). I would love to see more robust conda support for GDAL but that's another issue. This would be an optional backend, similar to Pynio, which isn't broadly available on conda. We could sort out the CI issues. So, if we took the simplest approach for implementing a |
@fmaussion - Here's an example of the basic functionality I'm thinking of implementing: https://anaconda.org/jhamman/rasterio_to_xarray/notebook A things to think about:
|
Hi @jhamman , this is close to how I would've done it, but I am maybe not the most qualified (probably the gis specialists from rasterio would be more helpful). But still, a couple of remarks from my side:
To your questions:
|
As for 1) I'm open to having more discussion on decoding the coordinates. My contention here is that are useful, even in their unstructured format, since it permits visualization out of the box. I'll ping @perrygeo for more on this.
|
@fmaussion @jhamman re: projecting coordinates to lat-lng. If you consider the raster cells as independent points, you can project them independently but they will likely not be regularly spaced. With few exceptions, if you need to maintain a regular grid, transforming data between projections will alter the shape of the array and require resampling (GDAL and rasterio call the process "warping" to reflect this). There are decisions and tradeoffs to be considered with the various resampling methods, selecting new extents and cell sizes, etc so it's typically not something you want to do on-the-fly for analyses. I think keeping the xarray coordinates as generic cartesian x-y makes sense, at least initially. Even in many GIS tools, analysis is done on a naive 2D plane and it's assumed that the inputs are of the same projection. I'd recommend doing any reprojection outside of xarray as a pre-processing step (with e.g. |
Because each point can be computed separately, we could straightforwardly add latitude/longitude as lazily computed 2D arrays (under "coordinates"), similarly to how we currently handle on-the-fly data rescaling. |
@shoyer - that's what I was thinking too. In fact, that's more or less what I did in this example, although this is a eager implementation: https://anaconda.org/jhamman/rasterio_to_xarray/notebook |
@jhamman Small observation from your notebook: It's not good to assume a negative y-step size. Rarely, I will come across a dataset that breaks convention with a positive y coordinate, meaning the first pixel is the lower-left corner, but at least the dataset is self-consistent. Rasterio works beautifully even with these black sheep, so we don't want an xarray reader to force the assumption. |
In a past life I made side library that wraps rasterio's API to take and return My most common use case was reading disparate rasters and aligning them to the same grid:
Reprojecting or clipping after reading xarray, like I do, goes against @perrygeo's recommendation. So maybe my example is moot, but I really like being able to do this programmatically in python, not CLI. Even if xarray's new rasterio backend only provides a reader (and not However, if you both expose the transform and realize the coordinate variables, it's possible for them to diverge as the single source of truth. In my above workflow, anytime I clip (step 2) or warp (step 3) data, my side library needed to manually re-set that DataArray's transform and coordinate variables. (This is surely out of scope for rasterio or xarray!) |
Thanks for the comments @IamJeffG. I haven't had any time recently to mess around with this so I haven't made any progress since the original notebook.
Agreed. My notebook was just a quick example of how this could work and it would certainly benefit from some generalization when applying this as an xarray backend.
Interesting. Any chance that's available for public viewing?
I only want to expose the reader and the necessary metadata to use the georeferenced dataset. Warping and other projection transformations would need to be handled downstream. |
@IamJeffG thanks for the example. I didn't realize you'd already been integrating xarray with rasterio already. Is that library open source?
To clarify, I just want to make sure that clipping/reprojecting/resampling remains an explicit step. That's a great approach you outlined, I just wouldn't want any software to make those assumptions for me!
Agreed. We've been doing more testing on this topic and found that rasterio generally works as expected for positive-y rasters. But there are still some built-in assumptions about negative-y rasters that cause spectacular failures. It's still a work in progress... |
Alas no, and in fact I don't even have access anymore; it's with my last employer. I believe we'll benefit from the fresh look as you guys are already doing! Mostly I aimed to point out the value of exposing the affine transform, in addition to the data & coordinate variables. 👍 Agreed to keep warping explicit, and not deal with it in this issue. |
RasterIO is a GDAL based library that provides Fast and direct raster I/O for use with Numpy and SciPy. I've just used it a bit but have been generally impressed with its support for a range of ASCII and binary raster formats. It might be a nice addition to the suite of backends already available in
xarray
.I'm envisioning a functionality akin to what we provide in the PyNIO backend, which is to say, read-only support for which ever file types RasterIO supports.
The text was updated successfully, but these errors were encountered: