-
Notifications
You must be signed in to change notification settings - Fork 39
WIP: Add functionality to virtualize GeoTIFFs using async_tiff #524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really nice already - ManifestStore
paying off!
|
||
from xarray import Dataset, Index | ||
import dataclasses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be used in this file? (and the linter should have detected that...)
virtualizarr/readers/common.py
Outdated
@dataclasses.dataclass | ||
class ZstdProperties: | ||
level: int | ||
|
||
|
||
@dataclasses.dataclass | ||
class ShuffleProperties: | ||
elementsize: int | ||
|
||
|
||
@dataclasses.dataclass | ||
class ZlibProperties: | ||
level: int | ||
|
||
|
||
class CFCodec(TypedDict): | ||
target_dtype: np.dtype | ||
codec: Codec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This stuff probably deserves to be in a dedicated codecs.py
file (which I think we already have?
virtual_backend_kwargs: Optional[dict] = None, | ||
reader_options: Optional[dict] = None, | ||
) -> xr.Dataset: | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once I've merged #522 I think we will be able to enable this reader immediately.
newargs = object_store.__getnewargs_ex__() | ||
at_store = ATStore(*newargs[0], **newargs[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deserves a comment to explain whatever this is!
if not ifd.tile_height or not ifd.tile_width: | ||
raise NotImplementedError( | ||
f"TIFF reader currently only supports tiled TIFFs, but {path} has no internal tiling." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly ignorant question, but can't we represent a non-tiled TIFF as a single virtual chunk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are three categories:
- no chunks (yes, would map to a single virtual chunk)
- tiled TIFFs (as implemented)
- striped TIFFs (basically only chunked along the
y
dimension
None of these are hard to represent as a manifest array, but it's just a matter of finding/using the proper tags to determine the chunk structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the striped TIFFs have compression applied independently along each line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, striped as single lines or in a group, same as any other tiling
chunks = (ifd.tile_height, ifd.tile_height) | ||
shape = (ifd.image_height, ifd.image_width) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so neat
codec_configs = [ | ||
numcodec_config_to_configurable(codec.get_config()) for codec in codecs | ||
] | ||
dimension_names = ("y", "x") # Folllowing rioxarray's behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth noting this in a public docstring somewhere
with xr.tutorial.open_dataset("air_temperature") as ds: | ||
ds.isel(time=0).rio.to_raster(filepath, driver="COG", COMPRESS="DEFLATE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that you've been able to test this without adding data to the repo
assert isinstance(ds, xr.Dataset) | ||
expected = rioxarray.open_rasterio(geotiff_file).data.squeeze() | ||
observed = ds["0"].data.squeeze() | ||
np.testing.assert_allclose(observed, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you not use xarray.testing.assert_allclose
?
Co-authored-by: Tom Nicholas <tom@earthmover.io>
This PR is a new attempt to refactor the TIFFVirtualBackend to use async_tiff (closes #291) (would supersede #295, #297, #292)
docs/releases.rst
api.rst