WIP: Add functionality to virtualize GeoTIFFs using async_tiff #524

maxrjones · 2025-04-01T01:04:10Z

This PR is a new attempt to refactor the TIFFVirtualBackend to use async_tiff (closes #291) (would supersede #295, #297, #292)

TomNicholas

This looks really nice already - ManifestStore paying off!

pyproject.toml

TomNicholas · 2025-04-01T16:09:39Z

virtualizarr/readers/tiff.py


-from xarray import Dataset, Index
+import dataclasses


This doesn't seem to be used in this file? (and the linter should have detected that...)

TomNicholas · 2025-04-01T16:10:42Z

virtualizarr/readers/common.py

+@dataclasses.dataclass
+class ZstdProperties:
+    level: int
+
+
+@dataclasses.dataclass
+class ShuffleProperties:
+    elementsize: int
+
+
+@dataclasses.dataclass
+class ZlibProperties:
+    level: int
+
+
+class CFCodec(TypedDict):
+    target_dtype: np.dtype
+    codec: Codec


This stuff probably deserves to be in a dedicated codecs.py file (which I think we already have?)

TomNicholas · 2025-04-01T16:38:06Z

virtualizarr/readers/tiff.py

+        virtual_backend_kwargs: Optional[dict] = None,
+        reader_options: Optional[dict] = None,
+    ) -> xr.Dataset:
+        raise NotImplementedError


Once I've merged #522 I think we will be able to enable this reader immediately.

TomNicholas · 2025-04-01T16:39:27Z

virtualizarr/readers/tiff.py

+        newargs = object_store.__getnewargs_ex__()
+        at_store = ATStore(*newargs[0], **newargs[1])


deserves a comment to explain whatever this is!
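
For context on the pattern in the diff above: `__getnewargs_ex__` is part of Python's pickle protocol and returns an `(args, kwargs)` pair describing the arguments needed to recreate the object. The sketch below shows how that pair can be reused to build an equivalent object of a different class. `SourceStore`, `TargetStore`, and `convert_store` are hypothetical stand-ins, not the obstore or async_tiff classes, and the sketch assumes both constructors accept the same arguments.

```python
class SourceStore:
    """Hypothetical store that exposes its constructor arguments for pickling."""

    def __init__(self, bucket, *, region="us-west-2"):
        self.bucket = bucket
        self.region = region

    def __getnewargs_ex__(self):
        # Pickle protocol hook: returns (positional args, keyword args).
        return (self.bucket,), {"region": self.region}


class TargetStore:
    """Hypothetical second store type with the same constructor signature."""

    def __init__(self, bucket, *, region="us-west-2"):
        self.bucket = bucket
        self.region = region


def convert_store(source, target_cls):
    # Reuse the pickling arguments of `source` to construct `target_cls`.
    args, kwargs = source.__getnewargs_ex__()
    return target_cls(*args, **kwargs)


converted = convert_store(SourceStore("my-bucket", region="eu-west-1"), TargetStore)
print(type(converted).__name__, converted.bucket, converted.region)
```

This only works when the target constructor understands the same positional and keyword arguments as the source, which is the kind of assumption an inline comment could spell out.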

TomNicholas · 2025-04-01T16:48:35Z

virtualizarr/readers/tiff.py

+        if not ifd.tile_height or not ifd.tile_width:
+            raise NotImplementedError(
+                f"TIFF reader currently only supports tiled TIFFs, but {path} has no internal tiling."
+            )


Possibly ignorant question, but can't we represent a non-tiled TIFF as a single virtual chunk?

maxrjones · Apr 1, 2025

There are three categories:

- no chunks (yes, would map to a single virtual chunk)
- tiled TIFFs (as implemented)
- striped TIFFs (basically only chunked along the y dimension)

None of these are hard to represent as a manifest array; it's just a matter of finding and using the proper tags to determine the chunk structure.

rabernat · Apr 2, 2025

Do the striped TIFFs have compression applied independently along each line?

mdsumner · Apr 2, 2025

Yes, striped as single lines or in a group, same as any other tiling.
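
The three layouts discussed in this thread can be sketched as a single function that picks a chunk shape. This is an illustration under stated assumptions, not the reader's implementation: `chunk_shape` and the `rows_per_strip` parameter are hypothetical names, while `tile_height`, `tile_width`, `image_height`, and `image_width` mirror the attributes in the diff. It relies on the TIFF convention that an unset or very large RowsPerStrip means the whole image is one strip.

```python
def chunk_shape(image_height, image_width,
                tile_height=None, tile_width=None, rows_per_strip=None):
    """Return a (y, x) chunk shape for a 2D TIFF image (hypothetical helper)."""
    if tile_height and tile_width:
        # Tiled TIFF: chunked along both dimensions.
        return (tile_height, tile_width)
    if rows_per_strip and rows_per_strip < image_height:
        # Striped TIFF: chunked along y only, each strip spans the full width.
        return (rows_per_strip, image_width)
    # No internal chunking: the whole image is a single virtual chunk.
    return (image_height, image_width)


print(chunk_shape(100, 200, tile_height=16, tile_width=32))
print(chunk_shape(100, 200, rows_per_strip=8))
print(chunk_shape(100, 200))
```

One caveat for the striped case: the final strip may hold fewer rows than RowsPerStrip, so a reader would need to account for a short last chunk.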

TomNicholas · 2025-04-01T16:48:56Z

virtualizarr/readers/tiff.py

+        chunks = (ifd.tile_height, ifd.tile_height)
+        shape = (ifd.image_height, ifd.image_width)


This is so neat
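
To make the mapping from tiles to virtual chunks concrete, here is a rough sketch of deriving a chunk grid and a per-chunk byte-range table from tile metadata. It assumes a chunk shape of `(tile_height, tile_width)` and that tile offsets and byte counts are listed in row-major order (left to right, top to bottom), as in the TIFF TileOffsets and TileByteCounts tags. `build_manifest` and the dictionary layout are hypothetical and only illustrate the idea.

```python
import math


def build_manifest(path, shape, chunks, offsets, byte_counts):
    """Map row-major tile byte ranges to "row.col" chunk keys (hypothetical helper)."""
    # Number of chunks along each axis, counting partial edge chunks.
    grid = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
    n_rows, n_cols = grid
    if len(offsets) != n_rows * n_cols or len(byte_counts) != len(offsets):
        raise ValueError("tile metadata does not match the chunk grid")

    manifest = {}
    for index, (offset, length) in enumerate(zip(offsets, byte_counts)):
        # Convert the flat row-major tile index to a (row, col) position.
        row, col = divmod(index, n_cols)
        manifest[f"{row}.{col}"] = {"path": path, "offset": offset, "length": length}
    return grid, manifest


grid, manifest = build_manifest(
    "s3://bucket/example.tif",  # placeholder path
    shape=(25, 53),
    chunks=(16, 16),
    offsets=[1000 + 100 * i for i in range(8)],
    byte_counts=[100] * 8,
)
print(grid)
print(manifest["1.2"])
```

Each entry records where one compressed tile lives in the file, so no pixel data needs to be copied to describe the array.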

TomNicholas · 2025-04-01T16:49:50Z

virtualizarr/readers/tiff.py

+        codec_configs = [
+            numcodec_config_to_configurable(codec.get_config()) for codec in codecs
+        ]
+        dimension_names = ("y", "x")  # Following rioxarray's behavior


Probably worth noting this in a public docstring somewhere

TomNicholas · 2025-04-01T16:52:28Z

virtualizarr/tests/test_readers/conftest.py

+    with xr.tutorial.open_dataset("air_temperature") as ds:
+        ds.isel(time=0).rio.to_raster(filepath, driver="COG", COMPRESS="DEFLATE")


I like that you've been able to test this without adding data to the repo

TomNicholas · 2025-04-01T16:53:16Z

virtualizarr/tests/test_readers/test_tiff.py

+    assert isinstance(ds, xr.Dataset)
+    expected = rioxarray.open_rasterio(geotiff_file).data.squeeze()
+    observed = ds["0"].data.squeeze()
+    np.testing.assert_allclose(observed, expected)


Can you not use xarray.testing.assert_allclose?

Co-authored-by: Tom Nicholas <tom@earthmover.io>

maxrjones added 15 commits March 31, 2025 17:27

Add deps

20f23d0

Add option for dimension names

d3d2a1d

Move filter dataclasses to common

ed07300

Add test

18af38f

Add importorskip for tests

35aae28

Start on reader refactor

ffd17f7

Merge branch 'develop' into TIFF

6125624

Update typing

d65ea92

Consolidate test fixtures

7410544

Properly extract chunk keys for arrays with a single chunk

c3dbf27

Update test to specify chunk_key_encoding

68e7560

Specify compression in test

7ef0087

Add float64 support

5d55779

Merge branch 'improve-chunk-key-parsing' into TIFF

ec629a7

Update test

203cec9

maxrjones temporarily deployed to test-release April 1, 2025 01:04 — with GitHub Actions Inactive

abarciauskas-bgse mentioned this pull request Apr 1, 2025

Add VirtualiZarr reader for async-tiff NASA-IMPACT/veda-odd#88

Closed

TomNicholas added enhancement New feature or request readers labels Apr 1, 2025

TomNicholas reviewed Apr 1, 2025

Update pyproject.toml

6069ac7

Co-authored-by: Tom Nicholas <tom@earthmover.io>

maxrjones temporarily deployed to test-release April 1, 2025 17:06 — with GitHub Actions Inactive

maxrjones mentioned this pull request Apr 2, 2025

Tiff test error #526

Open

Merge branch 'develop' into TIFF

e8775a8

maxrjones had a problem deploying to test-release April 5, 2025 19:40 — with GitHub Actions Failure

sharkinsspatial mentioned this pull request Apr 7, 2025

No imagecodecs registered in zarr-python. #534

Open

maxrjones mentioned this pull request Apr 14, 2025

Improve virtualization of GeoTIFFs/COGs NASA-IMPACT/veda-odd#142

Open

3 tasks
