Add pygmt.gmtread to read a dataset/grid/image into pandas.DataFrame/xarray.DataArray #3673
```diff
@@ -172,6 +172,7 @@ Input/output
     :toctree: generated
 
     load_dataarray
+    read
 
 GMT Defaults
 ------------
```
@@ -0,0 +1,6 @@

```python
"""
PyGMT input/output (I/O) utilities.
"""

from pygmt.io.gmtread import gmtread
from pygmt.io.load_dataarray import load_dataarray
```
@@ -0,0 +1,125 @@

```python
"""
Read a file into an appropriate object.
"""

from collections.abc import Mapping, Sequence
from pathlib import PurePath
from typing import Any, Literal

import pandas as pd
import xarray as xr
from pygmt.clib import Session
from pygmt.helpers import build_arg_list, is_nonstr_iter
from pygmt.src.which import which


def gmtread(
    file: str | PurePath,
    kind: Literal["dataset", "grid", "image"],
```
Review thread on the `kind` line:

- Does GMT read also handle 'cube'?
- Yes (xref: https://github.com/GenericMappingTools/gmt/blob/9a8769f905c2b55cf62ed57cd0c21e40c00b3560/src/gmtread.c#L75-L81), but we need to wait for #3150, which may have upstream bugs.
```python
    region: Sequence[float] | str | None = None,
    header: int | None = None,
    column_names: pd.Index | None = None,
    dtype: type | Mapping[Any, type] | None = None,
    index_col: str | int | None = None,
) -> pd.DataFrame | xr.DataArray:
```
Review thread on lines +16 to +24:

- On second thought, I'm thinking if we should make
- It seems the
- Yes, not needed for grids/images, but we could still use
```python
    """
    Read a dataset, grid, or image from a file and return the appropriate object.

    The returned object is a :class:`pandas.DataFrame` for datasets, and
    :class:`xarray.DataArray` for grids and images.

    For datasets, keyword arguments ``column_names``, ``header``, ``dtype``, and
    ``index_col`` are supported.

    Parameters
    ----------
    file
        The file name to read.
    kind
        The kind of data to read. Valid values are ``"dataset"``, ``"grid"``, and
        ``"image"``.
    region
        The region of interest. Only data within this region will be read.
    column_names
        A list of column names.
    header
        Row number containing column names. ``header=None`` means not to parse the
        column names from the table header. Ignored if the row number is larger than
        the number of headers in the table.
    dtype
        Data type. Can be a single type for all columns or a dictionary mapping
        column names to types.
    index_col
        Column to set as index.
```
Review thread on lines +43 to +53:

- Should we indicate in the docstring that these params are only used for
- At line 31:
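The `header` rule documented above can be illustrated without a GMT session. This is a minimal pure-Python sketch, not PyGMT's implementation: `column_names_from_header` is a hypothetical helper, and it assumes zero-based row numbers, header records that may start with `#`, and whitespace-separated names.

```python
from __future__ import annotations


def column_names_from_header(headers: list[str], header: int | None) -> list[str] | None:
    """Pick column names from one table-header record (hypothetical helper)."""
    # header=None: do not parse column names from the table header.
    if header is None:
        return None
    # Out-of-range row numbers are ignored. Assumes zero-based indexing, so
    # header == len(headers) is already past the last header record.
    if header < 0 or header >= len(headers):
        return None
    # Assumption: a header record may carry a leading "#"; strip it, then split.
    return headers[header].lstrip("#").split()


headers = ["# Hotspot locations", "# lon lat z name"]
print(column_names_from_header(headers, 1))     # ['lon', 'lat', 'z', 'name']
print(column_names_from_header(headers, None))  # None
print(column_names_from_header(headers, 5))     # None
```

Returning `None` rather than raising mirrors the docstring's "ignored" wording, leaving the caller to fall back to default column labels.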
```python
    Returns
    -------
    data
        Return type depends on the ``kind`` argument:

        - ``"dataset"``: :class:`pandas.DataFrame`
        - ``"grid"`` or ``"image"``: :class:`xarray.DataArray`

    Examples
    --------
    Read a dataset into a :class:`pandas.DataFrame` object:

    >>> from pygmt import gmtread
    >>> df = gmtread("@hotspots.txt", kind="dataset")
    >>> type(df)
    <class 'pandas.core.frame.DataFrame'>

    Read a grid into an :class:`xarray.DataArray` object:

    >>> dataarray = gmtread("@earth_relief_01d", kind="grid")
    >>> type(dataarray)
    <class 'xarray.core.dataarray.DataArray'>

    Read an image into an :class:`xarray.DataArray` object:

    >>> image = gmtread("@earth_day_01d", kind="image")
    >>> type(image)
    <class 'xarray.core.dataarray.DataArray'>
    """
    if kind not in {"dataset", "grid", "image"}:
        msg = f"Invalid kind '{kind}': must be one of 'dataset', 'grid', or 'image'."
        raise ValueError(msg)

    if kind != "dataset" and any(
        v is not None for v in [column_names, header, dtype, index_col]
    ):
        msg = (
            "Only the 'dataset' kind supports the 'column_names', 'header', 'dtype', "
            "and 'index_col' arguments."
        )
        raise ValueError(msg)

    kwdict = {
        "R": "/".join(f"{v}" for v in region) if is_nonstr_iter(region) else region,  # type: ignore[union-attr]
        "T": {"dataset": "d", "grid": "g", "image": "i"}[kind],
    }

    with Session() as lib:
        with lib.virtualfile_out(kind=kind) as voutfile:
            lib.call_module(
                module="read", args=[file, voutfile, *build_arg_list(kwdict)]
            )

            match kind:
                case "dataset":
                    return lib.virtualfile_to_dataset(
                        vfname=voutfile,
                        column_names=column_names,
                        header=header,
                        dtype=dtype,
                        index_col=index_col,
                    )
                case "grid" | "image":
                    raster = lib.virtualfile_to_raster(vfname=voutfile, kind=kind)
                    # Add "source" encoding
                    source = which(fname=file)
                    raster.encoding["source"] = (
                        source[0] if isinstance(source, list) else source
                    )
                    _ = raster.gmt  # Load GMTDataArray accessor information
                    return raster
```
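To make concrete what the `read` module receives, here is a self-contained sketch that mirrors the `kwdict` construction above. It is an approximation under stated assumptions: `read_module_args` and the `"<vfile>"` placeholder are hypothetical (the real code passes the virtual file name from `lib.virtualfile_out` and formats options with `build_arg_list`, whose exact ordering and escaping are not reproduced here), and `is_nonstr_iter` is a simplified stand-in for the PyGMT helper.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence

# Same kind-to-flag mapping as the "T" entry of kwdict in gmtread.
KIND_FLAGS = {"dataset": "d", "grid": "g", "image": "i"}


def is_nonstr_iter(value: object) -> bool:
    """Simplified stand-in: True for iterables that are not strings."""
    return isinstance(value, Iterable) and not isinstance(value, str)


def read_module_args(
    file: str,
    outfile: str,
    kind: str,
    region: Sequence[float] | str | None = None,
) -> list[str]:
    """Hypothetical sketch of the argument list gmtread hands to GMT's read module."""
    # A sequence such as [0, 10, -5, 5] becomes "0/10/-5/5"; a string passes through.
    region_arg = "/".join(f"{v}" for v in region) if is_nonstr_iter(region) else region
    args = [file, outfile]
    if region_arg is not None:  # region=None means no -R option at all
        args.append(f"-R{region_arg}")
    args.append(f"-T{KIND_FLAGS[kind]}")
    return args


print(read_module_args("@hotspots.txt", "<vfile>", "dataset", region=[0, 10, -5, 5]))
# ['@hotspots.txt', '<vfile>', '-R0/10/-5/5', '-Td']
print(read_module_args("@earth_relief_01d", "<vfile>", "grid", region="g"))
# ['@earth_relief_01d', '<vfile>', '-Rg', '-Tg']
```

Passing `region` as either a sequence or a ready-made GMT string (like `"g"`) keeps the Python signature flexible while the module always sees a single `-R` token.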
@@ -0,0 +1,61 @@

```python
"""
Test the gmtread function.
"""

import importlib.util

import numpy as np
import pytest
import xarray as xr
from pygmt import gmtread, which

# Import rioxarray optionally so that this module still loads when it is not
# installed; the image test below is skipped in that case.
try:
    import rioxarray
except ImportError:
    rioxarray = None

_HAS_NETCDF4 = bool(importlib.util.find_spec("netCDF4"))
_HAS_RIORASTERIO = bool(importlib.util.find_spec("rioxarray"))


@pytest.mark.skipif(not _HAS_NETCDF4, reason="netCDF4 is not installed.")
def test_io_gmtread_grid():
    """
    Test that reading a grid returns an xr.DataArray and the grid is the same as the one
    loaded via xarray.load_dataarray.
    """
    grid = gmtread("@static_earth_relief.nc", kind="grid")
    assert isinstance(grid, xr.DataArray)
    expected_grid = xr.load_dataarray(which("@static_earth_relief.nc", download="a"))
    assert np.allclose(grid, expected_grid)
```
Review thread on lines +17 to +26:

- Also should have a similar test for
- Done in a6c4ee7. When I tried to add a test for reading datasets, I realized that the DataFrame returned by the
  The last column
  We have three options:
  I'm inclined to option 3.
- Agree with this. We should also add dtype-related checks for the tabular dataset tests in
```python
@pytest.mark.skipif(not _HAS_RIORASTERIO, reason="rioxarray is not installed.")
def test_io_gmtread_image():
    """
    Test that reading an image returns an xr.DataArray.
    """
    image = gmtread("@earth_day_01d", kind="image")
    assert isinstance(image, xr.DataArray)
    with rioxarray.open_rasterio(
        which("@earth_day_01d", download="a")
    ) as expected_image:
        assert np.allclose(image, expected_image)


def test_io_gmtread_invalid_kind():
    """
    Test that an invalid kind raises a ValueError.
    """
    with pytest.raises(ValueError, match="Invalid kind"):
        gmtread("file.cpt", kind="cpt")


def test_io_gmtread_invalid_arguments():
    """
    Test that invalid arguments raise a ValueError for non-'dataset' kind.
    """
    with pytest.raises(ValueError, match="Only the 'dataset' kind supports"):
        gmtread("file.nc", kind="grid", column_names="foo")

    with pytest.raises(ValueError, match="Only the 'dataset' kind supports"):
        gmtread("file.nc", kind="grid", header=1)

    with pytest.raises(ValueError, match="Only the 'dataset' kind supports"):
        gmtread("file.nc", kind="grid", dtype="float")
```
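The two `ValueError` checks run before any GMT session is opened, so they can be exercised in isolation. The sketch below copies them into a hypothetical `check_gmtread_args` helper (plus a hypothetical `error_message` wrapper) and covers `index_col`, the one dataset-only argument the tests above do not pass. In the real suite one would more likely add `gmtread("file.nc", kind="grid", index_col=0)` under `pytest.raises`; this version only shows the rule without PyGMT installed.

```python
from __future__ import annotations

from typing import Any

VALID_KINDS = ("dataset", "grid", "image")


def check_gmtread_args(
    kind: str,
    column_names: Any = None,
    header: int | None = None,
    dtype: Any = None,
    index_col: str | int | None = None,
) -> None:
    """Hypothetical standalone copy of the argument checks at the top of gmtread."""
    if kind not in VALID_KINDS:
        msg = f"Invalid kind '{kind}': must be one of 'dataset', 'grid', or 'image'."
        raise ValueError(msg)
    # The four table-only arguments must stay None for grids and images.
    if kind != "dataset" and any(
        v is not None for v in [column_names, header, dtype, index_col]
    ):
        msg = (
            "Only the 'dataset' kind supports the 'column_names', 'header', 'dtype', "
            "and 'index_col' arguments."
        )
        raise ValueError(msg)


def error_message(kind: str, **kwargs: Any) -> str | None:
    """Return the ValueError message raised by check_gmtread_args, or None."""
    try:
        check_gmtread_args(kind, **kwargs)
    except ValueError as err:
        return str(err)
    return None


print(error_message("grid", index_col=0))     # dataset-only argument on a grid
print(error_message("dataset", index_col=0))  # None
print(error_message("cpt"))                   # invalid kind
```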
The `load_dataarray` function was put under the `pygmt.io` namespace. Should we consider putting `read` under `pygmt.io` too? (Thinking about whether we need a low-level `pygmt.clib.read` and a high-level `pygmt.io.read` in my other comment.)
Yes, that sounds good. I have two questions:

1. `read` source code in `pygmt/io.py`, or restructure `io.py` into a directory and put it in `pygmt/io/read.py` instead?
2. `load_dataarray` function in favor of the new `read` function?

I'm expecting to have a `write` function that writes a pandas.DataFrame/xarray.DataArray into a tabular/netCDF file.

GMT.jl also wraps the `read` module (xref: https://www.generic-mapping-tools.org/GMTjl_doc/documentation/utilities/gmtread/). The differences are:

- `gmtread`, which I think is better since `read` is a little too general.
- `GMTVector`, `GMTGrid`. [This doesn't work in PyGMT]
I think making the `io` directory sounds good, especially if you're planning on making a `write` function in the future.

No, let's keep `load_dataarray` for now. Something I'm contemplating is to make an xarray BackendEntrypoint that uses GMT `read`, so that users can then do `pygmt.io.load_dataarray(..., engine="gmtread")` or something like that. The `load_dataarray` function would use this new `gmtread` backend engine by default instead of `netcdf4`.