Description of the desired feature
As a user, I'd like to be able to pass in any type of data into a PyGMT function in a consistent way, be it:
- a `str` type filename
- a `numpy.ndarray`
- a `pandas.DataFrame` matrix
- a `geopandas.GeoDataFrame` (Integration with geopandas to plot shapely geometries #608)
- an `xarray.DataArray` grid, or even 1d arrays!
- etc
Current state
We have a not-so-nice `if... then` block that currently handles the Input/Output (I/O) of data into PyGMT. This results in an inconsistent API. E.g. `fig.plot` requires `numpy.ndarray` inputs while `pygmt.grdtrack` requires `pandas.DataFrame` inputs. Users (e.g. at https://forum.generic-mapping-tools.org/t/pygmt-plot-errorbars) should not have to figure out which PyData format is needed.
Originally posted by @weiji14 in #946 (comment)
I was going to create a `virtualfile_from_data()` function to replace this common block of code in our functions:
```python
with Session() as lib:
    # Choose how data will be passed in to the module
    if kind == "file":
        file_context = dummy_context(data)
    elif kind == "matrix":
        file_context = lib.virtualfile_from_matrix(data)
    elif kind == "vectors":
        file_context = lib.virtualfile_from_vectors(...)
```
Future state
The idea is to have a `virtualfile_from_data()` function (open to other names) that wraps around the current `dummy_context`/`virtualfile_from_matrix`/`virtualfile_from_vectors`/`virtualfile_from_grid` functions, which makes things easier for both end users and developers wrapping new modules that handle data.
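To make the idea concrete, here is a rough sketch of how a single entry point could hide the branching, using only the standard library. Everything except the name `virtualfile_from_data` and `dummy_context` is an assumption made up for illustration: `guess_kind` and its duck-typing rules are hypothetical, and `stub_virtualfile` is a placeholder for the real `virtualfile_from_*` methods on the session object.

```python
import contextlib
import os


@contextlib.contextmanager
def dummy_context(arg):
    """Yield a file name unchanged, so files look like any other input."""
    yield arg


@contextlib.contextmanager
def stub_virtualfile(kind, data):
    """Hypothetical placeholder for the virtualfile_from_* methods."""
    yield f"<virtual {kind}>"


def guess_kind(data):
    """Classify an input by duck typing (illustrative rules only)."""
    if isinstance(data, (str, os.PathLike)):
        return "file"
    if hasattr(data, "dims"):  # e.g. an xarray.DataArray-like grid
        return "grid"
    if hasattr(data, "columns") or isinstance(data, dict):  # named columns
        return "vectors"
    return "matrix"  # fall back to treating the input as a 2D table


def virtualfile_from_data(data):
    """Return one context manager, whatever the input type is."""
    kind = guess_kind(data)
    if kind == "file":
        return dummy_context(data)
    return stub_virtualfile(kind, data)
```

A wrapped module would then only need `with virtualfile_from_data(data) as fname:` and could pass `fname` along, without repeating the `if`/`elif` chain in every function.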
This universal `virtualfile_from_data()` may pave the way to support more data inputs in the future:
- `geopandas.GeoDataFrame` inputs, useful for `fig.plot` (Integration with geopandas to plot shapely geometries #608)
- ObsPy `Trace`/`Stream`/`Inventory` objects (ObsPy integration #967)
- `io.StringIO`, useful for `fig.legend` (RFC Allow for io.StringIO inputs to certain modules #576)
- `pint` arrays? (Allow passing in a column to plot for error_bars #735)
- other OGR/GDAL formats?
- etc
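As one example of what supporting a new input type might involve, the `io.StringIO` case could be handled by spilling the buffer to a temporary file and handing the path on. This is only a sketch of one possible approach, not what #576 proposes; the helper name `tempfile_from_stringio` is hypothetical and the buffer contents below are stand-in text.

```python
import contextlib
import io
import os
import tempfile


@contextlib.contextmanager
def tempfile_from_stringio(buffer):
    """Write an io.StringIO to a temporary file and yield its path.

    The file is removed again when the with-block exits.
    """
    tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
    try:
        with tmp:  # close the handle before anyone else opens the file
            tmp.write(buffer.getvalue())
        yield tmp.name
    finally:
        os.remove(tmp.name)


# Usage: the caller sees an ordinary file name.
spec = io.StringIO("first line\nsecond line\n")
with tempfile_from_stringio(spec) as path:
    with open(path) as handle:
        contents = handle.read()
```

Because the helper is a context manager that yields a file name, it would slot into the same `with ... as fname:` pattern as the other input kinds.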
TLDR: Be able to throw any sort of data into a PyGMT function and it will 'just work'.
Note that `xarray` is in the process of refactoring to an APIv2 (see pydata/xarray#4309) that has some similarities with this issue. They call it a 'Flexible Backend', a sort of extensible plugin-type system that is a bit more future-proof and allows for new data formats (e.g. GPU-backed arrays) to come in.
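For a feel of what a plugin-type system could mean here, the sketch below uses `functools.singledispatch` from the standard library so that handlers for new types can be registered without editing the central function. This is not how xarray implements its backends, and the names `open_input`, `_open_filename` and `_open_stringio` are hypothetical; it only illustrates the registration pattern.

```python
import contextlib
import functools
import io


@functools.singledispatch
def open_input(data):
    """Fallback for input types nobody has registered a handler for."""
    raise TypeError(f"No handler registered for {type(data).__name__}")


@open_input.register(str)
@contextlib.contextmanager
def _open_filename(data):
    # File names need no conversion.
    yield data


# A separate package could add support for a new type later,
# without touching open_input itself.
@open_input.register(io.StringIO)
@contextlib.contextmanager
def _open_stringio(data):
    # Placeholder behaviour: a real handler would pass the text on to GMT.
    yield data.getvalue().splitlines()
```

The trade-off compared with a plain `if`/`elif` chain is that dispatch is keyed on the type of the first argument only, so inputs that are distinguished by shape rather than type (e.g. 1d versus 2d arrays) would still need their own check inside a handler.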
Are you willing to help implement and maintain this feature? Yes, but help is very welcome
Update: The `virtualfile_from_data` function has been created in #961. People are welcome to refactor some of the plotting/data processing functions in https://github.com/GenericMappingTools/pygmt/tree/master/pygmt/src to use this (e.g. `contour`, `meca`, etc).
TODO:
- contour (Expand table-like input options for Figure.contour #1531)
- grdtrack (Refactor grdtrack to use virtualfile_from_data and improve i/o to pandas.DataFrame #1189)
- meca (Rewrite the meca function to support offsetting and labeling beachballs #1784)
- surface
- text (Refactor text to use virtualfile_from_data #1121)
- grdview/grdimage (Refactor grdview and grdimage to use virtualfile_from_data #1988)