Revision of Cate's Data Management
Related issue: https://github.com/CCI-Tools/cate-core/issues/153
This is the current `DataSource` API (of Cate v0.7):

    class DataSource(metaclass=ABCMeta):
        @abstractmethod
        def open_dataset(self,
                         time_range: Tuple[datetime, datetime] = None,
                         protocol: str = None) -> xr.Dataset:
            pass

        def sync(self,
                 time_range: Tuple[datetime, datetime] = None,
                 protocol: str = None,
                 monitor: Monitor = Monitor.NONE) -> Tuple[int, int]:
            pass

Here are some suggestions on how to improve the API and make it better suited to our current needs.
1. The most important conceptual change is to replace the implicit data source synchronization with the explicit creation of local data sources. To this end, the `DataSource` method `sync` shall be replaced by `make_local`, which shall receive a new parameter `local_id`. The general contract is to generate a new local data source from an existing one, which is usually remote but may also be another local one. The new local data source is implicitly added to the data store `local`.
2. It is no longer necessary to specify a `protocol` with `open_dataset` when there is no implicit synchronization. However, we'd like to add two new subset parameters:
   - `region`: a polygon-like geometry for spatial subsets. If `None`, we mean global.
   - `var_names`: a list of variable names. If `None`, we mean all.
3. The new `DataSource` method `make_local` should also have all three subset parameters:
   - `time_range`: a date range for temporal subsets. If `None`, we mean all.
   - `region`: a polygon-like geometry for spatial subsets. If `None`, we mean global.
   - `var_names`: a list of variable names. If `None`, we mean all.
4. We'd also like to use the new Like types for the subset parameters `time_range`, `region`, and `var_names`.
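
To illustrate what a Like type buys the caller, here is a minimal sketch of converters that accept several input forms and normalize them. The names `convert_date_range` and `convert_var_names`, the comma-separated string syntax, and the `YYYY-MM-DD` date format are assumptions made for this example, not Cate's actual Like-type implementation. `PolygonLike` is left out because geometry parsing would need an extra library.

```python
from datetime import datetime
from typing import List, Optional, Sequence, Tuple, Union

# Hypothetical aliases for illustration; the real Like types may differ.
DateLike = Union[str, datetime]
DateRangeLike = Union[None, str, Tuple[DateLike, DateLike]]
VarNamesLike = Union[None, str, Sequence[str]]


def _to_datetime(value: DateLike) -> datetime:
    """Pass a datetime through, or parse a date string such as '2001-01-31'."""
    if isinstance(value, datetime):
        return value
    return datetime.strptime(value.strip(), "%Y-%m-%d")


def convert_date_range(value: DateRangeLike) -> Optional[Tuple[datetime, datetime]]:
    """Normalize a date-range-like value; None means 'all times'."""
    if value is None:
        return None
    parts = value.split(",") if isinstance(value, str) else list(value)
    if len(parts) != 2:
        raise ValueError("a date range needs exactly a start and an end")
    start, end = _to_datetime(parts[0]), _to_datetime(parts[1])
    if start > end:
        raise ValueError("start of date range is after its end")
    return start, end


def convert_var_names(value: VarNamesLike) -> Optional[List[str]]:
    """Normalize a variable-names-like value; None means 'all variables'."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.split(",")
    names = [name.strip() for name in value if name.strip()]
    # An empty selection is treated like None (all variables) in this sketch.
    return names or None
```

With converters of this kind, `open_dataset` and `make_local` can accept either a string such as `"2001-01-01, 2001-12-31"` or a tuple of `datetime` objects and work with one normalized form internally.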
5. `open_dataset` and `make_local` should also work with vector data, not only `xarray` datasets and netCDF files. This may be implemented later.
6. Considering points 1 to 5, here is the new `DataSource` API:

    class DataSource(metaclass=ABCMeta):
        @abstractmethod
        def open_dataset(self,
                         time_range: DateRangeLike = None,
                         region: PolygonLike = None,
                         var_names: VarNamesLike = None) -> Any:
            pass

        @abstractmethod
        def make_local(self,
                       local_name: str,
                       time_range: DateRangeLike = None,
                       region: PolygonLike = None,
                       var_names: VarNamesLike = None,
                       protocol: Optional[str] = None,
                       monitor: Monitor = Monitor.NONE) -> 'DataSource':
            pass

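
The `make_local` contract can be sketched with a toy, in-memory data source. Everything here is an assumption made for illustration: `InMemoryDataSource`, the `LOCAL_STORE` dictionary standing in for the `local` data store, and the record layout are hypothetical. The `monitor` parameter is omitted and `region` is accepted but ignored to keep the example self-contained.

```python
from abc import ABCMeta, abstractmethod
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

# One record per time step: (timestamp, {variable name: value}).
Record = Tuple[datetime, Dict[str, float]]

# Hypothetical stand-in for the "local" data store: name -> data source.
LOCAL_STORE: Dict[str, "InMemoryDataSource"] = {}


class DataSource(metaclass=ABCMeta):
    @abstractmethod
    def open_dataset(self, time_range=None, region=None, var_names=None) -> Any:
        pass

    @abstractmethod
    def make_local(self, local_name: str, time_range=None, region=None,
                   var_names=None, protocol: Optional[str] = None) -> "DataSource":
        pass


class InMemoryDataSource(DataSource):
    """Toy data source; 'region' is accepted but ignored in this sketch."""

    def __init__(self, records: List[Record]):
        self._records = records

    def open_dataset(self, time_range=None, region=None, var_names=None) -> List[Record]:
        subset = []
        for time, values in self._records:
            # Temporal subset: keep records inside the inclusive range.
            if time_range is not None and not (time_range[0] <= time <= time_range[1]):
                continue
            # Variable subset: keep only the requested variables.
            if var_names is not None:
                values = {k: v for k, v in values.items() if k in var_names}
            subset.append((time, dict(values)))
        return subset

    def make_local(self, local_name, time_range=None, region=None,
                   var_names=None, protocol=None) -> "InMemoryDataSource":
        if local_name in LOCAL_STORE:
            raise ValueError("local data source already exists: " + local_name)
        local = InMemoryDataSource(self.open_dataset(time_range, region, var_names))
        LOCAL_STORE[local_name] = local  # implicit registration in the local store
        return local


remote = InMemoryDataSource([
    (datetime(2001, 1, 1), {"sst": 290.1, "sst_error": 0.2}),
    (datetime(2002, 1, 1), {"sst": 290.4, "sst_error": 0.3}),
])
local = remote.make_local("sst_2001",
                          time_range=(datetime(2001, 1, 1), datetime(2001, 12, 31)),
                          var_names=["sst"])
```

The source of `make_local` does not have to be remote: calling `local.make_local("sst_2001_copy")` on the result yields another local data source with the same content.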
For remote data sources:

- `open_dataset` always uses OPeNDAP.
- `make_local` can use OPeNDAP. If global coverage is requested, HTTP file download is much faster. If subset parameters are given, it should use OPeNDAP.

For local data sources:

- `open_dataset` opens the dataset as usual from local files. `xarray`/`geopandas` operations may be performed if subset parameters are given.
- `make_local` ignores the `protocol` parameter. `xarray`/`geopandas` operations may be performed if subset parameters are given, and new files are written. (Note that `make_local` for local data sources without subset parameters is just a file copy.)

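
The protocol notes above can be condensed into a small decision function. This is only a sketch: `choose_protocol`, the `is_remote` flag, and the string labels `"opendap"` and `"http"` are hypothetical, and it assumes that any subset parameter (not just `region`) counts as a subset request.

```python
from typing import Optional


def choose_protocol(is_remote: bool,
                    time_range=None,
                    region=None,
                    var_names=None) -> Optional[str]:
    """Pick a transfer protocol for make_local, following the notes above."""
    if not is_remote:
        # Local data sources ignore the protocol parameter entirely.
        return None
    has_subset = any(p is not None for p in (time_range, region, var_names))
    # Subsets are served via OPeNDAP; full copies use plain HTTP download.
    return "opendap" if has_subset else "http"
```

For example, `choose_protocol(True)` selects the HTTP file download, while `choose_protocol(True, var_names=["sst"])` selects OPeNDAP.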
7. Rename `LocalFilePatternDataStore` and `LocalFilePatternDataSource` to `LocalDataStore` and `LocalDataSource`. Our local data stores and sources are not bound to file patterns; they simply comprise local files.