Xarray backends or storage APIs #6

I think we should discuss whether to use xarray as the common interface for all of the benchmarks we evaluate as part of this project. There are pros and cons either way. I bring this up because I noticed the direct use of h5py in #4.

Pros include: 
1. Xarray provides a common interface on which we can build real-world science problems without writing a custom interface to each storage API (that's what xarray does).
1. Within Pangeo, we are promoting the use of high-level data structures (typically xarray, but Iris as well).

Cons include:
1. There are some known performance problems with the xarray backends. Some of them are not specific to any storage format and can potentially be sidestepped by using the lower-level storage API directly.
1. Using xarray assumes that each backend is implemented in a fair and equivalent way. An incomplete or poorly performing implementation could bias our results against one backend.

My vote would be to use xarray until we find we need more fine-grained tests. I think this will make implementing real-world workflows easier, and it will help us xarray developers understand the chokepoints in the backends we currently support.
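
To make the trade-off concrete, here is a rough sketch of how one benchmark harness could time the same workload through xarray and, later if needed, through a lower-level API such as h5py. The helper names (`best_time`, `xarray_case`, `h5py_case`), the file path, and the variable name are hypothetical, not anything that exists in this repo. It also assumes the file is HDF5-based (netCDF4) so that h5py can read it at all.

```python
import time


def best_time(run, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of `run`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best


def xarray_case(path, engine, variable):
    """Build a workload that opens `path` through xarray and reduces one variable."""
    def run():
        import xarray as xr  # imported lazily so the harness itself has no hard dependency
        with xr.open_dataset(path, engine=engine) as ds:
            return float(ds[variable].mean())
    return run


def h5py_case(path, variable):
    """Build the same reduction against the raw HDF5 dataset, bypassing xarray."""
    def run():
        import h5py
        with h5py.File(path, "r") as f:
            # Reads the stored values as-is; no CF decoding is applied here.
            return float(f[variable][...].mean())
    return run


if __name__ == "__main__":
    # Hypothetical inputs: adjust the path and variable to a real test file.
    path, variable = "example.nc", "temperature"
    cases = {
        "xarray/netcdf4": xarray_case(path, "netcdf4", variable),
        "xarray/h5netcdf": xarray_case(path, "h5netcdf", variable),
        "h5py (direct)": h5py_case(path, variable),
    }
    for name, run in cases.items():
        print(f"{name}: {best_time(run):.4f} s")
```

Note that the two paths are not automatically equivalent: xarray decodes things like fill values and scale/offset by default, while the h5py path reads the raw stored values. That is the kind of fairness issue the second con refers to, so any direct-API case would need to be checked against the xarray result before its timings are compared.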
