Flexible Backend - AbstractDataStore definition  #4309

Closed
@aurghs

I'd like to give a short recap of the current proposals for refactoring the AbstractDataStore class, as discussed with @shoyer, @jhamman, and @alexamici.

Proposal 1: Store returns:

  • xr.Variables with the list of filters to apply to every variable
  • dataset attributes
  • encodings

Before building the xr.Dataset, xarray applies to each variable only the filters selected by the backend.
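As a rough illustration of Proposal 1, the sketch below has a store hand back each variable together with the names of the filters it wants, and leaves the actual filtering to the xarray side. Everything here is hypothetical: `Var` is a plain stand-in for xr.Variable so the example runs without xarray, and `FILTERS`, `store_output`, and `build_dataset` are made-up names, not existing xarray APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    """Hypothetical stand-in for xr.Variable."""
    values: list
    attrs: dict = field(default_factory=dict)


def mask_fill(var):
    """Replace the fill value with None."""
    fill = var.attrs.get("_FillValue")
    return Var([None if v == fill else v for v in var.values], dict(var.attrs))


def scale(var):
    """Multiply non-missing values by scale_factor."""
    factor = var.attrs.get("scale_factor", 1)
    return Var([None if v is None else v * factor for v in var.values], dict(var.attrs))


# Hypothetical registry of filters owned by xarray.
FILTERS = {"mask_fill": mask_fill, "scale": scale}


def store_output():
    """What a Proposal 1 store might return: variables plus requested filters."""
    variables = {
        "t": (Var([1, 2, -999], {"scale_factor": 0.5, "_FillValue": -999}),
              ["mask_fill", "scale"]),
        "id": (Var([7, 8, 9]), []),  # the backend asks for no filters here
    }
    return variables, {"title": "demo"}, {}


def build_dataset(variables, attrs, encoding):
    """xarray side: apply only the filters the backend selected, in order."""
    decoded = {}
    for name, (var, filter_names) in variables.items():
        for filter_name in filter_names:
            var = FILTERS[filter_name](var)
        decoded[name] = var
    return {"variables": decoded, "attrs": attrs, "encoding": encoding}


ds = build_dataset(*store_output())
```

Note that the registry is shared state between xarray and every backend, which is where the naming conflicts listed among the cons would come from.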

Proposal 2: Store returns:

  • xr.Variables with all needed filters applied (configured by xarray)
  • dataset attributes
  • encodings

Xarray builds the xr.Dataset

Proposal 3: Store returns:

  • xr.Dataset

Before going further, I'd like to collect pros and cons. As I understand it:

Proposal 1

pros:

  • the backend is free to decide which representation to provide.
  • more control on the backend side (not necessarily true: the backend can decide to apply all the filters internally and give xarray an empty list of filters to apply)
  • enable / disable filters logic would be in xarray.
  • all the filters (applied by xarray) should have a similar interface.
  • maybe registered filters could be used by other backends

cons:

  • confusing backend-xarray interface.
  • more difficult to define interfaces. More conflicts (registered filters with the same name...)
  • need more structure to define this interface, more code to maintain.

Proposal 2

pros:

  • the backend-xarray interface is clearer: the backend and xarray have distinct, well-defined tasks.
  • interface would be minimal and easier to implement
  • no intermediate representations
  • less code to maintain

cons:

  • less control over filters.
  • more complex explicit definition of the interface (every filter must understand what decode_times means in its own case)
  • more complexity inside the filters

The minimal interface would be something like this:

class AbstractBackEnd:
    def __init__(self, path, decode_times=True, **kwargs):
        # Same signature as open_dataset; the remaining decoding
        # options (elided here) are passed through **kwargs.
        raise NotImplementedError

    def get_variables(self):
        """Return a dictionary mapping variable names to xr.Variable objects."""
        raise NotImplementedError

    def get_attrs(self):
        """Return the dataset attributes."""
        raise NotImplementedError

    def get_encoding(self):
        """Return the encodings."""
        raise NotImplementedError

    def close(self):
        pass
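To show how a backend might fill in this interface under Proposal 2, here is a minimal sketch. `InMemoryBackEnd` and `open_dataset_sketch` are hypothetical names, plain lists and dicts stand in for xr.Variable and xr.Dataset so the example runs without xarray, and `decode_times` is reduced to turning integer day offsets into dates.

```python
from datetime import date, timedelta


class AbstractBackEnd:
    """Minimal interface, restated so the sketch is self-contained."""

    def get_variables(self):
        raise NotImplementedError

    def get_attrs(self):
        raise NotImplementedError

    def get_encoding(self):
        raise NotImplementedError

    def close(self):
        pass


class InMemoryBackEnd(AbstractBackEnd):
    """Hypothetical backend that decodes its own variables."""

    # Raw "file" content: time stored as days since a reference date.
    _RAW = {"time": [0, 1, 2], "temperature": [280.0, 281.5, 279.0]}
    _EPOCH = date(2000, 1, 1)

    def __init__(self, path, decode_times=True, **kwargs):
        self.path = path
        self.decode_times = decode_times

    def get_variables(self):
        variables = {name: list(values) for name, values in self._RAW.items()}
        if self.decode_times:
            # The backend, not xarray, decides what decode_times means here.
            variables["time"] = [
                self._EPOCH + timedelta(days=d) for d in self._RAW["time"]
            ]
        return variables

    def get_attrs(self):
        return {"source": self.path}

    def get_encoding(self):
        return {"time": {"units": "days since 2000-01-01"}}


def open_dataset_sketch(path, backend_cls, **kwargs):
    """Stand-in for the xarray side: forward options, then assemble the result."""
    store = backend_cls(path, **kwargs)
    try:
        return {
            "variables": store.get_variables(),
            "attrs": store.get_attrs(),
            "encoding": store.get_encoding(),
        }
    finally:
        store.close()


decoded = open_dataset_sketch("demo.nc", InMemoryBackEnd)
raw = open_dataset_sketch("demo.nc", InMemoryBackEnd, decode_times=False)
```

The xarray side only forwards keyword arguments and assembles the result, which is what keeps the interface small; the cost is that each backend has to give its own meaning to every decoding option.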

Proposal 3

pros w.r.t. proposal 2:

  • decode_coordinates is handled by the backend, like the other filters.
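To make the difference from Proposal 2 concrete, the sketch below has the backend decide which variables are coordinates and return the assembled dataset itself. It assumes the CF-style convention of a `coordinates` attribute listing coordinate variable names; `DatasetSketch` and `open_store` are hypothetical stand-ins, not xarray APIs.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for xr.Dataset."""
    data_vars: dict
    coords: dict
    attrs: dict = field(default_factory=dict)


def open_store(raw_variables, attrs, decode_coordinates=True):
    """Hypothetical Proposal 3 entry point: the backend returns a dataset.

    raw_variables maps a name to a (values, variable_attrs) pair.
    """
    coord_names = set()
    if decode_coordinates:
        for _, var_attrs in raw_variables.values():
            # CF-style attribute: a space-separated list of coordinate names.
            coord_names.update(var_attrs.get("coordinates", "").split())

    data_vars, coords = {}, {}
    for name, (values, _) in raw_variables.items():
        target = coords if name in coord_names else data_vars
        target[name] = values
    return DatasetSketch(data_vars, coords, dict(attrs))


raw = {
    "temperature": ([280.0, 281.5], {"coordinates": "lat lon"}),
    "lat": ([10.0, 20.0], {}),
    "lon": ([5.0, 6.0], {}),
}
ds = open_store(raw, {"title": "demo"})
undecoded = open_store(raw, {"title": "demo"}, decode_coordinates=False)
```

With `decode_coordinates=False`, every variable stays in `data_vars`, so the option behaves like any other filter that the backend switches on or off.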

cons?

Any suggestions?
