
ValueError when loading a 'bad' set of variables from a view #19

Open
@robin-cls


Hi,

I encountered an issue when loading a subset of variables from a view:

[Image: screenshot of the error]

This error occurs when the selected variables come from both the view and its reference and do not share the same dimensions. I think the problem is that two datasets (one from the view, one from the reference) are loaded and then merged, and the merge fails when their dimensions do not match.
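To make the suspected failure mode concrete, here is a toy sketch that does not use zcollection at all. `ToyDataset`, `strict_merge` and `lenient_merge` are hypothetical names invented for illustration, and the dimension names and sizes are made up. It only models my guess (a merge that insists on identical dimensions) and may not match what the library actually does internally.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset (not a zcollection class)."""
    # Maps a dimension name to its size.
    dimensions: dict[str, int]
    # Maps a variable name to the tuple of dimension names it uses.
    variables: dict[str, tuple[str, ...]] = field(default_factory=dict)


def strict_merge(left: ToyDataset, right: ToyDataset) -> ToyDataset:
    """Merge that requires both datasets to declare the same dimensions."""
    if left.dimensions != right.dimensions:
        raise ValueError(
            f'dimensions mismatch: {left.dimensions} != {right.dimensions}')
    return ToyDataset(dict(left.dimensions),
                      {**left.variables, **right.variables})


def lenient_merge(left: ToyDataset, right: ToyDataset) -> ToyDataset:
    """Merge that only requires shared dimensions to agree on their size."""
    for name in left.dimensions.keys() & right.dimensions.keys():
        if left.dimensions[name] != right.dimensions[name]:
            raise ValueError(f'conflicting size for dimension {name!r}')
    return ToyDataset({**left.dimensions, **right.dimensions},
                      {**left.variables, **right.variables})


# Assumed shapes: a 1D variable from the reference, a 2D one from the view.
from_reference = ToyDataset({'num_lines': 10}, {'time': ('num_lines',)})
from_view = ToyDataset({'num_lines': 10, 'num_pixels': 5},
                       {'var_view': ('num_lines', 'num_pixels')})
```

With these two toy datasets, `strict_merge(from_reference, from_view)` raises a `ValueError`, while `lenient_merge(from_reference, from_view)` returns a dataset holding both variables and the union of the dimensions. If my guess is right, a fix might go in that direction.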

Below is a simple example to reproduce the issue. The core of the example is that we load a 1D variable from the reference and a 2D variable from the view:

from __future__ import annotations

import dask.distributed as dist
import fsspec

import zcollection as zc
import zcollection.tests.data as zc_data

# Create collection
zds = next(zc_data.create_test_dataset_with_fillvalue())
fs = fsspec.filesystem('memory')
cluster = dist.LocalCluster(processes=False)
client = dist.Client(cluster)
partition_handler = zc.partitioning.Date(('time', ), resolution='M')
collection = zc.create_collection('time', zds, partition_handler, '/my_collection', filesystem=fs)
collection.insert(zds)

# Create view
new_var = zds.metadata().variables['var1']
new_var_config = new_var.get_config()
new_var_config['name'] = 'var_view'
new_var = new_var.from_config(new_var_config)
view = zc.create_view('/my_view', zc.ViewReference(collection.partition_properties.dir, filesystem=fs), filesystem=fs)
view.add_variable(new_var)

# Query everything, this works properly
view.load()

# Query a 'bad' set of variables -> ValueError
view.load(selected_variables=['time', 'var_view'])
