AttributeError during unpickling DataContainer: '_allow_compute' attribute missing #239

Open
@n0rdp0l

Describe the bug
I am encountering an AttributeError when unpickling a DataContainer object from the xeofs package. The error message is:

AttributeError: 'DataContainer' object has no attribute '_allow_compute'

This occurs when I attempt to unpickle a dictionary containing a DataContainer instance using dill. It seems that during the unpickling process, the __init__ method is not called, so the _allow_compute attribute is not initialized. As a result, when __setitem__ is called during unpickling, it tries to access _allow_compute, which doesn't exist.

Reproducible Minimal Working Example

import dill
from xeofs.data_container.data_container import DataContainer

# Create a DataContainer object and add an item
data_container = DataContainer()
data_container['key'] = 'value'

# Serialize the DataContainer object
with open('data_container.pkl', 'wb') as f:
    dill.dump(data_container, f)

# Deserialize the DataContainer object
with open('data_container.pkl', 'rb') as f:
    loaded_data_container = dill.load(f)  # This line raises the AttributeError

Traceback

  File ".../bwcore/forecasting/forecaster.py", line 76, in load_forecast_dict
    forecast_dict = dill.load(f)
  File ".../site-packages/dill/_dill.py", line 297, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File ".../site-packages/dill/_dill.py", line 452, in load
    obj = StockUnpickler.load(self)
  File ".../site-packages/xeofs/data_container/data_container.py", line 24, in __setitem__
    self._allow_compute[__key] = self._allow_compute.get(__key, True)
AttributeError: 'DataContainer' object has no attribute '_allow_compute'

Expected behavior
I expect the DataContainer object to be successfully unpickled without errors, with all its attributes properly initialized.

Desktop

  • OS: Ubuntu 24.04 LTS aarch64
  • Python: 3.10
  • xeofs version: 3.0.2

Additional context
I believe the issue arises because __init__ is not called during unpickling, so any attributes initialized there (like _allow_compute) are missing. I came across this explanation which mentions that __init__ is not called during unpickling.
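
To illustrate the mechanism without xeofs or dill, here is a minimal sketch using the standard library's pickle module. The Container class is a hypothetical stand-in that only mirrors the relevant parts of DataContainer. It relies on the fact that, for dict subclasses, pickle restores the stored items (through __setitem__) before it restores the instance attributes, and never calls __init__. I assume dill behaves the same way here, since the traceback shows it delegating to the stock unpickler.

```python
import pickle


class Container(dict):
    """Hypothetical stand-in for DataContainer, not the xeofs class itself."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._allow_compute = {k: True for k in self.keys()}

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # During unpickling this runs before the instance __dict__ is
        # restored, so _allow_compute does not exist yet.
        self._allow_compute[key] = self._allow_compute.get(key, True)


container = Container()
container["key"] = "value"
payload = pickle.dumps(container)

try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc

print(type(error).__name__, error)
```

If this sketch matches what happens inside xeofs, the Python version should not matter, because the ordering of item restoration and state restoration is part of how pickle handles dict subclasses.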

To work around this, I monkey-patched the __setitem__ method of the DataContainer class so that it checks whether _allow_compute exists and initializes it if necessary:

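# Assumption: DataArray refers to xarray's DataArray
from xarray import DataArray


class DataContainer(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Every key present at construction time is computable by default
        self._allow_compute = {k: True for k in self.keys()}

    def __setitem__(self, __key: str, __value: DataArray) -> None:
        super().__setitem__(__key, __value)
        # __init__ is skipped during unpickling, so _allow_compute may not
        # exist yet when items are restored; create it on demand
        if not hasattr(self, '_allow_compute'):
            self._allow_compute = {}
        self._allow_compute[__key] = self._allow_compute.get(__key, True)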

With this change, the unpickling process completes without errors.
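
As a sanity check of the workaround, the sketch below round-trips a hypothetical stand-in class (GuardedContainer, not the real DataContainer) through the standard library's pickle. It also sets one flag to False to check that non-default values in _allow_compute survive: the guard creates a temporary dictionary while items are restored, and the pickled instance state then replaces it.

```python
import pickle


class GuardedContainer(dict):
    """Hypothetical stand-in mirroring the patched __setitem__ above."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._allow_compute = {k: True for k in self.keys()}

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # Guard: the attribute is missing while pickle restores the items
        if not hasattr(self, "_allow_compute"):
            self._allow_compute = {}
        self._allow_compute[key] = self._allow_compute.get(key, True)


original = GuardedContainer()
original["a"] = 1
original["b"] = 2
original._allow_compute["b"] = False  # simulate a non-default flag

restored = pickle.loads(pickle.dumps(original))
print(dict(restored), restored._allow_compute)
```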

I'm not entirely sure if this issue is due to something on my side, since my pipeline uses Python 3.10, and perhaps the pickling behavior differs in this version. If this is indeed a bug, perhaps implementing __getstate__ and __setstate__ methods in the DataContainer class could ensure that _allow_compute is properly handled during pickling and unpickling.
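
One caveat on that idea: as far as I can tell, __setstate__ alone would not help, because pickle calls __setitem__ for the stored items before it applies the state. An alternative that avoids the guard is a custom __reduce__ that rebuilds the object through __init__. The sketch below uses a hypothetical ReducibleContainer class with the standard library's pickle; it is not the xeofs implementation, and the real class may carry more state than shown here.

```python
import pickle


class ReducibleContainer(dict):
    """Hypothetical stand-in; the real DataContainer may need more state."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._allow_compute = {k: True for k in self.keys()}

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._allow_compute[key] = self._allow_compute.get(key, True)

    def __reduce__(self):
        # Rebuild via __init__ from a plain dict copy of the items, then let
        # pickle restore the instance attributes from the third element.
        return (self.__class__, (dict(self),), self.__dict__.copy())


original = ReducibleContainer()
original["a"] = 1
original._allow_compute["a"] = False  # simulate a non-default flag

restored = pickle.loads(pickle.dumps(original))
print(dict(restored), restored._allow_compute)
```

Because the items are passed to the constructor instead of being replayed one by one, __setitem__ is never called on a half-initialized object.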

Could you please look into this issue? Any guidance or confirmation on whether this is a known issue or if there's a better way to handle it would be appreciated. If you need any more information let me know :)

Labels: bug (Something isn't working)