Description
Describe the bug
I am encountering an AttributeError when unpickling a DataContainer object from the xeofs package. The error message is:
AttributeError: 'DataContainer' object has no attribute '_allow_compute'
This occurs when I attempt to unpickle a dictionary containing a DataContainer
instance using dill. It seems that during the unpickling process, the __init__
method is not called, so the _allow_compute
attribute is not initialized. As a result, when __setitem__
is called during unpickling, it tries to access _allow_compute
, which doesn't exist.
Reproducible Minimal Working Example
import dill
from xeofs.data_container.data_container import DataContainer
# Create a DataContainer object and add an item
data_container = DataContainer()
data_container['key'] = 'value'
# Serialize the DataContainer object
with open('data_container.pkl', 'wb') as f:
dill.dump(data_container, f)
# Deserialize the DataContainer object
with open('data_container.pkl', 'rb') as f:
loaded_data_container = dill.load(f) # This line raises the AttributeError
Traceback
File ".../bwcore/forecasting/forecaster.py", line 76, in load_forecast_dict
forecast_dict = dill.load(f)
File ".../site-packages/dill/_dill.py", line 297, in load
return Unpickler(file, ignore=ignore, **kwds).load()
File ".../site-packages/dill/_dill.py", line 452, in load
obj = StockUnpickler.load(self)
File ".../site-packages/xeofs/data_container/data_container.py", line 24, in __setitem__
self._allow_compute[__key] = self._allow_compute.get(__key, True)
AttributeError: 'DataContainer' object has no attribute '_allow_compute'
Expected behavior
I expect the DataContainer
object to be successfully unpickled without errors, with all its attributes properly initialized.
Desktop (please complete the following information):
- OS: Ubuntu 24.04 LTS aarch64
- Python [3.10]
xeofs
version [3.0.2]
Additional context
I believe the issue arises because __init__
is not called during unpickling, so any attributes initialized there (like _allow_compute
) are missing. I came across this explanation which mentions that __init__
is not called during unpickling.
To address this, I did a monkey patch to the __setitem__
method in the DataContainer
class to check for the existence of _allow_compute
and initialize it if necessary:
class DataContainer(dict):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._allow_compute = dict({k: True for k in self.keys()})
def __setitem__(self, __key: str, __value: DataArray) -> None:
super().__setitem__(__key, __value)
# Ensure _allow_compute exists
if not hasattr(self, '_allow_compute'):
self._allow_compute = {}
self._allow_compute[__key] = self._allow_compute.get(__key, True)
With this change, the unpickling process completes without errors.
I'm not entirely sure if this issue is due to something on my side, since my pipeline uses Python 3.10, and perhaps the pickling behavior differs in this version. If this is indeed a bug, perhaps implementing __getstate__
and __setstate__
methods in the DataContainer
class could ensure that _allow_compute
is properly handled during pickling and unpickling.
Could you please look into this issue? Any guidance or confirmation on whether this is a known issue or if there's a better way to handle it would be appreciated. If you need any more information let me know :)