In the function set_data_source of base_streams.py, the handling of pd.DataFrame might not be ideal, because it converts the DataFrame into a numpy array with the following snippet:
This might confuse users and raise an error when the intended output is a DataFrame rather than a numpy array. I believe the conversion to a numpy array should be omitted:
Here's the full code for the modified set_data_source function.
def set_data_source(
    self,
    quantities: Union[List, DataFrame, np.ndarray] = None,
    target: Optional[Union[List, DataFrame, np.ndarray]] = None,
    time: Optional[Union[List, DataFrame, np.ndarray]] = None,
):
    """
    This sets the data source by providing up to three iterables: ``quantities`` ,
    ``time`` and ``target`` which are assumed to be aligned.

    For sensors data, we assume:
    The format shape for 2D data stream (timesteps, n_sensors)
    The format shape for 3D data stream (num_cycles, timesteps , n_sensors)

    Parameters
    ----------
    quantities : Union[List, DataFrame, np.ndarray]
        Measured quantities such as sensors readings.
    target : Optional[Union[List, DataFrame, np.ndarray]]
        Target label in the context of machine learning. This can be
        Remaining Useful Life in predictive maintenance application. Note this
        can be an unobservable variable in real-time and applies only for
        validation during offline analysis.
    time : Optional[Union[List, DataFrame, np.ndarray]]
        ``dtype`` can be either ``float`` or ``datetime64`` to indicate the time
        when the ``quantities`` were measured.
    """
    self._sample_idx = 0
    self._current_sample_quantities = None
    self._current_sample_target = None
    self._current_sample_time = None

    if quantities is None and target is None:
        self._quantities = list(np.arange(10))
        self._target = list(np.arange(10))
        self._time = list(np.arange(10))
        self._target.reverse()
    else:
        self._quantities = quantities
        self._target = target
        self._time = time

    # infer number of samples
    if type(self._quantities).__name__ == "list":
        self._n_samples = len(self._quantities)
    elif type(self._quantities).__name__ == "DataFrame":  # dataframe or numpy
        self._n_samples = self._quantities.shape[0]
    elif type(self._quantities).__name__ == "ndarray":
        self._n_samples = self._quantities.shape[0]
    self._set_data_source_type("dataset")
This could be a pull request, but I'm too occupied to start a new one atm!
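For illustration, the sample-count inference proposed above can be tried in isolation. This is only a minimal sketch: infer_n_samples is a hypothetical standalone helper (it is not part of base_streams.py), and it uses isinstance checks instead of comparing type names, which is an assumption about how one might write it rather than the project's actual code.

```python
import numpy as np
import pandas as pd


def infer_n_samples(quantities):
    """Return the number of samples without changing the container type.

    Hypothetical helper for illustration only; it mirrors the branch logic
    of the modified ``set_data_source`` but is not part of the library.
    """
    if isinstance(quantities, list):
        return len(quantities)
    if isinstance(quantities, (pd.DataFrame, np.ndarray)):
        # The first axis holds the samples (timesteps or cycles).
        return quantities.shape[0]
    raise TypeError(f"Unsupported type: {type(quantities).__name__}")


# Made-up sensor readings, three timesteps and two sensors.
frame = pd.DataFrame({"sensor_a": [0.1, 0.2, 0.3], "sensor_b": [1.0, 2.0, 3.0]})

n_frame = infer_n_samples(frame)
n_array = infer_n_samples(frame.to_numpy())
n_list = infer_n_samples([1, 2, 3, 4])

# The DataFrame itself is left untouched, so its column labels survive.
print(type(frame).__name__, list(frame.columns), n_frame)
```

Because nothing is converted, downstream code receives the same type that was passed in, which is the behaviour this issue asks for.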
Hi @bangxiangyong! Good to see you again. I guess the reason for the conversion was that, as of now, some mechanisms later in the process rely on the quantities being of type np.ndarray, although this is indeed not ideal. This could apply to printing and buffering, for instance, but I did not check. As a very first measure, I would suggest informing the user about the conversion and the expected output in case a DataFrame was used to initialize. In a second step, though, we should thoroughly check what would be needed to allow for processing pd.DataFrames and implement the required changes.
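The "inform about the conversion" first step could look roughly like the sketch below. It assumes the conversion stays in place for now; to_ndarray_with_notice is a hypothetical helper name, and the warning text and category are placeholders, not anything the project has agreed on.

```python
import warnings

import numpy as np
import pandas as pd


def to_ndarray_with_notice(quantities):
    """Convert a DataFrame to an ndarray and tell the caller about it.

    Hypothetical helper for illustration only; other types pass through
    unchanged.
    """
    if isinstance(quantities, pd.DataFrame):
        warnings.warn(
            "quantities was passed as a pandas DataFrame and is converted "
            "to a numpy array; column labels and the index are dropped.",
            UserWarning,
            stacklevel=2,
        )
        return quantities.to_numpy()
    return quantities


# Made-up sensor readings, two timesteps and two sensors.
frame = pd.DataFrame({"sensor_a": [0.1, 0.2], "sensor_b": [1.0, 2.0]})

# Record the warning here so the example can inspect it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    converted = to_ndarray_with_notice(frame)

print(type(converted).__name__, converted.shape, len(caught))
```

A warning keeps existing behaviour intact while making the type change visible, which leaves room for the second step of supporting DataFrames directly.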