Fix view/copy bug in epochs#12117
Fix view/copy bug in epochs#12117pablomainar wants to merge 8 commits intomne-tools:mainfrom pablomainar:main
Conversation
|
Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴 |
for more information, see https://pre-commit.ci
mne/epochs.py
Outdated
There was a problem hiding this comment.
As a user, this would very much confuse me. How would you know exactly when a copy and when a view is returned? Can we please either clearly document this or make the behavior consistent?
There was a problem hiding this comment.
Please see the discussion in #12114 (we decided to implement a copy=True parameter). @pablomainar can you please update this PR accordingly? If these changes are too much, just let us know and we can take over. In any case, the warning should definitely not be issued by default.
There was a problem hiding this comment.
Right. Maybe the best option is to do what has been recently discussed in the issue thread. Add a copy argument to the get_data() method of Raw, Epochs and Evoked and have it default to True and always return a copy. If the user selects to pass False, then wtry to return a view, and if not possible (because of numpy indexing), warn that a copy is being returned.
There was a problem hiding this comment.
If the user selects to pass False, then wtry to return a view, and if not possible (because of numpy indexing), warn that a copy is being returned.
I think the most common case here is probably for the user "I don't care if it's a copy or not just make it as efficient as possible" often for passing to a sklearn classifier or so, which this proposal does not really allow -- copy=False is almost there but the user will get unnecessary warnings. I guess we could add something like copy=None for this, which would mean "view if possible, copy otherwise". But I think copy=False is actually a good choice for this use case as it keeps us in line with how copy works in other places, e.g.:
-
https://numpy.org/devdocs/reference/generated/numpy.array.html
If true (default), then the object is copied. Otherwise, a copy will only be made if array returns a copy, if obj is a nested sequence, or if a copy is needed to satisfy any of the other requirements
-
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html
By default, astype always returns a newly allocated array. If this is set to false, and the dtype, order, and subok requirements are satisfied, the input array is returned instead of a copy.
-
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
It doesn't say this explicitly, but if you pass it non-
floatvalues then it does copy your data even if you docopy=False:Details
>>> from sklearn.decomposition import PCA >>> import numpy as np >>> x = np.ones((3, 3), int) >>> PCA(copy=False).fit_transform(x) array([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]) >>> x array([[1, 1, 1], [1, 1, 1], [1, 1, 1]]) >>> x = np.ones((3, 3), float) >>> PCA(copy=False).fit_transform(x) array([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]) >>> x array([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])
So I think it's a pretty common practice to behave this way.
So my vote would be just write in the Notes something about how to get a guaranteed view on the data with get_data, since I think it's almost a corner case that they actually need it e.g. for inplace modification, and not bother emitting a warning when copy=False and it's not a view.
There was a problem hiding this comment.
agreed. copy=False should mean what it means in NumPy, i.e., "avoid copying if possible". I also agree that we should not warn if using a view is impossible, though a logger.info message would be fine if people think it's important to emit something
There was a problem hiding this comment.
I'd appreciate an info message.
There was a problem hiding this comment.
Ok agree! I will work on the PR during the weekend. I will submit a first version for Epochs and if we agree I'll also make the changes for Raw and Evoked for all of them to be consistent.
for more information, see https://pre-commit.ci
Fix mentioned in issue #12114
When Epoch data is preloaded and get_data is called with units argument, now a copy is created and returned. This is to avoid modifying the underlying Epochs._data when multiplying by the units factor.
Any other call to get_data will return a view of _data, which is dangerous and can easily break the Epochs object. Therefore also a warning has been added.
The test_epochs.py has been modify to test that a copy is returned in the case of preloaded and passing units.