Dataset.data property with ArrowTable (aka. MemoryMappedTable) not updated after filter? #6413
Hi to whoever is reading this! 🤗

I just wanted to know whether there's any reason why the Dataset.data property is not updated after filter. I've seen that the indices are updated and the magic methods implemented within [...]. I understand this is not a big issue in most scenarios, but in the end what I want is to be able to [...].

So my other question would be: is there any efficient way to generate a new dataset after a filter without having to serialize to a Python dict and then read it back from that dict?

Thanks in advance! cc @lhoestq @mariosasko
Replies: 1 comment
Hi!
filter only updates the indices of the rows to keep, because recreating a new pyarrow Table can take time and disk space.
You can remove the indices mapping on top of the Arrow table using:
ds = ds.flatten_indices()