I've been looking for something like this and would be very excited if it existed. I have two use cases:
For 2) having some notion of order preserved would be nice but is not required. As far as I can tell, the best current option for this is converting to a Ray dataset and using It's also possible my use case would be achieved by a window function implementation (which I saw you all working on).

It would also be possible to achieve my objectives if I could repartition with an exact number of rows per partition (or even with partition boundaries, as Dask allows) plus a map_partition. I believe your UDFs are effectively doing a map_partition, even though the docs don't promise that each call to the function receives the data from exactly one partition. Alternatively, I could get there if I could assign an incrementing global index (which I know is expensive, but it would make this possible). I know you are trying to avoid some of these things.

Really excited about the project, by the way. I just moved from Dask, as you can probably tell, and it's quite refreshing to have a library which works as advertised.
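To make the request above concrete, here is a minimal sketch in plain Python of the behaviour being asked for: take a stream of partitions of arbitrary size, regroup the rows into batches with an exact row count, and attach an incrementing global index. It does not use any DataFrame library's API. The function `rebatch`, the `global_index` field, and the list-of-dicts row representation are all hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def rebatch(partitions: Iterable[List[Row]], rows_per_batch: int) -> Iterator[List[Row]]:
    """Regroup rows from unevenly sized partitions into fixed-size batches.

    Hypothetical helper: each output row gains a "global_index" field that
    increments across partition boundaries, in the order rows are streamed.
    """
    if rows_per_batch <= 0:
        raise ValueError("rows_per_batch must be positive")

    buffer: List[Row] = []
    next_index = 0
    for partition in partitions:
        for row in partition:
            # Copy the row so the caller's data is not mutated.
            buffer.append({**row, "global_index": next_index})
            next_index += 1
            if len(buffer) == rows_per_batch:
                yield buffer
                buffer = []
    # The final batch may be smaller than rows_per_batch.
    if buffer:
        yield buffer


# Three partitions of sizes 3, 1, and 3, regrouped into batches of 3 rows.
parts = [
    [{"x": 1}, {"x": 2}, {"x": 3}],
    [{"x": 4}],
    [{"x": 5}, {"x": 6}, {"x": 7}],
]
batches = list(rebatch(parts, rows_per_batch=3))
```

Because the generator only buffers one batch at a time, the driver never holds more than `rows_per_batch` rows plus the partition currently being read, which is the property a streaming interface would need to provide.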
Summary
Retrieval of rows to the driver from a DataFrame through a streaming interface
Please upvote this discussion and share your use-cases if this is a feature you would like to see implemented!
Use-cases
Proposal
Extensions to this API should be made for: