By introducing partitioning in the selector, the meaning of the limit has shifted: it is currently applied per partition rather than globally. With a limit of 2, we can therefore still end up with many data points if there are millions of partitions.

We should make the limit a global setting again. This is not straightforward, since we need to sample across multiple partitions. One approach is to generate indices that map into all partitions (i.e., count globally), and then, before yielding a partition, keep only the samples whose indices are in the pre-generated list; see the sketch below.
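A rough sketch of that idea, assuming the total sample count and the per-partition sample lists are known before yielding (the function names here are illustrative, not part of the selector):

```python
import random
from typing import Iterable, Iterator, List, Set

def sample_global_indices(total_samples: int, limit: int, seed: int = 42) -> Set[int]:
    """Draw `limit` indices uniformly from the global index space [0, total_samples)."""
    rng = random.Random(seed)
    if limit >= total_samples:
        return set(range(total_samples))
    return set(rng.sample(range(total_samples), limit))

def yield_limited_partitions(
    partitions: Iterable[List[int]], total_samples: int, limit: int
) -> Iterator[List[int]]:
    """Yield each partition filtered down to the globally sampled indices."""
    selected = sample_global_indices(total_samples, limit)
    offset = 0  # global index of the first sample in the current partition
    for partition in partitions:
        kept = [sample for i, sample in enumerate(partition) if offset + i in selected]
        offset += len(partition)
        yield kept
```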
Pre-generating a list of keys does not work. However, we could count the number of potential rows that we select, generate indices from 0 to len(result), and then ask the database to return only those rows, so we avoid materializing the full result.
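A minimal sketch of the count-then-select idea, assuming a hypothetical `samples` table with a `sample_key` column and SQLite with window-function support; the real selector's schema and query layer will differ, this only illustrates pushing the index filter into the query instead of materializing all rows:

```python
import random
import sqlite3
from typing import List

def sample_without_materializing(conn: sqlite3.Connection, limit: int, seed: int = 42) -> List[int]:
    # 1. Count the candidate rows without fetching them.
    (total,) = conn.execute("SELECT COUNT(*) FROM samples").fetchone()

    # 2. Draw `limit` positions from the global index space.
    rng = random.Random(seed)
    wanted = sorted(rng.sample(range(total), min(limit, total)))
    if not wanted:
        return []

    # 3. Number the rows in a stable order and keep only the sampled positions,
    #    so the database never returns the full result set.
    placeholders = ",".join("?" * len(wanted))
    query = f"""
        SELECT sample_key FROM (
            SELECT sample_key,
                   ROW_NUMBER() OVER (ORDER BY sample_key) - 1 AS rownum
            FROM samples
        ) WHERE rownum IN ({placeholders})
    """
    return [row[0] for row in conn.execute(query, wanted)]
```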