sql: sampling selects #7030
Comments
@petermattis @RaduBerinde what do you suggest we do about this?
Will this be added to cockroach, and if so when? It's been 3 years since the issue was reported, so just wondering.
deferring to @RaduBerinde @rytaft for comments
I am not aware of this feature being on the roadmap (cc @awoods187), but it wouldn't be very hard to implement the BERNOULLI method described above, given that we're already doing something very similar for table statistics collection; if anything, BERNOULLI would be simpler than what we do there. Adding a REPEATABLE option should be relatively easy, but I'm not sure it would be that useful, since any change in how the data is distributed could change the result even when the data itself is unchanged. Adding a SYSTEM sampling method would require a different approach that is aware of how data is stored in RocksDB.
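To make the BERNOULLI and REPEATABLE points concrete, here is a minimal sketch in Python. It is not CockroachDB code; `bernoulli_sample` and its parameters are hypothetical names chosen for illustration. It keeps each row independently with a fixed probability, and an optional seed stands in for REPEATABLE.

```python
import random


def bernoulli_sample(rows, percent, seed=None):
    """Keep each row independently with probability percent / 100.

    `seed` plays the role of REPEATABLE: the same seed yields the same
    sequence of keep/skip decisions. Hypothetical helper, for illustration.
    """
    rng = random.Random(seed)
    p = percent / 100.0
    return [row for row in rows if rng.random() < p]


rows = list(range(1000))

# Same seed and same scan order: the sample is reproducible.
first = bernoulli_sample(rows, 10, seed=42)
second = bernoulli_sample(rows, 10, seed=42)

# Same seed and same data, but scanned in a different order: the decisions
# are made per scan position, so a different set of rows is kept.
reordered = bernoulli_sample(list(reversed(rows)), 10, seed=42)
```

The last call illustrates the concern about REPEATABLE: the seed fixes which scan positions are kept, not which rows, so anything that changes the order in which rows are read can change the sample even though the data is identical.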
@Kumamon38 could you tell me a little bit more about how and why you'd like to use this potential feature?
Hi and thanks everyone who answered and explained!
We have marked this issue as stale because it has been inactive for
Enterprise customer here, would like to throw our hat in this ring, we would be interested in this feature. Thank you.
cc @vy-ton for visibility
We have marked this issue as stale because it has been inactive for
still relevant
Still relevant, I would benefit from this feature.
Postgres introduced the TABLESAMPLE clause in 9.5.

Prior to 9.5, similar things could be done manually:
https://www.periscopedata.com/blog/how-to-sample-rows-in-sql-273x-faster.html
https://stackoverflow.com/questions/8674718/best-way-to-select-random-rows-postgresql
Implementing something like TABLESAMPLE is likely relatively difficult, but we could check that a manual query which performs something similar is available and gets a somewhat decent query plan.

Opened this issue because I was asked about it in a recent tech talk.
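As a rough illustration of what such a manual query could look like, the sketch below filters rows with a per-row random predicate. It runs against an in-memory SQLite database purely so the example is self-contained; the table `t` and its columns are made up. SQLite's `random()` returns a signed 64-bit integer, hence the modulo. In Postgres and CockroachDB `random()` returns a float in [0, 1), so the equivalent predicate would be `WHERE random() < 0.1`.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table and column names are
# hypothetical and chosen only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany(
    "INSERT INTO t (id, v) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(10000)],
)

SAMPLE_PERCENT = 10

# random() is evaluated once per row, so each row is kept independently
# with probability of roughly SAMPLE_PERCENT / 100.
sample = conn.execute(
    "SELECT id, v FROM t WHERE abs(random() % 100) < ?",
    (SAMPLE_PERCENT,),
).fetchall()

print(f"sampled {len(sample)} of 10000 rows")
```

A predicate like this still reads every row of the table, which is why the quality of the resulting query plan matters for large tables.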
Jira issue: CRDB-6181