Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add count_min_sketch aggregate and approx_count accessor #458

Merged
merged 1 commit into from
Jul 1, 2022

Conversation

rtwalker
Copy link
Contributor

@rtwalker rtwalker commented Jun 28, 2022

Add count_min_sketch aggregate and approx_count accessor.

The initial implementations work only on text columns

Resolves #373

#[pg_extern(immutable, parallel_safe, schema = "toolkit_experimental")]
pub fn approx_count(item: String, aggregate: Option<CountMinSketch>) -> i64 {
let sketch = aggregate.unwrap();
CountMinSketch::to_internal_countminsketch(&sketch).estimate(item)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the antipattern that is my biggest grievance with how we use the crates subdirectory. Converting the Postgres CountMinSketch object back into the crate object so that we can call a function on it is really not efficient. We can leave this in place for now, but we'll need to implement the estimate functionality directly on top of the Postgres structure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having implemented this two or three times so far myself, I couldn't agree more that it's a mess! But I think the answer is to break extension up into more crates, so that we can build business logic directly against pg_type structs without all this destruction and reconstruction and so that we can avoid giving rustc an ever-growing pile of files to handle when I make a one-line change to only one of those files. What stops us now is so much of what goes into a pg_type is locked inside the extension crate.

We've nibbled at the edges of this discussion a few times... how about I spend a day to prototype something like this when I get back from vacation?

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be interested to see that!

extension/src/countminsketch.rs Show resolved Hide resolved
extension/src/countminsketch.rs Outdated Show resolved Hide resolved
extension/src/countminsketch.rs Outdated Show resolved Hide resolved
@rtwalker rtwalker changed the base branch from main to rw/count-min-sketch June 29, 2022 00:25
@rtwalker rtwalker force-pushed the rw/count-min-sketch branch from 9d6ef7f to 5e792c0 Compare June 29, 2022 02:02
@rtwalker rtwalker force-pushed the rw/count-min-sql-agg branch from b6e3064 to 63490fc Compare June 29, 2022 02:03
@rtwalker rtwalker force-pushed the rw/count-min-sketch branch from 5e792c0 to 22ca881 Compare June 29, 2022 14:45
@rtwalker rtwalker force-pushed the rw/count-min-sql-agg branch 2 times, most recently from ccc69fb to 10da4f3 Compare June 29, 2022 14:55
@rtwalker rtwalker force-pushed the rw/count-min-sketch branch from 22ca881 to 4c2fefe Compare June 30, 2022 01:33
@rtwalker rtwalker force-pushed the rw/count-min-sql-agg branch from 10da4f3 to 276bc46 Compare June 30, 2022 01:34
@rtwalker rtwalker marked this pull request as ready for review June 30, 2022 01:37
@rtwalker rtwalker force-pushed the rw/count-min-sketch branch from 0bb64e3 to 43944c2 Compare June 30, 2022 15:04
@rtwalker rtwalker force-pushed the rw/count-min-sql-agg branch 2 times, most recently from c328c10 to f316426 Compare June 30, 2022 17:33
Copy link
Contributor

@WireBaron WireBaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one last suggestion.

extension/src/countminsketch.rs Outdated Show resolved Hide resolved
@rtwalker rtwalker force-pushed the rw/count-min-sketch branch from 43944c2 to 564d6c4 Compare July 1, 2022 17:10
@rtwalker rtwalker force-pushed the rw/count-min-sql-agg branch from f316426 to cde5010 Compare July 1, 2022 17:14
@rtwalker rtwalker force-pushed the rw/count-min-sketch branch from 564d6c4 to b93ec95 Compare July 1, 2022 17:38
@rtwalker rtwalker changed the base branch from rw/count-min-sketch to main July 1, 2022 17:39
@rtwalker rtwalker force-pushed the rw/count-min-sql-agg branch from cde5010 to 4c4628e Compare July 1, 2022 20:09
@rtwalker
Copy link
Contributor Author

rtwalker commented Jul 1, 2022

bors r+

@bors
Copy link
Contributor

bors bot commented Jul 1, 2022

@bors bors bot merged commit 223ef47 into main Jul 1, 2022
@bors bors bot deleted the rw/count-min-sql-agg branch July 1, 2022 20:48
@rtwalker rtwalker mentioned this pull request Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement an Approximate Count aggregate using Count-Min Sketch
3 participants