Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add distinctOn to QueryBuilder #805

Closed
wants to merge 14 commits into from
Closed

Add distinctOn to QueryBuilder #805

wants to merge 14 commits into from

Conversation

stwiname
Copy link

Description

Adds the ability to provide DISTINCT ON to Query Builder.

Performance impact

unknown

Security impact

unknown

Checklist

  • My code matches the project's code style and yarn lint:fix passes.
  • I've added tests for the new feature, and yarn test passes.
  • I have detailed the new feature in the relevant documentation.
  • I have added this feature to 'Pending' in the RELEASE_NOTES.md file (if one exists).
  • If this is a breaking change I've explained why.

Copy link
Member

@benjie benjie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can you add a test that uses this functionality?

It's probably easiest to do so using our new testing framework (the one on v4 branch right now is a nightmare); if you branch off of the last good commit in #794 (currently 18b184d38d88f2b9cd7dbeb8bc47c358ec8cfee3) and

Add a test similar to the "named_query_builder" test:

https://github.com/graphile/graphile-engine/blob/18b184d38d88f2b9cd7dbeb8bc47c358ec8cfee3/packages/postgraphile-core/__tests__/queries/base/named_query_builder.toys.test.graphql

The line in the config

#> ToyCategoriesPlugin: true

is used here:

config.ToyCategoriesPlugin ? ToyCategoriesPlugin : null,

to add the plugin that uses it. That's where you'd add a plugin that uses the distinctOn. Then it's just a case of updating the assertions at the top of the test file and running the tests, doing so should auto-generate the SQL and JSON5 snapshots (if it doesn't, run it again with UPDATE_SNAPSHOTS=1 environmental variable set).

packages/graphile-build-pg/src/QueryBuilder.js Outdated Show resolved Hide resolved
@benjie
Copy link
Member

benjie commented Aug 26, 2022

No need to base off that specific commit any more, I've merged #794

@stwiname stwiname requested a review from benjie August 29, 2022 02:45
@stwiname stwiname marked this pull request as draft August 29, 2022 04:31
@stwiname
Copy link
Author

I've found a bug with order by I will improve tests further and fix.

@stwiname stwiname marked this pull request as ready for review August 30, 2022 02:23
Copy link
Member

@benjie benjie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work on filling this out!

My main concerns right now are:

  1. connections - are they affected by distinctOn? Does changing the order break them? Will cursors work in all cases? Does totalCount still work as expected?
  2. relations - if we select a relation inside the selection set, will Postgres execute this efficiently for only the first match in the distinct on, or will it actually execute it for every match and then throw results away (which could be a significant performance cost)? Even if this is a cost, that can just be a cost related to distinctOn and we can document it, but it's important to understand it.

packages/graphile-build-pg/src/QueryBuilder.js Outdated Show resolved Hide resolved
packages/graphile-build-pg/src/QueryBuilder.js Outdated Show resolved Hide resolved
@@ -925,6 +950,7 @@ order by (row_number() over (partition by 1)) desc`; /* We don't need to factor
this.lock("limit");
this.lock("first");
this.lock("last");
this.lock("distinctOn");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that because distinctOn changes the order, it may break connections (specifically cursors) in some cases. This is one of the reasons that we lock orderBy when we do. E.g. if you order by [USERNAME_ASC, NAME_DESC, ID_ASC] and then distinctOn: [NAME] then the order would be changed to NAME_DESC, USERNAME_ASC, ID_ASC and pagination would work differently.

If you don't want to dig into this, maybe the best bet is to forbid combining distinctOn with connections - e.g. before setting distinctOn you lock cursorComparator/selectCursor/etc and then you enforce that they aren't used (otherwise you throw).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the appropriate locks as I don't have the time to dig into it. I hope I got the locks correct.

From the little research I did cursors should work correctly with DISTINCT ON

@stwiname
Copy link
Author

stwiname commented Sep 6, 2022

Excellent work on filling this out!

My main concerns right now are:

1. connections - are they affected by distinctOn? Does changing the order break them? Will cursors work in all cases? Does totalCount still work as expected?

2. relations - if we select a relation inside the selection set, will Postgres execute this efficiently for only the first match in the distinct on, or will it actually execute it for every match and then throw results away (which could be a significant performance cost)? Even if this is a cost, that can just be a cost related to distinctOn and we can document it, but it's important to understand it.

I'm struggling to answer these. But I have added a test that includes totalCount and the result doesn't take into consideration of distinct.

@benjie
Copy link
Member

benjie commented Sep 6, 2022

Would using distinctOn only in lists satisfy your needs? Then you don’t need to worry about aggregates, cursors, etc. (See docs on —simple-collections.)

@stwiname
Copy link
Author

stwiname commented Sep 8, 2022

Would using distinctOn only in lists satisfy your needs? Then you don’t need to worry about aggregates, cursors, etc. (See docs on —simple-collections.)

We need both cursors and aggregates.

@benjie
Copy link
Member

benjie commented Oct 5, 2023

I spent some time tidying this up, but unfortunately it breaks very easily. I was expecting it to break when you combine last with distinctOn because that reverses the ordering; but the moment you start trying to do cursor pagination it fails because cursor pagination uses conditions and those conditions change what the result of each group is. For example:

foo bar
1 a
1 b
1 c
2 a
2 b
2 c

If you do distinctOn:"foo" of this, then you'd expect results like:

[
  {foo: 1, bar: a}, // cursor = C1
  {foo: 2, bar: a}  // cursor = C2
]

But as soon as you do cursor pagination, this falls down. If you ask for the first record after cursor C1 then you don't get C2, instead what you actually get is the result {foo: 1, bar: b} because cursor pagination effectively says foo > 1 OR (foo = 1 AND bar > 'a').

To make distinctOn work with cursor pagination would involve significantly more work; and since that's a requirement of this ("We need both cursors and aggregates.") I'm going to close this PR for now because it does not address the need safely. I think the best bet to add distinctOn safely would be to do it in a subquery and then select the results from that, but that would be somewhat expensive in many cases.

@benjie benjie closed this Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants