Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource embedding disambiguation #1419

Closed
wants to merge 6 commits into from

Conversation

steve-chavez
Copy link
Member

@steve-chavez steve-chavez commented Nov 25, 2019

Finishes the work started on #918. This should make resource embedding handle complex OLTP schemas in a non-ambiguous way.

The idea is to use the foreign key constraint name(as opposed to a column that it's part of the FK, previous discussion in #1078 (comment)) and the cardinality to do the disambiguation.

(These ideas share some similarities with Sybase's KEY JOIN, which also does an AUTO JOIN based on a FK)

Foreign key constraint name

With the fk name, the previous duck typing features are not necessary and "shorthands" can be defined in the schema. For example:

-- Having a m2o relationship on projects >- clients

-- We can define the FK constraint in this way
-- Note the "client" constraint
alter table projects add constraint client foreign key (client_id) references clients(id);

-- Or if the FK was already created we can rename it like
-- (this also provides a clear path for migration to our new version)
alter table projects rename constraint projects_client_id_fkey to client;

And the request with the embedding would be like:

-- Note the singular "client"
GET /projects?select=id,client(*)

-- If the fk is not named it'll be like
GET /projects?select=id,projects_client_id_fkey(*)

-- If there's only one relationship the name of the table("clients") 
-- can be used as usual
GET /projects?select=id,clients(*)

Another advantage with this is that we handle composite keys disambiguations with a short url.

Cardinality

In self reference cases the FK name is not enough to do the embedding since there's an m2o(many-to-one) and o2m(one-to-many) relationship with the same FK. So the cardinality can be specified to disambiguate:

GET /family_tree?select=*,antecessor:family_tree!m2o(*)

GET /family_tree?select=*,successors:family_tree!o2m(*)

many-to-many would be m2m. In this case the ability to specify the junction table remains the same as in #918.

Also added some tests for circular references.

Combined

These capabilities can be combined:

GET /<view_or_table>?select=*,<view_or_table>!<cardinality>!<fk_name>(*)

Though in most cases just one hint will be enough to disambiguate. The error is also more clear(see the tests) and points what hint to use to disambiguate.

Limitation

There's a limitation for junction tables that have more than one FK to the source and target. This is a rare case that cannot be disambiguated without complicating the syntax. In this case the table should be split(which I'd argue it's better design from a normalization standpoint). A clear error will be shown in this case(see the tests).

Edit: this limitation can be lifted later if the need arises. The solution would be to allow to specify two more fks. Like /users?select=tasks!m2m!fk1!fk2(*)(long url, though there's no other choice). This could be added without causing a breaking change.

@ruslantalpa
Copy link
Contributor

ruslantalpa commented Nov 26, 2019

I don't agree with this change for several reasons:

  • It's rarely who "names" their constraints, most of the time it's done with REFERENCES and the name is automatic and ugly (evidence you had to rename the constraints yourself). Most people do not even know they have names (It's hard enough getting them to use views for the api).
  • This "ties" the internal implementation (tables and constraints between them) to the exposed api views
  • The above two points suggest postgrest is for internal api and can't be exposed directly to the outside world, since it leaks internal implementation.
  • Even people that will know/understand this feature (and the ones that implement the api) will have to think twice about writing the "correct url" since they will have to mentally "transfer" the concept of constraint between tables to the relation between views (which have nothing to do with constraints). This becomes event harder for people (frontend developers) who will be using the api.
  • Specifying the fk_column is not duck taping, this is by design. Once as a frontend dev (user of the api) I learn that I can specify the exact fk column name based on which to embed, I can now write all my urls like that (like fully qualified) to future proof them, and I can do that because I know all the column names (and can look them up in openapi docs even), there is no way to "menially" do that with constraints names, the frontend dev will have to look up the internal database implementation detail (constraint name, which again, is autogenerated) and use that.
  • In cases where the "api" user is some frontend lib (similar to react admin connector), it becomes impossible to generate "fully qualified" urls based on the incoming request, just by looking at the request since it required "prior knowledge" of the relation name.
  • this is a breaking change and it adds 0 advantages over the "column" method and only introduces disadvantages and additional mental overhead for api users

@steve-chavez
Copy link
Member Author

steve-chavez commented Nov 26, 2019

It's rarely who "names" their constraints... Most people do not even know they have names... It's hard enough getting them to use views for the api.

That's because of a lack of documentation. Given the mediums, users will annotate and care for their schema. Recently I saw this project through postgresweekly. Note the usage of postgraphile specific annotations.
The work I'm asking here is minimal and truer to the db compared to that.

This "ties" the internal implementation (tables and constraints between them) to the exposed api views.. Even people that will know/understand this feature (and the ones that implement the api) will have to think twice about writing the "correct url" since they will have to mentally "transfer" the concept of constraint between tables to the relation between views (which have nothing to do with constraints).

Views obey constraints defined on their source tables, it's not true that they have nothing to do with constraints.
So, we can say views have source foreign keys. That is, views have source tables and source columns that can be part of a foreign key(which we detect). All of these points are also matter of documentation.

Specifying the fk_column is not duck taping, this is by design. Once as a frontend dev (user of the api) I learn that I can specify the exact fk column name based on which to embed, I can now write all my urls like that (like fully qualified) to future proof them, and I can do that because I know all the column names (and can look them up in openapi docs even), there is no way to "menially" do that with constraints names, the frontend dev will have to look up the internal database implementation detail (constraint name, which again, is autogenerated) and use that.

No need to look at the database or the openapi schema in case there's an ambiguous request. That's the whole point of the error message added in #1412. To "menially" disambiguate.

In cases where the "api" user is some frontend lib (similar to react admin connector), it becomes impossible to generate "fully qualified" urls based on the incoming request, just by looking at the request since it required "prior knowledge" of the relation name.

"prior knowledge" is also required for the previous "column" solution so I don't see the point there. In fact, with this fk solution the api creator could better guide the api consumer with a proper constraint name.

this is a breaking change and it adds 0 advantages over the "column" method and only introduces disadvantages and additional mental overhead for api users

With the "column" method it's basically impossible right now to disambiguate if the FKs are composite. Also as I mentioned, for api users it'll also make the disambiguation error message clearer.

There are many disadvantages with the #918 approach. First of all, the duck typing stuff is just "magical" and most users just randomly try the url(we have had many issues that proves this) that works for their desired result.

The worst of the previous approach is that is not deterministic or "future proof" since the requests can give different results if the schema becomes more complex.

@steve-chavez
Copy link
Member Author

steve-chavez commented Nov 26, 2019

I take the point about api consumers. The name "foreignKey" in the error could be a bit harsh.

How about this, I think this example would better illustrate the advantage of this approach.

Using these orders-addresses tables, rename their fk constraints to billing and shipping and make this ambiguous request:

GET /orders?select=*,addresses(*)"

You'd get a 300 Multiple Choices error with this body:

{
  "details": [
    {
        "cardinality": "m2o(many-to-one)",
        "relationship": "billing",
        "source": "test.orders",
        "target": "test.addresses"
    },
    {
        "cardinality": "m2o(many-to-one)",
        "relationship": "shipping",
        "source": "test.orders",
        "target": "test.addresses"
    }
  ],
  "hint": "Disambiguate by choosing a relationship from the `details` key",
  "message": "More than one relationship was found for orders and addresses"
}

I've renamed "foreignKey" to "relationship"(name comes from foreign key relationship) and omitted the fk cols.

Also note that the error message now no longer seems like it's "leaking internal details". It denotes a relationship between two resources.

So the user would have to disambiguate with:

GET /orders?select=*,addresses!billing(*)"
-- or
GET /orders?select=*,addresses!shipping(*)"

@ruslantalpa
Copy link
Contributor

ruslantalpa commented Nov 26, 2019

users will annotate and care

That's just one more thing to do (that people do not usually do) to be able have a good api when it's not needed

Views obey ...

I am talking about how things are on the db side (not postgrest), there is no concept of FK for views, only tables have this. This is why I say they have to think twice

No need to look ...

I don't think you understand what I am saying here.
Say I have a orders table with shipping_id column (no duplicate).
With column convention I can write shipping_address:shipping_id(...) (or whatever the format is) without looking anywhere because I know the columns in the model.
With constraint name, the only way to "fix" the path is to look at the table definition and check the constraint name (because there is no conflict at this time so you can not even make a request)

the duck typing stuff is just "magical"

No it's not, all the issues were for two reasons

  • format for specifying the the FK column is not documented (no wonder ppl tried different urls)
  • the first detected relation was used (without returning a error)
    Leave this "constraint name" feature undocumented and remove the error message and you've get the users behaving in the exact same way

composite FK

That probably is true but this PR also does not support that and also in 5 years I never saw ppl mentioning composite keys at all (I am sure it comes up even less in the embeding context)

The worst of the previous approach is that is not deterministic or "future proof" since the requests can give different results if the schema becomes more complex.

This is just not true. If you specify the FK column based on which to embed the request is always deterministic (just as with constraint name). If you do not specify the the FK column, and there is a conflict, you get a error (just as with the constraint name) this is in the master after your PR. If at the beginning, there is no conflict, and you do not specify the FK col, and later another column is added and there is conflict, old request will start to fail (error message for conflict), and this will be the exact same behavior as with the "constraint name"

To summarize:
You changed here two things (I do not have a problem with the error message on conflict, hence I did not object on the previous PR where it's implented)

  • the separator (which is just a matter of taste, so you break old clients for "taste")
  • the property by which to uniquely to identify the relation
    leaving the "composite key" problem aside,
    let's say internally each relation is stored as (constraint_name, fk_name, ......)
    each relation can be uniquely identified by both properties so constraint_name has 0 advantages over FK column name from that perspective.

With just documenting the current format in the "Resource embedding" section (which anyway has to be done for this pr or else the users will "randomly try urls" as before ) and in the "conflict error message" substitute "relationship": "shipping" with something like "relationship_column": "shipping_id" (or whatever). You get the exact same functionality without breaking the old clients and without this additional requirement to "care" for the constraint names and without leaking internal implementation to the api

Another problem I see with constraint_name approach (I did not see any examples)
say I have table A and B having two m2m relations through tables C and D
so in m2m cases, there are 2 constraints involved.
In old approach, I could specify after the . table C or D to "lock down" the path.
How do you specify this with constraint_name

@steve-chavez
Copy link
Member Author

steve-chavez commented Nov 26, 2019

If you specify the FK column based on which to embed the request is always deterministic (just as with constraint name). If you do not specify the the FK column, and there is a conflict, you get a error (just as with the constraint name) this is in the master after your PR.

The problem about the previous "column" approach is that even with the improved ambiguity detection the resulting error message is very confusing:

get "/users?select=*,id(*)" `shouldRespondWith`
[json|
{
"details": [
{
"cardinality": "one-to-many",
"source": "test.users[id]",
"target": "test.articleStars[userId]"
},
{
"cardinality": "one-to-many",
"source": "test.users[id]",
"target": "test.limited_article_stars[user_id]"
},
{
"cardinality": "one-to-many",
"source": "test.users[id]",
"target": "test.comments[commenter_id]"
},
{
"cardinality": "one-to-many",
"source": "test.users[id]",
"target": "test.users_projects[user_id]"
},
{
"cardinality": "one-to-many",
"source": "test.users[id]",
"target": "test.users_tasks[user_id]"
},
{
"cardinality": "many-to-many",
"junction": "private.article_stars[user_id][article_id]",
"source": "test.users[id]",
"target": "test.articles[id]"
},
{
"cardinality": "many-to-many",
"junction": "test.articleStars[userId][articleId]",
"source": "test.users[id]",
"target": "test.articles[id]"
},
{
"cardinality": "many-to-many",
"junction": "test.limited_article_stars[user_id][article_id]",
"source": "test.users[id]",
"target": "test.articles[id]"
},
{
"cardinality": "many-to-many",
"junction": "test.users_projects[user_id][project_id]",
"source": "test.users[id]",
"target": "test.projects[id]"
},
{
"cardinality": "many-to-many",
"junction": "test.users_projects[user_id][project_id]",
"source": "test.users[id]",
"target": "test.materialized_projects[id]"
},
{
"cardinality": "many-to-many",
"junction": "test.users_projects[user_id][project_id]",
"source": "test.users[id]",
"target": "test.projects_view[id]"
},
{
"cardinality": "many-to-many",
"junction": "test.users_projects[user_id][project_id]",
"source": "test.users[id]",
"target": "test.projects_view_alt[t_id]"
},
{
"cardinality": "many-to-many",
"junction": "test.users_tasks[user_id][task_id]",
"source": "test.users[id]",
"target": "test.tasks[id]"
},
{
"cardinality": "many-to-many",
"junction": "test.users_tasks[user_id][task_id]",
"source": "test.users[id]",
"target": "test.filtered_tasks[myId]"
}
],
"hint": "Disambiguate by choosing a relationship from the `details` key",
"message": "More than one relationship was found for users and id"
}
|]
{ matchStatus = 300
, matchHeaders = [matchContentTypeJson]
}

This is all because of trying to "get clever" with the regex. You can match undesired results in unexpected ways.

The way I see it, the duck typing was just a temporary hack. It was also frowned upon by @begriffs when it was proposed. Design-wise, using the fk names is much better and also simpler for the codebase.

Breaking changes are not bad if done gradually, in this release this would be the only major breaking change.

Btw, the previously merged #1412 would also break old clients, likely all of the clients that used the #918 additions(because it just threw the first found result). If we're already going to break clients, let's do this right and not expect someone reporting a composite key disambiguation case to break clients again.

the separator (which is just a matter of taste, so you break old clients for "taste")

It's a matter of not being ambiguous and overload the dot . for everything. Check this comment. You could think of it as a css !important if you'd like.

Say I have table A and B having two m2m relations through tables C and D
so in m2m cases, there are 2 constraints involved.

That works the same as before, and there are better tests about that.

@ruslantalpa
Copy link
Contributor

resulting error message is very confusing

so change that and make it clear, to me this argument is like saying "this technical implementation is bad because it's undocumented"?

"get clever" with the regex

I just rechecked the code to remember the regex part.
it used just so one could say client(...) instead of client:clients(...), it's not used for disambiguation. If this "hack" it is removed, then the resulting capability is the same as with this PR ... i.e you have use the table name and if there is a conflict you have to disambiguate by specifying the fk col.
This feature is a added bonus to have a nicer api for free based on any preexisting schema without needing to change the schema.
I thought this PR as breaking only apis for cases where "disambiguation" was used, i see now that it breaks even the requests like /projects?select=id,client(id), which is worse because it's extensively used. EVERY user of postgrest will need to make changes both to the backend (rename constraints) and to the client (js/mobile) to be able to upgrade without getting any new features/capabilites.

Breaking changes need to have a reason beyond "i think this implementation is more solid", the reason should be something like "i can not implement this new major/important feature without breaking some old functionality", anything less that that is disregard for the users.

It's a matter of not being ambiguous

If it were a comment form someone else it would be an argument but this is your comment so the "being confusing" is your opinion/bias, not a established fact (by lots of people pointing that out). Any issues related to resource embedding being sometimes confusing is strictly related to it not being properly documented (0 mentions on how to specify the used FK or even a hint of some kind) and not returning an error when there is a conflict and silently using the first match, it has no relation to the select parameter format. The . is also used in filters and since it's properly documented, there are no problems with it and no one is confused.

That works the same as before

In the comment that you linked you criticized the format because it meant different things in different situations (column vs table), and in this PR i see the part after ! can mean either a constraint name or a table ... so how is this better then column vs table?

If this was a green field thing, i'd be less against this (for example i don't have a problem with ! it looks ok) but you are breaking everyone's code for no benefit. Make no mistake about this, everyone using postgrest will have to make changes to both backend and client to upgrade, it's not just "in some cases", and this is very "fun" especially when you are talking about mobile apps.

Your feeling of this PR feeling more "solid" compared to current master comes from a mix of things which are not really true if you carefully look at them:

  • old implementation is hacky because or regexp -- the regexp is ONLY used to support client vs clients naming when embedding, people can use the correct table and rename it (as in this PR). This is a "feature" that you removed, you did not implement it in a more "solid" manner
  • people always had trouble with this -- this is a symptom of no docs and no error message, it has nothing to do with how it's implemented
  • the notation is ambiguous because it can mean different thing -- true, but so is this one, you are basically changing table.<column|table> to table!<constraint|table> .. i see no difference
  • error message on conflict is confusing -- agreed, it can be made just as clear as the on you pasted above, this has no bearing on ! vs . or constraint vs column

Talking about this has distilled this for me a bit, so the debate is:

  • ! vs ., i am not against the transition but it should be a long one (over 1-2 major versions) since it's just a matter of a different (simple) parser
  • removing use of client in place of clients (the regex part), this is a major breaking change and a bad one. That regexp is well contained within findRelation function and has no influence on the rest of the code. I see no reason for removing this but maybe not document it, as it is now (though i am not sure what would be the disadvantage of documenting it like "in some cases pgrst can detect client vs clients use, but this is not guaranteed" )
  • Considering both !. are supported, the use of table!<column|table> vs table!<constraint|table> can very well coexist, they are not in conflict. This is again very well restricted to the findRelations function and enabling this it's just a matter of more "match" cases in that function and the match order should be in the order constraint -> column_name -> guess_with_regexp

@steve-chavez
Copy link
Member Author

I thought this PR as breaking only apis for cases where "disambiguation" was used, i see now that it breaks even the requests like /projects?select=id,client(id), which is worse because it's extensively used. EVERY user of postgrest will need to make changes both to the backend (rename constraints) and to the client (js/mobile) to be able to upgrade without getting any new features/capabilites.

I already mentioned a migration path for requests like /projects?select=client(id)(just rename the fk to client). It will not involve any change on the frontend.

Breaking changes need to have a reason beyond "i think this implementation is more solid", the reason should be something like "i can not implement this new major/important feature without breaking some old functionality", anything less that that is disregard for the users.

Once again, composite key disambiguation cannot be done without this feature and the ambiguous error messages are confusing for the users. I'd say disregard for the users is to put known half-baked designs that are known to fail in obvious ways.

Documenting that previous design will also make PostgREST design seem "sloppy". It is obvious the "column" approach will fail.

Also, once again, #1412 is already a breaking change.

In the comment that you linked you criticized the format because it meant different things in different situations (column vs table), and in this PR i see the part after ! can mean either a constraint name or a table ... so how is this better then column vs table?

Only on the M2M cases it's interpreted as a junction(or link) table. For the other cases it'll be the fk constraint. This can be documented as a select=table!<embed_hint>, it's not ambiguous.

@steve-chavez
Copy link
Member Author

steve-chavez commented Dec 2, 2019

I've added a composite fk test case(based off a production deployment I have). In general these are rare but I wouldn't want to influence the schema design because of a shortcoming in pgrst(this would be similar to ORMs that don't support composite fks).

I've also improved the error message in case of ambiguity:

{
    "details": [
        {
            "cardinality": "m2o",
            "relationship": "billing",
            "source": "test.orders",
            "target": "test.addresses"
        },
        {
            "cardinality": "m2o",
            "relationship": "shipping",
            "source": "test.orders",
            "target": "test.addresses"
        }
    ],
    "hint": "By following the 'details' key, disambiguate the request by changing the url to /source?select=relationship(*) or /source?select=target!<relationship|cardinality|junction>(*)",
    "message": "More than one relationship was found for orders and addresses"
}

The hint should pretty much cover what needs to be done in case the user gets this error:

By following the 'details' key, disambiguate the request by changing the url to /source?select=relationship(*) or /source?select=target!<relationship|cardinality|junction>(*)

I don't think this error message can be shortened more and remain legible. In principle, we could keep backwards compatibility and the previous "column" approach. But that would make the error much more verbose(like the one I show here plus we'd need to add things like relationship_column) and that would be like a long stack trace that asks to be ignored. The idea is that the user can resolve the ambiguity without needing chat support(recent example).

Ideally we wouldn't have breaking changes, but this is sometimes necessary to keep us improving. This issue in particular would make #818 doable(ambiguity stopped me from pursuing that one).

Also I don't think this as a "major" breaking change(such as the one regarding search_path introduced in v0.5.0.0, that one broke a lot of functions). The impact should be normal(also considering the undocumented feature), and there's a migration path that will be announced on the CHANGELOG.

@bolerodan
Copy link

Adding my two cents into this, the renaming of FKs is no simple task for us, especially because our ORM (SQLAlchemy) generates these automatically (and thus can get gnarly and long), and Alembic which applies them via generated migration plans. This would require quite a lot of tedious updates to our python code, to manually specify and rename foreignkeys, and make sure our developers require that this is now necessary rather than letting the ORM/Migration planner just handle this automatically.

This also means our schema output is now dependant on also displaying foreign key constraint names. (We do not use the openAPI in postgrest as our primary API does a lot more and thus our schema inspection for users to view handles other use cases).

Generally for our use case, its easy for our schema to output model definitions and users understand column names that point to other tables to use in postgrest calls.

My two cents on the FK rename thing, this requirement will ripple through our code base.

@steve-chavez
Copy link
Member Author

@bolerodan Can you estimate how many cases of ambiguous resource embedding do you have? I remember you mentioned one case in #1406.

@steve-chavez
Copy link
Member Author

Ok, points taken. I've decided to minimize the impact of this breaking change.

  • Disambiguation with dot symbol . will still be supported but not documented.
  • Disambiguation with a column name that is part of the fk(only if has one) will still be supported. Docs will encourage the FK disambiguation(also encourage renaming it with an example) but will also mention that it's possible to disambiguate with the column name.
  • Self reference embedding will give the o2m relationship by default as it did previously. No need for specifying the cardinality. This will be documented.
  • Only duck typing will be dropped as this still would give too many matches and makes the error confusing. Also, I've checked and all of the past issues disambiguate with the column name. As far as I've seen only the original PR used the duck typing. A clear migration path for this would be renaming the FK and it won't break existing clients. We'll be clear about this in the CHANGELOG.

Since self reference will be a special case and handled as before, perhaps the ability to disambiguate with the cardinality(o2m/m2o/m2m) will not be of much use. I've also noted that this capability could be dangerous to expose to clients since it could break in unexpected ways. Hardcoding a /users?select=tasks!m2m(*) in a client would cause the embedding to break if another junction table is added. Specifying the fk/column would be more stable.

I'll give this more thought and see if cardinality disambiguation can be removed. Or perhaps just enable it for self reference cases.

@steve-chavez
Copy link
Member Author

This PR had too many breaking changes, besides removing the column name disambiguation(which could be made to work with the fk rename, without patching the client) the self reference cases were broken and in this case patching the client was needed.

So in #1430, I've left these options like before and instead I've focused on adding the FK capability(not dropping support for column) and removing the duck typing. Let's continue there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants