Add support for UPSERT#1048
Conversation
|
🥂 Wow, this is a great feature. People have been wanting this one for a long long time, thank you! I'll review the code and add notes. |
src/PostgREST/App.hs
Outdated
|
|
||
| (ActionDelete, TargetIdent _, Nothing) -> | ||
| case mutateSqlParts of | ||
| (ActionSingleUpsert, TargetIdent (QualifiedIdentifier tSchema tName), Just (PayloadJSON payload pType pKeys)) -> |
There was a problem hiding this comment.
I was confused below because pKeys made me think "primary keys" rather than "payload keys." Maybe name it pjKeys?
| createWriteStatement sq mq (contentType == CTSingularJSON) False | ||
| (contentType == CTTextCSV) (iPreferRepresentation apiRequest) [] | ||
| let (_, queryTotal, _, body) = extractQueryResult row | ||
| if queryTotal /= 1 |
There was a problem hiding this comment.
Nice, queryTotal /= 1 is a clever way to detect that the PUT is being applied incorrectly, rather than comparing the filters in the request to primary keys etc etc.
There was a problem hiding this comment.
Looking further into the code, I see that mutateRequest checks for ActionSingleUpsert that the request contains equality filters on all primary keys. Does that make the queryTotal /= 1 here unnecessary, or what is the interaction between the logic?
There was a problem hiding this comment.
Is not clear from the code but I'm enforcing a different condition here, so first of all my interpretation of RFC 7231 is that the PUT uri must identify the payload being inserted so that the next GET can return the same exact payload right? So what I'm enforcing here is this:
-- table numbers is empty
it "fails if the uri doesn't identify the payload" $
put "/numbers?id=eq.7" [str| [ { "id": 11, "name": "eleven"} ]|] `shouldRespondWith` 405If that PUT would succeed the subsequent GET on /numbers?id=eq.7 would give an empty array instead of the payload. This is actually done by the sql query that WHEREs the payload according to the querystring parameters, if that WHERE gives false then nothing gets upserted and the queryTotal will be 0.
There was a problem hiding this comment.
I tried to not have the restriction of just allowing primary key cols eq filters first, but without that there's no reliable way of identifying the payload with the same sql query, consider:
-- id is persons pk, there's already many records with the name John
PUT /persons?name=eq.John
{ "id": 12, "first_name": "John", "last_name": "Doe" }The WHERE would give true and the PUT would succeed with a queryTotal == 1 but since name is not the pk the subsequent GET /persons?name=eq.John would have many records and not just the same payload.
There was a problem hiding this comment.
@begriffs Let me know if I'm interpreting the RFC right and if so how can this logic be made more simple/clear.
src/PostgREST/App.hs
Outdated
| if queryTotal /= 1 | ||
| then do | ||
| HT.condemn | ||
| return $ simpleError status405 [] "The URI must univocally identify the payload when using PUT" |
There was a problem hiding this comment.
This error message might be unclear. What about, "PUT must be applied to exactly one resource." Maybe include a hint like, "Request attempted to modify ${queryTotal} table rows"
There was a problem hiding this comment.
Check my comments in #1048 (comment), if I'm misunderstanding something in the RFC I'll change it but for now not sure what would be the clearer message.
There was a problem hiding this comment.
I'm haven't figured out how to make a request that causes this error message to happen. What do I need to do?
There was a problem hiding this comment.
This test shows how to make the error happen: https://github.com/begriffs/postgrest/pull/1048/files#diff-385e57e2f6697b6568f08e4d151b1d7cR133
There was a problem hiding this comment.
Thanks, now I see. What about adding a hint to the error message, like
{
"message": "The URI must univocally identify the payload when using PUT",
"hint": "Changing primary key values with PUT is not allowed"
}The existing message is fine, but the hint could make it more concrete for people.
There was a problem hiding this comment.
Or "hint": "Payload values do not match resource in primary key column(s)"
whatever you think explains it best.
test/Feature/UpsertSpec.hs
Outdated
| `shouldRespondWith` | ||
| [json|[ { "id": 1 }, { "id": 2 }, { "id": 4} ]|] { matchStatus = 201 , matchHeaders = [matchContentTypeJson] } | ||
|
|
||
| it "succeeds and ignores the Prefer: resolution header if the table has no PK" $ |
There was a problem hiding this comment.
What about this alternative: if a table has no pk then make the Prefer header resolution=ignore-duplicates translate to ON CONFLICT DO NOTHING (which has an empty conflict target). But have resolution=merge-duplicates raise an error on tables without a pk because there's no obvious conflict target other than the primary keys. Currently it appears that the entire ON CONFLICT clause is dropped for these tables.
I feel like if a client explicitly requests merge-duplicates then it's an error to ignore it on tables without a pk. Is there a use-case you have in mind that I am overlooking?
There was a problem hiding this comment.
I chose to drop the ON CONFLICT clause for table with no pks because I thought that Prefer semantics meant the condition it's optional and shouldn't give an error, also according to RFC 7240:
A server that does not recognize or is unable to comply with
particular preference tokens in the Prefer header field of a request
MUST ignore those tokens and continue processing instead of signaling
an error.
I thought to later add the Preference-Applied feature so the client can check the success of the condition.
There was a problem hiding this comment.
Great point that this is only a Prefer header so we shouldn't raise an error if the header can't be applied.
About tables with no primary keys, we could conceivably do ON CONFLICT with no columns in parentheses. The docs say:
If no list of column names is given at all, the default is all the columns of the table in their declared order
Only problem is this might cause an expensive sequential scan for duplicates rather than using indices, so it would put extra load on the server. Your choice to drop the ON CONFLICT for tables without keys is probably the right thing to do.
Would it be difficult to include a Preference-Applied response header for resolution? Might help people debug a surprising result.
|
This is really pleasant and readable code. I just had a few small comments, and a question about the proper behavior on tables lacking a primary key. |
|
Wow this is great! First time I have actually looked at the PostgREST code base, but just looking at the PR this is really well done. Thanks |
|
I tried this out locally again and it's fun to use. 🎈 Noticed a few things:
|
|
@begriffs Added latest corrections. |
943ec9e to
283a994
Compare
| it "succeeds if not a single resource is created" $ do | ||
| request methodPost "/tiobe_pls" [("Prefer", "return=representation"), ("Prefer", "resolution=ignore-duplicates")] | ||
| [json|[ { "name": "Java", "rank": 1 } ]|] `shouldRespondWith` | ||
| [json|[]|] { matchStatus = 201 , matchHeaders = [matchContentTypeJson] } |
There was a problem hiding this comment.
If no records got created, then a 201 response doesn't seem accurate. Maybe a 200 in this case?
If it's a pain to change then it maybe it's not such a big deal. What do you think?
There was a problem hiding this comment.
Actually tried that before by applying a queryTotal == 0 condition but other unrelated tests failed.
Like this one, the issue is the return=minimal always makes the queryTotal == 0 despite rows being inserted.
There's another way of detecting the insertion by using the xmax system column, tried to use that for PUT as well (since it should respond 200 for update and 201 for insert according to RFC) but it involves a not-so-easy refactoring.
@begriffs How about if I merge as it is and open an issue for that to correct it later?
There was a problem hiding this comment.
Also some old insert tests like this one https://github.com/begriffs/postgrest/blob/master/test/Feature/InsertSpec.hs#L129-L135 respond with 201 despite no row being inserted.
There was a problem hiding this comment.
OK, since it's an existing issue, we can merge this PR and fix that one in its own.
(Cool xmax trick btw!)
|
I added one more note about response codes. Aside from that and rebasing it looks good to merge! |
- Ensure creating nothing on ignore-duplicates succeeds - Refactor locationF query
Closes #256. UPSERT is done with the
ON CONFLICTclause available on PostgreSQL >= 9.5.Bulk UPSERT can be done by adding
Prefer: resolution=ignore-duplicates/merge-duplicatesheaders, these do aINSERT .. ON CONFLICT(<pkCols>) DO NOTHING/DO UPDATE.The
ignore-duplicates/merge-duplicatesPrefer values only work with tables that have a PK, if the table doesn't have a PK thePreferwill be ignored(no error).Single row UPSERT can be done with PUT if the client specifies only PK columns as
eqfilters and the payload is identified by the URI, other restrictions include not allowingContent-Range/limit/offset(took into account the previous implementation).