Skip to content

Add support for UPSERT#1048

Merged
steve-chavez merged 3 commits intoPostgREST:masterfrom
steve-chavez:upsert
Feb 21, 2018
Merged

Add support for UPSERT#1048
steve-chavez merged 3 commits intoPostgREST:masterfrom
steve-chavez:upsert

Conversation

@steve-chavez
Copy link
Member

Closes #256. UPSERT is done with the ON CONFLICT clause available on PostgreSQL >= 9.5.

Bulk UPSERT can be done by adding Prefer: resolution=ignore-duplicates/merge-duplicates headers, these do a INSERT .. ON CONFLICT(<pkCols>) DO NOTHING/DO UPDATE.
The ignore-duplicates/merge-duplicates Prefer values only work with tables that have a PK, if the table doesn't have a PK the Prefer will be ignored(no error).

Single row UPSERT can be done with PUT if the client specifies only PK columns as eq filters and the payload is identified by the URI, other restrictions include not allowing Content-Range/limit/offset(took into account the previous implementation).

@begriffs
Copy link
Member

🥂 Wow, this is a great feature. People have been wanting this one for a long long time, thank you!

I'll review the code and add notes.


(ActionDelete, TargetIdent _, Nothing) ->
case mutateSqlParts of
(ActionSingleUpsert, TargetIdent (QualifiedIdentifier tSchema tName), Just (PayloadJSON payload pType pKeys)) ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused below because pKeys made me think "primary keys" rather than "payload keys." Maybe name it pjKeys?

createWriteStatement sq mq (contentType == CTSingularJSON) False
(contentType == CTTextCSV) (iPreferRepresentation apiRequest) []
let (_, queryTotal, _, body) = extractQueryResult row
if queryTotal /= 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, queryTotal /= 1 is a clever way to detect that the PUT is being applied incorrectly, rather than comparing the filters in the request to primary keys etc etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking further into the code, I see that mutateRequest checks for ActionSingleUpsert that the request contains equality filters on all primary keys. Does that make the queryTotal /= 1 here unnecessary, or what is the interaction between the logic?

Copy link
Member Author

@steve-chavez steve-chavez Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is not clear from the code but I'm enforcing a different condition here, so first of all my interpretation of RFC 7231 is that the PUT uri must identify the payload being inserted so that the next GET can return the same exact payload right? So what I'm enforcing here is this:

-- table numbers is empty
it "fails if the uri doesn't identify the payload" $
  put "/numbers?id=eq.7" [str| [ { "id": 11, "name": "eleven"} ]|] `shouldRespondWith` 405

If that PUT would succeed the subsequent GET on /numbers?id=eq.7 would give an empty array instead of the payload. This is actually done by the sql query that WHEREs the payload according to the querystring parameters, if that WHERE gives false then nothing gets upserted and the queryTotal will be 0.

Copy link
Member Author

@steve-chavez steve-chavez Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to not have the restriction of just allowing primary key cols eq filters first, but without that there's no reliable way of identifying the payload with the same sql query, consider:

-- id is persons pk, there's already many records with the name John
PUT /persons?name=eq.John
{ "id": 12, "first_name": "John", "last_name": "Doe" }

The WHERE would give true and the PUT would succeed with a queryTotal == 1 but since name is not the pk the subsequent GET /persons?name=eq.John would have many records and not just the same payload.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@begriffs Let me know if I'm interpreting the RFC right and if so how can this logic be made more simple/clear.

if queryTotal /= 1
then do
HT.condemn
return $ simpleError status405 [] "The URI must univocally identify the payload when using PUT"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message might be unclear. What about, "PUT must be applied to exactly one resource." Maybe include a hint like, "Request attempted to modify ${queryTotal} table rows"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check my comments in #1048 (comment), if I'm misunderstanding something in the RFC I'll change it but for now not sure what would be the clearer message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm haven't figured out how to make a request that causes this error message to happen. What do I need to do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, now I see. What about adding a hint to the error message, like

{
  "message": "The URI must univocally identify the payload when using PUT",
  "hint": "Changing primary key values with PUT is not allowed"
}

The existing message is fine, but the hint could make it more concrete for people.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or "hint": "Payload values do not match resource in primary key column(s)"
whatever you think explains it best.

`shouldRespondWith`
[json|[ { "id": 1 }, { "id": 2 }, { "id": 4} ]|] { matchStatus = 201 , matchHeaders = [matchContentTypeJson] }

it "succeeds and ignores the Prefer: resolution header if the table has no PK" $
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this alternative: if a table has no pk then make the Prefer header resolution=ignore-duplicates translate to ON CONFLICT DO NOTHING (which has an empty conflict target). But have resolution=merge-duplicates raise an error on tables without a pk because there's no obvious conflict target other than the primary keys. Currently it appears that the entire ON CONFLICT clause is dropped for these tables.

I feel like if a client explicitly requests merge-duplicates then it's an error to ignore it on tables without a pk. Is there a use-case you have in mind that I am overlooking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to drop the ON CONFLICT clause for table with no pks because I thought that Prefer semantics meant the condition it's optional and shouldn't give an error, also according to RFC 7240:

A server that does not recognize or is unable to comply with
particular preference tokens in the Prefer header field of a request
MUST ignore those tokens and continue processing instead of signaling
an error.

I thought to later add the Preference-Applied feature so the client can check the success of the condition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point that this is only a Prefer header so we shouldn't raise an error if the header can't be applied.

About tables with no primary keys, we could conceivably do ON CONFLICT with no columns in parentheses. The docs say:

If no list of column names is given at all, the default is all the columns of the table in their declared order

Only problem is this might cause an expensive sequential scan for duplicates rather than using indices, so it would put extra load on the server. Your choice to drop the ON CONFLICT for tables without keys is probably the right thing to do.

Would it be difficult to include a Preference-Applied response header for resolution? Might help people debug a surprising result.

@begriffs
Copy link
Member

This is really pleasant and readable code. I just had a few small comments, and a question about the proper behavior on tables lacking a primary key.

@jekirl
Copy link

jekirl commented Jan 27, 2018

Wow this is great! First time I have actually looked at the PostgREST code base, but just looking at the PR this is really well done. Thanks

@begriffs
Copy link
Member

begriffs commented Feb 9, 2018

I tried this out locally again and it's fun to use. 🎈

Noticed a few things:

  • "You must specify all columns in the payload when using PUT"
    • This returns HTTP 405, which is misleading because PUT is supported on the resource, it's just that the payload was wrong. Seems more like it should be a 400 error. (405 is perfect for the "Filters must include all and only primary key columns with 'eq' operator" though)
  • This was weird, but when I did a POST with a single item that duplicated a row in the table, and used Prefer: resolution=ignore-duplicates, then the server had an internal error.
    • {
        "details": "Row number 0",
        "message": "Row error: unexpected null"
      }
    • Seems like a haskell error because running the generated query in psql just gives
      ┌──────────────────┬────────────┬───────────┬──────────┐
      │ total_result_set │ page_total │ array_agg │ ?column? │
      ├──────────────────┼────────────┼───────────┼──────────┤
      │                  │          0 │ ∅         │          │
      └──────────────────┴────────────┴───────────┴──────────┘
      
      Including multiple items in the POST array makes the error go away, even when both items are dups of existing rows.

@steve-chavez
Copy link
Member Author

@begriffs Added latest corrections.

@steve-chavez steve-chavez force-pushed the upsert branch 2 times, most recently from 943ec9e to 283a994 Compare February 19, 2018 13:51
it "succeeds if not a single resource is created" $ do
request methodPost "/tiobe_pls" [("Prefer", "return=representation"), ("Prefer", "resolution=ignore-duplicates")]
[json|[ { "name": "Java", "rank": 1 } ]|] `shouldRespondWith`
[json|[]|] { matchStatus = 201 , matchHeaders = [matchContentTypeJson] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no records got created, then a 201 response doesn't seem accurate. Maybe a 200 in this case?

If it's a pain to change then it maybe it's not such a big deal. What do you think?

Copy link
Member Author

@steve-chavez steve-chavez Feb 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually tried that before by applying a queryTotal == 0 condition but other unrelated tests failed.

Like this one, the issue is the return=minimal always makes the queryTotal == 0 despite rows being inserted.

There's another way of detecting the insertion by using the xmax system column, tried to use that for PUT as well (since it should respond 200 for update and 201 for insert according to RFC) but it involves a not-so-easy refactoring.

@begriffs How about if I merge as it is and open an issue for that to correct it later?

Copy link
Member Author

@steve-chavez steve-chavez Feb 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also some old insert tests like this one https://github.com/begriffs/postgrest/blob/master/test/Feature/InsertSpec.hs#L129-L135 respond with 201 despite no row being inserted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, since it's an existing issue, we can merge this PR and fix that one in its own.

(Cool xmax trick btw!)

@begriffs
Copy link
Member

I added one more note about response codes. Aside from that and rebasing it looks good to merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants