Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: insert embedded resources via computed columns #3226

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

laurenceisla
Copy link
Member

@laurenceisla laurenceisla commented Feb 14, 2024

WIP

@laurenceisla laurenceisla changed the title Implement inserting embedding resources via computed columns feat: insert embedded resources via computed columns Feb 15, 2024
@laurenceisla laurenceisla force-pushed the computed-insert branch 2 times, most recently from d7c3354 to 85ad53e Compare April 3, 2024 01:41
@laurenceisla laurenceisla force-pushed the computed-insert branch 2 times, most recently from dfdb39f to 739af4e Compare April 25, 2024 22:30
@steve-chavez
Copy link
Member

739af4e here was merged on 7671d63

@laurenceisla
Copy link
Member Author

laurenceisla commented May 30, 2024

It seems that #2933 is definitely going to be an issue when Prefer: return=representation is requested. Doing something like modifying a table inside a function and then doing a JOIN to that same table, uses the OLD data:

-- "tags" is modified inside "upsert_posts"
-- but the LEFT JOIN will show the OLD unmodified "tags"
SELECT * FROM upsert_posts(...) p
LEFT JOIN tags t ON p.id = t.pid;

It could be solved by embedding the RETURNING value of the function, but it cannot be further embedded. For example:

Here, the tags(post, ...) function returns a post table type. As mentioned above the embed will not be possible since the insertion of tags are not known:

curl "http://localhost:3000/posts?columns=name,tags(name)?select=*,tags(name) \
  -H "Content-Type: application/json" \
  -H "Prefer: return=representation" \
  -d '{"name": "my post", "tags": [{"name": "a"},{"name": "b"}]}'
[{
  "name": "my post",
  "tags": []
}]

Making tags(posts, ...) return tags and then using that as an embedding by default (without adding it to ?select=) could be a solution (but further embeds like select=tags(*,other_embed(*)) are not possible):

curl "http://localhost:3000/posts?columns=name,tags(name) \
  -H "Content-Type: application/json" \
  -H "Prefer: return=representation" \
  -d '{"name": "my post", "tags": [{"name": "a"},{"name": "b"}]}'
[{
  "name": "my post",
  "tags": [{"name": "a"},{"name": "b"}]
}]

@jdgamble555
Copy link

It seems that #2933 is definitely going to be an issue when Prefer: return=representation is requested. Doing something like modifying a table inside a function and then doing a JOIN to that same table, uses the OLD data:

-- "tags" is modified inside "upsert_posts"
-- but the LEFT JOIN will show the OLD unmodified "tags"
SELECT * FROM upsert_posts(...) p
LEFT JOIN tags t ON p.id = t.pid;

It could be solved by embedding the RETURNING value of the function, but it cannot be further embedded. For example:

Here, the tags(post, ...) function returns a post table type. As mentioned above the embed will not be possible since the insertion of tags are not known:

curl "http://localhost:3000/posts?columns=name,tags(name)?select=*,tags(name) \
  -H "Content-Type: application/json" \
  -H "Prefer: return=representation" \
  -d '{"name": "my post", "tags": [{"name": "a"},{"name": "b"}]}'
[{
  "name": "my post",
  "tags": []
}]

Making tags(posts, ...) return tags and then using that as an embedding by default (without adding it to ?select=) could be a solution (but further embeds like select=tags(*,other_embed(*)) are not possible):

curl "http://localhost:3000/posts?columns=name,tags(name) \
  -H "Content-Type: application/json" \
  -H "Prefer: return=representation" \
  -d '{"name": "my post", "tags": [{"name": "a"},{"name": "b"}]}'
[{
  "name": "my post",
  "tags": [{"name": "a"},{"name": "b"}]
}]

This is because it is in a transaction. Since PostgREST uses CTEs, it can't return an after value before the transaction has been committed. I think all functions are transactional by default. Theoretically, you could write a nesting RETURNING * that keeps going and keeps returning the data. That would be complex, not impossible.

I think allowing nested data is more important than it returning the nested value. I'm not going to have a posts and tags table, I'm going to have a posts, tags, and post_tag table, so I will need nested... nested inserts to work.

Perhaps a work around would be an option to the select statement like select_after: true, that would run a new select statement outside of the transaction. This is not ideal, but speed in a mutation is not as important as speed in a query.

J

@laurenceisla
Copy link
Member Author

I think allowing nested data is more important than it returning the nested value.

Yes, I agree, this is the priority right now. Once we complete this, we can worry on how to return the data. Also the Location header could be easier to build and used as an alternative in the meantime.

@jdgamble555
Copy link

jdgamble555 commented Jul 4, 2024

Side note on this, I was asking Edge DB how they would handle the return, and they would have to do a query like this:

with user := (
  insert User {
    username := 'tim'
  }
),
todos := {(
  insert Todo {
    text := 'one',
    created_by := user
  }
), (
  insert Todo {
    text := 'two',
    created_by := user
  }
)},
select user {
  todos := select todos {
    id,
    text
  }
};

Although I'm not sure what this EdgeQL compiles to in postgres.

Note sure if this helps any though process.

This is inserting a user with a todo in example.

J

@wolfgangwalther
Copy link
Member

This is because it is in a transaction. Since PostgREST uses CTEs, it can't return an after value before the transaction has been committed.

This is not about transactions, but about snapshots. All CTEs in a single top-level statement run with the same snapshot, that's why they can't see modifications of other CTEs, except for via RETURNING.

So there does not need to be a COMMIT, just a new statement to see the changes.

Theoretically, you could write a nesting RETURNING * that keeps going and keeps returning the data. That would be complex, not impossible.

Yeah, this is the way to go. Note that we not only need to fetch the RETURNING data, but possibly also the already existing data, so we might need UNIONs there. Example:

[{
  "name": "my post",
  "tags": [{"name": "a"},{"name": "b"}]
}]

Assume that tag "a" already exists and tag "b" does not, yet. The relation insert will:

  • insert tag b.
  • insert the new post
  • insert two rows in post_tags to connect them.

To return the full response, we'll then need to embed both tags again, so we will need a UNION between SELECT * FROM tags and SELECT * FROM inserted_tags_cte.

@wolfgangwalther
Copy link
Member

One more note on this:

It seems that #2933 is definitely going to be an issue when Prefer: return=representation is requested.

I think it's in fact the opposite: The relational insert is the solution to this problem, if we do it right. If we support all the RETURNING stuff for the relation insert, then that's the only proper way out of the snapshot-not-visible problem, which we currently encounter when rolling our own relational inserts via RPCs.

@jdgamble555
Copy link

jdgamble555 commented Jul 6, 2024

Food for thought...

There are different situations that need to be considered that I talked about in the original post:

  • Insert new data with all new nested data
  • Insert new data with some new nested data
  • Update existing data with all new nested data (update all connections)
  • Update existing data with some new nested data (update some connections)

Also, there could be the option to work like an array:

  • Insert new data, update nested data (delete all, then insert)
  • Insert new data, update nested data (keep existing data, insert new data)

@wolfgangwalther - Could you give an example of what returning nested data might look like in a function?

J

@wolfgangwalther
Copy link
Member

Could you give an example of what returning nested data might look like in a function?

More of that here: #818 (comment)

@laurenceisla
Copy link
Member Author

laurenceisla commented Jul 6, 2024

It seems that #2933 is definitely going to be an issue when Prefer: return=representation is requested.

I think it's in fact the opposite: The relational insert is the solution to this problem, if we do it right.

Could you give an example of what returning nested data might look like in a function?

More of that here: #818 (comment)

That clarified it for me too. That would solve the main issue I was getting while working on this feature. So now I realize that my approach of implementing the virtual columns first is not the best one here (the design can be reused later though). I'll continue with the relational inserts right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants