Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc: id reference syntax in UDF/View queries rewriting #105353

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

chengxiong-ruan
Copy link
Contributor

Informs: #83233

Release note: None

@chengxiong-ruan chengxiong-ruan requested a review from a team as a code owner June 22, 2023 15:44
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice work!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @DrewKimball, @knz, and @mgartner)


docs/RFCS/20230622_id_reference_syntax.md line 10 at r1 (raw file):

## Summary

This document proposes a new SQL syntax for referencing tables, columns, indexes

Do we also have to use the syntax for views, since we can rename them with ALTER VIEW ... RENAME TO as well?


docs/RFCS/20230622_id_reference_syntax.md line 51 at r1 (raw file):

Note that table and column aliases won’t be rewritten since they don’t matter in
the cross-reference context. Everything from system and virtual tables won’t be

nit: Do you mean that references to system and virtual tables won't be rewritten? The language here is ambiguous.

Also, can you add a system table to an example?


docs/RFCS/20230622_id_reference_syntax.md line 63 at r1 (raw file):

With the id references stored in UDF and view descriptors, SHOW CREATE function
needs to be adjusted so that it can rewrite the id references of objects back to
names which is more human-readable to users.

nit: s/is/are


docs/RFCS/20230622_id_reference_syntax.md line 72 at r1 (raw file):

## Other Problems

### Current sequence and UDT id reference

nit: s/id/ID


docs/RFCS/20230622_id_reference_syntax.md line 94 at r1 (raw file):

Original query: `SELECT ‘a’:::some_enum`

Serialized query: `SELECT ‘a’::::::@100105` -- 100105 is the UDT OID

Are the 2 :: vs 3 ::: vs 6 :::::: colons a typo or are they relevant to the syntax? If they're relevant, can that be called out in the text?

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @DrewKimball, @mgartner, and @rharding6373)


docs/RFCS/20230622_id_reference_syntax.md line 1 at r1 (raw file):

- Feature Name: ID references syntax in UDF/View queries

There are a few technical issues in the RFC (see my comments below) but I'd like also to ask a general question:

The RFC currently has a syntax proposal but it does not explain how this change is to be implemented. Last time I looked into this (in PR #17501) I ran into the fundamental problem that the AST does not contain enough semantic context to do the replacement of names to IDs.

In your PR #99954 you are merely solving the problem of looking up a table from its numeric ID reference. (Which was already implemented btw - we already had by-ID table references before). Where is the solution to replace names by IDs? Is it even possible? The RFC should propose a compelling suggestion that this substitution is indeed possible and sketch how it should be performed.


docs/RFCS/20230622_id_reference_syntax.md line 6 at r1 (raw file):

- Authors: Chengxiong Ruan, Marcus Gartner
- RFC PR: [#105353](https://github.com/cockroachdb/cockroach/pull/105353)
- Cockroach Issue: [#83233](https://github.com/cockroachdb/cockroach/issues/83233)

It looks to me as if this RFC is also building a solution for #10083. I would gladly see in the RFC how the two problems connect to each other.


docs/RFCS/20230622_id_reference_syntax.md line 13 at r1 (raw file):

and user-defined functions by their descriptor ID. This syntax is necessary in
order to allow tables, columns, indexes and UDFs that are referenced in views and
UDFs to be renamed.

As I have mentioned in the PR, this RFC should mention the existing syntax to refer to objects by ID, and explain what the plan is about them.
I already have questioned the need to preserve two equivalent features side-by-side. We need to at least make a honest effort at reducing the cognitive burden of maintenance on our team and keep an eye on engineering velocity. This implies phasing out subsystems that are superseded. Marcus has done this already in the past with ordinal column references, maybe this is the time for another such change.


docs/RFCS/20230622_id_reference_syntax.md line 32 at r1 (raw file):

The main idea of proposed serialization syntaxes is pattern `{object_type:object_id}`.
`object_type` includes `TABLE`, `COLUMN`, `INDEX` and `FUNCTION`. `object_id` is

For the sake of economy at scale, please consider using single-character types here: R (relation, not table -- this needs to include views and sequences too), C, I, F.

Also you forgot numeric type references (UDTs).
And we might need also numeric region references for multi-region SQL.


docs/RFCS/20230622_id_reference_syntax.md line 33 at r1 (raw file):

The main idea of proposed serialization syntaxes is pattern `{object_type:object_id}`.
`object_type` includes `TABLE`, `COLUMN`, `INDEX` and `FUNCTION`. `object_id` is
table/function descriptor id, column id or index id. Rewriting a UDF body or view

nit: throughout the RFC: replace "id" by "ID".


docs/RFCS/20230622_id_reference_syntax.md line 67 at r1 (raw file):

## Risk of syntax conflict with other syntax?
`{xxx:yyy}` syntax seems not used by other statement grammars in cockroach’s
`sql.y` file. And, there isn't any use case of curly braces in the postgres' sql

nit: "postgres' sql" -> "postgres' SQL"

Throughout the RFC replace "sql" by "SQL". (Except in quoted file names of course)


docs/RFCS/20230622_id_reference_syntax.md line 82 at r1 (raw file):

Original query: `SELECT nextval(‘seq_name’)`

Serialized query: `SELECT nextval(104::REGCLASS)`  -- 104 is the sequence descriptor id

I'm confused by the explanation: is this replacement of 'seq_name' by 104::REGCLASS already implemented? Or is this a change this RFC is proposing?

If already implemented, please point to where this substitution takes place.


docs/RFCS/20230622_id_reference_syntax.md line 88 at r1 (raw file):

Original query: `SELECT ‘a’::some_enum`

Serialized query: `SELECT ‘a’:::@100105`  -- 100105 is the UDT OID

I have deep issues with this syntax :::@. There are multiple problems with this, but the foremost is that per your proposal above this should really be SELECT 'a'::{TYPE:100105}. Why did you write something different?


docs/RFCS/20230622_id_reference_syntax.md line 94 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Are the 2 :: vs 3 ::: vs 6 :::::: colons a typo or are they relevant to the syntax? If they're relevant, can that be called out in the text?

I had the same question as Rachael, with the additional concern that :::::: is really not acceptable as a technical solution. Please explain where this comes from and, if that is already something implemented inside CockroachDB, what you propose to fix it.


docs/RFCS/20230622_id_reference_syntax.md line 109 at r1 (raw file):

since we reference it by ID.
[Here](https://github.com/cockroachdb/cockroach/blob/743486e5a13b9c9c037675462c61753a239ad872/pkg/sql/logictest/testdata/logic_test/udf#L1742)
is a logic test shows such kind of failures : This will be fixed if we reference

nit: no space before `:~.


docs/RFCS/20230622_id_reference_syntax.md line 110 at r1 (raw file):

[Here](https://github.com/cockroachdb/cockroach/blob/743486e5a13b9c9c037675462c61753a239ad872/pkg/sql/logictest/testdata/logic_test/udf#L1742)
is a logic test shows such kind of failures : This will be fixed if we reference
the table by id with the new syntax (yes a sequence is technically a table).

It's not clear to me what this "other problems" section is for in this RFC.

Are you listing problems that you plan to address with your proposal? If so, how?

Are you merely pointing out their existence and stating that they are "out of scope"? Then that needs to be mentioned in the intro at the top.
(I do think the various syntax examples you've mentioned above are deeply troubling and really need a solution.)

@bdarnell
Copy link
Contributor

What about TRUNCATE? In the past, truncating a table was equivalent to drop and recreate, so the table's ID would change. However, this appears to have changed at some point; I just tried it and saw the table keep the same ID. Do we now guarantee stability of all IDs used here (table, index, column, function) across all kinds of schema changes?

Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdarnell Thanks for pointing this out. For TRUNCATE, we actually only create new empty indexes and drop old indexes. But it's true that we'll need to rewrite these IDs in UDF and Views whenever TRUNCATE happens. This can be done by looking up back-reference record in the table descriptor. Another case where ID can be changed is ALTER COLUMN TYPE, in some cases we create a new column with the new type, backfill it and drop the old one. We would want to do a id rewrite when this happens as well. We tracks information like which column of the table is referenced by which function or view. Other than that, I don't know of other cases ID can be changed. But I'll double check with the team.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @knz, @mgartner, and @rharding6373)


docs/RFCS/20230622_id_reference_syntax.md line 1 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

There are a few technical issues in the RFC (see my comments below) but I'd like also to ask a general question:

The RFC currently has a syntax proposal but it does not explain how this change is to be implemented. Last time I looked into this (in PR #17501) I ran into the fundamental problem that the AST does not contain enough semantic context to do the replacement of names to IDs.

In your PR #99954 you are merely solving the problem of looking up a table from its numeric ID reference. (Which was already implemented btw - we already had by-ID table references before). Where is the solution to replace names by IDs? Is it even possible? The RFC should propose a compelling suggestion that this substitution is indeed possible and sketch how it should be performed.

Thanks for pointing this out! Yes, the major challenge of this problem is the scoping of table (and others) names. And optbuilder has everything about the name scopes. The proposed solution in #101755 is to rewrite in place when optbuilder builds the UDF queries (note that this is a necessary step to also fetch all the cross-reference dependency information). PR #99954 has been closed since it's not needed at the moment. Sorry again for the confusion. But I'll add a section to explain how to implement this.


docs/RFCS/20230622_id_reference_syntax.md line 6 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

It looks to me as if this RFC is also building a solution for #10083. I would gladly see in the RFC how the two problems connect to each other.

thanks, will add that as well.


docs/RFCS/20230622_id_reference_syntax.md line 10 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Do we also have to use the syntax for views, since we can rename them with ALTER VIEW ... RENAME TO as well?

yes, we need to mention VIEW in this sentence as well. as it was mentioned in the second sentence that we want to be able to rename objects referenced by views as well.


docs/RFCS/20230622_id_reference_syntax.md line 13 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

As I have mentioned in the PR, this RFC should mention the existing syntax to refer to objects by ID, and explain what the plan is about them.
I already have questioned the need to preserve two equivalent features side-by-side. We need to at least make a honest effort at reducing the cognitive burden of maintenance on our team and keep an eye on engineering velocity. This implies phasing out subsystems that are superseded. Marcus has done this already in the past with ordinal column references, maybe this is the time for another such change.

👍 Good call and thanks for the reminder. I'll create an issue to track the unification of this new syntax and the old one as the first step.


docs/RFCS/20230622_id_reference_syntax.md line 32 at r1 (raw file):

For the sake of economy at scale, please consider using single-character types here: R (relation, not table -- this needs to include views and sequences too), C, I, F.

very true!

Also you forgot numeric type references (UDTs)

good catch!

And we might need also numeric region references for multi-region SQL.

Do you mean those DDL statements like ALTER DATABASE...SET PRIMARY REGION...? Could you share some examples? Thanks


docs/RFCS/20230622_id_reference_syntax.md line 33 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

nit: throughout the RFC: replace "id" by "ID".

Good point, will do.


docs/RFCS/20230622_id_reference_syntax.md line 51 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

nit: Do you mean that references to system and virtual tables won't be rewritten? The language here is ambiguous.

Also, can you add a system table to an example?

Thanks for pointing this out. But you're right, I'll rephrase this a bit and add an example for that.


docs/RFCS/20230622_id_reference_syntax.md line 82 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I'm confused by the explanation: is this replacement of 'seq_name' by 104::REGCLASS already implemented? Or is this a change this RFC is proposing?

If already implemented, please point to where this substitution takes place.

Sorry about the confusion. The syntax is an existing syntax. I'll add more info about this. Thanks for pointing this out.


docs/RFCS/20230622_id_reference_syntax.md line 88 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I have deep issues with this syntax :::@. There are multiple problems with this, but the foremost is that per your proposal above this should really be SELECT 'a'::{TYPE:100105}. Why did you write something different?

Ah, this is also existing syntax to reference types by ID in expressions. I'll add more words to explains this.


docs/RFCS/20230622_id_reference_syntax.md line 94 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I had the same question as Rachael, with the additional concern that :::::: is really not acceptable as a technical solution. Please explain where this comes from and, if that is already something implemented inside CockroachDB, what you propose to fix it.

so...:::::: is definitely a typo! thanks for catching that.
But :: is the syntax for type casting, and ::: is the syntax for type annotated expression.


docs/RFCS/20230622_id_reference_syntax.md line 110 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

It's not clear to me what this "other problems" section is for in this RFC.

Are you listing problems that you plan to address with your proposal? If so, how?

Are you merely pointing out their existence and stating that they are "out of scope"? Then that needs to be mentioned in the intro at the top.
(I do think the various syntax examples you've mentioned above are deeply troubling and really need a solution.)

Yeah, mentioning them basically wanted to let people know their existence and it's a non-goal for this specific project. And you're right that we should unify all the syntax eventually. An issue will be created to track this.

@mgartner
Copy link
Collaborator

@chengxiong-ruan have we implemented parts of this proposal? If so, can you finish this up and merge it?

@chengxiong-ruan
Copy link
Contributor Author

@chengxiong-ruan have we implemented parts of this proposal? If so, can you finish this up and merge it?

@mgartner there is a pr that was trying to do that, but I think we agreed to pause on that. Sorry that I should have closed the PR, but I lost track of it unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants