-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: id reference syntax in UDF/View queries rewriting #105353
base: master
Are you sure you want to change the base?
rfc: id reference syntax in UDF/View queries rewriting #105353
Conversation
Informs: cockroachdb#83233 Release note: None
eded7b8
to
014e3e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @DrewKimball, @knz, and @mgartner)
docs/RFCS/20230622_id_reference_syntax.md
line 10 at r1 (raw file):
## Summary This document proposes a new SQL syntax for referencing tables, columns, indexes
Do we also have to use the syntax for views, since we can rename them with ALTER VIEW ... RENAME TO
as well?
docs/RFCS/20230622_id_reference_syntax.md
line 51 at r1 (raw file):
Note that table and column aliases won’t be rewritten since they don’t matter in the cross-reference context. Everything from system and virtual tables won’t be
nit: Do you mean that references to system and virtual tables won't be rewritten? The language here is ambiguous.
Also, can you add a system table to an example?
docs/RFCS/20230622_id_reference_syntax.md
line 63 at r1 (raw file):
With the id references stored in UDF and view descriptors, SHOW CREATE function needs to be adjusted so that it can rewrite the id references of objects back to names which is more human-readable to users.
nit: s/is/are
docs/RFCS/20230622_id_reference_syntax.md
line 72 at r1 (raw file):
## Other Problems ### Current sequence and UDT id reference
nit: s/id/ID
docs/RFCS/20230622_id_reference_syntax.md
line 94 at r1 (raw file):
Original query: `SELECT ‘a’:::some_enum` Serialized query: `SELECT ‘a’::::::@100105` -- 100105 is the UDT OID
Are the 2 ::
vs 3 :::
vs 6 ::::::
colons a typo or are they relevant to the syntax? If they're relevant, can that be called out in the text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @DrewKimball, @mgartner, and @rharding6373)
docs/RFCS/20230622_id_reference_syntax.md
line 1 at r1 (raw file):
- Feature Name: ID references syntax in UDF/View queries
There are a few technical issues in the RFC (see my comments below) but I'd like also to ask a general question:
The RFC currently has a syntax proposal but it does not explain how this change is to be implemented. Last time I looked into this (in PR #17501) I ran into the fundamental problem that the AST does not contain enough semantic context to do the replacement of names to IDs.
In your PR #99954 you are merely solving the problem of looking up a table from its numeric ID reference. (Which was already implemented btw - we already had by-ID table references before). Where is the solution to replace names by IDs? Is it even possible? The RFC should propose a compelling suggestion that this substitution is indeed possible and sketch how it should be performed.
docs/RFCS/20230622_id_reference_syntax.md
line 6 at r1 (raw file):
- Authors: Chengxiong Ruan, Marcus Gartner - RFC PR: [#105353](https://github.com/cockroachdb/cockroach/pull/105353) - Cockroach Issue: [#83233](https://github.com/cockroachdb/cockroach/issues/83233)
It looks to me as if this RFC is also building a solution for #10083. I would gladly see in the RFC how the two problems connect to each other.
docs/RFCS/20230622_id_reference_syntax.md
line 13 at r1 (raw file):
and user-defined functions by their descriptor ID. This syntax is necessary in order to allow tables, columns, indexes and UDFs that are referenced in views and UDFs to be renamed.
As I have mentioned in the PR, this RFC should mention the existing syntax to refer to objects by ID, and explain what the plan is about them.
I already have questioned the need to preserve two equivalent features side-by-side. We need to at least make a honest effort at reducing the cognitive burden of maintenance on our team and keep an eye on engineering velocity. This implies phasing out subsystems that are superseded. Marcus has done this already in the past with ordinal column references, maybe this is the time for another such change.
docs/RFCS/20230622_id_reference_syntax.md
line 32 at r1 (raw file):
The main idea of proposed serialization syntaxes is pattern `{object_type:object_id}`. `object_type` includes `TABLE`, `COLUMN`, `INDEX` and `FUNCTION`. `object_id` is
For the sake of economy at scale, please consider using single-character types here: R
(relation, not table -- this needs to include views and sequences too), C
, I
, F
.
Also you forgot numeric type references (UDTs).
And we might need also numeric region references for multi-region SQL.
docs/RFCS/20230622_id_reference_syntax.md
line 33 at r1 (raw file):
The main idea of proposed serialization syntaxes is pattern `{object_type:object_id}`. `object_type` includes `TABLE`, `COLUMN`, `INDEX` and `FUNCTION`. `object_id` is table/function descriptor id, column id or index id. Rewriting a UDF body or view
nit: throughout the RFC: replace "id" by "ID".
docs/RFCS/20230622_id_reference_syntax.md
line 67 at r1 (raw file):
## Risk of syntax conflict with other syntax? `{xxx:yyy}` syntax seems not used by other statement grammars in cockroach’s `sql.y` file. And, there isn't any use case of curly braces in the postgres' sql
nit: "postgres' sql" -> "postgres' SQL"
Throughout the RFC replace "sql" by "SQL". (Except in quoted file names of course)
docs/RFCS/20230622_id_reference_syntax.md
line 82 at r1 (raw file):
Original query: `SELECT nextval(‘seq_name’)` Serialized query: `SELECT nextval(104::REGCLASS)` -- 104 is the sequence descriptor id
I'm confused by the explanation: is this replacement of 'seq_name'
by 104::REGCLASS
already implemented? Or is this a change this RFC is proposing?
If already implemented, please point to where this substitution takes place.
docs/RFCS/20230622_id_reference_syntax.md
line 88 at r1 (raw file):
Original query: `SELECT ‘a’::some_enum` Serialized query: `SELECT ‘a’:::@100105` -- 100105 is the UDT OID
I have deep issues with this syntax :::@
. There are multiple problems with this, but the foremost is that per your proposal above this should really be SELECT 'a'::{TYPE:100105}
. Why did you write something different?
docs/RFCS/20230622_id_reference_syntax.md
line 94 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Are the 2
::
vs 3:::
vs 6::::::
colons a typo or are they relevant to the syntax? If they're relevant, can that be called out in the text?
I had the same question as Rachael, with the additional concern that ::::::
is really not acceptable as a technical solution. Please explain where this comes from and, if that is already something implemented inside CockroachDB, what you propose to fix it.
docs/RFCS/20230622_id_reference_syntax.md
line 109 at r1 (raw file):
since we reference it by ID. [Here](https://github.com/cockroachdb/cockroach/blob/743486e5a13b9c9c037675462c61753a239ad872/pkg/sql/logictest/testdata/logic_test/udf#L1742) is a logic test shows such kind of failures : This will be fixed if we reference
nit: no space before `:~.
docs/RFCS/20230622_id_reference_syntax.md
line 110 at r1 (raw file):
[Here](https://github.com/cockroachdb/cockroach/blob/743486e5a13b9c9c037675462c61753a239ad872/pkg/sql/logictest/testdata/logic_test/udf#L1742) is a logic test shows such kind of failures : This will be fixed if we reference the table by id with the new syntax (yes a sequence is technically a table).
It's not clear to me what this "other problems" section is for in this RFC.
Are you listing problems that you plan to address with your proposal? If so, how?
Are you merely pointing out their existence and stating that they are "out of scope"? Then that needs to be mentioned in the intro at the top.
(I do think the various syntax examples you've mentioned above are deeply troubling and really need a solution.)
What about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bdarnell Thanks for pointing this out. For TRUNCATE
, we actually only create new empty indexes and drop old indexes. But it's true that we'll need to rewrite these IDs in UDF and Views whenever TRUNCATE
happens. This can be done by looking up back-reference record in the table descriptor. Another case where ID can be changed is ALTER COLUMN TYPE
, in some cases we create a new column with the new type, backfill it and drop the old one. We would want to do a id rewrite when this happens as well. We tracks information like which column of the table is referenced by which function or view. Other than that, I don't know of other cases ID can be changed. But I'll double check with the team.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @knz, @mgartner, and @rharding6373)
docs/RFCS/20230622_id_reference_syntax.md
line 1 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
There are a few technical issues in the RFC (see my comments below) but I'd like also to ask a general question:
The RFC currently has a syntax proposal but it does not explain how this change is to be implemented. Last time I looked into this (in PR #17501) I ran into the fundamental problem that the AST does not contain enough semantic context to do the replacement of names to IDs.
In your PR #99954 you are merely solving the problem of looking up a table from its numeric ID reference. (Which was already implemented btw - we already had by-ID table references before). Where is the solution to replace names by IDs? Is it even possible? The RFC should propose a compelling suggestion that this substitution is indeed possible and sketch how it should be performed.
Thanks for pointing this out! Yes, the major challenge of this problem is the scoping of table (and others) names. And optbuilder has everything about the name scopes. The proposed solution in #101755 is to rewrite in place when optbuilder builds the UDF queries (note that this is a necessary step to also fetch all the cross-reference dependency information). PR #99954 has been closed since it's not needed at the moment. Sorry again for the confusion. But I'll add a section to explain how to implement this
.
docs/RFCS/20230622_id_reference_syntax.md
line 6 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
It looks to me as if this RFC is also building a solution for #10083. I would gladly see in the RFC how the two problems connect to each other.
thanks, will add that as well.
docs/RFCS/20230622_id_reference_syntax.md
line 10 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Do we also have to use the syntax for views, since we can rename them with
ALTER VIEW ... RENAME TO
as well?
yes, we need to mention VIEW
in this sentence as well. as it was mentioned in the second sentence that we want to be able to rename objects referenced by views as well.
docs/RFCS/20230622_id_reference_syntax.md
line 13 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
As I have mentioned in the PR, this RFC should mention the existing syntax to refer to objects by ID, and explain what the plan is about them.
I already have questioned the need to preserve two equivalent features side-by-side. We need to at least make a honest effort at reducing the cognitive burden of maintenance on our team and keep an eye on engineering velocity. This implies phasing out subsystems that are superseded. Marcus has done this already in the past with ordinal column references, maybe this is the time for another such change.
👍 Good call and thanks for the reminder. I'll create an issue to track the unification of this new syntax and the old one as the first step.
docs/RFCS/20230622_id_reference_syntax.md
line 32 at r1 (raw file):
For the sake of economy at scale, please consider using single-character types here:
R
(relation, not table -- this needs to include views and sequences too),C
,I
,F
.
very true!
Also you forgot numeric type references (UDTs)
good catch!
And we might need also numeric region references for multi-region SQL.
Do you mean those DDL statements like ALTER DATABASE...SET PRIMARY REGION...
? Could you share some examples? Thanks
docs/RFCS/20230622_id_reference_syntax.md
line 33 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
nit: throughout the RFC: replace "id" by "ID".
Good point, will do.
docs/RFCS/20230622_id_reference_syntax.md
line 51 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
nit: Do you mean that references to system and virtual tables won't be rewritten? The language here is ambiguous.
Also, can you add a system table to an example?
Thanks for pointing this out. But you're right, I'll rephrase this a bit and add an example for that.
docs/RFCS/20230622_id_reference_syntax.md
line 82 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I'm confused by the explanation: is this replacement of
'seq_name'
by104::REGCLASS
already implemented? Or is this a change this RFC is proposing?If already implemented, please point to where this substitution takes place.
Sorry about the confusion. The syntax is an existing syntax. I'll add more info about this. Thanks for pointing this out.
docs/RFCS/20230622_id_reference_syntax.md
line 88 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I have deep issues with this syntax
:::@
. There are multiple problems with this, but the foremost is that per your proposal above this should really beSELECT 'a'::{TYPE:100105}
. Why did you write something different?
Ah, this is also existing syntax to reference types by ID in expressions. I'll add more words to explains this.
docs/RFCS/20230622_id_reference_syntax.md
line 94 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I had the same question as Rachael, with the additional concern that
::::::
is really not acceptable as a technical solution. Please explain where this comes from and, if that is already something implemented inside CockroachDB, what you propose to fix it.
so...::::::
is definitely a typo! thanks for catching that.
But ::
is the syntax for type casting, and :::
is the syntax for type annotated expression.
docs/RFCS/20230622_id_reference_syntax.md
line 110 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
It's not clear to me what this "other problems" section is for in this RFC.
Are you listing problems that you plan to address with your proposal? If so, how?
Are you merely pointing out their existence and stating that they are "out of scope"? Then that needs to be mentioned in the intro at the top.
(I do think the various syntax examples you've mentioned above are deeply troubling and really need a solution.)
Yeah, mentioning them basically wanted to let people know their existence and it's a non-goal for this specific project. And you're right that we should unify all the syntax eventually. An issue will be created to track this.
@chengxiong-ruan have we implemented parts of this proposal? If so, can you finish this up and merge it? |
@mgartner there is a pr that was trying to do that, but I think we agreed to pause on that. Sorry that I should have closed the PR, but I lost track of it unfortunately. |
Informs: #83233
Release note: None