-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-25006][SQL] Add CatalogTableIdentifier. #21978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-25006][SQL] Add CatalogTableIdentifier. #21978
Conversation
@gatorsmile and @cloud-fan, this adds catalog to |
This comment has been minimized.
This comment has been minimized.
d61e0f1
to
c0683a8
Compare
This comment has been minimized.
This comment has been minimized.
7abc8b8
to
f34e23b
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
f34e23b
to
6fe2d07
Compare
This comment has been minimized.
This comment has been minimized.
Retest this please. |
This comment has been minimized.
This comment has been minimized.
6fe2d07
to
00295ee
Compare
FYI @jzhuge |
This comment has been minimized.
This comment has been minimized.
Retest this please. |
This comment has been minimized.
This comment has been minimized.
@cloud-fan, that's fine with me since #17185 is already merged. Would this conflict with #17185? We can just add a case that detects whether the first identifier in the seq is a catalog when updating expressions. This PR is just the start for adding catalog to table identifiers. None of the SQL statements are modified in this PR on purpose: code paths will need to be updated to support |
Test build #94375 has finished for PR 21978 at commit
|
@cloud-fan, when do you think we can get this in? It doesn't need to go in 2.4 because it doesn't change any read or write paths -- nothing uses CatalogTableIdentifier yet -- but it would be great to get it into master so we can start building paths that do support CatalogTableIdentifier. |
Wanted to follow up here - are we planning on merging this or are there more things we need to discuss? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine, just some minor comments.
@@ -18,48 +18,106 @@ | |||
package org.apache.spark.sql.catalyst | |||
|
|||
/** | |||
* An identifier that optionally specifies a database. | |||
* An identifier that optionally specifies a database and catalog. | |||
* | |||
* Format (unquoted): "name" or "db.name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update formats in these scaladocs.
* unquotedString as the function name. | ||
* Identifies a table in a database and catalog. | ||
* If `database` is not defined, the current catalog's default database is used. | ||
* If `catalog` is not defined, the current catalog is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"current" meaning "global"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we want to move away from a special global catalog. I think that Spark should have a current catalog, like a current database, which is used to resolve references that don't have an explicit catalog. That would have a default, just like the current database has a default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. When we add the logical side of leveraging catalogs we can revisit the API of how to set the current catalog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This introduces the ability to expose a catalog to Spark. It doesn't actually add any user-facing operations.
This adds CatalogTableIdentifier, which is an identifier that consists of a triple: catalog, database, and table. Catalog and database are optional. The existing TableIdentifier class extends CatalogTableIdentifier and is guarateed to have no defined catalog component. Classes that expect a TableIdentifier should continue to use TableIdentifier to ensure that catalogs are not leaked into code paths that do not support them.
00295ee
to
beebccf
Compare
Rebased on master. |
Test build #99480 has finished for PR 21978 at commit
|
val identifier: String | ||
|
||
def database: Option[String] | ||
|
||
def catalog: Option[String] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default to None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an abstract method definition and catalog is always implemented by a val.
Think there's a failing build, also do we still need this or have the underlying ideas changed in our discussion? My understanding is that we still need this and that catalog identifiers are important to start with to build the follow up table catalog APIs on. Also should this include multi-part identifier? |
Hi Matt,
Agree we still need it. My PR for SPARK-26946 to implement multi-part
identifier will be built on top of this because CatalogTableIdentifier at
least provides a good way to incrementally migrate code paths. Do not
review that PR yet, I will update in a few days, then we will have a better
picture.
Thanks,
…On Wed, Feb 27, 2019 at 7:27 PM mccheah ***@***.***> wrote:
Think there's a failing build, also do we still need this or have the
underlying ideas changed in our discussion? My understanding is that we
still need this and that catalog identifiers are important to start with to
build the follow up table catalog APIs on.
Also should this include multi-part identifier?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#21978 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABy-pIdmtFgKnHpqK38MHEv1viN5eGtgks5vR0yTgaJpZM4VtC5y>
.
--
John Zhuge
|
Identifiers for multi-catalog support were added in #23848. I'm closing this. |
What changes were proposed in this pull request?
This adds
CatalogTableIdentifier
, which is an identifier that consists of a triple: catalog, database, and table. Catalog and database are optional.The existing
TableIdentifier
class extendsCatalogTableIdentifier
and is guarateed to have no defined catalog component. Classes that expect aTableIdentifier
will continue to useTableIdentifier
to ensure that catalogs are not leaked into code paths that do not support them.This adds a parser rule,
catalogTableIdentifier
, that can parse identifiers with a catalog. An identifier with only two components will match database and table, leaving the catalog undefined. Only identifiers with three components will have a defined catalog. In addition, rules must be re-written to supportcatalogTableIdentifier
. Existing rules will continue to usetableIdentifier
with no catalog.How was this patch tested?
Existing tests. This should not change any behavior.