-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax #26775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #114928 has finished for PR 26775 at commit
|
Test build #114930 has finished for PR 26775 at commit
|
retest this please |
cc @cloud-fan @maropu @wangyum, thanks for reviewing. |
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Show resolved
Hide resolved
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Outdated
Show resolved
Hide resolved
@@ -94,11 +97,18 @@ class ResolveCatalogs(val catalogManager: CatalogManager) | |||
s"because view support in catalog has not been implemented yet") | |||
|
|||
case AlterNamespaceSetPropertiesStatement(NonSessionCatalog(catalog, nameParts), properties) => | |||
AlterNamespaceSetProperties(catalog.asNamespaceCatalog, nameParts, properties) | |||
val availableProps = properties -- RESERVED_PROPERTIES.asScala |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to fail if reserved properties are being set/altered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will go this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice that the RESERVED_PROPERTIES
is unremovable but not reversed.. the comment
and location
is not exactly same with ownerName/Type
. the name RESERVED_PROPERTIES
here is deceptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can Hive alter table location via ALTER TABLE SET TBLPROPERTIES?
Seems like Hive can only do it for table comment: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTableComment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hive will not change Database field members via properties, if we set dbproperites('location'='xxx'), the location
here is just a property inside properties, meaningless to the catalog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so we can't follow Hive here. It seems wrong to me to change location and owner with SET TBLPROPERTIES or SET DBPROPERTIES, but comment should be fine.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala
Outdated
Show resolved
Hide resolved
Test build #114934 has finished for PR 26775 at commit
|
Code path for now:
Shall we expose a api in [[SupportsNamespace]] to deal with ownership staff? Then we need not to care what the underlying catalog delegation it is. We handle these in [[HiveClientImpl]] for now, it seems not a good solution. |
Test build #114951 has finished for PR 26775 at commit
|
Test build #114957 has finished for PR 26775 at commit
|
How do we use |
@@ -172,6 +173,10 @@ class ResolveSessionCatalog( | |||
throw new AnalysisException( | |||
s"The database name is not valid: ${nameParts.quoted}") | |||
} | |||
if (properties.keySet.intersect(REVERSED_PROPERTIES.asScala.toSet).nonEmpty) { | |||
throw new AnalysisException(s"Cannot directly modify the reversed properties" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: drop s
in the head.
* The list of namespace ownership properties, cannot be used in `CREATE` syntax. | ||
* | ||
* Only support in: | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just saying like this? We need to mention CREATE syntax here?
* The list of namespace ownership properties, which can be used in `ALTER` syntax:
*
* {{
* ALTER (DATABASE|SCHEMA|NAMESPACE) SET OWNER ...
* }}
@@ -94,11 +97,21 @@ class ResolveCatalogs(val catalogManager: CatalogManager) | |||
s"because view support in catalog has not been implemented yet") | |||
|
|||
case AlterNamespaceSetPropertiesStatement(NonSessionCatalog(catalog, nameParts), properties) => | |||
if (properties.keySet.intersect(REVERSED_PROPERTIES.asScala.toSet).nonEmpty) { | |||
throw new AnalysisException(s"Cannot directly modify the reversed properties" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: drop s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change related to this pr to support SET OWNER
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to prohibit changing ownership ·SET PROPERTIES·
For data source API, we do want to use properties to set owner/location/comment to make the API flexible. The problem is about end-user API: how do we expect end-users to set them? What I expect:
|
For DataSource API developers: For end users: Also modifying these properties need specific commands. |
hmm, isn't it better to let Spark decide the default owner instead of catalog implementations? |
E.g. we currently don't support multi tenancy in Spark's ThriftServer, but if can use such a flexible implementation, we may can implement |
OK let's discuss it later. For now let's focus on |
Test build #115012 has finished for PR 26775 at commit
|
retest this please |
* Specify the default owner name for `CREATE` namespace. | ||
* | ||
*/ | ||
default String defaultOwner() { return Utils.getCurrentUserName(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do it later? I don't think there is a consensus yet. For now let's let spark decide the default owner, as how we do it for tables in hive metastore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
Test build #116124 has finished for PR 26775 at commit
|
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Show resolved
Hide resolved
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateNamespaceExec.scala
Show resolved
Hide resolved
@@ -884,7 +885,8 @@ class DataSourceV2SQLSuite | |||
.isEmpty, s"$key is a reserved namespace property and ignored") | |||
val meta = | |||
catalog("testcat").asNamespaceCatalog.loadNamespaceMetadata(Array("reservedTest")) | |||
assert(meta.get(key) === null, "reserved properties should not have side effects") | |||
assert(!Option(meta.get(key)).exists(_.contains("foo")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's wrong with meta.get(key) === null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be fixed if we revert the change in Create syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it simply meta.get(key) != "foo"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'location' might change the path from relative to absolute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta.get(key) != "foo"
still works, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but once if the location
goes wrong and have side effects != 'foo'
is not able to check that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about meta.get(key) == null || !meta.get(key).contains("foo")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
rows += toCatalystRow("Location", metadata.get(SupportsNamespaces.PROP_LOCATION)) | ||
rows += toCatalystRow("Description", metadata.get(PROP_COMMENT)) | ||
rows += toCatalystRow("Location", metadata.get(PROP_LOCATION)) | ||
rows += toCatalystRow("Owner Name", metadata.getOrDefault(PROP_OWNER_NAME, "")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should skip printing owner if the properties don't exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should do the same for other fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this fields are optional across different catalog implementations
Test build #116350 has finished for PR 26775 at commit
|
Test build #116351 has finished for PR 26775 at commit
|
* The list of reserved namespace properties, which can not be removed or changed directly by | ||
* the syntax: | ||
* {{ | ||
* ALTER NAMESPACE ... SET PROPERTIES ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we support UNSET?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have no ALTER NAMESPACE UNSET PROPERTIES yet
test("AlterNamespaceSetOwner using v2 catalog") { | ||
withNamespace("testcat.ns1.ns2") { | ||
sql("CREATE NAMESPACE IF NOT EXISTS testcat.ns1.ns2 COMMENT " + | ||
"'test namespace' LOCATION '/tmp/ns_test_3'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to this PR, does ANSI SQL define a OWNER clause for CREATE TABLE/NAMESPACE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the Standard and did not find such a clause for CREATE TABLE/NAMESPACE
syntaxes
checkOwner(db2, "a") | ||
checkOwner(db1, currentUser, "USER") | ||
val e = intercept[AnalysisException](sql(s"ALTER DATABASE $db1 SET DBPROPERTIES ('a'='a'," | ||
+ s"'ownerName'='$owner','ownerType'='XXX')")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does case sensitivity matter for reserved properties? what if users specify Ownername
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok maybe it's fine to treat Ownername
as a normal property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not matter but I guess we should mention this
spark-sql> create namespace abcde with properties('LOCATION'= 'b');
20/01/09 17:21:54 INFO HiveMetaStore: 0: get_database: global_temp
20/01/09 17:21:54 INFO audit: ugi=kentyao ip=unknown-ip-addr cmd=get_database: global_temp
20/01/09 17:21:54 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
20/01/09 17:21:54 INFO HiveMetaStore: 0: create_database: Database(name:abcde, description:, locationUri:file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/spark-warehouse/abcde.db, parameters:{LOCATION=b}, ownerName:kentyao, ownerType:USER)
20/01/09 17:21:54 INFO audit: ugi=kentyao ip=unknown-ip-addr cmd=create_database: Database(name:abcde, description:, locationUri:file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/spark-warehouse/abcde.db, parameters:{LOCATION=b}, ownerName:kentyao, ownerType:USER)
20/01/09 17:21:54 WARN ObjectStore: Failed to get database abcde, returning NoSuchObjectException
20/01/09 17:21:54 INFO FileUtils: Creating directory if it doesn't exist: file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/spark-warehouse/abcde.db
Time taken: 1.891 seconds
spark-sql> create namespace abcdef with properties('location'= 'b');
20/01/09 17:22:52 INFO HiveMetaStore: 0: create_database: Database(name:abcdef, description:, locationUri:file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/b, parameters:{}, ownerName:kentyao, ownerType:USER)
20/01/09 17:22:52 INFO audit: ugi=kentyao ip=unknown-ip-addr cmd=create_database: Database(name:abcdef, description:, locationUri:file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/b, parameters:{}, ownerName:kentyao, ownerType:USER)
20/01/09 17:22:52 WARN ObjectStore: Failed to get database abcdef, returning NoSuchObjectException
20/01/09 17:22:52 INFO FileUtils: Creating directory if it doesn't exist: file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spark-sql> desc namespace extended abcde;
20/01/09 17:24:14 INFO HiveMetaStore: 0: get_database: abcde
20/01/09 17:24:14 INFO audit: ugi=kentyao ip=unknown-ip-addr cmd=get_database: abcde
20/01/09 17:24:14 INFO HiveMetaStore: 0: get_database: abcde
20/01/09 17:24:14 INFO audit: ugi=kentyao ip=unknown-ip-addr cmd=get_database: abcde
Database Name abcde
Description
Location file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/spark-warehouse/abcde.db
Owner Name kentyao
Owner Type USER
Properties ((LOCATION,b))
Time taken: 0.048 seconds, Fetched 6 row(s)
20/01/09 17:24:14 INFO SparkSQLCLIDriver: Time taken: 0.048 seconds, Fetched 6 row(s)
spark-sql> desc namespace extended abcdef;
20/01/09 17:24:21 INFO HiveMetaStore: 0: get_database: abcdef
20/01/09 17:24:21 INFO audit: ugi=kentyao ip=unknown-ip-addr cmd=get_database: abcdef
20/01/09 17:24:21 INFO HiveMetaStore: 0: get_database: abcdef
20/01/09 17:24:21 INFO audit: ugi=kentyao ip=unknown-ip-addr cmd=get_database: abcdef
Database Name abcdef
Description
Location file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/b
Owner Name kentyao
Owner Type USER
Properties
Time taken: 0.016 seconds, Fetched 6 row(s)
Test build #116376 has finished for PR 26775 at commit
|
Test build #116378 has finished for PR 26775 at commit
|
Test build #116388 has finished for PR 26775 at commit
|
retest this please |
Test build #116406 has finished for PR 26775 at commit
|
Test build #116404 has finished for PR 26775 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
In this pull request, we are going to support
SET OWNER
syntax for databases and namespaces,ALTER (DATABASE|SCHEME|NAMESPACE) database_name SET OWNER [USER|ROLE|GROUP] user_or_role_group;
Before this commit 332e252, we didn't care much about ownerships for the catalog objects. In 332e252, we determined to use properties to store ownership staff, and temporarily used
alter database ... set dbproperties ...
to support switch ownership of a database. This PR aims to use the formal syntax to replace it.In hive,
ownerName/Type
are fields of the database objects, also they can be normal properties.The create/alter database syntax will not change the owner to
yaooqinn
but store it in parameters. e.g.In this pull request, because we let the
ownerName
become reversed, so it will neither change the owner nor store in dbproperties, just be omitted silently.Why are the changes needed?
Formal syntax support for changing database ownership
Does this PR introduce any user-facing change?
yes, add a new syntax
How was this patch tested?
add unit tests