Skip to content

[SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax #26775

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 18 commits into from
Closed

[SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax #26775

wants to merge 18 commits into from

Conversation

yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Dec 6, 2019

What changes were proposed in this pull request?

In this pull request, we are going to support SET OWNER syntax for databases and namespaces,

ALTER (DATABASE|SCHEME|NAMESPACE) database_name SET OWNER [USER|ROLE|GROUP] user_or_role_group; 

Before this commit 332e252, we didn't care much about ownerships for the catalog objects. In 332e252, we determined to use properties to store ownership staff, and temporarily used alter database ... set dbproperties ... to support switch ownership of a database. This PR aims to use the formal syntax to replace it.

In hive, ownerName/Type are fields of the database objects, also they can be normal properties.

create schema test1 with dbproperties('ownerName'='yaooqinn')

The create/alter database syntax will not change the owner to yaooqinn but store it in parameters. e.g.

+----------+----------+---------------------------------------------------------------+-------------+-------------+-----------------------+--+
| db_name  | comment  |                           location                            | owner_name  | owner_type  |      parameters       |
+----------+----------+---------------------------------------------------------------+-------------+-------------+-----------------------+--+
| test1    |          | hdfs://quickstart.cloudera:8020/user/hive/warehouse/test1.db  | anonymous   | USER        | {ownerName=yaooqinn}  |
+----------+----------+---------------------------------------------------------------+-------------+-------------+-----------------------+--+

In this pull request, because we let the ownerName become reversed, so it will neither change the owner nor store in dbproperties, just be omitted silently.

Why are the changes needed?

Formal syntax support for changing database ownership

Does this PR introduce any user-facing change?

yes, add a new syntax

How was this patch tested?

add unit tests

@SparkQA
Copy link

SparkQA commented Dec 6, 2019

Test build #114928 has finished for PR 26775 at commit 3c96091.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AlterNamespaceSetOwnerStatement(
  • case class AlterDatabaseSetOwnerCommand(databaseName: String, ownerName: String, ownerType: String)

@SparkQA
Copy link

SparkQA commented Dec 6, 2019

Test build #114930 has finished for PR 26775 at commit 7188587.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 6, 2019

retest this please

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 6, 2019

cc @cloud-fan @maropu @wangyum, thanks for reviewing.

@@ -94,11 +97,18 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
s"because view support in catalog has not been implemented yet")

case AlterNamespaceSetPropertiesStatement(NonSessionCatalog(catalog, nameParts), properties) =>
AlterNamespaceSetProperties(catalog.asNamespaceCatalog, nameParts, properties)
val availableProps = properties -- RESERVED_PROPERTIES.asScala
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to fail if reserved properties are being set/altered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will go this way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that the RESERVED_PROPERTIES is unremovable but not reversed.. the comment and location is not exactly same with ownerName/Type. the name RESERVED_PROPERTIES here is deceptive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can Hive alter table location via ALTER TABLE SET TBLPROPERTIES?

Seems like Hive can only do it for table comment: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTableComment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hive will not change Database field members via properties, if we set dbproperites('location'='xxx'), the location here is just a property inside properties, meaningless to the catalog

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so we can't follow Hive here. It seems wrong to me to change location and owner with SET TBLPROPERTIES or SET DBPROPERTIES, but comment should be fine.

@SparkQA
Copy link

SparkQA commented Dec 6, 2019

Test build #114934 has finished for PR 26775 at commit 7188587.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 6, 2019

Code path for now:

[[CreateNamespaceStatement]]
    users can specify `comment` `location`
    users can not sepcify `ownerName` `ownerType`

    -> [[CreateDatabaseCommand]]
    We shall throw exception when meet user specified `ownerName` `ownerType`
    [[ResolveSessionCatalog]] will convert `comment` `location` to db fields
    [[HiveClientImpl]] will gather `ownerName` `ownerType`
    
    -> [[CreateNamespace]]
    we shall throw exception when meet user specified `ownerName` `ownerType`
       -> [[CreateNamespaceExec]]
       all propeties get here, no `ownerName` `ownerType` guaranteed
       if users use built in catalog
          [[V2SessionCatalog]] will convert `comment` `location` to db fields
          [[HiveClientImpl]] will gather `ownerName` `ownerType`
       if users use a external [[DelegatingCatalogExtension]]
          users need to care how to create `ownerName` `ownerType` or do we use spark user here?

[[AlterNamespaceSetPropertiesStatement]]
we shall throw exception when meet user specified `ownerName` `ownerType` `comment`(?) `location` before transform to
    -> [[AlterNamespaceSetProperties]]
    -> [[AlterDatabasePropertiesCommand]]

then, if users want to change these, use [[AlterNamespaceSetLocationStatement]] and [[AlterNamespaceSetOwner]]

Shall we expose a api in [[SupportsNamespace]] to deal with ownership staff? Then we need not to care what the underlying catalog delegation it is. We handle these in [[HiveClientImpl]] for now, it seems not a good solution.

@SparkQA
Copy link

SparkQA commented Dec 6, 2019

Test build #114951 has finished for PR 26775 at commit 56ac955.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AlterNamespaceSetOwner(

@SparkQA
Copy link

SparkQA commented Dec 6, 2019

Test build #114957 has finished for PR 26775 at commit 25cf227.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Dec 9, 2019

How do we use ROLE/GROUP? I read the hive doc though. Probably, we need to describe how-to-use in the doc (we already have somewhere?). cc: @dongjoon-hyun

@@ -172,6 +173,10 @@ class ResolveSessionCatalog(
throw new AnalysisException(
s"The database name is not valid: ${nameParts.quoted}")
}
if (properties.keySet.intersect(REVERSED_PROPERTIES.asScala.toSet).nonEmpty) {
throw new AnalysisException(s"Cannot directly modify the reversed properties" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop s in the head.

* The list of namespace ownership properties, cannot be used in `CREATE` syntax.
*
* Only support in:
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just saying like this? We need to mention CREATE syntax here?

   * The list of namespace ownership properties, which can be used in `ALTER` syntax:
   *
   * {{
   *   ALTER (DATABASE|SCHEMA|NAMESPACE) SET OWNER ...
   * }}

@@ -94,11 +97,21 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
s"because view support in catalog has not been implemented yet")

case AlterNamespaceSetPropertiesStatement(NonSessionCatalog(catalog, nameParts), properties) =>
if (properties.keySet.intersect(REVERSED_PROPERTIES.asScala.toSet).nonEmpty) {
throw new AnalysisException(s"Cannot directly modify the reversed properties" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to this pr to support SET OWNER?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to prohibit changing ownership ·SET PROPERTIES·

@cloud-fan
Copy link
Contributor

For data source API, we do want to use properties to set owner/location/comment to make the API flexible. The problem is about end-user API: how do we expect end-users to set them?

What I expect:

  • for CREATE TABLE/NAMESPACE, location should be set by LOCATION clause, comment should be set by COMMENT or properties. I'm not sure about owner: does hive support setting owner during CREATE TABLE/DATABASE
  • for ALTER TABLE/NAMESPACE, should be the same. location/owner should be set by special syntax instead of via properties.

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 9, 2019

For DataSource API developers:
Exposes defaultOwner/OwnerType in SupportsNamespaces to defined the catalog default ownership. By default, we use our sparkUser as default owner and USER as default ownerType. This should be considered as how the developers define authentication. A bit similar with Hive's HiveAuthenticationProvider . Currently, we use this only in CreateNamespaceExec to define the default ownership. When we support authorization, these can be used in privilege checking for queries, commands etc..

For end users:
We don't allow user to set ownership or location or comment in create syntax, the ownership should be inherited from the catalog impl, and the comment and location should from the specific clauses.

Also modifying these properties need specific commands.

@cloud-fan
Copy link
Contributor

hmm, isn't it better to let Spark decide the default owner instead of catalog implementations?

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 9, 2019

E.g. we currently don't support multi tenancy in Spark's ThriftServer, but if can use such a flexible implementation, we may can implement defaultOwner = session.getUser to get the actual user from client side, and do authentication/authorization for it. In the current implementation of Spark's ThriftServer, the sparkUser can only works as a super user, then at least we can achieve metadata security.

@cloud-fan
Copy link
Contributor

OK let's discuss it later. For now let's focus on ALTER TABLE/NAMESPACE SET OWNER.

@SparkQA
Copy link

SparkQA commented Dec 9, 2019

Test build #115012 has finished for PR 26775 at commit dd1d52b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 9, 2019

retest this please

* Specify the default owner name for `CREATE` namespace.
*
*/
default String defaultOwner() { return Utils.getCurrentUserName(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do it later? I don't think there is a consensus yet. For now let's let spark decide the default owner, as how we do it for tables in hive metastore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@SparkQA
Copy link

SparkQA commented Jan 6, 2020

Test build #116124 has finished for PR 26775 at commit e5d695e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -884,7 +885,8 @@ class DataSourceV2SQLSuite
.isEmpty, s"$key is a reserved namespace property and ignored")
val meta =
catalog("testcat").asNamespaceCatalog.loadNamespaceMetadata(Array("reservedTest"))
assert(meta.get(key) === null, "reserved properties should not have side effects")
assert(!Option(meta.get(key)).exists(_.contains("foo")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's wrong with meta.get(key) === null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be fixed if we revert the change in Create syntax

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it simply meta.get(key) != "foo"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'location' might change the path from relative to absolute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta.get(key) != "foo" still works, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but once if the location goes wrong and have side effects != 'foo' is not able to check that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about meta.get(key) == null || !meta.get(key).contains("foo")?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

rows += toCatalystRow("Location", metadata.get(SupportsNamespaces.PROP_LOCATION))
rows += toCatalystRow("Description", metadata.get(PROP_COMMENT))
rows += toCatalystRow("Location", metadata.get(PROP_LOCATION))
rows += toCatalystRow("Owner Name", metadata.getOrDefault(PROP_OWNER_NAME, ""))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should skip printing owner if the properties don't exist

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should do the same for other fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this fields are optional across different catalog implementations

@SparkQA
Copy link

SparkQA commented Jan 9, 2020

Test build #116350 has finished for PR 26775 at commit 162fbf0.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 9, 2020

Test build #116351 has finished for PR 26775 at commit ae1ebec.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

* The list of reserved namespace properties, which can not be removed or changed directly by
* the syntax:
* {{
* ALTER NAMESPACE ... SET PROPERTIES ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we support UNSET?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have no ALTER NAMESPACE UNSET PROPERTIES yet

test("AlterNamespaceSetOwner using v2 catalog") {
withNamespace("testcat.ns1.ns2") {
sql("CREATE NAMESPACE IF NOT EXISTS testcat.ns1.ns2 COMMENT " +
"'test namespace' LOCATION '/tmp/ns_test_3'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR, does ANSI SQL define a OWNER clause for CREATE TABLE/NAMESPACE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the Standard and did not find such a clause for CREATE TABLE/NAMESPACE syntaxes

checkOwner(db2, "a")
checkOwner(db1, currentUser, "USER")
val e = intercept[AnalysisException](sql(s"ALTER DATABASE $db1 SET DBPROPERTIES ('a'='a',"
+ s"'ownerName'='$owner','ownerType'='XXX')"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does case sensitivity matter for reserved properties? what if users specify Ownername?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok maybe it's fine to treat Ownername as a normal property.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not matter but I guess we should mention this

spark-sql> create namespace abcde with properties('LOCATION'= 'b');
20/01/09 17:21:54 INFO HiveMetaStore: 0: get_database: global_temp
20/01/09 17:21:54 INFO audit: ugi=kentyao	ip=unknown-ip-addr	cmd=get_database: global_temp
20/01/09 17:21:54 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
20/01/09 17:21:54 INFO HiveMetaStore: 0: create_database: Database(name:abcde, description:, locationUri:file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/spark-warehouse/abcde.db, parameters:{LOCATION=b}, ownerName:kentyao, ownerType:USER)
20/01/09 17:21:54 INFO audit: ugi=kentyao	ip=unknown-ip-addr	cmd=create_database: Database(name:abcde, description:, locationUri:file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/spark-warehouse/abcde.db, parameters:{LOCATION=b}, ownerName:kentyao, ownerType:USER)
20/01/09 17:21:54 WARN ObjectStore: Failed to get database abcde, returning NoSuchObjectException
20/01/09 17:21:54 INFO FileUtils: Creating directory if it doesn't exist: file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/spark-warehouse/abcde.db
Time taken: 1.891 seconds
spark-sql> create namespace abcdef with properties('location'= 'b');
20/01/09 17:22:52 INFO HiveMetaStore: 0: create_database: Database(name:abcdef, description:, locationUri:file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/b, parameters:{}, ownerName:kentyao, ownerType:USER)
20/01/09 17:22:52 INFO audit: ugi=kentyao	ip=unknown-ip-addr	cmd=create_database: Database(name:abcdef, description:, locationUri:file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/b, parameters:{}, ownerName:kentyao, ownerType:USER)
20/01/09 17:22:52 WARN ObjectStore: Failed to get database abcdef, returning NoSuchObjectException
20/01/09 17:22:52 INFO FileUtils: Creating directory if it doesn't exist: file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/b

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark-sql> desc namespace extended abcde;
20/01/09 17:24:14 INFO HiveMetaStore: 0: get_database: abcde
20/01/09 17:24:14 INFO audit: ugi=kentyao	ip=unknown-ip-addr	cmd=get_database: abcde
20/01/09 17:24:14 INFO HiveMetaStore: 0: get_database: abcde
20/01/09 17:24:14 INFO audit: ugi=kentyao	ip=unknown-ip-addr	cmd=get_database: abcde
Database Name	abcde
Description
Location	file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/spark-warehouse/abcde.db
Owner Name	kentyao
Owner Type	USER
Properties	((LOCATION,b))
Time taken: 0.048 seconds, Fetched 6 row(s)
20/01/09 17:24:14 INFO SparkSQLCLIDriver: Time taken: 0.048 seconds, Fetched 6 row(s)
spark-sql> desc namespace extended abcdef;
20/01/09 17:24:21 INFO HiveMetaStore: 0: get_database: abcdef
20/01/09 17:24:21 INFO audit: ugi=kentyao	ip=unknown-ip-addr	cmd=get_database: abcdef
20/01/09 17:24:21 INFO HiveMetaStore: 0: get_database: abcdef
20/01/09 17:24:21 INFO audit: ugi=kentyao	ip=unknown-ip-addr	cmd=get_database: abcdef
Database Name	abcdef
Description
Location	file:/Users/kentyao/Downloads/spark/spark-3.0.0-SNAPSHOT-bin-20200103/b
Owner Name	kentyao
Owner Type	USER
Properties
Time taken: 0.016 seconds, Fetched 6 row(s)

@SparkQA
Copy link

SparkQA commented Jan 9, 2020

Test build #116376 has finished for PR 26775 at commit bb2f919.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 9, 2020

Test build #116378 has finished for PR 26775 at commit 2182992.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 9, 2020

Test build #116388 has finished for PR 26775 at commit 514b78a.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Jan 9, 2020

Test build #116406 has finished for PR 26775 at commit 514b78a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 9, 2020

Test build #116404 has finished for PR 26775 at commit 514b78a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in bcf07cb Jan 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants