Skip to content

[SPARK-30312][SQL] Preserve path permission and acl when truncate table #26956

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Dec 19, 2019

What changes were proposed in this pull request?

This patch proposes to preserve existing permission/acls of paths when truncate table/partition.

Why are the changes needed?

When Spark SQL truncates table, it deletes the paths of table/partitions, then re-create new ones. If permission/acls were set on the paths, the existing permission/acls will be deleted.

We should preserve the permission/acls if possible.

Does this PR introduce any user-facing change?

Yes. When truncate table/partition, Spark will keep permission/acls of paths.

How was this patch tested?

Unit test.

Manual test:

  1. Create a table.
  2. Manually change it permission/acl
  3. Truncate table
  4. Check permission/acl
val df = Seq(1, 2, 3).toDF
df.write.mode("overwrite").saveAsTable("test.test_truncate_table")
val testTable = spark.table("test.test_truncate_table")
testTable.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+
// hdfs dfs -setfacl ...
// hdfs dfs -getfacl ...
sql("truncate table test.test_truncate_table")
// hdfs dfs -getfacl ...
val testTable2 = spark.table("test.test_truncate_table")
testTable2.show()
+-----+
|value|
+-----+
+-----+

Screen Shot 2019-12-30 at 3 12 15 PM

@viirya
Copy link
Member Author

viirya commented Dec 19, 2019

@SparkQA
Copy link

SparkQA commented Dec 20, 2019

Test build #115586 has finished for PR 26956 at commit 73e1bb0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 20, 2019

Test build #115588 has finished for PR 26956 at commit 60134c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class FakeLocalFsFileSystem extends RawLocalFileSystem

@viirya
Copy link
Member Author

viirya commented Dec 27, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Dec 27, 2019

Test build #115869 has finished for PR 26956 at commit 60134c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class FakeLocalFsFileSystem extends RawLocalFileSystem

@SparkQA
Copy link

SparkQA commented Dec 28, 2019

Test build #115873 has finished for PR 26956 at commit 1df437b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 28, 2019

Test build #115875 has finished for PR 26956 at commit 2c3f1fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya changed the title [SPARK-30312][SQL] Preserve path attributes including permission when truncate table [SPARK-30312][SQL] Preserve path permission and acl when truncate table Dec 31, 2019
@viirya
Copy link
Member Author

viirya commented Dec 31, 2019

ping @cloud-fan @HyukjinKwon

// Not all fs impl. support these APIs.
val optPermission = Try(fileStatus.getPermission())
val optAcls = Try(fs.getAclStatus(path).getEntries)

fs.delete(path, true)
Copy link
Contributor

@cloud-fan cloud-fan Jan 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not familiar with Hadoop FS APIs, but do we have something like rm -rf tablePath/*?. My point is, if we don't delete the parent folder, then we don't have this problem at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the FileSystem API, I didn't find one like that.

But I just search again, there is FileUtil API that seems working like that.

I will test tomorrow to see if it work well to keep permission/acl in a Spark cluster like current approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I think FileUtil.fullyDeleteContents only works on local file (java.io.File), not for files on distributed file system like HDFS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think there is no single API to do rm -rf tablePath/* in Hadoop FS, if I haven't missed anything.

We can do listStatus and delete all contents in the given directory. But it is inefficient for DFS and is easier to have error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how Hive/Presto implement TRUNCATE TABLE? Are there other file attributes we need to retain?

Copy link
Member Author

@viirya viirya Jan 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Hive, I try to trace the code path for TRUNCATE TABLE:

  1. HiveMetaStore. truncateTableInternal:

Get the locations of table/partitions and call Warehouse.deleteDir for each.

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3109

  1. Warehouse.deleteDir delegates to MetaStoreFS.deleteDir.
  2. HiveMetaStoreFsImpl (implements MetaStoreFS) delegates to FileUtils.moveToTrash which calls FileSystem.delete(f, true).

In HiveMetaStore.truncateTableInternal, after the location is deleted, new directory is created and previous file status including permissions, group, and ACLs are set to new directory:

https://github.com/apache/hive/blob/35f86c749cefc2a9972a991deed78a1c3719093d/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/HdfsUtils.java#L288

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I retain permission and ACLs. I was doing fs.setOwner in first commit for retaining path owner and group. But fs.setOwner throws exception because it is only doable by super user.

In Hive code, since it is code running the metastore server, I think it is running with enough permission to set owner/group. For Spark, we may not running it with super user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the presto way seems safer to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was concerned with the presto way on performance regression. For a table which has many files/directories, deleting them one by one could be a bottleneck.

If we add a config for retaining permission/ACLs when truncating table? Users can choose to disable it and directly delete top directory (current behavior) without retaining permission/ACLs.

case Failure(e) if e.isInstanceOf[UnsupportedOperationException] =>
log.warn(s"Can not set original permission $permission back to " +
s"the created path: $path. Exception: ${e.getMessage}")
case Failure(e) => throw e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to fail if we can't set the permission back?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In internal discussion about this issue, we think it is more dangerous to have wrong permission/ACLs silently on a secure cluster when truncate table. This failure tells users what is wrong and they could so something manually to fix it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we can consider to add one config that users can choose to fail if we cannot preserve original permission/ACLs. For non secure cluster, it can be ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, I don't think we need a config. Shall we also fail if the FS can't set permission?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already fail if the FS cannot set permission. No?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case Failure(e) if e.isInstanceOf[UnsupportedOperationException] =>
                log.warn(s"Can not set original permission $permission back to " +
                  s"the created path: $path. Exception: ${e.getMessage}")

it's just a warning

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. You meant UnsupportedOperationException case. Hmm, setAcl also has UnsupportedOperationException case. I'm not sure if it is possible that we can get permission/ACLs, but setPermission/setAcl throws UnsupportedOperationException.

Logically, following above reason, it should be safer to fail it too if it is UnsupportedOperationException.

@SparkQA
Copy link

SparkQA commented Jan 6, 2020

Test build #116136 has finished for PR 26956 at commit 39fe234.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Jan 6, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jan 6, 2020

Test build #116181 has finished for PR 26956 at commit 39fe234.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one comment

try {
optPermission = Some(fileStatus.getPermission())
} catch {
case NonFatal(e) =>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan @HyukjinKwon By using try catch, I'm not sure what we should do for exception during getPermission/getAcls. I think we don't/can't do anything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I guess case NonFatal(_) => // do nothing is okay.

try {
optAcls = Some(fs.getAclStatus(path).getEntries)
} catch {
case NonFatal(e) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

fs.setPermission(path, permission)
} catch {
case NonFatal(e) =>
throw new AnalysisException(
Copy link
Member

@dongjoon-hyun dongjoon-hyun Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, AnalysisException looks weird a little. This is not Analysis error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Exception is too risky behavior change. (@gatorsmile ).
I guess warning might be enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, AnalysisException looks weird a little. This is not Analysis error.

Ah, this follows throwing AnalysisException at the outside try..catch block. We can change this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Exception is too risky behavior change. (@gatorsmile ).
I guess warning might be enough?

For security reason, cannot set back original Acl/permission seems serious issue so Exception is thrown now. warning might be ignored and not noticed by users.

I'm ok with a warning if you all think it isn't an issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, AnalysisException looks weird a little. This is not Analysis error.

Maybe SparkException or RuntimeException?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If this is for security, what about SecurityException?
  • If we need to raise an exception, shall we have a fallback configuration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on previous discussion around #26956 (comment), we may not need a config.

SecurityException seems ok. cc @cloud-fan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SecurityException sounds good

fs.setAcl(path, acls)
} catch {
case NonFatal(e) =>
throw new AnalysisException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@SparkQA
Copy link

SparkQA commented Jan 9, 2020

Test build #116329 has finished for PR 26956 at commit 7b29f2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Thanks!

dongjoon-hyun pushed a commit that referenced this pull request Jan 11, 2020
…e table

### What changes were proposed in this pull request?

This patch proposes to preserve existing permission/acls of paths when truncate table/partition.
Note that this is backport of #26956 to branch-2.4.

### Why are the changes needed?

When Spark SQL truncates table, it deletes the paths of table/partitions, then re-create new ones. If permission/acls were set on the paths, the existing permission/acls will be deleted.

We should preserve the permission/acls if possible.

### Does this PR introduce any user-facing change?

Yes. When truncate table/partition, Spark will keep permission/acls of paths.

### How was this patch tested?

Unit test and manual test as shown in #26956.

Closes #27173 from viirya/truncate-table-permission-2.4.

Authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jan 11, 2020

Ur, @viirya . Our Jenkins with Hadoop-2.7/Hive 1.2 job seems to fail. Could you take a look?

Also, @shaneknapp , does our system have different umask?

@viirya
Copy link
Member Author

viirya commented Jan 11, 2020

@dongjoon-hyun Thanks for pinging me. Opened a followup at #27175.

@dongjoon-hyun
Copy link
Member

Thanks!

@@ -1988,6 +1988,14 @@ object SQLConf {
.booleanConf
.createWithDefault(false)

val TRUNCATE_TABLE_IGNORE_PERMISSION_ACL =
buildConf("spark.sql.truncateTable.ignorePermissionAcl")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> spark.sql.truncateTable.ignorePermissionAcl.enabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I will create a followup for this. Thanks.

dongjoon-hyun pushed a commit that referenced this pull request Jan 16, 2020
### What changes were proposed in this pull request?

Based on the [comment](#26956 (comment)), this patch changes the SQL config name from `spark.sql.truncateTable.ignorePermissionAcl` to `spark.sql.truncateTable.ignorePermissionAcl.enabled`.

### Why are the changes needed?

Make this config consistent other SQL configs.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #27210 from viirya/truncate-table-permission-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request Jan 16, 2020
Based on the [comment](#26956 (comment)), this patch changes the SQL config name from `spark.sql.truncateTable.ignorePermissionAcl` to `spark.sql.truncateTable.ignorePermissionAcl.enabled`.

Make this config consistent other SQL configs.

No.

Unit test.

Closes #27210 from viirya/truncate-table-permission-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit be4d825)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun added a commit that referenced this pull request Jan 17, 2020
### What changes were proposed in this pull request?

This is a followup of #26956 to add a migration document for 2.4.5.

### Why are the changes needed?

New legacy configuration will restore the previous behavior safely.

### Does this PR introduce any user-facing change?

This PR updates the doc.

<img width="763" alt="screenshot" src="https://user-images.githubusercontent.com/9700541/72639939-9da5a400-391b-11ea-87b1-14bca15db5a6.png">

### How was this patch tested?

Build the document and see the change manually.

Closes #27269 from dongjoon-hyun/SPARK-30312.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun added a commit that referenced this pull request Jan 17, 2020
This is a followup of #26956 to add a migration document for 2.4.5.

New legacy configuration will restore the previous behavior safely.

This PR updates the doc.

<img width="763" alt="screenshot" src="https://user-images.githubusercontent.com/9700541/72639939-9da5a400-391b-11ea-87b1-14bca15db5a6.png">

Build the document and see the change manually.

Closes #27269 from dongjoon-hyun/SPARK-30312.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit fdbded3)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request Feb 12, 2020
…ntries when setting up ACLs in truncate table

### What changes were proposed in this pull request?

This is a follow-up to the PR #26956. In #26956, the patch proposed to preserve path permission when truncating table. When setting up original ACLs, we need to set user/group/other permission as ACL entries too, otherwise if the path doesn't have default user/group/other ACL entries, ACL API will complain an error `Invalid ACL: the user, group and other entries are required.`.

 In short this change makes sure:

1. Permissions for user/group/other are always kept into ACLs to work with ACL API.
2. Other custom ACLs are still kept after TRUNCATE TABLE (#26956 did this).

### Why are the changes needed?

Without this fix, `TRUNCATE TABLE` will get an error when setting up ACLs if there is no default default user/group/other ACL entries.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Update unit test.

Manual test on dev Spark cluster.

Set ACLs for a table path without default user/group/other ACL entries:
```
hdfs dfs -setfacl --set 'user:liangchi:rwx,user::rwx,group::r--,other::r--' /user/hive/warehouse/test.db/test_truncate_table

hdfs dfs -getfacl /user/hive/warehouse/test.db/test_truncate_table
# file: /user/hive/warehouse/test.db/test_truncate_table
# owner: liangchi
# group: supergroup
user::rwx
user:liangchi:rwx
group::r--
mask::rwx
other::r--
```
Then run `sql("truncate table test.test_truncate_table")`, it works by normally truncating the table and preserve ACLs.

Closes #27548 from viirya/fix-truncate-table-permission.

Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request Feb 12, 2020
…ntries when setting up ACLs in truncate table

### What changes were proposed in this pull request?

This is a follow-up to the PR #26956. In #26956, the patch proposed to preserve path permission when truncating table. When setting up original ACLs, we need to set user/group/other permission as ACL entries too, otherwise if the path doesn't have default user/group/other ACL entries, ACL API will complain an error `Invalid ACL: the user, group and other entries are required.`.

 In short this change makes sure:

1. Permissions for user/group/other are always kept into ACLs to work with ACL API.
2. Other custom ACLs are still kept after TRUNCATE TABLE (#26956 did this).

### Why are the changes needed?

Without this fix, `TRUNCATE TABLE` will get an error when setting up ACLs if there is no default default user/group/other ACL entries.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Update unit test.

Manual test on dev Spark cluster.

Set ACLs for a table path without default user/group/other ACL entries:
```
hdfs dfs -setfacl --set 'user:liangchi:rwx,user::rwx,group::r--,other::r--' /user/hive/warehouse/test.db/test_truncate_table

hdfs dfs -getfacl /user/hive/warehouse/test.db/test_truncate_table
# file: /user/hive/warehouse/test.db/test_truncate_table
# owner: liangchi
# group: supergroup
user::rwx
user:liangchi:rwx
group::r--
mask::rwx
other::r--
```
Then run `sql("truncate table test.test_truncate_table")`, it works by normally truncating the table and preserve ACLs.

Closes #27548 from viirya/fix-truncate-table-permission.

Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 5b76367)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request Feb 12, 2020
…ntries when setting up ACLs in truncate table

### What changes were proposed in this pull request?

This is a follow-up to the PR #26956. In #26956, the patch proposed to preserve path permission when truncating table. When setting up original ACLs, we need to set user/group/other permission as ACL entries too, otherwise if the path doesn't have default user/group/other ACL entries, ACL API will complain an error `Invalid ACL: the user, group and other entries are required.`.

 In short this change makes sure:

1. Permissions for user/group/other are always kept into ACLs to work with ACL API.
2. Other custom ACLs are still kept after TRUNCATE TABLE (#26956 did this).

### Why are the changes needed?

Without this fix, `TRUNCATE TABLE` will get an error when setting up ACLs if there is no default default user/group/other ACL entries.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Update unit test.

Manual test on dev Spark cluster.

Set ACLs for a table path without default user/group/other ACL entries:
```
hdfs dfs -setfacl --set 'user:liangchi:rwx,user::rwx,group::r--,other::r--' /user/hive/warehouse/test.db/test_truncate_table

hdfs dfs -getfacl /user/hive/warehouse/test.db/test_truncate_table
# file: /user/hive/warehouse/test.db/test_truncate_table
# owner: liangchi
# group: supergroup
user::rwx
user:liangchi:rwx
group::r--
mask::rwx
other::r--
```
Then run `sql("truncate table test.test_truncate_table")`, it works by normally truncating the table and preserve ACLs.

Closes #27548 from viirya/fix-truncate-table-permission.

Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 5b76367)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request Mar 16, 2020
…ndle non-existed path

### What changes were proposed in this pull request?

This fix #26956
Wrap try-catch on `fs.getFileStatus(path)` within acl/permission in case of the path doesn't exist.

### Why are the changes needed?

`truncate table` may fail to re-create path in case of interruption or something else. As a result, next time we `truncate table` on the same table with acl/permission, it will fail due to `FileNotFoundException`. And it also brings behavior change compares to previous Spark version, which could still `truncate table` successfully even if the path doesn't exist.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #27923 from Ngone51/fix_truncate.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Mar 16, 2020
…ndle non-existed path

### What changes were proposed in this pull request?

This fix #26956
Wrap try-catch on `fs.getFileStatus(path)` within acl/permission in case of the path doesn't exist.

### Why are the changes needed?

`truncate table` may fail to re-create path in case of interruption or something else. As a result, next time we `truncate table` on the same table with acl/permission, it will fail due to `FileNotFoundException`. And it also brings behavior change compares to previous Spark version, which could still `truncate table` successfully even if the path doesn't exist.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #27923 from Ngone51/fix_truncate.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit cb26f63)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Mar 16, 2020
…ndle non-existed path

### What changes were proposed in this pull request?

This fix #26956
Wrap try-catch on `fs.getFileStatus(path)` within acl/permission in case of the path doesn't exist.

### Why are the changes needed?

`truncate table` may fail to re-create path in case of interruption or something else. As a result, next time we `truncate table` on the same table with acl/permission, it will fail due to `FileNotFoundException`. And it also brings behavior change compares to previous Spark version, which could still `truncate table` successfully even if the path doesn't exist.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #27923 from Ngone51/fix_truncate.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit cb26f63)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…ntries when setting up ACLs in truncate table

### What changes were proposed in this pull request?

This is a follow-up to the PR apache#26956. In apache#26956, the patch proposed to preserve path permission when truncating table. When setting up original ACLs, we need to set user/group/other permission as ACL entries too, otherwise if the path doesn't have default user/group/other ACL entries, ACL API will complain an error `Invalid ACL: the user, group and other entries are required.`.

 In short this change makes sure:

1. Permissions for user/group/other are always kept into ACLs to work with ACL API.
2. Other custom ACLs are still kept after TRUNCATE TABLE (apache#26956 did this).

### Why are the changes needed?

Without this fix, `TRUNCATE TABLE` will get an error when setting up ACLs if there is no default default user/group/other ACL entries.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Update unit test.

Manual test on dev Spark cluster.

Set ACLs for a table path without default user/group/other ACL entries:
```
hdfs dfs -setfacl --set 'user:liangchi:rwx,user::rwx,group::r--,other::r--' /user/hive/warehouse/test.db/test_truncate_table

hdfs dfs -getfacl /user/hive/warehouse/test.db/test_truncate_table
# file: /user/hive/warehouse/test.db/test_truncate_table
# owner: liangchi
# group: supergroup
user::rwx
user:liangchi:rwx
group::r--
mask::rwx
other::r--
```
Then run `sql("truncate table test.test_truncate_table")`, it works by normally truncating the table and preserve ACLs.

Closes apache#27548 from viirya/fix-truncate-table-permission.

Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…ndle non-existed path

### What changes were proposed in this pull request?

This fix apache#26956
Wrap try-catch on `fs.getFileStatus(path)` within acl/permission in case of the path doesn't exist.

### Why are the changes needed?

`truncate table` may fail to re-create path in case of interruption or something else. As a result, next time we `truncate table` on the same table with acl/permission, it will fail due to `FileNotFoundException`. And it also brings behavior change compares to previous Spark version, which could still `truncate table` successfully even if the path doesn't exist.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added UT.

Closes apache#27923 from Ngone51/fix_truncate.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@viirya viirya deleted the truncate-table-permission branch December 27, 2023 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants