[SPARK-16313][SQL][BRANCH-1.6] Spark should not silently drop exceptions in file listing #14139
Conversation
Wouldn't this create a ton of warnings during table creation?
I think there will be one warning when we create a table. Or maybe there is no warning during table creation because the refresh is called lazily.
Test build #62116 has finished for PR 14139 at commit
test this please
Test build #62117 has finished for PR 14139 at commit
Let me see if we can have a flag to determine if we want to swallow the FNF (like what https://github.com/apache/spark/pull/13987/files does).
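For reference, the flag-based alternative mentioned above could look roughly like this. This is a minimal sketch, not the actual Spark code; `listLeafFiles` and its signature are hypothetical, loosely modeled on the `checkPathExist` flag from the 2.0 change in #13987:

```scala
import java.io.FileNotFoundException

// Hypothetical listing helper: when checkPathExist is false, a missing
// path is treated as empty; when true, the FileNotFoundException is
// allowed to propagate instead of being silently swallowed.
object Listing {
  def listLeafFiles(paths: Seq[String],
                    pathExists: String => Boolean,
                    checkPathExist: Boolean): Seq[String] =
    paths.flatMap { p =>
      if (pathExists(p)) Seq(s"$p/part-00000")
      else if (checkPathExist) throw new FileNotFoundException(p)
      else Seq.empty // silently skip the missing path
    }
}
```

Callers that only need schema information could pass `checkPathExist = false`, while query planning would pass `true` and fail loudly on a missing directory.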
Test build #62121 has finished for PR 14139 at commit
Test build #62123 has finished for PR 14139 at commit
Test build #62136 has finished for PR 14139 at commit
Test build #62142 has finished for PR 14139 at commit
test this please
test this please
test this please
test this please
Test build #3186 has finished for PR 14139 at commit
Test build #3187 has finished for PR 14139 at commit
Test build #62175 has finished for PR 14139 at commit
cc @marmbrus
LGTM
let me take another look to see if there is a better change.
      false
    }
  case _ => false
}
This function is equivalent to val resolvedRelation = dataSource.resolveRelation(checkPathExist = false)
in 2.0 (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala#L427).
@rxin I think this version is the minimal change. Since the partition discovery logic is inside HadoopFsRelation in 1.6 and the refresh is triggered through a lazy val, passing a flag down would introduce lots of changes.
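The guard being discussed can be sketched as follows. This is an illustration only, not the actual 1.6 code: the `HadoopFsRelation` trait here is a stand-in for the real class, whose lazy refresh is forced by touching `partitionColumns`:

```scala
import java.io.FileNotFoundException

// Stand-in for the real HadoopFsRelation: reading partitionColumns forces
// the lazy refresh, which lists files and may throw for an empty table
// whose directory does not exist yet.
trait HadoopFsRelation {
  def partitionColumns: Seq[String]
}

object MetastoreGuard {
  // Probe for partition columns. A FileNotFoundException from the forced
  // refresh is treated as "no partition columns", so partition discovery
  // is not triggered on an empty table.
  def hasPartitionColumns(relation: HadoopFsRelation): Boolean =
    try relation.partitionColumns.nonEmpty
    catch { case _: FileNotFoundException => false }
}
```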
Test build #62287 has finished for PR 14139 at commit
Test build #62288 has finished for PR 14139 at commit
@@ -273,6 +273,20 @@ private[hive] class HiveMetastoreCatalog(val client: ClientInterface, hive: Hive
      serdeProperties = options)
  }

  def hasPartitionColumns(relation: HadoopFsRelation): Boolean = {
    try {
      // Calling hadoopFsRelation.partitionColumns will trigger the refresh call of
I'd add to the comment that this is a hack for [SPARK-16313][SQL][BRANCH-1.6] Spark should not silently drop exceptions in file listing
Done
LGTM other than that.
Test build #62330 has finished for PR 14139 at commit
Thank you! I am merging this PR to branch-1.6.
…ons in file listing ## What changes were proposed in this pull request? Spark silently drops exceptions during file listing. This is a very bad behavior because it can mask legitimate errors and the resulting plan will silently have 0 rows. This patch changes it to not silently drop the errors. After making partition discovery not silently drop exceptions, HiveMetastoreCatalog can trigger partition discovery on empty tables, which causes FileNotFoundExceptions (these exceptions were previously dropped silently by partition discovery). To address this issue, this PR introduces two **hacks** to work around it. These two hacks try to avoid triggering partition discovery on empty tables in HiveMetastoreCatalog. ## How was this patch tested? Manually tested. **Note: This is a backport of #13987** Author: Yin Huai <yhuai@databricks.com> Closes #14139 from yhuai/SPARK-16313-branch-1.6.
What changes were proposed in this pull request?
Spark silently drops exceptions during file listing. This is a very bad behavior because it can mask legitimate errors and the resulting plan will silently have 0 rows. This patch changes it to not silently drop the errors.
After making partition discovery not silently drop exceptions, HiveMetastoreCatalog can trigger partition discovery on empty tables, which causes FileNotFoundExceptions (these exceptions were previously dropped silently by partition discovery). To address this issue, this PR introduces two hacks to work around it. Both try to avoid triggering partition discovery on empty tables in HiveMetastoreCatalog.
How was this patch tested?
Manually tested.
Note: This is a backport of #13987
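To illustrate the failure mode this PR fixes, here is a sketch with hypothetical helpers (not Spark's actual listing code): swallowing the listing error produces an empty file list, and the resulting plan silently returns 0 rows, while the fixed behavior lets the error propagate to the caller.

```scala
import java.io.FileNotFoundException

object ListingBehavior {
  // Pretend file listing: a missing directory raises FileNotFoundException.
  def listFiles(dir: String, exists: String => Boolean): Seq[String] =
    if (exists(dir)) Seq(s"$dir/part-00000")
    else throw new FileNotFoundException(dir)

  // Old behavior: the error is swallowed and the caller silently sees
  // zero files, masking a legitimate problem (e.g. a typo'd path).
  def listSwallowing(dir: String, exists: String => Boolean): Seq[String] =
    try listFiles(dir, exists)
    catch { case _: Exception => Seq.empty }

  // Fixed behavior: the exception propagates, so the user sees the error
  // instead of an unexplained empty result.
  def listStrict(dir: String, exists: String => Boolean): Seq[String] =
    listFiles(dir, exists)
}
```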