[SPARK-16313][SQL] Spark should not silently drop exceptions in file listing #13987
LGTM
Test build #61521 has finished for PR 13987 at commit
    cachedLeafFiles
  }

  override protected def leafDirToChildrenFiles: Map[Path, Array[FileStatus]] = {
    if (cachedLeafDirToChildrenFiles eq null) {
      refresh()
There is a side effect: refresh() resets cachedPartitionSpec to null, which may clear already-inferred partition information.
override def refresh(): Unit = {
val files = listLeafFiles(paths)
cachedLeafFiles =
new mutable.LinkedHashMap[Path, FileStatus]() ++= files.map(f => f.getPath -> f)
cachedLeafDirToChildrenFiles = files.toArray.groupBy(_.getPath.getParent)
cachedPartitionSpec = null
}
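To make the concern concrete, here is a minimal, self-contained sketch of the side effect. It is not Spark's actual catalog class: `SketchCatalog`, `inferenceCount`, `leafFilesViaRefresh`, and `leafFilesCacheOnly` are hypothetical names, and the file listing is replaced by a plain function. It only illustrates that a lazy getter which delegates to refresh() also throws away the inferred partition spec, while a getter that fills just the file cache does not.

```scala
// Hypothetical stand-in for the catalog under review; not Spark's real class.
final class SketchCatalog(listFiles: () => Seq[String]) {
  private var cachedLeafFiles: Option[Seq[String]] = None
  private var cachedPartitionSpec: Option[String] = None

  // Counts how often the partition spec had to be (re-)inferred.
  var inferenceCount: Int = 0

  // Same shape as the refresh() quoted above: reload files, drop the spec.
  def refresh(): Unit = {
    cachedLeafFiles = Some(listFiles())
    cachedPartitionSpec = None
  }

  // Infers the spec on first use and caches it until it is cleared.
  def partitionSpec: String = cachedPartitionSpec.getOrElse {
    inferenceCount += 1
    val spec = s"spec-$inferenceCount"
    cachedPartitionSpec = Some(spec)
    spec
  }

  // Lazy getter that goes through refresh(): clears the spec as a side effect.
  def leafFilesViaRefresh: Seq[String] = {
    if (cachedLeafFiles.isEmpty) refresh()
    cachedLeafFiles.get
  }

  // Alternative lazy getter: fills only the file cache, leaves the spec alone.
  def leafFilesCacheOnly: Seq[String] = {
    if (cachedLeafFiles.isEmpty) cachedLeafFiles = Some(listFiles())
    cachedLeafFiles.get
  }
}
```

With this sketch, calling `partitionSpec`, then `leafFilesViaRefresh`, then `partitionSpec` again infers the spec twice, whereas the same sequence with `leafFilesCacheOnly` infers it once.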
Test build #61526 has finished for PR 13987 at commit
@yhuai this is the 3rd attempt. In this one I changed it so that we ignore FileNotFoundException if a flag is set.
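A rough sketch of what "ignore FileNotFoundException if a flag is set" could look like, assuming nothing about the actual patch. `ListingSketch`, `listDir`, and `ignoreFileNotFound` are hypothetical names, and paths are plain strings rather than Hadoop `Path` objects. The point is only that the flag narrows what gets swallowed to one exception type, and every other failure still propagates to the caller.

```scala
import java.io.{FileNotFoundException, IOException}

object ListingSketch {
  // `listDir` stands in for the real file-system listing call (an assumption).
  // `ignoreFileNotFound` is a hypothetical flag name, not taken from the patch.
  def listLeafFiles(
      paths: Seq[String],
      listDir: String => Seq[String],
      ignoreFileNotFound: Boolean): Seq[String] = {
    paths.flatMap { path =>
      try {
        listDir(path)
      } catch {
        // Only a missing path is tolerated, and only when the flag is set.
        case _: FileNotFoundException if ignoreFileNotFound => Seq.empty[String]
        // Any other exception is not caught here and reaches the caller.
      }
    }
  }

  // Toy listing function used for illustration only.
  val demoListDir: String => Seq[String] = {
    case "/data/ok"      => Seq("/data/ok/part-0")
    case "/data/missing" => throw new FileNotFoundException("/data/missing")
    case other           => throw new IOException(s"cannot list $other")
  }
}
```

With the flag set, listing `Seq("/data/ok", "/data/missing")` through `demoListDir` yields only the files under `/data/ok`; with the flag unset, the same call throws the FileNotFoundException instead of returning a silently shortened result.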
Test build #61528 has finished for PR 13987 at commit
Test build #61529 has finished for PR 13987 at commit
Test build #61531 has finished for PR 13987 at commit
Test build #3154 has finished for PR 13987 at commit
Test build #61558 has finished for PR 13987 at commit
Test build #61571 has finished for PR 13987 at commit
Test build #61573 has finished for PR 13987 at commit
Merging in master/2.0.
…listing

## What changes were proposed in this pull request?
Spark silently drops exceptions during file listing. This is a very bad behavior because it can mask legitimate errors and the resulting plan will silently have 0 rows. This patch changes it to not silently drop the errors.

## How was this patch tested?
Manually verified.

Author: Reynold Xin <rxin@databricks.com>

Closes #13987 from rxin/SPARK-16313.

(cherry picked from commit 3d75a5b)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Test build #3156 has finished for PR 13987 at commit
…ons in file listing

## What changes were proposed in this pull request?
Spark silently drops exceptions during file listing. This is a very bad behavior because it can mask legitimate errors and the resulting plan will silently have 0 rows. This patch changes it to not silently drop the errors.

After making partition discovery not silently drop exceptions, HiveMetastoreCatalog can trigger partition discovery on empty tables, which causes FileNotFoundExceptions (these exceptions were previously dropped silently by partition discovery). To address this issue, this PR introduces two **hacks** to work around it. Both hacks try to avoid triggering partition discovery on empty tables in HiveMetastoreCatalog.

## How was this patch tested?
Manually tested.

**Note:** This is a backport of #13987

Author: Yin Huai <yhuai@databricks.com>

Closes #14139 from yhuai/SPARK-16313-branch-1.6.
…ons in file listing

## What changes were proposed in this pull request?
Spark silently drops exceptions during file listing. This is a very bad behavior because it can mask legitimate errors and the resulting plan will silently have 0 rows. This patch changes it to not silently drop the errors.

After making partition discovery not silently drop exceptions, HiveMetastoreCatalog can trigger partition discovery on empty tables, which causes FileNotFoundExceptions (these exceptions were previously dropped silently by partition discovery). To address this issue, this PR introduces two **hacks** to work around it. Both hacks try to avoid triggering partition discovery on empty tables in HiveMetastoreCatalog.

## How was this patch tested?
Manually tested.

**Note:** This is a backport of apache#13987

Author: Yin Huai <yhuai@databricks.com>

Closes apache#14139 from yhuai/SPARK-16313-branch-1.6.

(cherry picked from commit 6ea7d4b)
What changes were proposed in this pull request?
Spark silently drops exceptions during file listing. This is a very bad behavior because it can mask legitimate errors and the resulting plan will silently have 0 rows. This patch changes it to not silently drop the errors.
How was this patch tested?
Manually verified.