[SPARK-16313][SQL] Spark should not silently drop exceptions in file listing #13987


Closed
wants to merge 10 commits

Conversation

rxin
Contributor

@rxin rxin commented Jun 30, 2016

What changes were proposed in this pull request?

Spark silently drops exceptions during file listing. This is very bad behavior because it can mask legitimate errors and cause the resulting plan to silently return 0 rows. This patch changes it so that errors are no longer silently dropped.
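The problem described above can be sketched in isolation. This is a hypothetical, simplified example (the names `ListingSketch`, `listOne`, and the path strings are illustrative, not Spark's actual API): a listing helper that swallows every exception returns an empty result, so downstream code sees "no files" instead of an error.

```scala
import java.io.FileNotFoundException

object ListingSketch {
  // Before: any failure is silently turned into "no files",
  // so the query plan would see 0 rows instead of an error.
  def listLeafFilesSilently(paths: Seq[String]): Seq[String] =
    try {
      paths.flatMap(listOne)
    } catch {
      case _: Exception => Seq.empty // error masked
    }

  // After: errors propagate to the caller instead of being dropped.
  def listLeafFiles(paths: Seq[String]): Seq[String] =
    paths.flatMap(listOne)

  // Stand-in for a real FileSystem.listStatus call.
  private def listOne(path: String): Seq[String] =
    if (path.startsWith("missing")) throw new FileNotFoundException(path)
    else Seq(s"$path/part-00000")
}
```

With the silent variant, a listing failure and a genuinely empty directory are indistinguishable to the caller; the non-silent variant lets the failure surface.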

How was this patch tested?

Manually verified.

@yhuai
Contributor

yhuai commented Jun 30, 2016

LGTM

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61521 has finished for PR 13987 at commit f3eb4fb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    cachedLeafFiles
  }

  override protected def leafDirToChildrenFiles: Map[Path, Array[FileStatus]] = {
    if (cachedLeafDirToChildrenFiles eq null) {
      refresh()
Contributor

@clockfly clockfly Jun 30, 2016

There is a side effect: refresh() resets cachedPartitionSpec to null, which may clear already-inferred partition information.

  override def refresh(): Unit = {
    val files = listLeafFiles(paths)
    cachedLeafFiles =
      new mutable.LinkedHashMap[Path, FileStatus]() ++= files.map(f => f.getPath -> f)
    cachedLeafDirToChildrenFiles = files.toArray.groupBy(_.getPath.getParent)
    cachedPartitionSpec = null
  }
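The side effect clockfly points out can be demonstrated with a minimal sketch. This is a simplified stand-in (plain strings instead of Hadoop `Path`/`FileStatus`, and `CatalogSketch` is a hypothetical name): a lazy getter that calls `refresh()` on a cache miss also nulls out a partition spec that may already have been inferred.

```scala
import scala.collection.mutable

class CatalogSketch(files: Seq[String]) {
  var cachedLeafFiles: mutable.LinkedHashMap[String, String] = null
  var cachedLeafDirToChildrenFiles: Map[String, Array[String]] = null
  var cachedPartitionSpec: String = null

  def refresh(): Unit = {
    cachedLeafFiles =
      new mutable.LinkedHashMap[String, String]() ++= files.map(f => f -> f)
    // Group files by their parent directory, mirroring groupBy(_.getPath.getParent).
    cachedLeafDirToChildrenFiles =
      files.toArray.groupBy(f => f.split('/').init.mkString("/"))
    cachedPartitionSpec = null // side effect: any inferred spec is discarded
  }

  def leafDirToChildrenFiles: Map[String, Array[String]] = {
    if (cachedLeafDirToChildrenFiles eq null) refresh()
    cachedLeafDirToChildrenFiles
  }
}
```

If `cachedPartitionSpec` was populated before the first call to `leafDirToChildrenFiles`, the getter's implicit `refresh()` wipes it, which is exactly the interaction the comment flags.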

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61526 has finished for PR 13987 at commit dbf9e58.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor Author

rxin commented Jun 30, 2016

@yhuai this is the 3rd attempt -- in this one I changed it so we would ignore FileNotFoundException if a flag is set.
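The flag-controlled behavior rxin describes can be sketched as follows. This is a hypothetical simplification (the names `FlaggedListing`, `listStatus`, and `ignoreFileNotFound` are illustrative, not the actual flag or API in the patch): `FileNotFoundException` is swallowed only when the caller opts in, and every other exception still propagates.

```scala
import java.io.FileNotFoundException

object FlaggedListing {
  def listStatus(path: String, ignoreFileNotFound: Boolean)
                (doList: String => Seq[String]): Seq[String] =
    try {
      doList(path)
    } catch {
      // The pattern guard means the exception is only caught under the
      // flag; without it, the FileNotFoundException rethrows as usual.
      case _: FileNotFoundException if ignoreFileNotFound => Seq.empty
    }
}
```

This keeps the fix from the rest of the PR intact: legitimate errors still fail the query, while a missing file is tolerated only when explicitly requested.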

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61528 has finished for PR 13987 at commit bd2040a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61529 has finished for PR 13987 at commit 8383fb4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61531 has finished for PR 13987 at commit b545422.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #3154 has finished for PR 13987 at commit b545422.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61558 has finished for PR 13987 at commit 7064c36.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61571 has finished for PR 13987 at commit 6cf0e8c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2016

Test build #61573 has finished for PR 13987 at commit 2dc3e84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor Author

rxin commented Jun 30, 2016

Merging in master/2.0.

@asfgit asfgit closed this in 3d75a5b Jun 30, 2016
asfgit pushed a commit that referenced this pull request Jun 30, 2016
…listing

## What changes were proposed in this pull request?
Spark silently drops exceptions during file listing. This is very bad behavior because it can mask legitimate errors and cause the resulting plan to silently return 0 rows. This patch changes it so that errors are no longer silently dropped.

## How was this patch tested?
Manually verified.

Author: Reynold Xin <rxin@databricks.com>

Closes #13987 from rxin/SPARK-16313.

(cherry picked from commit 3d75a5b)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@SparkQA

SparkQA commented Jul 1, 2016

Test build #3156 has finished for PR 13987 at commit 2dc3e84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Jul 14, 2016
…ons in file listing

## What changes were proposed in this pull request?
Spark silently drops exceptions during file listing. This is very bad behavior because it can mask legitimate errors and cause the resulting plan to silently return 0 rows. This patch changes it so that errors are no longer silently dropped.

After making partition discovery not silently drop exceptions, HiveMetastoreCatalog can trigger partition discovery on empty tables, which causes FileNotFoundExceptions (these exceptions were previously dropped silently by partition discovery). To address this issue, this PR introduces two **hacks** to work around it. These two hacks try to avoid triggering partition discovery on empty tables in HiveMetastoreCatalog.

## How was this patch tested?
Manually tested.

**Note:** This is a backport of #13987

Author: Yin Huai <yhuai@databricks.com>

Closes #14139 from yhuai/SPARK-16313-branch-1.6.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jul 15, 2016
(cherry picked from commit 6ea7d4b)