
[SPARK-16313][SQL][BRANCH-1.6] Spark should not silently drop exceptions in file listing #14139

Closed · wants to merge 7 commits

Conversation

@yhuai (Contributor) commented Jul 11, 2016

What changes were proposed in this pull request?

Spark silently drops exceptions during file listing. This is very bad behavior because it can mask legitimate errors, and the resulting plan will silently return 0 rows. This patch changes file listing to no longer drop those errors.

After making partition discovery stop silently dropping exceptions, HiveMetastoreCatalog can trigger partition discovery on empty tables, which causes FileNotFoundExceptions (these exceptions were previously swallowed by partition discovery). To address this, this PR introduces two hacks that work around the issue by avoiding triggering partition discovery on empty tables in HiveMetastoreCatalog.

How was this patch tested?

Manually tested.

Note: This is a backport of #13987
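The behavior change described above can be illustrated with a minimal, hypothetical sketch (this is not Spark's actual listing code; `listFilesSilently` and `listFiles` are illustrative names):

```scala
import java.io.FileNotFoundException

// Before the patch (sketch): any exception raised while listing files was
// swallowed, so a listing failure looked identical to an empty directory
// and the resulting plan silently produced 0 rows.
def listFilesSilently(list: () => Seq[String]): Seq[String] =
  try list() catch { case _: Exception => Seq.empty } // error is masked

// After the patch (sketch): the exception propagates, so a legitimate
// error such as a missing path surfaces to the user instead of vanishing.
def listFiles(list: () => Seq[String]): Seq[String] =
  list()
```

With the first version, a `FileNotFoundException` during listing yields an empty result; with the second, it reaches the caller.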

@rxin (Contributor) commented Jul 11, 2016

Wouldn't this create a ton of warnings during table creation?

@yhuai (Contributor Author) commented Jul 11, 2016

I think there will be one warning when we create a table. Or maybe there is no warning during table creation because the refresh is called lazily.

@SparkQA commented Jul 11, 2016

Test build #62116 has finished for PR 14139 at commit 674999d.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor Author) commented Jul 11, 2016

test this please

@SparkQA commented Jul 11, 2016

Test build #62117 has finished for PR 14139 at commit 674999d.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor Author) commented Jul 11, 2016

Let me see if we can have a flag to determine whether we want to swallow the FileNotFoundException (like what https://github.com/apache/spark/pull/13987/files does).
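A flag-based approach like the one referenced above could look roughly like this (a sketch under assumed names, not the actual patch; the 2.0 change uses `resolveRelation(checkPathExist = false)` for the same purpose):

```scala
import java.io.FileNotFoundException

// Sketch only: `swallowFileNotFound` plays the role of the flag discussed
// above. When it is true, a missing path is treated as an empty listing;
// otherwise the FileNotFoundException propagates to the caller.
def listLeafFiles(paths: Seq[String],
                  list: String => Seq[String],
                  swallowFileNotFound: Boolean): Seq[String] =
  paths.flatMap { p =>
    try list(p)
    catch { case _: FileNotFoundException if swallowFileNotFound => Seq.empty }
  }
```

Callers that expect missing paths (e.g. freshly created empty tables) would pass `true`; everyone else gets the new fail-fast behavior.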

@SparkQA commented Jul 11, 2016

Test build #62121 has finished for PR 14139 at commit 308aef5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 12, 2016

Test build #62123 has finished for PR 14139 at commit 6708e9b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 12, 2016

Test build #62136 has finished for PR 14139 at commit b98ee09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 12, 2016

Test build #62142 has finished for PR 14139 at commit 5aaa96b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor Author) commented Jul 12, 2016

test this please

2 similar comments (@yhuai, Jul 12, 2016: "test this please")

@yhuai (Contributor Author) commented Jul 12, 2016

test this please

@SparkQA commented Jul 12, 2016

Test build #3186 has finished for PR 14139 at commit 5aaa96b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 12, 2016

Test build #3187 has finished for PR 14139 at commit 5aaa96b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 12, 2016

Test build #62175 has finished for PR 14139 at commit 5aaa96b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor Author) commented Jul 12, 2016

cc @marmbrus

@marmbrus (Contributor) commented:

LGTM

@yhuai (Contributor Author) commented Jul 13, 2016

let me take another look to see if there is a better change.

(diff excerpt)
      false
    }
  case _ => false
}
@yhuai (Contributor Author) commented on the diff above:

This function is equivalent to val resolvedRelation = dataSource.resolveRelation(checkPathExist = false) in 2.0 (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala#L427).

@yhuai (Contributor Author) commented Jul 13, 2016

@rxin I think this version is the minimal change. Since the partition discovery logic is inside HadoopFsRelation in 1.6 and the refresh is triggered via a lazy val, passing a flag down would introduce lots of changes.

@SparkQA commented Jul 14, 2016

Test build #62287 has finished for PR 14139 at commit 5d66df7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 14, 2016

Test build #62288 has finished for PR 14139 at commit 82d3711.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -273,6 +273,20 @@ private[hive] class HiveMetastoreCatalog(val client: ClientInterface, hive: Hive
serdeProperties = options)
}

def hasPartitionColumns(relation: HadoopFsRelation): Boolean = {
try {
// Calling hadoopFsRelation.partitionColumns will trigger the refresh call of
A reviewer (Contributor) commented on the diff above:

I'd add to the comment that this is a hack for [SPARK-16313][SQL][BRANCH-1.6] Spark should not silently drop exceptions in file listing

@yhuai (Contributor Author) replied:

Done
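Putting the diff excerpt and the review comment together, the hack can be reconstructed roughly as follows (HadoopFsRelation is stubbed here, and everything beyond the lines quoted in the diff is an assumption, not the exact patch):

```scala
import java.io.FileNotFoundException

// Stub standing in for Spark 1.6's HadoopFsRelation, for this sketch only.
trait HadoopFsRelation { def partitionColumns: Seq[String] }

// HACK for SPARK-16313 (sketch): probe the lazily computed partition
// columns and treat a listing failure on an empty table as "no partition
// columns", so partition discovery is not triggered on empty tables.
def hasPartitionColumns(relation: HadoopFsRelation): Boolean =
  try {
    // Calling hadoopFsRelation.partitionColumns will trigger the refresh
    // call, which performs file listing and can now throw for missing paths.
    relation.partitionColumns.nonEmpty
  } catch {
    case _: FileNotFoundException => false
  }
```

The design choice here is to confine the exception-swallowing to this one probe, instead of threading a flag through HadoopFsRelation's lazy refresh path.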

@rxin (Contributor) commented Jul 14, 2016

LGTM other than that.

@SparkQA commented Jul 14, 2016

Test build #62330 has finished for PR 14139 at commit 072aa3f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor Author) commented Jul 14, 2016

Thank you! I am merging this PR to branch 1.6.

asfgit pushed a commit that referenced this pull request Jul 14, 2016

[SPARK-16313][SQL][BRANCH-1.6] Spark should not silently drop exceptions in file listing

Author: Yin Huai <yhuai@databricks.com>

Closes #14139 from yhuai/SPARK-16313-branch-1.6.
@yhuai yhuai closed this Jul 14, 2016
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jul 15, 2016
Closes apache#14139 from yhuai/SPARK-16313-branch-1.6.

(cherry picked from commit 6ea7d4b)