Skip to content

[SPARK-14004][SQL] NamedExpressions should have at most one qualifier #11822

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

liancheng
Copy link
Contributor

What changes were proposed in this pull request?

This is a more aggressive version of PR #11820, which not only fixes the original problem, but also does the following updates to enforce the at-most-one-qualifier constraint:

  • Renames NamedExpression.qualifiers to NamedExpression.qualifier
  • Uses Option[String] rather than Seq[String] for NamedExpression.qualifier

Quoted PR description of #11820 here:

Current implementations of AttributeReference.sql and Alias.sql joins all available qualifiers, which is logically wrong. But this implementation mistake doesn't cause any real SQL generation bugs though, since there is always at most one qualifier for any given AttributeReference or Alias.

How was this patch tested?

Existing tests should be enough.

@liancheng
Copy link
Contributor Author

Personally, I had once been quite confused by the fact that NamedExpression.qualifiers is a Seq[String] and thought that attributes can be qualified with multiple qualifiers like db.table.column. That's why current version of AttributeReference.sql joins all qualifiers using . rather than picking the first one.

I believe it's safe and good to enforce the at-most-one-qualifier constraint at type level unless there do exist valid cases where using multiple qualifiers make sense but haven't been implemented in Spark SQL yet.

cc @marmbrus @rxin @yhuai

@SparkQA
Copy link

SparkQA commented Mar 18, 2016

Test build #53534 has finished for PR 11822 at commit c1f08b0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 18, 2016

Test build #53540 has finished for PR 11822 at commit 59241ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng liancheng force-pushed the spark-14004-aggressive branch from 59241ec to cbef382 Compare March 19, 2016 05:01
@SparkQA
Copy link

SparkQA commented Mar 19, 2016

Test build #53601 has finished for PR 11822 at commit cbef382.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1438,7 +1438,7 @@ object CleanupAliases extends Rule[LogicalPlan] {
def trimNonTopLevelAliases(e: Expression): Expression = e match {
case a: Alias =>
Alias(trimAliases(a.child), a.name)(
a.exprId, a.qualifiers, a.explicitMetadata, a.isGenerated)
a.exprId, a.qualifier, a.explicitMetadata, a.isGenerated)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR, but we can simplify this to: a.withNewChildren(trimAliases(a.child) :: Nil)

@cloud-fan
Copy link
Contributor

Overall LGTM, I think it's more clear to have qualifier as a Option instead of Seq.

@marmbrus
Copy link
Contributor

LGTM pending tests. The original signature is from very early on when I thought you could refer to a table both by name and alias.

@SparkQA
Copy link

SparkQA commented Mar 21, 2016

Test build #53681 has finished for PR 11822 at commit 9d7b1db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Mar 21, 2016

Thanks! Merging to master.

@asfgit asfgit closed this in 5d8de16 Mar 21, 2016
@liancheng liancheng deleted the spark-14004-aggressive branch March 21, 2016 23:49
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
## What changes were proposed in this pull request?

This is a more aggressive version of PR apache#11820, which not only fixes the original problem, but also does the following updates to enforce the at-most-one-qualifier constraint:

- Renames `NamedExpression.qualifiers` to `NamedExpression.qualifier`
- Uses `Option[String]` rather than `Seq[String]` for `NamedExpression.qualifier`

Quoted PR description of apache#11820 here:

> Current implementations of `AttributeReference.sql` and `Alias.sql` joins all available qualifiers, which is logically wrong. But this implementation mistake doesn't cause any real SQL generation bugs though, since there is always at most one qualifier for any given `AttributeReference` or `Alias`.

## How was this patch tested?

Existing tests should be enough.

Author: Cheng Lian <lian@databricks.com>

Closes apache#11822 from liancheng/spark-14004-aggressive.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants