Skip to content

[SPARK-18398][SQL] Fix nullabilities of MapObjects and ExternalMapToCatalyst. #15840

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

ueshin
Copy link
Member

@ueshin ueshin commented Nov 10, 2016

What changes were proposed in this pull request?

The nullabilities of MapObject can be made more strict by relying on inputObject.nullable and lambdaFunction.nullable.

Also ExternalMapToCatalyst.dataType can be made more strict by relying on valueConverter.nullable.

How was this patch tested?

Existing tests.

@SparkQA
Copy link

SparkQA commented Nov 10, 2016

Test build #68462 has finished for PR 15840 at commit b287ded.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member Author

ueshin commented Nov 10, 2016

Jenkins, retest this please.

@hvanhovell
Copy link
Contributor

On a whole this PR looks good. On a more general level, I would like to suggest that we should have a more generic way of preventing unneeded null-checks.

@SparkQA
Copy link

SparkQA commented Nov 10, 2016

Test build #68466 has finished for PR 15840 at commit b287ded.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Nov 10, 2016

Do we prepare a utility method like this in CodeGenerator?

def genNullableAssignment(genIsNull: boolean, isNull: boolean, lhs: String, rhs: String): String = {
  if (genIsNull) {
    s"""
      if ($isNull) {
        $lhs = null;
      } else {
        $lhs = rhs;
      }
    """
  } else {
    s"$lhs = $rhs;"
  }
}

@hvanhovell
Copy link
Contributor

@kiszk that could be a start. However I would like to take a step back, and see which nullability checking patterns are common, and provide generic utilities for them. I also think that the proper use of NullIntolerant could also lead to significant improvement.

@kiszk
Copy link
Member

kiszk commented Nov 10, 2016

@hvanhovell Thank for pointing out NullIntolerant. I did not know it. I also provide it as a starting point based on two instances that I saw today.
I agree that it would be good to add NullIntolerant to other classes and finding common patterns.

@ueshin
Copy link
Member Author

ueshin commented Nov 10, 2016

I found that we could use CodegenContext.nullSafeExec() for most case, which is not so generic but we can easily replace nullability checking for now.

@ueshin
Copy link
Member Author

ueshin commented Nov 11, 2016

@hvanhovell @kiszk I tried to use CodegenContext.nullSafeExec() in MapObjects as an example.
If you can bear with this for now, I'll apply it to other places to generate nullability checking codes.

@SparkQA
Copy link

SparkQA commented Nov 11, 2016

Test build #68509 has finished for PR 15840 at commit deffccd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Nov 11, 2016

Does this approach avoid to generate code convertedArray[loopIndex] = null; if lambdaFunction.nullable is true? Is it a design? I know that it works correctly in this case since an array allocation ensures that all of the elements have null.

I am curious that whether the current restriction (generate only if-then) in nullSafeExec() works in other places. What do you think?

@ueshin
Copy link
Member Author

ueshin commented Nov 12, 2016

@kiszk I have not checked all the case yet but I think the case that we need to generate else-clause doesn't match the case we discuss here.
Of course we can add the method you suggested when we find there is the case and let me know if you find it.

@kiszk
Copy link
Member

kiszk commented Nov 12, 2016

I agree that this case is fine.

@ueshin
Copy link
Member Author

ueshin commented Nov 14, 2016

I replaced nullability checking where we can use CodegenContext.nullSafeExec().

@ueshin
Copy link
Member Author

ueshin commented Nov 14, 2016

@hvanhovell The pattern basically I used to replace nullability checking here is like:

val eval = child.genCode(ctx)
ev.copy(code = s"""
  ${eval.code}
  boolean ${ev.isNull} = ${eval.isNull};
  ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
  if (!${ev.isNull}) {
      ${ev.value} = something;
  }
""")

into:

val eval = child.genCode(ctx)
ev.copy(code = s"""
  ${eval.code}
  ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
""" +
  ctx.nullSafeExec(child.nullable, eval.isNull)(s"""
    ${ev.value} = something;
  """), isNull = child.isNull)

Some of them are modified to fit each case.

Does this help you think about generic utilities?

@SparkQA
Copy link

SparkQA commented Nov 14, 2016

Test build #68607 has finished for PR 15840 at commit 0fc36a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 14, 2016

Test build #68627 has finished for PR 15840 at commit 658c5e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 14, 2016

Test build #68629 has finished for PR 15840 at commit ec0c55c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member Author

ueshin commented Nov 17, 2016

@hvanhovell I'd like to revert some of commits in this pr which replace null checking with ctx.nullSafeExec() to focus on the original purpose of this pr, and send another pr to discuss null checking way, use ctx.nullSafeExec() or provide generic utilities for them as you suggested.
What do you think?

@ueshin
Copy link
Member Author

ueshin commented Nov 21, 2016

@hvanhovell Could you review this pr again, please?

I reverted some commits which replace null checking with ctx.nullSafeExec() and then modified some null checkings to ${isNull} ? null : ${value} pattern in objects.scala because Janino can optimize codes like false ? a : b to b, which we found at #15901.
We expect isNull == "false" for expr.nullable = false (not always, though), so we can expect Janino will optimize them.

@hvanhovell
Copy link
Contributor

LGTM - pending jenkins.

Small question: is it really easier for Janino to optimize ternary operators than an if/else block?

@ueshin
Copy link
Member Author

ueshin commented Nov 21, 2016

@hvanhovell Ah, good question.

I checked the Janino code around IfStatement and I found Janino seems to optimize if/else block, too:
https://github.com/janino-compiler/janino/blob/22a7787e62037049e337cb1ce4064c29a4856022/janino/src/main/java/org/codehaus/janino/UnitCompiler.java#L2275
So we didn't need to replace them.

I'll revert them.

@ueshin
Copy link
Member Author

ueshin commented Nov 21, 2016

Finally the purpose of this pr is to fix nullabilities of MapObjects and ExternalMapToCatalyst, I'll update pr title and description.

@ueshin ueshin changed the title [SPARK-18398][SQL] Fix nullabilities of MapObjects and optimize not to check null if lambda is not nullable. [SPARK-18398][SQL] Fix nullabilities of MapObjects and ExternalMapToCatalyst. Nov 21, 2016
@hvanhovell
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Nov 21, 2016

Test build #68931 has finished for PR 15840 at commit 78d9e80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 21, 2016

Test build #68933 has finished for PR 15840 at commit e777fbc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 21, 2016

Test build #68934 has finished for PR 15840 at commit e777fbc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

Merging to master/2.1. Thanks!

asfgit pushed a commit that referenced this pull request Nov 21, 2016
…atalyst.

## What changes were proposed in this pull request?

The nullabilities of `MapObject` can be made more strict by relying on `inputObject.nullable` and `lambdaFunction.nullable`.

Also `ExternalMapToCatalyst.dataType` can be made more strict by relying on `valueConverter.nullable`.

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #15840 from ueshin/issues/SPARK-18398.

(cherry picked from commit 9f262ae)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
@asfgit asfgit closed this in 9f262ae Nov 21, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…atalyst.

## What changes were proposed in this pull request?

The nullabilities of `MapObject` can be made more strict by relying on `inputObject.nullable` and `lambdaFunction.nullable`.

Also `ExternalMapToCatalyst.dataType` can be made more strict by relying on `valueConverter.nullable`.

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes apache#15840 from ueshin/issues/SPARK-18398.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants