Robertk/versions #30

robert3005 · 2016-09-23T17:32:40Z

No description provided.

…it jars (#30) * Revamp ports and service setup for the driver. - Expose the driver-submission service on NodePort and contact that as opposed to going through the API server proxy - Restrict the ports that are exposed on the service to only the driver submission service when uploading content and then only the Spark UI after the job has started * Move service creation down and more thorough error handling * Fix missed merge conflict * Add braces * Fix bad merge * Address comments and refactor run() more. Method nesting was getting confusing so pulled out the inner class and removed the extra method indirection from createDriverPod() * Remove unused method * Support SSL configuration for the driver application submission (#49) * Support SSL when setting up the driver. The user can provide a keyStore to load onto the driver pod and the driver pod will use that keyStore to set up SSL on its server. * Clean up SSL secrets after finishing submission. We don't need to persist these after the pod has them mounted and is running already. * Fix compilation error * Revert image change * Address comments * Programmatically generate certificates for integration tests. * Address comments * Resolve merge conflicts * Fix bad merge * Remove unnecessary braces * Fix compiler error

…enStageId ### What changes were proposed in this pull request? Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code to help with debugging. One way to get the generated code is through `df.queryExecution.debug.codegen`, or SQL `EXPLAIN CODEGEN` statement. The generated code is currently printed without specific ordering, which can make debugging a bit annoying. This PR makes a minor improvement to sort the codegen dump by the `codegenStageId`, ascending. After this change, the following query: ```scala spark.range(10).agg(sum('id)).queryExecution.debug.codegen ``` will always dump the generated code in a natural, stable order. A version of this example with shorter output is: ``` spark.range(10).agg(sum('id)).queryExecution.debug.codegenToSeq.map(_._1).foreach(println) *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L]) +- *(1) Range (0, 10, step=1, splits=16) *(2) HashAggregate(keys=[], functions=[sum(id#8L)], output=[sum(id)#12L]) +- Exchange SinglePartition, true, [id=palantir#30] +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L]) +- *(1) Range (0, 10, step=1, splits=16) ``` The number of codegen stages within a single SQL query tends to be very small, most likely < 50, so the overhead of adding the sorting shouldn't be significant. ### Why are the changes needed? Minor improvement to aid WSCG debugging. ### Does this PR introduce any user-facing change? No user-facing change for end-users; minor change for developers who debug WSCG generated code. ### How was this patch tested? Manually tested the output; all other tests still pass. Closes apache#27955 from rednaxelafx/codegen. Authored-by: Kris Mok <kris.mok@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org> (cherry picked from commit a177628) Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

* Change policy * Move policy bot file * Add changelog * Delete policy.yml * Revert "Delete policy.yml" This reverts commit cd84cbabc0ae9504a57907532d5d03867a7e926d. * Delete changelog.yml

Robert Kruszewski added 2 commits September 23, 2016 12:14

publish at the end

c01b658

faster

af474d9

robert3005 merged commit 7d0a22b into master Sep 23, 2016

robert3005 deleted the robertk/versions branch September 23, 2016 17:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Robertk/versions #30

Robertk/versions #30

Uh oh!

robert3005 commented Sep 23, 2016

Uh oh!

Uh oh!

Robertk/versions #30

Robertk/versions #30

Uh oh!

Conversation

robert3005 commented Sep 23, 2016

Uh oh!

Uh oh!