Pysql #359

ahirreddy · 2014-04-08T08:36:07Z

No description provided.

AmplabJenkins · 2014-04-08T08:37:23Z

Merged build triggered.

AmplabJenkins · 2014-04-08T08:37:29Z

Merged build started.

AmplabJenkins · 2014-04-08T08:39:25Z

Merged build finished.

AmplabJenkins · 2014-04-08T08:39:25Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13890/

The code introduced in apache#359 used Hadoop's WritableUtils.clone() to duplicate objects when reading from Hadoop files. Some users have reported exceptions when cloning data in various file formats, including Avro and another custom format. This patch removes that functionality to ensure stability for the 0.9 release. Instead, it puts a clear warning in the documentation that copying may be necessary for Hadoop data sets.

Remove Hadoop object cloning and warn users making Hadoop RDD's. The code introduced in apache#359 used Hadoop's WritableUtils.clone() to duplicate objects when reading from Hadoop files. Some users have reported exceptions when cloning data in various file formats, including Avro and another custom format. This patch removes that functionality to ensure stability for the 0.9 release. Instead, it puts a clear warning in the documentation that copying may be necessary for Hadoop data sets.

Remove Hadoop object cloning and warn users making Hadoop RDD's. The code introduced in apache#359 used Hadoop's WritableUtils.clone() to duplicate objects when reading from Hadoop files. Some users have reported exceptions when cloning data in various file formats, including Avro and another custom format. This patch removes that functionality to ensure stability for the 0.9 release. Instead, it puts a clear warning in the documentation that copying may be necessary for Hadoop data sets. (cherry picked from commit c319617) Conflicts: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala
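The warning about copying follows from the fact that Hadoop record readers typically reuse a single mutable object for every record. The sketch below does not use Spark or Hadoop; `ReusedRecordReader` is a hypothetical stand-in, written only to illustrate why holding references to such records without copying can give surprising results.

```python
import copy


class ReusedRecordReader:
    """Hypothetical stand-in for a reader that reuses one mutable record object."""

    def __init__(self, values):
        self._values = values
        self._buffer = {"value": None}  # the single object handed out for every record

    def __iter__(self):
        for v in self._values:
            self._buffer["value"] = v  # overwritten in place on each read
            yield self._buffer


# Holding references without copying: every element aliases the same buffer.
aliased = list(ReusedRecordReader([1, 2, 3]))

# Copying each record before holding it preserves the distinct values.
copied = [copy.deepcopy(r) for r in ReusedRecordReader([1, 2, 3])]

aliased_values = [r["value"] for r in aliased]
copied_values = [r["value"] for r in copied]
```

In this toy setup the aliased list ends up holding three references to the last record, while the copied list keeps each value. With real Hadoop data the equivalent step would be an explicit copy of each record before caching or collecting it.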

This commit tries to solve issue #359 by allowing the `spark.executor.cores` configuration key to take fractional values, e.g., 0.5 or 1.5. The value is used to specify the cpu request when creating the executor pods, which is allowed to be fractional by Kubernetes. When the value is passed to the executor process through the environment variable `SPARK_EXECUTOR_CORES`, the value is rounded up to the closest integer as required by the `CoarseGrainedExecutorBackend`. Signed-off-by: Yinan Li <ynli@google.com>(cherry picked from commit 6f6cfd6)

* Allow spark driver find shuffle pods in specified namespace

  The conf property spark.kubernetes.shuffle.namespace is used to specify the namespace of shuffle pods. In normal cases, only one "shuffle daemonset" is deployed and shared by all spark pods. The spark driver should be able to list and watch shuffle pods in the namespace specified by the user. Note: by default, the spark driver pod doesn't have authority to list and watch shuffle pods in another namespace. Some action is needed to grant it the authority. For example, the ABAC policy below works.

  ```
  {"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"group": "system:serviceaccounts", "namespace": "SHUFFLE_NAMESPACE", "resource": "pods", "readonly": true}}
  ```

  (cherry picked from commit a6291c6)
* Bypass init-containers when possible (cherry picked from commit 08fe944)
* Config for hard cpu limit on pods; default unlimited (cherry picked from commit 8b3248f)
* Allow number of executor cores to have fractional values

  This commit tries to solve issue #359 by allowing the `spark.executor.cores` configuration key to take fractional values, e.g., 0.5 or 1.5. The value is used to specify the cpu request when creating the executor pods, which is allowed to be fractional by Kubernetes. When the value is passed to the executor process through the environment variable `SPARK_EXECUTOR_CORES`, the value is rounded up to the closest integer as required by the `CoarseGrainedExecutorBackend`. Signed-off-by: Yinan Li <ynli@google.com> (cherry picked from commit 6f6cfd6)
* Python Bindings for launching PySpark Jobs from the JVM
* Adding PySpark Submit functionality. Launching Python from JVM
* Addressing scala idioms related to PR351
* Removing extends Logging which was necessary for LogInfo
* Refactored code to leverage the ContainerLocalizedFileResolver
* Modified Unit tests so that they would pass
* Modified Unit Test input to pass Unit Tests
* Setup working environment for integration tests for PySpark
* Comment out Python thread logic until Jenkins has python in Python
* Modifying PythonExec to pass on Jenkins
* Modifying python exec
* Added unit tests to ClientV2 and refactored to include pyspark submission resources
* Modified unit test check
* Scalastyle
* PR 348 file conflicts
* Refactored unit tests and styles
* further scala styling and logic
* Modified unit tests to be more specific towards Class in question
* Removed space delimiting for methods
* Submission client redesign to use a step-based builder pattern.

  This change overhauls the underlying architecture of the submission client, but it is intended to entirely preserve existing behavior of Spark applications. Therefore users will find this to be an invisible change.

  The philosophy behind this design is to reconsider the breakdown of the submission process. It operates off the abstraction of "submission steps", which are transformation functions that take the previous state of the driver and return the new state of the driver. The driver's state includes its Spark configurations and the Kubernetes resources that will be used to deploy it.

  Such a refactor moves away from a features-first API design, which considers different containers to serve a set of features. The previous design, for example, had a container files resolver API object that returned different resolutions of the dependencies added by the user. However, it was up to the main Client to know how to intelligently invoke all of those APIs. Therefore the API surface area of the file resolver became untenably large and it was not intuitive how it was to be used or extended.

  This design changes the encapsulation layout; every module is now responsible for changing the driver specification directly. An orchestrator builds the correct chain of steps and hands it to the client, which then calls it verbatim. The main client then makes any final modifications that put the different pieces of the driver together, particularly to attach the driver container itself to the pod and to apply the Spark configuration as command-line arguments.
* Don't add the init-container step if all URIs are local.
* Python arguments patch + tests + docs
* Revert "Python arguments patch + tests + docs" This reverts commit 4533df2.
* Revert "Don't add the init-container step if all URIs are local." This reverts commit e103225.
* Revert "Submission client redesign to use a step-based builder pattern." This reverts commit 5499f6d.
* style changes
* space for styling (cherry picked from commit befcf0a) Conflicts: README.md core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
* Submission client redesign to use a step-based builder pattern
* Submission client redesign to use a step-based builder pattern.

  This change overhauls the underlying architecture of the submission client, but it is intended to entirely preserve existing behavior of Spark applications. Therefore users will find this to be an invisible change.

  The philosophy behind this design is to reconsider the breakdown of the submission process. It operates off the abstraction of "submission steps", which are transformation functions that take the previous state of the driver and return the new state of the driver. The driver's state includes its Spark configurations and the Kubernetes resources that will be used to deploy it.

  Such a refactor moves away from a features-first API design, which considers different containers to serve a set of features. The previous design, for example, had a container files resolver API object that returned different resolutions of the dependencies added by the user. However, it was up to the main Client to know how to intelligently invoke all of those APIs. Therefore the API surface area of the file resolver became untenably large and it was not intuitive how it was to be used or extended.

  This design changes the encapsulation layout; every module is now responsible for changing the driver specification directly. An orchestrator builds the correct chain of steps and hands it to the client, which then calls it verbatim. The main client then makes any final modifications that put the different pieces of the driver together, particularly to attach the driver container itself to the pod and to apply the Spark configuration as command-line arguments.
* Add a unit test for BaseSubmissionStep.
* Add unit test for kubernetes credentials mounting.
* Add unit test for InitContainerBootstrapStep.
* unit tests for initContainer
* Add a unit test for DependencyResolutionStep.
* further modifications to InitContainer unit tests
* Use of resolver in PythonStep and unit tests for PythonStep
* refactoring of init unit tests and pythonstep resolver logic
* Add unit test for KubernetesSubmissionStepsOrchestrator.
* refactoring and addition of secret trustStore+Cert checks in a SubmissionStepSuite
* added SparkPodInitContainerBootstrapSuite
* Added InitContainerResourceStagingServerSecretPluginSuite
* style in Unit tests
* extremely minor style fix in variable naming
* Address comments.
* Rename class for consistency.
* Attempt to make spacing consistent.

  Multi-line methods should have four-space indentation for arguments that aren't on the same line as the method call itself... but this is difficult to do consistently given how IDEs handle Scala multi-line indentation in most cases. (cherry picked from commit 0f4368f) Conflicts: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterSchedulerBackend.scala
* Add implicit conversions to imports. Otherwise we can get a Scalastyle error when building from SBT. (cherry picked from commit 7deaaa3)
* Fix import order and scalastyle. Test with ./dev/scalastyle
* fix submit job errors (cherry picked from commit 8751a9a)
* Add node selectors for driver and executor pods (cherry picked from commit 6dbd32e)
* Retry binding server to random port in the resource staging server test.
* Retry binding server to random port in the resource staging server test.
* Break if successful start
* Start server in try block.
* Fix scalastyle
* More rigorous cleanup logic. Increment port numbers.
* Move around more exception logic.
* More exception refactoring.
* Remove whitespace
* Fix test
* Rename variable
* Scalastyle fix
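The "submission steps" idea described in the commit message above can be sketched in a few lines. This is a minimal illustration in Python, not the Scala code from the referenced commits: `DriverSpec`, the three step functions, `orchestrate`, and `run_client` are hypothetical names chosen for the example. Each step takes the previous driver state and returns a new one, an orchestrator picks the chain, and the client applies it in order.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class DriverSpec:
    """Hypothetical driver state: Spark configuration plus Kubernetes resources."""
    spark_conf: Dict[str, str] = field(default_factory=dict)
    resources: Tuple[str, ...] = ()


# A step is a function from the previous driver state to the new one.
Step = Callable[[DriverSpec], DriverSpec]


def base_step(spec: DriverSpec) -> DriverSpec:
    # Returns a new spec; the input spec is left untouched.
    return replace(spec, spark_conf={**spec.spark_conf, "spark.app.name": "demo"})


def credentials_step(spec: DriverSpec) -> DriverSpec:
    return replace(spec, resources=spec.resources + ("credentials-secret",))


def init_container_step(spec: DriverSpec) -> DriverSpec:
    return replace(spec, resources=spec.resources + ("init-container",))


def orchestrate(all_uris_local: bool) -> List[Step]:
    """Build the chain of steps; the init-container step is skipped for local URIs."""
    steps: List[Step] = [base_step, credentials_step]
    if not all_uris_local:
        steps.append(init_container_step)
    return steps


def run_client(steps: List[Step]) -> DriverSpec:
    """Apply the steps in the order given, starting from an empty spec."""
    spec = DriverSpec()
    for step in steps:
        spec = step(spec)
    return spec


remote_spec = run_client(orchestrate(all_uris_local=False))
local_spec = run_client(orchestrate(all_uris_local=True))
```

Because each step only sees and returns a `DriverSpec`, adding a feature means adding one step to the orchestrator rather than widening an API that the client must know how to call.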

This commit tries to solve issue apache#359 by allowing the `spark.executor.cores` configuration key to take fractional values, e.g., 0.5 or 1.5. The value is used to specify the cpu request when creating the executor pods, which is allowed to be fractional by Kubernetes. When the value is passed to the executor process through the environment variable `SPARK_EXECUTOR_CORES`, the value is rounded up to the closest integer as required by the `CoarseGrainedExecutorBackend`. Signed-off-by: Yinan Li <ynli@google.com>
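The rounding behaviour described above can be illustrated with a short sketch. The function name `executor_core_settings` is hypothetical and this is not the code from the commit; it only shows the arithmetic: the fractional value is kept for the pod's cpu request, and the value exported as `SPARK_EXECUTOR_CORES` is rounded up to an integer.

```python
import math
from typing import Tuple


def executor_core_settings(cores_conf: str) -> Tuple[float, int]:
    """Return (cpu_request, env_cores) for a possibly fractional cores setting.

    cpu_request keeps the fractional value for the pod's cpu request;
    env_cores is the value rounded up for SPARK_EXECUTOR_CORES.
    """
    cores = float(cores_conf)
    if cores <= 0:
        raise ValueError("executor cores must be positive")
    return cores, math.ceil(cores)


half = executor_core_settings("0.5")
one_and_half = executor_core_settings("1.5")
whole = executor_core_settings("2")
```

Under this sketch an executor configured with 0.5 cores requests half a CPU from Kubernetes but still reports one core to the executor process.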

…pache#359) This change switches the devstack default resources cleanup role to OSC commands from the Ocata release onward; for the M and N releases, it uses both OSC commands and Neutron CLI commands. Closes: theopenlab/openlab#101

ahirreddy added 18 commits April 6, 2014 15:00

compiling

b4bc82d

Java to python

b6f4feb

java to python, and python to java

5cb8dc0

Added schema rdd class

d2c60af

doesn't crash

949071b

working

9cb15c8

more working

730803e

even better

837bd13

yippie

224add8

Switched to using Scala SQLContext

f16524d

returning dictionaries works

d69594d

output dictionaries correctly

337ed16

return row objects

ed9e3b4

awesome row objects

2d44498

SchemaRDD now has all RDD operations

1f6e343

made jrdd explicitly lazy

ef91795

for now only allow dictionaries as input

ec5b6e6

added todo explaining cost of creating Row object in python

6c690e5

ahirreddy closed this Apr 8, 2014

mccheah pushed a commit to mccheah/spark that referenced this pull request Nov 28, 2018

Upgrade jackson to 2.9.5 (apache#359)

52a122f

mccheah pushed a commit to mccheah/spark that referenced this pull request Nov 28, 2018

Upgrade jackson to 2.9.5 (apache#359)

a21b1c2

peter-toth mentioned this pull request Jun 21, 2020

[SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse #28885

Closed

LuciferYang mentioned this pull request Dec 31, 2022

[SPARK-41802][BUILD] Upgrade Apache httpcore to 4.4.16 #39329

Closed
