
[SPARK-26343][KUBERNETES] Try to speed up running local k8s integration tests #23380


@@ -16,6 +16,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
+set -ex
TEST_ROOT_DIR=$(git rev-parse --show-toplevel)
UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked"
IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt"
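The new set -ex line makes the script fail on the first error (-e) and print each command as it executes (-x). A minimal standalone sketch of the effect, separate from the script itself:

set -ex                     # -e: exit on the first failing command; -x: trace each command
echo "this command is traced before it runs"
false                       # with -e set, the script aborts here with a nonzero status
echo "never reached"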
@@ -58,50 +59,59 @@ while (( "$#" )); do
shift
done
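The elided while (( "$#" )) loop above parses the script's command-line flags and sets variables such as SPARK_TGZ and IMAGE_TAG. A hedged sketch of that parsing pattern, not the actual elided code (--spark-tgz appears later in the script; --image-tag is an assumed name matching the IMAGE_TAG variable):

while (( "$#" )); do        # true while positional parameters remain
  case $1 in
    --spark-tgz)
      SPARK_TGZ="$2"
      shift                 # consume the flag's value
      ;;
    --image-tag)            # assumed flag name
      IMAGE_TAG="$2"
      shift
      ;;
    *)
      echo "Unknown option: $1" >&2 && exit 1
      ;;
  esac
  shift                     # consume the flag itself
done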

-if [[ $SPARK_TGZ == "N/A" ]];
+rm -rf "$UNPACKED_SPARK_TGZ"
+if [[ $SPARK_TGZ == "N/A" && $IMAGE_TAG == "N/A" ]];
Contributor (@vanzin):

I think this is too strict. You should be able to build images from the build directory too.

Contributor Author:

So @vanzin, is the idea that if neither a SPARK_TGZ nor an IMAGE_TAG is specified, we'd dynamically build the image on request? Or what do you mean?

Contributor (@vanzin):

That would be my expectation.

Having to build a tgz before running the ITs slows down the process. So does manually having to build the images before running the ITs.

If invoking the ITs means "I'll test whatever version of Spark is currently built in your build directory", it speeds up everything.

(Kinda what running the ITs through sbt does.)

Contributor Author:

Sounds reasonable, I'll switch that over.

Contributor Author:

I think we still want to special-case this option, but it now runs successfully without an image tag or a release tgz specified. cc @vanzin

then
echo "Must specify a Spark tarball to build Docker images against with --spark-tgz." && exit 1;
# If there is no spark image tag to test with and no src dir, build from current
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
SPARK_INPUT_DIR="$(cd "$SCRIPT_DIR/"../../../../ >/dev/null 2>&1 && pwd )"
DOCKER_FILE_BASE_PATH="$SPARK_INPUT_DIR/resource-managers/kubernetes/docker/src/main/dockerfiles/spark"
elif [[ $IMAGE_TAG == "N/A" ]];
then
# If there is a test src tarball and no image tag we will want to build from that
mkdir -p $UNPACKED_SPARK_TGZ
tar -xzvf $SPARK_TGZ --strip-components=1 -C $UNPACKED_SPARK_TGZ;
SPARK_INPUT_DIR="$UNPACKED_SPARK_TGZ"
DOCKER_FILE_BASE_PATH="$SPARK_INPUT_DIR/kubernetes/dockerfiles/spark"
fi
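The SCRIPT_DIR line added above is a common Bash idiom for locating the script's own directory, so the Spark source tree can be found relative to it regardless of the caller's working directory. A standalone sketch of the idiom (the walk-up depth and paths here are hypothetical):

#!/usr/bin/env bash
# Resolve the directory containing this script, even when invoked via a
# relative path: dirname gives the path component, cd + pwd canonicalize it,
# and >/dev/null 2>&1 silences any output from cd.
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
# Walk up a fixed number of levels to a presumed repo root (depth hypothetical).
REPO_ROOT="$( cd "$SCRIPT_DIR/../.." >/dev/null 2>&1 && pwd )"
echo "script dir: $SCRIPT_DIR"
echo "repo root:  $REPO_ROOT"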

-rm -rf $UNPACKED_SPARK_TGZ
-mkdir -p $UNPACKED_SPARK_TGZ
-tar -xzvf $SPARK_TGZ --strip-components=1 -C $UNPACKED_SPARK_TGZ;
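The tar invocation, now kept only in the elif branch above rather than run unconditionally, relies on --strip-components=1 to flatten the distribution's top-level directory. A sketch with a placeholder tarball name:

mkdir -p /tmp/spark-dist-unpacked
# --strip-components=1 drops the leading spark-<version>/ path component from
# every archive entry, so bin/, jars/, kubernetes/ etc. land directly in the
# -C target directory instead of one level deeper.
tar -xzf spark-x.y.z-bin.tgz --strip-components=1 -C /tmp/spark-dist-unpacked
ls /tmp/spark-dist-unpacked/bin/spark-submit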

+# If there is a specific Spark image skip building and extraction/copy
if [[ $IMAGE_TAG == "N/A" ]];
then
IMAGE_TAG=$(uuidgen);
-cd $UNPACKED_SPARK_TGZ
+cd $SPARK_INPUT_DIR

# Build PySpark image
-LANGUAGE_BINDING_BUILD_ARGS="-p $UNPACKED_SPARK_TGZ/kubernetes/dockerfiles/spark/bindings/python/Dockerfile"
+LANGUAGE_BINDING_BUILD_ARGS="-p $DOCKER_FILE_BASE_PATH/bindings/python/Dockerfile"

-# Build SparkR image
-LANGUAGE_BINDING_BUILD_ARGS="$LANGUAGE_BINDING_BUILD_ARGS -R $UNPACKED_SPARK_TGZ/kubernetes/dockerfiles/spark/bindings/R/Dockerfile"
+# Build SparkR image -- disabled since this fails, re-enable as part of SPARK-25152
+# LANGUAGE_BINDING_BUILD_ARGS="$LANGUAGE_BINDING_BUILD_ARGS -R $DOCKER_FILE_BASE_PATH/bindings/R/Dockerfile"

case $DEPLOY_MODE in
cloud)
# Build images
-$UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG $LANGUAGE_BINDING_BUILD_ARGS build
+$SPARK_INPUT_DIR/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG $LANGUAGE_BINDING_BUILD_ARGS build

# Push images appropriately
if [[ $IMAGE_REPO == gcr.io* ]] ;
then
gcloud docker -- push $IMAGE_REPO/spark:$IMAGE_TAG
else
-$UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG push
+$SPARK_INPUT_DIR/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG push
fi
;;

docker-for-desktop)
# Only need to build as this will place it in our local Docker repo which is all
# we need for Docker for Desktop to work so no need to also push
-$UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG $LANGUAGE_BINDING_BUILD_ARGS build
+$SPARK_INPUT_DIR/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG $LANGUAGE_BINDING_BUILD_ARGS build
;;

minikube)
# Only need to build and if we do this with the -m option for minikube we will
# build the images directly using the minikube Docker daemon so no need to push
-$UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -m -r $IMAGE_REPO -t $IMAGE_TAG $LANGUAGE_BINDING_BUILD_ARGS build
+$SPARK_INPUT_DIR/bin/docker-image-tool.sh -m -r $IMAGE_REPO -t $IMAGE_TAG $LANGUAGE_BINDING_BUILD_ARGS build
;;
*)
echo "Unrecognized deploy mode $DEPLOY_MODE" && exit 1
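Taken together, the change gives the script three ways to run, sketched below. Only --spark-tgz is visible in the diff; the other flag names are assumptions based on the variables the elided argument loop sets (IMAGE_TAG, DEPLOY_MODE, IMAGE_REPO), and ./setup-script.sh is a placeholder for this script's actual path:

# 1. Fastest path added by this PR: no tarball, no tag, build images
#    straight from the current checkout.
./setup-script.sh --deploy-mode minikube

# 2. Build images from a release/dist tarball.
./setup-script.sh --spark-tgz /path/to/spark-dist.tgz

# 3. Skip building entirely and test a pre-built image.
./setup-script.sh --image-tag mytag --image-repo docker.io/myrepo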
@@ -103,8 +103,16 @@ class KubernetesSuite extends SparkFunSuite
System.clearProperty(key)
}

-val sparkDirProp = System.getProperty(CONFIG_KEY_UNPACK_DIR)
-require(sparkDirProp != null, "Spark home directory must be provided in system properties.")
+val possible_spark_dirs = List(
+  // If someone specified the tgz for the tests look at the extraction dir
+  System.getProperty(CONFIG_KEY_UNPACK_DIR),
+  // Try the spark test home
+  sys.props("spark.test.home")
+)
+val sparkDirProp = possible_spark_dirs.filter(x =>
+  new File(Paths.get(x).toFile, "bin/spark-submit").exists).headOption.getOrElse(null)
+require(sparkDirProp != null,
+  s"Spark home directory must be provided in system properties tested $possible_spark_dirs")
sparkHomeDir = Paths.get(sparkDirProp)
require(sparkHomeDir.toFile.isDirectory,
s"No directory found for spark home specified at $sparkHomeDir.")
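The Scala change above makes the suite try an ordered list of candidate Spark homes and keep the first one that actually contains bin/spark-submit, instead of requiring the unpack dir property. The same fallback logic, sketched in Bash (candidate paths and variable names hypothetical):

CANDIDATES=(
  "$TEST_ROOT_DIR/target/spark-dist-unpacked"   # tgz extraction dir, if a tgz was given
  "$SPARK_TEST_HOME"                            # stand-in for the spark.test.home property
)
SPARK_HOME_DIR=""
for candidate in "${CANDIDATES[@]}"; do
  # Keep the first candidate that really looks like a Spark home.
  if [[ -n "$candidate" && -f "$candidate/bin/spark-submit" ]]; then
    SPARK_HOME_DIR="$candidate"
    break
  fi
done
# Fail with a clear message when no candidate qualified (mirrors the require()).
: "${SPARK_HOME_DIR:?no candidate directory contained bin/spark-submit}"
echo "using Spark home: $SPARK_HOME_DIR"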