
Bugfix: remove usage of "userClassPathFirst" properties [was: Investigate delta.io integration] #354

Closed
@razvan

Description


Users have reported that it's not possible to dynamically provision delta.io packages to use with PySpark.
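"Dynamically provision" here means pulling the Delta packages in at submit time rather than baking them into the image. A minimal sketch of such an invocation, where the Maven coordinates, versions, and `my_job.py` are illustrative placeholders rather than values taken from this issue:

```shell
# Hypothetical submit-time provisioning of Delta for PySpark.
# Package coordinates/versions and the job file are examples only.
spark-submit \
  --packages io.delta:delta-core_2.12:2.4.0 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  my_job.py
```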

The erroneous behavior can be reproduced with this commit.

The error is fixed, and the delta test (and all others except logging) passes, with this commit. The fix is only temporary and cannot be merged in its current form, since it breaks the logging tests.

Analysis

The problem is caused by the following two properties, which the operator always adds to spark-submit in order to support log aggregation with Vector:

--conf spark.driver.userClassPathFirst=true
--conf spark.executor.userClassPathFirst=true

In addition, the user classpath is extended like this:

--conf spark.driver.extraClassPath=/stackable/spark/extra-jars/*
--conf spark.executor.extraClassPath=/stackable/spark/extra-jars/*

The contents of /stackable/spark/extra-jars/ are:

bash-4.4$ ls -l /stackable/spark/extra-jars/
total 1868
-rw-r--r-- 1 stackable stackable  126137 Feb 12 08:54 jackson-dataformat-xml-2.15.2.jar
-rw-r--r-- 1 stackable stackable  195909 Feb 12 08:54 stax2-api-4.2.1.jar
-rw-r--r-- 1 stackable stackable 1586395 Feb 12 08:54 woodstox-core-6.5.1.jar
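For context, userClassPathFirst=true flips the driver's and executors' class lookup from the JVM's usual parent-first delegation to child-first, so a class present both in Spark's own classpath and in the extra-jars directory resolves to the extra-jars copy. A minimal Python sketch of that lookup-order difference (all class names and sets here are hypothetical, not taken from the issue):

```python
# Purely illustrative model of parent-first vs child-first class resolution,
# the behavior toggled by spark.{driver,executor}.userClassPathFirst.

def resolve(name, system_cp, user_cp, user_first):
    """Return which classpath ("system" or "user") a class resolves from."""
    order = ([("user", user_cp), ("system", system_cp)] if user_first
             else [("system", system_cp), ("user", user_cp)])
    for label, classes in order:
        if name in classes:
            return label
    raise ImportError(name)

# Hypothetical contents: one class shipped in both classpaths.
system_cp = {"com.example.Shared", "com.example.SparkOnly"}
user_cp = {"com.example.Shared", "com.example.ExtraJarOnly"}

# Default (parent-first): Spark's bundled copy of the shared class wins.
print(resolve("com.example.Shared", system_cp, user_cp, user_first=False))
# userClassPathFirst=true (child-first): the extra-jars copy wins instead,
# which is how dependencies loaded at submit time can end up shadowed.
print(resolve("com.example.Shared", system_cp, user_cp, user_first=True))
```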

Acceptance Criteria

Since this is an investigation ticket, the following outcomes are possible:

  • An integration test showcasing Stackable and Delta with PySpark and S3.
  • Updated operator documentation.
  • An update to the Spark images to include Delta dependencies.
  • A new Spark image with Delta dependencies.

