Skip to content

Conversation

@squito
Copy link
Contributor

@squito squito commented Jan 3, 2019

Spark always creates secure py4j connections between java and python,
but it also allows users to pass in their own connection. This ensures
that even passed in connections are secure.

Added test cases verifying the failure with a (mocked) insecure gateway.

This is closely related to SPARK-26019, but this entirely forbids the
insecure connection, rather than creating the "escape-hatch".

Spark always creates secure py4j connections between java and python,
but it also allows users to pass in their own connection. This ensures
that even passed in connections are secure.

Added test cases verifying the failure without the extra configuration.

For the tests, I added ways to create insecure gateways, but I tried to put in protections to make sure that wouldn't get used incorrectly.

This is closely related to SPARK-26019, but this entirely forbids the
insecure connection, rather than creating the "escape-hatch".
@squito
Copy link
Contributor Author

squito commented Jan 3, 2019

@HyukjinKwon @BryanCutler @holdenk @tgravescs @vanzin since you all looked at the related #23337, this is the same just without any way to allow the insecure connection

@squito
Copy link
Contributor Author

squito commented Jan 3, 2019

ps thanks @HyukjinKwon for the test refactoring, so much nicer & faster to run my pyspark tests locally now on the master branch :)

@SparkQA
Copy link

SparkQA commented Jan 3, 2019

Test build #100710 has finished for PR 23441 at commit 8a74acf.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (sys.env.getOrElse("_PYSPARK_CREATE_INSECURE_GATEWAY", "0") != "1") {
builder.authToken(secret)
} else {
assert(sys.env.getOrElse("SPARK_TESTING", "0") == "1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utils.isTesting?

def test_forbid_insecure_gateway(self):
# Fail immediately if you try to create a SparkContext
# with an insecure gateway
gateway = _launch_gateway(insecure=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems to be just for the check you're adding in the SparkContext python class, right?

So instead of this insecure flag, couldn't you just instantiate a dummy insecure gateway? Then the Java and Python code could be a little simpler and avoid the test-only checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh great point, that simplified this code a ton.

@SparkQA
Copy link

SparkQA commented Jan 4, 2019

Test build #100711 has finished for PR 23441 at commit ee6343c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 4, 2019

Test build #100717 has finished for PR 23441 at commit 2d40cba.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 4, 2019

Test build #100719 has finished for PR 23441 at commit 3d925ac.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 4, 2019

Test build #100720 has finished for PR 23441 at commit 4edc4b4.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Looks good otherwise.

@SparkQA
Copy link

SparkQA commented Jan 4, 2019

Test build #100738 has finished for PR 23441 at commit 3ce8a98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

# with an insecure gateway
parameters = namedtuple('MockGatewayParameters', 'auth_token')(None)
mock_insecure_gateway = namedtuple('MockJavaGateway', 'gateway_parameters')(parameters)
with self.assertRaises(Exception) as context:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would cache ValueError

@SparkQA
Copy link

SparkQA commented Jan 7, 2019

Test build #100857 has finished for PR 23441 at commit 3ce8a98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 7, 2019

Test build #100905 has finished for PR 23441 at commit 6e4f236.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@asfgit asfgit closed this in 32515d2 Jan 8, 2019
@BryanCutler
Copy link
Member

merged to master, thanks @squito !

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
Spark always creates secure py4j connections between java and python,
but it also allows users to pass in their own connection. This ensures
that even passed in connections are secure.

Added test cases verifying the failure with a (mocked) insecure gateway.

This is closely related to SPARK-26019, but this entirely forbids the
insecure connection, rather than creating the "escape-hatch".

Closes apache#23441 from squito/SPARK-26349.

Authored-by: Imran Rashid <irashid@cloudera.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants