-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26349][PYSPARK] Forbid insecure py4j gateways #23441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Spark always creates secure py4j connections between java and python, but it also allows users to pass in their own connection. This ensures that even passed in connections are secure. Added test cases verifying the failure without the extra configuration. For the tests, I added ways to create insecure gateways, but I tried to put in protections to make sure that wouldn't get used incorrectly. This is closely related to SPARK-26019, but this entirely forbids the insecure connection, rather than creating the "escape-hatch".
|
@HyukjinKwon @BryanCutler @holdenk @tgravescs @vanzin since you all looked at the related #23337, this is the same just without any way to allow the insecure connection |
|
ps thanks @HyukjinKwon for the test refactoring, so much nicer & faster to run my pyspark tests locally now on the master branch :) |
|
Test build #100710 has finished for PR 23441 at commit
|
| if (sys.env.getOrElse("_PYSPARK_CREATE_INSECURE_GATEWAY", "0") != "1") { | ||
| builder.authToken(secret) | ||
| } else { | ||
| assert(sys.env.getOrElse("SPARK_TESTING", "0") == "1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utils.isTesting?
python/pyspark/tests/test_context.py
Outdated
| def test_forbid_insecure_gateway(self): | ||
| # Fail immediately if you try to create a SparkContext | ||
| # with an insecure gateway | ||
| gateway = _launch_gateway(insecure=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems to be just for the check you're adding in the SparkContext python class, right?
So instead of this insecure flag, couldn't you just instantiate a dummy insecure gateway? Then the Java and Python code could be a little simpler and avoid the test-only checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh great point, that simplified this code a ton.
|
Test build #100711 has finished for PR 23441 at commit
|
|
Test build #100717 has finished for PR 23441 at commit
|
|
Test build #100719 has finished for PR 23441 at commit
|
|
Test build #100720 has finished for PR 23441 at commit
|
|
Looks good otherwise. |
|
Test build #100738 has finished for PR 23441 at commit
|
|
retest this please |
python/pyspark/tests/test_context.py
Outdated
| # with an insecure gateway | ||
| parameters = namedtuple('MockGatewayParameters', 'auth_token')(None) | ||
| mock_insecure_gateway = namedtuple('MockJavaGateway', 'gateway_parameters')(parameters) | ||
| with self.assertRaises(Exception) as context: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would cache ValueError
|
Test build #100857 has finished for PR 23441 at commit
|
|
Test build #100905 has finished for PR 23441 at commit
|
BryanCutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
merged to master, thanks @squito ! |
Spark always creates secure py4j connections between java and python, but it also allows users to pass in their own connection. This ensures that even passed in connections are secure. Added test cases verifying the failure with a (mocked) insecure gateway. This is closely related to SPARK-26019, but this entirely forbids the insecure connection, rather than creating the "escape-hatch". Closes apache#23441 from squito/SPARK-26349. Authored-by: Imran Rashid <irashid@cloudera.com> Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
Spark always creates secure py4j connections between java and python,
but it also allows users to pass in their own connection. This ensures
that even passed in connections are secure.
Added test cases verifying the failure with a (mocked) insecure gateway.
This is closely related to SPARK-26019, but this entirely forbids the
insecure connection, rather than creating the "escape-hatch".