
[SPARK-542] Add and document Driver<->Executor TLS support. #188

Merged: 12 commits merged into `master` from `spark-542-tls-support-and-docs` on Nov 22, 2017

Conversation

@ArtRand (Contributor) commented Sep 22, 2017

TLS in Spark

This PR does two things:

  1. Updates the documentation and expectations around TLS in Spark as it stands today.
  2. Proposes a temporary solution.

How it works

Spark has native support for TLS using Java KeyStores. It requires the KeyStore and, optionally, the TrustStore (hereafter referred to as Stores) to be in the sandbox of the Driver and the Executors.

The solution implemented here uses environment-based secrets. The user base64-encodes their Stores and puts them in the DC/OS secret store. They then add the appropriate configuration (either on the command line or with a properties file), and the base64 string is injected into the Driver and Executor environments. The pre-start script spark-env.sh notices these environment variables and decodes them back to files.
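Concretely, the flow looks something like the following sketch (file and variable names match the spark-env.sh snippet quoted later in this thread):

```bash
# Operator side: base64-encode the keystore before placing it in the
# DC/OS secret store.
base64 server.jks > server.jks.base64

# Container side: spark-env.sh decodes the injected environment variable
# back into a file in the Mesos sandbox.
if [[ -n "${KEYSTORE_BASE64}" ]]; then
  echo "${KEYSTORE_BASE64}" | ${BASE64_D} > "${MESOS_SANDBOX}/server.jks"
fi
```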

The question.

This solution is similar to the older Kerberos implementation, which is why I asked @triclambert and @vishnu2kmohan to review this PR. My question is: is this solution inherently insecure, such that we should not even allow/advise customers to use it? Is exposing the Store files in the sandbox common practice?

The better alternative is to use file-based secrets, as we do for Kerberos. Unfortunately, the MesosCoarseGrainedSchedulerBackend (the Driver) does not have secrets support, which it needs in order to start the Executors with secrets. I recently added secrets support to the Dispatcher (in Spark), and it would be a trivial change to add it to the Driver as well (though it would require the ASF review process).

Thanks for the input.

```bash
else
  BASE64_D="base64 -D" # BSD
fi
```
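For context, this is the tail of the script's GNU-vs-BSD `base64` detection. A minimal sketch of the full conditional, assuming the script keys off the `--help` output (the exact test used in spark-env.sh may differ):

```bash
# GNU coreutils base64 decodes with -d; BSD base64 (e.g. on macOS) uses -D.
if base64 --help 2>&1 | grep -q GNU; then
  BASE64_D="base64 -d" # GNU
else
  BASE64_D="base64 -D" # BSD
fi
```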


@skonto (Contributor) commented Sep 25, 2017

@ArtRand I think sandbox access is protected, but depending on the environment it would be good to support both options. For example, with DC/OS you get better handling of env secrets via the secret store; pure Mesos does not provide such support out of the box. Is it possible the end user would want to store all of their credentials in one keystore? Hope I'm not missing anything here.

@sascala (Contributor) left a comment: little edits

docs/security.md Outdated
@@ -37,26 +40,35 @@ Both stores must be base64 encoded, for example:

**Note:** The base64 string of the keystore will probably be much longer than the snippet above, spanning 50 lines or so.

With this and the password `secret` for the keystore and the private key, your JSON options file will look like this:
Add the stores to your secrets in the DC/OS Secret store, for example if your base64 encoded keystores and truststores are server.jks.base64 and trust.jks.base64, respectively then do the following:
@sascala:

Add the stores to your secrets in the DC/OS secret store. For example, if your base64-encoded keystores and truststores are server.jks.base64 and trust.jks.base64, respectively, then use the following commands to add them to the secret store:
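For illustration, the commands referred to would look roughly like this (secret paths `/keystore` and `/truststore` as used below; `dcos security secrets create` requires the DC/OS Enterprise CLI):

```bash
dcos security secrets create /keystore --value-file server.jks.base64
dcos security secrets create /truststore --value-file trust.jks.base64
```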

docs/security.md Outdated
}
}
}
In this case you're adding two secrets `/truststore` and `/keystore` that you will need to pass to the Spark Driver and Executors. You will need to add the following configurations to your `dcos spark run ` command:
@sascala:

In this case, you are adding....

You must add the following configurations...

docs/security.md Outdated

Make sure to connect the DC/OS cluster only using an SSL connection (i.e. by using an `https://<dcos-url>`). Use the following command to set your DC/OS URL:
Importantly the `spark.mesos.driver.labels` and `spark.mesos.task.labels` must be set as shown. If you upload your secret with another path (e.g. not `/keystore` and `/truststore`) then change the `name` in the value accordingly. Lastly, `spark.mesos.task.lables` must have the `DCOS_SPACE:<dcos_space>` label as well, to have access to the secret. See the [Secrets Documentation about SPACES][13] for more details about Spaces, but usually you want `/spark` as shown.
@sascala:

Note: The `spark.mesos.driver.labels` and `spark.mesos.task.labels` must be set as shown. If you upload your secret with another path (e.g. not `/keystore` and `/truststore`), then change the `name` in the value accordingly. Lastly, `spark.mesos.task.labels` must also have the `DCOS_SPACE:<dcos_space>` label in order to access the secret. See the [Secrets Documentation about spaces][13] for more details about spaces. Usually, you will want to set the space label to `/spark`, as shown.

@susanxhuynh (Contributor) left a comment:

Thanks, @ArtRand. Should we add a regression test for this?

docs/security.md Outdated

[11]: https://docs.mesosphere.com/1.9/overview/architecture/components/
[12]: http://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html
[13]: https://docs.mesosphere.com/service-docs/spark/v2.0.1-2.2.0-1/run-job/
@susanxhuynh:

As referenced above, this link is supposed to point to "Secrets Documentation about SPACES".

docs/security.md Outdated
| Variable | Description |
|----------------------------------|-------------------------------------------------|
| `spark.ssl.enabled` | Whether to enable SSL (default: `false`). |
| `spark.ssl.keyStoreBase64` | Base64 encoded blob containing a Java keystore. |
@susanxhuynh:

I think this is not required anymore, if using secrets.

docs/security.md Outdated
```bash
--conf spark.ssl.trustStore=trust.jks \ # this MUST be set this way
--conf spark.ssl.trustStorePassword=<truststore password> \
--conf spark.mesos.driver.labels=DCOS_SECRETS_DIRECTIVE:[{\"name\"\:\"/keystore\"\,\"type\"\:\"ENVIRONMENT\"\,\"environment\"\:{\"name\"\:\"KEYSTORE_BASE64\"}}\,{\"name\"\:\"/truststore\"\,\"type\"\:\"ENVIRONMENT\"\,\"environment\"\:{\"name\"\:\"TRUSTSTORE_BASE64\"}}] \
--conf spark.mesos.task.labels=DCOS_SECRETS_DIRECTIVE:[{\"name\"\:\"/keystore\"\,\"type\"\:\"ENVIRONMENT\"\,\"environment\"\:{\"name\"\:\"KEYSTORE_BASE64\"}}\,{\"name\"\:\"/truststore\"\,\"type\"\:\"ENVIRONMENT\"\,\"environment\"\:{\"name\"\:\"TRUSTSTORE_BASE64\"}}],DCOS_SPACE:/spark, \
```

This is a pretty long command. Did you consider setting some of these configs automatically in the CLI? Or, do you want the user to be aware of each config being set?

echo "spark-env: Copying krb config from $KRB5_CONFIG_BASE64 to /etc/" >&2
echo "${KRB5_CONFIG_BASE64}" | ${BASE64_D} > /etc/krb5.conf
else
echo "spark-env: No kerberos KDC config found" >&2
fi

if [[ -n "${KEYSTORE_BASE64}" ]]; then
echo "spark-env: Copying keystore to SANDBOX" >&2
echo "${KEYSTORE_BASE64}" | ${BASE64_D} > ${MESOS_SANDBOX}/server.jks


I'd suggest using file-based secrets, which are more secure than writing the contents of the keystore to the Mesos sandbox.

@susanxhuynh (Contributor):

This will be replaced with file-based secrets.

@susanxhuynh (Contributor):

I'm going to reopen this PR and replace the env-based secrets with file-based secrets.

@susanxhuynh reopened this on Oct 9, 2017
@susanxhuynh force-pushed the spark-542-tls-support-and-docs branch from ca8ae07 to 2a7a40a on October 15, 2017
@susanxhuynh (Contributor):

I have replaced the env-based secrets with file-based secrets, pending this Apache Spark PR: apache/spark#19437
This PR contains the related documentation and an integration test.
cc @ArtRand @skonto @mhrabovcin @sascala

@ArtRand (author) left a comment:

Just a few nits. Thanks!

```python
truststore_path = os.path.join(resources_folder, '{}.base64'.format(truststore_file))
keystore_secret = '__dcos_base64__keystore'
truststore_secret = '__dcos_base64__truststore'
shakedown.run_dcos_command('security secrets create /{} --value-file {}'.format(keystore_secret, keystore_path))
```
@ArtRand:

How come this works now? I remember having to jump through some hoops to get the secrets API to work (https://github.com/mesosphere/spark-build/blob/master/tests/utils.py#L179). Could you homogenize the way secrets are added in the tests, either by using the SecretsHandler or by changing/removing it?

```python
@@ -318,6 +318,53 @@ def test_secrets():
        LOGGER.warn("Error when deleting secret, {}".format(r.content))


# Skip DC/OS < 1.10, because it doesn't have support for file-based secrets.
@pytest.mark.skip("Enable when SPARK-22131 is merged and released in DC/OS Spark")
```
@ArtRand commented Oct 27, 2017:

I assume you've tested this locally. How about we test with a specific Docker image (one that has the patch, built from master) instead of the universe package? I don't like the idea of merging skipped tests.

docs/security.md Outdated
```bash
--conf spark.ssl.protocol=TLS \
--conf spark.ssl.trustStore=trust.jks \ # this MUST be set this way
--conf spark.ssl.trustStorePassword=<truststore password> \
--conf spark.mesos.driver.secret.names=__dcos_base64__keystore,__dcos_base64__truststore \
```
@ArtRand:

@keithchambers should probably review this. We could add some sugar to the CLI so these extra configs aren't required (like we do with Kerberos), but it amounts to the same thing.

docs/security.md Outdated
```bash
--conf spark.mesos.driver.secret.filenames=server.jks,trust.jks \
--conf spark.mesos.executor.secret.names=__dcos_base64__keystore,__dcos_base64__truststore \
--conf spark.mesos.executor.secret.filenames=server.jks,trust.jks \
--conf spark.mesos.task.labels=DCOS_SPACE:/spark \
```
@ArtRand commented Oct 27, 2017:

We should probably add this via the CLI automatically.


Would it be more intuitive for the secret:file pairs to be together?
E.g., `--conf spark.mesos.executor.secrets=secret1:file1,secret2:file2`
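To make the suggestion concrete, here is the current syntax next to the proposed (hypothetical, not implemented) paired form, using the names from the doc excerpt above:

```bash
# Current: two parallel comma-separated lists.
--conf spark.mesos.executor.secret.names=__dcos_base64__keystore,__dcos_base64__truststore \
--conf spark.mesos.executor.secret.filenames=server.jks,trust.jks \

# Proposed (hypothetical): secret:file pairs kept together.
--conf spark.mesos.executor.secrets=__dcos_base64__keystore:server.jks,__dcos_base64__truststore:trust.jks \
```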

@susanxhuynh force-pushed the spark-542-tls-support-and-docs branch from 24d75c8 to e223ddf on November 2, 2017
docs/security.md Outdated
}
In this case, you are adding two secrets `/truststore` and `/keystore`.
You must add the following configurations to your `dcos spark run ` command.
The ones in parentheses are optional:
@susanxhuynh:

Good point, @keithchambers. Actually, that was the old, verbose command. Here is the simplified one. What do you think?

@susanxhuynh force-pushed the spark-542-tls-support-and-docs branch 2 times, most recently from 6c5246b to e7b869b on November 20, 2017
@susanxhuynh (Contributor) commented Nov 20, 2017:

@ArtRand I've simplified the interface a bit. There are only three required properties now, and some optional ones. PTAL. cc @skonto
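A sketch of the simplified invocation, based on the flags discussed below (`--keystore-secret-path` is named in this thread; the two password flag spellings are assumptions for illustration):

```bash
dcos spark run --submit-args="\
  --keystore-secret-path=__dcos_base64__keystore \
  --keystore-password=<keystore password> \
  --private-key-password=<private key password> \
  --class org.apache.spark.examples.SparkPi <jar-url>"
```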

```go
	paths = append(paths, args.truststoreSecretPath)
	filenames = append(filenames, trustStoreFileName)
}
args.properties["spark.mesos.driver.secret.names"] = strings.Join(paths, ",")
```
@ArtRand:

Will this overwrite someone's specified secrets if they've already specified some? I.e.:

```bash
dcos spark run --submit-args="\
...
--conf spark.mesos.driver.secret.names=awsaccess \
--conf spark.mesos.driver.secret.filenames=awsaccess \
...
```

```go
@@ -43,6 +43,8 @@ type sparkArgs struct {
	keytabSecretPath   string
	tgtSecretPath      string
	tgtSecretValue     string
	keystoreSecretPath string
```
@ArtRand commented Nov 21, 2017:

What about adding a password flag here, since the other "magic" configs are here as well?

```go
// Make sure passwords are set
if _, ok := args.properties["spark.ssl.keyPassword"]; !ok {
	return "", errors.New("Need to provide key password with keystore: " +
		"--conf spark.ssl.keyPassword=<password>")
```
@ArtRand:

Again, I think we should add a DC/OS-specific flag here; otherwise your TLS args end up split between the DC/OS-specific args and the submit-args.

docs/security.md Outdated
| Variable | Description | Default Value |
|----------------------------------|-------------------------------------------------|---------------|
| `spark.ssl.enabledAlgorithms`    | Allowed ciphers                                  | JVM defaults  |
| `spark.ssl.keyPassword` | The password for the private key | None |
@ArtRand:

Change the default value to REQUIRED? @sascala?

@susanxhuynh force-pushed the spark-542-tls-support-and-docs branch from 904a4a0 to 17bccac on November 22, 2017
@susanxhuynh (Contributor): @ArtRand,

  • Decided to go with "magic" options for passwords. I figured since any user will have to figure out our DC/OS-specific --keystore-secret-path option, it's not too much trouble to add the password ones also.
  • Made sure any additional user-defined secrets are preserved, and added placeholders for env-based secrets (see the sketch after this list).
  • Waiting for tests to pass one last time.
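A sketch of the preservation behavior from the second bullet: assuming the CLI appends its TLS secrets to whatever the user already configured (illustrative values from the earlier question):

```bash
# User-specified secret plus the CLI-added TLS secrets, merged rather than
# overwritten:
--conf spark.mesos.driver.secret.names=awsaccess,__dcos_base64__keystore,__dcos_base64__truststore \
--conf spark.mesos.driver.secret.filenames=awsaccess,server.jks,trust.jks \
```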

```go
if args.keystorePassword == "" || args.privateKeyPassword == "" {
	return "", errors.New("Need to provide keystore password and key password with keystore")
} else {
	args.properties["spark.ssl.keyStorePassword"] = args.keystorePassword
```
@ArtRand:

Nit/question: any reason the password configs are set here instead of in `setupTLSArgs`?

@susanxhuynh:

No particular reason. I'll move the setting into `setupTLSArgs` and just leave the validation check here.

```python
Then calculates the value of pi.
"""

check_secret(sys.argv[2], sys.argv[3])
```
@ArtRand:

Does this also check the secret in the executors, or just the driver?

@susanxhuynh:

I didn't bother adding a test for executor secrets because the CLI code (wrt TLS) is the same for driver and executor secrets.


```bash
dcos package install --options=options.json spark
```
**Note:** If you have specified a space for your secrets other than the default value,
@ArtRand:

I think you can specify this with a "magic" flag? d2iq-archive/dcos-commons#1872

@susanxhuynh:

Which magic flag? The link ^^ doesn't seem relevant?

Make sure to connect the DC/OS cluster only using an SSL connection (i.e. by using an `https://<dcos-url>`). Use the following command to set your DC/OS URL:
**Note:** If you specify environment-based secrets with `spark.mesos.[driver|executor].secret.envkeys`,
the keystore and truststore secrets will also show up as environment-based secrets,
due to the way secrets are implemented. You can ignore these extra environment variables.
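For example, a request like the following (the env var name is an arbitrary example) would also surface the keystore and truststore secrets as environment variables:

```bash
--conf spark.mesos.driver.secret.envkeys=MY_SECRET_ENV \
```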
@ArtRand:

This should also be mentioned in Limitations (https://docs.mesosphere.com/service-docs/spark/2.1.0-2.2.0-1/limitations/)

@susanxhuynh:

Adding ...

```python
@@ -286,5 +290,51 @@ def test_cli_multiple_spaces():
                     " --class ", "org.apache.spark.examples.SparkPi"])


# Skip DC/OS < 1.10, because it doesn't have support for file-based secrets.
@pytest.mark.skipif('shakedown.dcos_version_less_than("1.10")')
```
@ArtRand:

Need to skip on open (non-Enterprise) DC/OS also.

@susanxhuynh:

Added

```python
# Skip DC/OS < 1.10, because it doesn't have support for file-based secrets.
@pytest.mark.skipif('shakedown.dcos_version_less_than("1.10")')
@pytest.mark.sanity
def test_driver_executor_tls():
```
@ArtRand:

Could we make the job rely on a file being served by the driver (i.e., the channel affected by TLS)? This may be more work than necessary, because we'd need to enable/check the --files flag.

```go
for _, taskType := range taskTypes {
	appendToProperty(fmt.Sprintf("spark.mesos.%s.secret.names", taskType), joinedPaths, args)
	appendToProperty(fmt.Sprintf("spark.mesos.%s.secret.filenames", taskType), joinedFilenames, args)
	appendToPropertyIfSet(fmt.Sprintf("spark.mesos.%s.secret.envkeys", taskType), joinedEnvkeys, args)
```
@ArtRand:

👍


```python
def check_secret(secret_name, secret_content):
    '''
    Make sure the extra secret envvar and secret file show up in driver.
```
@ArtRand:

Check driver and executor?

@susanxhuynh merged commit 022f332 into master on Nov 22, 2017
@susanxhuynh deleted the spark-542-tls-support-and-docs branch on November 22, 2017