Skip to content

Commit

Permalink
[pip][design] PIP 289: Secure Pulsar Connector Configuration (apache#…
Browse files Browse the repository at this point in the history
  • Loading branch information
michaeljmarshall authored Aug 16, 2023
1 parent d6734b7 commit d014713
Showing 1 changed file with 242 additions and 0 deletions.
242 changes: 242 additions & 0 deletions pip/pip-289.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,242 @@
# PIP 289: Secure Pulsar Connector Configuration
# Background knowledge

Pulsar Sinks and Sources (a.k.a. Connectors) allow you to move data from a remote system into and out of a Pulsar cluster. These remote systems often require authentication, which requires secret management.

The current state of Pulsar Connector secret management is fragmented, is not documented in the "Pulsar IO" docs, and is not possible in certain cases. This PIP aims to address these issues through several changes.

The easiest way to show the current short comings is by way of example.

## Elasticsearch Example
Here is the current way to deploy an Elasticsearch Sink without the use of plaintext secrets:

```shell
$ bin/pulsar-admin sinks create \
--tenant public \
--namespace default \
--sink-type elastic_search \
--name elasticsearch-test-sink \
--sink-config '{"elasticSearchUrl":"http://localhost:9200","indexName": "my_index"}' \
--secrets '{"username": {"MY-K8S-SECRET-USERNAME": "secret-name"},"password": {"MY-K8S-SECRET-PASSWORD": "password123"}}'
--inputs elasticsearch_test
```

When run targetting Kubernetes, the above works by mounting secrets `MY-K8S-SECRET-USERNAME` and `MY-K8S-SECRET-PASSWORD` into the sink pod container as [environment variables](https://github.com/apache/pulsar/blob/82237d3684fe506bcb6426b3b23f413422e6e4fb/pulsar-functions/secrets/src/main/java/org/apache/pulsar/functions/secretsproviderconfigurator/KubernetesSecretsProviderConfigurator.java#L85-L99):

```shell
username=secret-name
password=password123
```

Those environment variables are then [injected](https://github.com/apache/pulsar/blob/674655347da95305cf671f0696f113dcca88b44d/pulsar-io/common/src/main/java/org/apache/pulsar/io/common/IOConfigUtils.java#L67-L78) into the config when it is loaded at runtime based on [annotations](https://github.com/apache/pulsar/blob/b7eab9469177eda2c56e36bb9871aab48a17d4ec/pulsar-io/elastic-search/src/main/java/org/apache/pulsar/io/elasticsearch/ElasticSearchConfig.java#L99-L113) on the `ElasticSearchConfig`.

### Problem

The annotation approach, which is the only way to inject secrets into connectors, requires that all secret fields are annotated with `sensitive = true` and that all secret fields are at the top level of their configuration class. However, the Elasticsearch config contains an `ssl` field that has nested secrets. See:

```json
{
"elasticSearchUrl": "http://localhost:9200",
"indexName": "my_index",
"username": "username",
"password": "password",
"ssl": {
"enabled": true,
"truststorePath": "/pulsar/security/truststore.jks",
"truststorePassword": "truststorepass",
"keystorePath": "/pulsar/security/keystore.jks",
"keystorePassword": "keystorepass"
}
}
```

Because `truststorePassword` and `keystorePassword` are not at the top level, we do not currently have a secure way (i.e. non-plaintext) to configure those settings.

## RabbitMQ Example

Another relevant example shows how the Pulsar code base has not consistently implemented secret management for connectors. For the RabbitMQ Sink, the sensitive fields are [annotated correctly](https://github.com/apache/pulsar/blob/82237d3684fe506bcb6426b3b23f413422e6e4fb/pulsar-io/rabbitmq/src/main/java/org/apache/pulsar/io/rabbitmq/RabbitMQAbstractConfig.java#L61-L73), but the configuration is not loaded via the `IOConfigUtils#loadWithSecrets` method, which means the only way to load rabbit secrets is as plaintext values in the config.

## Kafka Connect Adapter Example

The final relevant example is the Kafka Connect Adapter. This adapter allows you to run Kafka Connectors in Pulsar Connectors. Because of the recursive nature of these connectors, the configuration for the wrapped connector is stored in a map named [kafkaConnectorConfigProperties](https://github.com/apache/pulsar/blob/55523ac8f31fd6d54aacba326edef1f53028877e/pulsar-io/kafka-connect-adaptor/src/main/java/org/apache/pulsar/io/kafka/connect/PulsarKafkaConnectSinkConfig.java#L59-L62). Because this field is an arbitrary map, we cannot rely on the Pulsar `sensitive` annotation flag to determine whether to load the secret when building the config class.

# Motivation

Increase Pulsar Function security by giving users a way to configure Pulsar Connectors with non-plaintext secrets.

The recent [CVE-2023-37579](https://github.com/apache/pulsar/wiki/CVE%E2%80%902023%E2%80%9037579) resulted in the potential to leak connector configurations. Because we do not always provide a way to configure connector configuration in the connector's secrets map, leaking the configuration meant leaking secrets.

# Goals

## In Scope

* Provide users with a secure way to configure official Pulsar Connectors as well as third party connectors.
* Improve documentation to reflect the current state of secrets management in Pulsar Connectors.
* Only sinks and sources will benefit from this change.
* Only the `JavaInstanceRunnable` class will benefit from this change.

## Out of Scope

* This PIP will not prevent users from configuring secrets via insecure methods, such as plaintext configuration.
* Functions are out of scope because they do not need arbitrary secret injection. Functions can already access secrets through the `Context#getSecret` method.
* Python and Go Function Runtimes--sinks and sources are not typically written in these languages.

# High Level Design

* Add a new secrets injection mechanism which allows for arbitrary secret injection into the connector configuration at runtime.
* Update existing, official connectors to properly use the already available secret injection mechanism.
* Fix the documentation for the existing secrets management methods.

# Detailed Design

## Design & Implementation Details

In order to add a new way to inject, or interpolate, secrets, we need to add a new method to the `SecretsProvider` interface, which can be implemented by users, but is not exposed to function/connector runtimes. This new method will be used to first determine if a secret should be interpolated for a given value, and if so, return the interpolated value. If the value is not a secret, or the secret does not exist, the method will return `null` and no interpolation will occur. The notable difference for this method is that it does not have a "path" to the secret. Therefore, the existing `secrets` map might not apply for certain use cases. In the environment variable scenario, this is a natural fit because the `value` can be interpreted as the name of the environment variable. For usage of the new configuration mechanism, see the [cli](#cli) section.

In the event of a value collision between the old way and this new way to inject secrets, the old way will take precedence.

In order to add support for the existing `sensitive` annotation, I propose fixing all the connectors that have explicit secrets in their configurations.

Fixing the documentation will be a matter of updating the existing documentation to reflect the current state of the code.

## Public-facing Changes

### Public API

#### Add new method to SecretsProvider Interface

Add the following method to the `SecretsProvider` interface:

```java
interface SecretsProvider {
/**
* If the passed value is formatted as a reference to a secret, as defined by the implementation, return the
* referenced secret. If the value is not formatted as a secret reference or the referenced secret does not exist,
* return null.
*
* @param value a config value that may be formatted as a reference to a secret
* @return the materialized secret. Otherwise, null.
*/
default String interpolateSecretForValue(String value) {
return null;
}
}
```

There are only two official implementations of the `SecretProvider` interface. The `ClearTextSecretsProvider` and the `EnvironmentBasedSecretsProvider`. Given that the `ClearTextSecretsProvider` is only plaintext, it will not override the new method. Here is the proposed implementation for the `EnvironmentBasedSecretsProvider`:

```java
public class EnvironmentBasedSecretsProvider implements SecretsProvider {
/**
* Pattern to match ${secretName} in the value.
*/
private static final Pattern interpolationPattern = Pattern.compile("\\$\\{(.+?)}");

@Override
public String interpolateSecretForValue(String value) {
Matcher m = interpolationPattern.matcher(value);
if (m.matches()) {
String secretName = m.group(1);
// If the secret doesn't exist, we return null and don't override the current value.
return provideSecret(secretName, null);
}
return null;
}
}
```

### Binary protocol

No change.

### Configuration

There is no new configuration for this change. It is always enabled.

### CLI

* Here is the new way that users will map secrets into nested configs:

```bash
$ bin/pulsar-admin sinks create \
--tenant public \
--namespace default \
--sink-type elastic_search \
--name elasticsearch-test-sink \
--sink-config '{
"elasticSearchUrl": "http://localhost:9200",
"indexName": "my_index",
"username": "${username}",
"password": "${password}",
"ssl": {
"enabled": true,
"truststorePath": "/pulsar/security/truststore.jks",
"truststorePassword": "${truststorepass}",
"keystorePath": "/pulsar/security/keystore.jks",
"keystorePassword": "${keystorePassword}"
}' \
--secrets '{"username": {"MY-K8S-SECRET-USERNAME": "secret-name"},"password": {"MY-K8S-SECRET-PASSWORD": "password123"},"keystorePassword": {"MY-K8S-KEYSTORE-PASS": "xyz"},"truststorepass": {"MY-K8S-TRUSTSTORE-PASS": "abc"}}'
--inputs elasticsearch_test
```

### Metrics

No new metrics are added by this change.

# Monitoring

Not applicable.

# Security Considerations

The primary security consideration is whether there is any risk in giving users a way to interpolate environment variables into their connector. This change only affects the `EnvironmentBasedSecretsProvider`, which is only used by the Kubernetes Function runtime. As such, there are no environment variables to leak. Further, all connectors have access to their environment variables, so no additional risk is present.

# Backward & Forward Compatibility

## Revert

Reverting this change is as simple as downgrading the function worker and stopping then starting the function.

## Upgrade

Upgrade by upgrading the function worker and stopping then starting the function. Also, the user will need to update their connector configuration to use the new syntax.

# Alternatives

While exploring this PIP, I considered several alternatives.

### Merge Secret Map into Config Map

Attempt to merge all secrets configured for the connector into the connector's configuration. See https://github.com/apache/pulsar/pull/20863 for an example of this approach.
The primary issue with this design is the fact that the secrets map configured for a connector is of type `Map<String, Object>` where the keys are meant to be top level fields in the connector configuration and the values are paths to the secrets. As such, we cannot use the secrets map to recursively inject secrets into the config, which is a requirement for some connectors.
### Directly Inject Secrets into Config Map Based on Value Prefix
We could consider interpreting configuration values that start with a well known prefix, like `env:`, as values that need to be read from the environment. The primary drawback to this solution is that there is not an easy way to configure the function at this point in the code, which means that it is always on.
This solution would look something like adding this code block
```java
// Replace environment variable pointers with their environment variable values
for (Map.Entry<String, Object> entry : config.entrySet()) {
if (entry.getValue() instanceof String && ((String) entry.getValue()).toLowerCase().startsWith("env:")) {
String envVariableName = ((String) entry.getValue()).substring("env:".length());
String envVariableValue = System.getenv(envVariableName);
entry.setValue(envVariableValue);
}
}
```
to this method: https://github.com/apache/pulsar/blob/f7c0b3c49c9ad8c28d0b00aa30d727850eb8bc04/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceRunnable.java#L884-L929.
# General Notes
# Links
* Initial Issue exploring this feature: https://github.com/apache/pulsar/issues/20862
* PR for new interpolation feature: https://github.com/apache/pulsar/pull/20901
* PR for correcting `sensitive` annotation flag handling: https://github.com/apache/pulsar/pull/20902
* Rejected PR for merging secrets map into config map: https://github.com/apache/pulsar/pull/20863
* Mailing List discussion thread: https://lists.apache.org/thread/xdmhp6zpwto2dyrf1xwk7fhd2cr69xtn
* Mailing List voting thread: https://lists.apache.org/thread/ww88z811bpnzpcdf8popvg4njn6d07jt

0 comments on commit d014713

Please sign in to comment.