Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add common TLS configuration #1838

Merged
merged 4 commits into from
Jan 28, 2020
Merged

Conversation

backjo
Copy link
Contributor

@backjo backjo commented Oct 7, 2019

Signed-off-by: Jonah Back jonah@jonahback.com

Closes #1837

Which problem is this PR solving?

  • Makes TLS configuration standard across various storage systems

Short description of the changes

  • Moves TLS configurations into the tls package
  • Refactors cassandra/kafka/elasticsearch backends to use the common configuration

@backjo
Copy link
Contributor Author

backjo commented Oct 7, 2019

This change is Reviewable

@yurishkuro
Copy link
Member

@backjo Thanks!

I was also working on this on the plane, see #1840. It is slightly more configurable than your version, but so far I only switched gRPC client/server configs to it. Let me get it merged, and then you should be able to use that for storage configs.

NB We will need to add SkipHostVerify as another configurable option.

@backjo
Copy link
Contributor Author

backjo commented Oct 7, 2019

Sounds good 👍

@yurishkuro
Copy link
Member

@backjo #1840 was merged, do you want to try to integrate it into this PR?

@backjo
Copy link
Contributor Author

backjo commented Oct 8, 2019

@yurishkuro yeah, I'll try to integrate it in tonight

Copy link
Member

@pavolloffay pavolloffay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugin/storage/es/options_test.go Show resolved Hide resolved
pkg/config/tls/tls.go Outdated Show resolved Hide resolved
@pavolloffay
Copy link
Member

The problem with current TLS options is that due to --es.tls: true and other embedded options, the config cannot be defined in JSON. In yaml there is a workaround:

es:
    tls: true
es.tls.cert: "/foo/bar"

So let's deprecate the --es.tls and if it is used use it but create a warning message. The tls package should not add deprecated enabled flag for new components (e.g tls for kafka)

pkg/config/tls/tls.go Outdated Show resolved Hide resolved
@backjo
Copy link
Contributor Author

backjo commented Oct 9, 2019

Yeah, this isn't ready for review yet with your changes - let me update when it is

@yurishkuro
Copy link
Member

@backjo do you plan to continue this PR?

@backjo
Copy link
Contributor Author

backjo commented Oct 16, 2019

yep - will try to push what I have locally tonight - sorry for the delay!

somnathnitb pushed a commit to somnathnitb/jaeger that referenced this pull request Oct 17, 2019
…kend (jaegertracing#1845)

* Revert Support for TLS as it will handled with issue jaegertracing#1838

Signed-off-by: somnath <somnath.ghosh@honeywell.com>
@codecov
Copy link

codecov bot commented Oct 18, 2019

Codecov Report

Merging #1838 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1838      +/-   ##
==========================================
- Coverage   97.45%   97.42%   -0.03%     
==========================================
  Files         207      207              
  Lines       10314    10286      -28     
==========================================
- Hits        10051    10021      -30     
  Misses        221      221              
- Partials       42       44       +2
Impacted Files Coverage Δ
cmd/agent/app/reporter/grpc/flags.go 100% <ø> (ø) ⬆️
cmd/collector/app/builder/builder_flags.go 100% <ø> (ø) ⬆️
plugin/storage/es/options.go 100% <100%> (ø) ⬆️
pkg/config/tlscfg/options.go 100% <100%> (ø) ⬆️
plugin/storage/cassandra/options.go 100% <100%> (ø) ⬆️
pkg/config/tlscfg/flags.go 100% <100%> (ø) ⬆️
cmd/query/app/static_handler.go 86.84% <0%> (-1.76%) ⬇️
plugin/storage/badger/spanstore/reader.go 96.79% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b58b94...4863a22. Read the comment docs.

@backjo backjo closed this Oct 18, 2019
@backjo backjo reopened this Oct 18, 2019
@backjo
Copy link
Contributor Author

backjo commented Oct 18, 2019

@yurishkuro @pavolloffay I've updated this and I believe it's ready for review.

Summary of changes:

  1. Move Kafka/ES/C* TLS setup and configuration to use the shared TLS config.
  2. Added SkipHostVerify as a TLS option
  3. Added a tls.enabled: true flag that is the recommended flag for enabling tls. tls: true is now marked as deprecated.

@backjo backjo requested a review from pavolloffay October 18, 2019 23:13
pavolloffay
pavolloffay previously approved these changes Oct 21, 2019
@backjo
Copy link
Contributor Author

backjo commented Oct 23, 2019

@yurishkuro mind taking a look?

@yurishkuro
Copy link
Member

I'll take a look, most likely over the weekend, a bit swamped this week

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great overall, +132 −256 is nice to see.

Could you please attach the diffs between before and after output of the help commands with different storage types for agent/collector/query/ingester (we may actually want to script it for future use)?

e.g. SPAN_STORAGE_TYPE=kafka go run ./cmd/collector help > collector-kafka.txt

This way it makes it much easier to look at the changes being introduced to the flags across the board.

@@ -73,13 +79,18 @@ func (c ClientFlagsConfig) InitFromViper(v *viper.Viper) Options {
var p Options
if c.ShowEnabled {
p.Enabled = v.GetBool(c.Prefix + tlsEnabled)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line not necessary

@@ -88,6 +99,10 @@ func (c ServerFlagsConfig) InitFromViper(v *viper.Viper) Options {
var p Options
if c.ShowEnabled {
p.Enabled = v.GetBool(c.Prefix + tlsEnabled)

Copy link
Member

@yurishkuro yurishkuro Oct 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no blank line

return nil, err
}
httpTransport.TLSClientConfig = &tls.Config{RootCAs: ca}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was this code doing? It seems deleting it will not preserve some functionality

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're correct. adding it back.

config.TLS.KeyPath = v.GetString(configPrefix + tlsPrefix + suffixTLSKey)
var tlsClientConfig = tlscfg.ClientFlagsConfig{
Prefix: configPrefix,
ShowEnabled: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do kafka options actually need to show the enabled flag? There is a dedicated switch flag .authentication that indicates TLS or other auth types

EnableHostVerification: true,
TLS: tlscfg.Options{
Enabled: false,
SkipHostVerify: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just setting zero values, can omit the TLS key completely

@@ -225,6 +203,11 @@ func (opt *Options) InitFromViper(v *viper.Viper) {
}

func (cfg *namespaceConfig) initFromViper(v *viper.Viper) {
var tlsFlagsConfig = tlscfg.ClientFlagsConfig{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please DRY it w/ L117, e.g. a helper function

func tlsFlagsConfig(namespace string) tlscfg.ClientFlagsConfig

@@ -112,6 +108,14 @@ func NewOptions(primaryNamespace string, otherNamespaces ...string) *Options {
return options
}

func (config *namespaceConfig) GetTLSFlagsConfig() *tlscfg.ClientFlagsConfig {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • does it need to be public?
  • it can return config by value, we're not using pointers anywhere

pkg/config/tlscfg/flags.go Show resolved Hide resolved
@pavolloffay pavolloffay mentioned this pull request Oct 25, 2019
4 tasks
@backjo backjo force-pushed the commonTlsConfig branch 3 times, most recently from e2dc8ec to 15650eb Compare December 19, 2019 03:03
@pavolloffay
Copy link
Member

Some of the flags look weird - two dots:

      --reporter.grpc..tls                                        (deprecated) see --reporter.grpc..tls.enabled
      --reporter.grpc..tls.ca string                              Path to a TLS CA (Certification Authority) file used to verify the remote server(s) (by default will use the system truststore)
      --reporter.grpc..tls.cert string                            Path to a TLS Certificate file, used to identify this process to the remote server(s)
      --reporter.grpc..tls.enabled                                Enable TLS when talking to the remote server(s)
      --reporter.grpc..tls.key string                             Path to a TLS Private Key file, used to identify this process to the remote server(s)
      --reporter.grpc..tls.server-name string                     Override the TLS server name we expect in the certificate of the remove server(s)
      --reporter.grpc..tls.skip-host-verify                       Skip TLS hostname verification

Copy link
Member

@pavolloffay pavolloffay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reporter grpc tls flags have two dots. At the moment it also breaks TLS handling in ES

@@ -301,17 +291,14 @@ func (c *Configuration) getConfigOptions(logger *zap.Logger) ([]elastic.ClientOp
}
} else {
httpTransport := &http.Transport{
Proxy: http.ProxyFromEnvironment,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert all the changes in this function except line 285. It breaks a couple of things e.g. 47d2029#diff-06fc7142fac7ac3ed9db42948f2558c8R306

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, i've got a TODO still to restore this

KeyPath string
ServerName string // only for client-side TLS config
ClientCAPath string // only for server-side TLS config for client auth
SkipHostVerify bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 50 add InsecureSkipVerify: p.SkipHostVerify,

ctls := &TLSConfig{CaPath: c.TLS.CaPath}
ca, err := ctls.loadCertificate()
if c.TLS.CAPath != "" {
config, err := c.TLS.Config()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we want to load only CA cert. Does it load only CA. What happens is client cert or key are not specified?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this does the right thing, although it will try to load SystemCertPool if CAPath==""

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use ./scripts/generate-help-output.sh to compare flags between master and your branch, e.g. in master run

mkdir -p .tmp/v1
./scripts/generate-help-output.sh .tmp/v1

then in your branch

mkdir -p .tmp/v2
./scripts/generate-help-output.sh .tmp/v2

and then compare

diff -brw .tmp/v1 .tmp/v2

In particular, the following differences don't look correct:

>       --collector.grpc..tls.enabled                     Enable TLS on the server

ctls := &TLSConfig{CaPath: c.TLS.CaPath}
ca, err := ctls.loadCertificate()
if c.TLS.CAPath != "" {
config, err := c.TLS.Config()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this does the right thing, although it will try to load SystemCertPool if CAPath==""

plugin/storage/cassandra/options.go Show resolved Hide resolved
pkg/config/tlscfg/flags.go Outdated Show resolved Hide resolved
@yurishkuro
Copy link
Member

@backjo thanks for your patience, and sorry for the delay in reviewing. There are just a few small issues remaining:

(a) can you amend the commit b7d5be8 to include the sign-off?

(b) These flags are missing, need to restore and deprecate them:

       --cassandra-archive.tls.verify-host
       --cassandra.tls.verify-host

(c) sync with master

@backjo
Copy link
Contributor Author

backjo commented Jan 16, 2020

@yurishkuro will do - I think I'm going to rebase the changes into one commit and bring it current with master.

Re: verify-host - I'm a bit uncertain how we're going to handle conflicting behavior between the two, since the flags are inverted. Specifically, thinking about these cases:

  1. User does not set --cassandra.tls.verify-host, which implicitly sets it to false. This default behavior conflicts with the default behavior of skip-host-verification, which would be to implicitly enable host verification.
  2. User explicitly sets --cassandra.tls.verify-host to false. This could be backward-compatible / respected by using the IsSet call in Viper
  3. User explicitly sets --cassandra.tls.verify-host to true. This aligns with the default value of skip-host-verification so the two would be in alignment. If both were explicitly set for some reason, we would respect skip-host-verification.

The first case is a bit tricky - the most backward-compatible way would be to keep the host verification off by default - which would mean that users would need to explicitly enable hostname verification with the new property, until some time later on when we decide that we should enable the more secure option by default.

@yurishkuro
Copy link
Member

I think it is ok to change the behavior to verify bu default. We will document it in the release notes as a breaking change. The legacy option is needed to that people's start scripts don't break (they won't be affected by the default behavior change since they're already specifying the legacy flag explicitly).

@yurishkuro
Copy link
Member

@backjo one last push to get this over the line?

@backjo backjo requested a review from a team as a code owner January 26, 2020 22:34
Signed-off-by: Jonah Back <jonah_back@intuit.com>
Signed-off-by: Jonah Back <jonah@jonahback.com>
Signed-off-by: Jonah Back <jonah@jonahback.com>
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! 🎉 🎉 🎉

@@ -97,7 +97,7 @@ required = [

[[constraint]]
name = "github.com/spf13/viper"
version = "^1.0.0"
version = "^1.5.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary? If we're changing this file, we need to run dep ensure --update and also check-in the lock file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is - there was a bug in Viper where isSet was returning true even in cases where it was not set, but there was a default value. This was fixed in 1.5

@yurishkuro
Copy link
Member

@pavolloffay are you ok to approve/merge?

Signed-off-by: Jonah Back <jonah@jonahback.com>
@pavolloffay
Copy link
Member

I have just tested the following ES configurations against elastic.co operator. So far all works as before.

to test skip-host-verify

cat <<EOF | kubectl apply -f -
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  collector:
    image: pavolloffay/jaeger-collector:es-tls2
  query:
    image: pavolloffay/jaeger-query:es-tls2
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://quickstart-es-http:9200
        tls:
          skip-host-verify: true
        num-shards: 1
        num-replicas: 0
    secretName: jaeger-secret
  volumeMounts:
    - name: certificates
      mountPath: /es/certificates/
      readOnly: true
  volumes:
    - name: certificates
      secret:
        secretName: quickstart-es-http-certs-public
EOF

To test CA cert without providing all certs.

cat <<EOF | kubectl apply -f -
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  collector:
    image: pavolloffay/jaeger-collector:es-tls2
  query:
    image: pavolloffay/jaeger-query:es-tls2
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://quickstart-es-http:9200
        tls:
          skip-host-verify: true
        num-shards: 1
        num-replicas: 0
    secretName: jaeger-secret
  volumeMounts:
    - name: certificates
      mountPath: /es/certificates/
      readOnly: true
  volumes:
    - name: certificates
      secret:
        secretName: quickstart-es-http-certs-public
EOF

Copy link
Member

@pavolloffay pavolloffay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,

Could you please list it with an explanation regarding skip tls and Cassandra in our changelog?

return tlscfg.ClientFlagsConfig{
Prefix: config.namespace,
ShowEnabled: true,
ShowServerName: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag wasn't supported before.

@yurishkuro yurishkuro merged commit c33f5ea into jaegertracing:master Jan 28, 2020
@yurishkuro
Copy link
Member

Thank you, @backjo!

if err != nil {
return errors.Wrap(err, "error loading tls config")
func setTLSConfiguration(config *tlscfg.Options, saramaConfig *sarama.Config) error {
if config.Enabled {
Copy link
Contributor

@jpkrohling jpkrohling Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change: previously, there was no --kafka.consumer.tls option (or producer), and TLS config would be used on whether the getTLS() function returned an error or not. Now, even if all the parameters are set, the collector/ingester won't be using TLS unless a new option is added to the command line, --kafka.consumer.tls.enabled (or the one marked as deprecated but that never existed: --kafka.consumer.tls).

For example, we had this in the jaeger-operator before, when auto-provisioning a Kafka cluster:

      --kafka.producer.authentication=tls
      --kafka.producer.brokers=auto-provision-kafka-kafka-bootstrap.default.svc.cluster.local:9093
      --kafka.producer.tls.ca=/var/run/secrets/auto-provision-kafka-cluster-ca/ca.crt
      --kafka.producer.tls.cert=/var/run/secrets/auto-provision-kafka/user.crt
      --kafka.producer.tls.key=/var/run/secrets/auto-provision-kafka/user.key

It now requires the new property to be added, otherwise, the collector/ingester will crash with the following error:

{"level":"fatal","ts":1582796332.9417691,"caller":"collector/main.go:91","msg":"Failed to init storage factory","error":"kafka: client has run out of available brokers to talk to (Is your cluster reachable?)","stacktrace":"main.main.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:91\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:698\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:783\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:736\nmain.main\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:183\nruntime.main\n\t/home/travis/.gimme/versions/go1.13.5.linux.amd64/src/runtime/proc.go:203"}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the "new" set of parameters to make it work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tls.enabled ones: kafka.consumer.tls.enabled and kafka.producer.tls.enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to be backward compatible, it would have to use the getTLS() approach still - possibly with the getTLS() method checking the enabled flag (i.e. if enabled = false then don't use)?

Copy link
Contributor

@jpkrohling jpkrohling Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work, unless "not specified" (nil) means false when checking the tls.enabled flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - unfortunately tls.enabled default is false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #2092 to track this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor TLS flags & logic into a shared module
5 participants