Skip to content

Secure cluster traffic via mutual TLS#2237

Merged
csmarchbanks merged 13 commits into
prometheus:mainfrom
hooten:tls
Aug 9, 2021
Merged

Secure cluster traffic via mutual TLS#2237
csmarchbanks merged 13 commits into
prometheus:mainfrom
hooten:tls

Conversation

@hooten
Copy link
Copy Markdown
Contributor

@hooten hooten commented Apr 20, 2020

Co-authored-by: Sharad Gaur sharadgaur@gmail.com
Signed-off-by: Dustin Hooten dustinhooten@gmail.com

This pull request makes it possible to use mutual TLS for cluster communications.

By adding the path to a TLS configuration file using the command line flag --cluster.tls-config, TLS will be used for inter-peer gossip.

Mutual TLS is achieved by configuring Memberlist to use a TLS implementation of the memberlist.Transport interface. This approach has been submitted in a design doc and implemented as a proof-of-concept in #1819. This pull request continues that work.

Closes #1322

Updates December 30, 2020: TLS server config uses common code from prometheus/exporter-toolkit.
Updates January 2021: TLS config also allows client config using code from prometheus/common.

@brian-brazil
Copy link
Copy Markdown
Contributor

This PR should not be considered for merging until the tls code is in common, it is not planned to ever have more than 1 copy of that code.

@hooten
Copy link
Copy Markdown
Contributor Author

hooten commented Apr 20, 2020

Happy to have the rest of it reviewed. We can clean up the TLS-Config code before merging.

Copy link
Copy Markdown
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great that this is happening!

I added two comments inline.

In my proof-of-concept I extracted the memberlist specific logic to a separate project for other memberlist users to reuse. What do you think about doing the same here? Is the additional complexity worth the effort?

Comment thread cluster/tls_connection.go Outdated

// writePacket writes all the bytes in one operation so no concurrent write happens in between.
// It prefixes the connection type, the from address and the message length.
func writePacket(conn net.Conn, fromAddr string, b []byte) error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To future-proof this protocol what do you think of sending a version number down the wire?

Protobuf would do length delimiting and versioning. Do you think that would be overkill?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We made a change to use protobuf for this. Any thoughts? Are there ways we could improve it? Thanks!

Comment thread cluster/tls_transport.go Outdated
// writes to it, and closes it. It also returns a time stamp of when
// the packet was written.
func (t *TLSTransport) WriteTo(b []byte, addr string) (time.Time, error) {
dialer := &net.Dialer{Timeout: DefaultTcpTimeout}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly, that this is opening up a new TCP connection for each packet to send? What do you think of something along the lines of a connection pool?

Copy link
Copy Markdown
Contributor

@sharadgaur sharadgaur Apr 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It adds additional complexity as we need to manage the lifecycle of the connection and the Memberlist does not provide any good way to handle it. I would not recommend using a connection pool in this case.
In our testing, we did not see any issues yet. If you like we can run additional load tests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sort of network latency did the testing have?

Copy link
Copy Markdown
Contributor

@sharadgaur sharadgaur May 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An individual request is taking less than 3 mill seconds. Here are the benchmark results
goos: darwin
goarch: amd64
pkg: github.com/prometheus/alertmanager/cluster
BenchmarkWriteTo-12 1000000000 0.00243 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/prometheus/alertmanager/cluster 0.111s
Success: Benchmarks passed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question isn't about how long things take over localhost, it's how many round trips may be required over a 200ms+ connection.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh got it. I will run a test against remotely deployed Alertmanger next week. If we see any issue we will implement a connection pool.
Thank you.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've added a connection pool :)

@csmarchbanks
Copy link
Copy Markdown
Member

In my proof-of-concept I extracted the memberlist specific logic to a separate project for other memberlist users to reuse. What do you think about doing the same here? Is the additional complexity worth the effort?

We already moved tsdb back inside of Prometheus to avoid dealing with version upgrades and such across projects. It is up to the maintainers of Alertmanager, but I would keep the lessons of tsdb in mind.

@stale stale Bot added the stale label Jul 5, 2020
@stale stale Bot removed the stale label Dec 30, 2020
@hooten
Copy link
Copy Markdown
Contributor Author

hooten commented Jan 12, 2021

I've made the requested updates. I'd appreciate another review @brian-brazil @mxinden @csmarchbanks. Thanks!

@hooten hooten requested a review from brian-brazil January 12, 2021 19:16
@brian-brazil
Copy link
Copy Markdown
Contributor

I've only taken a quick peek, and noticed that there's no config for the client-side parts of this. You should use https://github.com/prometheus/common/blob/20c99e7aa07352b599bd0517cd5ca945f4af0407/config/http_config.go#L325 for that.

@hooten
Copy link
Copy Markdown
Contributor Author

hooten commented Jan 12, 2021

Thanks! I will update it to use that one.

key_file: "certs/node1-key.pem"
client_ca_file: "certs/ca.pem"
client_auth_type: "VerifyClientCertIfGiven"
tls_client_config:
Copy link
Copy Markdown
Contributor Author

@hooten hooten Jan 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brian-brazil. Added this commit. Is this the sort of config that you suggested I add? Or was it something else? Thanks!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks right from a quick peek.

Comment thread cluster/tls_config.go Outdated
"io/ioutil"
"path/filepath"

common "github.com/prometheus/common/config"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't rename imports unless it's unavoidable, it makes it harder to grep and understand the code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the rename

Comment thread cluster/tls_config.go Outdated
return nil, err
}
config := &TLSTransportConfig{
TLSServerConfig: &web.TLSStruct{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't be tweaking any of this, leave it up to the library. A tls config should have the same defaults everywhere its used.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this configuration. Now just unmarshaling

key_file: "certs/node1-key.pem"
client_ca_file: "certs/ca.pem"
client_auth_type: "VerifyClientCertIfGiven"
tls_client_config:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks right from a quick peek.

@hooten
Copy link
Copy Markdown
Contributor Author

hooten commented Jan 30, 2021

@simonpasquier I'd appreciate a review whenever you get a chance.

@robertjsullivan
Copy link
Copy Markdown

Hey y'all, we would love to see this merged! Any updates on when that will happen?

@beorn7
Copy link
Copy Markdown
Contributor

beorn7 commented Feb 23, 2021

@gotjosh would this be something for you to look at, to at the same time get more familiar with the codebase?

@gotjosh
Copy link
Copy Markdown
Member

gotjosh commented Feb 23, 2021

I'm happy to take a look at it eventually but my current backlog seems to not finish. This is quite a big change though, and I have 3 other things here (1 PR open, 1 to review and another to implement 😄 ).

I'll add it to my list but I don't expect this to be soonish.

@beorn7
Copy link
Copy Markdown
Contributor

beorn7 commented Apr 20, 2021

@gotjosh what's your current state? Do you think you'll get to this any time soon?

@hooten thanks for the recent commits. I tried to find out why the linter is failing in the test. Perhaps the problem can be solved by merging in (or rebasing on top of) master.

@gotjosh
Copy link
Copy Markdown
Member

gotjosh commented Apr 26, 2021

@beorn7 I'll set aside some time to review this soon. @hooten can you please rebase with master so that I can do some testing please?

@beorn7
Copy link
Copy Markdown
Contributor

beorn7 commented Apr 27, 2021

BTW, the test failures seem real (expired certificate) rather than just flaky.

@hooten
Copy link
Copy Markdown
Contributor Author

hooten commented Apr 29, 2021

Yes, I noticed that. Busy week at work. I will fix it soon

@hooten
Copy link
Copy Markdown
Contributor Author

hooten commented May 6, 2021

The expired certs issue has been fixed.

@hooten
Copy link
Copy Markdown
Contributor Author

hooten commented May 31, 2021

I would appreciate it if someone could review this.

@beorn7
Copy link
Copy Markdown
Contributor

beorn7 commented Jun 14, 2021

@gotjosh @simonpasquier what is your state on this? If you cannot review this any time soon, can you delegate this to someone else at Red Hat or Grafana who is invested here?

Copy link
Copy Markdown
Member

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few minor comments to discuss/address, but nothing blocking. Since this is an additive change and all TLS support is already marked as experimental I move to merge this soon, and allow more people to test out the functionality. We can always provide fixes in additional PRs as they come up.

Comment thread cluster/connection_pool.go Outdated
if err != nil {
return nil, err
}
pool.connections[key] = conn
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this pool will grow forever as addresses come and go? Are there concerns that could cause issues, my gut is that it is unlikely as the number of addresses is probably small over the lifetime of an alertmanager?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that the number of connections in the pool will remain small unless peer addresses change regularly/frequently. Maybe it would be worth declaring bankruptcy on all the connections in the pool once it reaches a certain size (e.g., 1000). Is there a better way?

Copy link
Copy Markdown
Member

@csmarchbanks csmarchbanks Jul 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the safest option would be a LRU with a max size that is sufficiently large, 1024 would make sense to me. Alternatively, you could periodically delete connections that have not been used for a long time.

You are right that the map probably would not grow to be very large, but it also feels not ideal to merge a known memory leak. Perhaps try the hashicorp lru library to see if that change is minor?

Comment thread cluster/tls_connection.go Outdated
Comment thread cluster/tls_connection.go Outdated
Copy link
Copy Markdown
Member

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologise as I've been sitting on half of my review for a while, but overall the changes look good - my comments are mostly nits.

My take here is given we've cut a new version a couple of weeks ago and have had several patches to it accordingly, we can let this bake in master for a while before we cut a new version to ensure we have time to catch anything (if it comes!).

Comment thread docs/https.md Outdated
Comment thread docs/https.md Outdated
Comment thread docs/https.md Outdated
Comment thread cmd/alertmanager/main.go Outdated
Comment thread cluster/tls_connection.go Outdated
Comment thread cluster/tls_connection.go Outdated
@beorn7
Copy link
Copy Markdown
Contributor

beorn7 commented Jul 13, 2021

'm very sorry that it took so long to rally some reviewers, but @gotjosh and @csmarchbanks finally got to it a month ago. Both seem to just have a few minor nits. @hooten do you think you'll be able to address the comments? Then we could finally merge this and make it available for a somewhat wider public to try out.

@hooten
Copy link
Copy Markdown
Contributor Author

hooten commented Jul 13, 2021

Yep, Thanks for the reviews! I will get to it this week

@hooten
Copy link
Copy Markdown
Contributor Author

hooten commented Jul 26, 2021

@gotjosh, @csmarchbanks. Thanks again for the reviews. I pushed some fixes and left a couple comments/questions in response a few days ago. Let me know what you think. Thanks!

@hooten
Copy link
Copy Markdown
Contributor Author

hooten commented Jul 28, 2021

Thanks @csmarchbanks, I will give that a try sometime in the next couple of days

Copy link
Copy Markdown
Member

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay, but overall this LGTM. I don't want to delay this any further so I've left one nit and two comments.

Thanks @hooten for addressing my feedback.

Comment thread cluster/connection_pool.go Outdated
type connectionPool struct {
connections map[string]*tlsConn
tlsConfig *tls.Config
lock sync.Mutex
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lock sync.Mutex
mtx sync.Mutex

last nit given you're about to touch this file as part of Chris's comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took away the mutex here because the lru cache provides its own locking mechanism, which I think suffices.

Comment thread cluster/tls_connection.go
// read returns a packet for packet connections or an error if there is one.
// It returns nothing if the connection is meant to be streamed.
func (conn *tlsConn) read() (*memberlist.Packet, error) {
if conn.connection == nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding was that conn.mtx is protecting connection, any particular reason why we're not using the lock to read the connection here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I looked at it more and I think the lock was incidentally unnecessary because read is only called after rcvTLSConn, which creates a new connection object.

However, I'm adding a lock around the reading of the connection to make this safer to use.

Comment thread cluster/tls_transport.go Outdated
err = conn.writeStream()
if err != nil {
t.writeErrs.WithLabelValues("stream").Inc()
return conn.connection, errors.Wrap(err, "failed to create stream connection")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar thoughts here - perhaps, we're assuming we want to sync writing but not reading - is this correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a similar situation to the above. The connection is never reused so it is incidentally working. I'm adding a new function and some safety measures.

if err != nil {
return nil, err
}
pool.cache.Add(key, conn)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csmarchbanks I updated this to use an lru cache. I removed the mutex from this file because the cache provides locking. However, I'm wondering if I still need it between the pool.cache.Get and the pool.cache.Add. Thoughts?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes as two concurrent operations could each dial a TLS connection and then add the connection for the same key?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to use PeekOrAdd to avoid a mutex, but it would also have a bit of extra complexity. Basically:

  1. Do the get
  2. If not found, dial a TLS connection
  3. Use PeekOrAdd
  4. If found, use the "old" connection returned by PeekOrAdd, if added use the created connection.

Whichever way looks cleaner is fine by me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried both ways. In the end, the mutex ended looking cleaner.

The PeekOrAdd would have been nice if I didn't always need to check whether the connection is alive when I get it back from the cache.

hooten and others added 13 commits August 5, 2021 16:50
Co-authored-by: Sharad Gaur <sharadgaur@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>
Copy link
Copy Markdown
Member

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-ran the tests and they pass now, looks like a flaky acceptance test. 👍 from me, I will wait for Monday to see if any comments/concerns come through and will then plan to merge this!

@csmarchbanks csmarchbanks merged commit ff85bec into prometheus:main Aug 9, 2021
@hooten hooten deleted the tls branch August 10, 2021 01:41
@hooten
Copy link
Copy Markdown
Contributor Author

hooten commented Aug 10, 2021

🎉 Thanks @csmarchbanks @gotjosh @beorn7 @mxinden @brian-brazil @sharadgaur !

@csmarchbanks
Copy link
Copy Markdown
Member

Thanks @hooten for all your work and commitment on this PR!

@markmsmith
Copy link
Copy Markdown

markmsmith commented Aug 12, 2021

+100, this is awesome, thank you!
Now that this has landed, is there a release planned soon, or are there any other blocking issues?

@pracucci pracucci mentioned this pull request Sep 15, 2021
3 tasks
nekketsuuu pushed a commit to nekketsuuu/alertmanager that referenced this pull request Oct 1, 2021
* Add TLS option to gossip cluster

Co-authored-by: Sharad Gaur <sharadgaur@gmail.com>
Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* generate new certs that expire in 100 years

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Fix tls_connection attributes

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Improve error message

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Fix tls client config docs

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Add capacity arg to message buffer

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* fix formatting

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Update version; add version validation

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* use lru cache for connection pool

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* lock reading from the connection

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* when extracting net.Conn from tlsConn, lock and throw away wrapper

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* Add mutex to connection pool to protect cache

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

* fix linting

Signed-off-by: Dustin Hooten <dustinhooten@gmail.com>

Co-authored-by: Sharad Gaur <sharadgaur@gmail.com>
@osipov-vadim
Copy link
Copy Markdown

osipov-vadim commented Mar 4, 2022

Hello! Thanks a lot for adding this feature, seems like a very great thing!

I have a few questions;

  1. What format would TLS Config file be? I tried to follow up with few formats, and it doesn't seem that those are the right ones.
  2. Considering question above , when I pass --cluster.tls-config flag down to AM, it fails to start. Error is: level=error ts=2022-03-04T21:21:25.340Z caller=coordinator.go:118 component=configuration msg="Loading configuration file failed" file=alertmanager.yml err="open alertmanager.yml: no such file or directory" Where file=alertmanager.yml is not the path that was specified by a line to start it. Would that be because config file possibly wrong?
  3. Is this feature enabled in by default in Alertmanager, or I need to download additional libraries? Not entirely clear.

I would highly appreciate advice

@Scrin
Copy link
Copy Markdown

Scrin commented Mar 4, 2022

  1. An example TLS config file can be found here
  2. The main config file (typically alertmanager.yml, provided via --config.file) is different form the TLS config file, make sure you are not mixing them up or missing the main config file. Something like:
    /bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --cluster.tls-config=/etc/alertmanager/tls.yml (and any other cli flags) should work
  3. I believe it should be enabled by default in AM if you provide a valid configuration, at least on the docker image the config is all you need

@osipov-vadim
Copy link
Copy Markdown

osipov-vadim commented Mar 7, 2022

@Scrin Thanks a lot for reply!
Yeah, seems that I totally did not see the document there. Though was googling and searching for the tls config file example. Exactly what I needed!

I am not using a ready made docker image, I'm building my own. At this point I get an error alertmanager: error: unknown long flag '--cluster.tls-config', try --help where --help does not list --cluster.tls-config flag at all. Alertmanager version 0.23.0

Maybe I've missed a step in installation process?

Update:
Also tried passing the --cluster.tls-config to this image:
docker run --name alertmanager -d -p 127.0.0.1:9093:9093 quay.io/prometheus/alertmanager
seem to get exactly same thing, AM doesn't know about --cluster.tls-config flag.

@Scrin
Copy link
Copy Markdown

Scrin commented Mar 8, 2022

0.23.0 was released before this was merged, so it's not available in that version yet, and the next version doesn't seem to be released yet. If you build AM from the main branch (instead of the v0.23.0 tag) you should get this feature (Docker users can use quay.io/prometheus/alertmanager:main or prom/alertmanager:main docker image for that)

}{
{bindAddr: localhost, bindPort: 9094, inputIp: "10.0.0.5", inputPort: 54231, expectedIp: "10.0.0.5", expectedPort: 54231},
{bindAddr: localhost, bindPort: 9093, inputIp: "invalid", inputPort: 54231, expectedError: "failed to parse advertise address \"invalid\""},
{bindAddr: "0.0.0.0", bindPort: 0, inputIp: "", inputPort: 0, expectedIp: "random"},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such a test is not very packager-friendly. Most distros will run unit tests in clean room environment when packaging, which can often mean a host with only a loopback interface (with 127.0.0.1 and usually [::1] configured), and no default route.

On Debian buildbots, and presumably other distros with similar clean room package building policy, this fails with:

=== RUN   TestFinalAdvertiseAddr
    tls_transport_test.go:89: 
        	Error Trace:	/<<PKGBUILDDIR>>/build/src/github.com/prometheus/alertmanager/cluster/tls_transport_test.go:89
        	Error:      	Expected nil, but got: &errors.errorString{s:"no private IP address found, and explicit IP not provided"}
        	Test:       	TestFinalAdvertiseAddr
--- FAIL: TestFinalAdvertiseAddr (0.00s)

This is due to the bindAddr being "0.0.0.0" and the inputIp being empty.

I see that TestClusterJoinAndReconnect in cluster/cluster_test.go at least checks first whether a private IP exists, and skips the test if none is found. Perhaps it makes sense to omit this particular test case in such a scenario also.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that TestClusterJoinAndReconnect in cluster/cluster_test.go at least checks first whether a private IP exists, and skips the test if none is found. Perhaps it makes sense to omit this particular test case in such a scenario also.

I agree and the skip in cluster_test.go exists exactly for the same constraint (#1445).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Securing the gossip protocol?