Description
I run a 3-node swarm (all 3 nodes are managers) on AWS; each node was created with docker-machine (ami-87b917e4).
After restarting the nodes, some containers cannot communicate with each other, either by IP address or by service name.
Steps to reproduce the issue:
- Create the network:
docker network create --driver overlay --subnet 10.0.150.0/24 prod-nw
- Create 4 backend services and 1 frontend service; the frontend runs in global mode. Note that each service publishes at least one port (I omitted the fluent logger settings for simplicity):
docker service create --name backend-1 --replicas 1 --with-registry-auth --network prod-nw --publish 8200:8082 $(backend-1-image)
docker service create --name backend-2 --replicas 1 --with-registry-auth --network prod-nw --publish 8201:8082 $(backend-2-image)
docker service create --name backend-3 --replicas 1 --with-registry-auth --network prod-nw --publish 8100:8082 $(backend-3-image)
docker service create --name backend-4 --replicas 1 --with-registry-auth --network prod-nw --publish 8101:8082 $(backend-4-image)
docker service create --name frontend --mode global --publish mode=host,published=50051,target=50051 --with-registry-auth --network prod-nw --publish mode=host,published=8082,target=8082 $(frontend-image)
- After restarting the nodes, try to connect to the other services via the DNS entry/VIP.
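The last step can be checked from a manager node and from inside a task container. The following is a minimal sketch, not the exact commands used in this report: the helper names `service_vips` and `resolve_from` are hypothetical, the container ID is a placeholder taken from `docker ps`, and it assumes the container image ships `nslookup`. In swarm mode, `tasks.<service>` resolves to the individual task IPs rather than the service VIP.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of this report.

# Print the virtual IPs swarm assigned to a service (run on a manager node).
service_vips() {
    docker service inspect --format '{{json .Endpoint.VirtualIPs}}' "$1"
}

# Resolve a service from inside a container: first the service name (VIP),
# then tasks.<service> (individual task IPs).
# Assumes nslookup is available in the container image.
resolve_from() {
    container="$1"
    service="$2"
    docker exec "$container" nslookup "$service"
    docker exec "$container" nslookup "tasks.$service"
}

# Example usage (not run automatically):
#   service_vips backend-2
#   resolve_from <frontend-container-id> backend-2
```

Comparing the VIP from `service_vips` with what `nslookup` returns inside the container helps separate a DNS problem from a routing problem on the overlay network.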
Describe the results you received:
- each container had following IPs on prod-nw:
backend-1: 10.0.150.30
backend-2: 10.0.150.12
backend-3: 10.0.150.32
backend-4: 10.0.150.17
frontend-1: 10.0.150.15
frontend-2: 10.0.150.4
frontend-3: 10.0.150.9
- Most connectivity works, except:
frontend-1 <-> backend-4
frontend-2 <-> backend-2
backend-2 -> frontend-3 (odd, because the connection from frontend-3 to backend-2 appears to be established)
- When connectivity is lost, connecting by direct IP fails as well, with the following errors:
- No route to host at 10.0.150.12 (backend-2 -> frontend-3)
$ telnet 10.0.150.9 50051
Trying 10.0.150.9...
telnet: Unable to connect to remote host: No route to host
$ netstat -an | grep ESTABLISHED # report connection established
tcp 0 0 10.0.150.12:50051 10.0.150.9:53242 ESTABLISHED
tcp 0 0 10.0.150.12:50051 10.0.150.15:55472 ESTABLISHED
- Connection timed out at 10.0.150.17 (backend-4 -> frontend-1)
telnet 10.0.150.15 50051
Trying 10.0.150.15...
telnet: Unable to connect to remote host: Connection timed out
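To map which pairs are broken without running telnet by hand, a probe loop along these lines could be run inside each container in turn. This is a rough sketch under stated assumptions: `probe_tcp` and `probe_all` are hypothetical helpers, the endpoint list is copied from the IPs reported above, and the container must have `bash` and coreutils `timeout` available. It only checks whether a TCP connection opens; it does not distinguish "No route to host" from a timeout.

```shell
#!/bin/bash
# Sketch only: probe_tcp, probe_all and the list layout are illustrative.

# Task IPs on prod-nw as reported above (name=ip).
endpoints="backend-1=10.0.150.30 backend-2=10.0.150.12 backend-3=10.0.150.32 backend-4=10.0.150.17 frontend-1=10.0.150.15 frontend-2=10.0.150.4 frontend-3=10.0.150.9"

# Return 0 if a TCP connection to host:port opens within the timeout
# (third argument, default 3 seconds). Relies on bash's /dev/tcp redirection.
probe_tcp() {
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Probe every endpoint on one port and print one result line per endpoint.
probe_all() {
    port="$1"
    for entry in $endpoints; do
        name="${entry%%=*}"
        ip="${entry#*=}"
        if probe_tcp "$ip" "$port"; then
            echo "$name $ip:$port ok"
        else
            echo "$name $ip:$port FAILED"
        fi
    done
}

# Example usage (not run automatically):
#   probe_all 50051
```

Because the results depend on where the script runs, the output from each container has to be collected separately to see one-directional failures such as backend-2 -> frontend-3.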
Describe the results you expected:
I expected to be able to connect to the service using the VIP created for the service and route accordingly.
Additional information you deem important (e.g. issue happens only occasionally):
This is similar to #26106, but with a few differences, so it was suggested that I open a new issue:
- The nodes are AWS instances created with docker-machine (Ubuntu 16.04 LTS).
- I do not explicitly set userns-remap (I'm not sure whether it is set implicitly).
- Not only the container name fails; connecting by direct IP does not work either (No route to host).
Output of docker version:
Client:
Version: 17.03.1-ce
API version: 1.27
Go version: go1.7.5
Git commit: c6d412e
Built: Mon Mar 27 17:14:09 2017
OS/Arch: linux/amd64
Server:
Version: 17.05.0-ce
API version: 1.29 (minimum version 1.12)
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:10:54 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 55
Running: 7
Paused: 0
Stopped: 48
Images: 74
Server Version: 17.05.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 281
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: active
NodeID: kcpanuat85bztrvktep186fg8
Is Manager: true
ClusterID: qclswzn5foalbgmlkhh2e95i6
Managers: 3
Nodes: 3
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Node Address: 172.32.11.239
Manager Addresses:
172.32.11.239:2377
172.32.11.40:2377
172.32.2.28:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-79-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67GiB
Name: swarm-master
ID: YAOV:4AKS:YOJL:GKDF:HHTV:XW24:ZMOI:M7HU:7T2Q:E5PZ:5KW4:45FI
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
provider=amazonec2
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.):
AWS, 3-node swarm, 3 managers, each node created by docker-machine (ami-87b917e4)