Swarm overlay network does not route IP addresses (without userns-remap) after restarting nodes

**Description**
I run a 3-node swarm with 3 managers on AWS; each node was created by docker-machine (ami-87b917e4).
After restarting the nodes, some containers cannot communicate with each other, either by IP address or by service name.

**Steps to reproduce the issue:**
1. Create the network:
```
docker network create --driver overlay --subnet 10.0.150.0/24 prod-nw
```

2. Create 4 backend services and 1 frontend service, which runs in global mode. Note that each container has at least 1 publish setting (I omitted the fluent logger settings for simplicity):
```
docker service create --name backend-1 --replicas 1 --with-registry-auth --network prod-nw --publish 8200:8082 $(backend-1-image)
docker service create --name backend-2 --replicas 1 --with-registry-auth --network prod-nw --publish 8201:8082 $(backend-2-image)
docker service create --name backend-3 --replicas 1 --with-registry-auth --network prod-nw --publish 8100:8082 $(backend-3-image) 
docker service create --name backend-4 --replicas 1 --with-registry-auth --network prod-nw --publish 8101:8082 $(backend-4-image)
docker service create --name frontend --mode global --publish mode=host,published=50051,target=50051 --with-registry-auth --network prod-nw --publish mode=host,published=8082,target=8082 $(frontend-image)
```

3. After restarting the nodes, try to connect to the other services via the DNS entry/VIP.

**Describe the results you received:**
- Each container had the following IP on prod-nw:
```
backend-1: 10.0.150.30
backend-2: 10.0.150.12
backend-3: 10.0.150.32
backend-4: 10.0.150.17
frontend-1: 10.0.150.15
frontend-2: 10.0.150.4
frontend-3: 10.0.150.9
```
- Most connectivity works, except for:
```
frontend-1 <-> backend-4
frontend-2 <-> backend-2
backend-2 -> frontend-3 (weird, because the connection from frontend-3 to backend-2 appears to be established)
``` 

- Where connectivity is lost, connecting by direct IP also fails, with the following errors:
  - No route to host at 10.0.150.12 (backend-2 -> frontend-3)
  ```
  $ telnet 10.0.150.9 50051
  Trying 10.0.150.9...
  telnet: Unable to connect to remote host: No route to host
  $ netstat -an | grep ESTABLISHED # report connection established
  tcp        0      0 10.0.150.12:50051       10.0.150.9:53242        ESTABLISHED
  tcp        0      0 10.0.150.12:50051       10.0.150.15:55472       ESTABLISHED
  ```
  - Connection timed out at 10.0.150.17 (backend-4 -> frontend-1)
  ```
  $ telnet 10.0.150.15 50051
  Trying 10.0.150.15...
  telnet: Unable to connect to remote host: Connection timed out
  ```
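
For anyone trying to reproduce this, the two failure modes above (an immediate "No route to host" versus a timeout) can be told apart without telnet. The following is only a minimal sketch, assuming `bash` (for its `/dev/tcp` redirection) and the coreutils `timeout` command exist inside the container image; the function name `probe` and the 3-second limit are my own choices, not anything from the setup above.

```bash
#!/usr/bin/env bash
# probe HOST PORT: try to open a TCP connection and classify the outcome.
# Assumptions: bash with /dev/tcp support and coreutils `timeout` are present.
probe() {
  local host=$1 port=$2 err rc
  # Open the connection in a child bash so a failure cannot affect this shell.
  err=$(timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>&1)
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "$host:$port ok"
  elif [ "$rc" -eq 124 ]; then
    # `timeout` exits with 124 when it had to kill the command.
    echo "$host:$port timed out"
  else
    # Keep only the text after the last ": ", e.g. "No route to host".
    echo "$host:$port failed: ${err##*: }"
  fi
}

# Example, run from inside a backend container (frontend IPs from this report):
#   for ip in 10.0.150.15 10.0.150.4 10.0.150.9; do probe "$ip" 50051; done
```

Running the same loop in both directions (frontend to backend and back) should make asymmetric cases like backend-2 -> frontend-3 easier to spot.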

**Describe the results you expected:**
I expected to be able to connect to the service using the VIP created for it, with traffic routed accordingly.

**Additional information you deem important (e.g. issue happens only occasionally):**
This is similar to #26106, but with a few differences, so it was suggested that I open a new issue:
- I am using AWS Docker instances created by docker-machine (Ubuntu 16.04 LTS)
- I do not explicitly set userns-remap (I'm not sure whether it is set implicitly)
- Connecting fails not only by container name but also by direct IP (No route to host)
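
On the userns-remap question: as far as I know, a daemon running with user namespace remapping lists `userns` among its security options, so the setting can be checked rather than guessed. This is a hedged sketch only; the helper name `userns_status` is hypothetical, and it simply looks for the word `userns` in whatever text it is given on stdin.

```bash
#!/usr/bin/env bash
# userns_status: read a daemon's security options on stdin and report
# whether user namespace remapping appears to be active.
# Assumption: an enabled daemon includes the word "userns" in that list.
userns_status() {
  if grep -qw 'userns'; then
    echo "userns-remap: enabled"
  else
    echo "userns-remap: not enabled"
  fi
}

# On each node (needs access to the Docker daemon):
#   docker info --format '{{.SecurityOptions}}' | userns_status
```

The `docker info` output in this report shows only apparmor and seccomp under Security Options, which would be consistent with remapping being off on that node, though the other two nodes would need the same check.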

**Output of `docker version`:**

```
Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:14:09 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:10:54 2017
 OS/Arch:      linux/amd64
 Experimental: false
```

**Output of `docker info`:**

```
Containers: 55
 Running: 7
 Paused: 0
 Stopped: 48
Images: 74
Server Version: 17.05.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 281
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: kcpanuat85bztrvktep186fg8
 Is Manager: true
 ClusterID: qclswzn5foalbgmlkhh2e95i6
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 172.32.11.239
 Manager Addresses:
  172.32.11.239:2377
  172.32.11.40:2377
  172.32.2.28:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-79-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67GiB
Name: swarm-master
ID: YAOV:4AKS:YOJL:GKDF:HHTV:XW24:ZMOI:M7HU:7T2Q:E5PZ:5KW4:45FI
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 provider=amazonec2
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
```

**Additional environment details (AWS, VirtualBox, physical, etc.):**
AWS, 3-node swarm, 3 managers, each node created by docker-machine (ami-87b917e4)

Swarm overlay network does not route IP addresses (without userns-remap) after restarting nodes #34165