Local consul agent cannot reach consul server via RPC on a Swarm #503

@lucj

I'm trying ContainerPilot on a Swarm, and one service reports an error because its local Consul agent cannot connect to the Consul server.

The Compose file I'm using:


version: '3.3'
services:
  consul:
    image: consul:0.9.2
    command: agent -server -client=0.0.0.0 -bootstrap -ui -bind '{{ GetInterfaceIP "eth0"  }}'
    dns:
      - 127.0.0.1
    networks:
      - appnet
    ports:
      - "8500:8500"
  api:
    image: myorg/api
    command: ["containerpilot"]
    networks:
      - appnet
  db:
    image: autopilotpattern/mongodb
    networks:
      - appnet
    volumes:
      - mongo-data:/data/db
volumes:
  mongo-data:
networks:
  appnet:

The db service, based on the autopilotpattern/mongodb image, registers correctly in Consul, but the api service does not.

In the api service's manage.sh file, I added a sed command that sets the bind_addr of the Consul agent:


#!/bin/sh

event=$1
echo "Received event:[$event]"

if [ "$event" = "prestart" ];then

  # Update the Consul '-advertise' address to use the interface ContainerPilot was told to listen on
  echo "IP set for current API container: ${CONTAINERPILOT_API_IP}"
  sed -i "s/IP_ADDRESS/${CONTAINERPILOT_API_IP}/" /config/consul.json

  # Wait for the db to be available
  while [ -z "$(curl -s http://localhost:8500/v1/health/service/mongodb-replicaset | grep passing)" ]
  do
    echo "db is not yet healthly..."
    sleep 5
  done
  echo "db is healthly, moving on..."
  exit 0
fi

# If db not accessible anymore, restart the api service
if [ "$event" = "db-change" ];then
  pkill -SIGHUP node
fi
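
The substitution step in the prestart branch can be isolated as a small helper, which makes it easier to check which address ends up in the config. This is only a sketch: `set_bind_addr` is a hypothetical name, and the assumption that /config/consul.json contains a literal `IP_ADDRESS` placeholder (for example as the value of `bind_addr`) is inferred from the sed command above, not from the actual file.

```shell
# Hypothetical helper: replace the IP_ADDRESS placeholder in a Consul config file.
# Usage: set_bind_addr <address> <config-file>
set_bind_addr() {
  addr="$1"
  file="$2"
  # Refuse an empty address so the placeholder is never replaced with nothing.
  if [ -z "$addr" ]; then
    echo "set_bind_addr: no address given" >&2
    return 1
  fi
  # Same substitution as in manage.sh, applied to the given file.
  sed -i "s/IP_ADDRESS/${addr}/" "$file"
}

# Example call, mirroring the prestart branch (assumed config path):
#   set_bind_addr "${CONTAINERPILOT_API_IP}" /config/consul.json
```

Running `grep bind_addr /config/consul.json` after the call shows the value the agent will actually bind to.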

But the api service logs show the following errors:


2017/09/01 17:05:26 [ERR] consul: RPC failed to server 10.0.0.7:8300: rpc error: failed to get conn: dial tcp 10.0.0.2:0->10.0.0.7:8300: i/o timeout
2017-09-01T17:05:26.992751673Z     2017/09/01 17:05:26 [ERR] http: Request GET /v1/health/service/mongodb-replicaset?passing=1, error: rpc error: failed to get conn: dial tcp 10.0.0.2:0->10.0.0.7:8300: i/o timeout from=127.0.0.1:52830
2017-09-01T17:05:26.993280146Z failed to query mongodb-replicaset: Unexpected response code: 500 (rpc error: failed to get conn: dial tcp 10.0.0.2:0->10.0.0.7:8300: i/o timeout) []
2017-09-01T17:05:26.993973624Z     2017/09/01 17:05:26 [ERR] consul: RPC failed to server 10.0.0.7:8300: rpc error: failed to get conn: rpc error: lead thread didn't get connection
2017-09-01T17:05:26.994033652Z     2017/09/01 17:05:26 [ERR] agent: failed to sync changes: rpc error: failed to get conn: rpc error: lead thread didn't get connection
2017-09-01T17:05:26.994062418Z     2017/09/01 17:05:26 [ERR] consul: RPC failed to server 10.0.0.7:8300: rpc error: failed to get conn: rpc error: lead thread didn't get connection
2017-09-01T17:05:26.994080896Z     2017/09/01 17:05:26 [ERR] consul: RPC failed to server 10.0.0.7:8300: rpc error: failed to get conn: rpc error: lead thread didn't get connection
2017-09-01T17:05:26.99410307Z     2017/09/01 17:05:26 [ERR] agent: Coordinate update error: rpc error: failed to get conn: rpc error: lead thread didn't get connection
2017-09-01T17:05:26.994125344Z     2017/09/01 17:05:26 [ERR] http: Request GET /v1/health/service/mongodb-replicaset, error: rpc error: failed to get conn: rpc error: lead thread didn't get connection from=127.0.0.1:52942
2017-09-01T17:05:26.996276493Z db is not yet healthly...

The value set for the CONTAINERPILOT_API_IP env var is 10.0.0.2.


/app # cat /proc/15/environ | tr \\0 "\n"
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=75448ab9cf09
VERSION=v6.9.4
NPM_VERSION=3
LAST_UPDATED=20170515T152500
CONTAINERPILOT_VER=3.3.0
CONTAINERPILOT=/etc/containerpilot.json5
HOME=/root
CONTAINERPILOT_PID=10
CONTAINERPILOT_API_IP=10.0.0.2
CONTAINERPILOT_CONTAINERPILOT_IP=10.0.0.2

If I check the interfaces of the api service, I see two IPs on eth0: one for the container and one for the service. Could the wrong IP be used in this case?


259: eth0@if260:  mtu 1450 qdisc noqueue state UP
    link/ether 02:42:0a:00:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.3/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.0.0.2/32 scope global eth0
       valid_lft forever preferred_lft forever
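
To test the hypothesis that the /32 address (10.0.0.2) is the service address and the /24 address (10.0.0.3) is the container's own, a possible approach is to select the non-/32 address explicitly and compare it with CONTAINERPILOT_API_IP. The sketch below is an assumption-laden illustration, not a confirmed fix: `pick_container_ip` is a hypothetical helper, it assumes the `ip` binary in the image supports the `-o` (one-line) option, and it assumes the service address always carries a /32 prefix.

```shell
# Hypothetical helper: print the first IPv4 address whose prefix is not /32.
# Reads the output of `ip -o -4 addr show <iface>` on stdin, where field 3
# is "inet" and field 4 is the address in CIDR form (e.g. 10.0.0.3/24).
pick_container_ip() {
  awk '$3 == "inet" && $4 !~ /\/32$/ { split($4, a, "/"); print a[1]; exit }'
}

# Usage inside the api container (assumed interface name eth0):
#   ip -o -4 addr show eth0 | pick_container_ip
```

If the printed address differs from CONTAINERPILOT_API_IP, the agent is being bound to the other address, which would be consistent with the `10.0.0.2:0->10.0.0.7:8300` timeouts in the logs.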
