[Cluster] Error 503 handling with ROUND_ROBIN #710

Closed
@orgrimarr

Environment

  • ArangoJS: 7.2.0
  • ArangoDB 3.7.5 Cluster
    • Tested on docker
    • Tested on windows

Description

When an ArangoDB instance is in maintenance mode or is still starting up, it returns a 503 error.

arangojs already handles this error, but with ROUND_ROBIN it should try another node even if ArangoDB does not respond with LEADER_ENDPOINT_HEADER.

Currently, arangojs throws an error instead of retrying.
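
To make the expected behavior concrete, here is a toy model, not arangojs code. The function name `requestRoundRobin` and the fake hosts (plain functions returning a status code) are assumptions made up for illustration. It only shows the idea of moving on to the next host when the current one answers 503.

```javascript
// Toy model of the expected behavior; this is NOT arangojs code.
// `hosts` is an array of functions, each returning a fake HTTP status code.
function requestRoundRobin(hosts, startIndex, maxRetries) {
  let index = startIndex;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const status = hosts[index]();
    if (status !== 503) {
      return { status, host: index, attempts: attempt + 1 };
    }
    // Host is starting up or in maintenance mode: move on to the next one.
    index = (index + 1) % hosts.length;
  }
  // Every attempt returned 503: give up and surface the error.
  return { status: 503, host: null, attempts: maxRetries + 1 };
}

// First host is unavailable, the other two are healthy.
const hosts = [() => 503, () => 200, () => 200];
const outcome = requestRoundRobin(hosts, 0, 3);
console.log(outcome); // { status: 200, host: 1, attempts: 2 }
```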

Steps to reproduce

Coordinators run on ports 9001, 9002, and 9003.

Script

const arangojs = require('./build')

const test = async function(){
    const db = new arangojs.Database({
        url: ['https://127.0.0.1:9001', 'https://127.0.0.1:9002', 'https://127.0.0.1:9003'],
        maxRetries: 3,
        databaseName: '_system',
        loadBalancingStrategy: "ROUND_ROBIN",
        agentOptions: {
            rejectUnauthorized: false
        }
    })

    db.useBasicAuth('root', '')


    while(true){
        console.time('test')
        const cursor = await db.query({
            query: `
                FOR element IN @@collection
                RETURN element
            `,
            bindVars: {
                "@collection": '_users'
            }
        })
        const results = []
        for await (const result of cursor) {
            results.push(result)
        }
        console.timeEnd('test')
        console.log('OK', results.length)
    }
}

test()
.catch(console.error)

Cluster

  • node1: /usr/bin/arangodb --ssl.auto-key
  • node2: /usr/bin/arangodb --ssl.auto-key --starter.join=db1
  • node3: /usr/bin/arangodb --ssl.auto-key --starter.join=db1

docker-compose.yaml

version: '3.7'
services:
  db1:
    image: arangodb:3.7.5
    container_name: db1
    command: /usr/bin/arangodb --ssl.auto-key
    environment:
      ARANGO_ROOT_PASSWORD:
    ports:
      - 9001:8529
    volumes:
      - ./db1:/data
  db2:
    image: arangodb:3.7.5
    container_name: db2
    command: /usr/bin/arangodb --ssl.auto-key --starter.join=db1
    environment:
      ARANGO_ROOT_PASSWORD:
    ports:
      - 9002:8529
    volumes:
      - ./db2:/data
  db3:
    image: arangodb:3.7.5
    container_name: db3
    command: /usr/bin/arangodb --ssl.auto-key --starter.join=db1
    environment:
      ARANGO_ROOT_PASSWORD:
    ports:
      - 9003:8529
    volumes:
      - ./db3:/data

Steps

  • Run the cluster
  • Run the script
  • Kill an arango instance
  • If the script is still running, restart the failed node

Error

    body: {
      error: true,
      errorNum: 503,
      errorMessage: 'service unavailable due to startup or maintenance mode',
      code: 503
    },
    arangojsHostId: 2,
    [Symbol(kCapture)]: false
  },
  errorNum: 503,
  code: 503

Proposition

  • Apply the retry strategy to 503 responses that do not include LEADER_ENDPOINT_HEADER
  • connection.ts, line 61X

Example:

      } else {
        const response = res!;
        if (response.statusCode === 503) {
          if (response.headers[LEADER_ENDPOINT_HEADER]) {
            // The server named a leader: re-route the task to that endpoint.
            const url = response.headers[LEADER_ENDPOINT_HEADER]!;
            const [index] = this.addToHostList(url);
            task.host = index;
            if (this._activeHost === host) {
              this._activeHost = index;
            }
            this._queue.push(task);
          } else if (
            !task.host &&
            this._shouldRetry &&
            task.retries < (this._maxRetries || this._hosts.length - 1)
          ) {
            // No leader header: requeue the task so it can be retried.
            task.retries += 1;
            this._queue.push(task);
          } else {
            response.arangojsHostId = host;
            task.resolve(response);
          }
        } else {
          response.arangojsHostId = host;
          task.resolve(response);
        }
      }
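
The three branches of the proposal can be checked in isolation with a small standalone function. This is only a sketch: `decideOn503` and its `config` object are hypothetical, and the header value `"x-arango-endpoint"` is an assumption about what `LEADER_ENDPOINT_HEADER` resolves to. It also tests `task.host !== undefined` rather than `!task.host`, so a task pinned to host index 0 counts as pinned. That is a choice made here, not something stated in the proposal.

```javascript
// Standalone sketch of the proposed decision; NOT the library's code.
// Assumption: the header name behind LEADER_ENDPOINT_HEADER.
const LEADER_ENDPOINT_HEADER = "x-arango-endpoint";

// Returns "redirect", "retry" or "resolve" for a given response and task.
// `config` is a hypothetical stand-in for the connection's private fields.
function decideOn503(response, task, config) {
  if (response.statusCode !== 503) return "resolve";
  // The server told us where the leader is: follow it.
  if (response.headers[LEADER_ENDPOINT_HEADER]) return "redirect";
  const retryLimit = config.maxRetries || config.hostCount - 1;
  // Deviation from the snippet above: host index 0 counts as pinned.
  const pinned = task.host !== undefined;
  if (!pinned && config.shouldRetry && task.retries < retryLimit) {
    return "retry";
  }
  // Pinned task, retries disabled, or retry budget used up.
  return "resolve";
}

const config = { maxRetries: 3, hostCount: 3, shouldRetry: true };
const unavailable = { statusCode: 503, headers: {} };

console.log(decideOn503(unavailable, { retries: 0 }, config)); // retry
console.log(decideOn503(unavailable, { retries: 3 }, config)); // resolve
```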

Labels

Bug: A code defect that needs to be fixed.
