Consider leveraging docker healthchecks #85

erikengervall · 2019-09-20T15:37:28Z

Background

Docker supports container healthchecking in various ways, one of which is via docker-compose.

The container's healthcheck periodically executes the test-command provided to establish whether or not the container is healthy.

The current state is retrievable either via docker inspect <containerId> on a State object:

{
  State: {
    Status: 'running',
    Running: true,
    Paused: false,
    Restarting: false,
    OOMKilled: false,
    Dead: false,
    Pid: ...,
    ExitCode: 0,
    Error: '',
    StartedAt: '...',
    FinishedAt: '...',
    Health: {
      Status: 'unhealthy',
      FailingStreak: 1 ... n,
      Log: [
        {
          Start: '...',
          End: '...',
          ExitCode: -1,
          Output:
            '...',
        },
      ],
    },
  },
}

...or using docker events

Pro's & Con's

Pro's

Builtin functionality
Easy to build your own healthcheck in a compose file

Con's

Would require polling the container's State using docker inspect
Harder to incorporate via code, compared to a healthcheck executed outside of the container
test-commands would execute periodically even when the container has been established as healthy

The text was updated successfully, but these errors were encountered:

n1ru4l · 2019-09-23T07:49:29Z

Would require polling the container's State using docker inspect

It should be possible to use docker events with stdout, so no polling is required.

Harder to incorporate via code, compared to a health check executed outside of the container

An approach like the example in this comment could abstract this, in a way that allows users to decide whether they want to use a docker health check as a readiness check or not.

// readiness check passes once health status is succeeding for the first time
const readinessCheck = new DockerHealthStatusReadinessCheck()

// "legacy" implementation extracted from current `PostgresRunner`
const readinessCheck = new RedisReadinessCheck({ port: 1234, password: 'foo' })

// custom readiness implementation
const readinessCheck = new CustomReadinessCheck({
  timeout: 10000,
  command: containerId => `docker exec ${containerId} my-fancy-command`
})

n1ru4l · 2019-10-09T08:22:08Z

So here is some more input:

I am currently trying to use dockest for implementing tests for a multi-container application (multiple redis, PostgreSQL, python and nodejs containers). On my local machine everything goes fine, however, there are some issues on the actual CI server:

I noticed that the current readiness checks are quite error-prone, they can timeout fast (because CI servers can be slower than your local machine). If you increase the timeout and the container dies before booting up, the readiness check will go on for a while and fail (because the container already died 😅). Ideally, the readiness check should automatically fail in case the container dies.

I think it is pretty crucial to leverage docker events. A flow could look like this:

Wait for start/attach event
Do the following in parallel:
2.1 Run the readiness check
2.2 Listen for die/stop event -> automatically fail/cancel readiness check

In case there is no readiness check specified, e.g. I have a container that is only started for creating s3 buckets for minio that will exit after creating the buckets, we could only listen for the start/attach event.

n1ru4l · 2019-10-11T12:54:09Z

My current idea is the following:

spawn docker-compose events child process and wrap it with an EventEmitter
create a ReplaySubject stream for each service (it ensures that new subscribers receive the events prior to the subscription. The observable is then assigned to the runner instance/passed as an argument into the health check start method). TBH I have not intensively used observables before, however, I could not find any other drop-in "data-structure" that solves this problem. Please let me know, in case someone has alternative recommendations!

The interface of a readiness check could look like this:

import { Runner } from '../runners/@types'
import { ServiceDockerEventStream } from '../onRun/createServiceDockerEventStream'

export enum ReadinessCheckResult {
  TIMEOUT = 'TIMEOUT',
  SUCCESS = 'SUCCESS',
  CANCEL = 'CANCEL',
}

export interface ReadinessCheck {
  start(runner: Runner, eventStream$: ServiceDockerEventStream): Promise<ReadinessCheckResult>
  cancel(): void
}

a generic command readiness check could look like this:

import sleep from '../../utils/sleep'
import execaWrapper from '../../utils/execaWrapper'
import { Runner } from '../../runners/@types'

export class ShellCommandReadinessCheck implements ReadinessCheck {
  private command: string
  private runInsideContainer: boolean
  private timeout: number
  private isAborted: boolean

  constructor({
    command,
    runInsideContainer = false,
    timeout = 30,
  }: {
    command: string
    runInsideContainer?: boolean
    timeout: number
  }) {
    this.command = command
    this.runInsideContainer = runInsideContainer
    this.timeout = timeout
    this.isAborted = false
  }

  private getCommand(containerId: string) {
    if (!this.runInsideContainer) {
      return this.command
    }
    return `docker exec ${containerId} ${this.command}`
  }

  public async start(runner: Runner) {
    let remainingTries = this.timeout
    while (remainingTries > 0) {
      if (this.isAborted) return ReadinessCheckResult.ABORT
      try {
        await execaWrapper(this.getCommand(runner.containerId), runner)
        return ReadinessCheckResult.SUCCESS
      } catch (err) {
        remainingTries--
        await sleep(1000)
      }
    }

    return ReadinessCheckResult.TIMEOUT
  }

  public cancel() {
    this.isAborted = true
  }
}

This would be a readiness check that waits until a healthy event is fired

import { race, timer, Subject } from 'rxjs'
import { mapTo, first, take, tap, takeUntil } from 'rxjs/operators'

import { ReadinessCheck, ReadinessCheckResult } from './@types'
import { Runner } from '../runners/@types'
import { ServiceDockerEventStream } from '../onRun/createServiceDockerEventStream'

export class ContainerIsHealthyReadinessCheck implements ReadinessCheck {
  private timeout: number

  constructor({ timeout = 500 }: { timeout?: number }) {
    this.timeout = timeout
  }

  public start(_runner: Runner, eventStream$: ServiceDockerEventStream) {
    const stop$ = new Subject()

    const healthyEventEmitted$ = eventStream$.pipe(
      takeUntil(stop$),
      first(event => event.action === 'health_status' && event.attributes.healthStatus === 'healthy'),
      mapTo(ReadinessCheckResult.SUCCESS),
    )

    const timeout$ = timer(this.timeout).pipe(
      takeUntil(stop$),
      mapTo(ReadinessCheckResult.TIMEOUT),
    )

    const cancelSubject = new Subject()
    this.cancel = () => {
      cancelSubject.next()
      cancelSubject.complete()
    }
    const cancel$ = cancelSubject.pipe(mapTo(ReadinessCheckResult.CANCEL))

    return race(healthyEventEmitted$, timeout$, cancel$).pipe(
      take(1),
      tap({
        complete: () => {
          stop$.next()
          stop$.complete()
        },
      }),
    ).toPromise()
  }

  public cancel() {
    throw new Error('Cannot cancel before started.')
  }
}

Adds a docker event bus that can be used for improving the dockest experience. It is currently used in the following ways: Wait for container start before trying to resolve the container id (which solves possible timeouts due to first downloading the image) and stop resolving the container id in case the container unexpectedly dies. This results in less flakiness. In the future the readiness check API could be based upon docker events (see erikengervall#85 (comment)).

* feat: use docker events for better reliability. Adds a docker event bus that can be used for improving the dockest experience. It is currently used in the following ways: Wait for container start before trying to resolve the container id (which solves possible timeouts due to first downloading the image) and stop resolving the container id in case the container unexpectedly dies. This results in less flakiness. In the future the readiness check API could be based upon docker events (see #85 (comment)). * update snapshots * implicitly return undefined right away * resolve container id via docker events * abort check connection in case the container dies * abort healthcheck in case container unexpectedly dies. * add healthcheck for checking whether the health_status event is emitted. * apply review feedback * add more tests Co-authored-by: Erik Engervall <erik.engervall@gmail.com>

erikengervall added the enhancement New feature or request label Sep 24, 2019

n1ru4l mentioned this issue Oct 9, 2019

Prepare repository for 2.0.0 alpha #93

Closed

2 tasks

n1ru4l mentioned this issue Oct 28, 2019

feat: collect docker logs (and docker events) #98

Closed

n1ru4l mentioned this issue Feb 3, 2020

feat: use docker events for faster test feedback #136

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider leveraging docker healthchecks #85

Consider leveraging docker healthchecks #85

erikengervall commented Sep 20, 2019 •

edited

Loading

n1ru4l commented Sep 23, 2019

n1ru4l commented Oct 9, 2019 •

edited

Loading

n1ru4l commented Oct 11, 2019 •

edited

Loading

Consider leveraging docker healthchecks #85

Consider leveraging docker healthchecks #85

Comments

erikengervall commented Sep 20, 2019 • edited Loading

Background

Pro's & Con's

Pro's

Con's

n1ru4l commented Sep 23, 2019

n1ru4l commented Oct 9, 2019 • edited Loading

n1ru4l commented Oct 11, 2019 • edited Loading

erikengervall commented Sep 20, 2019 •

edited

Loading

n1ru4l commented Oct 9, 2019 •

edited

Loading

n1ru4l commented Oct 11, 2019 •

edited

Loading