Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider leveraging docker healthchecks #85

Open
erikengervall opened this issue Sep 20, 2019 · 3 comments
Open

Consider leveraging docker healthchecks #85

erikengervall opened this issue Sep 20, 2019 · 3 comments
Labels
enhancement New feature or request

Comments

@erikengervall
Copy link
Owner

erikengervall commented Sep 20, 2019

Background

Docker supports container healthchecking in various ways, one of which is via docker-compose.

The container's healthcheck periodically executes the test-command provided to establish whether or not the container is healthy.

The current state is retrievable either via docker inspect <containerId> on a State object:

{
  State: {
    Status: 'running',
    Running: true,
    Paused: false,
    Restarting: false,
    OOMKilled: false,
    Dead: false,
    Pid: ...,
    ExitCode: 0,
    Error: '',
    StartedAt: '...',
    FinishedAt: '...',
    Health: {
      Status: 'unhealthy',
      FailingStreak: 1 ... n,
      Log: [
        {
          Start: '...',
          End: '...',
          ExitCode: -1,
          Output:
            '...',
        },
      ],
    },
  },
}

...or using docker events

Pro's & Con's

Pro's

  • Builtin functionality
  • Easy to build your own healthcheck in a compose file

Con's

  • Would require polling the container's State using docker inspect
  • Harder to incorporate via code, compared to a healthcheck executed outside of the container
  • test-commands would execute periodically even when the container has been established as healthy
@n1ru4l
Copy link
Collaborator

n1ru4l commented Sep 23, 2019

Would require polling the container's State using docker inspect

It should be possible to use docker events with stdout, so no polling is required.

Harder to incorporate via code, compared to a health check executed outside of the container

An approach like the example in this comment could abstract this, in a way that allows users to decide whether they want to use a docker health check as a readiness check or not.

// readiness check passes once health status is succeeding for the first time
const readinessCheck = new DockerHealthStatusReadinessCheck()

// "legacy" implementation extracted from current `PostgresRunner`
const readinessCheck = new RedisReadinessCheck({ port: 1234, password: 'foo' })

// custom readiness implementation
const readinessCheck = new CustomReadinessCheck({
  timeout: 10000,
  command: containerId => `docker exec ${containerId} my-fancy-command`
})

@erikengervall erikengervall added the enhancement New feature or request label Sep 24, 2019
@n1ru4l
Copy link
Collaborator

n1ru4l commented Oct 9, 2019

So here is some more input:

I am currently trying to use dockest for implementing tests for a multi-container application (multiple redis, PostgreSQL, python and nodejs containers). On my local machine everything goes fine, however, there are some issues on the actual CI server:

I noticed that the current readiness checks are quite error-prone, they can timeout fast (because CI servers can be slower than your local machine). If you increase the timeout and the container dies before booting up, the readiness check will go on for a while and fail (because the container already died 😅). Ideally, the readiness check should automatically fail in case the container dies.

I think it is pretty crucial to leverage docker events. A flow could look like this:

  1. Wait for start/attach event
  2. Do the following in parallel:
    2.1 Run the readiness check
    2.2 Listen for die/stop event -> automatically fail/cancel readiness check

In case there is no readiness check specified, e.g. I have a container that is only started for creating s3 buckets for minio that will exit after creating the buckets, we could only listen for the start/attach event.

@n1ru4l
Copy link
Collaborator

n1ru4l commented Oct 11, 2019

My current idea is the following:

  1. spawn docker-compose events child process and wrap it with an EventEmitter
  2. create a ReplaySubject stream for each service (it ensures that new subscribers receive the events prior to the subscription. The observable is then assigned to the runner instance/passed as an argument into the health check start method). TBH I have not intensively used observables before, however, I could not find any other drop-in "data-structure" that solves this problem. Please let me know, in case someone has alternative recommendations!

The interface of a readiness check could look like this:

import { Runner } from '../runners/@types'
import { ServiceDockerEventStream } from '../onRun/createServiceDockerEventStream'

export enum ReadinessCheckResult {
  TIMEOUT = 'TIMEOUT',
  SUCCESS = 'SUCCESS',
  CANCEL = 'CANCEL',
}

export interface ReadinessCheck {
  start(runner: Runner, eventStream$: ServiceDockerEventStream): Promise<ReadinessCheckResult>
  cancel(): void
}

a generic command readiness check could look like this:

import sleep from '../../utils/sleep'
import execaWrapper from '../../utils/execaWrapper'
import { Runner } from '../../runners/@types'

export class ShellCommandReadinessCheck implements ReadinessCheck {
  private command: string
  private runInsideContainer: boolean
  private timeout: number
  private isAborted: boolean

  constructor({
    command,
    runInsideContainer = false,
    timeout = 30,
  }: {
    command: string
    runInsideContainer?: boolean
    timeout: number
  }) {
    this.command = command
    this.runInsideContainer = runInsideContainer
    this.timeout = timeout
    this.isAborted = false
  }

  private getCommand(containerId: string) {
    if (!this.runInsideContainer) {
      return this.command
    }
    return `docker exec ${containerId} ${this.command}`
  }

  public async start(runner: Runner) {
    let remainingTries = this.timeout
    while (remainingTries > 0) {
      if (this.isAborted) return ReadinessCheckResult.ABORT
      try {
        await execaWrapper(this.getCommand(runner.containerId), runner)
        return ReadinessCheckResult.SUCCESS
      } catch (err) {
        remainingTries--
        await sleep(1000)
      }
    }

    return ReadinessCheckResult.TIMEOUT
  }

  public cancel() {
    this.isAborted = true
  }
}

This would be a readiness check that waits until a healthy event is fired

import { race, timer, Subject } from 'rxjs'
import { mapTo, first, take, tap, takeUntil } from 'rxjs/operators'

import { ReadinessCheck, ReadinessCheckResult } from './@types'
import { Runner } from '../runners/@types'
import { ServiceDockerEventStream } from '../onRun/createServiceDockerEventStream'

export class ContainerIsHealthyReadinessCheck implements ReadinessCheck {
  private timeout: number

  constructor({ timeout = 500 }: { timeout?: number }) {
    this.timeout = timeout
  }

  public start(_runner: Runner, eventStream$: ServiceDockerEventStream) {
    const stop$ = new Subject()

    const healthyEventEmitted$ = eventStream$.pipe(
      takeUntil(stop$),
      first(event => event.action === 'health_status' && event.attributes.healthStatus === 'healthy'),
      mapTo(ReadinessCheckResult.SUCCESS),
    )

    const timeout$ = timer(this.timeout).pipe(
      takeUntil(stop$),
      mapTo(ReadinessCheckResult.TIMEOUT),
    )

    const cancelSubject = new Subject()
    this.cancel = () => {
      cancelSubject.next()
      cancelSubject.complete()
    }
    const cancel$ = cancelSubject.pipe(mapTo(ReadinessCheckResult.CANCEL))

    return race(healthyEventEmitted$, timeout$, cancel$).pipe(
      take(1),
      tap({
        complete: () => {
          stop$.next()
          stop$.complete()
        },
      }),
    ).toPromise()
  }

  public cancel() {
    throw new Error('Cannot cancel before started.')
  }
}

n1ru4l added a commit to n1ru4l/dockest that referenced this issue Feb 3, 2020
Adds a docker event bus that can be used for improving the dockest experience. It is currently used in the following ways: Wait for container start before trying to resolve the container id (which solves possible timeouts due to first downloading the image) and stop resolving the container id in case the container unexpectedly dies. This results in less flakiness. In the future the readiness check API could be based upon docker events (see erikengervall#85 (comment)).
n1ru4l added a commit to n1ru4l/dockest that referenced this issue Feb 3, 2020
Adds a docker event bus that can be used for improving the dockest experience. It is currently used in the following ways: Wait for container start before trying to resolve the container id (which solves possible timeouts due to first downloading the image) and stop resolving the container id in case the container unexpectedly dies. This results in less flakiness. In the future the readiness check API could be based upon docker events (see erikengervall#85 (comment)).
n1ru4l added a commit to n1ru4l/dockest that referenced this issue Feb 3, 2020
Adds a docker event bus that can be used for improving the dockest experience. It is currently used in the following ways: Wait for container start before trying to resolve the container id (which solves possible timeouts due to first downloading the image) and stop resolving the container id in case the container unexpectedly dies. This results in less flakiness. In the future the readiness check API could be based upon docker events (see erikengervall#85 (comment)).
n1ru4l added a commit to n1ru4l/dockest that referenced this issue Feb 3, 2020
Adds a docker event bus that can be used for improving the dockest experience. It is currently used in the following ways: Wait for container start before trying to resolve the container id (which solves possible timeouts due to first downloading the image) and stop resolving the container id in case the container unexpectedly dies. This results in less flakiness. In the future the readiness check API could be based upon docker events (see erikengervall#85 (comment)).
n1ru4l added a commit to n1ru4l/dockest that referenced this issue Feb 3, 2020
Adds a docker event bus that can be used for improving the dockest experience. It is currently used in the following ways: Wait for container start before trying to resolve the container id (which solves possible timeouts due to first downloading the image) and stop resolving the container id in case the container unexpectedly dies. This results in less flakiness. In the future the readiness check API could be based upon docker events (see erikengervall#85 (comment)).
n1ru4l added a commit to n1ru4l/dockest that referenced this issue Feb 3, 2020
Adds a docker event bus that can be used for improving the dockest experience. It is currently used in the following ways: Wait for container start before trying to resolve the container id (which solves possible timeouts due to first downloading the image) and stop resolving the container id in case the container unexpectedly dies. This results in less flakiness. In the future the readiness check API could be based upon docker events (see erikengervall#85 (comment)).
n1ru4l added a commit to n1ru4l/dockest that referenced this issue Feb 3, 2020
Adds a docker event bus that can be used for improving the dockest experience. It is currently used in the following ways: Wait for container start before trying to resolve the container id (which solves possible timeouts due to first downloading the image) and stop resolving the container id in case the container unexpectedly dies. This results in less flakiness. In the future the readiness check API could be based upon docker events (see erikengervall#85 (comment)).
erikengervall added a commit that referenced this issue Feb 12, 2020
* feat: use docker events for better reliability.

Adds a docker event bus that can be used for improving the dockest experience. It is currently used in the following ways: Wait for container start before trying to resolve the container id (which solves possible timeouts due to first downloading the image) and stop resolving the container id in case the container unexpectedly dies. This results in less flakiness. In the future the readiness check API could be based upon docker events (see #85 (comment)).

* update snapshots

* implicitly return undefined right away

* resolve container id via docker events

* abort check connection in case the container dies

* abort healthcheck in case container unexpectedly dies.

* add healthcheck for checking whether the health_status event is emitted.

* apply review feedback

* add more tests

Co-authored-by: Erik Engervall <erik.engervall@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants