Refactor update based on review feedback
banks committed Mar 13, 2023
1 parent 00ca421 commit 2e3c67c
Showing 2 changed files with 22 additions and 18 deletions.
15 changes: 12 additions & 3 deletions website/content/docs/agent/wal-logstore/enable.mdx
@@ -7,7 +7,7 @@ description: >-

# Enable the experimental WAL LogStore backend

This topic describes how to safely configure and test the WAL backend in your Consul deployment.

The overall process for enabling the WAL LogStore backend for one server consists of the following steps. In production environments, we recommend starting by enabling the backend on a single server. If you eventually choose to expand the test to additional servers, you must repeat these steps for each one.

@@ -17,9 +17,9 @@ The overall process for enabling the WAL LogStore backend for one server consist
1. Remove the data directory from the target server.
1. Update the target server's configuration.
1. Start the target server.
1. Monitor the target server's Raft metrics and logs.
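The configuration step above could be scripted along the lines of the following sketch, which writes a JSON agent configuration fragment that selects the WAL backend. It assumes the `raft_logstore` block with `backend = "wal"` from the Consul 1.15 agent configuration reference; the `verification` settings, the `60s` interval, and the file name `wal-logstore.json` are assumptions to confirm against that reference for your version. The script only writes a file: stopping the server, removing its data directory, and restarting it remain separate steps.

```python
import json
import tempfile
from pathlib import Path


def write_wal_config(config_dir: Path, verification_interval: str = "60s") -> Path:
    """Write a Consul agent config fragment that selects the WAL LogStore backend.

    Assumptions: the file name is hypothetical, and the `verification` block
    should be checked against the agent configuration reference for your version.
    """
    config = {
        "raft_logstore": {
            "backend": "wal",
            "verification": {
                "enabled": True,
                "interval": verification_interval,
            },
        }
    }
    path = config_dir / "wal-logstore.json"
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path


if __name__ == "__main__":
    # A temporary directory stands in for the target server's -config-dir.
    with tempfile.TemporaryDirectory() as d:
        print(write_wal_config(Path(d)).read_text())
```

Writing a separate fragment keeps the WAL change isolated, so reverting the test server to BoltDB means deleting one file (plus the data directory reset the steps above already require).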

!> **Experimental feature:** The WAL LogStore backend is experimental and may contain bugs that could cause data loss. Follow this guide to manage risk during testing.

## Requirements

@@ -32,6 +32,15 @@ We recommend taking the following additional measures:
- Monitor Consul server metrics and logs, and set an alert on specific log events that occur when WAL is enabled. Refer to [Monitor Raft metrics and logs for WAL](/consul/docs/agent/wal-logstore/monitoring) for more information.
- Enable WAL in a pre-production environment and run it for several days before enabling it in production.

## Known issues

The following issues were discovered after the release of Consul 1.15.1 and will be
fixed in a future patch release.

* A follower that is disconnected may be unable to catch up if it is using the WAL backend.
* Restoring user snapshots can break replication to WAL-enabled followers.
* Restoring user snapshots can cause a WAL-enabled leader to panic.

## Risks

While their likelihood remains low to very low, be aware of the following risks before implementing the WAL backend:
25 changes: 10 additions & 15 deletions website/content/docs/agent/wal-logstore/index.mdx
@@ -7,19 +7,22 @@ description: >-

# Experimental WAL LogStore backend overview

This topic provides an overview of the WAL (write-ahead log) LogStore backend.
The WAL backend is an experimental feature. Refer to
[Requirements](/consul/docs/agent/wal-logstore/enable#requirements) for
supported environments and known issues.

!> **Experimental feature:** The WAL LogStore backend is experimental.

!> **Known Issues:** Consul 1.15.0 and 1.15.1 have [known issues](#known-issues), so do not enable WAL in production on these versions.
We do not recommend enabling the WAL backend in production without following
[our guide for safe
testing](/consul/docs/agent/wal-logstore/enable).
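If you automate rollouts, a pre-flight guard such as the following hypothetical helper can keep WAL off the versions this page calls out. It encodes only what this page states (WAL requires Consul 1.15 or later, and 1.15.0 and 1.15.1 have known issues). An empty result does not mean a later release is free of WAL issues, so keep the version set in sync with the release notes.

```python
# Hypothetical pre-flight check for automated rollouts. It encodes only what
# this page states; it does not mean later releases are free of WAL issues.
MIN_WAL_VERSION = (1, 15, 0)
KNOWN_ISSUE_VERSIONS = {(1, 15, 0), (1, 15, 1)}


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn 'v1.15.1' or '1.15.1+suffix' into (1, 15, 1)."""
    core = version.lstrip("v").split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def wal_rollout_blockers(version: str) -> list[str]:
    """Return reasons not to enable WAL in production on `version`."""
    v = parse_version(version)
    blockers = []
    if v < MIN_WAL_VERSION:
        blockers.append("WAL backend requires Consul 1.15 or later")
    if v in KNOWN_ISSUE_VERSIONS:
        blockers.append("known WAL issues in this release")
    return blockers


if __name__ == "__main__":
    for candidate in ["1.14.4", "1.15.1"]:
        print(candidate, wal_rollout_blockers(candidate))
```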

## WAL versus BoltDB

WAL implements a traditional log with rotating, append-only log files. WAL resolves many issues with the existing `LogStore` provided by the BoltDB backend. The BoltDB `LogStore` is a copy-on-write B+tree, which is not optimized for append-only, write-heavy workloads.
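To make the contrast concrete, here is a toy segmented log in Python. It illustrates the general technique only and is not Consul's on-disk WAL format: entries are appended to the newest segment file, a new file is started once a segment holds `entries_per_segment` entries, and truncating old entries deletes whole files instead of leaving free space behind. The class name, file naming, and record layout are hypothetical.

```python
import tempfile
from pathlib import Path


class SegmentedLog:
    """Toy append-only log stored as rotating segment files (not Consul's format)."""

    def __init__(self, directory: Path, entries_per_segment: int = 4) -> None:
        self.directory = directory
        self.entries_per_segment = entries_per_segment
        self.segments: list[tuple[int, Path]] = []  # (first index, file), oldest first
        self.next_index = 1
        self.tail_count = 0  # entries written to the newest segment

    def append(self, data: bytes) -> int:
        """Append one entry and return its log index."""
        if not self.segments or self.tail_count == self.entries_per_segment:
            # Rotate: open a new segment named after the first index it holds.
            path = self.directory / f"{self.next_index:020d}.seg"
            path.touch()
            self.segments.append((self.next_index, path))
            self.tail_count = 0
        _, tail = self.segments[-1]
        with tail.open("ab") as f:
            # Length-prefixed record so entries could be read back sequentially.
            f.write(len(data).to_bytes(4, "big") + data)
        self.tail_count += 1
        self.next_index += 1
        return self.next_index - 1

    def truncate_before(self, index: int) -> int:
        """Delete whole segments whose entries all precede `index`; return count."""
        removed = 0
        # Segment 0 is entirely below `index` when the next segment starts at or before it.
        while len(self.segments) > 1 and self.segments[1][0] <= index:
            _, path = self.segments.pop(0)
            path.unlink()
            removed += 1
        return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = SegmentedLog(Path(d), entries_per_segment=4)
        for i in range(10):
            log.append(f"entry-{i}".encode())
        print(len(list(Path(d).iterdir())))  # 3 segments: indexes 1-4, 5-8, 9-10
        log.truncate_before(9)  # e.g. after a snapshot covering indexes 1-8
        print(len(list(Path(d).iterdir())))  # 1 segment left
```

Truncation never touches the newest segment and never rewrites data, so no free-space bookkeeping is needed.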

### BoltDB storage scalability issues

The existing BoltDB log store inefficiently stores append-only logs to disk because it was designed as a full key-value database. It is a single file that only ever grows. Deleting the oldest logs, which Consul does regularly when it makes new snapshots of the state, leaves free space in the file. The free space must be tracked in a `freelist` so that BoltDB can reuse it on future writes. By contrast, a simple segmented log can delete the oldest log files from disk.

A burst of writes at double or triple the normal volume can suddenly cause the log file to grow to several times its steady-state size. After Consul takes the next snapshot and truncates the oldest logs, the resulting file is mostly empty space.
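The effect of a burst can be sketched with a simplified model that counts entries instead of bytes and ignores page and copy-on-write overhead. It assumes the BoltDB file grows to hold every retained entry before a snapshot and never shrinks, while a segmented log deletes whole segments that fall before the truncation point. The function name and all parameters are illustrative, not Consul defaults.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    bolt_file: int  # BoltDB-style file size, in entries (never shrinks)
    live: int       # entries still retained after the snapshot
    segmented: int  # entries in segment files that still exist


def simulate(writes_per_interval: list[int], trailing_logs: int,
             entries_per_segment: int) -> list[Interval]:
    first, last = 1, 0  # retained index range [first, last]
    bolt_file = 0
    out = []
    for writes in writes_per_interval:
        last += writes
        # The single file must hold every retained entry before the snapshot;
        # it grows to that size and keeps it, tracking freed pages on a freelist.
        bolt_file = max(bolt_file, last - first + 1)
        # Snapshot and truncate: keep only the newest `trailing_logs` entries.
        first = max(first, last - trailing_logs + 1)
        # A segmented log deletes only whole segments, so it keeps the segment
        # containing `first` plus everything after it.
        seg_start = (first - 1) // entries_per_segment * entries_per_segment + 1
        out.append(Interval(bolt_file, last - first + 1, last - seg_start + 1))
    return out


if __name__ == "__main__":
    # Steady 100 writes per snapshot interval, with one burst at triple volume.
    runs = simulate([100, 100, 300, 100, 100], trailing_logs=100, entries_per_segment=50)
    for i, r in enumerate(runs, start=1):
        print(f"interval {i}: bolt file={r.bolt_file} live={r.live} segmented={r.segmented}")
```

In this run the BoltDB-style file settles at 200 entries, jumps to 400 during the burst, and stays there while only 100 entries are live, so three quarters of the file is free space. The segmented log is back to 100 entries on disk right after the snapshot that follows the burst. Because segments are deleted whole, a segmented log can keep up to one partially truncated segment of old entries when boundaries do not line up.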

@@ -47,12 +50,4 @@ The WAL backend has been tested thoroughly during development:

We are aware of how complex and critical disk persistence is for your data.

We hope that many users at different scales will try WAL in their environments after upgrading to 1.15 or later and report success or failure so that we can confidently replace BoltDB as the default for new clusters in a future release.

## Known Issues

The following issues were discovered after release of Consul 1.15.1. Fixes should be available in the next patch release.

* A follower that is disconnected for a while may be unable to catch up if it is using the WAL backend.
* Restoring user snapshots can break replication to WAL-enabled followers.
* Restoring user snapshots can cause a WAL-enabled leader to panic.
