This repository was archived by the owner on Oct 17, 2022. It is now read-only.

Commit ecbd992

[RFC] Background index building in CouchDB 4 (#542)
This RFC describes how background index building works in CouchDB 4
1 parent cd4357f commit ecbd992

1 file changed: 131 additions, 0 deletions

---
name: Formal RFC
about: Submit a formal Request For Comments for consideration by the team.
title: 'Background index building'
labels: rfc, discussion
assignees: ''

---

# Introduction

This document describes the design of the background index builder in CouchDB 4.

## Abstract

The background index builder monitors databases for changes and then kicks off
asynchronous index updates. It is also responsible for removing stale indexing
data.

## Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in
[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).

---

# Detailed Description

The two main components of the background index builder are:

1) The notification mechanism
2) The index building behavior API and registration facility

The notification mechanism monitors databases for updates, while secondary
indexing applications register with the background indexer and provide an
implementation of the index building API.

## Database Update Notifications

After each document update transaction finishes, the background indexer is
notified via a callback. The indexer then bumps the timestamp for that database
in a set of sharded ETS tables. Each sharded ETS table has an associated
background process which periodically removes entries from it and calls the
index building API functions for each registered indexing backend.

In addition to building indices, the background index builder also cleans up
stale index data. This is index data left behind after design documents have
been updated or deleted and their view signatures have changed.

Background index building and cleanup may be enabled or disabled with
configuration options. There is also a configurable delay during which updates
accumulate for each database. This is used to avoid re-scheduling `couch_jobs`
jobs too often.
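
The notification flow above can be sketched in Erlang as follows. This is a hypothetical, simplified illustration, not the `fabric2_index` implementation: the module name `bg_index_notify_sketch`, its functions, and passing the current time in explicitly are assumptions made so the sketch is self-contained. One deliberate difference from the text: it records the time of the *first* pending update with `ets:insert_new/2` instead of bumping the timestamp on every write, so a database under constant write load is still picked up once the delay has elapsed.

```erlang
%% Hypothetical sketch, not the fabric2_index source: sharded ETS tables
%% that remember when each database first saw a pending update, plus a
%% sweep that returns databases whose delay window has elapsed.
-module(bg_index_notify_sketch).

-export([new/1, db_updated/3, take_ready/3]).

%% Create NShards public ETS sets. In a real system each shard would be
%% owned and periodically swept by its own background process.
new(NShards) when is_integer(NShards), NShards > 0 ->
    list_to_tuple([ets:new(bg_index_shard, [set, public])
                   || _ <- lists:seq(1, NShards)]).

%% Called after each document update transaction. The shard is chosen by
%% hashing the database name; ets:insert_new/2 keeps the first timestamp,
%% so later updates inside the delay window are absorbed into one entry.
db_updated(Shards, DbName, NowMs) ->
    Shard = element(erlang:phash2(DbName, tuple_size(Shards)) + 1, Shards),
    _ = ets:insert_new(Shard, {DbName, NowMs}),
    ok.

%% Sweep one shard: remove and return (sorted) every database whose first
%% pending update is at least DelayMs old. Newer entries stay in the table.
take_ready(Shard, NowMs, DelayMs) ->
    Ready = [Entry || {_Db, Ts} = Entry <- ets:tab2list(Shard),
                      NowMs - Ts >= DelayMs],
    lists:foreach(fun(Entry) -> ets:delete_object(Shard, Entry) end, Ready),
    lists:sort([Db || {Db, _Ts} <- Ready]).
```

A sweeper process per shard would call `take_ready/3` on a timer and hand each returned database to every registered backend. An update that arrives after a database has been swept creates a fresh entry and starts a new delay window.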

## Background Index Building Behavior

Unlike CouchDB 3 (`ken`), the background index builder in CouchDB 4 doesn't
have centralized knowledge of all possible secondary indices. Instead, each
secondary indexing application may register with the background index builder
and provide a set of callbacks implementing background index building for its
particular index types.

The background index building behavior is a standard Erlang/OTP behavior
defined as:

```erlang
%% #doc{} is CouchDB's document record, defined in couch_db.hrl.
-include_lib("couch/include/couch_db.hrl").

-callback build_indices(Db :: map(), DDocs :: list(#doc{})) ->
    [{ok, JobId :: binary()} | {error, any()}].

-callback cleanup_indices(Db :: map(), DDocs :: list(#doc{})) ->
    [ok | {error, any()}].
```

Each indexing application may register with the index builder by calling the
`fabric2_index:register(Module)` function. When it registers, it must provide
an implementation of the behavior in `Module`.
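
The registration and dispatch side can be sketched in the same way. Only `fabric2_index:register(Module)` comes from this RFC; the module `bg_index_registry_sketch`, its use of `persistent_term` to store backends, and the error wrapping in `build_all/2` and `cleanup_all/2` are illustrative assumptions.

```erlang
%% Hypothetical registry sketch. The RFC's real entry point is
%% fabric2_index:register(Module); storage and dispatch here are illustrative.
-module(bg_index_registry_sketch).

-export([register/1, backends/0, build_all/2, cleanup_all/2]).

-define(KEY, {?MODULE, backends}).

%% Idempotently add a backend module, keeping registration order.
register(Mod) when is_atom(Mod) ->
    Mods = backends(),
    case lists:member(Mod, Mods) of
        true -> ok;
        false -> persistent_term:put(?KEY, Mods ++ [Mod])
    end.

backends() ->
    persistent_term:get(?KEY, []).

%% Give every registered backend the same design docs; each one is expected
%% to pick out the indices it is responsible for.
build_all(Db, DDocs) ->
    lists:append([call(Mod, build_indices, Db, DDocs) || Mod <- backends()]).

cleanup_all(Db, DDocs) ->
    lists:append([call(Mod, cleanup_indices, Db, DDocs) || Mod <- backends()]).

%% A crashing or missing backend is reported as an error entry instead of
%% preventing the remaining backends from running.
call(Mod, Fun, Db, DDocs) ->
    try
        Mod:Fun(Db, DDocs)
    catch
        Class:Reason -> [{error, {Mod, Class, Reason}}]
    end.
```

Storing the backend list in `persistent_term` makes the per-update read cheap at the cost of an expensive write, which suits a list that only changes when applications start. The read-modify-write in `register/1` is not atomic, so concurrent registrations would need to be serialized, for example through a single owning process.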

* `build_indices/2`: must inspect all the design doc bodies passed in and
trigger asynchronous index updates for all the views that the module is
responsible for.

* `cleanup_indices/2`: must clean up all stale indexing data, that is, data
that no longer corresponds to any of the views in the design docs passed in
as an argument.
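
To make the two callbacks concrete, here is a hypothetical backend; it is not `couch_views`. Design docs are plain maps whose `views` key holds the view definitions (instead of `#doc{}` records), `Db` is assumed to carry a `name` and the `index_sigs` of index data already stored for it, and signatures are `erlang:phash2/1` hashes rendered as hex. Where a real backend would submit a `couch_jobs` job or delete index data, this sketch only derives a job id or returns `ok`.

```erlang
%% Hypothetical backend implementing the two callbacks. Design docs are maps
%% here (not #doc{} records) and Db is an assumed map with `name` and
%% `index_sigs` (signatures of index data already stored for the database).
-module(example_index_backend).

-export([build_indices/2, cleanup_indices/2, stale_signatures/2,
         signature/1]).

%% Only design docs with a non-empty `views` map belong to this backend.
%% A real backend would submit a couch_jobs job; this one only derives an id.
build_indices(#{name := DbName}, DDocs) ->
    [{ok, <<DbName/binary, "-", Sig/binary>>} || Sig <- current_signatures(DDocs)].

%% Remove index data whose signature matches no current design doc.
cleanup_indices(Db, DDocs) ->
    [remove_index_data(Db, Sig) || Sig <- stale_signatures(Db, DDocs)].

%% Stale = stored signatures minus the signatures of the current views,
%% which covers both updated (signature changed) and deleted design docs.
stale_signatures(#{index_sigs := Stored}, DDocs) ->
    ordsets:subtract(ordsets:from_list(Stored),
                     ordsets:from_list(current_signatures(DDocs))).

signature(Views) ->
    integer_to_binary(erlang:phash2(Views), 16).

current_signatures(DDocs) ->
    [signature(Views) || #{views := Views} <- DDocs, map_size(Views) > 0].

%% Placeholder for deleting the index's stored data.
remove_index_data(_Db, _Sig) ->
    ok.
```

Computing staleness as a set difference means the backend never needs to know why an index became stale: an edited design doc and a deleted one both leave behind a signature that no current design doc produces.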

# Advantages and Disadvantages

* The main advantage is simplicity. The design relies on node-local updates and
on the fact that all indexing is currently backed by `couch_jobs` jobs, which
handle global locking and coordination.

* The main disadvantage is also simplicity. There is no concept of priority
that would allow users to build some indices before others.

# Key Changes

The configuration format has changed. Instead of configuring background index
building in the `[ken]` section, it is now configured in the `[fabric]` config
section. Otherwise, there are no external API changes.

## Applications and Modules affected

* `fabric2_index`
* `fabric2_db`
* `couch_views`

## HTTP API additions

N/A

## HTTP API deprecations

N/A

# Security Considerations

None

# References

* [fabric2_index](https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/fabric/src/fabric2_index.erl)
* [ken](https://github.com/apache/couchdb/tree/master/src/ken)

# Co-authors

* @davisp

# Acknowledgements

* @davisp
