---
name: Formal RFC
about: Submit a formal Request For Comments for consideration by the team.
title: 'Background index building'
labels: rfc, discussion
assignees: ''

---

# Introduction

This document describes the design for the background index builder in CouchDB 4.

## Abstract

The background index builder monitors databases for changes and then kicks off
asynchronous index updates. It is also responsible for removing stale indexing
data.

## Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).

---

# Detailed Description

The two main components of the background index builder are:

 1) The notification mechanism
 2) The index building behavior API and registration facility

The notification mechanism monitors databases for updates. Secondary index
applications register with the background indexer and provide an
implementation of the index building API.

## Database Updates Notifications

After each document update transaction finishes, the background indexer is
notified via a callback. The indexer then bumps the timestamp for that database
in one of a set of sharded ETS tables. Each sharded ETS table has an associated
background process which periodically removes entries from its table and calls
the index building API functions of each registered indexing backend.

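The notification path can be sketched as follows. This is a minimal illustration under stated assumptions, not the `fabric2_index` code: the module name `bg_notify_sketch`, its functions, and the use of `erlang:phash2/2` to pick a shard are all hypothetical choices made for the example.

```erlang
%% Hypothetical sketch of the notification path: per-database timestamps
%% kept in N sharded ETS tables, each drained periodically by one process.
-module(bg_notify_sketch).
-export([new_shards/1, db_updated/2, drain/1]).

%% Create N unnamed public ETS tables, one per shard. Each table stores
%% {DbName, LastUpdateMsec} tuples keyed on the database name.
new_shards(N) when is_integer(N), N > 0 ->
    list_to_tuple([ets:new(bg_notify_shard, [set, public]) || _ <- lists:seq(1, N)]).

%% Called after each document update transaction. The database name is
%% hashed to pick a shard; ets:insert/2 overwrites any existing entry,
%% which "bumps" the timestamp for that database.
db_updated(Shards, DbName) when is_tuple(Shards), is_binary(DbName) ->
    Idx = erlang:phash2(DbName, tuple_size(Shards)) + 1,
    Now = erlang:monotonic_time(millisecond),
    true = ets:insert(element(Idx, Shards), {DbName, Now}),
    ok.

%% One tick of a shard's background process: remove the pending entries
%% and return the sorted names of the databases that need indexing.
drain(Tab) ->
    Entries = ets:tab2list(Tab),
    %% delete_object/2 removes only the exact {DbName, Ts} tuple that was
    %% read, so a newer bump that raced with this drain is kept.
    lists:foreach(fun(Entry) -> true = ets:delete_object(Tab, Entry) end, Entries),
    lists:sort([DbName || {DbName, _Ts} <- Entries]).
```

In this sketch, a shard's process would call `drain/1` on each tick and hand the returned names to the registered backends. Repeated updates to the same database collapse into a single entry, which is what keeps the number of index update requests proportional to the number of active databases rather than the number of writes.
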
In addition to building indices, the background index builder also cleans up
stale index data. This is index data left behind after design documents have
been updated or deleted and their view signatures have changed.

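Detecting stale data boils down to a set difference between the view signatures for which index data exists and the signatures computed from the current design documents. The sketch below assumes both are available as lists of binaries; the module and function names are hypothetical, and how each backend obtains its existing signatures is left open.

```erlang
%% Hypothetical sketch: index data whose signature is not produced by any
%% current design document is stale and may be removed.
-module(stale_index_sketch).
-export([stale_signatures/2]).

-spec stale_signatures(Existing :: [binary()], Current :: [binary()]) -> [binary()].
stale_signatures(Existing, Current) ->
    CurrentSet = sets:from_list(Current),
    %% Keep existing signatures not referenced by the current design docs,
    %% deduplicated and sorted.
    lists:usort([Sig || Sig <- Existing, not sets:is_element(Sig, CurrentSet)]).
```

Note that a signature which is new (present only in `Current`) is not reported: it simply has no index data yet and will be built by `build_indices/2`.
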
Background index building and cleanup may be enabled or disabled with
configuration options. There is also a configurable delay during which updates
accumulate for each database. This avoids re-scheduling `couch_jobs` jobs too
often.

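The accumulation delay can be pictured as a filter applied when a shard is drained: only databases whose recorded update is at least `DelayMsec` old are taken. In this hypothetical sketch, `take_due/3` and its argument names are illustrative, and the delay is passed in directly rather than read from configuration.

```erlang
%% Hypothetical sketch of the accumulation delay: take only the databases
%% whose last recorded update is at least DelayMsec old.
-module(bg_delay_sketch).
-export([take_due/3]).

-spec take_due(ets:tab(), integer(), non_neg_integer()) -> [binary()].
take_due(Tab, NowMsec, DelayMsec) ->
    Cutoff = NowMsec - DelayMsec,
    %% Match spec selecting whole {DbName, Ts} objects with Ts =< Cutoff.
    MatchSpec = [{{'$1', '$2'}, [{'=<', '$2', Cutoff}], ['$_']}],
    Due = ets:select(Tab, MatchSpec),
    %% delete_object/2 removes only the exact tuple that was selected, so a
    %% database bumped again in the meantime stays queued with its new time.
    lists:foreach(fun(Obj) -> true = ets:delete_object(Tab, Obj) end, Due),
    lists:sort([DbName || {DbName, _Ts} <- Due]).
```

Because every update overwrites the stored timestamp, this sketch behaves like a debounce: a database that is updated continuously keeps getting deferred. Recording only the first pending update instead (for example with `ets:insert_new/2`) would bound the wait to roughly one delay period.
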
## Background Index Building Behavior

Unlike CouchDB 3 (`ken`), the background index builder in CouchDB 4 doesn't
have centralized knowledge of all the possible secondary indices. Instead, each
secondary indexing application may register with the background index builder
and provide a set of callbacks implementing background index building for its
particular index types.

The background index building behavior is a standard Erlang/OTP behavior
defined as:

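```erlang
%% #doc{} is CouchDB's document record (defined in couch_db.hrl).

%% Trigger asynchronous index updates for the indices defined in DDocs.
%% Each element is {ok, JobId} for a scheduled update or {error, Reason}.
-callback build_indices(Db :: map(), DDocs :: list(#doc{})) ->
    [{ok, JobId :: binary()} | {error, any()}].

%% Remove stale index data given the current design docs in DDocs.
-callback cleanup_indices(Db :: map(), DDocs :: list(#doc{})) ->
    [ok | {error, any()}].
```
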
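For illustration, a module implementing this behavior might look like the sketch below. Everything in it is hypothetical: `example_indexer` is a made-up backend, the `#doc{}` record is a two-field stand-in for CouchDB's real record, the `name` key in the `Db` map is assumed, and the returned job ids are plain binaries rather than real `couch_jobs` job ids. A real backend would also declare the corresponding `-behaviour(...)` attribute; it is omitted here so the module compiles on its own.

```erlang
%% Hypothetical backend illustrating the shape of the two callbacks.
-module(example_indexer).
-export([build_indices/2, cleanup_indices/2]).

%% Stand-in for CouchDB's #doc{} record, reduced to the fields used here.
-record(doc, {id, body = #{}}).

%% Start one (pretend) asynchronous update per design doc that defines
%% views. A real backend would create or update a couch_jobs job here and
%% return its id; this sketch just builds an id from the names.
build_indices(#{name := DbName}, DDocs) ->
    [{ok, <<DbName/binary, "/", Id/binary>>}
     || #doc{id = Id, body = Body} <- DDocs, maps:is_key(<<"views">>, Body)].

%% This toy backend stores no index data, so there is never anything stale.
cleanup_indices(_Db, _DDocs) ->
    [ok].
```

Design docs that define no views (or only index types handled by another backend) are simply skipped, which is what lets several backends share the same list of design docs.
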
Each indexing application may register with the index builder by calling
`fabric2_index:register(Module)`. The registered module must implement that
behavior.

 * `build_indices/2`: must inspect all the passed-in design doc bodies and
   trigger asynchronous index updates for all the views that module is
   responsible for.

 * `cleanup_indices/2`: must clean up stale indexing data, that is, index data
   that no longer corresponds to any of the views in the design docs passed in
   as an argument.

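Combining registration with the two callbacks, the builder's per-database step can be sketched as below. The registry is modeled as a plain list of module names and `process_db/4` is a hypothetical function; the actual bookkeeping lives in `fabric2_index`. Each backend is called inside `try ... catch` so that one failing backend does not prevent the others from running.

```erlang
%% Hypothetical sketch of dispatching one database to every registered
%% indexing backend.
-module(bg_dispatch_sketch).
-export([process_db/4]).

%% Modules: registered backend modules. Cleanup: whether stale index
%% cleanup is enabled. Returns one {Module, Result} pair per backend.
process_db(Modules, Db, DDocs, Cleanup) when is_list(Modules), is_boolean(Cleanup) ->
    [{Mod, run(Mod, Db, DDocs, Cleanup)} || Mod <- Modules].

run(Mod, Db, DDocs, Cleanup) ->
    try
        Built = Mod:build_indices(Db, DDocs),
        Cleaned = case Cleanup of
            true -> Mod:cleanup_indices(Db, DDocs);
            false -> []
        end,
        {ok, Built, Cleaned}
    catch
        %% Isolate backend failures (including a missing module, which
        %% raises error:undef) so the remaining backends still run.
        Class:Reason -> {error, {Class, Reason}}
    end.
```

With `Cleanup` set to `false`, only `build_indices/2` is called, mirroring the configuration option that disables cleanup.
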
# Advantages and Disadvantages

 * The main advantage is simplicity. The design relies on node-local update
   notifications and on the fact that all indexing is currently backed by
   `couch_jobs` jobs, which handle global locking and coordination.

 * The main disadvantage is also simplicity. There is no concept of priority
   that would allow users to build some indices before others.

# Key Changes

The configuration format has changed. Instead of the `[ken]` section,
background index building is now configured in the `[fabric]` config section.
Otherwise, there are no external API changes.

## Applications and Modules affected

 * fabric2_index
 * fabric2_db
 * couch_views

## HTTP API additions

N/A

## HTTP API deprecations

N/A

# Security Considerations

None

# References

 * [fabric2_index](https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/fabric/src/fabric2_index.erl)
 * [ken](https://github.com/apache/couchdb/tree/master/src/ken)

# Co-authors

 * @davisp

# Acknowledgements

 * @davisp