This repository was archived by the owner on Oct 17, 2022. It is now read-only.

Commit a071134

Merge branch 'master' into ddocID
2 parents 2bea1dc + 9653c39 commit a071134

File tree

26 files changed

+705
-40
lines changed

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 2 deletions
@@ -36,6 +36,5 @@
3636

3737
## Checklist
3838

39-
- [ ] Documentation is written and is accurate;
40-
- [ ] `make check` passes with no errors
4139
- [ ] Update [rebar.config.script](https://github.com/apache/couchdb/blob/master/rebar.config.script) with the commit hash once this PR is rebased and merged
40+
<!-- Before opening the PR, consider running `make check` locally for a faster turnaround time -->

rfcs/006-mango-fdb.md

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
1+
# Mango RFC
2+
3+
---
4+
5+
name: Formal RFC
6+
about: Submit a formal Request For Comments for consideration by the team.
7+
title: 'Mango JSON indexes in FoundationDB'
8+
labels: rfc, discussion
9+
assignees: ''
10+
11+
---
12+
13+
[note]: # " ^^ Provide a general summary of the RFC in the title above. ^^ "
14+
15+
# Introduction
16+
17+
This document describes the data model, querying, and index management for Mango JSON indexes with FoundationDB.
18+
19+
## Abstract
20+
21+
This document details the data model for storing Mango indexes. Indexes will be updated in the same transaction in which a document is written to FoundationDB. When an index is created on an existing database, a background task will build the index up to the Sequence at which the index was created.
22+
23+
## Requirements Language
24+
25+
[note]: # " Do not alter the section below. Follow its instructions. "
26+
27+
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
28+
“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this
29+
document are to be interpreted as described in
30+
[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
31+
32+
## Terminology
33+
34+
`Sequence`: a 13-byte value formed by combining the current `Incarnation` of the database and the `Versionstamp` of the transaction. Sequences are monotonically increasing even when a database is relocated across FoundationDB clusters. See [RFC002](LINK TBD) for a full explanation.
35+
36+
---
37+
38+
# Detailed Description
39+
40+
Mango is a declarative JSON querying syntax that allows a user to retrieve documents based on a selector. Indexes can be defined to improve query performance. In CouchDB, Mango is a query layer built on top of Map/Reduce indexes. Each Mango query follows a two-step process. First, a subset of the selector is converted into a map query that is run against a predefined index, falling back to `_all_docs` if no suitable index is available. Second, each document retrieved from the index is matched against the full query selector.
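
The two-step process can be illustrated with a minimal Python sketch. Everything here is hypothetical and in-memory: `DOCS`, `TYPE_INDEX`, `matches` and `find` are illustrative names, and only equality and `$gt` are supported. It is not the Mango implementation, only a picture of "narrow with an index, then match the full selector".

```python
from typing import Any

# Hypothetical in-memory stand-ins for a database and a single-field index.
DOCS = {
    "a": {"_id": "a", "type": "user", "age": 31},
    "b": {"_id": "b", "type": "user", "age": 17},
    "c": {"_id": "c", "type": "order", "age": 40},
}
# Index on "type": maps an indexed value to the matching document ids.
TYPE_INDEX = {"user": ["a", "b"], "order": ["c"]}


def matches(doc: dict, selector: dict) -> bool:
    """Match a document against a tiny selector subset: equality and $gt."""
    for field, cond in selector.items():
        value: Any = doc.get(field)
        if isinstance(cond, dict):
            (op, operand), = cond.items()
            if op != "$gt":
                raise ValueError(f"unsupported operator: {op}")
            if value is None or not value > operand:
                return False
        elif value != cond:
            return False
    return True


def find(selector: dict) -> list:
    # Step 1: use the indexable part of the selector to narrow the candidates,
    # falling back to a full scan (the `_all_docs` case) if no index applies.
    if "type" in selector and not isinstance(selector["type"], dict):
        candidate_ids = TYPE_INDEX.get(selector["type"], [])
    else:
        candidate_ids = sorted(DOCS)
    # Step 2: match every candidate against the full selector.
    return [DOCS[i] for i in candidate_ids if matches(DOCS[i], selector)]
```

For example, `find({"type": "user", "age": {"$gt": 18}})` uses the index to fetch the two `user` documents and then drops the one that fails the `age` condition.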
41+
42+
With CouchDB on FoundationDB, all newly created Mango indexes have the `interactive: true` option set. As a result, Mango indexes are updated in the same transaction in which a document is added to or updated in the database.
43+
44+
## Data Model
45+
46+
### Index Definitions
47+
48+
A Mango index is defined as:
49+
50+
```json
{
  "name": "view-name",
  "index": {
    "fields": ["fieldA", "fieldB"]
  },
  "partial_filter_selector": {}
}
```
59+
60+
The above index definition would be converted into a map index that looks like this:
61+
62+
```json
{
  "_id": "_design/ddoc",
  "language": "query",
  "views": {
    "view-name": {
      "map": {
        "fields": [{ "fieldA": "asc" }, { "fieldB": "asc" }],
        "selector": {}
      }
    }
  },
  "options": [{ "autoupdate": false }, { "interactive": true }]
}
```
77+
78+
- `{"autoupdate": false}` means that the index will not be auto updated in the background
79+
- `{"interactive": true}` configures the index to be updated in the document update transaction
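
The conversion between the two JSON documents above could be sketched in Python as follows. This is an illustration only: `mango_to_map_index` is a hypothetical helper, the `asc` sort direction for every field and the mapping of `partial_filter_selector` onto `selector` are assumptions read off the example, and the default design document id is the one used in the example.

```python
def mango_to_map_index(index_def: dict, ddoc_id: str = "_design/ddoc") -> dict:
    """Build the map-index design document for a Mango index definition."""
    # Assumption: every field is indexed in ascending order, as in the example.
    fields = [{name: "asc"} for name in index_def["index"]["fields"]]
    return {
        "_id": ddoc_id,
        "language": "query",
        "views": {
            index_def["name"]: {
                "map": {
                    "fields": fields,
                    # Assumption: the partial filter becomes the view selector.
                    "selector": index_def.get("partial_filter_selector", {}),
                }
            }
        },
        # Not updated in the background; updated in the document transaction.
        "options": [{"autoupdate": False}, {"interactive": True}],
    }
```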
80+
81+
### Index Definition
82+
83+
Mango indexes are a layer on top of map indexes. So the index definition is the same as the map index definition.
84+
85+
### Index Limits
86+
87+
This design has certain defined limits for it to work correctly:
88+
89+
- The index definition (`name`, `fields` and `partial_filter_selector`) cannot exceed the 64 KB FDB value limit
90+
- The sorted keys for an index cannot exceed the 8 KB key limit
91+
- To be able to update the index in the same transaction in which a document is updated, there will have to be a limit on the number of Mango indexes for a database so that the transaction stays within the 10 MB transaction limit. This limit is still TBD based on testing.
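
A rough sketch of how the first two limits might be checked is shown below. It rests on several assumptions: "KB" is taken to mean 1024 bytes, the size of a compact JSON encoding is used as a stand-in for the real storage encoding (which will differ), and the function names are hypothetical.

```python
import json

VALUE_LIMIT_BYTES = 64 * 1024  # assumes "64 KB" means 64 * 1024 bytes
KEY_LIMIT_BYTES = 8 * 1024     # assumes "8 KB" means 8 * 1024 bytes


def encoded_size(value) -> int:
    """Size of a compact JSON encoding; a proxy for the real storage encoding."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def check_index_definition(index_def: dict) -> None:
    """Raise ValueError if the definition would not fit in a single value."""
    definition = {
        "name": index_def["name"],
        "fields": index_def["index"]["fields"],
        "partial_filter_selector": index_def.get("partial_filter_selector", {}),
    }
    if encoded_size(definition) > VALUE_LIMIT_BYTES:
        raise ValueError("index definition exceeds the value limit")


def check_index_key(sorted_key: list) -> None:
    """Raise ValueError if the sorted key for one document is too large."""
    if encoded_size(sorted_key) > KEY_LIMIT_BYTES:
        raise ValueError("index key exceeds the key limit")
```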
92+
93+
## Index building and management
94+
95+
When an index is created on an existing database, the index will be updated in a background job up to the versionstamp at which the index was added to the database. The process for building a new index would be:
96+
97+
1. Save the index to the database along with a creation versionstamp, and set the index status to `building` so that it is not used to service any queries until it is up to date. Add a job to `couch_jobs` to build the index.
98+
2. Any write requests (document updates) made after the index definition is saved will update the index in the document update transaction. Index writers can assume that previous versions of the document have already been indexed.
99+
3. `couch_jobs` will start reading sections of the changes feed and building the index. This background process will keep processing the changes it reads until it reaches the creation versionstamp. Once it reaches that point, the index is up to date, `build_status` will be marked as `active`, and the index can be used to service queries.
100+
4. There is some subtle behavior around step 3 that is worth mentioning. The background process is subject to the 5-second transaction limit, so it will process the changes feed in smaller parts. This means that it won’t have one consistent view of the changes feed throughout the index building process. This will lead to a conflict when a background process transaction is adding a document to the index while a write request’s transaction is updating the same document. There are two possible outcomes. If the background process wins, the write request will get a conflict. At that point the write request will try to process the document again, read the old values for that document, remove them from the index, and add the new values to the index. If the write request wins and the background process gets a conflict, the background process can try again. The document will have been removed from its old position in the changes feed and moved to a later position, so the background process won’t see the document and will move on to the next one.
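
The build process above can be modelled with a small single-threaded Python sketch. It is a toy under stated assumptions: integer sequences stand in for versionstamps, the `Database` class and its methods are hypothetical, and real transaction conflicts and retries are not simulated. It only shows why a document rewritten during the build is skipped by the background process.

```python
class Database:
    def __init__(self):
        self.docs = {}           # doc_id -> document body
        self.changes = {}        # doc_id -> latest sequence (one feed entry per doc)
        self.seq = 0
        self.index = None        # doc_id -> indexed value, once an index exists
        self.build_status = None
        self.creation_seq = None

    def write(self, doc_id, body):
        self.seq += 1
        self.docs[doc_id] = body
        self.changes[doc_id] = self.seq  # moves the doc to the end of the feed
        if self.index is not None:
            self.index[doc_id] = body.get("field")  # interactive index update

    def create_index(self):
        self.index = {}
        self.creation_seq = self.seq
        self.build_status = "building"

    def build_batch(self, since, batch_size):
        """Index up to batch_size changes after `since`; return the new position."""
        pending = sorted(
            (seq, doc_id) for doc_id, seq in self.changes.items()
            if since < seq <= self.creation_seq
        )
        for seq, doc_id in pending[:batch_size]:
            self.index[doc_id] = self.docs[doc_id].get("field")
            since = seq
        if len(pending) <= batch_size:
            self.build_status = "active"  # reached the creation sequence
        return since


db = Database()
for i in range(4):
    db.write(f"doc{i}", {"field": i})
db.create_index()                 # creation sequence is 4
pos = db.build_batch(0, 2)        # indexes doc0 and doc1
db.write("doc3", {"field": 99})   # interactive write moves doc3 to sequence 5
pos = db.build_batch(pos, 2)      # only doc2 is left at or below sequence 4
```

After the second batch the background process has never touched `doc3`, yet the index holds its latest value because the write request indexed it directly.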
101+
102+
## Advantages
103+
104+
- Indexes are kept up to date when documents are changed, meaning you can read your own writes
105+
- Makes Mango indexes first-class citizens and opens up the opportunity to create more Mango specific functionality
106+
107+
## Disadvantages
108+
109+
- FoundationDB currently does not allow CouchDB to do the document selector matching at the shard level. However, there is a discussion for this [Feature Request: Predicate pushdown](https://forums.foundationdb.org/t/feature-request-predicate-pushdown/954)
110+
111+
## Key Changes
112+
113+
- Mango indexes will be stored separately to Map/Reduce indexes.
114+
- Mango indexes will be updated when a document is updated
115+
- A background process will build a new Mango index on an existing database
116+
- There are specific index limits mentioned in the Index Limits section.
117+
118+
Index limitations aside, this design preserves all of the existing API options
119+
for working with CouchDB documents.
120+
121+
## Applications and Modules affected
122+
123+
The `mango` application will be modified to work with FoundationDB.
124+
125+
## HTTP API additions
126+
127+
When querying any of the `_index` endpoints an extra field, `build_status`, will be added to the index definition.
128+
The `build_status` will either be `building` or `active`.
129+
130+
## HTTP API deprecations
131+
132+
None.
133+
134+
# Security Considerations
135+
136+
None have been identified.
137+
138+
# References
139+
140+
[Original mailing list discussion](https://lists.apache.org/thread.html/b614d41b72d98c7418aa42e5aa8e3b56f9cf1061761f912cf67b738a@%3Cdev.couchdb.apache.org%3E)
141+
142+
# Acknowledgements
143+
144+
Thanks to the following for participating in the design discussion:
145+
146+
- @kocolosk
147+
- @willholley
148+
- @janl
149+
- @alexmiller-apple
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
1+
---
2+
name: Formal RFC
3+
about: Submit a formal Request For Comments for consideration by the team.
4+
title: 'Background index building'
5+
labels: rfc, discussion
6+
assignees: ''
7+
8+
---
9+
10+
# Introduction
11+
12+
This document describes the design for the background index builder in CouchDB 4.
13+
14+
## Abstract
15+
16+
The background index builder monitors databases for changes and then kicks off
17+
asynchronous index updates. It is also responsible for removing stale indexing
18+
data.
19+
20+
## Requirements Language
21+
22+
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
23+
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
24+
interpreted as described in [RFC
25+
2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
26+
27+
---
28+
29+
# Detailed Description
30+
31+
The two main components of the background index builder are:
32+
1) The notification mechanism
33+
2) Index building behavior API and registration facility
34+
35+
The notification mechanism monitors databases for updates and the secondary
36+
index applications register with the background indexer and provide an
37+
implementation of the index building API.
38+
39+
## Database Updates Notifications
40+
41+
After each document update transaction finishes, the background indexer is
42+
notified via a callback. The indexer then bumps the timestamp for that database
43+
in a set of sharded ETS tables. Each sharded ETS table has an associated
44+
background process which periodically removes entries from it and calls the
45+
index building API functions for each registered indexing backend.
46+
47+
In addition to building indices, the background index builder also cleans up
48+
stale index data. This is index data left behind after design documents have
49+
been updated or deleted and the view signatures changed.
50+
51+
Background index building and cleaning may be enabled or disabled with
52+
configuration options. There is also a configurable delay during which db
53+
updates would accumulate for each database. This is used to avoid re-scheduling
54+
`couch_jobs` too often.
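
The notification mechanism described in this section can be sketched in Python. This is a hedged illustration, not the `fabric2_index` code: plain dicts stand in for the sharded ETS tables, `SHARD_COUNT`, `shard_for`, `db_updated` and `process_shard` are hypothetical names, and the use of CRC32 to pick a shard is an assumption.

```python
import time
import zlib
from typing import Callable, Optional

SHARD_COUNT = 4  # hypothetical; the RFC does not state a shard count

# One dict per shard stands in for one ETS table: db name -> last update time.
shards = [dict() for _ in range(SHARD_COUNT)]


def shard_for(db_name: str) -> int:
    """Pick a shard deterministically from the database name."""
    return zlib.crc32(db_name.encode("utf-8")) % SHARD_COUNT


def db_updated(db_name: str, now: Optional[float] = None) -> None:
    """Callback run after a document update transaction: bump the timestamp."""
    shards[shard_for(db_name)][db_name] = time.time() if now is None else now


def process_shard(shard_index: int, build: Callable[[str], None],
                  delay: float, now: Optional[float] = None) -> list:
    """Remove entries that have waited at least `delay` seconds and build them."""
    now = time.time() if now is None else now
    table = shards[shard_index]
    due = [name for name, ts in table.items() if now - ts >= delay]
    for name in due:
        del table[name]
        build(name)  # stands in for calling each registered indexing backend
    return due
```

The `delay` argument models the configurable accumulation window: repeated updates to the same database within that window only move its timestamp forward, so the build callback fires once rather than once per update.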
55+
56+
## Background Index Building Behavior
57+
58+
Unlike CouchDB 3 (`ken`), the background index builder in CouchDB 4 doesn't
59+
have centralized knowledge of all the possible secondary indices. Instead, each
60+
secondary indexing application may register with the background index builder
61+
and provide a set of callbacks implementing background index building for their
62+
particular index types.
63+
64+
65+
Background index building behavior is a standard Erlang/OTP behavior defined
66+
as:
67+
68+
```erlang
-callback build_indices(Db :: map(), DDocs :: list(#doc{})) ->
    [{ok, JobId::binary()} | {error, any()}].

-callback cleanup_indices(Db :: map(), DDocs :: list(#doc{})) ->
    [ok | {error, any()}].
```
75+
76+
Each indexing application may register with the index builder by using the
77+
`fabric2_index:register(Module)` function. When it registers, it must provide
78+
an implementation of that behavior in that module.
79+
80+
* `build_indices/2`: must inspect all the passed in design doc bodies and
81+
trigger asynchronous index updates for all the views that module is responsible
82+
for.
83+
84+
* `cleanup_indices/2`: must clean up all the stale indexing data associated
85+
with all the views in the design docs passed in as an argument.
86+
87+
# Advantages and Disadvantages
88+
89+
* The main advantage is simplicity. The design relies on node-local updates and the fact that
90+
all indexing is currently backed by `couch_jobs` jobs, which handle global
91+
locking and coordination.
92+
93+
* The main disadvantage is also simplicity. There is no concept of priority to
94+
allow users to build some indices before others.
95+
96+
# Key Changes
97+
98+
Configuration format has changed. Instead of configuring background index
99+
building in the `[ken]` section, it is now configured in the `[fabric]` config
100+
section. Otherwise there are no external API changes.
101+
102+
## Applications and Modules affected
103+
104+
* fabric2_index
105+
* fabric2_db
106+
* couch_views
107+
108+
## HTTP API additions
109+
110+
N/A
111+
112+
## HTTP API deprecations
113+
114+
N/A
115+
116+
# Security Considerations
117+
118+
None
119+
120+
# References
121+
122+
[fabric2_index](https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/fabric/src/fabric2_index.erl)
123+
[ken](https://github.com/apache/couchdb/tree/master/src/ken)
124+
125+
# Co-authors
126+
127+
* @davisp
128+
129+
# Acknowledgements
130+
131+
* @davisp

src/api/database/changes.rst

Lines changed: 2 additions & 0 deletions
@@ -391,6 +391,8 @@ to simplify the job of the client - each line of the response is either empty
391391
or a JSON object representing a single change, as found in the normal feed's
392392
results.
393393
394+
If ``limit`` has been specified, the feed will end with a ``{ last_seq }`` object.
395+
394396
.. code-block:: http
395397
396398
GET /somedatabase/_changes?feed=continuous HTTP/1.1

src/api/partitioned-dbs.rst

Lines changed: 2 additions & 2 deletions
@@ -199,7 +199,7 @@ See the guide for
199199
``/db/_partition/partition_id/_find``
200200
=====================================
201201

202-
.. http:get:: /{db}/_partition/{partition_id}/_find
202+
.. http:post:: /{db}/_partition/{partition_id}/_find
203203
:synopsis: Query the partition specified by ``partition_id``
204204

205205
:param db: Database name
@@ -218,7 +218,7 @@ See the guide for
218218
``/db/_partition/partition_id/_explain``
219219
========================================
220220

221-
.. http:get:: /{db}/_partition/{partition_id}/_explain
221+
.. http:post:: /{db}/_partition/{partition_id}/_explain
222222
:synopsis: Find index that is used with a query
223223

224224
:param db: Database name
