This repository was archived by the owner on Oct 17, 2022. It is now read-only.
`Sequence`: a 13-byte value formed by combining the current `Incarnation` of the database and the `Versionstamp` of the transaction. Sequences are monotonically increasing even when a database is relocated across FoundationDB clusters. See [RFC002](LINK TBD) for a full explanation.
- - - - -
+
+ ---

# Detailed Description

- Mango is a declarative JSON querying syntax that allows a user to retrieve documents based on a selector. Indexes can be defined to improve query performance. In CouchDB 2.x Mango is a query layer built on top of Map/Reduce indexes. Each Mango query follows a two-step process, first a subset of the selector is converted into a map query to be used with a predefined index or falling back to `_all_docs` if no indexes are available. Each document retrieved from the index is then matched against the full query selector.
+ Mango is a declarative JSON querying syntax that allows a user to retrieve documents based on a selector. Indexes can be defined to improve query performance. In CouchDB, Mango is a query layer built on top of Map/Reduce indexes. Each Mango query follows a two-step process. First, a subset of the selector is converted into a map query that runs against a predefined index, falling back to `_all_docs` if no index is available. Then each document retrieved from the index is matched against the full query selector.
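The two-step process described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`choose_index`, `matches`, `find`) are hypothetical, the selector supports equality conditions only, and a list comprehension stands in for the real map query.

```python
from typing import Any, Dict, List, Optional, Tuple

Doc = Dict[str, Any]


def choose_index(selector: Doc, indexes: Dict[str, List[str]]) -> Optional[str]:
    """Step 1: pick the first index whose leading field appears in the selector."""
    for name, fields in indexes.items():
        if fields and fields[0] in selector:
            return name
    return None  # no usable index: the caller falls back to _all_docs


def matches(doc: Doc, selector: Doc) -> bool:
    """Step 2: match a document against the full selector (equality only)."""
    return all(doc.get(field) == value for field, value in selector.items())


def find(docs: List[Doc], indexes: Dict[str, List[str]],
         selector: Doc) -> Tuple[Optional[str], List[Doc]]:
    index_name = choose_index(selector, indexes)
    if index_name is None:
        candidates = docs  # stand-in for scanning _all_docs
    else:
        lead = indexes[index_name][0]
        # Stand-in for the map query: narrow by the index's leading field.
        candidates = [d for d in docs if d.get(lead) == selector[lead]]
    # Every candidate is still checked against the full selector.
    return index_name, [d for d in candidates if matches(d, selector)]
```

The index only narrows the candidate set; correctness always comes from the second step, which is why a missing index affects performance but not results.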

- For CouchDB on FoundationDB, the external behavior of Mango will remain the same but internally it will have its own indexes and index management. This will allow for Mango indexes to be updated in the same transaction where a write request happens - index on write. Later we can also look at adding Mango specific functionality.
+ With CouchDB on FoundationDB, all newly created Mango indexes have the `interactive: true` option set. Mango indexes are therefore updated in the same transaction in which a document is added to or updated in the database.

## Data Model
@@ -47,79 +49,71 @@ A Mango index is defined as:

```json
{
- name: ‘view-name’ - optional, will be auto-generated
- index: {
- fields: [‘fieldA’, ‘fieldB’] - fields to be indexed
+ "name": "view-name",
+ "index": {
+ "fields": ["fieldA", "fieldB"]
},
- partial_filter_selector {} - optional filter to process documents before adding to the index
+ "partial_filter_selector": {}
}
```

- The above index definition would be stored in FoundationDB as:
- * `active` which indicates the index is ready to service queries
- * `building` if the index is still being built
-
- * `sequence` is the sequence that the index is created at.
- * `index_version` will be used to version each index to allow for easier upgrades if we make any changes to the index design later
- * `?BUILD_INFO` is the sequence the background build has built the index up to and the number of rows in the index.
-
- ### Indexes
-
- Each index defined in the Index Definition would have an index keyspace where the database’s documents are stored and sorted via the keys defined in the index’s definition. The data model for each defined index would be:
+ - `{"autoupdate": false}` means that the index will not be auto updated in the background
+ - `{"interactive": true}` configures the index to be updated in the document update transaction

- In CouchDB 2.x ICU collation is used to sort string key’s when added to the index’s b-tree. The current way of using ICU string collation won’t work with FoundationDB. To resolve this strings will be converted to an ICU sort string before being stored in FDB. This is an extra performance overhead but will only be done when one when writing a key into the index.
+ ### Index Definition

- CouchDB has a defined [index collation specification](http://docs.couchdb.org/en/stable/ddocs/views/collation.html#collation-specification) that the new Mango design must adhere to. Each key added to a Mango index will be converted into a composite key or tuple with the first value in the tuple representing the type that the key so that it would be sorted correctly. The `couch_views_encoding` library will be used to encode all fields correctly.
+ Mango indexes are a layer on top of map indexes, so the index definition is the same as the map index definition.

### Index Limits

This design has certain defined limits for it to work correctly:

- * The index definition (`name`, `fields` and `partial_filter_selector`) cannot exceed 100 KB FDB value limit
- * The sorted keys for an index cannot exceed the 8 KB key limit
- * To be able to update the index in the transaction that a document is updated in, there will have to be a limit on the number of Mango indexes for a database so that the transaction stays within the 10MB transaction limit. This limit is still TBD based on testing.
+ - The index definition (`name`, `fields` and `partial_filter_selector`) cannot exceed the 64 KB FDB value limit
+ - The sorted keys for an index cannot exceed the 8 KB key limit
+ - To be able to update the index in the transaction that a document is updated in, there will have to be a limit on the number of Mango indexes for a database so that the transaction stays within the 10 MB transaction limit. This limit is still TBD based on testing.
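As a rough illustration of how the first two limits might be checked before saving an index, here is a minimal Python sketch. It uses the 64 KB and 8 KB figures from the updated list. The function names are hypothetical, and measuring the definition as compact JSON is an assumption: the actual stored encoding may differ, so the numbers are estimates only.

```python
import json

# Figures taken from the updated list above. The real encoding of
# definitions and keys is not necessarily JSON (assumption), so these
# checks are estimates rather than exact enforcement.
VALUE_LIMIT_BYTES = 64 * 1024
KEY_LIMIT_BYTES = 8 * 1024


def definition_size(definition: dict) -> int:
    """Size in bytes of a compact JSON encoding of the index definition."""
    return len(json.dumps(definition, separators=(",", ":")).encode("utf-8"))


def check_limits(definition: dict, encoded_keys: list) -> list:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    size = definition_size(definition)
    if size > VALUE_LIMIT_BYTES:
        problems.append(f"definition is {size} bytes, over the value limit")
    for key in encoded_keys:
        if len(key) > KEY_LIMIT_BYTES:
            problems.append(f"key is {len(key)} bytes, over the key limit")
    return problems
```

The per-database cap on the number of indexes is left out because the text marks it as still TBD.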

## Index building and management

- When an index is created on an existing database, the index will need to be built for all existing documents in the database. The process for building a new index would be:
+ When an index is created on an existing database, the index will be built by a background job up to the versionstamp at which the index was added to the database. The process for building a new index would be:

- 1. When a user defines a new index on an existing database, save the index definition along with the `sequence` the index was added at and set the `build_status` to `building` so it won’t be used to service queries.
- 2. Any write requests (document updates) after the saved index definition will update the index with the document update. Index writers should assume that previous versions of the document have already been indexed.
- 3. At the same time a background process via `couch_jobs` will start reading sections of the changes feed and building the index, this background process will keep processing the changes read until it reaches the sequence number that the index was saved at. Once it reaches that point, the index is up to date and `build_status` will be marked as `active` and the index can be used to service queries.
- 4. There is some subtle behavior around step 3 that is worth mentioning. The background process will have the 5-second transaction limit, so it will process smaller parts of the changes feed. Which means that it won’t have one consistent view of the changes feed throughout the index building process. This will lead to a conflict situation when the background process transaction is adding a document to the index while at the same time a write request has a transaction that is updating the same document. There are two possible outcomes to this, if the background process wins, the write request will get a conflict. At that point the write request will try to process the document again, read the old values for that document, remove them from the index and add the new values to the index. If the write request wins, and the background process gets a conflict, then the background process can try again, the document would have been removed from its old position in the changes feed and moved to the later position, so the background process won’t see the document and will then move on to the next one.
- 5. An index build progress tracker, `?BUILD_INFO` will also be added. This is used to track at what sequence the index has been updated to via the background build process. It will also store the number of rows in the index. This can be used for query planning.
+ 1. Save the index to the database along with a creation versionstamp, and set the index status to `building` so that it is not used to service any queries until it is up to date. Add a job to `couch_jobs` to build the index.
+ 2. Any write requests (document updates) after the saved index definition will update the index as part of the document update. Index writers can assume that previous versions of the document have already been indexed.
+ 3. `couch_jobs` will start reading sections of the changes feed and building the index. This background process keeps processing changes until it reaches the creation versionstamp. Once it reaches that point, the index is up to date, `build_status` is marked as `active`, and the index can be used to service queries.
+ 4. There is some subtle behavior around step 3 that is worth mentioning. The background process is subject to the 5-second transaction limit, so it processes the changes feed in smaller parts and does not have one consistent view of the changes feed throughout the build. This can lead to a conflict when a background process transaction is adding a document to the index while a write request transaction is updating the same document. There are two possible outcomes. If the background process wins, the write request gets a conflict; it then retries, reads the old values for that document, removes them from the index and adds the new values. If the write request wins, the background process gets a conflict and retries; by then the document has been removed from its old position in the changes feed and moved to a later position, so the background process no longer sees it and moves on to the next document.
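The build steps above can be modelled with a small in-memory simulation. This is a sketch under loud assumptions: the `Database` and `Index` classes are hypothetical, an integer counter stands in for versionstamps, each `build_batch` call stands in for one short background transaction, and transaction conflicts are not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    field_name: str
    created_seq: int                 # stand-in for the creation versionstamp
    build_status: str = "building"
    built_up_to: int = 0
    rows: dict = field(default_factory=dict)  # doc_id -> indexed value


class Database:
    def __init__(self):
        self.seq = 0
        self.docs = {}     # doc_id -> document
        self.changes = {}  # doc_id -> sequence of its latest update
        self.indexes = []

    def write(self, doc_id, doc):
        self.seq += 1
        self.docs[doc_id] = doc
        self.changes[doc_id] = self.seq  # doc moves to a later feed position
        for idx in self.indexes:         # step 2: index on write
            idx.rows[doc_id] = doc.get(idx.field_name)

    def create_index(self, field_name):
        idx = Index(field_name, created_seq=self.seq)  # step 1
        self.indexes.append(idx)
        return idx

    def build_batch(self, idx, batch_size):
        """Step 3: index one batch of changes at or below the creation seq."""
        pending = sorted(
            (seq, doc_id) for doc_id, seq in self.changes.items()
            if idx.built_up_to < seq <= idx.created_seq
        )
        for seq, doc_id in pending[:batch_size]:
            idx.rows[doc_id] = self.docs[doc_id].get(idx.field_name)
            idx.built_up_to = seq
        if len(pending) <= batch_size:   # nothing left below the creation seq
            idx.built_up_to = idx.created_seq
            idx.build_status = "active"
```

A document rewritten while the build is running moves past the creation sequence in the changes feed, so the background job skips it and relies on the write path having indexed it already, which mirrors the second outcome in step 4.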

## Advantages

- * Indexes are kept up to date when documents are changed, meaning you can read your own writes
- * Makes Mango indexes first-class citizens and opens up the opportunity to create more Mango specific functionality
+ - Indexes are kept up to date when documents are changed, meaning you can read your own writes
+ - Makes Mango indexes first-class citizens and opens up the opportunity to create more Mango-specific functionality

## Disadvantages

- * FoundationDB currently does not allow CouchDB to do the document selector matching at the shard level. However, there is a discussion for this [Feature Request: Predicate pushdown](https://forums.foundationdb.org/t/feature-request-predicate-pushdown/954)
+ - FoundationDB currently does not allow CouchDB to do the document selector matching at the shard level. However, there is a discussion for this [Feature Request: Predicate pushdown](https://forums.foundationdb.org/t/feature-request-predicate-pushdown/954)

## Key Changes

- * Mango indexes will be stored separately to Map/Reduce indexes.
- * Mango Indexes will be updated when a document is updated
- * A background process will build a new Mango index on an existing database
- * There are specific index limits mentioned in the Index Limits section.
+ - Mango indexes will be stored separately from Map/Reduce indexes.
+ - Mango indexes will be updated when a document is updated
+ - A background process will build a new Mango index on an existing database
+ - There are specific index limits mentioned in the Index Limits section.

Index limitations aside, this design preserves all of the existing API options
for working with CouchDB documents.
@@ -128,10 +122,10 @@ for working with CouchDB documents.

The `mango` application will be modified to work with FoundationDB

-
## HTTP API additions

- None.
+ When querying any of the `_index` endpoints, an extra field, `build_status`, will be added to the index definition.
+ The `build_status` will be either `building` or `active`.
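A client could use the new field to tell which indexes are ready to service queries. The sketch below assumes the existing `GET /{db}/_index` response shape (an `indexes` list of objects with `name`, `type` and `def`) and assumes `build_status` appears as a top-level key on each entry; the exact placement is not specified here, and `usable_indexes` is a hypothetical helper.

```python
def usable_indexes(index_response: dict) -> list:
    """Return the names of indexes whose build_status is 'active'.

    Assumes build_status is a top-level key of each entry in "indexes".
    Entries without the field are treated as not yet usable.
    """
    return [
        idx["name"]
        for idx in index_response.get("indexes", [])
        if idx.get("build_status") == "active"
    ]


# Illustrative response body, not captured from a real server.
sample_response = {
    "total_rows": 2,
    "indexes": [
        {"ddoc": "_design/a", "name": "by-field-a", "type": "json",
         "def": {"fields": [{"fieldA": "asc"}]}, "build_status": "active"},
        {"ddoc": "_design/b", "name": "by-field-b", "type": "json",
         "def": {"fields": [{"fieldB": "asc"}]}, "build_status": "building"},
    ],
}
```

Here `usable_indexes(sample_response)` keeps only `by-field-a`, since the second index is still being built.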

## HTTP API deprecations

@@ -149,7 +143,7 @@ None have been identified.

Thanks to the following for participating in the design discussion