RFC for Mango on FDB #407

garrensmith · 2019-04-25T10:59:46Z

RFC for Mango indexes on FoundationDB

sansato · 2019-04-26T17:49:17Z

rfcs/006-mango-fdb.md

+\x50 Array
+\x60 Objects
+
+An example for a number key would be (\x30, 1). Just too note, Null and Boolean values won’t need to be composite keys as the type key is the value.


s/too note/to note

or simply Note: Null and Boolean...

Thanks. I updated it.
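For readers following the type-tag discussion, here is a minimal sketch of how a value might be turned into a composite key. The `\x30`, `\x40`, `\x50` and `\x60` tags are the ones quoted in this thread; the tags for null and booleans are not quoted here, so `TAG_NULL`, `TAG_FALSE` and `TAG_TRUE` are hypothetical placeholders, and `encode_value` is an illustrative name rather than anything in the implementation.

```python
# Type tags quoted in this review thread.
TAG_NUMBER = b"\x30"
TAG_TEXT = b"\x40"
TAG_ARRAY = b"\x50"
TAG_OBJECT = b"\x60"

# Hypothetical tags: the real byte values for null/booleans are not quoted here.
TAG_NULL = b"\x00"
TAG_FALSE = b"\x10"
TAG_TRUE = b"\x20"


def encode_value(value):
    """Return a tuple whose first element is the type tag for `value`."""
    # Null and booleans need no second element: the tag itself is the value.
    if value is None:
        return (TAG_NULL,)
    if value is True:
        return (TAG_TRUE,)
    if value is False:
        return (TAG_FALSE,)
    if isinstance(value, (int, float)):
        return (TAG_NUMBER, value)
    if isinstance(value, str):
        # The RFC stores an ICU sort string; the raw string stands in for it here.
        return (TAG_TEXT, value)
    if isinstance(value, list):
        return (TAG_ARRAY, tuple(encode_value(v) for v in value))
    if isinstance(value, dict):
        return (TAG_OBJECT, tuple((k, encode_value(v)) for k, v in value.items()))
    raise TypeError(f"unsupported JSON value: {value!r}")
```

With this sketch, `encode_value(1)` yields `(b"\x30", 1)`, matching the number example in the quoted text, while null and booleans produce single-element tuples.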

sansato · 2019-04-26T17:53:10Z

rfcs/006-mango-fdb.md

+This design has certain defined limits for it to work correctly:
+
+* The index definition (name, fields and partial_filter_selector) cannot exceed 100 KB FDB value limit
+* The sorted keys for an index cannot exceed the 10 KB key limit


"sorted keys" - is this the same as the keys emitted for indexing records?

Correct. They are.
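The two limits discussed above could be checked before writing, roughly as follows. This is only a sketch: it assumes the index definition is serialized as compact JSON (the thread does not say how it is stored), and `definition_fits` and `key_fits` are hypothetical helper names. The byte limits are FoundationDB's documented maximums of 100,000 bytes per value and 10,000 bytes per key.

```python
import json

FDB_VALUE_LIMIT = 100_000  # bytes, FoundationDB maximum value size
FDB_KEY_LIMIT = 10_000     # bytes, FoundationDB maximum key size


def definition_fits(definition: dict) -> bool:
    """Check that a serialized index definition fits in one FDB value.

    Assumption: the definition is stored as compact UTF-8 JSON.
    """
    encoded = json.dumps(definition, separators=(",", ":")).encode("utf-8")
    return len(encoded) <= FDB_VALUE_LIMIT


def key_fits(packed_key: bytes) -> bool:
    """Check that an already-packed index key fits in one FDB key."""
    return len(packed_key) <= FDB_KEY_LIMIT
```

A document whose emitted index key fails `key_fits` would have to be rejected or skipped; which of those the implementation chooses is not covered in this thread.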

ryanworl · 2019-04-30T22:15:07Z

rfcs/006-mango-fdb.md

+
+## Terminology
+
+`Sequence`: a 13 byte value formed by combining the current `Incarnation` of the database and the `Versionstamp` of the transaction. Sequences are monotonically increasing even when a database is relocated across FoundationDB clusters. See (RFC002)[LINK TBD]  for a full explanation.


You will probably be using the tuple encoding for the keys, and the Incarnation will therefore be a variable length integer that will be one byte the vast majority of the time (under values of 249 I believe), but will not be strictly speaking 13 bytes. And if you don't use the tuple encoding, I would worry about the potential limiting factor of only one byte. You may find yourself moving databases back and forth across clusters with some regularity depending on how you choose to deploy and load balance logical databases across physical FDB clusters.

Thanks @ryanworl, yes the incarnation will be a single byte. I just copied and pasted that section from here https://github.com/apache/couchdb-documentation/pull/397/files#diff-5a51b92701c50f4d70a06d3a85daf8e9R40, maybe I should be a little more explicit in what I mean there.

I would recommend not making it a fixed-length byte string of length one and instead use an integer. It takes three additional bytes to encode it as a fixed-length byte string in the tuple encoding if the Incarnation is zero and two additional bytes otherwise.

Ah thanks @ryanworl, I'll mention that in the other RFC.

@ryanworl the motivation behind the use of a fixed-length byte string was to ensure the sortability of Sequences in their external hexadecimal string representation in CouchDB's HTTP API. @davisp reminded me that the variable-length integer encoding in the tuple layer does generate bytes that always sort correctly (of course), so I can see where this is a good improvement. Thanks for pointing it out.
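To make the size comparison in this exchange concrete, below is a small sketch that re-implements two pieces of the FoundationDB tuple encoding in plain Python: non-negative integers (type code `0x14` plus the byte length, followed by big-endian bytes) and byte strings (type code `0x01`, null bytes escaped as `\x00\xff`, terminated by `\x00`). It is a simplified stand-in written from the tuple layer specification, not the `fdb.tuple` module itself, and `pack_int` and `pack_bytes` are illustrative names.

```python
def pack_int(n: int) -> bytes:
    """Encode a non-negative integer the way the tuple layer does."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n == 0:
        return b"\x14"  # zero is a bare type code
    size = (n.bit_length() + 7) // 8
    if size > 8:
        raise ValueError("this sketch only handles integers up to 8 bytes")
    return bytes([0x14 + size]) + n.to_bytes(size, "big")


def pack_bytes(value: bytes) -> bytes:
    """Encode a byte string: type code, escaped payload, null terminator."""
    return b"\x01" + value.replace(b"\x00", b"\x00\xff") + b"\x00"


if __name__ == "__main__":
    for incarnation in (0, 1, 200):
        as_int = pack_int(incarnation)
        as_bytes = pack_bytes(bytes([incarnation]))
        print(incarnation, len(as_int), len(as_bytes))
```

Under these rules an Incarnation of 0 packs to 1 byte as an integer and 4 bytes as a one-byte string, and an Incarnation of 1 packs to 2 and 3 bytes respectively. The encoded integers also sort in the same order as the integers themselves, which is the property relied on for Sequence ordering.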

janl · 2019-05-10T10:30:16Z

rfcs/006-mango-fdb.md

+
+* The index definition (name, fields and partial_filter_selector) cannot exceed 100 KB FDB value limit
+* The sorted keys for an index cannot exceed the 10 KB key limit
+* To be able to update the index in the transaction that a document is updated in, there will have to be a limit on number of Mango indexes for a database so that the transaction stays within the 10MB transaction limit. This limit is still TBD based on testing.


Calculation help from @alexmiller-apple: https://lists.apache.org/thread.html/4976e0b7e3df89c5d64f37b5299b04c2ed01088f357be8aceaeedec1@%3Cdev.couchdb.apache.org%3E
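As a rough illustration of why the number of indexes has to be capped, here is a back-of-the-envelope sketch. It only counts bytes written and ignores everything else that contributes to FoundationDB's transaction size (conflict ranges, clears, and so on), so it is an upper bound at best. `max_indexes` and its parameters are hypothetical, and the byte counts in the usage note are placeholders, not measurements.

```python
FDB_TXN_LIMIT = 10_000_000  # bytes, FoundationDB transaction size limit


def max_indexes(doc_write_bytes: int, per_index_write_bytes: int,
                txn_limit: int = FDB_TXN_LIMIT) -> int:
    """Estimate how many indexes can be updated alongside one document write.

    Assumption: every index costs the same number of bytes per update.
    """
    if per_index_write_bytes <= 0:
        raise ValueError("per_index_write_bytes must be positive")
    remaining = txn_limit - doc_write_bytes
    if remaining <= 0:
        return 0
    return remaining // per_index_write_bytes
```

For example, with a placeholder 1,000,000-byte document write and 20,000 bytes per index update, `max_indexes(1_000_000, 20_000)` returns 450. The real limit would come from the testing mentioned in the quoted text.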

janl · 2019-05-10T10:32:28Z

rfcs/006-mango-fdb.md

+3. At the same time a background process will start reading sections of the changes feed and building the index, this background process will keep processing the changes read until it reaches the sequence number that the index was saved at. Once it reaches that point, the index is up to date and `build_status` will be marked as `active` and the index will be used to service queries.
+4. There are some subtle behaviour around step 3 that is worth mentioning. The background process will have the 5 second transaction limit, so it will process smaller parts of the changes feed. Which means that it won’t have one consistent view of the changes feed throughout the index building process. This will lead to a conflict situation when the background process transaction is adding a document to the index while at the same time a write request has a transaction that is updating the same document. There are two possible outcomes to this, if the background process wins, the write request will get a conflict. At that point the write request will try to process the document again, read the old values for that document, remove them from the index and add the new values to the index. If the write request wins, and the background process gets a conflict, then the background process can try again, the document would have been removed from its old position in the changes feed and moved to the later position, so the background process won’t see the document and will then move on to the next one. 
+5. An index progress tracker will also be added. This will use `doc_count` for the database, and then have a counter value that the background workers can increment with the number of documents it updated for each batch update.  It would also be updated on write requests while the index is in building mode.
+6. Some thing to explore is splitting the building of the index across multiple worker, it should be possible to use the [`get_boundary_keys` ](https://apple.github.io/foundationdb/api-python.html?highlight=boundary_keys#fdb.locality.fdb.locality.get_boundary_keys) api call on the changes feed to get the full list of changes feed keys grouped by partition boundaries and then split that by workers.


Suggested change

6. Some thing to explore is splitting the building of the index across multiple worker, it should be possible to use the [`get_boundary_keys` ](https://apple.github.io/foundationdb/api-python.html?highlight=boundary_keys#fdb.locality.fdb.locality.get_boundary_keys) api call on the changes feed to get the full list of changes feed keys grouped by partition boundaries and then split that by workers.

6. Something to explore is splitting the building of the index across multiple worker, it should be possible to use the [`get_boundary_keys` ](https://apple.github.io/foundationdb/api-python.html?highlight=boundary_keys#fdb.locality.fdb.locality.get_boundary_keys) api call on the changes feed to get the full list of changes feed keys grouped by partition boundaries and then split that by workers.
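The multi-worker idea in item 6 could look roughly like the sketch below. It takes the boundary keys as a plain list so that it runs without a cluster; in practice they would come from `fdb.locality.get_boundary_keys` over the changes feed range. `split_ranges` and `assign_ranges` are hypothetical helpers, and round-robin assignment is just one possible way to hand ranges to workers.

```python
def split_ranges(begin: bytes, boundaries: list, end: bytes) -> list:
    """Turn boundary keys inside [begin, end) into contiguous (start, stop) ranges."""
    inner = [b for b in boundaries if begin < b < end]
    points = [begin] + inner + [end]
    return list(zip(points, points[1:]))


def assign_ranges(ranges: list, worker_count: int) -> list:
    """Distribute ranges across workers round-robin."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    assignments = [[] for _ in range(worker_count)]
    for i, key_range in enumerate(ranges):
        assignments[i % worker_count].append(key_range)
    return assignments
```

With boundaries `[b"c", b"f"]` over the range `b"a"` to `b"z"`, two workers would receive `[(b"a", b"c"), (b"f", b"z")]` and `[(b"c", b"f")]`. Each worker would still need to process its ranges in transactions short enough for the 5 second limit.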

janl · 2019-05-10T10:38:25Z

rfcs/006-mango-fdb.md

+  },
+  partial_filter_selector {} - optional filter to process documents before adding to the index
+}
+```


Format trick:

{
  name: "view-name" // optional will be auto-generated
  index: {
    fields: ["fieldA", "fieldB"] // fields to be indexed
  },
  partial_filter_selector: {} // optional filter to process documents before adding to the index
}

tonysun83 · 2019-10-25T17:06:47Z

rfcs/006-mango-fdb.md

+
+1. When a user defines a new index on an existing database, save the index definition along with the `sequence`  the index was added at and set the `build_status` to `building`  so it won’t be used to service queries. 
+2. Any write requests (document updates) after that must read the new index definition and update the index. When updating the new index, the index writers should assume that previous versions of the document have already been indexed.
+3. At the same time a background process will start reading sections of the changes feed and building the index, this background process will keep processing the changes read until it reaches the sequence number that the index was saved at. Once it reaches that point, the index is up to date and `build_status` will be marked as `active` and the index will be used to service queries.


Revisiting: @garrensmith, this background process will now likely be a couch_jobs process, right?
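The build procedure in steps 1 to 3 can be sketched as an in-memory loop. This is only a model: the changes feed is a list of `(seq, doc_id, doc)` tuples, each batch stands in for one short transaction, and `build_index`, its parameters, and the dict-based index are all hypothetical. It says nothing about how `couch_jobs` would schedule the work.

```python
def build_index(changes, field, created_seq, batch_size=100):
    """Index `field` for every change up to the sequence the index was created at.

    `changes` is a list of (seq, doc_id, doc) tuples sorted by seq.
    Returns the built index and the resulting build_status.
    """
    index = {}
    last_seq = 0
    while True:
        # One batch models one short transaction over part of the changes feed.
        batch = [c for c in changes if last_seq < c[0] <= created_seq][:batch_size]
        if not batch:
            break
        for seq, doc_id, doc in batch:
            if field in doc:
                index[doc_id] = doc[field]
            last_seq = seq
    # The builder has reached the creation sequence, so the index can serve queries.
    return index, "active"
```

Changes after `created_seq` are deliberately ignored here, because step 2 says write requests made after the index definition was saved update the index themselves.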

tonysun83 · 2019-10-25T19:22:10Z

rfcs/006-mango-fdb.md

+When an index is created on an existing database, the index will need to be built for all existing documents in the database. The process for building a new index would be:
+
+1. When a user defines a new index on an existing database, save the index definition along with the `sequence`  the index was added at and set the `build_status` to `building`  so it won’t be used to service queries. 
+2. Any write requests (document updates) after that must read the new index definition and update the index. When updating the new index, the index writers should assume that previous versions of the document have already been indexed.


Can you elaborate on this? Say you have a transaction that does a write/update. In that same transaction, I'm assuming you want the index to incorporate this new write/update. What do you mean by "read the new index definition"?

jaydoane

Found just a few nits that could be addressed

jaydoane · 2020-01-21T05:44:36Z

rfcs/006-mango-fdb.md


-This document details the data model for storing Mango indexes.  The basic model is that we would have a namespace for storing defined indexes and then a dedicated namespace per index for the key/values for a given index. Indexes will be updated in the transaction that a document is written to FoundationDB. When an index is created on an existing database, a background task will build the index up to the Sequence that the index was created at.
+This document details the data model for storing Mango indexes. Indexes will be updated in the transaction that a document is written to FoundationDB. When an index is created on an existing database, a background task will build the index up to the Sequence that the index was created at.



maybe "... up to the Sequence when the index was created" instead?

jaydoane · 2020-01-21T06:00:28Z

rfcs/006-mango-fdb.md

 # Detailed Description

-Mango is a declarative JSON querying syntax that allows a user to retrieve documents based on a given selector. It supports defining indexes for queries which will improve query performance. In CouchDB 2.x Mango is a query layer built on top of Map/Reduce indexes. Each Mango query  follows a two step process, first a subset of the selector is converted into a map query to be used with a predefined index or falling back to `_all_docs` if no indexes are available. Each document retrieved from the index is then matched against the query selector. 
+Mango is a declarative JSON querying syntax that allows a user to retrieve documents based on a selector. Indexes can be defined to improve query performance. In CouchDB 2.x Mango is a query layer built on top of Map/Reduce indexes. Each Mango query follows a two-step process, first a subset of the selector is converted into a map query to be used with a predefined index or falling back to `_all_docs` if no indexes are available. Each document retrieved from the index is then matched against the full query selector. 


"In CouchDB 2.x Mango..." isn't it actually: "In CouchDB < 4.0 Mango..."?

jaydoane · 2020-01-27T04:39:47Z

rfcs/006-mango-fdb.md

-\x40 Text converted into a sort string
-\x50 Array
-\x60 Objects
+In CouchDB 2.x ICU collation is used to sort string key’s when added to the index’s b-tree. The current way of using ICU string collation won’t work with FoundationDB. To resolve this strings will be converted to an ICU sort string before being stored in FDB. This is an extra performance overhead but will only be done when one when writing a key into the index. 


only be done when one when writing a key into the index.

typo

jaydoane · 2020-01-27T04:44:06Z

rfcs/006-mango-fdb.md

+1. When a user defines a new index on an existing database, save the index definition along with the `sequence`  the index was added at and set the `build_status` to `building` so it won’t be used to service queries. 
+2. Any write requests (document updates) after the saved index definition will update the index with the document update. Index writers should assume that previous versions of the document have already been indexed.
+3. At the same time a background process via `couch_jobs` will start reading sections of the changes feed and building the index, this background process will keep processing the changes read until it reaches the sequence number that the index was saved at. Once it reaches that point, the index is up to date and `build_status` will be marked as `active` and the index can be used to service queries.
+4. There is some subtle behavior around step 3 that is worth mentioning. The background process will have the 5-second transaction limit, so it will process smaller parts of the changes feed. Which means that it won’t have one consistent view of the changes feed throughout the index building process. This will lead to a conflict situation when the background process transaction is adding a document to the index while at the same time a write request has a transaction that is updating the same document. There are two possible outcomes to this, if the background process wins, the write request will get a conflict. At that point the write request will try to process the document again, read the old values for that document, remove them from the index and add the new values to the index. If the write request wins, and the background process gets a conflict, then the background process can try again, the document would have been removed from its old position in the changes feed and moved to the later position, so the background process won’t see the document and will then move on to the next one. 


This will lead to a conflict situation

Is this a conflict or a race? Maybe both?
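The "write request wins" outcome described in step 4 can be modelled in memory. The sketch below assumes a changes feed that holds exactly one entry per document, so an update removes the document from its old position and appends it at a new, later sequence. `ChangesFeed` is a hypothetical toy class, not the FoundationDB-backed feed.

```python
class ChangesFeed:
    """Toy changes feed holding one entry per document, ordered by sequence."""

    def __init__(self):
        self.by_seq = {}   # seq -> doc_id
        self.seq_of = {}   # doc_id -> current seq
        self.next_seq = 1

    def update(self, doc_id):
        """Record a write: drop the old entry and append at a later sequence."""
        old_seq = self.seq_of.pop(doc_id, None)
        if old_seq is not None:
            del self.by_seq[old_seq]
        seq = self.next_seq
        self.next_seq += 1
        self.by_seq[seq] = doc_id
        self.seq_of[doc_id] = seq
        return seq

    def read(self, after_seq, up_to_seq):
        """Return (seq, doc_id) pairs with after_seq < seq <= up_to_seq."""
        return [(s, self.by_seq[s]) for s in sorted(self.by_seq)
                if after_seq < s <= up_to_seq]
```

If documents `a`, `b` and `c` sit at sequences 1 to 3 and the index was created at sequence 3, a concurrent write to `b` moves it to sequence 4. When the background builder retries, `read(0, 3)` returns only `a` and `c`, so it skips `b`, which the write request has already indexed.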

garrensmith · 2020-05-18T11:47:35Z

I've updated this mango RFC so it is up to date with how it is currently implemented.

wohali · 2020-05-19T17:23:59Z

@garrensmith You have approval from Jay; please merge this.

sansato reviewed Apr 26, 2019


ryanworl reviewed Apr 30, 2019


garrensmith mentioned this pull request May 2, 2019

Add latest RFC draft for FDB revision storage #397

Merged

3 tasks

janl reviewed May 10, 2019


tonysun83 reviewed Oct 25, 2019


garrensmith force-pushed the rfc/006-mango-on-fdb branch from 9e80aed to 5a78a6a Compare January 9, 2020 14:32

jaydoane approved these changes Jan 27, 2020


wohali mentioned this pull request May 13, 2020

[RFC] Background index building in CouchDB 4 #542

Merged

garrensmith force-pushed the rfc/006-mango-on-fdb branch from 5a78a6a to 61803f7 Compare May 18, 2020 11:44

garrensmith added 4 commits May 18, 2020 13:47

RFC for Mango on FDB

90e6e73

typo fixes

4f2ead4

updates to the rfc

9b6a462

update to how it is currently implemented

672d6dd

garrensmith force-pushed the rfc/006-mango-on-fdb branch from 61803f7 to 672d6dd Compare May 18, 2020 11:47

Merge branch 'master' into rfc/006-mango-on-fdb

b271ac2

garrensmith merged commit 6370d8a into master May 20, 2020


