Update database partitions doc with review changes.

lbboe · kocolosk · commit 347a6cecfc2d · 2019-07-25T11:57:41.000-04:00
diff --git a/src/api/database/common.rst b/src/api/database/common.rst
@@ -86,8 +86,8 @@
     :>json string update_seq: An opaque string that describes the state
       of the database. Do not rely on this string for counting the number
       of updates.
-    :>json boolean props.partitioned: (optional) If present and true this
-      indicates the the database is partitioned.
+    :>json boolean props.partitioned: (optional) If present and true, this
+      indicates that the database is partitioned.
     :code 200: Request completed successfully
     :code 404: Requested database not found
 
diff --git a/src/api/partitioned-dbs.rst b/src/api/partitioned-dbs.rst
@@ -65,7 +65,6 @@ See the guide for
           }
         }
 
-
 ``/db/_partition/partition/_all_docs``
 ======================================
 
@@ -77,7 +76,7 @@ See the guide for
 
     This endpoint is a convenience endpoint for automatically setting
     bounds on the provided partition range. Similar results can be had
-    by using the global ``/db/_all_docs`` end point with appropriately
+    by using the global ``/db/_all_docs`` endpoint with appropriately
     configured values for ``start_key`` and ``end_key``.
 
     Refer to the :ref:`view endpoint <api/ddoc/view>` documentation for
@@ -118,8 +117,6 @@ See the guide for
           "total_rows": 1
         }
 
-
-
 ``/db/_partition/partition/_design/design-doc/_view/view-name``
 ===============================================================
 
@@ -157,7 +154,6 @@ See the guide for
         Server: CouchDB (Erlang/OTP)
         Transfer-Encoding: chunked
 
-
         {
           "offset": 0,
           "rows": [
diff --git a/src/partitioned-dbs/index.rst b/src/partitioned-dbs/index.rst
@@ -16,8 +16,16 @@
 Partitioned Databases
 =====================
 
-As a means to introducing partitioned databases we'll consider a motivating
-use case to describe the benefits of this feature. For this example we'll
+A partitioned database groups documents into logical partitions by using
+a partition key. All documents are assigned to a partition, and many documents
+are typically given the same partition key. The benefit of partitioned databases
+is that secondary indexes can be significantly more efficient when locating
+matching documents since their entries are contained within their partition.
+This means a given secondary index read will only scan a single partition
+range instead of having to read from a copy of every shard.
+
+As a means of introducing partitioned databases, we'll consider a motivating
+use case to describe the benefits of this feature. For this example, we'll
 consider a database that stores readings from a large network of soil
 moisture sensors.
 
@@ -26,7 +34,6 @@ moisture sensors.
     :ref:`theory <cluster/theory>` of :ref:`sharding <cluster/sharding>`
     in CouchDB.
 
-
 Traditionally, a document in this database may have something like the
 following structure:
 
@@ -46,17 +53,15 @@ following structure:
         ]
     }
 
-
 .. note::
     While this example uses IoT sensors, the main thing to consider is that
     there is a logical grouping of documents. Similar use cases might be
     documents grouped by user or scientific data grouped by experiment.
 
-
 So we've got a bunch of sensors, all grouped by the field they monitor
 along with their readouts for a given day (or other appropriate time period).
 
-Along with our documents we might expect to have two secondary indexes
+Along with our documents, we might expect to have two secondary indexes
 for querying our database that might look something like:
 
 .. code-block:: javascript
@@ -81,17 +86,17 @@ and:
         emit(doc.field_name, doc.sensor_id)
     }
 
-With these two indexes defined we can easily find all requests for a given
+With these two indexes defined, we can easily find all readings for a given
 sensor, or list all sensors in a given field.
 
 Unfortunately, in CouchDB, when we read from either of these indexes, it
 requires finding a copy of every shard and asking for any documents related
 to the particular sensor or field. This means that as our database scales
-up the number of shards, every index request must perform more work.
+up the number of shards, every index request must perform more work,
+which is unnecessary since we are only interested in a small number of documents.
 Fortunately for you, dear reader, partitioned databases were created to solve
 this precise problem.
 
-
 What is a partition?
 ====================
 
@@ -101,34 +106,35 @@ use case, it's quite logical to group all documents by their ``sensor_id``
 field. In this case, we would call the ``sensor_id`` the partition.
 
 A good partition has two basic properties. First, it should have a high
-cardinality. That is, there is a large number of values for the partition.
-A database that has a single partition would be an anti-pattern for this
-feature. Secondly, the amount of data per partition should be "small". The
-general recommendation is to limit individual partitions to less than ten
-gigabytes of data. Which, for the example of sensor documents, equates to roughly
-60,000 years of data.
-
+cardinality. That is, a large partitioned database should have many more
+partitions than documents in any single partition. A database that has
+a single partition would be an anti-pattern for this feature. Secondly,
+the amount of data per partition should be "small". The general
+recommendation is to limit individual partitions to less than ten
+gigabytes (10 GB) of data, which, for the example of sensor documents,
+equates to roughly 60,000 years of data.
 
 Why use partitions?
 ===================
 
 The primary benefit of using partitioned databases is for the performance
 of partitioned queries. Large databases with lots of documents often
 have a similar pattern where there are groups of related documents that
-are queried together often.
+are queried together.
 
 By using partitions, we can execute queries against these individual groups
 of documents more efficiently by placing the entire group within a specific
 shard on disk. Thus, the view engine only has to consult one copy of the
 given shard range when executing a query instead of executing
-the query across all ``Q`` shards in the database.
-
+the query across all ``q`` shards in the database. This means that you do
+not have to wait for all ``q`` shards to respond, which is both
+more efficient and faster.
 
 Partitions By Example
 =====================
 
 To create a partitioned database, we simply need to pass a query string
-parameter.
+parameter:
 
 .. code-block:: bash
 
@@ -171,19 +177,19 @@ information:
       "update_seq": "0-g1AAAAFDeJzLYWBg4M..."
     }
 
-
 You'll now see that the ``"props"`` member contains ``"partitioned": true``.
 
 .. note::
 
-    The format for document ids in a partitioned database is
-    ``partition:docid``. Every regular document (i.e., everything
-    except design and local documents) in a partitioned database
-    must follow this format.
+    Every document in a partitioned database (except ``_design``
+    and ``_local`` documents) must have the format ``partition:docid``.
+    More specifically, the partition for a given document is
+    everything before the first colon. The document id is everything
+    after the first colon, which may include more colons.
 
 .. note::
 
-    System databases are *not* allowed to be partitioned. This is
+    System databases (such as ``_users``) are *not* allowed to be partitioned. This is
     due to system databases already having their own incompatible
     requirements on document ids.
 
@@ -262,7 +268,7 @@ Note that we can use all of the normal bells and whistles available to
 ``/dbname/_partition/name/_all_docs`` endpoint is mostly a convenience
 so that requests are guaranteed to be scoped to a given partition. Users
 are free to use the normal ``/dbname/_all_docs`` to read documents from
-multiple partitions.
+multiple partitions. Both query styles have the same performance.
 
 Next, we'll create a design document containing our index for
 getting all readings from a given sensor. The map function is similar to
@@ -280,8 +286,7 @@ id.
         }
     }
 
-We can upload our design document and try out a partitioned
-query:
+After uploading our design document, we can try out a partitioned query:
 
 .. code-block:: bash
 
@@ -294,6 +299,12 @@ query:
             }
         }
     }
+    shell> curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/my_new_db -d @ddoc2.json
+    {
+        "ok": true,
+        "id": "_design/all_sensors",
+        "rev": "1-4a8188d80fab277fccf57bdd7154dec1"
+    }
     shell> curl http://127.0.0.1:5984/my_new_db/_partition/sensor-260/_design/sensor-readings/_view/by_sensor
     {"total_rows":4,"offset":0,"rows":[
     {"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","0"],"value":null},
@@ -306,7 +317,7 @@ Hooray! Our first partitioned query. For experienced users, that may not
 be the most exciting development, given that the only things that have
 changed are a slight tweak to the document id, and accessing views with
 a slightly different path. However, for anyone who likes performance
-improvements, its actually a big deal. By knowing that the view results
+improvements, it's actually a big deal. By knowing that the view results
 are all located within the provided partition name, our partitioned
 queries now perform nearly as fast as document lookups!
 
@@ -325,7 +336,7 @@ version:
         emit(doc.field_name, doc.sensor_id)
     }
 
-Next we'll create a new design doc with this function. Be sure to notice
+Next, we'll create a new design doc with this function. Be sure to notice
 that the ``"options"`` member contains ``"partitioned": false``.
 
 .. code-block:: bash
@@ -361,7 +372,7 @@ that the ``"options"`` member contains ``"partitioned": false``.
     Design documents are either partitioned or global. They cannot
     contain a mix of partitioned and global indexes.
 
-And to see a request showing us all sensors in a field we would use a
+And to see a request showing us all sensors in a field, we would use a
 request like:
 
 .. code-block:: bash

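The revised note on document ids says the partition is everything before the first colon and the document id is everything after it, possibly including further colons. A minimal sketch of that rule follows, for illustration only: `splitPartitionedId` is a hypothetical helper, not part of CouchDB, and it assumes the id belongs to a regular document (not a `_design` or `_local` document).

```javascript
// Hypothetical helper (not a CouchDB API): split a partitioned document id
// at the FIRST colon, as described in the note on the ``partition:docid`` format.
function splitPartitionedId(id) {
  const idx = id.indexOf(":");
  if (idx === -1) {
    // Assumption: callers only pass regular document ids, which must contain a colon.
    throw new Error(`not a partitioned document id: ${id}`);
  }
  return {
    partition: id.slice(0, idx), // everything before the first colon
    docid: id.slice(idx + 1),    // everything after it, later colons included
  };
}

const parts = splitPartitionedId("sensor-260:reading:2019-07-25");
console.log(parts.partition); // sensor-260
console.log(parts.docid);     // reading:2019-07-25
```

Splitting on the first colon only (rather than using `id.split(":")`) is what keeps later colons inside the document id.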