1616Partitioned Databases
1717=====================
1818
19- As a means to introducing partitioned databases we'll consider a motivating
20- use case to describe the benefits of this feature. For this example we'll
19+ A partitioned database groups documents into logical partitions by using
20+ a partition key. All documents are assigned to a partition, and many documents
21+ are typically given the same partition key. The benefit of partitioned databases
22+ is that secondary indices can be significantly more efficient when locating
23+ matching documents since their entries are contained within their partition.
24+ This means a given secondary index read will only scan a single partition
25+ range instead of having to read from a copy of every shard.
26+
27+ As a means of introducing partitioned databases, we'll consider a motivating
28+ use case to describe the benefits of this feature. For this example, we'll
2129consider a database that stores readings from a large network of soil
2230moisture sensors.
2331
@@ -26,7 +34,6 @@ moisture sensors.
2634 :ref:`theory <cluster/theory>` of :ref:`sharding <cluster/sharding>`
2735 in CouchDB.
2836
29-
3037Traditionally, a document in this database may have something like the
3138following structure:
3239
@@ -46,17 +53,15 @@ following structure:
4653 ]
4754 }
4855
49-
5056 .. note::
5157 While this example uses IoT sensors, the main thing to consider is that
5258 there is a logical grouping of documents. Similar use cases might be
5359 documents grouped by user or scientific data grouped by experiment.
5460
55-
5661So we've got a bunch of sensors, all grouped by the field they monitor
5762along with their readouts for a given day (or other appropriate time period).
5863
59- Along with our documents we might expect to have two secondary indexes
64+ Along with our documents, we might expect to have two secondary indexes
6065for querying our database that might look something like:
6166
6267.. code-block:: javascript
8186 emit(doc.field_name, doc.sensor_id)
8287 }
8388
84- With these two indexes defined we can easily find all requests for a given
89+ With these two indexes defined, we can easily find all readings for a given
8590sensor, or list all sensors in a given field.
8691
8792Unfortunately, in CouchDB, when we read from either of these indexes, it
8893requires finding a copy of every shard and asking for any documents related
8994to the particular sensor or field. This means that as our database scales
90- up the number of shards, every index request must perform more work.
95+ up the number of shards, every index request must perform more work,
96+ which is unnecessary since we are only interested in a small number of documents.
9197Fortunately for you, dear reader, partitioned databases were created to solve
9298this precise problem.
9399
94-
95100What is a partition?
96101====================
97102
@@ -101,34 +106,35 @@ use case, it's quite logical to group all documents by their ``sensor_id``
101106field. In this case, we would call the ``sensor_id`` the partition.
102107
103108A good partition has two basic properties. First, it should have a high
104- cardinality. That is, there is a large number of values for the partition.
105- A database that has a single partition would be an anti-pattern for this
106- feature. Secondly, the amount of data per partition should be "small". The
107- general recommendation is to limit individual partitions to less than ten
108- gigabytes of data. Which, for the example of sensor documents, equates to roughly
109- 60,000 years of data.
110-
109+ cardinality. That is, a large partitioned database should have many more
110+ partitions than documents in any single partition. A database that has
111+ a single partition would be an anti-pattern for this feature. Secondly,
112+ the amount of data per partition should be "small". The general
113+ recommendation is to limit individual partitions to less than ten
114+ gigabytes (10 GB) of data. For the example of sensor documents, that
115+ equates to roughly 60,000 years of data.
111116
112117Why use partitions?
113118===================
114119
115120The primary benefit of using partitioned databases is for the performance
116121of partitioned queries. Large databases with lots of documents often
117122have a similar pattern where there are groups of related documents that
118- are queried together often .
123+ are queried together.
119124
120125By using partitions, we can execute queries against these individual groups
121126of documents more efficiently by placing the entire group within a specific
122127shard on disk. Thus, the view engine only has to consult one copy of the
123128given shard range when executing a query instead of executing
124- the query across all ``Q `` shards in the database.
125-
129+ the query across all ``q`` shards in the database. This means that you do
130+ not have to wait for all ``q`` shards to respond, which is both
131+ more efficient and faster.
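
To make the routing idea concrete, here is a minimal JavaScript sketch. It assumes a toy string hash, a hypothetical ``shardFor`` helper, and an arbitrary ``q`` of 8. CouchDB's real hashing and shard-range layout differ, so treat this only as an illustration of why documents that share a partition end up in the same shard range.

```javascript
// Toy illustration only: NOT CouchDB's real hash function or shard layout.
// `toyHash`, `shardFor`, and `q = 8` are hypothetical names and values.
function toyHash(str) {
  let h = 0;
  for (let i = 0; i < str.length; i++) {
    // Keep the running value an unsigned 32-bit integer.
    h = (Math.imul(h, 31) + str.charCodeAt(i)) >>> 0;
  }
  return h;
}

// In a partitioned database we hash only the partition (the text before the
// first colon); otherwise we hash the whole document id.
function shardFor(docId, q, partitioned) {
  const colon = docId.indexOf(":");
  const key = partitioned && colon !== -1 ? docId.slice(0, colon) : docId;
  return toyHash(key) % q;
}

const q = 8;
const ids = ["sensor-260:reading-1", "sensor-260:reading-2", "sensor-260:reading-3"];
const shards = new Set(ids.map((id) => shardFor(id, q, true)));
console.log(shards.size); // prints 1: every id hashes the same key, "sensor-260"
```

Because only the partition is hashed, a query scoped to ``sensor-260`` needs to consult a single shard range in this sketch, no matter how large ``q`` grows.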
126132
127133Partitions By Example
128134=====================
129135
130136To create a partitioned database, we simply need to pass a query string
131- parameter.
137+ parameter:
132138
133139.. code-block:: bash
134140
@@ -171,19 +177,19 @@ information:
171177 "update_seq": "0-g1AAAAFDeJzLYWBg4M..."
172178 }
173179
174-
175180 You'll now see that the ``"props"`` member contains ``"partitioned": true``.
176181
177182.. note::
178183
179- The format for document ids in a partitioned database is
180- ``partition:docid ``. Every regular document (i.e., everything
181- except design and local documents) in a partitioned database
182- must follow this format.
184+ The id of every document in a partitioned database (except ``_design``
185+ and ``_local`` documents) must have the format ``partition:docid``.
186+ More specifically, the partition for a given document is
187+ everything before the first colon. The document id is everything
188+ after the first colon, which may include more colons.
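
The first-colon rule can be sketched in a few lines of JavaScript. The ``splitDocId`` helper below is hypothetical and is not part of CouchDB. It does not handle ``_design`` or ``_local`` documents, and it simply rejects ids that contain no colon.

```javascript
// Hypothetical helper: split a partitioned document id at the FIRST colon.
// Design and local documents are out of scope for this sketch.
function splitDocId(id) {
  const colon = id.indexOf(":");
  if (colon === -1) {
    throw new Error("not a partitioned document id: " + id);
  }
  return {
    partition: id.slice(0, colon), // everything before the first colon
    docid: id.slice(colon + 1),    // everything after it, later colons included
  };
}

const parts = splitDocId("sensor-260:reading:2019-01-01");
console.log(parts.partition); // prints sensor-260
console.log(parts.docid);     // prints reading:2019-01-01
```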
183189
184190.. note::
185191
186- System databases are *not * allowed to be partitioned. This is
192+ System databases (such as ``_users``) are *not* allowed to be partitioned. This is
187193 due to system databases already having their own incompatible
188194 requirements on document ids.
189195
@@ -262,7 +268,7 @@ Note that we can use all of the normal bells and whistles available to
262268``/dbname/_partition/name/_all_docs`` endpoint is mostly a convenience
263269so that requests are guaranteed to be scoped to a given partition. Users
264270are free to use the normal ``/dbname/_all_docs`` to read documents from
265- multiple partitions.
271+ multiple partitions. Both query styles have the same performance.
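
As a small illustration of how partition-scoped paths are put together, the sketch below builds them in JavaScript. The ``partitionPath`` helper is hypothetical; only the ``/dbname/_partition/name/...`` path shape comes from the text.

```javascript
// Hypothetical helper: build a partition-scoped request path.
// `rest` is the endpoint below the partition and is appended unchanged.
function partitionPath(db, partition, rest) {
  return "/" + encodeURIComponent(db) +
    "/_partition/" + encodeURIComponent(partition) +
    "/" + rest;
}

console.log(partitionPath("my_new_db", "sensor-260", "_all_docs"));
// prints /my_new_db/_partition/sensor-260/_all_docs
```

The same helper could produce a partitioned view path by passing the design document and view portion as ``rest``.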
266272
267273Next, we'll create a design document containing our index for
268274getting all readings from a given sensor. The map function is similar to
280286 }
281287 }
282288
283- We can upload our design document and try out a partitioned
284- query:
289+ After uploading our design document, we can try out a partitioned query:
285290
286291.. code-block:: bash
287292
@@ -294,6 +299,12 @@ query:
294299 }
295300 }
296301 }
302+ shell> curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/my_new_db -d @ddoc2.json
303+ {
304+ "ok": true,
305+ "id": "_design/all_sensors",
306+ "rev": "1-4a8188d80fab277fccf57bdd7154dec1"
307+ }
297308 shell> curl http://127.0.0.1:5984/my_new_db/_partition/sensor-260/_design/sensor-readings/_view/by_sensor
298309 {"total_rows":4,"offset":0,"rows":[
299310 {"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","0"],"value":null},
@@ -306,7 +317,7 @@ Hooray! Our first partitioned query. For experienced users, that may not
306317be the most exciting development, given that the only things that have
307318changed are a slight tweak to the document id, and accessing views with
308319a slightly different path. However, for anyone who likes performance
309- improvements, its actually a big deal. By knowing that the view results
320+ improvements, it's actually a big deal. By knowing that the view results
310321are all located within the provided partition name, our partitioned
311322queries now perform nearly as fast as document lookups!
312323
@@ -325,7 +336,7 @@ version:
325336 emit(doc.field_name, doc.sensor_id)
326337 }
327338
328- Next we'll create a new design doc with this function. Be sure to notice
339+ Next, we'll create a new design doc with this function. Be sure to notice
329340that the ``"options"`` member contains ``"partitioned": false``.
330341
331342.. code-block:: bash
@@ -361,7 +372,7 @@ that the ``"options"`` member contains ``"partitioned": false``.
361372 Design documents are either partitioned or global. They cannot
362373 contain a mix of partitioned and global indexes.
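
As a rough sketch, a global design document for the field index could be assembled like this in JavaScript. The view name ``by_field`` and the ``buildGlobalDesignDoc`` helper are assumptions made for illustration. The ``"options"`` member, the ``emit`` call, and the ``_design/all_sensors`` id are taken from the surrounding examples.

```javascript
// Sketch of a global (non-partitioned) design document.
// The view name "by_field" and this helper are hypothetical.
function buildGlobalDesignDoc() {
  // The map function is never called here; it is only serialized to a string,
  // which is how view functions are stored inside a design document.
  const map = function (doc) {
    emit(doc.field_name, doc.sensor_id);
  };
  return {
    _id: "_design/all_sensors",
    options: { partitioned: false }, // marks every view in this ddoc as global
    views: { by_field: { map: map.toString() } },
  };
}

console.log(JSON.stringify(buildGlobalDesignDoc(), null, 2));
```

Since the ``options`` member applies to the whole design document, a partitioned index would need to live in a separate design document.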
363374
364- And to see a request showing us all sensors in a field we would use a
375+ And to see a request showing us all sensors in a field, we would use a
365376request like:
366377
367378.. code-block:: bash