This repository was archived by the owner on Oct 17, 2022. It is now read-only.

Commit 347a6ce

lbboekocolosk
authored and committed
Update database partitions doc with review changes.
1 parent fe3000e commit 347a6ce

3 files changed

+46
-39
lines changed

src/api/database/common.rst

Lines changed: 2 additions & 2 deletions
Original file line number, Diff line number, Diff line change
@@ -86,8 +86,8 @@
8686
:>json string update_seq: An opaque string that describes the state
8787
of the database. Do not rely on this string for counting the number
8888
of updates.
89-
:>json boolean props.partitioned: (optional) If present and true this
90-
indicates the the database is partitioned.
89+
:>json boolean props.partitioned: (optional) If present and true, this
90+
indicates that the database is partitioned.
9191
:code 200: Request completed successfully
9292
:code 404: Requested database not found
9393

src/api/partitioned-dbs.rst

Lines changed: 1 addition & 5 deletions
Original file line number, Diff line number, Diff line change
@@ -65,7 +65,6 @@ See the guide for
6565
}
6666
}
6767
68-
6968
``/db/_partition/partition/_all_docs``
7069
======================================
7170

@@ -77,7 +76,7 @@ See the guide for
7776

7877
This endpoint is a convenience endpoint for automatically setting
7978
bounds on the provided partition range. Similar results can be had
80-
by using the global ``/db/_all_docs`` end point with appropriately
79+
by using the global ``/db/_all_docs`` endpoint with appropriately
8180
configured values for ``start_key`` and ``end_key``.
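
As a rough illustration of the equivalence described above, the sketch below builds a global ``/db/_all_docs`` path whose key range covers a single partition. The helper name ``allDocsPathForPartition`` is hypothetical. The choice of ``";"`` as the upper bound is an assumption: it relies on document ids being compared in raw byte order, where ``";"`` is the character immediately after ``":"``.

```javascript
// Hypothetical helper, not part of CouchDB: build a global _all_docs
// request path whose key range is limited to one partition.
function allDocsPathForPartition(db, partition) {
  // Assumption: ids sort in raw byte order, so every "partition:..." id
  // falls between "partition:" and "partition;" (";" follows ":" in ASCII).
  const startKey = JSON.stringify(partition + ":");
  const endKey = JSON.stringify(partition + ";");
  // Keys are JSON values, so they are JSON-encoded and then URL-encoded.
  const query =
    "start_key=" + encodeURIComponent(startKey) +
    "&end_key=" + encodeURIComponent(endKey) +
    "&inclusive_end=false";
  return "/" + encodeURIComponent(db) + "/_all_docs?" + query;
}

console.log(allDocsPathForPartition("my_new_db", "sensor-260"));
```

The partition endpoint saves the caller from computing these bounds by hand, which is the convenience the paragraph above refers to.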
8281

8382
Refer to the :ref:`view endpoint <api/ddoc/view>` documentation for
@@ -118,8 +117,6 @@ See the guide for
118117
"total_rows": 1
119118
}
120119
121-
122-
123120
``/db/_partition/partition/_design/design-doc/_view/view-name``
124121
===============================================================
125122

@@ -157,7 +154,6 @@ See the guide for
157154
Server: CouchDB (Erlang/OTP)
158155
Transfer-Encoding: chunked
159156
160-
161157
{
162158
"offset": 0,
163159
"rows": [

src/partitioned-dbs/index.rst

Lines changed: 43 additions & 32 deletions
Original file line number, Diff line number, Diff line change
@@ -16,8 +16,16 @@
1616
Partitioned Databases
1717
=====================
1818

19-
As a means to introducing partitioned databases we'll consider a motivating
20-
use case to describe the benefits of this feature. For this example we'll
19+
A partitioned database groups documents into logical partitions by using
20+
a partition key. All documents are assigned to a partition, and many documents
21+
are typically given the same partition key. The benefit of partitioned databases
22+
is that secondary indices can be significantly more efficient when locating
23+
matching documents since their entries are contained within their partition.
24+
This means a given secondary index read will only scan a single partition
25+
range instead of having to read from a copy of every shard.
26+
27+
As a means of introducing partitioned databases, we'll consider a motivating
28+
use case to describe the benefits of this feature. For this example, we'll
2129
consider a database that stores readings from a large network of soil
2230
moisture sensors.
2331

@@ -26,7 +34,6 @@ moisture sensors.
2634
:ref:`theory <cluster/theory>` of :ref:`sharding <cluster/sharding>`
2735
in CouchDB.
2836

29-
3037
Traditionally, a document in this database may have something like the
3138
following structure:
3239

@@ -46,17 +53,15 @@ following structure:
4653
]
4754
}
4855
49-
5056
.. note::
5157
While this example uses IoT sensors, the main thing to consider is that
5258
there is a logical grouping of documents. Similar use cases might be
5359
documents grouped by user or scientific data grouped by experiment.
5460

55-
5661
So we've got a bunch of sensors, all grouped by the field they monitor
5762
along with their readouts for a given day (or other appropriate time period).
5863

59-
Along with our documents we might expect to have two secondary indexes
64+
Along with our documents, we might expect to have two secondary indexes
6065
for querying our database that might look something like:
6166

6267
.. code-block:: javascript
@@ -81,17 +86,17 @@ and:
8186
emit(doc.field_name, doc.sensor_id)
8287
}
8388
84-
With these two indexes defined we can easily find all requests for a given
89+
With these two indexes defined, we can easily find all readings for a given
8590
sensor, or list all sensors in a given field.
8691

8792
Unfortunately, in CouchDB, when we read from either of these indexes, it
8893
requires finding a copy of every shard and asking for any documents related
8994
to the particular sensor or field. This means that as our database scales
90-
up the number of shards, every index request must perform more work.
95+
up the number of shards, every index request must perform more work,
96+
which is unnecessary since we are only interested in a small number of documents.
9197
Fortunately for you, dear reader, partitioned databases were created to solve
9298
this precise problem.
9399

94-
95100
What is a partition?
96101
====================
97102

@@ -101,34 +106,35 @@ use case, it's quite logical to group all documents by their ``sensor_id``
101106
field. In this case, we would call the ``sensor_id`` the partition.
102107

103108
A good partition has two basic properties. First, it should have a high
104-
cardinality. That is, there is a large number of values for the partition.
105-
A database that has a single partition would be an anti-pattern for this
106-
feature. Secondly, the amount of data per partition should be "small". The
107-
general recommendation is to limit individual partitions to less than ten
108-
gigabytes of data. Which, for the example of sensor documents, equates to roughly
109-
60,000 years of data.
110-
109+
cardinality. That is, a large partitioned database should have many more
110+
partitions than documents in any single partition. A database that has
111+
a single partition would be an anti-pattern for this feature. Secondly,
112+
the amount of data per partition should be "small". The general
113+
recommendation is to limit individual partitions to less than ten
114+
gigabytes (10 GB) of data, which, for the example of sensor documents,
115+
equates to roughly 60,000 years of data.
111116

112117
Why use partitions?
113118
===================
114119

115120
The primary benefit of using partitioned databases is for the performance
116121
of partitioned queries. Large databases with lots of documents often
117122
have a similar pattern where there are groups of related documents that
118-
are queried together often.
123+
are queried together.
119124

120125
By using partitions, we can execute queries against these individual groups
121126
of documents more efficiently by placing the entire group within a specific
122127
shard on disk. Thus, the view engine only has to consult one copy of the
123128
given shard range when executing a query instead of executing
124-
the query across all ``Q`` shards in the database.
125-
129+
the query across all ``q`` shards in the database. This means that you do
130+
not have to wait for all ``q`` shards to respond, which is both
131+
more efficient and faster.
126132

127133
Partitions By Example
128134
=====================
129135

130136
To create a partitioned database, we simply need to pass a query string
131-
parameter.
137+
parameter:
132138

133139
.. code-block:: bash
134140
@@ -171,19 +177,19 @@ information:
171177
"update_seq": "0-g1AAAAFDeJzLYWBg4M..."
172178
}
173179
174-
175180
You'll now see that the ``"props"`` member contains ``"partitioned": true``.
176181

177182
.. note::
178183

179-
The format for document ids in a partitioned database is
180-
``partition:docid``. Every regular document (i.e., everything
181-
except design and local documents) in a partitioned database
182-
must follow this format.
184+
Every document in a partitioned database (except _design
185+
and _local documents) must have an id of the form ``partition:docid``.
186+
More specifically, the partition for a given document is
187+
everything before the first colon. The document id is everything
188+
after the first colon, which may include more colons.
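
The first-colon rule in this note can be sketched as follows. ``splitDocId`` is a hypothetical helper, not a CouchDB function. Rejecting ids with no colon or with an empty partition is an assumption of this sketch, and ``_design`` and ``_local`` ids are deliberately not handled.

```javascript
// Hypothetical helper, not part of CouchDB: split a partitioned document
// id into its partition and document id at the FIRST colon only.
function splitDocId(id) {
  const idx = id.indexOf(":");
  if (idx <= 0) {
    // No colon at all, or nothing before it (assumed invalid here).
    throw new Error("not a partitioned document id: " + id);
  }
  return {
    partition: id.slice(0, idx),
    // Everything after the first colon, including any further colons.
    docid: id.slice(idx + 1),
  };
}

console.log(splitDocId("sensor-260:sensor-reading-ca33c748"));
console.log(splitDocId("sensor-260:a:b"));
```

For ``"sensor-260:a:b"`` the partition is ``sensor-260`` and the document id is ``a:b``, since only the first colon acts as the separator.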
183189

184190
.. note::
185191

186-
System databases are *not* allowed to be partitioned. This is
192+
System databases (such as _users) are *not* allowed to be partitioned. This is
187193
due to system databases already having their own incompatible
188194
requirements on document ids.
189195

@@ -262,7 +268,7 @@ Note that we can use all of the normal bells and whistles available to
262268
``/dbname/_partition/name/_all_docs`` endpoint is mostly a convenience
263269
so that requests are guaranteed to be scoped to a given partition. Users
264270
are free to use the normal ``/dbname/_all_docs`` to read documents from
265-
multiple partitions.
271+
multiple partitions. Both query styles have the same performance.
266272

267273
Next, we'll create a design document containing our index for
268274
getting all readings from a given sensor. The map function is similar to
@@ -280,8 +286,7 @@ id.
280286
}
281287
}
282288
283-
We can upload our design document and try out a partitioned
284-
query:
289+
After uploading our design document, we can try out a partitioned query:
285290

286291
.. code-block:: bash
287292
@@ -294,6 +299,12 @@ query:
294299
}
295300
}
296301
}
302+
shell> curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/my_new_db -d @ddoc2.json
303+
{
304+
"ok": true,
305+
"id": "_design/all_sensors",
306+
"rev": "1-4a8188d80fab277fccf57bdd7154dec1"
307+
}
297308
shell> curl http://127.0.0.1:5984/my_new_db/_partition/sensor-260/_design/sensor-readings/_view/by_sensor
298309
{"total_rows":4,"offset":0,"rows":[
299310
{"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","0"],"value":null},
@@ -306,7 +317,7 @@ Hooray! Our first partitioned query. For experienced users, that may not
306317
be the most exciting development, given that the only things that have
307318
changed are a slight tweak to the document id, and accessing views with
308319
a slightly different path. However, for anyone who likes performance
309-
improvements, its actually a big deal. By knowing that the view results
320+
improvements, it's actually a big deal. By knowing that the view results
310321
are all located within the provided partition name, our partitioned
311322
queries now perform nearly as fast as document lookups!
312323

@@ -325,7 +336,7 @@ version:
325336
emit(doc.field_name, doc.sensor_id)
326337
}
327338
328-
Next we'll create a new design doc with this function. Be sure to notice
339+
Next, we'll create a new design doc with this function. Be sure to notice
329340
that the ``"options"`` member contains ``"partitioned": false``.
330341

331342
.. code-block:: bash
@@ -361,7 +372,7 @@ that the ``"options"`` member contains ``"partitioned": false``.
361372
Design documents are either partitioned or global. They cannot
362373
contain a mix of partitioned and global indexes.
363374

364-
And to see a request showing us all sensors in a field we would use a
375+
And to see a request showing us all sensors in a field, we would use a
365376
request like:
366377

367378
.. code-block:: bash
