cockroachdb · rmloveland · Jan 9, 2020 · Dec 3, 2019
diff --git a/v19.1/limit-offset.md b/v19.1/limit-offset.md
@@ -26,6 +26,10 @@ as part of [`INSERT`](insert.html) or [`UPSERT`](upsert.html).
 `OFFSET` restricts the operation to skip the first `offset_value` number of rows.
 It is often used in conjunction with `LIMIT` to "paginate" through retrieved rows.
 
+{{site.data.alerts.callout_danger}}
+Using `LIMIT`/`OFFSET` to implement pagination can be very slow for large tables.  We recommend using [keyset pagination](selection-queries.html#pagination-example) instead.
+{{site.data.alerts.end}}
+
 For PostgreSQL compatibility, CockroachDB also supports `FETCH FIRST
 limit_val ROWS ONLY` and `FETCH NEXT limit_val ROWS ONLY` as aliases
 for `LIMIT`. If `limit_val` is omitted, then one row is fetched.

diff --git a/v19.1/selection-queries.md b/v19.1/selection-queries.md
@@ -337,26 +337,167 @@ LIMIT 5;
 
 ### Paginate through limited results
 
-If you want to limit the number of results, but go beyond the initial set, use `OFFSET` to proceed to the next set of results. This is often used to paginate through large tables where not all of the values need to be immediately retrieved.
+To iterate through a table one "page" of results at a time (also known as pagination) there are two options, only one of which is recommended:
+
+- Keyset pagination (**fast, recommended**)
+- `LIMIT` / `OFFSET` pagination (slow, not recommended)
+
+Keyset pagination (also known as the "seek method") is used to fetch a subset of records from a table quickly. It does this by restricting the set of records returned with a combination of `WHERE` and [`LIMIT`](limit-offset.html) clauses. To get the next page, you check the value of the column in the `WHERE` clause against the last row returned in the previous page of results.
+
+The general pattern for keyset pagination queries is:
 
 {% include copy-clipboard.html %}
 ~~~ sql
-> SELECT id, name
-FROM accounts
-LIMIT 5
-OFFSET 5;
+SELECT * FROM t
+  WHERE key > ${value}
+  ORDER BY key
+  LIMIT ${amount}
 ~~~
+
+This is faster than using `LIMIT`/`OFFSET` because, instead of doing a full table scan up to the value of the `OFFSET`, a keyset pagination query looks at a fixed-size set of records for each iteration. This can be done quickly provided that the key used in the `WHERE` clause to implement the pagination is [indexed](indexes.html#best-practices) and [unique](unique.html). A [primary key](primary-key.html) meets both of these criteria.
+
+#### Pagination example
+
+The examples in this section use the [employees data set](https://github.com/datacharmer/test_db), which you can load into CockroachDB as follows:
+
+{% include copy-clipboard.html %}
+~~~ sql
+CREATE DATABASE IF NOT EXISTS employees;
+USE employees;
+IMPORT PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees-full.sql.gz';
 ~~~
-+----+------------------+
-| id |       name       |
-+----+------------------+
-|  6 | Juno Studwick    |
-|  7 | Juno Studwick    |
-|  8 | Eutychia Roberts |
-|  9 | Ricarda Moriarty |
-| 10 | Henrik Brankovic |
-+----+------------------+
+
+To get the first page of results using keyset pagination, run:
+
+{% include copy-clipboard.html %}
+~~~ sql
+SELECT * FROM employees WHERE emp_no > 10000 LIMIT 25;
+~~~
+
+~~~
+  emp_no |        birth_date         | first_name |  last_name  | gender |         hire_date          
++--------+---------------------------+------------+-------------+--------+---------------------------+
+   10001 | 1953-09-02 00:00:00+00:00 | Georgi     | Facello     | M      | 1986-06-26 00:00:00+00:00  
+   10002 | 1964-06-02 00:00:00+00:00 | Bezalel    | Simmel      | F      | 1985-11-21 00:00:00+00:00  
+   10003 | 1959-12-03 00:00:00+00:00 | Parto      | Bamford     | M      | 1986-08-28 00:00:00+00:00  
+   ... snip
+
+(25 rows)
+
+Time: 1.31ms
+~~~
+
+{{site.data.alerts.callout_success}}
+If you don't know what the minimum value of the key is, either `SELECT min(key) FROM table` or use a known minimum value for the key's data type.
+{{site.data.alerts.end}}
+
+To get the second page of results, run:
+
+{% include copy-clipboard.html %}
+~~~ sql
+SELECT * FROM employees WHERE emp_no > 10025 LIMIT 25;
+~~~
+
+~~~
+  emp_no |        birth_date         | first_name | last_name  | gender |         hire_date          
++--------+---------------------------+------------+------------+--------+---------------------------+
+   10026 | 1953-04-03 00:00:00+00:00 | Yongqiao   | Berztiss   | M      | 1995-03-20 00:00:00+00:00  
+   10027 | 1962-07-10 00:00:00+00:00 | Divier     | Reistad    | F      | 1989-07-07 00:00:00+00:00  
+   10028 | 1963-11-26 00:00:00+00:00 | Domenick   | Tempesti   | M      | 1991-10-22 00:00:00+00:00  
+   ... snip!
+
+(25 rows)
+
+Time: 1.473ms
+~~~
+
+To get an arbitrary page of results showing employees whose IDs (`emp_no`) are in a much higher range, try the following query. Note that it takes about the same amount of time to run as the previous queries.
+
+{% include copy-clipboard.html %}
+~~~ sql
+SELECT * FROM employees WHERE emp_no > 300025 LIMIT 25;
+~~~
+
 ~~~
+  emp_no |        birth_date         | first_name |  last_name   | gender |         hire_date          
++--------+---------------------------+------------+--------------+--------+---------------------------+
+  400000 | 1963-11-29 00:00:00+00:00 | Mitsuyuki  | Reinhart     | M      | 1985-08-27 00:00:00+00:00  
+  400001 | 1962-06-02 00:00:00+00:00 | Rosalie    | Chinin       | M      | 1986-11-28 00:00:00+00:00  
+  400002 | 1964-08-16 00:00:00+00:00 | Quingbo    | Birnbaum     | F      | 1986-04-23 00:00:00+00:00  
+  ... snip!
+
+(25 rows)
+
+Time: 1.319ms
+~~~
+
+Compare the execution speed of the previous keyset pagination queries with the query below that uses `LIMIT` / `OFFSET` to get the same page of results:
+
+{% include copy-clipboard.html %}
+~~~ sql
+SELECT * FROM employees LIMIT 25 OFFSET 200024;
+~~~
+
+~~~
+  emp_no |        birth_date         | first_name |  last_name   | gender |         hire_date          
++--------+---------------------------+------------+--------------+--------+---------------------------+
+  400000 | 1963-11-29 00:00:00+00:00 | Mitsuyuki  | Reinhart     | M      | 1985-08-27 00:00:00+00:00  
+  400001 | 1962-06-02 00:00:00+00:00 | Rosalie    | Chinin       | M      | 1986-11-28 00:00:00+00:00  
+  400002 | 1964-08-16 00:00:00+00:00 | Quingbo    | Birnbaum     | F      | 1986-04-23 00:00:00+00:00  
+  ... snip!
+
+(25 rows)
+
+Time: 118.114ms
+~~~
+
+The query using `LIMIT`/`OFFSET` for pagination is almost 100 times slower. To see why, let's use [`EXPLAIN`](explain.html).
+
+{% include copy-clipboard.html %}
+~~~ sql
+EXPLAIN SELECT * FROM employees LIMIT 25 OFFSET 200024;
+~~~
+
+~~~
+    tree    |    field    |    description
++-----------+-------------+-------------------+
+            | distributed | true
+            | vectorized  | true
+  limit     |             |
+   │        | offset      | 200024
+   └── scan |             |
+            | table       | employees@primary
+            | spans       | ALL
+            | limit       | 200049
+~~~
+
+The culprit is this: because we used `LIMIT`/`OFFSET`, we are performing a full table scan (see `spans` = `ALL` above) from the first record all the way up to the value of the offset. In other words, we are iterating over a big array of rows from 1 to *n*, where *n* is 200049.
+
+Meanwhile, the keyset pagination queries are looking at a much smaller range of table spans, which is much faster (see `spans` = `300026-` + 25 below). Because [there is an index on every column in the `WHERE` clause](indexes.html#best-practices), these queries are doing an index lookup to jump to the start of the page of results, and then getting an additional 25 rows from there. This is much faster.
+
+{% include copy-clipboard.html %}
+~~~ sql
+EXPLAIN SELECT * FROM employees WHERE emp_no > 300025 LIMIT 25;
+~~~
+
+~~~
+  tree |    field    |    description
++------+-------------+-------------------+
+       | distributed | false
+       | vectorized  | false
+  scan |             |
+       | table       | employees@primary
+       | spans       | /300026-
+       | limit       | 25
+~~~
+
+{{site.data.alerts.callout_danger}}
+Using a sequential (i.e., non-[UUID](uuid.html)) primary key creates hot spots in the database for write-heavy workloads, since concurrent [`INSERT`](insert.html)s to the table will attempt to write to the same (or nearby) underlying ranges. This can be mitigated by designing your schema with [multi-column primary keys which include a monotonically increasing column](performance-best-practices-overview.html#use-multi-column-primary-keys).
+{{site.data.alerts.end}}
+
+{{site.data.alerts.callout_info}}
+CockroachDB does not implement cursors. For a scale-out system like CockroachDB, using a cursor would not be recommended for the same reason that paginating with `LIMIT`/`OFFSET` is not recommended: it forces the server to keep track of state, which means the pagination queries don't scale well.
+{{site.data.alerts.end}}
 
 ## Composability
 

diff --git a/v19.2/limit-offset.md b/v19.2/limit-offset.md
@@ -26,6 +26,10 @@ as part of [`INSERT`](insert.html) or [`UPSERT`](upsert.html).
 `OFFSET` restricts the operation to skip the first `offset_value` number of rows.
 It is often used in conjunction with `LIMIT` to "paginate" through retrieved rows.
 
+{{site.data.alerts.callout_danger}}
+Using `LIMIT`/`OFFSET` to implement pagination can be very slow for large tables.  We recommend using [keyset pagination](selection-queries.html#pagination-example) instead.
+{{site.data.alerts.end}}
+
 For PostgreSQL compatibility, CockroachDB also supports `FETCH FIRST
 limit_val ROWS ONLY` and `FETCH NEXT limit_val ROWS ONLY` as aliases
 for `LIMIT`. If `limit_val` is omitted, then one row is fetched.

diff --git a/v19.2/selection-queries.md b/v19.2/selection-queries.md
@@ -338,26 +338,167 @@ LIMIT 5;
 
 ### Paginate through limited results
 
-If you want to limit the number of results, but go beyond the initial set, use `OFFSET` to proceed to the next set of results. This is often used to paginate through large tables where not all of the values need to be immediately retrieved.
+To iterate through a table one "page" of results at a time (also known as pagination) there are two options, only one of which is recommended:
+
+- Keyset pagination (**fast, recommended**)
+- `LIMIT` / `OFFSET` pagination (slow, not recommended)
+
+Keyset pagination (also known as the "seek method") is used to fetch a subset of records from a table quickly. It does this by restricting the set of records returned with a combination of `WHERE` and [`LIMIT`](limit-offset.html) clauses. To get the next page, you check the value of the column in the `WHERE` clause against the last row returned in the previous page of results.
+
+The general pattern for keyset pagination queries is:
 
 {% include copy-clipboard.html %}
 ~~~ sql
-> SELECT id, name
-FROM accounts
-LIMIT 5
-OFFSET 5;
+SELECT * FROM t
+  WHERE key > ${value}
+  ORDER BY key
+  LIMIT ${amount}
 ~~~
+
+This is faster than using `LIMIT`/`OFFSET` because, instead of doing a full table scan up to the value of the `OFFSET`, a keyset pagination query looks at a fixed-size set of records for each iteration. This can be done quickly provided that the key used in the `WHERE` clause to implement the pagination is [indexed](indexes.html#best-practices) and [unique](unique.html). A [primary key](primary-key.html) meets both of these criteria.
+
+#### Pagination example
+
+The examples in this section use the [employees data set](https://github.com/datacharmer/test_db), which you can load into CockroachDB as follows:
+
+{% include copy-clipboard.html %}
+~~~ sql
+CREATE DATABASE IF NOT EXISTS employees;
+USE employees;
+IMPORT PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees-full.sql.gz';
 ~~~
-+----+------------------+
-| id |       name       |
-+----+------------------+
-|  6 | Juno Studwick    |
-|  7 | Juno Studwick    |
-|  8 | Eutychia Roberts |
-|  9 | Ricarda Moriarty |
-| 10 | Henrik Brankovic |
-+----+------------------+
+
+To get the first page of results using keyset pagination, run:
+
+{% include copy-clipboard.html %}
+~~~ sql
+SELECT * FROM employees WHERE emp_no > 10000 LIMIT 25;
+~~~
+
+~~~
+  emp_no |        birth_date         | first_name |  last_name  | gender |         hire_date          
++--------+---------------------------+------------+-------------+--------+---------------------------+
+   10001 | 1953-09-02 00:00:00+00:00 | Georgi     | Facello     | M      | 1986-06-26 00:00:00+00:00  
+   10002 | 1964-06-02 00:00:00+00:00 | Bezalel    | Simmel      | F      | 1985-11-21 00:00:00+00:00  
+   10003 | 1959-12-03 00:00:00+00:00 | Parto      | Bamford     | M      | 1986-08-28 00:00:00+00:00  
+   ... snip
+
+(25 rows)
+
+Time: 1.31ms
+~~~
+
+{{site.data.alerts.callout_success}}
+If you don't know what the minimum value of the key is, either `SELECT min(key) FROM table` or use a known minimum value for the key's data type.
+{{site.data.alerts.end}}
+
+To get the second page of results, run:
+
+{% include copy-clipboard.html %}
+~~~ sql
+SELECT * FROM employees WHERE emp_no > 10025 LIMIT 25;
+~~~
+
+~~~
+  emp_no |        birth_date         | first_name | last_name  | gender |         hire_date          
++--------+---------------------------+------------+------------+--------+---------------------------+
+   10026 | 1953-04-03 00:00:00+00:00 | Yongqiao   | Berztiss   | M      | 1995-03-20 00:00:00+00:00  
+   10027 | 1962-07-10 00:00:00+00:00 | Divier     | Reistad    | F      | 1989-07-07 00:00:00+00:00  
+   10028 | 1963-11-26 00:00:00+00:00 | Domenick   | Tempesti   | M      | 1991-10-22 00:00:00+00:00  
+   ... snip!
+
+(25 rows)
+
+Time: 1.473ms
+~~~
+
+To get an arbitrary page of results showing employees whose IDs (`emp_no`) are in a much higher range, try the following query. Note that it takes about the same amount of time to run as the previous queries.
+
+{% include copy-clipboard.html %}
+~~~ sql
+SELECT * FROM employees WHERE emp_no > 300025 LIMIT 25;
+~~~
+
 ~~~
+  emp_no |        birth_date         | first_name |  last_name   | gender |         hire_date          
++--------+---------------------------+------------+--------------+--------+---------------------------+
+  400000 | 1963-11-29 00:00:00+00:00 | Mitsuyuki  | Reinhart     | M      | 1985-08-27 00:00:00+00:00  
+  400001 | 1962-06-02 00:00:00+00:00 | Rosalie    | Chinin       | M      | 1986-11-28 00:00:00+00:00  
+  400002 | 1964-08-16 00:00:00+00:00 | Quingbo    | Birnbaum     | F      | 1986-04-23 00:00:00+00:00  
+  ... snip!
+
+(25 rows)
+
+Time: 1.319ms
+~~~
+
+Compare the execution speed of the previous keyset pagination queries with the query below that uses `LIMIT` / `OFFSET` to get the same page of results:
+
+{% include copy-clipboard.html %}
+~~~ sql
+SELECT * FROM employees LIMIT 25 OFFSET 200024;
+~~~
+
+~~~
+  emp_no |        birth_date         | first_name |  last_name   | gender |         hire_date          
++--------+---------------------------+------------+--------------+--------+---------------------------+
+  400000 | 1963-11-29 00:00:00+00:00 | Mitsuyuki  | Reinhart     | M      | 1985-08-27 00:00:00+00:00  
+  400001 | 1962-06-02 00:00:00+00:00 | Rosalie    | Chinin       | M      | 1986-11-28 00:00:00+00:00  
+  400002 | 1964-08-16 00:00:00+00:00 | Quingbo    | Birnbaum     | F      | 1986-04-23 00:00:00+00:00  
+  ... snip!
+
+(25 rows)
+
+Time: 118.114ms
+~~~
+
+The query using `LIMIT`/`OFFSET` for pagination is almost 100 times slower. To see why, let's use [`EXPLAIN`](explain.html).
+
+{% include copy-clipboard.html %}
+~~~ sql
+EXPLAIN SELECT * FROM employees LIMIT 25 OFFSET 200024;
+~~~
+
+~~~
+    tree    |    field    |    description
++-----------+-------------+-------------------+
+            | distributed | true
+            | vectorized  | true
+  limit     |             |
+   │        | offset      | 200024
+   └── scan |             |
+            | table       | employees@primary
+            | spans       | ALL
+            | limit       | 200049
+~~~
+
+The culprit is this: because we used `LIMIT`/`OFFSET`, we are performing a full table scan (see `spans` = `ALL` above) from the first record all the way up to the value of the offset. In other words, we are iterating over a big array of rows from 1 to *n*, where *n* is 200049.
+
+Meanwhile, the keyset pagination queries are looking at a much smaller range of table spans, which is much faster (see `spans` = `300026-` + 25 below). Because [there is an index on every column in the `WHERE` clause](indexes.html#best-practices), these queries are doing an index lookup to jump to the start of the page of results, and then getting an additional 25 rows from there. This is much faster.
+
+{% include copy-clipboard.html %}
+~~~ sql
+EXPLAIN SELECT * FROM employees WHERE emp_no > 300025 LIMIT 25;
+~~~
+
+~~~
+  tree |    field    |    description
++------+-------------+-------------------+
+       | distributed | false
+       | vectorized  | false
+  scan |             |
+       | table       | employees@primary
+       | spans       | /300026-
+       | limit       | 25
+~~~
+
+{{site.data.alerts.callout_danger}}
+Using a sequential (i.e., non-[UUID](uuid.html)) primary key creates hot spots in the database for write-heavy workloads, since concurrent [`INSERT`](insert.html)s to the table will attempt to write to the same (or nearby) underlying [ranges](architecture/overview.html#architecture-range). This can be mitigated by designing your schema with [multi-column primary keys which include a monotonically increasing column](performance-best-practices-overview.html#use-multi-column-primary-keys).
+{{site.data.alerts.end}}
+
+{{site.data.alerts.callout_info}}
+CockroachDB does not implement cursors. For a scale-out system like CockroachDB, using a cursor would not be recommended for the same reason that paginating with `LIMIT`/`OFFSET` is not recommended: it forces the server to keep track of state, which means the pagination queries don't scale well.
+{{site.data.alerts.end}}
 
 ## Composability
 

diff --git a/v20.1/limit-offset.md b/v20.1/limit-offset.md
@@ -26,6 +26,10 @@ as part of [`INSERT`](insert.html) or [`UPSERT`](upsert.html).
 `OFFSET` restricts the operation to skip the first `offset_value` number of rows.
 It is often used in conjunction with `LIMIT` to "paginate" through retrieved rows.
 
+{{site.data.alerts.callout_danger}}
+Using `LIMIT`/`OFFSET` to implement pagination can be very slow for large tables.  We recommend using [keyset pagination](selection-queries.html#pagination-example) instead.
+{{site.data.alerts.end}}
+
 For PostgreSQL compatibility, CockroachDB also supports `FETCH FIRST
 limit_val ROWS ONLY` and `FETCH NEXT limit_val ROWS ONLY` as aliases
 for `LIMIT`. If `limit_val` is omitted, then one row is fetched.