Skip to content

Update pagination docs to use keyset / seek method #6114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 9, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions v19.1/limit-offset.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,10 @@ as part of [`INSERT`](insert.html) or [`UPSERT`](upsert.html).
`OFFSET` restricts the operation to skip the first `offset_value` number of rows.
It is often used in conjunction with `LIMIT` to "paginate" through retrieved rows.

{{site.data.alerts.callout_danger}}
Using `LIMIT`/`OFFSET` to implement pagination can be very slow for large tables. We recommend using [keyset pagination](selection-queries.html#pagination-example) instead.
{{site.data.alerts.end}}

For PostgreSQL compatibility, CockroachDB also supports `FETCH FIRST
limit_val ROWS ONLY` and `FETCH NEXT limit_val ROWS ONLY` as aliases
for `LIMIT`. If `limit_val` is omitted, then one row is fetched.
Expand Down
169 changes: 155 additions & 14 deletions v19.1/selection-queries.md
Original file line number Diff line number Diff line change
Expand Up @@ -337,26 +337,167 @@ LIMIT 5;

### Paginate through limited results

If you want to limit the number of results, but go beyond the initial set, use `OFFSET` to proceed to the next set of results. This is often used to paginate through large tables where not all of the values need to be immediately retrieved.
To iterate through a table one "page" of results at a time (also known as pagination) there are two options, only one of which is recommended:

- Keyset pagination (**fast, recommended**)
- `LIMIT` / `OFFSET` pagination (slow, not recommended)

Keyset pagination (also known as the "seek method") is used to fetch a subset of records from a table quickly. It does this by restricting the set of records returned with a combination of `WHERE` and [`LIMIT`](limit-offset.html) clauses. To get the next page, you check the value of the column in the `WHERE` clause against the last row returned in the previous page of results.

The general pattern for keyset pagination queries is:

{% include copy-clipboard.html %}
~~~ sql
> SELECT id, name
FROM accounts
LIMIT 5
OFFSET 5;
SELECT * FROM t
WHERE key > ${value}
ORDER BY key
LIMIT ${amount}
~~~

This is faster than using `LIMIT`/`OFFSET` because, instead of doing a full table scan up to the value of the `OFFSET`, a keyset pagination query looks at a fixed-size set of records for each iteration. This can be done quickly provided that the key used in the `WHERE` clause to implement the pagination is [indexed](indexes.html#best-practices) and [unique](unique.html). A [primary key](primary-key.html) meets both of these criteria.

#### Pagination example

The examples in this section use the [employees data set](https://github.com/datacharmer/test_db), which you can load into CockroachDB as follows:

{% include copy-clipboard.html %}
~~~ sql
CREATE DATABASE IF NOT EXISTS employees;
USE employees;
IMPORT PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees-full.sql.gz';
~~~
+----+------------------+
| id | name |
+----+------------------+
| 6 | Juno Studwick |
| 7 | Juno Studwick |
| 8 | Eutychia Roberts |
| 9 | Ricarda Moriarty |
| 10 | Henrik Brankovic |
+----+------------------+

To get the first page of results using keyset pagination, run:

{% include copy-clipboard.html %}
~~~ sql
SELECT * FROM employees WHERE emp_no > 10000 LIMIT 25;
~~~

~~~
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+-------------+--------+---------------------------+
10001 | 1953-09-02 00:00:00+00:00 | Georgi | Facello | M | 1986-06-26 00:00:00+00:00
10002 | 1964-06-02 00:00:00+00:00 | Bezalel | Simmel | F | 1985-11-21 00:00:00+00:00
10003 | 1959-12-03 00:00:00+00:00 | Parto | Bamford | M | 1986-08-28 00:00:00+00:00
... snip

(25 rows)

Time: 1.31ms
~~~

{{site.data.alerts.callout_success}}
If you don't know what the minimum value of the key is, either `SELECT min(key) FROM table` or use a known minimum value for the key's data type.
{{site.data.alerts.end}}

To get the second page of results, run:

{% include copy-clipboard.html %}
~~~ sql
SELECT * FROM employees WHERE emp_no > 10025 LIMIT 25;
~~~

~~~
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+------------+--------+---------------------------+
10026 | 1953-04-03 00:00:00+00:00 | Yongqiao | Berztiss | M | 1995-03-20 00:00:00+00:00
10027 | 1962-07-10 00:00:00+00:00 | Divier | Reistad | F | 1989-07-07 00:00:00+00:00
10028 | 1963-11-26 00:00:00+00:00 | Domenick | Tempesti | M | 1991-10-22 00:00:00+00:00
... snip!

(25 rows)

Time: 1.473ms
~~~

To get an arbitrary page of results showing employees whose IDs (`emp_no`) are in a much higher range, try the following query. Note that it takes about the same amount of time to run as the previous queries.

{% include copy-clipboard.html %}
~~~ sql
SELECT * FROM employees WHERE emp_no > 300025 LIMIT 25;
~~~

~~~
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+--------------+--------+---------------------------+
400000 | 1963-11-29 00:00:00+00:00 | Mitsuyuki | Reinhart | M | 1985-08-27 00:00:00+00:00
400001 | 1962-06-02 00:00:00+00:00 | Rosalie | Chinin | M | 1986-11-28 00:00:00+00:00
400002 | 1964-08-16 00:00:00+00:00 | Quingbo | Birnbaum | F | 1986-04-23 00:00:00+00:00
... snip!

(25 rows)

Time: 1.319ms
~~~

Compare the execution speed of the previous keyset pagination queries with the query below that uses `LIMIT` / `OFFSET` to get the same page of results:

{% include copy-clipboard.html %}
~~~ sql
SELECT * FROM employees LIMIT 25 OFFSET 200024;
~~~

~~~
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+--------------+--------+---------------------------+
400000 | 1963-11-29 00:00:00+00:00 | Mitsuyuki | Reinhart | M | 1985-08-27 00:00:00+00:00
400001 | 1962-06-02 00:00:00+00:00 | Rosalie | Chinin | M | 1986-11-28 00:00:00+00:00
400002 | 1964-08-16 00:00:00+00:00 | Quingbo | Birnbaum | F | 1986-04-23 00:00:00+00:00
... snip!

(25 rows)

Time: 118.114ms
~~~

The query using `LIMIT`/`OFFSET` for pagination is almost 100 times slower. To see why, let's use [`EXPLAIN`](explain.html).

{% include copy-clipboard.html %}
~~~ sql
EXPLAIN SELECT * FROM employees LIMIT 25 OFFSET 200024;
~~~

~~~
tree | field | description
+-----------+-------------+-------------------+
| distributed | true
| vectorized | true
limit | |
│ | offset | 200024
└── scan | |
| table | employees@primary
| spans | ALL
| limit | 200049
~~~

The culprit is this: because we used `LIMIT`/`OFFSET`, we are performing a full table scan (see `spans` = `ALL` above) from the first record all the way up to the value of the offset. In other words, we are iterating over a big array of rows from 1 to *n*, where *n* is 200049.

Meanwhile, the keyset pagination queries are looking at a much smaller range of table spans, which is much faster (see `spans` = `300026-` + 25 below). Because [there is an index on every column in the `WHERE` clause](indexes.html#best-practices), these queries are doing an index lookup to jump to the start of the page of results, and then getting an additional 25 rows from there. This is much faster.

{% include copy-clipboard.html %}
~~~ sql
EXPLAIN SELECT * FROM employees WHERE emp_no > 300025 LIMIT 25;
~~~

~~~
tree | field | description
+------+-------------+-------------------+
| distributed | false
| vectorized | false
scan | |
| table | employees@primary
| spans | /300026-
| limit | 25
~~~

{{site.data.alerts.callout_danger}}
Using a sequential (i.e., non-[UUID](uuid.html)) primary key creates hot spots in the database for write-heavy workloads, since concurrent [`INSERT`](insert.html)s to the table will attempt to write to the same (or nearby) underlying ranges. This can be mitigated by designing your schema with [multi-column primary keys which include a monotonically increasing column](performance-best-practices-overview.html#use-multi-column-primary-keys).
{{site.data.alerts.end}}

{{site.data.alerts.callout_info}}
CockroachDB does not implement cursors. For a scale-out system like CockroachDB, using a cursor would not be recommended for the same reason that paginating with `LIMIT`/`OFFSET` is not recommended: it forces the server to keep track of state, which means the pagination queries don't scale well.
{{site.data.alerts.end}}

## Composability

Expand Down
4 changes: 4 additions & 0 deletions v19.2/limit-offset.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,10 @@ as part of [`INSERT`](insert.html) or [`UPSERT`](upsert.html).
`OFFSET` restricts the operation to skip the first `offset_value` number of rows.
It is often used in conjunction with `LIMIT` to "paginate" through retrieved rows.

{{site.data.alerts.callout_danger}}
Using `LIMIT`/`OFFSET` to implement pagination can be very slow for large tables. We recommend using [keyset pagination](selection-queries.html#pagination-example) instead.
{{site.data.alerts.end}}

For PostgreSQL compatibility, CockroachDB also supports `FETCH FIRST
limit_val ROWS ONLY` and `FETCH NEXT limit_val ROWS ONLY` as aliases
for `LIMIT`. If `limit_val` is omitted, then one row is fetched.
Expand Down
169 changes: 155 additions & 14 deletions v19.2/selection-queries.md
Original file line number Diff line number Diff line change
Expand Up @@ -338,26 +338,167 @@ LIMIT 5;

### Paginate through limited results

If you want to limit the number of results, but go beyond the initial set, use `OFFSET` to proceed to the next set of results. This is often used to paginate through large tables where not all of the values need to be immediately retrieved.
To iterate through a table one "page" of results at a time (also known as pagination) there are two options, only one of which is recommended:

- Keyset pagination (**fast, recommended**)
- `LIMIT` / `OFFSET` pagination (slow, not recommended)

Keyset pagination (also known as the "seek method") is used to fetch a subset of records from a table quickly. It does this by restricting the set of records returned with a combination of `WHERE` and [`LIMIT`](limit-offset.html) clauses. To get the next page, you check the value of the column in the `WHERE` clause against the last row returned in the previous page of results.

The general pattern for keyset pagination queries is:

{% include copy-clipboard.html %}
~~~ sql
> SELECT id, name
FROM accounts
LIMIT 5
OFFSET 5;
SELECT * FROM t
WHERE key > ${value}
ORDER BY key
LIMIT ${amount}
~~~

This is faster than using `LIMIT`/`OFFSET` because, instead of doing a full table scan up to the value of the `OFFSET`, a keyset pagination query looks at a fixed-size set of records for each iteration. This can be done quickly provided that the key used in the `WHERE` clause to implement the pagination is [indexed](indexes.html#best-practices) and [unique](unique.html). A [primary key](primary-key.html) meets both of these criteria.

#### Pagination example

The examples in this section use the [employees data set](https://github.com/datacharmer/test_db), which you can load into CockroachDB as follows:

{% include copy-clipboard.html %}
~~~ sql
CREATE DATABASE IF NOT EXISTS employees;
USE employees;
IMPORT PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees-full.sql.gz';
~~~
+----+------------------+
| id | name |
+----+------------------+
| 6 | Juno Studwick |
| 7 | Juno Studwick |
| 8 | Eutychia Roberts |
| 9 | Ricarda Moriarty |
| 10 | Henrik Brankovic |
+----+------------------+

To get the first page of results using keyset pagination, run:

{% include copy-clipboard.html %}
~~~ sql
SELECT * FROM employees WHERE emp_no > 10000 LIMIT 25;
~~~

~~~
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+-------------+--------+---------------------------+
10001 | 1953-09-02 00:00:00+00:00 | Georgi | Facello | M | 1986-06-26 00:00:00+00:00
10002 | 1964-06-02 00:00:00+00:00 | Bezalel | Simmel | F | 1985-11-21 00:00:00+00:00
10003 | 1959-12-03 00:00:00+00:00 | Parto | Bamford | M | 1986-08-28 00:00:00+00:00
... snip

(25 rows)

Time: 1.31ms
~~~

{{site.data.alerts.callout_success}}
If you don't know what the minimum value of the key is, either `SELECT min(key) FROM table` or use a known minimum value for the key's data type.
{{site.data.alerts.end}}

To get the second page of results, run:

{% include copy-clipboard.html %}
~~~ sql
SELECT * FROM employees WHERE emp_no > 10025 LIMIT 25;
~~~

~~~
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+------------+--------+---------------------------+
10026 | 1953-04-03 00:00:00+00:00 | Yongqiao | Berztiss | M | 1995-03-20 00:00:00+00:00
10027 | 1962-07-10 00:00:00+00:00 | Divier | Reistad | F | 1989-07-07 00:00:00+00:00
10028 | 1963-11-26 00:00:00+00:00 | Domenick | Tempesti | M | 1991-10-22 00:00:00+00:00
... snip!

(25 rows)

Time: 1.473ms
~~~

To get an arbitrary page of results showing employees whose IDs (`emp_no`) are in a much higher range, try the following query. Note that it takes about the same amount of time to run as the previous queries.

{% include copy-clipboard.html %}
~~~ sql
SELECT * FROM employees WHERE emp_no > 300025 LIMIT 25;
~~~

~~~
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+--------------+--------+---------------------------+
400000 | 1963-11-29 00:00:00+00:00 | Mitsuyuki | Reinhart | M | 1985-08-27 00:00:00+00:00
400001 | 1962-06-02 00:00:00+00:00 | Rosalie | Chinin | M | 1986-11-28 00:00:00+00:00
400002 | 1964-08-16 00:00:00+00:00 | Quingbo | Birnbaum | F | 1986-04-23 00:00:00+00:00
... snip!

(25 rows)

Time: 1.319ms
~~~

Compare the execution speed of the previous keyset pagination queries with the query below that uses `LIMIT` / `OFFSET` to get the same page of results:

{% include copy-clipboard.html %}
~~~ sql
SELECT * FROM employees LIMIT 25 OFFSET 200024;
~~~

~~~
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+--------------+--------+---------------------------+
400000 | 1963-11-29 00:00:00+00:00 | Mitsuyuki | Reinhart | M | 1985-08-27 00:00:00+00:00
400001 | 1962-06-02 00:00:00+00:00 | Rosalie | Chinin | M | 1986-11-28 00:00:00+00:00
400002 | 1964-08-16 00:00:00+00:00 | Quingbo | Birnbaum | F | 1986-04-23 00:00:00+00:00
... snip!

(25 rows)

Time: 118.114ms
~~~

The query using `LIMIT`/`OFFSET` for pagination is almost 100 times slower. To see why, let's use [`EXPLAIN`](explain.html).

{% include copy-clipboard.html %}
~~~ sql
EXPLAIN SELECT * FROM employees LIMIT 25 OFFSET 200024;
~~~

~~~
tree | field | description
+-----------+-------------+-------------------+
| distributed | true
| vectorized | true
limit | |
│ | offset | 200024
└── scan | |
| table | employees@primary
| spans | ALL
| limit | 200049
~~~

The culprit is this: because we used `LIMIT`/`OFFSET`, we are performing a full table scan (see `spans` = `ALL` above) from the first record all the way up to the value of the offset. In other words, we are iterating over a big array of rows from 1 to *n*, where *n* is 200049.

Meanwhile, the keyset pagination queries are looking at a much smaller range of table spans, which is much faster (see `spans` = `300026-` + 25 below). Because [there is an index on every column in the `WHERE` clause](indexes.html#best-practices), these queries are doing an index lookup to jump to the start of the page of results, and then getting an additional 25 rows from there. This is much faster.

{% include copy-clipboard.html %}
~~~ sql
EXPLAIN SELECT * FROM employees WHERE emp_no > 300025 LIMIT 25;
~~~

~~~
tree | field | description
+------+-------------+-------------------+
| distributed | false
| vectorized | false
scan | |
| table | employees@primary
| spans | /300026-
| limit | 25
~~~

{{site.data.alerts.callout_danger}}
Using a sequential (i.e., non-[UUID](uuid.html)) primary key creates hot spots in the database for write-heavy workloads, since concurrent [`INSERT`](insert.html)s to the table will attempt to write to the same (or nearby) underlying [ranges](architecture/overview.html#architecture-range). This can be mitigated by designing your schema with [multi-column primary keys which include a monotonically increasing column](performance-best-practices-overview.html#use-multi-column-primary-keys).
{{site.data.alerts.end}}

{{site.data.alerts.callout_info}}
CockroachDB does not implement cursors. For a scale-out system like CockroachDB, using a cursor would not be recommended for the same reason that paginating with `LIMIT`/`OFFSET` is not recommended: it forces the server to keep track of state, which means the pagination queries don't scale well.
{{site.data.alerts.end}}

## Composability

Expand Down
4 changes: 4 additions & 0 deletions v20.1/limit-offset.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,10 @@ as part of [`INSERT`](insert.html) or [`UPSERT`](upsert.html).
`OFFSET` restricts the operation to skip the first `offset_value` number of rows.
It is often used in conjunction with `LIMIT` to "paginate" through retrieved rows.

{{site.data.alerts.callout_danger}}
Using `LIMIT`/`OFFSET` to implement pagination can be very slow for large tables. We recommend using [keyset pagination](selection-queries.html#pagination-example) instead.
{{site.data.alerts.end}}

For PostgreSQL compatibility, CockroachDB also supports `FETCH FIRST
limit_val ROWS ONLY` and `FETCH NEXT limit_val ROWS ONLY` as aliases
for `LIMIT`. If `limit_val` is omitted, then one row is fetched.
Expand Down
Loading