minor fixes (yugabyte#4162)
schoudhury authored Apr 7, 2020
1 parent 8f0c9dd commit bc751e2
Showing 13 changed files with 81 additions and 100 deletions.
2 changes: 1 addition & 1 deletion docs/content/latest/admin/yb-ts-cli.md
@@ -2,7 +2,7 @@
title: yb-ts-cli - command line tool for advanced yb-tserver operations
headerTitle: yb-ts-cli
linkTitle: yb-ts-cli
description: Use yb-ts-cli to perform advanced operations on YB-TServer nodes.
description: Use yb-ts-cli to perform advanced operations on YB-TServer.
menu:
latest:
identifier: yb-ts-cli
2 changes: 1 addition & 1 deletion docs/content/latest/architecture/concepts/_index.html
@@ -2,7 +2,7 @@
title: Key concepts
headerTitle: Key concepts
linkTitle: Key concepts
description: Learn key concepts essential to understanding YugabyteDB.
description: Learn about the query, sharding, replication, and persistence layers in YugabyteDB.
image: /images/section_icons/architecture/concepts.png
headcontent: Learn about the query, sharding, replication, and persistence layers in YugabyteDB.
menu:
12 changes: 3 additions & 9 deletions docs/content/latest/architecture/concepts/universe.md
@@ -73,16 +73,10 @@ Below is an illustration of a simple 4-node YugabyteDB universe:

## Universe vs cluster

A YugabyteDB universe can comprise of one or more clusters. Each cluster is a logical group of nodes running YB-TServer services that are performing one of the following replication modes:
A YugabyteDB universe comprises exactly one primary cluster and zero or more read replica clusters.

- Synchronous replication
- Asynchronous replication
- A primary cluster can perform both writes and reads. Replication between nodes in a primary cluster is performed synchronously.

The set of nodes that are performing strong replication are referred to as the **primary cluster** and other groups are called **read replica clusters**.

Note that:

- There is always one primary cluster in a universe.
- There can be zero or more read replica clusters in that universe.
- Read replica clusters can perform only reads. Writes sent to read replica clusters are automatically rerouted to the primary cluster for the universe. These clusters help power reads in regions that are far away from the primary cluster with timeline-consistent data. This ensures low-latency reads for geo-distributed applications. Data is brought into the read replica clusters through asynchronous replication from the primary cluster. In other words, nodes in a read replica cluster act as Raft observers that do not participate in the write path involving the Raft leader and Raft followers present in the primary cluster.

For more information about read replica clusters, see [read replicas](../../docdb/replication/#read-only-replicas).
4 changes: 2 additions & 2 deletions docs/content/latest/deploy/checklist.md
@@ -1,8 +1,8 @@
---
title: Deploy checklist for YugabyteDB clusters
title: Deployment checklist for YugabyteDB clusters
headerTitle: Deployment checklist
linkTitle: Deployment checklist
description: Deployment checklist for multi-node YugabyteDB clusters for production and performance testing
description: Deployment checklist for multi-node YugabyteDB clusters used for production and performance testing
aliases:
- /deploy/checklist/
menu:
4 changes: 2 additions & 2 deletions docs/content/latest/deploy/multi-dc/3dc-deployment.md
@@ -40,7 +40,7 @@ Follow the [installation instructions](../../../deploy/manual-deployment/install

## 2. Start YB-Masters

Run the `yb-master` server on each of the nodes as shown below. Note how multiple directories can be provided to the `--fs_data_dirs` option. Replace the [`--rpc_bind_addresses`]((../../../reference/configuration/yb-master/#rpc-bind-addresses) value with the private IP address of the host as well as the set the `--placement_cloud`,`--placement_region` and `--placement_zone` values appropriately.
Run the `yb-master` server on each of the nodes as shown below. Note how multiple directories can be provided to the `--fs_data_dirs` option. Replace the [`--rpc_bind_addresses`](../../../reference/configuration/yb-master/#rpc-bind-addresses) value with the private IP address of the host, and set the `--placement_cloud`, `--placement_region`, and `--placement_zone` values appropriately.

```sh
$ ./bin/yb-master \
@@ -54,7 +54,7 @@ $ ./bin/yb-master \
>& /home/centos/disk1/yb-master.out &
```

Note that we also set the [`--leader_failure_max_missed_heartbeat_periods`](../../../reference/configuration/yb-master/#leader-failure-max-missed-heartbeat-periods) option to `10`. This option specifies the maximum heartbeat periods that the leader can fail to heartbeat before the leader is considered to be failed. Since the data is geo-replicated across data centers, RPC latencies are expected to be higher. We use this flag to increase the failure detection interval in such a higher RPC latency deployment. Note that the total failure timeout is now 5 seconds since it is computed by multiplying [`--raft_heartbeat_interval_ms`](../../../reference/configuration/yb-master/#raft-heartbeat-interval-ms) (default of 500ms) with [`--leader_failure_max_missed_heartbeat_periods`]((../../../reference/configuration/yb-master/#leader-failure-max-missed-heartbeat-periods) (current value of `10`).
Note that we also set the [`--leader_failure_max_missed_heartbeat_periods`](../../../reference/configuration/yb-master/#leader-failure-max-missed-heartbeat-periods) option to `10`. This option specifies the maximum number of heartbeat periods that the leader can miss before it is considered to have failed. Since the data is geo-replicated across data centers, RPC latencies are expected to be higher. We use this flag to increase the failure detection interval in such a higher RPC latency deployment. Note that the total failure timeout is now 5 seconds since it is computed by multiplying [`--raft_heartbeat_interval_ms`](../../../reference/configuration/yb-master/#raft-heartbeat-interval-ms) (default of 500ms) by [`--leader_failure_max_missed_heartbeat_periods`](../../../reference/configuration/yb-master/#leader-failure-max-missed-heartbeat-periods) (current value of `10`).
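
As a quick check of the arithmetic described above, the following Python sketch multiplies the two flag values. The function name `failure_timeout_ms` is hypothetical and exists only for this illustration; the inputs (a 500 ms heartbeat interval and 10 missed periods) come from the paragraph above.

```python
def failure_timeout_ms(raft_heartbeat_interval_ms: int,
                       max_missed_heartbeat_periods: int) -> int:
    """Return the total leader failure timeout in milliseconds.

    Hypothetical helper: it only mirrors the multiplication described in the
    text and does not read any real yb-master configuration.
    """
    return raft_heartbeat_interval_ms * max_missed_heartbeat_periods


# Default heartbeat interval (500 ms) combined with the value of 10 set above.
timeout_ms = failure_timeout_ms(500, 10)
print(f"{timeout_ms / 1000:.0f} seconds")  # 5 seconds
```

Under the same assumption, any other combination of the two flags can be plugged in to estimate how long failure detection would take in a higher-latency deployment.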

For the full list of configuration options, see the [YB-Master reference](../../../reference/configuration/yb-master/).

13 changes: 13 additions & 0 deletions docs/content/latest/develop/_index.html
@@ -81,6 +81,19 @@
</a>
</div>

<div class="col-12 col-md-6 col-lg-12 col-xl-6">
<a class="section-link icon-offset" href="best-practices/">
<div class="head">
<img class="icon" src="/images/section_icons/develop/real-world-apps.png" aria-hidden="true" />
<div class="articles">3 articles</div>
<div class="title">Best practices</div>
</div>
<div class="body">
Best practices for data modeling and cluster configuration.
</div>
</a>
</div>

<!-- <a class="section-link icon-offset" href="port-existing-apps/">
<div class="icon">
<i class="fas fa-sign-in" aria-hidden="true"></i>
71 changes: 20 additions & 51 deletions docs/content/latest/develop/best-practices-ycql.md
@@ -1,12 +1,14 @@
---
title: Best practices
title: Best practices for YCQL
linkTitle: Best practices
description: Best practices when using YugabyteDB
description: Best practices for YCQL
menu:
latest:
identifier: best-practices-ycql
parent: develop
weight: 582
aliases:
- /latest/develop/best-practices/
isTocNested: 4
showAsideToc: true
---
@@ -24,93 +26,60 @@ showAsideToc: true
## Core Features

### Global secondary indexes
Indexes use multi-shard transactional capability of YugabyteDB and are global and strongly consistent (ACID).
To add secondary indexes you need to create tables with [transactions enabled](../api/ycql/ddl_create_table.md#table-properties-1).
They can also be used as materialized views by using the `INCLUDE` [clause](../../api/ycql/ddl_create_index#included-columns).
Indexes use the multi-shard transactional capability of YugabyteDB and are global and strongly consistent (ACID). To add secondary indexes, you need to create tables with [transactions enabled](../api/ycql/ddl_create_table.md#table-properties-1). Indexes can also be used as materialized views by using the `INCLUDE` [clause](../../api/ycql/ddl_create_index#included-columns).

### Unique indexes
YCQL supports [unique indexes](../../api/ycql/ddl_create_index#unique-index).
A unique index disallows duplicate values from being inserted into the indexed columns.
YCQL supports [unique indexes](../../api/ycql/ddl_create_index#unique-index). A unique index disallows duplicate values from being inserted into the indexed columns.

### Covered indexes
When querying by a secondary index, the original table is consulted to get the columns that aren't specified in the
index. This can result in multiple random reads across the main table.
When querying by a secondary index, the original table is consulted to get the columns that aren't specified in the index. This can result in multiple random reads across the main table.

Sometimes a better way is to include the other columns that we're querying that are not part of the index
using the [`INCLUDE`](../api/ycql/ddl_create_index.md#included-columns) clause.
When additional columns are included in the index, they can be used to respond to queries directly from the index without querying the table.
Sometimes a better way is to include the other columns that we're querying that are not part of the index using the [`INCLUDE`](../api/ycql/ddl_create_index.md#included-columns) clause. When additional columns are included in the index, they can be used to respond to queries directly from the index without querying the table.

This turns a (possible) random read from the main table to just a filter on the index.

### Atomic read modify write operations with UPDATE IF EXISTS
Operations like `UPDATE ... IF EXISTS`, `INSERT ... IF NOT EXISTS` which require an atomic read-modify-write,
Apache Cassandra uses LWT which requires 4 round-trips between peers. These operations are supported in YugabyteDB a
lot more efficiently, because of YugabyteDB's CP (in the CAP theorem) design based on strong consistency,
and require only 1 Raft-round trip between peers. Number & Counter types work the same and don't need a separate "counters" table.
For operations like `UPDATE ... IF EXISTS` and `INSERT ... IF NOT EXISTS`, which require an atomic read-modify-write, Apache Cassandra uses lightweight transactions (LWT), which require 4 round trips between peers. YugabyteDB supports these operations much more efficiently: because of its CP (in the CAP theorem) design based on strong consistency, they require only 1 Raft round trip between peers. Number and counter types work the same way and don't need a separate "counters" table.

### JSONB document datatype
YugabyteDB has [`jsonb`](https://docs.yugabyte.com/latest/api/ycql/type_jsonb/) datatype that makes it easy to model
json data which does not have a set schema and might change often.
You can use jsonb to group less interesting / lesser accessed columns of a table.
YCQL also supports JSONB expression indexes that can be used to speed up data retrieval that would otherwise require scanning the json entries.

YugabyteDB has a [`jsonb`](https://docs.yugabyte.com/latest/api/ycql/type_jsonb/) data type that makes it easy to model JSON data that does not have a set schema and might change often. You can use `jsonb` to group less interesting or less frequently accessed columns of a table. YCQL also supports JSONB expression indexes that can be used to speed up data retrieval that would otherwise require scanning the JSON entries.

{{< note title="Use jsonb columns only when necessary" >}}

`jsonb` columns are slower to read/write compared to normal columns.
They also take more space because they need to store keys in strings and make keeping data consistency harder.
A good schema design is to keep most columns as regular ones or collections, and only using `jsonb` for truly dynamic values.
Don't create a `data jsonb` column where you put everything, but a `dynamic_data jsonb` column and other ones being
primitive columns.
`jsonb` columns are slower to read and write than normal columns. They also take more space because they need to store keys as strings, and they make it harder to keep data consistent. A good schema design keeps most columns as regular columns or collections and uses `jsonb` only for truly dynamic values. Don't create a single `data jsonb` column where you store everything; instead, create a `dynamic_data jsonb` column alongside primitive columns.

{{< /note >}}


### Incrementing numeric types
We've extend Apache Cassandra to support increment and decrement operators for integer data types.
[Integers](../../api/ycql/type_int) can be set, inserted, incremented, and decremented while `COUNTER` can only be incremented or decremented.
YugabyteDB implements CAS(compare and swap) operations in one round trip, compared to 4 for Apache Cassandra.

We've extended Apache Cassandra to support increment and decrement operators for integer data types. [Integers](../../api/ycql/type_int) can be set, inserted, incremented, and decremented, while `COUNTER` can only be incremented or decremented. YugabyteDB implements CAS (compare-and-swap) operations in one round trip, compared to 4 for Apache Cassandra.

### Expire older records automatically with TTL
YCQL supports automatic expiry of data using the [`TTL feature`](../api/ycql/ddl_create_table.md#use-table-property-to-define-the-default-expiration-time-for-rows).
You can set a retention policy for data at table/row/column level and the older data is automatically purged from the DB.
YCQL supports automatic expiry of data using the [`TTL feature`](../api/ycql/ddl_create_table.md#use-table-property-to-define-the-default-expiration-time-for-rows). You can set a retention policy for data at table/row/column level and the older data is automatically purged from the DB.

{{< note title="Note" >}}
TTL doesn't work with transactional tables.
TTL is not applicable to transactional tables and hence is not supported in that context.
{{< /note >}}


## Performance

### Use YugabyteDB drivers
Use YugabyteDB specific [client drivers](../../quick-start/build-apps/) because they are cluster and partition aware and support `jsonb` columns.

### Leverage connection pooling in the YCQL client
Single client (say a multi-threaded application) should ideally use a single cluster object.
The single cluster object typically holds underneath the covers a configurable number of connections to yb-tservers.
Typically 1 or 2 per TServer suffices to serve even 64-128 application threads).
The same connection can be used for multiple outstanding requests, also known as multiplexing.
A single client (say, a multi-threaded application) should ideally use a single cluster object. Under the covers, the single cluster object typically holds a configurable number of connections to YB-TServers. Typically, 1 or 2 connections per TServer suffice to serve even 64-128 application threads. The same connection can be used for multiple outstanding requests, also known as multiplexing.

### Use prepared statements
Use prepared statements wherever possible. This will ensure that YB partition aware drivers are able to route
queries to the tablet leader, improve throughput and server doesn't have to parse the query on each operation.
Use prepared statements wherever possible. This ensures that YB partition-aware drivers can route queries to the tablet leader, improves throughput, and spares the server from parsing the query on each operation.

### Use batching for higher throughput
Use batching for writing a set of operations. This will send all operations in a single RPC call instead of using multiple RPC calls, one per operation.
Each batch operation has higher latency compared to single rows operations but has higher throughput overall.
Use batching for writing a set of operations. This sends all operations in a single RPC call instead of using multiple RPC calls, one per operation. Each batch operation has higher latency compared to single-row operations but has higher throughput overall.

### Column and row sizes
For consistent latency/performance, we suggest keeping columns in the `2MB` range
or less even though we support an individual column being about `32MB`.
For consistent latency/performance, we suggest keeping columns in the `2MB` range or less even though we support an individual column being about `32MB`.

Big columns add up when selecting full rows or multiple of them.
For consistent latency/performance, we suggest keeping the size of rows in the `32MB` range
or less.
Big columns add up when selecting full rows or multiple rows. For consistent latency/performance, we suggest keeping the size of rows in the `32MB` range or less.
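
The suggested sizes can be checked on the application side before writing. The following Python sketch is only an illustration: the helper `check_row_size` and the constant names are hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and the row is modeled as a plain dictionary of column name to serialized bytes rather than any driver object.

```python
# Suggested limits from the text above. Treating 1 MB as 1024 * 1024 bytes
# is an assumption of this sketch.
MB = 1024 * 1024
SUGGESTED_MAX_COLUMN_BYTES = 2 * MB
SUGGESTED_MAX_ROW_BYTES = 32 * MB


def check_row_size(row: dict[str, bytes]) -> list[str]:
    """Return warnings for columns or rows larger than the suggested sizes."""
    warnings = []
    # Flag each individual column that is over the suggested column size.
    for name, value in row.items():
        if len(value) > SUGGESTED_MAX_COLUMN_BYTES:
            warnings.append(f"column {name!r} exceeds the suggested 2MB")
    # Flag the row as a whole if its columns add up past the row size.
    if sum(len(value) for value in row.values()) > SUGGESTED_MAX_ROW_BYTES:
        warnings.append("row exceeds the suggested 32MB")
    return warnings


# A 3 MB payload column triggers a column warning but not a row warning.
example_warnings = check_row_size({"id": b"abc", "payload": bytes(3 * MB)})
print(example_warnings)
```

A check like this only flags rows that are likely to hurt latency. It does not enforce the supported maximum of about `32MB` per column mentioned above.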

## Miscellaneous

### Use `TRUNCATE` to empty tables instead of `DELETE`
`TRUNCATE` deletes the database files that store the table and is very fast.
While DELETE inserts a `delete marker` for each row in transactions and they are removed from storage when a compaction
runs.
`TRUNCATE` deletes the database files that store the table and is very fast. `DELETE`, in contrast, inserts a `delete marker` for each row in transactions, and these markers are removed from storage when a compaction runs.
12 changes: 0 additions & 12 deletions docs/content/latest/faq/compatibility.md
@@ -50,18 +50,6 @@ The YugabyteDB APIs are currently isolated and independent from one another. Dat

Allowing YCQL tables to be accessed from the PostgreSQL-compatible YSQL API as foreign tables using foreign data wrappers (FDW) is on the roadmap. You can comment or increase the priority of the associated [GitHub](https://github.com/yugabyte/yugabyte-db/issues/830) issue.

## When should I pick YCQL over YSQL?

You should pick YCQL over YSQL if your application:

- Does not require fully-relational data modeling constructs, such as foreign keys and JOINs. Note that strongly-consistent secondary indexes and unique constraints are supported by YCQL.
- Requires storing large amounts of data (for example, 10TB or more).
- Needs to serve low-latency (sub-millisecond) queries.
- Needs TTL-driven automatic data expiration.
- Needs to integrate with stream processors, such as Apache Spark and KSQL.

If you have a specific use case in mind, share it in our [Slack community](https://www.yugabyte.com/slack) and the community can help you decide the best approach.

## YCQL compatibility with Apache Cassandra QL

YCQL is compatible with v3.4 of Apache Cassandra QL (CQL). Following questions highlight how YCQL differs from CQL.