minor changes to architecture docs (#4329)
schoudhury authored Apr 28, 2020
1 parent 1cb5dbb commit 780dfe7
Showing 19 changed files with 130 additions and 130 deletions.
20 changes: 10 additions & 10 deletions docs/content/latest/architecture/_index.md
@@ -2,9 +2,9 @@
title: Architecture
headerTitle: Architecture
linkTitle: Architecture
description: Learn about the YugabyteDB architecture, including query, sharding, replication, transactions, and storage layers.
description: Learn about the YugabyteDB architecture, including query, transactions, sharding, replication, and storage layers.
image: /images/section_icons/index/architecture.png
headcontent: YugabyteDB architecture including the query, sharding, replication, transactions, and storage layers.
headcontent: YugabyteDB architecture including the query, transactions, sharding, replication, and storage layers.
section: CONCEPTS
menu:
latest:
@@ -30,7 +30,7 @@ menu:
<a class="section-link icon-offset" href="concepts/">
<div class="head">
<img class="icon" src="/images/section_icons/architecture/concepts.png" aria-hidden="true" />
<div class="articles">4 articles</div>
<div class="articles">3 articles</div>
<div class="title">Key concepts</div>
</div>
<div class="body">
@@ -81,8 +81,8 @@ menu:
<a class="section-link icon-offset" href="transactions/">
<div class="head">
<img class="icon" src="/images/section_icons/architecture/distributed_acid.png" aria-hidden="true" />
<div class="articles">4 articles</div>
<div class="title">Transactions layer</div>
<div class="articles">6 articles</div>
<div class="title">DocDB transactions layer</div>
</div>
<div class="body">
Review transaction model and how transactional consistency is ensured at various isolation levels.
@@ -94,7 +94,7 @@ menu:
<a class="section-link icon-offset" href="docdb-sharding/">
<div class="head">
<img class="icon" src="/images/section_icons/architecture/distributed_acid.png" aria-hidden="true" />
<div class="articles">4 articles</div>
<div class="articles">3 articles</div>
<div class="title">DocDB sharding layer</div>
</div>
<div class="body">
@@ -107,7 +107,7 @@ menu:
<a class="section-link icon-offset" href="docdb-replication/">
<div class="head">
<img class="icon" src="/images/section_icons/architecture/distributed_acid.png" aria-hidden="true" />
<div class="articles">4 articles</div>
<div class="articles">3 articles</div>
<div class="title">DocDB replication layer</div>
</div>
<div class="body">
@@ -120,11 +120,11 @@ menu:
<a class="section-link icon-offset" href="docdb/">
<div class="head">
<img class="icon" src="/images/section_icons/architecture/distributed_acid.png" aria-hidden="true" />
<div class="articles">5 articles</div>
<div class="title">DocDB store</div>
<div class="articles">3 articles</div>
<div class="title">DocDB storage layer</div>
</div>
<div class="body">
Sharding, replication, persistence, performance, and colocated tables.
How persistent storage works in DocDB.
</div>
</a>
</div>
33 changes: 17 additions & 16 deletions docs/content/latest/architecture/design-goals.md
@@ -25,43 +25,40 @@ YugabyteDB offers strong consistency guarantees in the face of a variety of fail

In terms of the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem), YugabyteDB is a CP database (consistent and partition tolerant) that achieves very high availability. The architectural design of YugabyteDB is similar to Google Cloud Spanner, which is also a CP system. The description of [Spanner](https://cloudplatform.googleblog.com/2017/02/inside-Cloud-Spanner-and-the-CAP-Theorem.html) applies equally to YugabyteDB. The key takeaway is that no system provides 100% availability, so the pragmatic question is whether the system delivers availability so high that most users no longer have to be concerned about outages. For example, given that an application has many sources of outages, if YugabyteDB is an insignificant contributor to its downtime, then users are right not to worry about it.

### Single-key linearizability
### Single-row linearizability

YugabyteDB supports single-key linearizable writes. Linearizability is one of the strongest single-key consistency models, and implies that every operation appears to take place atomically and in some total linear order that is consistent with the real-time ordering of those operations. In other words, the following should be true of operations on a single key:
YugabyteDB supports single-row linearizable writes. Linearizability is one of the strongest single-row consistency models, and implies that every operation appears to take place atomically and in some total linear order that is consistent with the real-time ordering of those operations. In other words, the following should be true of operations on a single row:

- Operations can execute concurrently, but the state of the database at any point in time must appear to be the result of some totally ordered, sequential execution of operations.
- If operation A completes before operation B begins, then B should logically take effect after A. For example, a read that begins after a write has completed must observe the effect of that write.

### Multi-key ACID transactions
### Multi-row ACID transactions

YugabyteDB supports multi-key transactions with both Serializable and Snapshot isolation.
YugabyteDB supports multi-row transactions with both Serializable and Snapshot isolation.

- The [YSQL](../../api/ysql/) API supports both Serializable and Snapshot Isolation using the PostgreSQL isolation level syntax of `SERIALIZABLE` and `REPEATABLE READS` respectively. Note that YSQL Serializable support was added in [v1.2.6](../../releases/v1.2.6/).
- The [YSQL](../../api/ysql/) API supports both Serializable and Snapshot Isolation using the PostgreSQL isolation level syntax of `SERIALIZABLE` and `REPEATABLE READ` (default) respectively, as shown in the sketch below.
- The [YCQL](../../api/ycql/dml_transaction/) API supports only Snapshot Isolation using the `BEGIN TRANSACTION` syntax.
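
A minimal sketch of the two syntaxes, assuming a hypothetical `accounts` table (the YCQL variant is shown in comments because it uses its own dialect and requires a table created with transactions enabled):

```
-- YSQL: a multi-statement transaction at Serializable isolation (PostgreSQL syntax).
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- YCQL: the equivalent Snapshot-isolated transaction block would be:
-- BEGIN TRANSACTION
--   UPDATE accounts SET balance = balance - 100 WHERE id = 1;
--   UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- END TRANSACTION;
```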

{{< tip title="Read more about consistency" >}}

- Achieving [consistency with Raft consensus](../docdb/replication/).
- How [fault tolerance and high availability](../core-functions/high-availability/) are achieved.
- [Single-key linearizable transactions](../transactions/single-row-transactions/) in YugabyteDB.
- [Single-row linearizable transactions](../transactions/single-row-transactions/) in YugabyteDB.
- The architecture of [distributed transactions](../transactions/single-row-transactions/).

{{< /tip >}}

## Query APIs

YugabyteDB does not reinvent storage APIs. It is wire-compatible with existing APIs and extends functionality. The following APIs are supported:
YugabyteDB does not reinvent data client APIs. It is wire-compatible with existing APIs and extends functionality. The following APIs are supported.

- **YSQL** — ANSI-SQL compliant and is wire-compatible with PostgreSQL
- **YCQL** (or the *Yugabyte Cloud Query Language*) which is a semi-relational API with Cassandra roots
### YSQL

### Distributed SQL

The YSQL API is PostgreSQL-compatible as noted before. It reuses PostgreSQL code base.
[YSQL](../../api/ysql/) is a fully-relational SQL API that is wire compatible with the SQL language in PostgreSQL. It is best fit for RDBMS workloads that need horizontal write scalability and global data distribution while also using relational modeling features such as JOINs, distributed transactions, and referential integrity (foreign keys). Note that YSQL [reuses the native query layer](https://blog.yugabyte.com/why-we-built-yugabytedb-by-reusing-the-postgresql-query-layer/) of the PostgreSQL open source project.

- New changes do not break existing PostgreSQL functionality

- Designed with migrations to newer PostgreSQL versions over time as an explicit goal. This means that new features are implemented in a modular fashion in the YugabyteDB codebase to enable rapidly integrating with new PostgreSQL features in an on-going fashion.
- Designed with migrations to newer PostgreSQL versions over time as an explicit goal. This means that new features are implemented in a modular fashion in the YugabyteDB codebase to enable rapid integration with new PostgreSQL features as an ongoing process.

- Supports a wide range of SQL functionality (see the sketch after this list):
  - All data types
@@ -74,6 +71,10 @@ The YSQL API is PostgreSQL-compatible as noted before. It reuses PostgreSQL code
  - Stored procedures
  - Triggers
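
The following minimal sketch, using hypothetical `authors` and `books` tables, exercises a few of the features listed above (constraints, referential integrity, and a JOIN):

```
-- Hypothetical schema: foreign keys enforce referential integrity across tables.
CREATE TABLE authors (
    id   INT PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE books (
    id        INT PRIMARY KEY,
    author_id INT NOT NULL REFERENCES authors (id),
    title     TEXT NOT NULL
);

INSERT INTO authors VALUES (1, 'William Shakespeare');
INSERT INTO books VALUES (10, 1, 'Macbeth');

-- A standard SQL JOIN, regardless of which tablets the rows live on.
SELECT a.name, b.title
FROM authors a
JOIN books b ON b.author_id = a.id;
```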

### YCQL

[YCQL](../../api/ycql/) is a SQL-based, flexible-schema API that is best fit for internet-scale OLTP applications needing a semi-relational API highly optimized for write-intensive workloads as well as blazing-fast queries. It supports distributed transactions, strongly consistent secondary indexes, and a native JSONB column type. YCQL has its roots in the Cassandra Query Language.
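
A short YCQL sketch with a hypothetical keyspace and table, showing the flexible-schema style, the native `JSONB` column type, and per-table distributed transactions:

```
-- YCQL (Cassandra-derived syntax); keyspace and table names are illustrative.
CREATE KEYSPACE IF NOT EXISTS store;

CREATE TABLE store.books (
    id      INT PRIMARY KEY,
    details JSONB
) WITH transactions = { 'enabled' : true };

INSERT INTO store.books (id, details)
VALUES (1, '{"title": "Macbeth", "author": {"first_name": "William"}}');

-- JSONB attributes are addressable with the -> and ->> operators.
SELECT * FROM store.books
WHERE details->'author'->>'first_name' = 'William';
```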

{{< tip title="Read more" >}}

Understanding [the design of the query layer](../query-layer/overview/).
@@ -93,9 +94,9 @@ Written in C++ to ensure high performance and the ability to leverage large memo
Achieving [high performance in YugabyteDB](../docdb/performance/).
{{< /tip >}}

## Geo-distributed
## Geo-distributed deployments

### Multi-region deployments
### Multi-region configurations

YugabyteDB should work well in deployments where the nodes of the cluster span:

@@ -109,7 +110,7 @@ In order to achieve this, a number of features would be required. For example, c
- Cluster-aware, with the ability to handle node failures seamlessly
- Topology-aware, with the ability to route traffic seamlessly

## Cloud native
## Cloud native architecture

YugabyteDB is a cloud-native database. It has been designed with the following cloud-native principles in mind:

21 changes: 10 additions & 11 deletions docs/content/latest/architecture/docdb-replication/_index.md
@@ -6,27 +6,27 @@ description: Learn about the YugabyteDB distributed document store that is respo
image: /images/section_icons/architecture/concepts.png
aliases:
- /latest/architecture/docdb/replication/
headcontent: YugabyteDB distributed document store responsible for sharding, replication, transactions, and persistence.
headcontent: DocDB is YugabyteDB's distributed document store responsible for transactions, sharding, replication, and persistence.
menu:
latest:
identifier: architecture-docdb-replication
parent: architecture
weight: 1135
---

This section describes how replication works in DocDB. The data in a DocDB table is split into tablets. By default, each tablet is synchronously replicated using the Raft algorithm across various nodes or fault domains (such as availability zones/racks/regions/cloud providers).
{{< note title="Note" >}}

* YugabyteDB's synchronous replication architecture is inspired by <a href="https://research.google.com/archive/spanner-osdi2012.pdf">Google Spanner</a>.
* YugabyteDB's asynchronous replication architecture is inspired by RDBMS databases such as Oracle, MySQL and PostgreSQL.

There are other advanced replication features in YugabyteDB. These include two forms of asynchronous replication of data:
* **xCluster Replication**: Data is asynchronously replicated between different YugabyteDB clusters, supporting both unidirectional (master-slave) and bidirectional replication across two clusters.
* **Read replicas**: The in-cluster asynchronous replicas are called read replicas.
{{</note >}}

{{< note title="Note" >}}
This section describes how replication works in DocDB. The data in a DocDB table is split into tablets. By default, each tablet is synchronously replicated using the Raft algorithm across various nodes or fault domains (such as availability zones/racks/regions/cloud providers).

* Synchronous replication in YugabyteDB is inspired by <a href="https://research.google.com/archive/spanner-osdi2012.pdf">Google Spanner</a>.
* Asynchronous replication in YugabyteDB is inspired by RDBMS databases such as Oracle, MySQL and PostgreSQL.
There are other advanced replication features in YugabyteDB. These include two forms of asynchronous replication of data:
* **xCluster replication**: Data is asynchronously replicated between different YugabyteDB clusters, supporting both unidirectional (master-slave) and bidirectional replication across two clusters.
* **Read replicas**: The in-cluster asynchronous replicas are called read replicas.

{{</note >}}

<div class="row">

@@ -37,12 +37,11 @@ There are other advanced replication features in YugabyteDB. These include two f
<div class="title">Default replication</div>
</div>
<div class="body">
Replicating the data in every table with Raft consensus.
In-cluster synchronous replication with Raft consensus.
</div>
</a>
</div>


<div class="col-12 col-md-6 col-lg-12 col-xl-6">
<a class="section-link icon-offset" href="xcluster-replication/">
<div class="head">
@@ -1,7 +1,7 @@
---
title: xCluster Replication
headerTitle: xCluster Replication
linkTitle: xCluster Replication
title: xCluster replication
headerTitle: xCluster replication
linkTitle: xCluster replication
description: Asynchronous replication between multiple YugabyteDB clusters.
aliases:
- /latest/architecture/docdb/2dc-deployments/
@@ -16,8 +16,7 @@ isTocNested: true
showAsideToc: true
---

DocDB replicates data in order to survive failures while continuing to maintain consistency of data and without requiring operator intervention.
DocDB automatically replicates data synchronously in order to survive failures while maintaining data consistency and avoiding operator intervention. It does so using the Raft distributed consensus protocol.
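
A useful rule of thumb for Raft-style majority replication: a tablet remains available as long as a majority of its peers are healthy, so for a replication factor `RF`:

```
failures_tolerated = floor((RF - 1) / 2)
```

For example, `RF=3` tolerates one failed tablet-peer and `RF=5` tolerates two.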

## Overview

@@ -49,7 +48,6 @@ Replication of data in DocDB is achieved at the level of tablets, using **tablet

Each tablet comprises a set of tablet-peers, each of which stores one copy of the data belonging to the tablet. There are as many tablet-peers for a tablet as the replication factor, and they form a Raft group. The tablet-peers are hosted on different nodes to allow data redundancy on node failures. Note that the replication of data between the tablet-peers is **strongly consistent**.


The figure below illustrates three tablet-peers that belong to a tablet (tablet 1). The tablet-peers are hosted on different YB-TServers and form a Raft group for leader election, failure detection and replication of the write-ahead logs.

![raft_replication](/images/architecture/raft_replication.png)
24 changes: 19 additions & 5 deletions docs/content/latest/architecture/docdb-sharding/_index.md
@@ -6,24 +6,26 @@ description: Learn about the YugabyteDB distributed document store that is respo
image: /images/section_icons/architecture/concepts.png
aliases:
- /latest/architecture/docdb/sharding
headcontent: How automatic sharding of data works in YugabyteDB.
headcontent: DocDB is YugabyteDB's distributed document store responsible for transactions, sharding, replication, and persistence.
menu:
latest:
identifier: architecture-docdb-sharding
parent: architecture
weight: 1130
---

{{< note title="Note" >}}

YugabyteDB's sharding architecture is inspired by <a href="https://research.google.com/archive/spanner-osdi2012.pdf">Google Spanner</a>.

{{</note >}}

This section describes how sharding works in DocDB. A distributed SQL database needs to automatically partition the data in a table and distribute it across nodes. This is known as data sharding and it can be achieved through different strategies, each with its own tradeoffs.

Data sharding helps in scalability and geo-distribution by horizontally partitioning data. A SQL table is decomposed into multiple sets of rows according to a specific sharding strategy. Each of these sets of rows is called a shard. These shards are distributed across multiple server nodes (containers, VMs, bare-metal) in a shared-nothing architecture. This ensures that the shards do not get bottlenecked by the compute, storage and networking resources available at a single node. High availability is achieved by replicating each shard across multiple nodes. However, the application interacts with a SQL table as one logical unit and remains agnostic to the physical placement of the shards.

DocDB supports range and hash sharding natively.
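
At the schema level (YSQL syntax, illustrative tables), the sharding strategy is chosen per primary key: `HASH` distributes rows evenly across tablets, while `ASC`/`DESC` keeps rows in sort order for efficient range scans.

```
-- Hash sharding: rows are distributed by a hash of user_id.
CREATE TABLE users (
    user_id BIGINT,
    PRIMARY KEY (user_id HASH)
);

-- Range sharding: rows are stored in (event_ts, event_id) sort order.
CREATE TABLE events (
    event_ts TIMESTAMP,
    event_id BIGINT,
    PRIMARY KEY (event_ts ASC, event_id ASC)
);
```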

{{< note title="Note" >}}
Read more about the [tradeoffs in the various sharding strategies considered](https://blog.yugabyte.com/four-data-sharding-strategies-we-analyzed-in-building-a-distributed-sql-database/).
{{</note >}}


<div class="row">
<div class="col-12 col-md-6 col-lg-12 col-xl-6">
@@ -50,4 +52,16 @@
</a>
</div>

<div class="col-12 col-md-6 col-lg-12 col-xl-6">
<a class="section-link icon-offset" href="colocated-tables/">
<div class="head">
<img class="icon" src="/images/section_icons/explore/linear_scalability.png" aria-hidden="true" />
<div class="title">Colocated tables</div>
</div>
<div class="body">
Scaling the number of relations by colocating data.
</div>
</a>
</div>
</div>
@@ -8,8 +8,8 @@ aliases:
menu:
latest:
identifier: docdb-colocated-tables
parent: docdb
weight: 1200
parent: architecture-docdb-sharding
weight: 1144
isTocNested: false
showAsideToc: true
---
6 changes: 3 additions & 3 deletions docs/content/latest/architecture/docdb-sharding/sharding.md
@@ -1,7 +1,7 @@
---
title: Hash and range sharding
headerTitle: Hash and range sharding
linkTitle: Hash and range sharding
title: Hash & range sharding
headerTitle: Hash & range sharding
linkTitle: Hash & range sharding
description: Learn how YugabyteDB uses hash and range sharding for horizontal scaling.
aliases:
- /latest/architecture/docdb/sharding/
@@ -1,7 +1,7 @@
---
title: Tablet Splitting
headerTitle: Tablet Splitting
linkTitle: Tablet Splitting
title: Tablet splitting
headerTitle: Tablet splitting
linkTitle: Tablet splitting
description: Learn how YugabyteDB splits tablets.
menu:
latest:
@@ -16,16 +16,18 @@ Tablet splitting enables changing the number of tablets at runtime. This enables

## Overview

There are a number of scenarios where this is useful:
There are a number of scenarios where this is useful.

### Range Scans
### Range scans
In use-cases that scan a range of data, the data is stored in the natural sort order (also known as range-sharding). In these usage patterns, it is often impossible to predict a good split boundary ahead of time. For example:

```
CREATE TABLE census_stats (
    age INTEGER,
    user_id INTEGER,
    ...
);
```
In the table above, it is not possible for the database to infer the range of values for `age` (typically in the 1 to 100 range). It is also impossible to predict the distribution of rows in the table, that is, how many `user_id` rows will be inserted for each value of `age`, to make an evenly distributed data split. This makes it hard to pick good split points ahead of time.
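
To make the difficulty concrete: with range sharding, explicit split points would have to be guessed up front. A hypothetical presplit of the table above might look like the following sketch (the `SPLIT AT VALUES` clause and the chosen boundaries are assumptions for illustration), and any such guess can turn out badly skewed:

```
CREATE TABLE census_stats (
    age     INTEGER,
    user_id INTEGER,
    PRIMARY KEY (age ASC, user_id ASC)
) SPLIT AT VALUES ((25), (50), (75));
```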

### Low-cardinality primary keys
@@ -35,7 +37,7 @@ In use-cases with a low-cardinality of the primary keys (or the secondary index)
This feature is also useful for use-cases where tables begin small, and therefore start with only a few shards. If these tables grow very large, nodes are continuously added to the cluster, and we may reach a scenario where the number of nodes exceeds the number of tablets. Such cases require tablet splitting to effectively re-balance the cluster.


## Ways to split tablets
## Approaches to tablet splitting

DocDB allows the following mechanisms to re-shard data by splitting tablets:

@@ -58,7 +60,6 @@ As the name suggests, the table is split at creation time into a desired number

```
num_tablets_in_table = num_tablets_per_node * num_nodes_at_table_creation_time
```
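
A sketch of presplitting a hash-sharded table in YSQL (the table and the tablet count are illustrative):

```
-- Create the table with 16 tablets up front rather than the default per-node count.
CREATE TABLE user_actions (
    user_id   BIGINT,
    action_ts TIMESTAMP,
    PRIMARY KEY (user_id HASH, action_ts ASC)
) SPLIT INTO 16 TABLETS;
```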
{{< note title="Note" >}}
In order to pre-split a table into a desired number of tablets, we need the start and end key for each tablet. This makes pre-splitting slightly different for hash-sharded and range-sharded tables.
@@ -203,3 +204,7 @@ There are a few important things to note here.
Expand Down Expand Up @@ -203,3 +204,7 @@ There are a few important things to note here.

{{</note >}}

## Dynamic splitting

Dynamic splitting is currently a [work-in-progress](https://github.com/yugabyte/yugabyte-db/issues/1004) feature and is not yet available. Design for this feature can be reviewed on [GitHub](https://github.com/yugabyte/yugabyte-db/blob/master/architecture/design/docdb-automatic-tablet-splitting.md).

