Add empty lines before bulleted lists in Markdown.

The 'redcarpet' Markdown converter we use doesn't convert bulleted lists
if they aren't preceded by empty lines.
enisoc committed May 11, 2015
1 parent ec7a73a commit cbc39cd
Showing 16 changed files with 118 additions and 10 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -16,6 +16,7 @@ and a more [detailed presentation from @Scale '14](http://youtu.be/5yDO-tmIoXY).
## Documentation

### Intro

* [Helicopter overview](http://vitess.io):
high level overview of Vitess that should tell you whether Vitess is for you.
* [Sharding in Vitess](http://vitess.io/doc/Sharding)
6 changes: 6 additions & 0 deletions doc/Concepts.md
@@ -1,6 +1,9 @@
# Concepts

We need to introduce some common terminology used in Vitess:

### Keyspace

A keyspace is a logical database.
In its simplest form, it directly maps to a MySQL database name.
When you read data from a keyspace, it is as if you read from a MySQL database.
@@ -30,6 +33,7 @@ eventual consistency guarantees), run data analysis tools that take a long time
### Tablet

A tablet is a single server that runs:

* a MySQL instance
* a vttablet instance
* a local row cache instance
@@ -38,6 +42,7 @@ A tablet is a single server that runs:
It can be idle (not assigned to any keyspace), or assigned to a keyspace/shard. If it becomes unhealthy, it is usually changed to scrap.

It has a type. The commonly used types are:

* master: for the mysql master, RW database.
* replica: for a mysql slave that serves read-only traffic, with guaranteed low replication latency.
* rdonly: for a mysql slave that serves read-only traffic for backend processing jobs (like map-reduce type jobs). It has no real guaranteed replication latency.
@@ -107,6 +112,7 @@ There is one local instance of that service per Cell (Data Center). The goal is
using the remaining Cells. (a Zookeeper instance running on 3 or 5 hosts locally is a good configuration).

The data is partitioned as follows:

* Keyspaces: global instance
* Shards: global instance
* Tablets: local instances
4 changes: 0 additions & 4 deletions doc/GettingStarted.md
@@ -25,10 +25,6 @@ Other images can be built with scripts located in the

## Manual Build





### Dependencies

* We currently develop on Ubuntu 14.04 (Trusty) and Debian 7.0 (Wheezy).
2 changes: 2 additions & 0 deletions doc/HorizontalReshardingGuide.md
@@ -85,6 +85,7 @@ A single SplitDiff vtworker can be run by, for example:
`vtworker -min_healthy_rdonly_endpoints=1 --cell=test SplitDiff test_keyspace/-80`

After completion, the source and destination rdonly tablets need to be put back into the serving graph:

```
vtctl ChangeSlaveType <source rdonly tablet alias> rdonly
vtctl ChangeSlaveType <destination rdonly tablet alias> rdonly
@@ -122,6 +123,7 @@ vtctl MigrateServedTypes -reverse test_keyspace/0 replica
## Scrap the source shard

If all the above steps were successful, it’s safe to remove the source shard (which should no longer be in use):

* For each tablet in the source shard: `vtctl ScrapTablet <source tablet alias>`
* For each tablet in the source shard: `vtctl DeleteTablet <source tablet alias>`
* Rebuild the serving graph: `vtctl RebuildKeyspaceGraph test_keyspace`
1 change: 1 addition & 0 deletions doc/Production.md
@@ -1,6 +1,7 @@
# Production setup
Setting up Vitess in production will depend on many factors.
Here are some initial considerations:

* *Global Transaction IDs*: Vitess requires a version of MySQL
that supports GTIDs.
We currently support MariaDB 10.0 and MySQL 5.6.
7 changes: 6 additions & 1 deletion doc/Reparenting.md
@@ -54,6 +54,7 @@ live system, it errs on the side of safety, and will abort if any
tablet is not responding right.

The actions performed are:

* any existing tablet replication is stopped. If any tablet fails
(because it is not available or not succeeding), we abort.
* the master-elect is initialized as a master.
@@ -69,6 +70,7 @@ This command is used when both the current master and the new master
are alive and functioning properly.

The actions performed are:

* we tell the old master to go read-only. It then shuts down its query
service. We get its replication position back.
* we tell the master-elect to wait for that replication data, and then
@@ -80,7 +82,7 @@ The actions performed are:
wait for the entry in the test table. (if a slave wasn't
replicating, we don't change its state and don't start replication
after reparent)
- additionally, on the old master, we start replication, so it catches up.
* additionally, on the old master, we start replication, so it catches up.

The old master is left as 'spare' in this scenario. If health checking
is enabled on that tablet (using target\_tablet\_type parameter for
@@ -96,6 +98,7 @@ just make sure the master-elect is the most advanced in replication
within all the available slaves, and reparent everybody.

The actions performed are:

* if the current master is still alive, we scrap it. That will make it
stop what it's doing, stop its query service, and be unusable.
* we gather the current replication position on all slaves.
@@ -122,6 +125,7 @@ servers. We then trigger the 'vtctl TabletExternallyReparented'
command.

The flow for that command is as follows:

* the shard is locked in the global topology server.
* we read the Shard object from the global topology server.
* we read all the tablets in the replication graph for the shard. Note
@@ -145,6 +149,7 @@ The flow for that command is as follows:
successfully reparented.

Failure cases:

* The global topology server has to be available for locking and
modification during this operation. If not, the operation will just
fail.
4 changes: 4 additions & 0 deletions doc/Resharding.md
@@ -14,6 +14,7 @@ higher level concepts on Sharding.
To follow a step-by-step guide for how to shard a keyspace, you can see [this page](HorizontalReshardingGuide.md).

In general, the process to achieve this goal is composed of the following steps:

* pick the original shard(s)
* pick the destination shard(s) coverage
* create the destination shard(s) tablets (in a mode where they are not used to serve traffic yet)
@@ -33,6 +34,7 @@ In general, the process to achieve this goal is composed of the following steps:
## Applications

The main applications we currently support:

* in a sharded keyspace, split or merge shards (horizontal sharding)
* in a non-sharded keyspace, break out some tables into a different keyspace (vertical sharding)

@@ -43,6 +45,7 @@ downtime for the application.
## Scaling Up and Down

Here is a quick table of what to do with Vitess when a change is required:

* uniformly increase read capacity: add replicas, or split shards
* uniformly increase write capacity: split shards
* reclaim free space: merge shards / keyspaces
@@ -53,6 +56,7 @@ Here is a quick table of what to do with Vitess when a change is required:

The cornerstone of Resharding is being able to replicate the right data. MySQL doesn't support any filtering, so the
Vitess project implements it entirely (a toy sketch follows the list below):

* the tablet server tags transactions with comments that describe the scope of the statements (which keyspace_id,
which table, ...). That way the MySQL binlogs contain all the filtering data.
* a server process can filter and stream the MySQL binlogs (using the comments).
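
As a toy illustration of the comment-based filtering idea, here is a self-contained Go sketch. The `/* scope keyspace_id:... */` comment format and the shard boundary used below are invented for this example; the real Vitess annotations and binlog streamer differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// The "/* scope keyspace_id:... */" comment format is made up for this
// example; it only illustrates how a scope annotation in the binlogs lets a
// streamer decide which destination shard a statement belongs to.
var scopeRe = regexp.MustCompile(`/\* scope keyspace_id:([0-9a-fA-F]+) \*/`)

// shouldForward decides whether a tagged binlog statement belongs to a
// destination shard covering keyspace IDs whose first byte is >= 0x80.
func shouldForward(statement string) bool {
	m := scopeRe.FindStringSubmatch(statement)
	if m == nil || len(m[1]) < 2 {
		return false // no usable scope comment: nothing to filter on
	}
	firstByte, err := strconv.ParseUint(m[1][:2], 16, 8)
	if err != nil {
		return false
	}
	return firstByte >= 0x80
}

func main() {
	fmt.Println(shouldForward("UPDATE user SET name='a' WHERE id=1 /* scope keyspace_id:c4ab */")) // true
	fmt.Println(shouldForward("UPDATE user SET name='b' WHERE id=2 /* scope keyspace_id:12f0 */")) // false
}
```
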
10 changes: 10 additions & 0 deletions doc/SchemaManagement.md
@@ -42,6 +42,7 @@ $ vtctl -wait-time=30s ValidateSchemaKeyspace user
## Changing the Schema

Goals:

* simplify schema updates on the fleet
* minimize human actions / errors
* guarantee no or very little downtime for most schema updates
@@ -51,6 +52,7 @@ Goals:
We’re trying to get reasonable confidence that a schema update is going to work before applying it. Since we cannot really apply a change to live tables without potentially causing trouble, we have implemented a Preflight operation: it copies the current schema into a temporary database, applies the change there to validate it, and gathers the resulting schema. After this Preflight, we have a good idea of what to expect, and we can apply the change to any database and make sure it worked.

The Preflight operation takes a sql string, and returns a SchemaChangeResult:

```go
type SchemaChangeResult struct {
Error string
@@ -60,6 +62,7 @@ type SchemaChangeResult struct {
```

The ApplySchema action applies a schema change. It is described by the following structure (also returns a SchemaChangeResult):

```go
type SchemaChange struct {
Sql string
@@ -71,18 +74,21 @@ type SchemaChange struct {
```

There is an associated ApplySchema remote action for a tablet. The steps performed are as follows (a Go sketch is included after the list):

* The database to use is either derived from the tablet dbName if UseVt is false, or is the _vt database. A ‘use dbname’ is prepended to the Sql.
* (if BeforeSchema is not nil) read the schema, make sure it is equal to BeforeSchema. If not equal: if Force is not set, we will abort, if Force is set, we’ll issue a warning and keep going.
* if AllowReplication is false, we’ll disable replication (adding SET sql_log_bin=0 before the Sql).
* We will then apply the Sql command.
* (if AfterSchema is not nil) read the schema again, make sure it is equal to AfterSchema. If not equal: if Force is not set, we will issue an error, if Force is set, we’ll issue a warning.
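
To make those steps concrete, here is a small Go sketch. The exact `SchemaChange` field types and the `readSchema`/`execSQL` helpers are assumptions made for this example, not the real vttablet code:

```go
package main

import (
	"fmt"
	"log"
)

// SchemaChange mirrors the fields referenced in the steps above; the exact
// field types are assumptions for this sketch, not the real definition.
type SchemaChange struct {
	Sql              string
	Force            bool
	AllowReplication bool
	BeforeSchema     string // empty means "not set"
	AfterSchema      string // empty means "not set"
	UseVt            bool
}

// applySchemaSketch walks through the steps above. readSchema and execSQL are
// supplied by the caller so the sketch stays self-contained.
func applySchemaSketch(change SchemaChange, tabletDbName string,
	readSchema func(db string) (string, error),
	execSQL func(sql string) error) error {

	// Pick the database and prepend a 'use dbname'.
	dbName := tabletDbName
	if change.UseVt {
		dbName = "_vt"
	}
	sql := fmt.Sprintf("USE %s;\n", dbName)

	// Compare the live schema with BeforeSchema, honoring Force.
	if change.BeforeSchema != "" {
		current, err := readSchema(dbName)
		if err != nil {
			return err
		}
		if current != change.BeforeSchema {
			if !change.Force {
				return fmt.Errorf("schema does not match BeforeSchema, aborting")
			}
			log.Println("warning: BeforeSchema mismatch, continuing because Force is set")
		}
	}

	// Optionally keep the change out of the binlogs.
	if !change.AllowReplication {
		sql += "SET sql_log_bin=0;\n"
	}

	// Apply the change.
	if err := execSQL(sql + change.Sql); err != nil {
		return err
	}

	// Compare the resulting schema with AfterSchema.
	if change.AfterSchema != "" {
		current, err := readSchema(dbName)
		if err != nil {
			return err
		}
		if current != change.AfterSchema && !change.Force {
			return fmt.Errorf("schema does not match AfterSchema")
		}
	}
	return nil
}

func main() {
	// Stub environment, just to exercise the sketch.
	readSchema := func(db string) (string, error) { return "CREATE TABLE t (id bigint)", nil }
	execSQL := func(sql string) error { fmt.Print("would execute:\n" + sql + "\n"); return nil }

	change := SchemaChange{Sql: "ALTER TABLE t ADD COLUMN msg varchar(64)", AllowReplication: true}
	if err := applySchemaSketch(change, "vt_test_keyspace", readSchema, execSQL); err != nil {
		log.Fatal(err)
	}
}
```
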

We will return the following information:

* whether it worked or not (doh!)
* BeforeSchema
* AfterSchema

### Use case 1: Single tablet update:

* we first do a Preflight (to know what BeforeSchema and AfterSchema will be). This can be disabled, but is not recommended.
* we then do the schema upgrade. We will check BeforeSchema before the upgrade, and AfterSchema after the upgrade.

@@ -108,6 +114,7 @@ This translates into the following vtctl commands:
```
PreflightSchema {-sql=<sql> || -sql_file=<filename>} <tablet alias>
```

apply the schema change to a temporary database to gather before and after schema and validate the change. The sql can be inlined or read from a file.
This will create a temporary database, copy the existing keyspace schema into it, apply the schema change, and re-read the resulting schema.

@@ -119,12 +126,14 @@ $ vtctl PreflightSchema -sql_file=change.sql nyc-0002009001
```
ApplySchema {-sql=<sql> || -sql_file=<filename>} [-skip_preflight] [-stop_replication] <tablet alias>
```

apply the schema change to the specific tablet (allowing replication by default). The sql can be inlined or read from a file.
A PreflightSchema operation will first be used to make sure the schema is OK (unless skip_preflight is specified).

```
ApplySchemaShard {-sql=<sql> || -sql_file=<filename>} [-simple] [-new_parent=<tablet alias>] <keyspace/shard>
```

apply the schema change to the specific shard. If simple is specified, we just apply on the live master. Otherwise, we do the shell game and will optionally re-parent.
If new_parent is set, we will also reparent (otherwise the master won't be touched at all). Using the force flag will cause a bunch of checks to be ignored; use with care.

@@ -136,4 +145,5 @@ $ vtctl ApplySchemaShard --sql-file=change.sql -new_parent=nyc-0002009002 vtx/0
```
ApplySchemaKeyspace {-sql=<sql> || -sql_file=<filename>} [-simple] <keyspace>
```

apply the schema change to the specified keyspace. If simple is specified, we just apply on the live master. Otherwise we will need to do the shell game, so we will apply the schema change to every single slave.
5 changes: 5 additions & 0 deletions doc/Sharding.md
@@ -29,6 +29,7 @@ sharding, as we need to figure out if a value is within a Shard's
range.

Vitess was designed to allow two types of sharding keys:

* Binary data: just an array of bytes. We use regular byte array
comparison here. Can be used for strings. MySQL representation is a
VARBINARY field.
@@ -72,12 +73,14 @@ Two Key Ranges are consecutive if the End of the first one is equal to
the Start of the next one.

Two special values exist:

* if a Start is empty, it represents the lowest value, and all values
are greater than it.
* if an End is empty, it represents the biggest value, and all values
are strictly lower than it.

Examples (a containment-check sketch in Go follows this list):

* Start=[], End=[]: full Key Range
* Start=[], End=[0x80]: Lower half of the Key Range.
* Start=[0x80], End=[]: Upper half of the Key Range.
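
Here is the containment check referenced above, as a small, self-contained Go sketch of these conventions (an illustration only, not the actual Vitess key range code):

```go
package main

import (
	"bytes"
	"fmt"
)

// KeyRange follows the conventions above: an empty Start is lower than every
// value, an empty End is higher than every value, and End is exclusive, so
// consecutive ranges share a boundary.
type KeyRange struct {
	Start, End []byte
}

// Contains reports whether a keyspace_id falls inside the range.
func (kr KeyRange) Contains(keyspaceID []byte) bool {
	afterStart := len(kr.Start) == 0 || bytes.Compare(keyspaceID, kr.Start) >= 0
	beforeEnd := len(kr.End) == 0 || bytes.Compare(keyspaceID, kr.End) < 0
	return afterStart && beforeEnd
}

func main() {
	lowerHalf := KeyRange{End: []byte{0x80}}   // Start=[], End=[0x80]
	upperHalf := KeyRange{Start: []byte{0x80}} // Start=[0x80], End=[]
	fmt.Println(lowerHalf.Contains([]byte{0x40, 0x00})) // true
	fmt.Println(lowerHalf.Contains([]byte{0x80}))       // false: End is exclusive
	fmt.Println(upperHalf.Contains([]byte{0x80}))       // true
}
```
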
@@ -101,6 +104,7 @@ We will use this convention in the rest of this document.
### Sharding Key Partition

A partition represents a set of Key Ranges that cover the entire space. For instance, the following four shards are a valid full partition:

* -40
* 40-80
* 80-c0
@@ -115,6 +119,7 @@ minimal downtime.
### Resharding

Vitess provides a set of tools and processes to deal with Range Based Shards:

* [Dynamic resharding](Resharding.md) allows splitting or merging of shards with no
read downtime, and very minimal master unavailability (<5s).
* Client APIs are designed to take sharding into account.
2 changes: 2 additions & 0 deletions doc/Tools.md
@@ -57,6 +57,7 @@ level picture of all the servers and their current state.

### vtworker
vtworker is meant to host long-running processes. It supports a plugin infrastructure, and offers libraries to easily pick tablets to use. We have developed:

* resharding differ jobs: meant to check data integrity during shard splits and joins.
* vertical split differ jobs: meant to check data integrity during vertical splits and joins.

@@ -66,6 +67,7 @@ It is very easy to add other checker processes for in-tablet integrity checks (v
vtprimecache is a MySQL cache primer for faster replication. If the single MySQL replication thread is falling behind, vtprimecache activates and starts reading the available relay logs. It then uses a few threads / connections to MySQL to execute modified statements and prime the MySQL buffer cache. For instance, if an 'update table X where id=2' statement is going to be executed by the replication SQL thread 2 or 3 seconds from now, we might as well execute a concurrent 'select from table X where id=2' now to prime the MySQL buffer cache. In practice, this improves replication speed by 30 to 40 percent.
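
To make the idea concrete, here is a toy Go sketch of the statement rewrite. It is only an illustration: real vtprimecache parses relay logs and handles far more statement shapes than this regex does.

```go
package main

import (
	"fmt"
	"regexp"
)

// Very rough rewrite of a pending replicated UPDATE into a priming SELECT on
// the same rows, so the relevant pages are warm when the SQL thread applies
// the real statement.
var updateRe = regexp.MustCompile(`(?i)^UPDATE\s+(\S+)\s+SET\s+.*?\s+(WHERE\s+.*)$`)

func primingSelect(updateStmt string) (string, bool) {
	m := updateRe.FindStringSubmatch(updateStmt)
	if m == nil {
		return "", false // statements we can't parse are simply skipped
	}
	return fmt.Sprintf("SELECT * FROM %s %s", m[1], m[2]), true
}

func main() {
	if q, ok := primingSelect("UPDATE X SET col=5 WHERE id=2"); ok {
		fmt.Println(q) // prints: SELECT * FROM X WHERE id=2
	}
}
```
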

### Other support tools

* *mysqlctl*: manage MySQL instances.
* *zkctl*: manage ZooKeeper instances.
* *zk*: command line ZooKeeper client and explorer.
7 changes: 7 additions & 0 deletions doc/TopologyService.md
@@ -41,6 +41,7 @@ An entire Keyspace can be locked. We use this during resharding for instance, wh
### Shard

A Shard contains a subset of the data for a Keyspace. The Shard record in the global topology contains:

* the MySQL Master tablet alias for this shard
* the sharding key range covered by this Shard inside the Keyspace
* the tablet types this Shard is serving (master, replica, batch, …), per cell if necessary.
@@ -61,6 +62,7 @@ This section describes the data structures stored in the local instance (per cel
### Tablets

The Tablet record has a lot of information about a single vttablet process running inside a tablet (along with the MySQL process):

* the Tablet Alias (cell+unique id) that uniquely identifies the Tablet
* the Hostname, IP address and port map of the Tablet
* the current Tablet type (master, replica, batch, spare, …)
@@ -70,6 +72,7 @@ The Tablet record has a lot of information about a single vttablet process runni
* user-specified tag map (to store per installation data for instance)

A Tablet record is created before a tablet can be running (either by `vtctl InitTablet` or by passing the `init_*` parameters to vttablet). A Tablet record can only be updated in one of the following ways:

* The vttablet process itself owns the record while it is running, and can change it.
* At init time, before the tablet starts
* After shutdown, when the tablet gets scrapped or deleted.
@@ -86,6 +89,7 @@ The Serving Graph is what the clients use to find which EndPoints to send querie
#### SrvKeyspace

It is the local representation of a Keyspace. It contains information on what shard to use for getting to the data (but not information about each individual shard):

* the partitions map is keyed by the tablet type (master, replica, batch, …) and the values are lists of shards to use for serving.
* it also contains the global Keyspace fields, copied for fast access.
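
A rough Go sketch of that shape (type and field names are illustrative, not the exact Vitess definitions):

```go
// Package topo is an illustrative stand-in; the types below only sketch the
// shape described above and are not the exact Vitess definitions.
package topo

// SrvShardReference names one shard to use for serving, with its key range.
type SrvShardReference struct {
	Name          string // e.g. "-80"
	KeyRangeStart []byte
	KeyRangeEnd   []byte
}

// SrvKeyspace is the per-cell, rolled-up view of a Keyspace: which shards to
// query for each tablet type, plus a few global keyspace fields copied for
// fast access.
type SrvKeyspace struct {
	Partitions         map[string][]SrvShardReference // keyed by tablet type: "master", "replica", "batch", …
	ShardingColumnName string                         // copied from the global Keyspace record
	ShardingColumnType string
}
```
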

@@ -94,6 +98,7 @@ It can be rebuilt by running `vtctl RebuildKeyspaceGraph`. It is not automatical
#### SrvShard

It is the local representation of a Shard. It contains information on details internal to this Shard only, but not to any tablet running in this shard:

* the name and sharding Key Range for this Shard.
* the cell that has the master for this Shard.

@@ -104,6 +109,7 @@ It can be rebuilt (along with all the EndPoints in this Shard) by running `vtctl
#### EndPoints

For each possible serving type (master, replica, batch), in each Cell / Keyspace / Shard, we maintain a rolled-up EndPoint list. Each entry in the list has information about one Tablet:

* the Tablet Uid
* the Host on which the Tablet resides
* the port map for that Tablet
@@ -171,6 +177,7 @@ We use the `_Data` filename to store the data, JSON encoded.
For locking, we store a `_Lock` file with various contents in the directory that contains the object to lock.

We use the following paths:

* Keyspace: `/vt/keyspaces/<keyspace>/_Data`
* Shard: `/vt/keyspaces/<keyspace>/<shard>/_Data`
* Tablet: `/vt/tablets/<cell>-<uid>/_Data`