Merge pull request #162 from doctrine/Sharding
Sharding
Showing 36 changed files with 2,812 additions and 10 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
# Azure Federations | ||
|
||
This document describes implementing Federations inside a new Doctrine Sharding Extension. Some extensions to the DBAL and ORM core are required to get this working. | ||
|
||
1. DBAL (Database Abstraction Layer) | ||
|
||
* Add support for Database Schema Operations | ||
* CREATE FEDERATION | ||
* CREATE TABLE ... FEDERATED ON | ||
* Add support to create a multi-tenant schema from any given schema | ||
* Add API to pick a shard based on distribution key and atomic value | ||
* Add API to ask about federations, federation members and so on. | ||
* Add Sharding Abstraction | ||
* If a shard is picked via distribution key and atomic value, fire queries against this shard only | ||
* Or query the global database. | ||
|
||
2. ORM (Object-Relational Mapper) | ||
|
||
* Federation Key has to be part of the clustered index of the table | ||
* Test with a pure Multi-Tenant App with Filtering = ON (TaskList) | ||
* Test with sharded app (Weather) | ||
|
||
## Implementation Details | ||
|
||
SQL Azure requires exactly one clustered index per table. It makes no difference whether the primary key | ||
or any other key is the clustered index. Sharding requires external ID generation (no auto-increment), | ||
such as GUIDs. GUIDs perform poorly as clustered index keys, so | ||
you would typically add a column such as a "created" timestamp to hold the clustered index instead | ||
of clustering on the GUID. | ||
|
||
## Example API: | ||
|
||
@@@ php | ||
<?php | ||
use Doctrine\DBAL\DriverManager; | ||
|
||
$dbParams = array( | ||
'dbname' => 'tcp:dbname.database.windows.net', | ||
'sharding' => array( | ||
'federationName' => 'Orders_Federation', | ||
'distributionKey' => 'CustID', | ||
'distributionType' => 'integer', | ||
'filteringEnabled' => false, | ||
), | ||
// ... | ||
); | ||
|
||
$conn = DriverManager::getConnection($dbParams); | ||
$shardManager = $conn->getShardManager(); | ||
|
||
// Example 1: query against root database | ||
$sql = "SELECT * FROM Products"; | ||
$rows = $conn->executeQuery($sql); | ||
|
||
// Example 2: query against the selected shard with CustomerId = 100 | ||
$aCustomerID = 100; | ||
$shardManager->selectShard($aCustomerID); // Using Default federationName and distributionKey | ||
// Query: "USE FEDERATION Orders_Federation (CustID = $aCustomerID) WITH RESET, FILTERING OFF;" | ||
|
||
$sql = "SELECT * FROM Customers"; | ||
$rows = $conn->executeQuery($sql); | ||
|
||
// Example 3: Reset API to root database again | ||
$shardManager->selectGlobal(); | ||
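
The comment in Example 2 shows the statement that selecting a shard is expected to send. As a rough sketch of how such a statement could be assembled from the `sharding` options, the helper below builds the same string. `buildUseFederationSql` is a hypothetical name used only for illustration and is not part of the Doctrine API; the identifier check is likewise an assumption.

```php
<?php
// Hypothetical helper, not part of Doctrine: builds the statement quoted
// in the Example 2 comment from the sharding options.
function buildUseFederationSql(
    string $federationName,
    string $distributionKey,
    int $value,
    bool $filteringEnabled
): string {
    // Identifiers cannot be bound as parameters, so only accept plain names.
    foreach ([$federationName, $distributionKey] as $identifier) {
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $identifier)) {
            throw new InvalidArgumentException("Invalid identifier: $identifier");
        }
    }

    return sprintf(
        'USE FEDERATION %s (%s = %d) WITH RESET, FILTERING %s;',
        $federationName,
        $distributionKey,
        $value,
        $filteringEnabled ? 'ON' : 'OFF'
    );
}

echo buildUseFederationSql('Orders_Federation', 'CustID', 100, false), "\n";
```

With `filteringEnabled` set to `false`, as in the configuration above, the result matches the query shown in the Example 2 comment.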
|
||
## ID Generation | ||
|
||
With sharding, all IDs have to be globally unique. There are three strategies for generating them. | ||
|
||
1. Use GUIDs as described here http://blogs.msdn.com/b/cbiyikoglu/archive/2011/06/20/id-generation-in-federations-identity-sequences-and-guids-uniqueidentifier.aspx | ||
2. Having a central table that is accessed with a second connection to generate sequential ids | ||
3. Using natural keys from the domain. | ||
|
||
The second approach has the benefit of numerical primary keys, but it also introduces a central point of failure. The third strategy can seldom be used, because most domains do not provide suitable natural keys. Identity columns cannot be used at all. | ||
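
For the first strategy, IDs could also be generated on the PHP side before the row is inserted. A minimal sketch, assuming a random (version 4) UUID is acceptable as the GUID; `generateUuidV4` is a hypothetical helper name, not something this commit provides.

```php
<?php
// Sketch: generate a random (version 4) UUID to use as a globally unique ID.
function generateUuidV4(): string
{
    $bytes = random_bytes(16);

    // Set the version (4) and the RFC 4122 variant bits.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex characters, grouped as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo generateUuidV4(), "\n";
```

No central coordination is needed for this approach, at the cost of the clustered index concerns described under Implementation Details.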
|
||
@@@ php | ||
<?php | ||
use Doctrine\DBAL\DriverManager; | ||
use Doctrine\DBAL\Id\TableHiLoIdGenerator; | ||
|
||
$dbParams = array( | ||
'dbname' => 'dbname.database.windows.net', | ||
// ... | ||
); | ||
$conn = DriverManager::getConnection($dbParams); | ||
|
||
$idGenerator = new TableHiLoIdGenerator($conn, 'id_table_name', $multiplicator = 1); | ||
// only once, create this table | ||
$idGenerator->createTable(); | ||
|
||
$nextId = $idGenerator->generateId('for_table_name'); | ||
$nextOtherId = $idGenerator->generateId('for_other_table'); | ||
|
||
The connection for the table generator has to be a different one than the one used for the main app to avoid transaction clashes. |
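
To illustrate the general idea behind a table-based hi/lo generator (this is not Doctrine's actual implementation), the sketch below reserves a whole block of IDs per round trip to a central table. The class name `InMemoryHiLoGenerator` and the in-memory array standing in for the central table are assumptions made for this example.

```php
<?php
// Hypothetical in-memory stand-in for the central ID table. A real
// generator would read and update a row over its own connection.
final class InMemoryHiLoGenerator
{
    private $blockSize;
    private $centralTable = []; // sequence name => next unreserved block number
    private $reserved = [];     // sequence name => ['next' => int, 'max' => int]

    public function __construct(int $blockSize = 10)
    {
        $this->blockSize = $blockSize;
    }

    public function generateId(string $sequenceName): int
    {
        $range = $this->reserved[$sequenceName] ?? null;

        if ($range === null || $range['next'] > $range['max']) {
            // One access to the "central table" reserves a whole block of IDs.
            $hi = $this->centralTable[$sequenceName] ?? 0;
            $this->centralTable[$sequenceName] = $hi + 1;
            $range = [
                'next' => $hi * $this->blockSize + 1,
                'max'  => ($hi + 1) * $this->blockSize,
            ];
        }

        $id = $range['next']++;
        $this->reserved[$sequenceName] = $range;

        return $id;
    }
}

$generator = new InMemoryHiLoGenerator(3);
$ids = [];
for ($i = 0; $i < 4; $i++) {
    $ids[] = $generator->generateId('for_table_name');
}
```

The larger the block, the fewer accesses to the central table are needed, which reduces the impact of that central location on every insert.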
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
# Doctrine Shards | ||
|
||
Doctrine Extension to support horizontal sharding in the Doctrine ORM. | ||
|
||
## Idea | ||
|
||
Implement sharding inside Doctrine at a level that is as unobtrusive to the developer as possible. | ||
|
||
Problems to tackle: | ||
|
||
1. Where to send INSERT statements? | ||
2. How to generate primary keys? | ||
3. How to pick shards for update, delete statements? | ||
4. How to pick shards for select operations? | ||
5. How to merge select queries that span multiple shards? | ||
6. How to handle/prevent multi-shard queries that cannot be merged (GROUP BY)? | ||
7. How to handle non-sharded data? (static metadata tables for example) | ||
8. How to handle multiple connections? | ||
9. Implementation on the DBAL or ORM level? | ||
|
||
## Roadmap | ||
|
||
Version 1: DBAL 2.3 (Multi-Tenant Apps) | ||
|
||
1. ID Generation support (in DBAL + ORM done) | ||
2. Multi-Tenant Support: Either pick a global metadata database or exactly one shard. | ||
3. Fan-out queries over all shards (or a subset) by result appending | ||
|
||
Version 2: ORM related (complex): | ||
|
||
4. ID resolving (Pick shard for a new ID) | ||
5. Query resolving (Pick shards a query should be sent to) | ||
6. Shard resolving (Pick shards an ID could be on) | ||
7. Transactions | ||
8. Read Only objects | ||
|
||
## Technical Requirements for Database Schemas | ||
|
||
Sharded tables require the sharding-distribution key as one of their columns. This will affect your code compared to a normalized db-schema. If you have a Blog <-> BlogPost <-> PostComments entity setup sharded by `blog_id` then even the PostComment table needs this column, even if an "unsharded", normalized DB-Schema does not need this information. | ||
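
The requirement can be checked mechanically. The sketch below uses a hypothetical, simplified schema description (table name mapped to column names) and reports which tables lack the distribution key; the table and column names other than `blog_id` are invented for illustration.

```php
<?php
// Hypothetical schema description: table name => column names.
$tables = [
    'blog'         => ['blog_id', 'title'],
    'blog_post'    => ['post_id', 'blog_id', 'headline'],
    'post_comment' => ['comment_id', 'post_id', 'text'], // normalized: no blog_id
];

// Returns the names of all tables that do not carry the distribution key.
function tablesMissingDistributionKey(array $tables, string $distributionKey): array
{
    $missing = [];
    foreach ($tables as $tableName => $columns) {
        if (!in_array($distributionKey, $columns, true)) {
            $missing[] = $tableName;
        }
    }

    return $missing;
}

$missing = tablesMissingDistributionKey($tables, 'blog_id');
print_r($missing);
```

Here the normalized `post_comment` table is reported, because it reaches `blog_id` only indirectly through `post_id` and would need the column added before it can be sharded.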
|
||
## Implementation Details | ||
|
||
Assumptions: | ||
|
||
* For querying you either want to query ALL or just exactly one shard. | ||
* IDs for ALL sharded tables have to be unique across all shards. | ||
* Non-sharded data is replicated across all shards, so each shard redundantly keeps this information available. This is necessary so that join queries against reference data work on every shard. | ||
* If you retrieve an object A from a shard, then all references and collections of this object reside on the same shard. | ||
* The database schema on all shards is the same (or compatible) | ||
|
||
### SQL Azure Federations | ||
|
||
SQL Azure is a special case: points 1, 2, 3, 4, 7 and 8 of the problem list above are partly handled at the database level. This makes it a perfect test implementation for just the subset of features in points 5-6. However, there needs to be a way to configure SchemaTool to generate the correct schema on SQL Azure. | ||
|
||
* SELECT Operations: The simplest assumption is to always query all shards unless the user explicitly specifies otherwise. | ||
* Queries can be merged in PHP code; this obviously does not work for DISTINCT, GROUP BY and ORDER BY queries. | ||
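
Merging by result appending can be sketched as follows. The closures are hypothetical stand-ins for running the same query on each shard, and `fanOutQuery` is an illustrative name rather than an API from this commit. Each shard returns its rows sorted by `CompanyName`, yet the appended result is not globally sorted.

```php
<?php
// Hypothetical stand-ins for shards: each callable returns the rows that
// "SELECT ... ORDER BY CompanyName" would return on that shard.
$shards = [
    'shard_1' => function (): array {
        return [['CompanyName' => 'Acme'], ['CompanyName' => 'Initech']];
    },
    'shard_2' => function (): array {
        return [['CompanyName' => 'Globex'], ['CompanyName' => 'Umbrella']];
    },
];

function fanOutQuery(array $shards): array
{
    $rows = [];
    foreach ($shards as $runQueryOnShard) {
        // Result appending: rows are concatenated in shard order, with no
        // cross-shard de-duplication, grouping or sorting.
        $rows = array_merge($rows, $runQueryOnShard());
    }

    return $rows;
}

$rows = fanOutQuery($shards);
$names = array_column($rows, 'CompanyName');
print_r($names);
```

The per-shard ordering survives, but "Initech" ends up before "Globex", which is why ORDER BY (and likewise DISTINCT and GROUP BY) cannot be handled by plain appending.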
|
||
### Generic Sharding | ||
|
||
More features are necessary to implement sharding on the PHP level, independent from database support: | ||
|
||
1. Configuration of multiple connections, one connection = one shard. | ||
2. Primary Key Generation mechanisms (UUID, central table, sequence emulation) | ||
|
||
## Primary Use-Cases | ||
|
||
1. Multi-Tenant Applications | ||
|
||
These are easier to support because a value that determines the shard ID for the whole request is known very early on. | ||
Queries can also always be limited to a single shard. | ||
|
||
2. Scale-Out by some attribute (Round-Robin?) | ||
|
||
This strategy requires access to multiple shards in a single request based on the data accessed. |
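
To show why a single request may touch several shards in the second use case, here is a small sketch. It assumes, purely for illustration, that shards are numbered from 0 and that an ID lives on shard `id mod N`; the document leaves the actual distribution strategy open, and `groupIdsByShard` is a hypothetical helper.

```php
<?php
// Assumption for illustration only: shards are numbered 0..N-1 and an ID
// lives on shard (id mod N).
function groupIdsByShard(array $ids, int $shardCount): array
{
    if ($shardCount < 1) {
        throw new InvalidArgumentException('At least one shard is required.');
    }

    $byShard = [];
    foreach ($ids as $id) {
        $byShard[$id % $shardCount][] = $id;
    }
    ksort($byShard); // stable, predictable shard order

    return $byShard;
}

// The IDs accessed in one request end up on two different shards.
$plan = groupIdsByShard([100, 101, 205, 302], 3);
print_r($plan);
```

In the multi-tenant case, by contrast, every ID in a request would map to the same shard, so no such grouping step is needed.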
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# Sharding with SQLAzure Example | ||
|
||
This example demonstrates Sharding with SQL Azure Federations. | ||
|
||
## Requirements | ||
|
||
1. Windows Azure Account | ||
2. SQL Azure Database | ||
3. Composer for dependencies | ||
|
||
## Install | ||
|
||
composer install | ||
|
||
Edit "examples/sharding/bootstrap.php" so that it contains your database connection settings. | ||
|
||
## Order to execute Scripts | ||
|
||
1. create_schema.php | ||
2. view_federation_members.php | ||
3. insert_data.php | ||
4. split_federation.php | ||
5. insert_data_after_split.php | ||
6. query_filtering_off.php | ||
7. query_filtering_on.php | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
<?php | ||
// bootstrap.php | ||
use Doctrine\DBAL\DriverManager; | ||
use Doctrine\Shards\DBAL\SQLAzure\SQLAzureShardManager; | ||
|
||
require_once "vendor/autoload.php"; | ||
|
||
$config = array( | ||
'dbname' => 'SalesDB', | ||
'host' => 'tcp:dbname.windows.net', | ||
'user' => 'user@dbname', | ||
'password' => 'XXX', | ||
'sharding' => array( | ||
'federationName' => 'Orders_Federation', | ||
'distributionKey' => 'CustId', | ||
'distributionType' => 'integer', | ||
) | ||
); | ||
|
||
if ($config['host'] == "tcp:dbname.windows.net") { | ||
die("You have to change the configuration to your Azure account.\n"); | ||
} | ||
|
||
$conn = DriverManager::getConnection($config); | ||
$shardManager = new SQLAzureShardManager($conn); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
{ | ||
"require": { | ||
"doctrine/dbal": "*", | ||
"doctrine/shards": "0.3" | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
<?php | ||
// create_schema.php | ||
use Doctrine\DBAL\Schema\Schema; | ||
use Doctrine\Shards\DBAL\SQLAzure\SQLAzureSchemaSynchronizer; | ||
|
||
require_once 'bootstrap.php'; | ||
|
||
$schema = new Schema(); | ||
|
||
$products = $schema->createTable('Products'); | ||
$products->addColumn('ProductID', 'integer'); | ||
$products->addColumn('SupplierID', 'integer'); | ||
$products->addColumn('ProductName', 'string'); | ||
$products->addColumn('Price', 'decimal', array('scale' => 2, 'precision' => 12)); | ||
$products->setPrimaryKey(array('ProductID')); | ||
$products->addOption('azure.federated', true); | ||
|
||
$customers = $schema->createTable('Customers'); | ||
$customers->addColumn('CustomerID', 'integer'); | ||
$customers->addColumn('CompanyName', 'string'); | ||
$customers->addColumn('FirstName', 'string'); | ||
$customers->addColumn('LastName', 'string'); | ||
$customers->setPrimaryKey(array('CustomerID')); | ||
$customers->addOption('azure.federated', true); | ||
$customers->addOption('azure.federatedOnColumnName', 'CustomerID'); | ||
|
||
$orders = $schema->createTable('Orders'); | ||
$orders->addColumn('CustomerID', 'integer'); | ||
$orders->addColumn('OrderID', 'integer'); | ||
$orders->addColumn('OrderDate', 'datetime'); | ||
$orders->setPrimaryKey(array('CustomerID', 'OrderID')); | ||
$orders->addOption('azure.federated', true); | ||
$orders->addOption('azure.federatedOnColumnName', 'CustomerID'); | ||
|
||
$orderItems = $schema->createTable('OrderItems'); | ||
$orderItems->addColumn('CustomerID', 'integer'); | ||
$orderItems->addColumn('OrderID', 'integer'); | ||
$orderItems->addColumn('ProductID', 'integer'); | ||
$orderItems->addColumn('Quantity', 'integer'); | ||
$orderItems->setPrimaryKey(array('CustomerID', 'OrderID', 'ProductID')); | ||
$orderItems->addOption('azure.federated', true); | ||
$orderItems->addOption('azure.federatedOnColumnName', 'CustomerID'); | ||
|
||
// Create the Schema + Federation: | ||
$synchronizer = new SQLAzureSchemaSynchronizer($conn, $shardManager); | ||
|
||
// Or just look at the SQL: | ||
echo implode("\n", $synchronizer->getCreateSchema($schema)); | ||
|
||
$synchronizer->createSchema($schema); | ||
|