Introduce database import and export protocol messages #224

Conversation
    message Res {
      DatabaseReplicas database = 1;
    }
  }

  message Import {
I've placed all migration-related messages in a single file, so it's "encapsulated". However, this introduces an additional layer of optional `client` and `server` in Rust while unpacking this message, so it's a little irritating. Not sure if it's worth it. I like this design more, I guess.
I like what you have here, very readable
// This is an emulation of the google ErrorDetails message. Generally, ErrorDetails are submitted via the GRPC error
// mechanism, but manual error sending is required in streams
message Error {
I was planning to reuse this error, but I realized that I don't need non-terminal errors in my protocol. However, I think it's reasonable to generalize error messages for the whole TypeDB protocol.
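For illustration, such a generalized stream error could be modeled on `google.rpc.ErrorInfo`; the message shape below is a sketch, and the field names are assumptions rather than the ones in this PR:

```proto
syntax = "proto3";

// Sketch of a generalized in-stream error, mirroring google.rpc.ErrorInfo.
// A terminal gRPC status would end the stream, so errors that must flow
// as part of the stream travel as ordinary messages instead.
message Error {
  string error_code = 1;             // stable, machine-readable identifier
  string domain = 2;                 // subsystem that produced the error
  map<string, string> metadata = 3;  // structured context for the client
}
```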
  UNSPECIFIED = 0;
- VERSION = 6;
+ VERSION = 7;
@flyingsilverfin We need to discuss the new versioning approach. I may roll it back.
Looks good, couple notes.
proto/migration.proto (Outdated)
message Migration {
  message Export {
    message Req {
      string database = 1;
This and `name` on L45 should have the same name. Either `name` or `database_name`, I think.
Fair. I thought there were more exceptions, but it's called `string database` only in the transaction opening request.
proto/migration.proto (Outdated)
int64 relation_count = 3;
int64 role_count = 4;
int64 ownership_count = 5;
// 6 was deleted and cannot be used until a breaking change occurs
6 should be `reserved` then.
This was copied from the 2.x implementation, but I didn't even know there was such a feature. Cool, done!
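For reference, this is what the protobuf `reserved` keyword enforces (the enclosing message name here is illustrative):

```proto
syntax = "proto3";

message Checksums {
  int64 relation_count = 3;
  int64 role_count = 4;
  int64 ownership_count = 5;
  // protoc now rejects any future field that tries to reuse index 6;
  // removed field *names* can be reserved the same way: reserved "foo";
  reserved 6;
}
```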
proto/migration.proto (Outdated)
bool boolean = 2;
int64 integer = 3;
double double = 4;
int64 datetime_millis = 5; // reserved for 2.x, time since epoch
Suggested change:
- int64 datetime_millis = 5; // reserved for 2.x, time since epoch
+ int64 datetime_millis = 5; // compatibility with 2.x, milliseconds since epoch
// ATTENTION: the messages below are used to import multiple versions of TypeDB.
// DO NOT reorder or delete existing and reserved indices. Be careful while extending this.
//
bigger please!! like we had it in server - this is very dangerous
ASCII art here we go???
//     _  _____ _____ _____ _   _ _____ ___ ___  _   _ _
//    / \|_   _|_   _| ____| \ | |_   _|_ _/ _ \| \ | | |
//   / _ \ | |   | | |  _| |  \| | | |  | | | | |  \| | |
//  / ___ \| |   | | | |___| |\  | | |  | | |_| | |\  |_|
// /_/   \_\_|   |_| |_____|_| \_| |_| |___\___/|_| \_(_)
let's do it :D
LGTM
## Product change and motivation

Add database export and database import operations. Unlike in TypeDB 2.x, these operations are run through TypeDB **GRPC clients** such as the **Rust driver** or **TypeDB Console**, which solves a number of issues with networking and encryption, especially relevant for users of TypeDB Enterprise. With this, it becomes an official part of the [TypeDB GRPC protocol](typedb/typedb-protocol#224). Both operations are performed through the network, but the server and the client can be used on the same host.

Each TypeDB database can be represented as two files:

1. A text file with its TypeQL schema description: a complete `define` query for the whole schema.
2. A binary file with its data. This format is an extension of TypeDB 2.x's export format. See below for version compatibility information.

### Database export

Database export allows a client to download database schema and data files from a TypeDB server for future import into the same or a higher TypeDB version. The files are created on the client side. While the database data is being exported, parallel queries are allowed, but none of them will affect the exported data thanks to TypeDB's transactionality. However, the database will not be available for operations such as deletion.

**Exported TypeDB 3.x databases cannot be imported into servers of older versions.**

### Database import

Database import allows a client to upload previously exported database schema and data files to a TypeDB server. Any new name can be assigned to the imported database. While the database data is being imported, it is not recommended to perform parallel operations against it: interfering actions may lead to import errors or database corruption.

**Import supports all exported TypeDB 2.x and TypeDB 3.x databases.** It can be used to migrate between TypeDB versions with breaking changes. Please visit [our docs](https://typedb.com/docs/manual/migration/2_to_3) for more information.

## Implementation

Implement [the new protocol](typedb/typedb-protocol#224). The two operations are implemented as two separate services running in `tokio` tasks, similar to transaction services.

### Database export

This operation is simple. After establishing a stream, we just export the database's schema (the same operation as schema retrieval) and then send a number of items containing the header, the database's data (encoded concepts), and data checksums at the end.

### Database import

This operation is more tricky. It is executed in steps, passing through a couple of "states" (a sketch of the message exchange appears after this section):

1. The database's name and schema are expected first. Without them, we can't continue: skipping this step leads to an error, signaling that the client is probably implemented incorrectly.
2. After the schema is received, it is executed and committed as is, to check that the provided schema is correct. Otherwise, a user error is returned, so they can rewrite the schema and try again.
3. After the schema is persisted, it is relaxed. This involves:
   a) substituting default cardinalities and card/key annotations with `@card(0..)` for all capabilities in the schema;
   b) making all attributes independent (`@independent`) so they are not cleaned up between transactions while their owners have not yet been received;
   c) making all relations independent (a new `system independent` property, not exposed to users) so they are not cleaned up between transactions while their role players have not yet been received.
   All errors at this stage are considered internal errors and bugs.
4. After the schema is prepared, we receive items, decode them into concepts, and persist them in the database. A transaction buffer size is used to execute commits from time to time, reducing memory consumption and final commit time. This optimization requires 3b and 3c. One of the expected and required items is checksums. Another, optional, item is the header (we don't really use it outside of logs, so it's not required; not sure if we need to force it).
5. After the `Done` message is received, signaling that the stream of items is completed, we perform the final data commit and unrelax the schema (undoing step 3).
6. If everything is good, a verification `Done` response is sent.

At any point, an error can be returned. To make this possible, the protocol introduces streaming from the server, although the server remains silent until the end unless there are errors in the provided messages.

While being imported, databases are not accessible through `database_manager`: they are owned by `database_importer`. If a server crashes, the incomplete databases are cleaned up on the next bootup.

To avoid overflowing memory, `InstanceIDMapping` uses `SpilloverCache`, a new component combining `HashMap` and RocksDB to spill the excessive data that does not fit in memory over to disk.
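A minimal proto3 sketch of the import exchange described above; the message and field names are assumptions for illustration, not the exact definitions merged in this PR, and stubs stand in for `Item` and `Error`:

```proto
syntax = "proto3";

message Item {}   // stub: a migration data item, defined elsewhere in migration.proto
message Error {}  // stub: the in-stream error message

message Import {
  // Client-side stream: initial request, then item batches, then Done.
  message Client {
    oneof client {
      InitialReq initial_req = 1;
      ReqPart req_part = 2;
      Done done = 3;
    }
  }
  message InitialReq {
    string name = 1;    // name to assign to the imported database (state 1)
    string schema = 2;  // complete TypeQL define query (states 2-3)
  }
  message ReqPart {
    repeated Item items = 1;  // decoded and persisted in buffered commits (state 4)
  }
  message Done {}

  // Server-side stream: silent until a single terminal Done (state 6)
  // or an Error, which may arrive at any stage.
  message Server {
    oneof server {
      Done done = 1;
      Error error = 2;
    }
  }
}
```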
## Usage and product changes

Introduce interfaces to export databases into schema definition and data files and to import databases using these files. Database import supports files exported from both TypeDB 2.x and TypeDB 3.x.

Both operations are blocking and may take a significant amount of time to execute for large databases. Use parallel connections to continue operating with the server and its other databases.

Usage examples in Rust:

```rust
// export
let db = driver.databases().get(db_name).await.unwrap();
db.export_to_file(schema_file_path, data_file_path).await.unwrap();

// import
let schema = read_to_string(schema_file_path).unwrap();
driver.databases().import_from_file(db_name2, schema, data_file_path).await.unwrap();
```

Usage examples in Python:

```py
# export
database = driver.databases.get(db_name)
database.export_to_file(schema_file_path, data_file_path)

# import
with open(schema_file_path, 'r', encoding='utf-8') as f:
    schema = f.read()
driver.databases.import_from_file(db_name2, schema, data_file_path)
```

Usage examples in Java:

```java
// export
Database database = driver.databases().get(dbName);
database.exportToFile(schemaFilePath, dataFilePath);

// import
String schema = Files.readString(Path.of(schemaFilePath));
driver.databases().importFromFile(dbName2, schema, dataFilePath);
```

## Implementation

Implemented the updated [protocol](typedb/typedb-protocol#224). As both operations work with streaming, the implementation is similar to transactions. The behavior is split into the file processing logic and the networking (specialized for sync and async modes). The exposed interfaces present only the file-based versions, but additional interfaces for direct work with streams can be added in future updates.

In Rust, paths are accepted as Rust `Path`s. In the other languages, which work through C interfaces, plain strings are used for transmission through the C layer.

### Database export

Implemented through the `database` interface; accepts two target files for export. No specific naming format is required (so the files are not necessarily `.typeql` or `.typedb`). If any of the target files already exist, an error is returned.

The export operation consists of these steps (a message-level sketch of the stream follows):

* prepare the output files
* open a unidirectional GRPC stream from the server to the client
* "block" on server response listening until an error or a "done" is received (blocking is implemented through a loop which resolves a `listen` promise presented by the network layer)
* if there is a schema message, write it to the schema file and flush it right away
* if there is a data items message, encode the items and write them to the data file
* in case of an error, the output files are deleted (we own them, as we created them at the beginning)

The network layer is basically just a task listening to the GRPC stream and transmitting the converted messages to the processing loop.
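A matching sketch of the export stream under the same assumptions (illustrative names, single client request, streamed server responses ending with an explicit `Done`):

```proto
syntax = "proto3";

message Item {}   // stub: a migration data item
message Error {}  // stub: the in-stream error message

message Export {
  // Single client request naming the database to export.
  message Req {
    string name = 1;
  }
  // Streamed server responses: the schema first, then data item batches,
  // then an explicit Done instead of a silent stream closure.
  message Server {
    oneof server {
      InitialRes initial_res = 1;
      ResPart res_part = 2;
      Done done = 3;
      Error error = 4;  // on error, the client deletes the output files it created
    }
  }
  message InitialRes {
    string schema = 1;  // written to the schema file and flushed right away
  }
  message ResPart {
    repeated Item items = 1;  // encoded and appended to the data file
  }
  message Done {}
}
```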
### Database import

Implemented through the `database_manager` interface; accepts a database name, a schema definition query string (can be read from the exported file), and an exported data file. No naming requirements here either.

The import operation consists of these steps:

* open the input file
* open a bidirectional GRPC stream between the server and the client, and send the initial request with the database's name and schema
* eagerly start reading and decoding data items from the input file one by one, storing up to 250 items in the buffer (this number can be easily changed)
* once the buffer is full, attempt to send the items: this operation first checks for a potential early error signal from the server and then sends the batch, returning to processing the rest of the file
* once the file is read, send a "done" message and block until the server responds with either an error or its own "done" message

The network layer consists of a blocking task for client-side requests and a listening task waiting for a one-shot signal from the server (either an error or a "done" message). Errors can be received at any point of processing, while "done" is expected only after a client-side "done" request. When a response is received, either an async or a sync sink receives the message, which should be checked before any client-side network operations to ensure proper interruption.
Release notes: usage and product changes

Add `migration` protocol messages for usage in database import and database export operations:

* `database_export`: a stream from a TypeDB server to export a specific database, similar to TypeDB 2.x;
* `databases_import`: a stream between a client and a server to import an exported 2.x/3.x TypeDB database into a TypeDB 3.x server from a client.

The format of migration items used for these operations is an extended version of TypeDB 2.x's migration items, so it is backward compatible with 2.x database files. Important: it is not intended to import 3.x databases into 2.x servers.
Implementation

Add the `Migration { Item }` message (sketched below). The format is an extended version of the 2.x protocol, so it contains "outdated" fields for compatibility with old databases.

Add the `Migration { Export }` message. This operation consists of a single client `Req { database }` and multiple streamed server responses, ending with a `Done` message to signal that the server is ready to close the stream without errors. It could be substituted by a silent stream closure, but I preferred explicitness here.

Add the `Migration { Import }` message. This operation consists of:

* a stream of client requests, ending with a `Done` message to signal that the client is finished without errors and the server can perform the final validation (this `Done` message is required and cannot be removed, because the client has to check whether there were finalization errors);
* a stream of server responses (actually, there is either a single `Done` or a single error, but the stream is needed to return errors at any stage of the communication).
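For completeness, a sketch of what a `Migration { Item }` entry could look like. The variants echo the fragments visible in this diff (attributes with a value oneof, checksums with per-kind counts), but the variant names and field numbers here are assumptions; the exact definitions live in proto/migration.proto and, per the warning above, must not be reordered:

```proto
syntax = "proto3";

// Illustrative only: variant names and numbers are assumptions.
message Item {
  oneof item {
    Header header = 1;        // metadata about the exported database (optional on import)
    Entity entity = 2;
    Attribute attribute = 3;  // carries a value oneof, keeping 2.x-era fields
    Relation relation = 4;
    Checksums checksums = 5;  // per-kind counts validated at the end of import
  }
  message Header {}
  message Entity {}
  message Attribute {}
  message Relation {}
  message Checksums {}
}
```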