Introduce database import and export protocol messages #224


Merged: 15 commits, Jun 9, 2025

Conversation

@farost (Member) commented May 8, 2025

Release notes: usage and product changes

Add migration protocol messages for use in database import and database export operations:

  • a unidirectional database_export stream from a TypeDB server to export a specific database, similar to TypeDB 2.x;
  • a bidirectional databases_import stream between a client and a server to import an exported 2.x/3.x TypeDB database into a TypeDB 3.x server from a client.

The format of migration items used for these operations is an extended version of TypeDB 2.x's migration items, so it is backward compatible with 2.x database files. Important: it's not intended to import 3.x databases into 2.x servers.

Implementation

Add Migration { Item } message. The format is an extended version of the 2.x protocol, so it contains "outdated" fields for compatibility with old databases.

Add Migration { Export } message. This operation consists of a single client Req { database } and multiple streamed server responses:

  1. An initial response with the schema.
  2. An unlimited number of migration items (multiple messages, each carrying multiple items, to allow batching optimizations).
  3. A Done message to signal that the server is ready to close the stream without errors. It could be substituted by a silent stream closure, but I preferred explicitness here.
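The export response sequence above (schema first, then item batches, then a terminal Done) can be sketched as follows. The `Schema`, `ItemBatch`, and `Done` shapes and the `consume_export_stream` helper are illustrative assumptions, not actual protocol or driver types.

```python
from dataclasses import dataclass

@dataclass
class Schema:
    typeql: str          # the database's full schema as one define query

@dataclass
class ItemBatch:
    items: list          # several migration items per message, for batching

@dataclass
class Done:
    pass                 # explicit clean-close signal

def consume_export_stream(responses):
    """Collect the schema and all items; stop cleanly only on Done."""
    it = iter(responses)
    first = next(it)
    assert isinstance(first, Schema), "stream must start with the schema"
    items = []
    for msg in it:
        if isinstance(msg, Done):
            return first.typeql, items
        items.extend(msg.items)
    raise RuntimeError("stream ended without Done")
```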

Add Migration { Import } message. This operation consists of a stream of client requests:

  1. An initial request with the name of the database and its schema string.
  2. An unlimited number of migration items.
  3. A Done message to signal that the client is finished without errors, and the server can perform the final validation. This Done message is required and cannot be removed because the client has to check whether there were finalization errors or not.

and a stream of server responses (in practice, either a single Done or a single error; the stream exists so that errors can be returned at any stage of the communication).
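The client-side request ordering described above (one initial request, any number of item batches, then a mandatory Done) can be sketched as a generator. The tuple shapes are illustrative assumptions, not the actual protocol messages.

```python
def import_requests(database_name, schema, item_batches):
    """Yield the import request sequence in the required order."""
    # initial request: database name plus its schema string
    yield ("initial", database_name, schema)
    for batch in item_batches:
        yield ("items", batch)
    # Done is mandatory: the client must wait for the server's final
    # validation result rather than silently closing the stream
    yield ("done",)
```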

@farost farost changed the title Add database import and export protocol messages Introduce database import and export protocol messages May 27, 2025
message Res {
DatabaseReplicas database = 1;
}
}

message Import {
farost (Member, Author):
I've placed all migration-related messages in a single file, so it's "encapsulated". However, this introduces an additional layer of optionals to unpack on both the client and the server in Rust, which is a little irritating. Not sure if it's worth it or not; I like this design more, I guess.

Reviewer (Member):

I like what you have here, very readable


// This is an emulation of the google ErrorDetails message. Generally, ErrorDetails are submitted via the GRPC error
// mechanism, but errors must be sent manually in streams
message Error {
farost (Member, Author):

I was planning to reuse this error, but I realized that I don't need non-terminal errors in my protocol. Still, I think it's reasonable to generalize error messages for the whole TypeDB protocol.

UNSPECIFIED = 0;
VERSION = 6;
VERSION = 7;
farost (Member, Author):

@flyingsilverfin We need to discuss the new versioning approach. I may roll it back.

@farost farost marked this pull request as ready for review May 29, 2025 08:40
@farost farost requested a review from haikalpribadi as a code owner May 29, 2025 08:40
@dmitrii-ubskii (Member) left a comment:

Looks good, a couple of notes.

message Migration {
message Export {
message Req {
string database = 1;
Reviewer (Member):
This and `name` on L45 should have the same name. Either `name` or `database_name`, I think.

farost (Member, Author):

Fair. I thought there were more exceptions, but it's called `string database` only in the transaction opening request.

int64 relation_count = 3;
int64 role_count = 4;
int64 ownership_count = 5;
// 6 was deleted and cannot be used until a breaking change occurs
Reviewer (Member):

`6` should be `reserved` then.

farost (Member, Author):

This was copied from the 2.x implementation, but I didn't even know there was such a feature. Cool, done!

bool boolean = 2;
int64 integer = 3;
double double = 4;
int64 datetime_millis = 5; // reserved for 2.x, time since epoch
Reviewer (Member):

Suggested change:
- int64 datetime_millis = 5; // reserved for 2.x, time since epoch
+ int64 datetime_millis = 5; // compatibility with 2.x, milliseconds since epoch

Comment on lines +64 to +66
// ATTENTION: the messages below are used to import multiple versions of TypeDB.
// DO NOT reorder or delete existing and reserved indices. Be careful while extending this.
//
Reviewer (Member):

Bigger please!! Like we had it in the server - this is very dangerous.

farost (Member, Author):

ASCII art here we go???

//               _  _____ _____ _____ _   _ _____ ___ ___  _   _ _
//              / \|_   _|_   _| ____| \ | |_   _|_ _/ _ \| \ | | |
//             / _ \ | |   | | |  _| |  \| | | |  | | | | |  \| | |
//            / ___ \| |   | | | |___| |\  | | |  | | |_| | |\  |_|
//           /_/   \_\_|   |_| |_____|_| \_| |_| |___\___/|_| \_(_)

Reviewer (Member):

let's do it :D

@flyingsilverfin (Member) left a comment:

LGTM

@farost farost merged commit f6528be into typedb:master Jun 9, 2025
@farost farost deleted the 3.x-export-import branch June 9, 2025 16:15
farost added a commit to typedb/typedb that referenced this pull request Jun 10, 2025
## Product change and motivation
Add database export and database import operations. Unlike in TypeDB
2.x, these operations are run through TypeDB **GRPC clients** such as
the **Rust driver** or **TypeDB Console**, which solves a number of
issues with networking and encryption, especially relevant for users of
TypeDB Enterprise. With this, it becomes an official part of the [TypeDB
GRPC protocol](typedb/typedb-protocol#224).
Both operations are performed through the network, but the server and
the client can be used on the same host.

Each TypeDB database can be represented as two files:
1. A text file with its TypeQL schema description: a complete `define`
query for the whole schema.
2. A binary file with its data.

This format is an extension of TypeDB 2.x's export format. See below for
version compatibility details.

### Database export
Database export allows a client to download a database's schema and data
files from a TypeDB server for future import into the same or a newer
TypeDB version.

The files are created on the client side. While the database is being
exported, parallel queries are allowed, but none of them will affect the
exported data thanks to TypeDB's transactionality.
However, the database will not be available for operations such as
deletion.

**Exported TypeDB 3.x databases cannot be imported into servers of older
versions.**

### Database import
Database import allows a client to upload previously exported database
schema and data files to a TypeDB server. It is possible to assign any
new name to the imported database.

While the database is being imported, it is not recommended to perform
parallel operations against it: interfering actions may lead to import
errors or database corruption.

**Import supports all exported TypeDB 2.x and TypeDB 3.x databases.** It
can be used to migrate between TypeDB versions with breaking changes.
Please visit [our docs](https://typedb.com/docs/manual/migration/2_to_3)
for more information.

## Implementation
Implement [the new
protocol](typedb/typedb-protocol#224).

The two operations are implemented as two separate services running in
`tokio` tasks, similar to transaction services.

### Database export 
This operation is simple. After establishing a stream, we just export
the database's schema (the same operation as schema retrieval) and then
send a number of items containing the header, the database's data
(encoded concepts), and data checksums at the end.

### Database import
This operation is trickier. It is executed in steps through a couple of
"states":
1. The database's name and schema are expected first. Without them, we
can't continue; skipping this step leads to an error, signaling that the
client is probably implemented incorrectly.
2. After the schema is received, it is executed and committed as is, to
check that the provided schema is correct. If not, a user error is
returned, so the user can rewrite the schema and try again.
3. After the schema is persisted, it is relaxed. This involves:
a) substituting default cardinalities and card/key annotations with
`@card(0..)` for all capabilities in the schema;
b) making all attributes independent (`@independent`) so that attributes
whose owners have not yet been received are not cleaned up between
transactions;
c) making all relations independent (via a new `system independent`
property, not exposed to users) so that relations without role players
are not cleaned up between transactions.
All errors at this stage are considered internal errors and bugs.
4. After the schema is prepared, we receive items, decode them into
concepts, and persist them in the database. A transaction buffer size is
used to execute commits from time to time, reducing memory consumption
and final commit time; this optimization requires 3b and 3c.
One of the expected and required items is the checksums item. Another,
optional, item is the header (we don't really use it outside of logs, so
it's not required; not sure if we need to force it).
5. After the `Done` message is received, signaling that the stream of
items is complete, we perform the final data commit and unrelax the
schema (undoing step 3).
6. If everything is good, a verification `Done` response is sent.

An error can be returned at any point. To make this possible, the
protocol introduces streaming from the server, although the server
remains silent until the end unless there are errors in the provided
messages.
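The server-side steps above can be mirrored by a toy state machine. The real implementation runs as a Rust `tokio` task; the class, message tuples, and buffer size here are illustrative assumptions only.

```python
class ImportSession:
    """Toy model of the import states: schema first, then items, then done."""

    BUFFER_SIZE = 3  # stand-in for the real transaction buffer size

    def __init__(self):
        self.state = "awaiting_schema"  # step 1: name + schema must come first
        self.committed = 0
        self._buffer = []

    def handle(self, msg):
        kind = msg[0]
        if self.state == "awaiting_schema":
            if kind != "schema":
                # step 1: skipping the schema means the client is broken
                raise RuntimeError("client error: schema must come first")
            # steps 2-3: commit the schema as-is, then relax it
            self.state = "receiving_items"
        elif self.state == "receiving_items":
            if kind == "item":
                # step 4: buffer items and commit periodically
                self._buffer.append(msg[1])
                if len(self._buffer) >= self.BUFFER_SIZE:
                    self._commit()
            elif kind == "done":
                # steps 5-6: final commit, unrelax the schema, reply Done
                self._commit()
                self.state = "done"
            else:
                raise RuntimeError("unexpected message: " + kind)
        else:
            raise RuntimeError("session already finished")

    def _commit(self):
        self.committed += len(self._buffer)
        self._buffer.clear()
```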

While being imported, databases are not accessible through
`database_manager`: they are owned by `database_importer`. If a server
crashes, the incomplete databases will be cleaned up on the next
bootup.

To avoid overflowing memory, `InstanceIDMapping` uses `SpilloverCache`,
a new component combining a `HashMap` and RocksDB to spill excess data
to disk when it does not fit in memory.
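The spillover idea above can be illustrated with a toy cache: a bounded in-memory map that evicts its oldest entries into a secondary store once it is full. A plain dict stands in for RocksDB here; the class and its methods are illustrative assumptions, not the server's actual component.

```python
class SpilloverCache:
    """Toy spillover cache: bounded hot dict, overflow goes to 'disk'."""

    def __init__(self, capacity, disk=None):
        self.capacity = capacity
        self.hot = {}   # in-memory portion; dicts preserve insertion order
        self.disk = disk if disk is not None else {}  # stand-in for RocksDB

    def put(self, key, value):
        self.hot[key] = value
        if len(self.hot) > self.capacity:
            # spill the oldest in-memory entry to the secondary store
            oldest = next(iter(self.hot))
            self.disk[oldest] = self.hot.pop(oldest)

    def get(self, key):
        # check memory first, then fall back to the spilled data
        if key in self.hot:
            return self.hot[key]
        return self.disk.get(key)
```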
farost added a commit to typedb/typedb-driver that referenced this pull request Jun 10, 2025
## Usage and product changes
Introduce interfaces to export databases into schema definition and data
files and to import databases using these files. Database import
supports files exported from both TypeDB 2.x and TypeDB 3.x.

Both operations are blocking and may take a significant amount of time
to execute for large databases. Use parallel connections to continue
operating with the server and its other databases.

Usage examples in Rust:
```rust
// export
let db = driver.databases().get(db_name).await.unwrap();
db.export_to_file(schema_file_path, data_file_path).await.unwrap();

// import
let schema = read_to_string(schema_file_path).unwrap();
driver.databases().import_from_file(db_name2, schema, data_file_path).await.unwrap();
```

Usage examples in Python:
```py
# export
database = driver.databases.get(db_name)
database.export_to_file(schema_file_path, data_file_path)

# import
with open(schema_file_path, 'r', encoding='utf-8') as f:
    schema = f.read()
driver.databases.import_from_file(db_name2, schema, data_file_path)
```

Usage examples in Java:
```java
// export
Database database = driver.databases().get(dbName);
database.exportToFile(schemaFilePath, dataFilePath);

// import
String schema = Files.readString(Path.of(schemaFilePath));
driver.databases().importFromFile(dbName2, schema, dataFilePath);
```

## Implementation
Implemented the updated
[protocol](typedb/typedb-protocol#224).

As both operations work with streaming, the implementation is similar to
transactions. The behavior is split into the file processing logic and
networking (specialized for sync and async modes). The exposed
interfaces present only the file-based versions, but additional
interfaces for direct work with streams can be presented in future
updates.

In Rust, paths are accepted as Rust `Path`s. In the other languages,
which work through the C interface, paths are passed as plain strings
across the C layer.

### Database export 
Implemented through the `database` interface; accepts two target files
for the export. No specific naming format is required (the files don't
have to be `.typeql` or `.typedb`). If any of the target files already
exists, an error is returned.

The export operation consists of these steps:
* prepare the output files
* open a unidirectional GRPC stream from the server to the client
* "block" on server response listening until an error or a "done" is
received (blocking is implemented through a loop which resolves a
`listen` promise presented by the network layer)
* if there is a schema message, write it to the schema file and flush it
right away
* if there is a data items message, encode the items and write them to
the data file
* in case of an error, the output files are deleted (we own them as we
create them at the beginning)
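The export steps above, including the refuse-to-overwrite check and the cleanup of partial output on error, can be sketched as follows. The function name and the `(kind, payload)` message tuples are illustrative assumptions, not the driver's actual API.

```python
import os

def export_to_files(schema_path, data_path, stream):
    """Toy version of the client-side export loop described above."""
    # refuse to overwrite existing files, per the description above
    for p in (schema_path, data_path):
        if os.path.exists(p):
            raise FileExistsError(p)
    try:
        with open(schema_path, "w") as sf, open(data_path, "wb") as df:
            for kind, payload in stream:
                if kind == "schema":
                    sf.write(payload)
                    sf.flush()          # flush the schema right away
                elif kind == "items":
                    df.write(payload)   # already-encoded item bytes
                elif kind == "done":
                    return              # clean close
        raise RuntimeError("stream closed without done")
    except BaseException:
        # we own the files (we created them), so remove partial output
        for p in (schema_path, data_path):
            if os.path.exists(p):
                os.remove(p)
        raise
```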

The network layer is basically just a task listening for the GRPC stream
and transmitting the converted messages to the processing loop.

### Database import
Implemented through the `database_manager` interface; accepts a
database name, a schema definition query string (which can be read from
the exported schema file), and an exported data file. Again, there are
no naming requirements.

The import operation consists of these steps:
* open the input file
* open a bidirectional GRPC stream between the server and the client,
send the initial request with the database's name and schema
* eagerly start reading and decoding data items from the input file one
by one, storing up to 250 items in a buffer (this number can easily be
changed)
* once the buffer is full, attempt to send the items: this first checks
for a potential early error signal from the server, then sends the
batch and returns to processing the rest of the file
* once the file is read, send a "done" message and block until the
server responds with either an error or its "done" message
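The batching described above can be sketched in a few lines. `send` stands in for the gRPC stream write, and 250 mirrors the buffer size mentioned in the text; everything else is an illustrative assumption.

```python
def send_in_batches(items, send, batch_size=250):
    """Buffer decoded items and flush them to the stream in batches."""
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= batch_size:
            send(list(buffer))   # flush a full batch to the stream
            buffer.clear()
    if buffer:
        send(list(buffer))       # flush the final partial batch
    send("done")                 # required terminal message
```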

The network layer consists of a blocking task for client-side requests
and a listening task waiting for a one-shot signal from the server
(either an error or a "done" message). Errors can be received at any
point during processing, while "done" is expected only after the
client-side "done" request. When a response is received, either an
async or a sync sink receives the message, which should be checked
before any client-side network operation to ensure proper interruption.