fix: sync command breakdown update and remove useless commands #622

Merged
merged 3 commits into from
Nov 19, 2024
35 changes: 35 additions & 0 deletions docs/playbook.md
@@ -55,3 +55,38 @@ Run the following commands:
Note that the first four commands will remove all replication job setup from the databases. `remove-constraints` removes NOT VALID constraints from the target schema so that, when you restart replication, they don't cause failed inserts (these must not exist during the initial setup). `remove-indexes` removes all indexes from the target schema to help speed up the initial bulk load. Running `remove-indexes` is optional; you may skip it if needed.

After running these commands, you can `TRUNCATE` the tables in the destination database and start the migration from the beginning. **Please take as much precaution as possible when running TRUNCATE, as it will delete all data in the tables. In particular, ensure you are running it against the correct database!**

## My `sync` command has failed or is hanging. What can I do?

The `sync` command from Step 7 of the Quickstart guide does the following:

- Sync sequence values
- Dump and load tables without Primary Keys
- Add NOT VALID constraints to the target schema (they were removed in Step 1 in the target database)
- Create Indexes (as long as this was run in Step 2, this will be glossed over. If Step 2 was missed, indexes will build now and this will take longer than expected).
- Validate data (take 100 random rows and 100 last rows of each table, and compare data)
- Run ANALYZE to ensure optimal performance
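
The "Validate data" step above samples rows rather than comparing whole tables. The following is a minimal sketch of that idea only, not pgbelt's implementation: it assumes each table is available in memory as a dict keyed by an integer primary key, and `keys_to_check` and `mismatched_keys` are hypothetical names introduced here.

```python
import random
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Any, ...]


def keys_to_check(
    keys: List[int], n: int = 100, seed: Optional[int] = None
) -> List[int]:
    """Pick up to n random keys plus the last n keys (by sort order)."""
    ordered = sorted(keys)
    rng = random.Random(seed)
    random_part = rng.sample(ordered, min(n, len(ordered)))
    last_part = ordered[-n:]
    # A key can land in both samples; check it only once.
    return sorted(set(random_part) | set(last_part))


def mismatched_keys(
    src: Dict[int, Row],
    dst: Dict[int, Row],
    n: int = 100,
    seed: Optional[int] = None,
) -> List[int]:
    """Return sampled keys whose rows differ, or are missing, in dst."""
    return [k for k in keys_to_check(list(src), n, seed) if dst.get(k) != src[k]]
```

An empty result means every sampled row matched. Because only a sample is compared, a clean result does not prove that the unsampled rows are identical.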

If the `sync` command fails, you can try to run the individual commands that make up the `sync` command to see where the failure is. The individual commands are:

1. Syncing Sequences:

- `sync-sequences` - reads and sets sequence values from SRC to DST at the time of command execution

2. Syncing Tables without Primary Keys:

- `dump-tables` - dumps only tables without Primary Keys (to ensure only tables without Primary Keys are dumped, DO NOT specify the `--tables` flag for this command)
- `load-tables` - loads into the DST DB the tables written to disk by the `dump-tables` command

3. Syncing NOT VALID Constraints:

- `dump-schema` - dumps the schema from your SRC DB onto disk (the files may already be on disk, but run this command anyway to ensure they exist)
- `load-constraints` - loads NOT VALID constraints from disk (obtained by the `dump-schema` command) into your DST DB schema

4. Creating Indexes & Running ANALYZE:

- `create-indexes` - creates indexes on the target database, then runs ANALYZE

5. Validating Data:

- `validate-data` - checks 100 random rows and the last 100 rows of every table involved in the replication job, and ensures they all match exactly.
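
The breakdown above can be scripted so that the first failing step is easy to spot. This is a minimal sketch, not part of pgbelt. It assumes that every subcommand accepts the same two positional arguments as `belt sync` in the Quickstart (datacenter, then database) and that a failed command exits with a non-zero status. `run_steps` and its `runner` parameter are hypothetical names introduced here.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Command names are taken from the list above, in the order given there.
SYNC_STEPS = [
    "sync-sequences",
    "dump-tables",  # no --tables flag: only tables without Primary Keys
    "load-tables",
    "dump-schema",
    "load-constraints",
    "create-indexes",  # also runs ANALYZE
    "validate-data",
]


def _run_cli(argv: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_steps(
    dc: str,
    db: str,
    runner: Callable[[Sequence[str]], int] = _run_cli,
) -> Optional[str]:
    """Run each step in order; return the first failing step, or None."""
    for step in SYNC_STEPS:
        # Assumption: each subcommand takes DC and DB like `belt sync` does.
        if runner(["belt", step, dc, db]) != 0:
            return step
    return None
```

Calling `run_steps("testdatacenter1", "database1")` returns the name of the first step that failed, or `None` if every step succeeded. The `runner` parameter exists so the ordering logic can be exercised without a live database.
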
12 changes: 2 additions & 10 deletions docs/quickstart.md
@@ -179,23 +179,15 @@ Therefore the next command will do the following:
- Sync sequence values
- Dump and load tables without Primary Keys
- Add NOT VALID constraints to the target schema (they were removed in Step 1 in the target database)
- Create Indexes (as long as this was run in Step 2, this will be glossed over. If step 2 was missed, indexes will build now amnd this will take longer than expected).
- Create Indexes (as long as this was run in Step 2, this will be glossed over. If Step 2 was missed, indexes will build now and this will take longer than expected).
- Validate data (take 100 random rows and 100 last rows of each table, and compare data)
- Run ANALYZE to ensure optimal performance

```
$ belt sync testdatacenter1 database1
```

If the above command fails, you can diagnose and run the individual steps with the following commands:

- `sync-sequences` - reads and sets sequences values from SRC to DST at the time of command execution
- `dump-tables` - dumps only tables without Primary Keys
- `load-tables` - load into DST DB the tables from the `dump-tables` command (found on disk)
- `dump-contraints` - dumps NOT VALID constraints from your SRC DB schema onto disk
- `load-constraints` - load NOT VALID constraints from disk to your DST DB schema
- `validate-data` - Check random 100 rows and last 100 rows of every table involved in the replication job, and ensure all match exactly.
- `analyze` - Run ANALYZE on the database
If the above command fails, please see the `playbook.md` document in this repository for more information on how to resolve the issue.

## Step 8: Enable write traffic to the destination host

35 changes: 0 additions & 35 deletions pgbelt/cmd/sync.py
@@ -108,40 +108,6 @@ async def load_tables(
    await load_dumped_tables(conf, tables, logger)


@run_with_configs
async def sync_tables(
    config_future: Awaitable[DbupgradeConfig],
    tables: list[str] = Option([], help="Specific tables to sync"),
):
    """
    Dump and load all tables from the source database to the destination database.
    Equivalent to running dump-tables followed by load-tables. Table data will be
    saved locally in files.

    You may also provide a list of tables to sync with the
    --tables option and only these tables will be synced.
    """
    conf = await config_future
    src_logger = get_logger(conf.db, conf.dc, "sync.src")
    dst_logger = get_logger(conf.db, conf.dc, "sync.dst")

    if tables:
        dump_tables = tables.split(",")
    else:
        async with create_pool(conf.src.pglogical_uri, min_size=1) as src_pool:
            _, dump_tables, _ = await analyze_table_pkeys(
                src_pool, conf.schema_name, src_logger
            )

        if conf.tables:
            dump_tables = [t for t in dump_tables if t in conf.tables]

    await dump_source_tables(conf, dump_tables)
    await load_dumped_tables(
        conf, [] if not tables and not conf.tables else dump_tables, dst_logger
    )
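
For reviewers, the table selection performed by the removed `sync_tables` can be restated as a pure function. This is a sketch under assumptions, not code from the repository: `select_tables` is a hypothetical name, the `--tables` value is treated as a comma-separated string (as the `.split(",")` call implies, despite the `list[str]` annotation), and the config filter is assumed to apply only when `--tables` is not given.

```python
from typing import List, Optional, Tuple


def select_tables(
    cli_tables: Optional[str],
    config_tables: List[str],
    pkeyless_tables: List[str],
) -> Tuple[List[str], List[str]]:
    """Return (tables to dump, table list handed to the load step)."""
    if cli_tables:
        dump = cli_tables.split(",")
    else:
        # Start from the tables without Primary Keys found on the source.
        dump = list(pkeyless_tables)
        if config_tables:
            dump = [t for t in dump if t in config_tables]
    # As in the removed code, the load step receives an empty list when
    # neither --tables nor the config names specific tables.
    load = [] if not cli_tables and not config_tables else dump
    return dump, load
```

With no `--tables` value and no tables in the config, the dump list is every table without a Primary Key, while the load step is given an empty list.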


@run_with_configs(skip_src=True)
async def analyze(config_future: Awaitable[DbupgradeConfig]) -> None:
    """
@@ -276,7 +242,6 @@ async def sync(
    sync_sequences,
    dump_tables,
    load_tables,
    sync_tables,
    analyze,
    validate_data,
    sync,