feat: stripping indexes and adding them later, and testing #273

Merged 12 commits on Oct 6, 2023
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml

@@ -31,7 +31,7 @@ repos:
         name: flake8
         entry: flake8
         language: system
-        types: [python]
+        types: [python3]
         exclude: "^(test/*|examples/*|noxfile.py)"
         require_serial: true
         args: ["--config=.flake8"]
@@ -40,13 +40,13 @@ repos:
         description: Automatically upgrade syntax for newer versions.
         entry: pyupgrade
         language: system
-        types: [python]
+        types: [python3]
         args: [--py39-plus, --keep-runtime-typing]
       - id: reorder-python-imports
         name: Reorder python imports
         entry: reorder-python-imports
         language: system
-        types: [python]
+        types: [python3]
         args: [--application-directories=src]
       - id: trailing-whitespace
         name: Trim Trailing Whitespace
9 changes: 5 additions & 4 deletions Dockerfile

@@ -2,11 +2,12 @@ FROM python:3.11-slim
 COPY ./ /opt/pgbelt
 WORKDIR /opt/pgbelt
 
+RUN set -e \
+    && apt-get -y update \
+    && apt-get -y install postgresql-client \
+    && apt-get -y install gcc
+
 RUN set -e \
     && python -m pip install --upgrade pip \
     && pip install poetry poetry-dynamic-versioning \
     && poetry install
-
-RUN set -e \
-    && apt-get -y update \
-    && apt-get -y install postgresql-client
19 changes: 14 additions & 5 deletions docs/quickstart.md

@@ -128,26 +128,34 @@ You can check the status of the migration, database hosts, replication delay, etc.

     $ belt status testdatacenter1
 
-## Step 2: Run ANALYZE on the target database before your application cutover
+## Step 2: Create Indexes on the target database before your application cutover
+
+To ensure the bulk COPY phase of the migration runs faster, indexes are not created in the destination database during setup.
+They still need to be built, and this should be done before the cutover so that index creation does not prolong your
+cutover window. You should run this command during a period of low traffic.
+
+    $ belt create-indexes testdatacenter1 database1
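
To make this step concrete: it amounts to replaying the `CREATE INDEX` statements that were stripped out of the schema dump. A minimal sketch of the idea with asyncpg, assuming the `schemas/dc/db/indexes.sql` layout mentioned in the CLI docstrings; the function and connection string are illustrative, not pgbelt's API:

```python
import asyncio

import asyncpg


async def build_indexes(dsn: str, indexes_sql_path: str) -> None:
    # indexes.sql holds standalone CREATE INDEX statements separated by semicolons.
    with open(indexes_sql_path) as f:
        statements = [s.strip() for s in f.read().split(";") if s.strip()]

    conn = await asyncpg.connect(dsn)
    try:
        for stmt in statements:
            # Index builds are long-running; running them one at a time keeps
            # load on the target predictable during the low-traffic window.
            await conn.execute(stmt)
    finally:
        await conn.close()


# Hypothetical invocation:
# asyncio.run(build_indexes(
#     "postgresql://owner@target-host/database1",
#     "schemas/testdatacenter1/database1/indexes.sql",
# ))
```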

+## Step 3: Run ANALYZE on the target database before your application cutover
 
 This is typically run some time before your application cutover, so the target database performs better with the dataset
 once the application cuts over to the target database.
 
     $ belt analyze testdatacenter1 database1
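
Under the hood this is equivalent to running a plain `ANALYZE` on the target to refresh planner statistics; a sketch, with the connection string as an assumption:

```python
import asyncpg


async def analyze(dsn: str) -> None:
    conn = await asyncpg.connect(dsn)
    try:
        # Refresh planner statistics for the freshly copied dataset.
        await conn.execute("ANALYZE")
    finally:
        await conn.close()
```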

-## Step 3: Stop write traffic to your source database
+## Step 4: Stop write traffic to your source database
 
 This would be the beginning of your application downtime. We revoke all login permissions on the source host using `belt` to ensure writes can no longer occur. You may want to do this and then restart your application's Postgres connections, to ensure existing connections can no longer write.
 
     $ belt revoke-logins testdatacenter1 database1
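
At the SQL level, "revoking logins" boils down to something like the following sketch (the role names are hypothetical; pgbelt discovers and later restores the real ones itself):

```python
import asyncpg


async def revoke_logins(root_dsn: str, roles: list[str]) -> None:
    conn = await asyncpg.connect(root_dsn)
    try:
        for role in roles:
            # Block new sessions for this role.
            await conn.execute(f'ALTER ROLE "{role}" NOLOGIN')
        # Terminate existing sessions so already-open connections cannot keep writing.
        await conn.execute(
            """
            SELECT pg_terminate_backend(pid)
            FROM pg_stat_activity
            WHERE usename = ANY($1::text[])
              AND pid <> pg_backend_pid()
            """,
            roles,
        )
    finally:
        await conn.close()
```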

-## Step 4: Stop forward replication
+## Step 5: Stop forward replication
 
 Once write traffic has stopped on the source database, we need to stop replication in the forward direction.
 
     $ belt teardown-forward-replication testdatacenter1 database1
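
Forward replication is implemented as a pglogical subscription on the target, so tearing it down is conceptually a drop of that subscription. A sketch using pglogical's documented SQL API; the subscription name is hypothetical, since pgbelt manages its own naming:

```python
import asyncpg


async def stop_forward_replication(target_root_dsn: str, subscription: str) -> None:
    conn = await asyncpg.connect(target_root_dsn)
    try:
        # pglogical.drop_subscription(name, ifexists) is part of pglogical's
        # SQL interface; ifexists=true makes the call safe to repeat.
        await conn.execute(
            "SELECT pglogical.drop_subscription($1::name, true)", subscription
        )
    finally:
        await conn.close()
```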

-## Step 5: Sync all the missing bits from source to destination (that could not be done by replication)
+## Step 6: Sync all the missing bits from source to destination (that could not be done by replication)
 
 PgLogical (used for the actual replication) can't handle the following:
 
@@ -162,14 +170,15 @@ Therefore the next command will do the following:
 - Sync sequence values
 - Dump and load tables without Primary Keys
 - Add NOT VALID constraints to the target schema (they were removed in Step 1 in the target database)
+- Create Indexes (as long as this was done in Step 2, this will be quick; if Step 2 was skipped, indexes will be built now and this will take longer than expected)
 - Validate data (take 100 random rows and 100 last rows of each table, and compare data)
 - Run ANALYZE to ensure optimal performance
 
 ```
 $ belt sync testdatacenter1 database1
 ```
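
For the "tables without Primary Keys" item in the list above: pglogical cannot replicate such tables, which is why `belt sync` dumps and loads them instead. A generic catalog query to find them (a sketch, not necessarily pgbelt's exact query):

```python
import asyncpg

# Ordinary tables in the public schema with no PRIMARY KEY constraint.
NO_PK_TABLES_SQL = """
SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
  AND NOT EXISTS (
      SELECT 1
      FROM pg_constraint p
      WHERE p.conrelid = c.oid
        AND p.contype = 'p'
  )
"""


async def tables_without_pk(dsn: str) -> list[str]:
    conn = await asyncpg.connect(dsn)
    try:
        rows = await conn.fetch(NO_PK_TABLES_SQL)
        return [r["relname"] for r in rows]
    finally:
        await conn.close()
```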

-## Step 6: Enable write traffic to the destination host
+## Step 7: Enable write traffic to the destination host
 
 This is done outside of PgBelt, with your application. Note -- reverse replication will be ongoing until you feel a rollback is unnecessary. To stop reverse replication, simply run the following:
49 changes: 45 additions & 4 deletions pgbelt/cmd/schema.py

@@ -4,20 +4,25 @@
 from pgbelt.config.models import DbupgradeConfig
 from pgbelt.util.dump import apply_target_constraints
 from pgbelt.util.dump import apply_target_schema
+from pgbelt.util.dump import create_target_indexes
 from pgbelt.util.dump import dump_dst_not_valid_constraints
 from pgbelt.util.dump import dump_source_schema
+from pgbelt.util.dump import dump_dst_create_index
 from pgbelt.util.dump import remove_dst_not_valid_constraints
+from pgbelt.util.dump import remove_dst_indexes
 from pgbelt.util.logs import get_logger
 
 
 @run_with_configs
 async def dump_schema(config_future: Awaitable[DbupgradeConfig]) -> None:
     """
     Dumps and sanitizes the schema from the source database, then saves it to
-    a file. Three files will be generated. One contains the entire sanitized
-    schema, one contains the schema with all NOT VALID constraints removed, and
-    another contains only the NOT VALID constraints that were removed. These
-    files will be saved in the schemas directory.
+    a file. Four files will be generated:
+    1. The entire sanitized schema
+    2. The schema with all NOT VALID constraints and CREATE INDEX statements removed
+    3. A file that contains only the CREATE INDEX statements
+    4. A file that contains only the NOT VALID constraints
+    These files will be saved in the schemas directory.
     """
     conf = await config_future
     logger = get_logger(conf.db, conf.dc, "schema.src")
@@ -74,6 +79,42 @@ async def remove_constraints(config_future: Awaitable[DbupgradeConfig]) -> None:
     await remove_dst_not_valid_constraints(conf, logger)
 
 
+@run_with_configs(skip_src=True)
+async def dump_indexes(config_future: Awaitable[DbupgradeConfig]) -> None:
+    """
+    Dumps the CREATE INDEX statements from the target database onto disk, in
+    the schemas directory.
+    """
+    conf = await config_future
+    logger = get_logger(conf.db, conf.dc, "schema.dst")
+    await dump_dst_create_index(conf, logger)
+
+
+@run_with_configs(skip_src=True)
+async def remove_indexes(config_future: Awaitable[DbupgradeConfig]) -> None:
+    """
+    Removes indexes from the target database. This must be done before setting
+    up replication, and should only be used if the schema in the target
+    database was loaded outside of pgbelt.
+    """
+    conf = await config_future
+    logger = get_logger(conf.db, conf.dc, "schema.dst")
+    await remove_dst_indexes(conf, logger)
+
+
+@run_with_configs(skip_src=True)
+async def create_indexes(config_future: Awaitable[DbupgradeConfig]) -> None:
+    """
+    Creates indexes from the file schemas/dc/db/indexes.sql in the destination
+    database as the owner user. This must only be done after most data is
+    synchronized (at minimum after the initializing phase) from the source to
+    the destination database.
+    """
+    conf = await config_future
+    logger = get_logger(conf.db, conf.dc, "schema.dst")
+    await create_target_indexes(conf, logger, during_sync=False)
+
+
 COMMANDS = [
     dump_schema,
     load_schema,
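
To make the dump/strip split above concrete, here is a simplified sketch of separating `CREATE INDEX` statements from a dumped schema. pgbelt's real parsing lives in `pgbelt.util.dump` (e.g. `dump_dst_create_index`); this is only an approximation of the idea:

```python
import re

# Matches single- or multi-line CREATE [UNIQUE] INDEX statements terminated
# by a semicolon at end of line, as pg_dump emits them.
INDEX_RE = re.compile(r"^CREATE (?:UNIQUE )?INDEX .*?;$", re.MULTILINE | re.DOTALL)


def split_schema(schema_sql: str) -> tuple[str, str]:
    """Return (schema without indexes, the extracted CREATE INDEX statements)."""
    indexes = INDEX_RE.findall(schema_sql)
    no_indexes = INDEX_RE.sub("", schema_sql)
    return no_indexes, "\n".join(indexes)
```

The stripped schema is what gets loaded before replication starts, and the extracted statements are what `create-indexes` replays later.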
10 changes: 9 additions & 1 deletion pgbelt/cmd/sync.py

@@ -7,6 +7,7 @@
 from pgbelt.cmd.helpers import run_with_configs
 from pgbelt.config.models import DbupgradeConfig
 from pgbelt.util.dump import apply_target_constraints
+from pgbelt.util.dump import create_target_indexes
 from pgbelt.util.dump import dump_source_tables
 from pgbelt.util.dump import load_dumped_tables
 from pgbelt.util.logs import get_logger

@@ -214,12 +215,19 @@ async def sync(config_future: Awaitable[DbupgradeConfig]) -> None:
             _dump_and_load_all_tables(conf, src_pool, src_logger, dst_logger),
         )
 
+        # Creating indexes should run before validations and ANALYZE, but after
+        # all the data exists in the destination database.
+        await gather(
+            apply_target_constraints(conf, dst_logger),
+            create_target_indexes(conf, dst_logger, during_sync=True),
+        )
+
         await gather(
             compare_100_rows(src_pool, dst_owner_pool, conf.tables, validation_logger),
             compare_latest_100_rows(
                 src_pool, dst_owner_pool, conf.tables, validation_logger
             ),
-            apply_target_constraints(conf, dst_logger),
             run_analyze(dst_owner_pool, dst_logger),
         )
     finally:
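
The ordering in the diff above comes from awaiting one gather before starting the next: calls inside the same gather run concurrently, but nothing in the second gather starts until the first completes. A tiny standalone illustration:

```python
import asyncio


async def phase(name: str, seconds: float) -> None:
    await asyncio.sleep(seconds)
    print(f"{name} done")


async def main() -> None:
    # Constraints and index builds overlap with each other...
    await asyncio.gather(phase("constraints", 0.2), phase("indexes", 0.3))
    # ...but validation and ANALYZE only start once both have finished.
    await asyncio.gather(phase("validation", 0.1), phase("analyze", 0.1))


asyncio.run(main())
```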
13 changes: 5 additions & 8 deletions pgbelt/cmd/teardown.py

@@ -82,14 +82,11 @@ async def teardown(

     if full:
         await sleep(15)
-        async with create_pool(conf.src.owner_uri, min_size=1) as src_owner_pool:
-            async with create_pool(
-                conf.dst.owner_uri, min_size=1
-            ) as dst_owner_pool:
-                await gather(
-                    revoke_pgl(src_owner_pool, conf.tables, src_logger),
-                    revoke_pgl(dst_owner_pool, conf.tables, dst_logger),
-                )
+        await gather(
+            revoke_pgl(src_root_pool, conf.tables, src_logger),
+            revoke_pgl(dst_root_pool, conf.tables, dst_logger),
+        )
 
         await gather(
             teardown_pgl(src_root_pool, src_logger),