Use pg_export_snapshot for consistent parallel reads #22

Merged: nafg merged 1 commit into master from snapshot-isolation on Feb 12, 2026

@nafg commented Feb 12, 2026

Summary

  • Exports a PostgreSQL snapshot from a coordinator JDBC connection in DbCopier.run() before any table copies begin
  • Each CopyAction imports the snapshot via SET TRANSACTION SNAPSHOT within a REPEATABLE READ transaction on the source, ensuring all parallel table reads see the same consistent point-in-time view
  • Prevents FK violations when copying from a live database where new rows can appear between parent and child table copies
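
The mechanism in the bullets above can be sketched with plain JDBC. This is a minimal illustration under stated assumptions, not the code in this PR: the names `SnapshotSketch`, `exportSnapshot`, `withImportedSnapshot`, and `importSnapshotSql` are hypothetical, and the ID check assumes snapshot IDs consist of uppercase hex digits and dashes, as in the PostgreSQL documentation's example `00000003-0000001B-1`. `SET TRANSACTION SNAPSHOT` expects a string literal, so the sketch validates the ID before interpolating it into the statement.

```scala
import java.sql.Connection

object SnapshotSketch {
  // Build the import statement; reject anything that does not look like a snapshot ID
  // (assumed format: uppercase hex digits and dashes).
  def importSnapshotSql(snapshotId: String): String = {
    require(snapshotId.matches("[0-9A-F-]+"), s"unexpected snapshot id: $snapshotId")
    s"SET TRANSACTION SNAPSHOT '$snapshotId'"
  }

  // Coordinator side: open a REPEATABLE READ transaction and export its snapshot.
  // The transaction is deliberately left open; the caller closes the connection later.
  def exportSnapshot(coordinator: Connection): String = {
    coordinator.setAutoCommit(false)
    coordinator.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ)
    val st = coordinator.createStatement()
    try {
      val rs = st.executeQuery("SELECT pg_export_snapshot()")
      require(rs.next(), "pg_export_snapshot() returned no row")
      rs.getString(1)
    } finally st.close()
  }

  // Worker side: import the snapshot as the first statement of a REPEATABLE READ
  // transaction, run the read, then end the read-only transaction.
  def withImportedSnapshot[A](worker: Connection, snapshotId: String)(read: Connection => A): A = {
    worker.setAutoCommit(false)
    worker.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ)
    try {
      val st = worker.createStatement()
      try st.execute(importSnapshotSql(snapshotId))
      finally st.close()
      read(worker)
    } finally worker.rollback()
  }
}
```

A snapshot can only be imported while the exporting transaction is still running, which is why the coordinator connection in this sketch must stay open until every worker has imported the snapshot.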

Test plan

  • Compiles on both Scala 2.13 and Scala 3
  • All 7 unit test suites pass
  • Integration tests (require Docker)
  • Manual test against live PostgreSQL database with concurrent writes

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings February 12, 2026 21:52
@nafg force-pushed the snapshot-isolation branch from bf655b0 to 872dec9 on February 12, 2026 21:56

Copilot AI left a comment

Pull request overview

This PR implements PostgreSQL snapshot-based consistency for parallel table copying to prevent foreign key violations when copying from a live database. It exports a snapshot from a coordinator connection before copying begins and imports that snapshot into each parallel worker transaction.

Changes:

  • Exports a PostgreSQL snapshot at the start of DbCopier.run() via pg_export_snapshot()
  • Passes the snapshot ID through copyTablesByLevel and TableCopier.run to CopyAction
  • Each CopyAction imports the snapshot using SET TRANSACTION SNAPSHOT within a REPEATABLE READ transaction

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| simple-anonymizer/src/scala/simpleanonymizer/DbCopier.scala | Creates coordinator connection, exports snapshot, passes snapshot ID to table copiers, closes connection via andThen |
| simple-anonymizer/src/scala/simpleanonymizer/TableCopier.scala | Adds optional snapshotId parameter and passes it to CopyAction; minor whitespace change |
| simple-anonymizer/src/scala/simpleanonymizer/CopyAction.scala | Imports snapshot via SET TRANSACTION SNAPSHOT before executing source SELECT query |
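
The coordinator's lifetime matters here: the exported snapshot is only importable while the coordinator's transaction is open, so the connection has to outlive all copies and then be closed whether they succeed or fail. The sketch below shows one way to get that behavior with `Future.andThen`. It is an assumption that the copies are modelled as a `Future`; the review only says the connection is closed "via andThen". `CoordinatorLifetimeSketch`, `FakeCoordinator`, and `runWithCoordinator` are hypothetical names.

```scala
import scala.concurrent.{ExecutionContext, Future}

object CoordinatorLifetimeSketch {
  // Stand-in for the coordinator JDBC connection (hypothetical, for illustration only).
  final class FakeCoordinator extends AutoCloseable {
    @volatile var closed: Boolean = false
    def close(): Unit = { closed = true }
  }

  // Run the copies while the coordinator is open, then close it regardless of outcome.
  // Future.unit.flatMap captures exceptions thrown while constructing `copies`.
  def runWithCoordinator[A](coordinator: AutoCloseable)(copies: => Future[A])(
      implicit ec: ExecutionContext
  ): Future[A] =
    Future.unit.flatMap(_ => copies).andThen { case _ => coordinator.close() }
}
```

`andThen` runs its side effect and then passes the original result (or failure) through, so callers still see the outcome of the copies themselves.
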
Comments suppressed due to low confidence (1)

simple-anonymizer/src/scala/simpleanonymizer/TableCopier.scala:27

  • The snapshotId parameter has a default value of None, which allows TableCopier to be used independently without snapshot consistency. However, when called from DbCopier.copyTablesByLevel, it always passes Some(snapshotId). Document this behavior in the method's scaladoc to clarify that the default None is for standalone usage while DbCopier will always provide a snapshot ID for consistent parallel reads.
          target.db
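
The scaladoc this comment asks for might look like the sketch below. The helper `sourcePreamble` and the object `TableCopierSketch` are hypothetical and exist only to show how an `Option[String]` parameter with a `None` default could be documented; the real signature of `TableCopier.run` is not shown in this PR page.

```scala
object TableCopierSketch {
  /** Statements to run on the source connection before the SELECT.
    *
    * @param snapshotId
    *   `None` (the default) is for standalone use, where reads are not tied to a
    *   shared snapshot. `Some(id)` imports an exported PostgreSQL snapshot so that
    *   parallel reads share one point-in-time view; per the review comment,
    *   `DbCopier.copyTablesByLevel` always passes `Some(snapshotId)`.
    */
  def sourcePreamble(snapshotId: Option[String] = None): List[String] =
    snapshotId match {
      case Some(id) =>
        // Assumed ID format: uppercase hex digits and dashes.
        require(id.matches("[0-9A-F-]+"), s"unexpected snapshot id: $id")
        List(s"SET TRANSACTION SNAPSHOT '$id'")
      case None => Nil
    }
}
```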

@nafg force-pushed the snapshot-isolation branch from 872dec9 to 38cf1d0 on February 12, 2026 22:02
Copilot AI review requested due to automatic review settings February 12, 2026 22:17
@nafg force-pushed the snapshot-isolation branch from 38cf1d0 to fd2cc5c on February 12, 2026 22:17

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.


When copying from a live database, new rows can appear between table
copies, causing FK violations (e.g. child table references a parent row
that was inserted after the parent table was copied). Fix by exporting a
PostgreSQL snapshot from a coordinator connection and importing it in
each worker's source-read transaction via SET TRANSACTION SNAPSHOT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
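
The race described in the commit message can be illustrated with a small in-memory model. This is a toy simulation, not PostgreSQL behavior or code from the PR: `FkRaceSketch`, `Parent`, `Child`, `Db`, and `orphans` are hypothetical names, and the two `Db` values stand in for the source database before and after a concurrent write.

```scala
object FkRaceSketch {
  final case class Parent(id: Int)
  final case class Child(id: Int, parentId: Int)
  final case class Db(parents: Vector[Parent], children: Vector[Child])

  // Child rows whose parent is missing from the copied parent rows,
  // i.e. rows that would violate the foreign key on the target.
  def orphans(parents: Vector[Parent], children: Vector[Child]): Vector[Child] = {
    val parentIds = parents.map(_.id).toSet
    children.filterNot(c => parentIds.contains(c.parentId))
  }

  // Source state when the parent table is copied.
  val before: Db = Db(Vector(Parent(1)), Vector(Child(10, 1)))

  // A concurrent writer then inserts a new parent and a child referencing it.
  val after: Db = Db(before.parents :+ Parent(2), before.children :+ Child(20, 2))

  // Without a shared snapshot: parents are read before the write, children after it.
  val withoutSnapshot: Vector[Child] = orphans(before.parents, after.children)

  // With a shared snapshot: both reads see the same point-in-time state.
  val withSnapshot: Vector[Child] = orphans(before.parents, before.children)
}
```

In this model `withoutSnapshot` contains the child row that references the not-yet-copied parent, while `withSnapshot` is empty.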

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


@nafg nafg merged commit f24defb into master Feb 12, 2026
8 checks passed
@nafg nafg deleted the snapshot-isolation branch February 12, 2026 23:12