Use pg_export_snapshot for consistent parallel reads #22
Pull request overview
This PR implements PostgreSQL snapshot-based consistency for parallel table copying to prevent foreign key violations when copying from a live database. It exports a snapshot from a coordinator connection before copying begins and imports that snapshot into each parallel worker transaction.
Changes:
- Exports a PostgreSQL snapshot at the start of `DbCopier.run()` via `pg_export_snapshot()`
- Passes the snapshot ID through `copyTablesByLevel` and `TableCopier.run` to `CopyAction`
- Each `CopyAction` imports the snapshot using `SET TRANSACTION SNAPSHOT` within a `REPEATABLE READ` transaction
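
The export and import steps listed above might look roughly like this at the JDBC level. This is a minimal sketch, not the PR's code: `SnapshotSketch`, `exportSnapshot`, `importSnapshot`, and `importSnapshotSql` are hypothetical names, and plain `java.sql` connections are assumed in place of whatever database layer the project uses. The ID validation pattern is also an assumption, based on the hex-and-hyphen identifiers PostgreSQL documents for exported snapshots.

```scala
import java.sql.Connection

object SnapshotSketch {
  // Assumption: exported snapshot IDs consist of hex digits and hyphens,
  // e.g. "00000003-0000001B-1". SET TRANSACTION SNAPSHOT takes a string
  // literal rather than a bind parameter, so validate before interpolating.
  private val SnapshotIdPattern = "[0-9A-Fa-f-]+"

  def importSnapshotSql(snapshotId: String): String = {
    require(snapshotId.matches(SnapshotIdPattern), s"unexpected snapshot id: $snapshotId")
    s"SET TRANSACTION SNAPSHOT '$snapshotId'"
  }

  /** Coordinator side: open a transaction and export its snapshot.
    * The coordinator transaction must stay open until every worker has imported the ID.
    */
  def exportSnapshot(coordinator: Connection): String = {
    coordinator.setAutoCommit(false)
    coordinator.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ)
    val stmt = coordinator.createStatement()
    try {
      val rs = stmt.executeQuery("SELECT pg_export_snapshot()")
      rs.next()
      rs.getString(1)
    } finally stmt.close()
  }

  /** Worker side: the import must run in a REPEATABLE READ (or SERIALIZABLE)
    * transaction, before any other query in that transaction.
    */
  def importSnapshot(worker: Connection, snapshotId: String): Unit = {
    worker.setAutoCommit(false)
    worker.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ)
    val stmt = worker.createStatement()
    try {
      stmt.execute(importSnapshotSql(snapshotId))
      ()
    } finally stmt.close()
  }
}
```

After `importSnapshot` returns, every `SELECT` the worker issues in that transaction reads the same data the coordinator saw when it exported the snapshot.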
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| simple-anonymizer/src/scala/simpleanonymizer/DbCopier.scala | Creates coordinator connection, exports snapshot, passes snapshot ID to table copiers, closes connection via andThen |
| simple-anonymizer/src/scala/simpleanonymizer/TableCopier.scala | Adds optional snapshotId parameter and passes it to CopyAction; minor whitespace change |
| simple-anonymizer/src/scala/simpleanonymizer/CopyAction.scala | Imports snapshot via SET TRANSACTION SNAPSHOT before executing source SELECT query |
Comments suppressed due to low confidence (1)
simple-anonymizer/src/scala/simpleanonymizer/TableCopier.scala:27
- The snapshotId parameter has a default value of None, which allows TableCopier to be used independently without snapshot consistency. However, when called from DbCopier.copyTablesByLevel, it always passes Some(snapshotId). Document this behavior in the method's scaladoc to clarify that the default None is for standalone usage while DbCopier will always provide a snapshot ID for consistent parallel reads.
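
The scaladoc suggested in this comment could read something like the sketch below. The method here is a hypothetical stand-in: only the optional `snapshotId` parameter and its `None` default come from the review, while the `tableName` parameter, the `String` return type, and the object name are invented for illustration.

```scala
object TableCopierDocSketch {
  /** Hypothetical stand-in for `TableCopier.run`, showing only the documentation pattern.
    *
    * @param tableName  table to copy (illustrative parameter)
    * @param snapshotId ID returned by `pg_export_snapshot()`. `DbCopier.copyTablesByLevel`
    *                   always passes `Some(id)` so that parallel copies read the same
    *                   point-in-time view. The default `None` is for standalone use,
    *                   where the copy reads without a shared snapshot.
    * @return a description of the read mode, standing in for the real copy
    */
  def run(tableName: String, snapshotId: Option[String] = None): String =
    snapshotId match {
      case Some(id) => s"copy $tableName using snapshot $id"
      case None     => s"copy $tableName without a shared snapshot"
    }
}
```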
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
When copying from a live database, new rows can appear between table copies, causing FK violations (e.g. a child table references a parent row that was inserted after the parent table was copied).

Fix by exporting a PostgreSQL snapshot from a coordinator connection and importing it in each worker's source-read transaction via SET TRANSACTION SNAPSHOT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
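
The race described in this commit message can be illustrated with a small in-memory model. This is only a sketch: `Parent`, `Child`, `Db`, and the two time points are invented for the example, and no database is involved.

```scala
object FkRaceSketch {
  final case class Parent(id: Int)
  final case class Child(id: Int, parentId: Int)

  // State of the live source at two moments.
  final case class Db(parents: Set[Parent], children: Set[Child])

  // T0: one parent with one child.
  val atT0: Db = Db(Set(Parent(1)), Set(Child(10, 1)))
  // T1: a concurrent writer has inserted a new parent and a child referencing it.
  val atT1: Db = Db(atT0.parents + Parent(2), atT0.children + Child(11, 2))

  /** Copied children whose parent row is missing from the copied parents. */
  def orphans(parents: Set[Parent], children: Set[Child]): Set[Child] = {
    val parentIds = parents.map(_.id)
    children.filterNot(c => parentIds.contains(c.parentId))
  }

  // Without a shared snapshot: parents are read at T0, children at T1.
  val withoutSnapshot: Set[Child] = orphans(atT0.parents, atT1.children)

  // With a shared snapshot: both reads see the T0 state.
  val withSnapshot: Set[Child] = orphans(atT0.parents, atT0.children)
}
```

In this model the unsynchronized copy leaves `Child(11, 2)` without its parent, which is the row that would fail the foreign key check on the target. Reading both tables from the same snapshot leaves no orphans.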
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
Summary
- Export a PostgreSQL snapshot in `DbCopier.run()` before any table copies begin
- `CopyAction` imports the snapshot via `SET TRANSACTION SNAPSHOT` within a `REPEATABLE READ` transaction on the source, ensuring all parallel table reads see the same consistent point-in-time view

Test plan
🤖 Generated with Claude Code