experiment: improve migration performance #1157
Conversation
Walkthrough

Multiple bucket storage migrations replaced OFFSET/LIMIT paging with deterministic row_number()-based batching (and associated index/selection changes), introduced a temporary enumerated table for one migration, and converted one migration to an effective no-op by commenting out its SQL body.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Runner as MigrationRunner
    participant DB as Database
    rect rgba(214,234,248,0.35)
    note over Runner,DB: General pattern — compute row_number(), batch by row_number range, update/insert per batch
    end
    Runner->>DB: WITH view AS (\n SELECT ..., row_number() OVER (ORDER BY <deterministic_key>) AS row_number\n FROM ... WHERE ...\n)\nSELECT COUNT(*) / prepare index
    DB-->>Runner: view/index ready
    loop For each batch (row_number window)
        Runner->>DB: SELECT rows WHERE row_number >= start AND row_number < start + batch_size
        DB-->>Runner: batch rows
        Runner->>DB: UPDATE/INSERT ... WHERE row_number BETWEEN ...
        DB-->>Runner: affected rows
        Runner->>DB: COMMIT (optional) and PERFORM pg_notify(...) (optional)
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Runner as MigrationRunner
    participant DB as Database
    rect rgba(253,235,208,0.35)
    note over Runner,DB: Migration 17 — build temp enumerated table then batch-update via join to temp table
    end
    Runner->>DB: CREATE TEMP TABLE transactions_ids AS\n SELECT moves.seq AS moves_seq, transactions.id AS id,\n row_number() OVER (ORDER BY moves.seq) AS row_number\n FROM moves LEFT JOIN transactions ON ...\n WHERE moves.transactions_id IS NULL
    DB-->>Runner: temp table created
    Runner->>DB: CREATE INDEX ON transactions_ids(row_number) INCLUDE (...)\nANALYZE transactions_ids
    loop For each batch (row_number window)
        Runner->>DB: SELECT batch rows FROM transactions_ids WHERE row_number BETWEEN start AND end
        DB-->>Runner: batch rows
        Runner->>DB: UPDATE moves SET transactions_id = t.id FROM transactions_ids t WHERE moves.seq = t.moves_seq AND t.row_number BETWEEN ...
        DB-->>Runner: update result
        Runner->>DB: COMMIT\nPERFORM pg_notify(...)
    end
```
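The general pattern in the diagrams can be sketched in PL/pgSQL roughly as follows. This is a hedged illustration only: `some_table`, `some_column`, and `rows_to_migrate` are hypothetical names, not the actual schema used by these migrations.

```sql
do $$
    declare
        _batch_size integer := 1000;
        _count      integer;
    begin
        -- enumerate the rows to migrate once, with a deterministic order
        create temporary table rows_to_migrate as
        select seq,
               row_number() over (order by seq) as row_number
        from some_table              -- hypothetical source table
        where some_column is null;   -- hypothetical backfill condition

        create index on rows_to_migrate (row_number);

        select count(*) into _count from rows_to_migrate;

        -- deterministic, non-overlapping windows over row_number (1-indexed)
        for i in 1.._count by _batch_size loop
            update some_table
            set some_column = now()
            from rows_to_migrate r
            where some_table.seq = r.seq
              and r.row_number >= i
              and r.row_number < i + _batch_size;

            -- per-batch commit: supported in DO blocks since PostgreSQL 11,
            -- provided the block is not run inside an outer transaction
            commit;
        end loop;

        drop table rows_to_migrate;
    end
$$;
```

Unlike OFFSET/LIMIT paging, each window reads a fixed, pre-computed slice, so batch boundaries stay stable across commits and retries.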
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Force-pushed: dc84997 to c67137b
Actionable comments posted: 0
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)
26-26: Index on logs_transactions may not be efficiently utilized.

The index is created on `(ledger, transaction_id) include (date)`, but the CTE filters the temporary table using `row_number >= i and row_number < i + _batch_size`. Since `row_number` is not indexed, this range filter executes as a full scan of the temporary table, and the index on `(ledger, transaction_id)` is not used in the CTE predicate.

While this is acceptable for a temporary migration table (which should be small after type filtering), consider whether an index on `row_number` or reordering the logic would be more efficient. For now, this is likely not a blocking issue, but it may warrant measurement.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Tests
- GitHub Check: GoReleaser
- GitHub Check: Dirty
🔇 Additional comments (3)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (3)
33-42: Row_number pagination with intermediate commits is safe.

The use of `row_number` for deterministic pagination is a good improvement over offset/limit, especially for large datasets. Since the temporary table `logs_transactions` is immutable during the loop and `row_number` is computed once during table creation, intermediate commits at line 44 do not affect row numbering consistency. This approach ensures batches are processed deterministically even if the migration is retried.

The guard condition at line 42 (`and inserted_at is null`) correctly prevents re-updating rows in case the migration is run multiple times, making it idempotent.
8-8: Trigger management approach is sound.

The migration correctly drops and recreates the `update_transaction_metadata_history` trigger with a condition on the `metadata` column. This prevents unnecessary trigger invocations for non-metadata updates (e.g., `inserted_at`). Based on learnings, ACCESS SHARE locks on the transactions table and SHARE ROW EXCLUSIVE locks on temporary tables are acceptable for this migration context.

Also applies to: 14-18
22-24: The transaction type filtering is intentional and correct.

The query filters for only `NEW_TRANSACTION` and `REVERTED_TRANSACTION` because these are the only log types that:

- Contain a `'transaction'` object in their JSON data structure
- Create or affect transaction records (which need `inserted_at` populated)

Other types like `SET_METADATA` and `DELETE_METADATA` modify transaction metadata but don't create transaction records: they use `'targetType'` and `'targetId'` instead of embedding a `'transaction'` object. Attempting to extract `(data->'transaction'->>'id')` from these types would return NULL, defeating the migration's purpose.

The filtering aligns with the business logic shown in the trigger handlers (migrations 0 and 4), where each type branches to different operations: transaction creation/reversion vs. metadata management.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1157      +/-   ##
==========================================
+ Coverage   82.13%   82.17%   +0.04%
==========================================
  Files         190      190
  Lines        9319     9319
==========================================
+ Hits         7654     7658       +4
+ Misses       1213     1211       -2
+ Partials      452      450       -2
```

☔ View full report in Codecov by Sentry.
Force-pushed: a0c512f to a70c5c4
Actionable comments posted: 0
🧹 Nitpick comments (2)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)
22-24: Consider using the `IN` operator for type filtering instead of `OR`.

The WHERE clause uses `OR`, which may prevent index usage on the `logs.type` column. For better query optimization, consider using the `IN` operator:

```diff
- where type = 'NEW_TRANSACTION' or type = 'REVERTED_TRANSACTION';
+ where type IN ('NEW_TRANSACTION', 'REVERTED_TRANSACTION');
```

This provides the same semantics but is more amenable to index optimization and is idiomatic in SQL.
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)
21-21: TODO comment indicates incomplete work.

Line 21 contains a TODO comment (`-- todo: add in go libs`) that suggests future work to move this logic into a Go library. Please clarify:
- Is this TODO part of the current scope, or is it deferred?
- Should this TODO be tracked in an issue or pull request?
If you'd like, I can help create a follow-up issue to track this work.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: GoReleaser
- GitHub Check: Tests
- GitHub Check: Dirty
🔇 Additional comments (4)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2)
26-26: Index strategy aligns with batch-processing pattern.

The index on `(row_number) include (...)` is well-suited for the batching strategy used in the loop (lines 33–47). The covering index eliminates the need to access the base table for the batch range queries.

33-42: Batch paging logic and safety predicate look correct.

The row_number–based batching (`row_number >= i and row_number < i + _batch_size`) provides deterministic, non-overlapping batches without reliance on date ordering or OFFSET. The additional `inserted_at is null` predicate (line 42) is a good defensive check that prevents re-updating already-filled rows and guards against idempotency issues.

internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (2)

8-13: Verify join condition and table schema.

The join on `transactions.seq = moves.transactions_seq` (line 12) assumes that `moves.transactions_seq` exists and is properly aligned with `transactions.seq`. Additionally, the INNER JOIN (implicit) filters to only moves with valid transaction matches. Please confirm:

- The `moves.transactions_seq` field exists and has the correct domain/type matching `transactions.seq`.
- Whether an INNER JOIN is intentional (excluding orphaned `moves` rows) or if a LEFT JOIN might be safer for data integrity validation.

25-39: Batch processing structure is sound.

The per-batch UPDATE with row_number–based range queries ensures small, deterministic batches without long-running transactions. The COMMIT inside the loop (line 36) prevents transaction bloat and allows concurrent reads between batches. The pattern aligns well with migration 18 and supports the performance optimization goal.
Force-pushed: a70c5c4 to 9c5f381
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Applied to files:
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Tests
- GitHub Check: GoReleaser
- GitHub Check: Dirty
🔇 Additional comments (2)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (2)
17-17: Verify index placement and table lifecycle.

The index on `transactions_ids` (line 17) will be created on a permanent table if the earlier issue is not fixed. If the table is permanent and persists across runs, the index creation will fail on re-runs.

Confirm the intended table scope (temporary vs. permanent) before verifying index handling.

25-39: Row-number batch processing logic is sound, but depends on transaction handling fix.

The batch loop correctly uses row_number-based paging to process records in chunks of `_batch_size`, which aligns with the performance optimization goals mentioned in the PR. However, this depends on resolving the COMMIT statement issue identified above.

Once transaction handling is clarified, verify that the batch window calculation (`where row_number >= i and row_number < i + _batch_size`) correctly covers all `_max` rows without gaps or overlaps, especially at boundaries.
```diff
 -- todo: debug
 --create temporary table transactions_ids as
 create table transactions_ids as
 select row_number() over (order by transactions.seq) as row_number,
        moves.seq as moves_seq, transactions.id, transactions.seq as transactions_seq
 from moves
-where transactions_id is null
+join transactions on transactions.seq = moves.transactions_seq
+where transactions_id is null;
```
🧩 Analysis chain
Critical: Transaction control and permanent table creation incompatible.
Line 36 contains COMMIT; inside a PL/pgSQL block, which will likely fail or break migration atomicity. Additionally, the commented "todo: debug" (line 8) and shift from create temporary table (commented) to create table (line 10) suggests incomplete refactoring. Permanent tables are not cleaned up and will cause failures on re-runs.
PL/pgSQL does not support standard COMMIT statements in most PostgreSQL contexts. Either:

- Use `create temporary table` (as originally intended, per the comment)
- Perform cleanup with `drop table if exists transactions_ids`
- Move transaction boundaries outside the PL/pgSQL block if per-batch commits are required
Apply this diff to use a temporary table (recommended for a migration):

```diff
- -- todo: debug
- --create temporary table transactions_ids as
- create table transactions_ids as
+ create temporary table transactions_ids as
```

Then remove the `commit;` statement on line 36, as migrations should be atomic:

```diff
-    commit;
-
     perform pg_notify('migrations-{{ .Schema }}', 'continue: ' || _batch_size);
```

If per-batch commits are genuinely required (e.g., to prevent long locks), you will need to refactor this as separate transaction batches outside the PL/pgSQL block, or verify your PostgreSQL version and migration framework support autonomous transactions.
I need to verify the claims in the review comment. Let me examine the actual migration file and verify PostgreSQL's PL/pgSQL transaction semantics.
Now let me examine the actual migration file to determine whether it uses a DO block or a function definition:
🏁 Script executed:

```shell
#!/bin/bash
# Examine the full migration file structure
find . -name "17-moves-fill-transaction-id" -o -name "*17*moves*fill*" -o -path "*/migrations/*17*.sql" | head -20
```

Length of output: 309

🏁 Script executed:

```shell
#!/bin/bash
# Search for the migration file
find . -path "*/internal/storage/bucket/migrations/*" -name "*17*.sql" 2>/dev/null
```

Length of output: 0

🏁 Script executed:

```shell
#!/bin/bash
# Broader search in migrations directory
fd -e sql "17.*moves.*fill|moves.*fill.*17" --max-one-result
```

Length of output: 233

🏁 Script executed:

```shell
#!/bin/bash
# Search more broadly
fd ".*migrations.*" -t d | head -5
```

Length of output: 52

🏁 Script executed:

```shell
#!/bin/bash
# List migration files
fd -t f "\.sql$" "migrations" | grep -i "17.*moves\|moves.*17" | head -5
```

Length of output: 97

🏁 Script executed:

```shell
#!/bin/bash
# Find the exact file
find . -type f -name "*.sql" 2>/dev/null | xargs grep -l "create.*transactions_ids" 2>/dev/null | head -5
```

Length of output: 72

🏁 Script executed:

```shell
#!/bin/bash
# Get all migration files in order
ls -la internal/storage/bucket/migrations/ 2>/dev/null | grep -E "^-.*\.sql$|^d"
```

Length of output: 3266

🏁 Script executed:

```shell
#!/bin/bash
# Check if internal/storage/bucket/migrations dir exists and list files
if [ -d "internal/storage/bucket/migrations" ]; then
  echo "=== Migrations directory found ==="
  ls -1 internal/storage/bucket/migrations/ | sort
else
  echo "Directory not found. Searching for migrations..."
  find . -type d -name migrations 2>/dev/null | head -5
fi
```

Length of output: 1189

🏁 Script executed:

```shell
#!/bin/bash
# Find the migration file and output full content
find . -type f -path "*/migrations/17*" -name "*.sql" 2>/dev/null | while read file; do
  echo "=== File: $file ==="
  cat "$file"
done
```

Length of output: 1384

🏁 Script executed:

```shell
#!/bin/bash
# Search in all SQL files for the specific transaction_ids table creation
rg -l "transactions_ids" --type sql
```

Length of output: 70

🏁 Script executed:

```shell
#!/bin/bash
# Get the actual content with context
rg "create.*table.*transactions_ids" -A 20 -B 5 --type sql
```

Length of output: 2743

🏁 Script executed:

```shell
#!/bin/bash
# Look for DO blocks and COMMIT statements together
rg "DO\s*\$\$|^COMMIT" --type sql -n | head -30
```

Length of output: 66

🏁 Script executed:

```shell
#!/bin/bash
# Search for the full migration file structure
rg "transactions_id" -A 30 -B 5 --type sql 2>/dev/null | head -100
```

Length of output: 10027

🏁 Script executed:

```shell
#!/bin/bash
# Direct path check
cat internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql 2>/dev/null || echo "File not found at expected path"
```

Length of output: 1098

🏁 Script executed:

```shell
#!/bin/bash
# Try with different path variations
find . -name "*17*" -path "*migrations*" -name "*.sql" 2>/dev/null
```

Length of output: 0

🏁 Script executed:

```shell
#!/bin/bash
# Check the structure more carefully
find . -type f -name "up.sql" 2>/dev/null | xargs grep -l "transactions_ids" 2>/dev/null
```

Length of output: 72

🏁 Script executed:

```shell
#!/bin/bash
# Output migration file content if found
for file in $(find . -type f -name "up.sql" 2>/dev/null | xargs grep -l "transactions_ids" 2>/dev/null); do
  echo "=== File: $file ==="
  head -50 "$file"
done
```

Length of output: 1185
Now let me check for the DOWN migration and verify the table cleanup strategy:
🏁 Script executed:

```shell
#!/bin/bash
# Check for down.sql migration file
ls -la internal/storage/bucket/migrations/17-moves-fill-transaction-id/
```

Length of output: 308

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's a down.sql file
cat internal/storage/bucket/migrations/17-moves-fill-transaction-id/down.sql 2>/dev/null || echo "No down.sql found"
```

Length of output: 17

🏁 Script executed:

```shell
#!/bin/bash
# Search for any DROP of transactions_ids table
rg "drop.*transactions_ids|transactions_ids.*drop" --type sql -i
```

Length of output: 0

🏁 Script executed:

```shell
#!/bin/bash
# Check migration 18 to see if it handles cleanup
cat internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql 2>/dev/null | head -50
```

Length of output: 1938

🏁 Script executed:

```shell
#!/bin/bash
# Search for table drops in any migration
rg "drop table" --type sql -i | grep -i transactions_ids
```

Length of output: 0
Permanent table creation without cleanup — will fail on re-run. COMMIT semantics are actually valid.
The review's claim about COMMIT failing is incorrect. DO blocks in PostgreSQL support COMMIT statements, and migration 18 in the same codebase demonstrates this exact pattern working correctly.
However, the permanent table issue is real and critical. This migration uses `create table transactions_ids` (permanent) instead of `create temporary table` (temporary), and never drops it. Compare to migration 18, which uses `create temporary table logs_transactions` and explicitly runs `drop table logs_transactions` at the end. Without cleanup, re-running this migration will fail.
Fix:

- Restore the commented temporary table approach: `create temporary table transactions_ids as`
- Add `drop table transactions_ids;` before the final `end` statement (after the loop)
- Remove the "todo: debug" comment since the intent is now clear

The COMMIT statements are fine and can remain.
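As a sketch of that fix, the table creation and cleanup would look roughly like this (column names taken from the snippet quoted above; the batch-update loop body is elided):

```sql
create temporary table transactions_ids as
select row_number() over (order by transactions.seq) as row_number,
       moves.seq as moves_seq,
       transactions.id,
       transactions.seq as transactions_seq
from moves
join transactions on transactions.seq = moves.transactions_seq
where transactions_id is null;

-- ... batch-update loop over row_number windows, with per-batch COMMITs ...

-- explicit cleanup before the block's final `end`, so a re-run never
-- collides with a leftover table (temp tables are also dropped at
-- session end, but being explicit documents the lifecycle)
drop table transactions_ids;
```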
🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql around
lines 8-15 the migration mistakenly creates a permanent table transactions_ids
and leaves it behind (and has a "todo: debug" comment); change the statement
back to create temporary table transactions_ids as ..., remove the "todo: debug"
comment, and add a drop table transactions_ids; just before the final end (after
the processing loop) so the temp table is cleaned up on completion; leave
existing COMMIT statements as-is.
Force-pushed: 9c5f381 to 2af1898
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)
21-42: Critical: Off-by-one error in row_number-based pagination loop.

Line 33 loops `for i in 0.._count-1 by _batch_size`, but `row_number()` in PostgreSQL is 1-indexed. When `i=0`, the filter `row_number >= i and row_number < i + _batch_size` (line 37) matches rows 1–999 (only 999 rows), not the intended 1000. Subsequent batches are correct, but the first batch processes one fewer row, potentially leaving the first row unprocessed if the loop terminates early or batches don't align.

Fix: Start the loop at 1 instead of 0:

```diff
- for i in 0.._count-1 by _batch_size loop
+ for i in 1.._count by _batch_size loop
```

Alternatively, adjust the filter to:

```diff
- where row_number >= i and row_number < i + _batch_size
+ where row_number > i - 1 and row_number <= i + _batch_size - 1
```

(The first solution is simpler.)

The `inserted_at is null` condition on line 42 ensures idempotency, which is good.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-07-19T18:35:34.260Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
Applied to files:
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Applied to files:
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: GoReleaser
- GitHub Check: Dirty
- GitHub Check: Tests
🔇 Additional comments (1)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)
8-13: Good: Temporary table creation with proper filtering.

The previous review comment flagging permanent table creation has been addressed: the code now correctly uses `create temporary table` (line 8) and filters on `transactions_id is null` (line 13). Temporary tables are implicitly dropped at session end, so no explicit cleanup is strictly required. The `where transactions_id is null` condition also ensures idempotency.
```diff
-for i in 0.._max-1 by _batch_size loop
 with _rows as (
     select *
-    from moves
-    where transactions_id is null
-    limit _batch_size
+    from transactions_ids
+    where row_number >= i and row_number < i + 1000
```
Critical: Off-by-one error in row_number-based pagination loop.
Line 23 loops for i in 0.._max-1 by _batch_size, but row_number() in PostgreSQL is 1-indexed. When i=0, the filter row_number >= i and row_number < i + 1000 (line 27) matches rows 1–999 (only 999 rows), not 1000. The first batch will process one fewer row than intended.
Fix: Start the loop at 1 instead of 0:
```diff
- for i in 0.._max-1 by _batch_size loop
+ for i in 1.._max by _batch_size loop
```

Or adjust the filter condition (less preferred):

```diff
- where row_number >= i and row_number < i + 1000
+ where row_number > i - 1 and row_number <= i + 1000 - 1
```

(The first solution is simpler.)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```sql
for i in 1.._max by _batch_size loop
    with _rows as (
        select *
        from transactions_ids
        where row_number >= i and row_number < i + 1000
```
🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql around
lines 23–27, the pagination loop currently starts at 0 causing an off-by-one
because row_number() is 1-indexed; change the loop to start at 1 (e.g., for i in
1.._max by _batch_size) and ensure the WHERE filter uses the same batch size
bounds (row_number >= i AND row_number < i + _batch_size) so each batch
processes exactly _batch_size rows.
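A quick sanity check of the corrected bounds: with 1-based `row_number`, starting the loop at 1 makes the windows tile the full range. For example, assuming `_max = 2500` and a window of 1000:

```sql
-- enumerate the windows the corrected loop would visit (i = 1, 1001, 2001)
select i              as window_start,
       i + 1000 - 1   as window_end_inclusive
from generate_series(1, 2500, 1000) as i;
-- windows [1,1000], [1001,2000], [2001,3000] together cover
-- row_number 1..2500 with no gaps and no overlaps
```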
Force-pushed: 2af1898 to 1602b13
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)
33-37: Off-by-one error in row_number-based pagination loop.

The loop starts at 0 (line 33), but PostgreSQL's `row_number()` is 1-indexed. When `i = 0`, the filter `row_number >= 0 and row_number < 0 + _batch_size` matches rows 1 to `_batch_size - 1` (only `_batch_size - 1` rows, not `_batch_size`). This causes the first batch to process one fewer row than intended.

Apply this diff to fix the off-by-one error:

```diff
-for i in 0.._count-1 by _batch_size loop
+for i in 1.._count by _batch_size loop
```

This ensures each batch processes exactly `_batch_size` rows (1000), starting from the first 1-indexed row.
♻️ Duplicate comments (1)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)
23-27: Critical: Hardcoded batch size in WHERE clause contradicts `_batch_size` parameter.

Line 27 uses a hardcoded filter `where row_number >= i and row_number < i + 1000`, but `_batch_size` is set to 10000 (line 3). This causes only 1000 rows to be processed per batch instead of 10000, completely undermining the performance improvement from increasing `_batch_size`.

Additionally, this reintroduces the off-by-one error flagged in the previous review: the loop starts at 0, but PostgreSQL's `row_number()` is 1-indexed. When `i = 0`, the condition `row_number >= 0 and row_number < 1000` matches rows 1–999 (999 rows, not 1000).

Apply this diff to fix both issues:

```diff
-    for i in 0.._max-1 by _batch_size loop
+    for i in 1.._max by _batch_size loop
         with _rows as (
             select *
             from transactions_ids
-            where row_number >= i and row_number < i + 1000
+            where row_number >= i and row_number < i + _batch_size
         )
```

This ensures each batch processes exactly `_batch_size` rows (10000), and all rows are included without gaps.
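The cost of the hardcoded literal can be sized with a quick simulation (hypothetical row counts, Python standing in for the plpgsql loop): stepping `i` by 10000 while the WHERE clause only admits a 1000-row window touches roughly one row in ten and silently skips the rest.

```python
def covered(max_rows, stride, window, start=0):
    """Step i by `stride` (the loop's `by _batch_size`) while filtering
    `row_number >= i and row_number < i + window` (the WHERE clause)."""
    touched = set()
    for i in range(start, max_rows, stride):
        touched |= set(range(max(i, 1), min(i + window, max_rows + 1)))
    return touched

max_rows = 50_000

# mismatched: stride of 10000 from _batch_size, hardcoded 1000-row window
gappy = covered(max_rows, stride=10_000, window=1_000)
print(len(gappy))       # 4999 of 50000 rows ever touched
print(15_000 in gappy)  # False -> falls in a gap between windows

# matched window plus a 1-based start: every row is touched
full = covered(max_rows, stride=10_000, window=10_000, start=1)
print(len(full) == max_rows)  # True
```

The gaps matter more than the throughput: with the mismatch, most `moves` rows would simply never get their `transactions_id` backfilled.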
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)
1-40: Add explicit cleanup of temporary table.

While PostgreSQL will clean up the temporary table at session end, it's best practice to explicitly drop temporary tables used in migrations, for clarity and to avoid resource leaks in long-running migration contexts.

Apply this diff to add explicit cleanup:

```diff
     perform pg_notify('migrations-{{ .Schema }}', 'continue: ' || _batch_size);
 end loop;
+
+drop table transactions_ids;
 end
 $$
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunk)
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-07-19T18:35:34.260Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
Applied to files:
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.
Applied to files:
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: GoReleaser
- GitHub Check: Dirty
- GitHub Check: Tests
🔇 Additional comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)
21-42: Verify transaction-type filtering logic.

The migration now filters logs by transaction type (`'NEW_TRANSACTION'` or `'REVERTED_TRANSACTION'`, line 24) and only updates `inserted_at` when it is NULL (line 42). This change in filtering strategy, from date-based to transaction-type based, requires verification to ensure all relevant transactions are captured and edge cases are handled correctly.

Please confirm:

- Does filtering by only these two transaction types capture all transactions that should have `inserted_at` populated?
- Are there any transactions with existing non-NULL `inserted_at` values that should also be updated?
- Does this align with the domain logic for the transaction lifecycle in the ledger system?
Force-pushed 1602b13 to 6dde0e0 (Compare)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)
33-42: Critical: Off-by-one error in batch pagination leaves final rows unprocessed.

The loop logic has a flaw that can leave the last batch of rows with an unfilled `inserted_at`:

- First batch undersized: `row_number >= 0 and row_number < 1000` matches only rows 1–999 (999 rows, not 1000).
- Final batch skipped: with `for i in 0.._count-1 by _batch_size`, if `_count = 2000`, the loop runs at i=0 and i=1000, but never at i=2000. This leaves row 2000 unprocessed because `row_number` values are 1-based.

Example trace with `_count = 2000`, `_batch_size = 1000`:

- i=0: rows 1–999 processed ✗ (should be 1–1000)
- i=1000: rows 1000–1999 processed ✓
- i=2000 never reached (2000 > 1999) → row 2000 skipped ✗

Apply this diff to fix the pagination:

```diff
-    for i in 0.._count-1 by _batch_size loop
+    for i in 1..(_count + _batch_size - 1) by _batch_size loop
         with _rows as (
             select *
             from logs_transactions
             where row_number >= i and row_number < i + _batch_size
         )
```

Alternatively, if you prefer keeping i starting at 0:

```diff
-    for i in 0.._count-1 by _batch_size loop
+    for i in 0..(_count - 1) by _batch_size loop
         with _rows as (
             select *
             from logs_transactions
-            where row_number >= i and row_number < i + _batch_size
+            where row_number > i and row_number <= i + _batch_size
         )
```
♻️ Duplicate comments (1)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)
23-27: Critical: Off-by-one error in row_number-based loop — first batch processes 9999 rows instead of 10000.

Line 23 starts the loop at `i = 0`, but `row_number()` is 1-indexed. When `i = 0`, the filter `row_number >= 0 and row_number < 10000` matches rows 1–9999 (only 9999 rows). The loop should start at 1 to align batch boundaries with row numbering.

Apply this diff to fix the loop bounds:

```diff
-for i in 0.._max-1 by _batch_size loop
+for i in 1.._max by _batch_size loop
```
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)
8-13: Add explicit DROP TABLE for temporary table cleanup.

While PostgreSQL automatically drops temporary tables at transaction end, explicitly cleaning up with `drop table transactions_ids;` is consistent with migration 18 in this PR (which drops `logs_transactions` at the end) and improves defensive coding.

Apply this diff to add explicit cleanup before the final `end`:

```diff
     end loop;
+
+    drop table transactions_ids;
 end
```

Also applies to: 37-39
📒 Files selected for processing (2)
- internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunk)
- internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🔇 Additional comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)
21-26: Indexing strategy verified and consistent across migrations.

The row_number-based pagination approach with indexed included columns aligns correctly with migration 17. Both migrations use identical patterns: sequential row numbers via window functions, with indexes optimized for range-scan pagination. This is sound for batch-processing performance.
Force-pushed 6dde0e0 to 2c876d5 (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (1)
11-11: Logic is correct; minor code style inconsistency.

The row_number() pagination and update logic are correct. However, line 29 uses inconsistent spacing (`_offset+_batch_size`, without spaces) compared to the style in related migrations (e.g., `_offset + _batch_size` in migration 29). Consider aligning spacing for consistency across the migration suite.

Apply this minor formatting adjustment:

```diff
-where row_number >= _offset and row_number < _offset+_batch_size
+where row_number >= _offset and row_number < _offset + _batch_size
```

Also applies to: 29-29
📒 Files selected for processing (4)
- internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (2 hunks)
- internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
- internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (2 hunks)
- internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (2 hunks)
🔇 Additional comments (3)
internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (1)
20-20: Row_number-based pagination improves performance vs OFFSET/LIMIT.

The switch from offset/limit paging to `row_number()` windowing is a sound performance optimization for large datasets, as it avoids expensive row-skipping operations. The pagination is keyed on `transactions.seq` for deterministic ordering, which is appropriate for batch processing.
internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (1)
19-19: Logic is sound; verify date formatting intent vs. migration 32.

The row_number-based pagination is correct and the temporary table strategy is appropriate. However, this migration uses `to_json(data.reverted_at)#>>'{}' || 'Z'` to format dates (lines 45–47), while migration 32 uses direct column references (`data.reverted_at`) for similar fields. Confirm whether this divergence is intentional or an inconsistency.

Please verify that the date formatting difference (JSON extraction plus 'Z' suffix in migration 33 vs. direct column reference in migration 32) reflects intentional behavioral differences or should be reconciled for consistency.
Also applies to: 37-37
internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (1)
19-20: Core pagination logic is correct; note date formatting consistency with migration 33.

The row_number() windowing and pagination are implemented correctly. The direct use of date columns (lines 46–48: `data.reverted_at`, `data.inserted_at`, `data.timestamp`) differs from the JSON-extraction-plus-'Z'-suffix approach in migration 33. Ensure this difference is intentional, or reconcile if both should use the same format.

Cross-check date handling with migration 33 to confirm the formatting divergence is by design.
Also applies to: 38-38
No description provided.