
Conversation


@gfyrag gfyrag commented Nov 24, 2025

No description provided.

@gfyrag gfyrag requested a review from a team as a code owner November 24, 2025 14:36

coderabbitai bot commented Nov 24, 2025

Walkthrough

Multiple bucket storage migrations replaced OFFSET/LIMIT paging with deterministic row_number()-based batching (plus the associated index and row-selection changes); one migration introduced a temporary enumerated table, and one migration was converted to an effective no-op by commenting out its SQL body.
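The recurring shape of these migrations can be sketched as follows. This is a minimal illustration of the pattern only; the table and column names (`some_table`, `some_column`, `rows_to_fix`) are invented, not taken from the actual migrations:

```sql
do $$
declare
    _batch_size integer := 1000;
    _count      integer;
begin
    -- Enumerate the rows to fix once, with a deterministic ordering key.
    create temporary table rows_to_fix as
    select row_number() over (order by ledger, id) as row_number, id
    from some_table
    where some_column is null;

    -- Covering index so each batch window is an index range scan.
    create index on rows_to_fix (row_number) include (id);

    select count(*) into _count from rows_to_fix;

    -- row_number() is 1-based, so the first window starts at 1.
    for i in 1.._count by _batch_size loop
        update some_table
        set some_column = now()
        from rows_to_fix r
        where some_table.id = r.id
          and r.row_number >= i
          and r.row_number < i + _batch_size;

        -- Per-batch commit is valid inside DO blocks on PostgreSQL 11+,
        -- provided the block is not run inside an outer transaction.
        commit;
    end loop;

    drop table rows_to_fix;
end
$$;
```

Unlike OFFSET/LIMIT, each window touches a fixed, non-overlapping range of precomputed row numbers, so the cost per batch stays constant instead of growing with the offset.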

Changes

- **Transactions inserted_at fill (migration 18)** (`internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql`): removed `_date` and date-bound filtering; `logs_transactions` now computes `row_number() OVER (ORDER BY ledger, id)` and restricts source logs to `type IN ('NEW_TRANSACTION','REVERTED_TRANSACTION')`; OFFSET/LIMIT paging replaced with `WHERE row_number >= i AND row_number < i + _batch_size`; the update predicate adds `inserted_at IS NULL`; the index changed to `(row_number) INCLUDE (ledger, date, transaction_id)`.
- **Moves -> transactions_id batching (migration 17)** (`internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql`): creates a temporary `transactions_ids` table enumerating missing mappings via `row_number()`, plus an index; uses the temp table for batch counts; processes updates in row_number-based batches inside a FOR loop; updates `moves.transactions_id` by joining on `moves_seq`; commits and calls `pg_notify` per batch; removes the legacy loop control and the final NOT NULL constraint change.
- **PCV missing asset fix (migration 28)** (`internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql`): adds `row_number() OVER (ORDER BY transactions_seq)` in `moves_view` and replaces OFFSET/LIMIT pagination with `WHERE row_number >= _offset AND row_number < _offset + _batch_size`, keeping aggregation and join logic intact.
- **Fix invalid metadata on reverts (migration 29)** (`internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql`): adds `row_number()` in `txs_view` and replaces offset/limit pagination with row_number range filtering (`where row_number >= _offset and row_number < _offset + _batch_size`) for batch processing; preserves the existing update loop and logic.
- **Fix transaction updated_at (migration 31)** (`internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql`): introduces `row_number()` in `txs_view` and replaces OFFSET/LIMIT paging with row_number range filters for batched updates that set `updated_at` from `inserted_at`.
- **Fix log data for reverted transactions (migration 32)** (`internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql`): adds `row_number()` in `txs_view`, exposes `log_id`, and replaces ORDER BY/OFFSET/LIMIT paging with row_number-based batching (`where row_number >= _offset and row_number < _offset + _batch_size`).
- **Fix invalid date format (migration 33)** (`internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql`): adds `row_number()` in `txs_view` (plus `log_id`) and migrates pagination to row_number range filtering instead of OFFSET/LIMIT; aligns subsequent batch selection accordingly.
- **Disabled/no-op migration (migration 27)** (`internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql`): the main migration SQL body is commented out; the DO block remains but the substantive operations are inert, making the migration an effective no-op.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Runner as MigrationRunner
    participant DB as Database
    rect rgba(214,234,248,0.35)
    note over Runner,DB: General pattern — compute row_number(), batch by row_number range, update/insert per batch
    end
    Runner->>DB: WITH view AS (\n  SELECT ..., row_number() OVER (ORDER BY <deterministic_key>) AS row_number\n  FROM ... WHERE ...\n)\nSELECT COUNT(*) / prepare index
    DB-->>Runner: view/index ready
    loop For each batch (row_number window)
        Runner->>DB: SELECT rows WHERE row_number >= start AND row_number < start + batch_size
        DB-->>Runner: batch rows
        Runner->>DB: UPDATE/INSERT ... WHERE row_number BETWEEN ...
        DB-->>Runner: affected rows
        Runner->>DB: COMMIT (optional) and PERFORM pg_notify(...) (optional)
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Runner as MigrationRunner
    participant DB as Database
    rect rgba(253,235,208,0.35)
    note over Runner,DB: Migration 17 — build temp enumerated table then batch-update via join to temp table
    end
    Runner->>DB: CREATE TEMP TABLE transactions_ids AS\n  SELECT moves.seq AS moves_seq, transactions.id AS id,\n         row_number() OVER (ORDER BY moves.seq) AS row_number\n  FROM moves LEFT JOIN transactions ON ...\n  WHERE moves.transactions_id IS NULL
    DB-->>Runner: temp table created
    Runner->>DB: CREATE INDEX ON transactions_ids(row_number) INCLUDE (...)\nANALYZE transactions_ids
    loop For each batch (row_number window)
        Runner->>DB: SELECT batch rows FROM transactions_ids WHERE row_number BETWEEN start AND end
        DB-->>Runner: batch rows
        Runner->>DB: UPDATE moves SET transactions_id = t.id FROM transactions_ids t WHERE moves.seq = t.moves_seq AND t.row_number BETWEEN ...
        DB-->>Runner: update result
        Runner->>DB: COMMIT\nPERFORM pg_notify(...)
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Verify ORDER BY keys used with row_number() are deterministic and stable across migrations.
  • Confirm batch window boundaries to avoid skipping/duplicating rows (off-by-one risks).
  • Review index choices and ANALYZE usage (migration 17) for performance and planner behavior.
  • Inspect migration 27 being commented out to ensure it was intentionally disabled.

Suggested reviewers

  • flemzord

Poem

🐰 I counted rows by moonlit code and song,

I hopped through batches tidy, swift, and long,
I nudged timestamps, joined maps that were astray,
Committed softly, sent a tiny yay,
A happy hop — the DB hums along.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

- **Description check**: ❓ Inconclusive. No pull request description was provided by the author; only the title was given. Resolution: add a description explaining the motivation behind the performance improvements, the specific benefits of row_number-based pagination, and any testing performed to validate the changes.

✅ Passed checks (2 passed)

- **Title check**: ✅ Passed. The title 'experiment: improve migration performance' accurately describes the main changes across multiple migration files, which replace offset/limit pagination with row_number-based window functions.
- **Docstring Coverage**: ✅ Passed. No functions were found in the changed files, so docstring coverage was not evaluated.

@gfyrag gfyrag force-pushed the experiment/speedup-migration branch from dc84997 to c67137b Compare November 24, 2025 14:37

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)

26-26: Index on logs_transactions may not be efficiently utilized.

The index is created on (ledger, transaction_id) include (date), but the CTE filters the temporary table using row_number >= i and row_number < i + _batch_size. Since row_number is not indexed, this range filter executes as a full scan of the temporary table, and the index on (ledger, transaction_id) is not used in the CTE predicate.

While this is acceptable for a temporary migration table (which should be small after type filtering), consider whether an index on row_number or reordering the logic would be more efficient. For now, this is likely not a blocking issue, but it may warrant measurement.
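If measurement does show the full scan to be a problem, the reindexing the nitpick alludes to might look like this (hypothetical sketch, mirroring the `(row_number) INCLUDE (...)` shape the later revision of migration 18 adopted):

```sql
-- Hypothetical: index the batching key so the predicate
--   row_number >= i AND row_number < i + _batch_size
-- becomes an index range scan instead of a full scan of the temp table.
create index on logs_transactions (row_number)
    include (ledger, date, transaction_id);

-- Refresh planner statistics so the new index is actually considered.
analyze logs_transactions;
```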

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 10fbf2b and c67137b.

📒 Files selected for processing (1)
  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.

Applied to files:

  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.

Applied to files:

  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Tests
  • GitHub Check: GoReleaser
  • GitHub Check: Dirty
🔇 Additional comments (3)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (3)

33-42: Row_number pagination with intermediate commits is safe.

The use of row_number for deterministic pagination is a good improvement over offset/limit, especially for large datasets. Since the temporary table logs_transactions is immutable during the loop and row_number is computed once during table creation, intermediate commits at line 44 do not affect row numbering consistency. This approach ensures batches are processed deterministically even if the migration is retried.

The guard condition at line 42 (and inserted_at is null) correctly prevents re-updating rows in case the migration is run multiple times, making it idempotent.


8-8: Trigger management approach is sound.

The migration correctly drops and recreates the update_transaction_metadata_history trigger with a condition on the metadata column. This prevents unnecessary trigger invocations for non-metadata updates (e.g., inserted_at). Based on learnings, ACCESS SHARE locks on the transactions table and SHARE ROW EXCLUSIVE locks on temporary tables are acceptable for this migration context.

Also applies to: 14-18


22-24: The transaction type filtering is intentional and correct.

The query filters for only NEW_TRANSACTION and REVERTED_TRANSACTION because these are the only log types that:

  1. Contain a 'transaction' object in their JSON data structure
  2. Create or affect transaction records (which need inserted_at populated)

Other types like SET_METADATA and DELETE_METADATA modify transaction metadata but don't create transaction records—they use 'targetType' and 'targetId' instead of embedding a 'transaction' object. Attempting to extract (data->'transaction'->>'id') from these types would return NULL, defeating the migration's purpose.

The filtering aligns with the business logic shown in the trigger handlers (migrations 0 and 4), where each type branches to different operations: transaction creation/reversion vs. metadata management.
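The distinction can be illustrated with a short sketch. The column names follow the review's description of the log schema (`type`, `data`, `targetType`, `targetId`); treat the exact JSON paths as assumptions:

```sql
-- NEW_TRANSACTION / REVERTED_TRANSACTION logs embed the transaction object,
-- so the transaction id can be extracted directly:
select id, (data->'transaction'->>'id')::bigint as transaction_id
from logs
where type in ('NEW_TRANSACTION', 'REVERTED_TRANSACTION');

-- SET_METADATA / DELETE_METADATA logs reference their target indirectly,
--   data->>'targetType' = 'TRANSACTION', data->>'targetId' = '<id>'
-- and carry no embedded 'transaction' object, so
-- (data->'transaction'->>'id') evaluates to NULL for them.
```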


codecov bot commented Nov 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.17%. Comparing base (10fbf2b) to head (c537715).

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #1157      +/-   ##
==========================================
+ Coverage   82.13%   82.17%   +0.04%
==========================================
  Files         190      190
  Lines        9319     9319
==========================================
+ Hits         7654     7658       +4
+ Misses       1213     1211       -2
+ Partials      452      450       -2
```


@gfyrag gfyrag force-pushed the experiment/speedup-migration branch 4 times, most recently from a0c512f to a70c5c4 Compare November 25, 2025 08:38

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)

22-24: Consider using IN operator for type filtering instead of OR.

The WHERE clause uses OR which may prevent index usage on the logs.type column. For better query optimization, consider using the IN operator:

```diff
-		where type = 'NEW_TRANSACTION' or type = 'REVERTED_TRANSACTION';
+		where type IN ('NEW_TRANSACTION', 'REVERTED_TRANSACTION');
```

This provides the same semantics but is more amenable to index optimization and is idiomatic in SQL.

internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)

21-21: TODO comment indicates incomplete work.

Line 21 contains a TODO comment (-- todo: add in go libs) that suggests future work to move this logic into a Go library. Please clarify:

  1. Is this TODO part of the current scope, or is it deferred?
  2. Should this TODO be tracked in an issue or pull request?

If you'd like, I can help create a follow-up issue to track this work.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 0911627 and a70c5c4.

📒 Files selected for processing (2)
  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.

Applied to files:

  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.

Applied to files:

  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: GoReleaser
  • GitHub Check: Tests
  • GitHub Check: Dirty
🔇 Additional comments (4)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2)

26-26: Index strategy aligns with batch-processing pattern.

The index on (row_number) include (...) is well-suited for the batching strategy used in the loop (lines 33–47). The covering index eliminates the need to access the base table for the batch range queries.


33-42: Batch paging logic and safety predicate look correct.

The row_number–based batching (row_number >= i and row_number < i + _batch_size) provides deterministic, non-overlapping batches without reliance on date ordering or OFFSET. The additional inserted_at is null predicate (line 42) is a good defensive check that prevents re-updating already-filled rows and guards against idempotency issues.

internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (2)

8-13: Verify join condition and table schema.

The join on transactions.seq = moves.transactions_seq (line 12) assumes that moves.transactions_seq exists and is properly aligned with transactions.seq. Additionally, the INNER JOIN (implicit) filters to only moves with valid transaction matches. Please confirm:

  1. The moves.transactions_seq field exists and has the correct domain/type matching transactions.seq.
  2. Whether an INNER JOIN is intentional (excluding orphaned moves rows) or if a LEFT JOIN might be safer for data integrity validation.

25-39: Batch processing structure is sound.

The per-batch UPDATE with row_number–based range queries ensures small, deterministic batches without long-running transactions. The COMMIT inside the loop (line 36) prevents transaction bloat and allows concurrent reads between batches. The pattern aligns well with migration 18 and supports the performance optimization goal.
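The per-batch update described above has roughly this shape (a sketch reconstructed from the review summary and diagram, not the literal migration source; this fragment lives inside the DO block's FOR loop):

```sql
-- One batch: join the enumerated temp table on moves_seq and
-- restrict the join to the current row_number window.
update moves
set transactions_id = t.id
from transactions_ids t
where moves.seq = t.moves_seq
  and t.row_number >= i
  and t.row_number < i + _batch_size;

-- Release locks and report progress after each batch.
commit;
perform pg_notify('migrations-{{ .Schema }}', 'continue: ' || _batch_size);
```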

@gfyrag gfyrag force-pushed the experiment/speedup-migration branch from a70c5c4 to 9c5f381 Compare November 25, 2025 08:51

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between a70c5c4 and 9c5f381.

📒 Files selected for processing (2)
  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.

Applied to files:

  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Tests
  • GitHub Check: GoReleaser
  • GitHub Check: Dirty
🔇 Additional comments (2)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (2)

17-17: Verify index placement and table lifecycle.

The index on transactions_ids (line 17) will be created on a permanent table if the earlier issue is not fixed. If the table is permanent and persists across runs, the index creation will fail on re-runs.

Confirm the intended table scope (temporary vs. permanent) before verifying index handling.


25-39: Row-number batch processing logic is sound, but depends on transaction handling fix.

The batch loop correctly uses row_number-based paging to process records in chunks of _batch_size, which aligns with the performance optimization goals mentioned in the PR. However, this depends on resolving the COMMIT statement issue identified above.

Once transaction handling is clarified, verify that the batch window calculation (where row_number >= i and row_number < i + _batch_size) correctly covers all _max rows without gaps or overlaps, especially at boundaries.

Comment on lines 8 to 15

```sql
-- todo: debug
--create temporary table transactions_ids as
create table transactions_ids as
select row_number() over (order by transactions.seq) as row_number,
moves.seq as moves_seq, transactions.id, transactions.seq as transactions_seq
from moves
where transactions_id is null
join transactions on transactions.seq = moves.transactions_seq
where transactions_id is null;
```


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Critical: Transaction control and permanent table creation incompatible.

Line 36 contains COMMIT; inside a PL/pgSQL block, which will likely fail or break migration atomicity. Additionally, the commented "todo: debug" (line 8) and shift from create temporary table (commented) to create table (line 10) suggests incomplete refactoring. Permanent tables are not cleaned up and will cause failures on re-runs.

PL/pgSQL does not support standard COMMIT statements in most PostgreSQL contexts. Either:

  1. Use create temporary table (as originally intended, per the comment)
  2. Perform cleanup with drop table if exists transactions_ids
  3. Move transaction boundaries outside the PL/pgSQL block if per-batch commits are required

Apply this diff to use a temporary table (recommended for a migration):

```diff
-		-- todo: debug
-		--create temporary table transactions_ids as
-		create table transactions_ids as
+		create temporary table transactions_ids as
```

Then remove the commit; statement on line 36, as migrations should be atomic:

```diff
-		commit;
-
 		perform pg_notify('migrations-{{ .Schema }}', 'continue: ' || _batch_size);
```

If per-batch commits are genuinely required (e.g., to prevent long locks), you will need to refactor this as separate transaction batches outside the PL/pgSQL block, or verify your PostgreSQL version and migration framework support autonomous transactions.


I need to verify the claims in the review comment. Let me examine the actual migration file and verify PostgreSQL's PL/pgSQL transaction semantics.

Now let me examine the actual migration file to determine whether it uses a DO block or a function definition:


🏁 Script executed:

#!/bin/bash
# Examine the full migration file structure
find . -name "17-moves-fill-transaction-id" -o -name "*17*moves*fill*" -o -path "*/migrations/*17*.sql" | head -20

Length of output: 309


🏁 Script executed:

#!/bin/bash
# Search for the migration file
find . -path "*/internal/storage/bucket/migrations/*" -name "*17*.sql" 2>/dev/null

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Broader search in migrations directory
fd -e sql "17.*moves.*fill|moves.*fill.*17" --max-one-result

Length of output: 233


🏁 Script executed:

#!/bin/bash
# Search more broadly
fd ".*migrations.*" -t d | head -5

Length of output: 52


🏁 Script executed:

#!/bin/bash
# List migration files
fd -t f "\.sql$" "migrations" | grep -i "17.*moves\|moves.*17" | head -5

Length of output: 97


🏁 Script executed:

#!/bin/bash
# Find the exact file
find . -type f -name "*.sql" 2>/dev/null | xargs grep -l "create.*transactions_ids" 2>/dev/null | head -5

Length of output: 72


🏁 Script executed:

#!/bin/bash
# Get all migration files in order
ls -la internal/storage/bucket/migrations/ 2>/dev/null | grep -E "^-.*\.sql$|^d"

Length of output: 3266


🏁 Script executed:

#!/bin/bash
# Check if internal/storage/bucket/migrations dir exists and list files
if [ -d "internal/storage/bucket/migrations" ]; then
  echo "=== Migrations directory found ==="
  ls -1 internal/storage/bucket/migrations/ | sort
else
  echo "Directory not found. Searching for migrations..."
  find . -type d -name migrations 2>/dev/null | head -5
fi

Length of output: 1189


🏁 Script executed:

#!/bin/bash
# Find the migration file and output full content
find . -type f -path "*/migrations/17*" -name "*.sql" 2>/dev/null | while read file; do
  echo "=== File: $file ==="
  cat "$file"
done

Length of output: 1384


🏁 Script executed:

#!/bin/bash
# Search in all SQL files for the specific transaction_ids table creation
rg -l "transactions_ids" --type sql

Length of output: 70


🏁 Script executed:

#!/bin/bash
# Get the actual content with context
rg "create.*table.*transactions_ids" -A 20 -B 5 --type sql

Length of output: 2743


🏁 Script executed:

#!/bin/bash
# Look for DO blocks and COMMIT statements together
rg "DO\s*\$\$|^COMMIT" --type sql -n | head -30

Length of output: 66


🏁 Script executed:

#!/bin/bash
# Search for the full migration file structure
rg "transactions_id" -A 30 -B 5 --type sql 2>/dev/null | head -100

Length of output: 10027


🏁 Script executed:

#!/bin/bash
# Direct path check
cat internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql 2>/dev/null || echo "File not found at expected path"

Length of output: 1098


🏁 Script executed:

#!/bin/bash
# Try with different path variations
find . -name "*17*" -path "*migrations*" -name "*.sql" 2>/dev/null

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check the structure more carefully
find . -type f -name "up.sql" 2>/dev/null | xargs grep -l "transactions_ids" 2>/dev/null

Length of output: 72


🏁 Script executed:

#!/bin/bash
# Output migration file content if found
for file in $(find . -type f -name "up.sql" 2>/dev/null | xargs grep -l "transactions_ids" 2>/dev/null); do
  echo "=== File: $file ==="
  head -50 "$file"
done

Length of output: 1185


Now let me check for the DOWN migration and verify the table cleanup strategy:


🏁 Script executed:

#!/bin/bash
# Check for down.sql migration file
ls -la internal/storage/bucket/migrations/17-moves-fill-transaction-id/

Length of output: 308


🏁 Script executed:

#!/bin/bash
# Check if there's a down.sql file
cat internal/storage/bucket/migrations/17-moves-fill-transaction-id/down.sql 2>/dev/null || echo "No down.sql found"

Length of output: 17


🏁 Script executed:

#!/bin/bash
# Search for any DROP of transactions_ids table
rg "drop.*transactions_ids|transactions_ids.*drop" --type sql -i

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check migration 18 to see if it handles cleanup
cat internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql 2>/dev/null | head -50

Length of output: 1938


🏁 Script executed:

#!/bin/bash
# Search for table drops in any migration
rg "drop table" --type sql -i | grep -i transactions_ids

Length of output: 0


Permanent table creation without cleanup — will fail on re-run. COMMIT semantics are actually valid.

The review's claim about COMMIT failing is incorrect: PostgreSQL supports COMMIT inside DO blocks since version 11 (as long as the block is not invoked inside an outer transaction), and migration 18 in the same codebase demonstrates this exact pattern working correctly.

However, the permanent table issue is real and critical. This migration creates create table transactions_ids (permanent) instead of create temporary table (temporary), and never drops it. Compare to migration 18, which uses create temporary table logs_transactions and explicitly drop table logs_transactions at the end. Without cleanup, re-running this migration will fail.

Fix:

  1. Restore the commented temporary table approach: create temporary table transactions_ids as
  2. Add drop table transactions_ids; before the final end statement (after the loop)
  3. Remove the "todo: debug" comment since the intent is now clear

The COMMIT statements are fine and can remain.
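Put together, the corrected preamble and cleanup would look roughly like this (a sketch of the fix above; the row_number-windowed batch loop is elided):

```sql
-- Use a session-local temp table so re-runs don't collide with
-- leftovers from a previous attempt.
create temporary table transactions_ids as
select row_number() over (order by transactions.seq) as row_number,
       moves.seq as moves_seq,
       transactions.id
from moves
join transactions on transactions.seq = moves.transactions_seq
where moves.transactions_id is null;

-- ... batch loop with per-batch COMMIT (valid in DO blocks on
-- PostgreSQL 11+ outside an enclosing transaction) ...

-- Explicit cleanup, mirroring migration 18's drop of logs_transactions.
drop table transactions_ids;
```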

🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql around
lines 8-15 the migration mistakenly creates a permanent table transactions_ids
and leaves it behind (and has a "todo: debug" comment); change the statement
back to create temporary table transactions_ids as ..., remove the "todo: debug"
comment, and add a drop table transactions_ids; just before the final end (after
the processing loop) so the temp table is cleaned up on completion; leave
existing COMMIT statements as-is.

@gfyrag gfyrag force-pushed the experiment/speedup-migration branch from 9c5f381 to 2af1898 Compare November 25, 2025 08:59

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)

21-42: Critical: Off-by-one error in row_number-based pagination loop.

Line 33 loops for i in 0.._count-1 by _batch_size, but row_number() in PostgreSQL is 1-indexed. When i=0, the filter row_number >= i and row_number < i + _batch_size (line 37) matches rows 1 through _batch_size - 1, one row fewer than intended. Subsequent windows are full-size and contiguous, so no rows are skipped mid-stream, but every window ends one row early; when _count is an exact multiple of _batch_size, the final row (row_number = _count) falls outside the last window and is never processed.

Fix: start the loop at 1 instead of 0:

```diff
-		for i in 0.._count-1 by _batch_size loop
+		for i in 1.._count by _batch_size loop
```

Alternatively, keep the loop bounds and shift the filter so each window covers rows i+1 through i + _batch_size:

```diff
-			where row_number >= i and row_number < i + _batch_size
+			where row_number > i and row_number <= i + _batch_size
```

(The first solution is simpler.)

The inserted_at is null condition on line 42 ensures idempotency, which is good.
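The two failure modes (undersized first batch, skipped final row) can be checked with a small Python simulation of the PL/pgSQL loop — a sketch only, with `count`, `batch`, `start`, and `stop` standing in for `_count`, `_batch_size`, and the loop bounds:

```python
def processed_rows(count, batch, start, stop):
    """Simulate `for i in start..stop by batch` with the filter
    `row_number >= i and row_number < i + batch` over 1-indexed rows."""
    seen = set()
    for i in range(start, stop + 1, batch):
        seen.update(r for r in range(i, i + batch) if 1 <= r <= count)
    return seen

all_rows = set(range(1, 2001))

# Original bounds 0.._count-1: the final row is missed whenever
# _count is an exact multiple of _batch_size.
assert processed_rows(2000, 1000, start=0, stop=1999) == all_rows - {2000}

# Fixed bounds 1.._count: every row is covered.
assert processed_rows(2000, 1000, start=1, stop=2000) == all_rows
```

When _count is not a multiple of _batch_size, the original bounds happen to cover everything (the last window overshoots past _count), which is why the bug only surfaces on exact-multiple row counts.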

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9c5f381 and 2af1898.

📒 Files selected for processing (2)
  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.
📚 Learning: 2025-07-19T18:35:34.260Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.

Applied to files:

  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.

Applied to files:

  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.

Applied to files:

  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: GoReleaser
  • GitHub Check: Dirty
  • GitHub Check: Tests
🔇 Additional comments (1)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)

8-13: Good: Temporary table creation with proper filtering.

The previous review comment flagging permanent table creation has been addressed—the code now correctly uses create temporary table (line 8) and filters on transactions_id is null (line 13). Temporary tables are implicitly dropped when the DO block completes, so no explicit cleanup is needed. The where transactions_id is null condition also ensures idempotency.

Comment on lines 23 to 27
 		for i in 0.._max-1 by _batch_size loop
 			with _rows as (
 				select *
-				from moves
-				where transactions_id is null
-				limit _batch_size
+				from transactions_ids
+				where row_number >= i and row_number < i + 1000

⚠️ Potential issue | 🔴 Critical

Critical: Off-by-one error in row_number-based pagination loop.

Line 23 loops for i in 0.._max-1 by _batch_size, but row_number() in PostgreSQL is 1-indexed. When i=0, the filter row_number >= i and row_number < i + 1000 (line 27) matches rows 1–999 (only 999 rows), not 1000. The first batch processes one fewer row than intended, and the same one-row shift means the highest-numbered rows can fall outside the final window. Note also that the window width is the literal 1000 while the loop steps by _batch_size, so any _batch_size other than 1000 leaves gaps between batches.

Fix: Start the loop at 1 instead of 0:

-		for i in 0.._max-1 by _batch_size loop
+		for i in 1.._max by _batch_size loop

Or keep the loop starting at 0 and shift the window up by one (less preferred):

-			where row_number >= i and row_number < i + 1000
+			where row_number > i and row_number <= i + _batch_size

(The first solution is simpler.)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-		for i in 0.._max-1 by _batch_size loop
+		for i in 1.._max by _batch_size loop
 			with _rows as (
 				select *
 				from transactions_ids
-				where row_number >= i and row_number < i + 1000
+				where row_number >= i and row_number < i + _batch_size
🤖 Prompt for AI Agents
In internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql around
lines 23–27, the pagination loop currently starts at 0 causing an off-by-one
because row_number() is 1-indexed; change the loop to start at 1 (e.g., for i in
1.._max by _batch_size) and ensure the WHERE filter uses the same batch size
bounds (row_number >= i AND row_number < i + _batch_size) so each batch
processes exactly _batch_size rows.

@gfyrag gfyrag force-pushed the experiment/speedup-migration branch from 2af1898 to 1602b13 on November 25, 2025 09:08
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)

33-37: Off-by-one error in row_number-based pagination loop.

The loop starts at 0 (line 33), but PostgreSQL's row_number() is 1-indexed. When i=0, the filter row_number >= 0 and row_number < 0 + _batch_size matches rows 1 to _batch_size-1 (only _batch_size - 1 rows, not _batch_size). This causes the first batch to process one fewer row than intended.

Apply this diff to fix the off-by-one error:

-		for i in 0.._count-1 by _batch_size loop
+		for i in 1.._count by _batch_size loop

This ensures each batch processes exactly _batch_size rows (1000), starting from the first 1-indexed row.

♻️ Duplicate comments (1)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)

23-27: Critical: Hardcoded batch size in WHERE clause contradicts _batch_size parameter.

Line 27 uses a hardcoded filter where row_number >= i and row_number < i + 1000, but _batch_size is set to 10000 (line 3). Because the loop still advances by _batch_size (10000) while each window selects only 1000 rows, 9000 of every 10000 rows are never selected at all — a correctness bug, not merely a lost performance improvement from increasing _batch_size.

Additionally, this reintroduces the off-by-one error flagged in the previous review: the loop starts at 0, but PostgreSQL's row_number() is 1-indexed. When i=0, the condition row_number >= 0 and row_number < 1000 matches rows 1–999 (999 rows, not 1000).

Apply this diff to fix both issues:

-		for i in 0.._max-1 by _batch_size loop
+		for i in 1.._max by _batch_size loop
 			with _rows as (
 				select *
 				from transactions_ids
-				where row_number >= i and row_number < i + 1000
+				where row_number >= i and row_number < i + _batch_size
 			)

This ensures each batch processes exactly _batch_size rows (10000), and all rows are included without gaps.
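The mismatch can be checked numerically with a short Python model of the loop arithmetic (a hedged sketch; 20000 rows stand in for the table, and 10000/1000 mirror `_batch_size` and the hardcoded window):

```python
def covered(max_rows, step, window, start=0):
    """Rows matched when the loop advances by `step` while each batch
    selects only `row_number >= i and row_number < i + window`."""
    seen = set()
    for i in range(start, max_rows, step):
        seen.update(r for r in range(max(i, 1), i + window) if r <= max_rows)
    return seen

# Broken: step 10000 with a 1000-wide window touches only 1999 of
# 20000 rows (999 in the first batch because of the off-by-one,
# 1000 in the second); everything else is silently skipped.
assert len(covered(20_000, step=10_000, window=1_000)) == 1_999

# Fixed: 1-based loop with window == step covers every row.
assert covered(20_000, step=10_000, window=10_000, start=1) == set(range(1, 20_001))
```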

🧹 Nitpick comments (1)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)

1-40: Add explicit cleanup of temporary table.

While PostgreSQL will clean up the temporary table at session end, it's best practice to explicitly drop temporary tables used in migrations for clarity and to avoid resource leaks in long-running migration contexts.

Apply this diff to add explicit cleanup:

 		perform pg_notify('migrations-{{ .Schema }}', 'continue: ' || _batch_size);
 	end loop;
+
+	drop table transactions_ids;
 end
 $$
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 2af1898 and 1602b13.

📒 Files selected for processing (2)
  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🔇 Additional comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)

21-42: Verify and approve transaction-type filtering logic.

The migration now filters logs by transaction type ('NEW_TRANSACTION' or 'REVERTED_TRANSACTION', line 24) and only updates inserted_at when it is NULL (line 42). This change in filtering strategy from date-based to transaction-type based requires verification to ensure all relevant transactions are captured and the migration correctly handles edge cases.

Please confirm:

  1. Does filtering by only these two transaction types capture all transactions that should have inserted_at populated?
  2. Are there any transactions with existing non-NULL inserted_at values that should also be updated?
  3. Does this align with the domain logic for transaction lifecycle in the ledger system?

@gfyrag gfyrag force-pushed the experiment/speedup-migration branch from 1602b13 to 6dde0e0 on November 25, 2025 09:15
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)

33-42: Critical: Off-by-one error in batch pagination leaves final rows unprocessed.

The loop logic has a critical flaw that can leave the last batch of rows with unfilled inserted_at values:

  1. First batch undersizes: row_number >= 0 and row_number < 1000 matches only rows 1–999 (999 rows, not 1000).
  2. Final batch skipped: With for i in 0.._count-1 by _batch_size, if _count=2000, the loop runs at i=0 and i=1000, but never at i=2000. This leaves row 2000 unprocessed because row_number values are 1-based.

Example trace with _count=2000, _batch_size=1000:

  • i=0: rows 1–999 processed ✗ (should be 1–1000)
  • i=1000: rows 1000–1999 processed ✓
  • i=2000 never reached (2000 > 1999) → row 2000 skipped ✗

Apply this diff to fix the pagination:

-	for i in 0.._count-1 by _batch_size loop
+	for i in 1..(_count + _batch_size - 1) by _batch_size loop
 		with _rows as (
 			select *
 			from logs_transactions
 			where row_number >= i and row_number < i + _batch_size
 		)

Alternatively, if you prefer keeping i starting at 0:

-	for i in 0.._count-1 by _batch_size loop
+	for i in 0..(_count - 1) by _batch_size loop
 		with _rows as (
 			select *
 			from logs_transactions
-			where row_number >= i and row_number < i + _batch_size
+			where row_number > i and row_number <= i + _batch_size
 		)
♻️ Duplicate comments (1)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)

23-27: Critical: Off-by-one error in row_number-based loop — first batch processes 9999 rows instead of 10000.

Line 23 starts the loop at i = 0, but row_number() is 1-indexed. When i=0, the filter row_number >= 0 and row_number < 10000 matches rows 1–9999 (only 9999 rows). The loop should start at 1 to align batch boundaries with row numbering.

Apply this diff to fix the loop bounds:

-		for i in 0.._max-1 by _batch_size loop
+		for i in 1.._max by _batch_size loop
🧹 Nitpick comments (1)
internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)

8-13: Add explicit DROP TABLE for temporary table cleanup.

While PostgreSQL automatically drops temporary tables at transaction end, explicitly cleaning up with drop table transactions_ids; is consistent with migration 18 in this PR (which drops logs_transactions at the end) and improves defensive coding.

Apply this diff to add explicit cleanup before the final end:

  		end loop;
+		drop table transactions_ids;
  	end

Also applies to: 37-39

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 1602b13 and 6dde0e0.

📒 Files selected for processing (2)
  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
🔇 Additional comments (1)
internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)

21-26: Indexing strategy verified and consistent across migrations.

The row_number-based pagination approach with indexed included columns aligns correctly with migration 17. Both migrations use identical patterns: sequential row numbers via window functions with indexes optimized for range-scan pagination. This is sound for batch processing performance.

@gfyrag gfyrag force-pushed the experiment/speedup-migration branch from 6dde0e0 to 2c876d5 on November 25, 2025 12:42
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (1)

11-11: Logic is correct; minor code style inconsistency.

The row_number() pagination and update logic are correct. However, line 29 uses inconsistent spacing (_offset+_batch_size without spaces) compared to the style in related migrations (e.g., _offset + _batch_size in migration 29). Consider aligning spacing for consistency across the migration suite.

Apply this minor formatting adjustment for consistency:

-				where row_number >= _offset and row_number < _offset+_batch_size
+				where row_number >= _offset and row_number < _offset + _batch_size

Also applies to: 29-29

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between c537715 and 3644c91.

📒 Files selected for processing (4)
  • internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (2 hunks)
  • internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
  • internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (2 hunks)
  • internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (2 hunks)
🔇 Additional comments (3)
internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (1)

20-20: Row_number-based pagination improves performance vs OFFSET/LIMIT.

The switch from offset/limit paging to row_number() windowing is a sound performance optimization for large datasets, as it avoids expensive row-skipping operations. The pagination is keyed on transactions.seq for deterministic ordering, which is appropriate for batch processing.
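The cost asymmetry can be illustrated with a rough Python model (an illustrative sketch only — a sorted list plus `bisect` stands in for the index on the temporary table, not for actual PostgreSQL internals):

```python
import bisect

rows = list(range(1, 100_001))  # stand-in for seq-ordered rows

def offset_batches(data, batch):
    """OFFSET/LIMIT: each batch must walk past every previously read row."""
    scanned = 0
    for off in range(0, len(data), batch):
        scanned += off + len(data[off:off + batch])  # rows skipped + rows read
    return scanned

def row_number_batches(data, batch):
    """row_number window: an index seek lands on the window's first row,
    so only the rows actually returned are touched."""
    scanned = 0
    for lo in range(1, len(data) + 1, batch):
        start = bisect.bisect_left(data, lo)  # index seek
        scanned += len(data[start:start + batch])
    return scanned

# Work under OFFSET grows quadratically with the number of batches,
# while the row_number approach stays linear in the table size.
assert row_number_batches(rows, 10_000) == 100_000
assert offset_batches(rows, 10_000) == 550_000
```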

internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (1)

19-19: Logic is sound; verify date formatting intent vs. migration 32.

The row_number-based pagination is correct and the temporary table strategy is appropriate. However, this migration uses to_json(data.reverted_at)#>>'{}' || 'Z' to format dates (lines 45–47), while migration 32 uses direct column reference (data.reverted_at) for similar fields. Confirm whether this divergence is intentional or an inconsistency.

Please verify that the date formatting difference (JSON extraction + 'Z' suffix in migration 33 vs. direct column reference in migration 32) reflects intentional behavioral differences or should be reconciled for consistency.

Also applies to: 37-37

internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (1)

19-20: Core pagination logic is correct; note date formatting consistency with migration 33.

The row_number() windowing and pagination are implemented correctly. The direct use of date columns (lines 46–48: data.reverted_at, data.inserted_at, data.timestamp) differs from the JSON extraction + 'Z' suffix approach in migration 33. Ensure this difference is intentional or reconcile if both should use the same format.

Cross-check date handling with migration 33 to confirm the formatting divergence is by design.

Also applies to: 38-38
