feat: implement streaming export for large databases #99

Open
769066112-ops wants to merge 1 commit into outerbase:main from 769066112-ops:fix/streaming-export-large-databases

Conversation

@769066112-ops

Summary

Fixes #59 — Database dumps do not work on large databases.

Problem

All export endpoints (/export/dump, /export/csv/:table, /export/json/:table) load the entire dataset into memory before creating the response. For large databases (up to 10GB on Durable Objects), this causes:

  • Memory overflow (OOM)
  • Request timeouts (30-second Cloudflare Workers limit)
  • Partial or failed exports

Solution

Replace in-memory buffering with ReadableStream-based streaming combined with paginated database queries (LIMIT/OFFSET, 1000 rows per page).

Changes

src/export/index.ts — New streaming utilities:

  • forEachPage() — Paginated query helper that fetches rows in batches via LIMIT/OFFSET and invokes a callback per page
  • tableExists() — Lightweight table existence check
  • createStreamingExportResponse() — Creates a chunked-transfer Response from a ReadableStream
  • Original getTableData() and createExportResponse() preserved for backward compatibility
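The two new helpers can be sketched roughly as follows. The `forEachPage()` and `createStreamingExportResponse()` names come from this PR, but the implementation below is illustrative; in particular, the `SqlExec` executor type is an assumption standing in for the Durable Object's actual SQLite binding.

```typescript
// Illustrative executor type; the real code queries the Durable Object's
// SQLite storage, not a generic async function.
type SqlExec = (query: string) => Promise<Record<string, unknown>[]>;

const PAGE_SIZE = 1000;

// Fetch rows in LIMIT/OFFSET batches and invoke the callback once per page,
// so at most one page of rows is ever held in memory.
async function forEachPage(
  sql: SqlExec,
  table: string,
  onPage: (rows: Record<string, unknown>[]) => Promise<void> | void,
  pageSize: number = PAGE_SIZE,
): Promise<void> {
  let offset = 0;
  for (;;) {
    const rows = await sql(
      `SELECT * FROM "${table}" LIMIT ${pageSize} OFFSET ${offset}`,
    );
    if (rows.length === 0) break;
    await onPage(rows);
    if (rows.length < pageSize) break; // short page: no more rows
    offset += pageSize;
  }
}

// Wrap a ReadableStream in a Response. Omitting Content-Length makes the
// runtime use chunked transfer encoding.
function createStreamingExportResponse(
  stream: ReadableStream<Uint8Array>,
  contentType: string,
  filename: string,
): Response {
  return new Response(stream, {
    headers: {
      "Content-Type": contentType,
      "Content-Disposition": `attachment; filename="${filename}"`,
    },
  });
}
```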

src/export/dump.ts — Streaming SQL dump:

  • Streams schema + INSERT statements per table
  • Each table's rows are fetched page-by-page, never holding the full dataset in memory
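A minimal sketch of the dump stream, assuming an async `sql` executor and a hypothetical `sqlQuote()` value-escaping helper (neither signature is taken from the actual source): the schema statement is read from `sqlite_master`, then INSERT statements are enqueued page by page.

```typescript
// Stream a SQL dump: per table, emit the CREATE TABLE statement, then
// INSERT statements fetched in LIMIT/OFFSET pages of 1000 rows.
function createDumpStream(
  sql: (q: string) => Promise<Record<string, unknown>[]>,
  tables: string[],
): ReadableStream<Uint8Array> {
  const enc = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      for (const table of tables) {
        const schema = await sql(
          `SELECT sql FROM sqlite_master WHERE name = '${table}'`,
        );
        controller.enqueue(enc.encode(`${schema[0]?.sql};\n`));
        let offset = 0;
        for (;;) {
          const rows = await sql(
            `SELECT * FROM "${table}" LIMIT 1000 OFFSET ${offset}`,
          );
          if (rows.length === 0) break;
          for (const row of rows) {
            const values = Object.values(row).map(sqlQuote).join(", ");
            controller.enqueue(
              enc.encode(`INSERT INTO "${table}" VALUES (${values});\n`),
            );
          }
          offset += rows.length;
        }
      }
      controller.close();
    },
  });
}

// Hypothetical SQL literal quoting: NULL, numbers as-is, strings with
// single quotes doubled.
function sqlQuote(v: unknown): string {
  if (v === null || v === undefined) return "NULL";
  if (typeof v === "number") return String(v);
  return `'${String(v).replace(/'/g, "''")}'`;
}
```

Because each chunk is enqueued as soon as its page is serialized, peak memory stays proportional to one page regardless of table size.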

src/export/csv.ts — Streaming CSV export:

  • Streams header row + data rows with proper CSV escaping
  • Extracted escapeCsvValue() helper for cleaner escaping logic
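The escaping helper likely follows the usual RFC 4180 rules; this sketch (the name matches the PR, the body is illustrative) quotes any field containing a comma, quote, or newline, doubling embedded quotes:

```typescript
// Escape one CSV field: null/undefined become empty, fields containing
// a quote, comma, or line break are wrapped in quotes with embedded
// quotes doubled, everything else passes through unchanged.
function escapeCsvValue(value: unknown): string {
  if (value === null || value === undefined) return "";
  const s = String(value);
  return /[",\n\r]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}
```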

src/export/json.ts — Streaming JSON export:

  • Streams a valid JSON array incrementally: "[", then each row separated by ",\n", then "]"
  • Each page of rows is serialized and enqueued immediately
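The incremental framing above can be sketched like this (the `createJsonStream` name and the async-iterable page source are illustrative, not taken from the source): emit the opening bracket, then each row preceded by a separator for every row but the first, then the closing bracket.

```typescript
// Stream a valid JSON array: "[" first, then rows joined by ",\n",
// then "]". Each page is serialized and enqueued as soon as it arrives.
function createJsonStream(
  pages: AsyncIterable<Record<string, unknown>[]>,
): ReadableStream<Uint8Array> {
  const enc = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      controller.enqueue(enc.encode("["));
      let first = true;
      for await (const rows of pages) {
        for (const row of rows) {
          controller.enqueue(
            enc.encode((first ? "" : ",\n") + JSON.stringify(row)),
          );
          first = false;
        }
      }
      controller.enqueue(enc.encode("]"));
      controller.close();
    },
  });
}
```

Tracking only a `first` flag keeps the output parseable as a single JSON document without ever materializing the full array.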

Design Decisions

  • Page size of 1000 rows — Balances memory usage vs. DB round-trips
  • LIMIT/OFFSET pagination — Simple, works with any SQLite table, no cursor/rowid dependency
  • No external dependencies — Uses Web Streams API (native to Cloudflare Workers)
  • Backward compatible — Original helper functions retained; no changes to route signatures in handler.ts

Testing

The streaming approach can be verified by:

  1. Creating a large test table (e.g., 1M+ rows)
  2. Hitting /export/dump, /export/csv/:table, /export/json/:table
  3. Confirming the response streams back without timeout or OOM

Bounty: $250 · Issue #59

Replace in-memory data loading with ReadableStream-based streaming
for all export endpoints (SQL dump, CSV, JSON).

Changes:
- Add forEachPage() helper for paginated LIMIT/OFFSET queries (1000 rows/page)
- Add tableExists() and createStreamingExportResponse() utilities
- Rewrite dump.ts to stream SQL dump page-by-page per table
- Rewrite csv.ts to stream CSV rows with proper escaping
- Rewrite json.ts to stream JSON array incrementally
- Preserve backward-compatible getTableData() and createExportResponse()

This prevents memory overflow and request timeouts (30s limit) when
exporting large databases (up to 10GB on Durable Objects).

Fixes outerbase#59

