
Conversation


@Karavil Karavil commented Aug 12, 2025

Summary

Adds support for ignoring specific tables from PostgreSQL publications during replication. Tables in IGNORED_PUBLICATION_TABLES get created in SQLite but stay empty - all changes are dropped. Useful for excluding audit logs or other high-volume tables.

The ignored tables are stored in the database alongside publications in InternalShardConfig. Together they define the replication boundary: publications control what Postgres sends, and ignored tables control what Zero Cache accepts.

Important: Changing ignored tables triggers a full resync (like changing publications). This prevents stale data from remaining in SQLite when a table becomes ignored.
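
To make the intended behavior concrete, here is a minimal sketch of the kind of application-level filter described above; the names (Change, isTableIgnored, maybeApplyChange) are illustrative, not the actual zero-cache internals.

type Change = {schema: string; table: string; op: 'insert' | 'update' | 'delete'};

function isTableIgnored(
  ignored: ReadonlySet<string>,
  schema: string,
  table: string,
): boolean {
  // Exact match against fully qualified "schema.table" names.
  return ignored.has(`${schema}.${table}`);
}

// Hypothetical hook in the replication path: changes to ignored tables are
// simply dropped, so the table exists in SQLite but never receives rows.
function maybeApplyChange(
  ignored: ReadonlySet<string>,
  change: Change,
  apply: (c: Change) => void,
): void {
  if (isTableIgnored(ignored, change.schema, change.table)) {
    return;
  }
  apply(change);
}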

Design Decision

Ignored tables are like publications - both define what data replicates. Publications say what Postgres sends, ignored tables say what Zero Cache keeps. If nodes disagree on either during deployment, you get inconsistent data (some nodes have audit_logs, others don't).

Wrestled with where to store the config:

Option 1: In-memory from env vars (problematic)
================================================

PostgreSQL publishes: [users, logs, temp]
                           ↓

Rolling deployment:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Node A    │    │   Node B    │    │   Node C    │
│   (v1.0)    │    │   (v1.0)    │    │   (v1.1)    │ ← new version!
│             │    │             │    │             │
│ ENV: logs   │    │ ENV: logs   │    │ ENV: logs,  │
│             │    │             │    │      temp   │
└─────────────┘    └─────────────┘    └─────────────┘
      ↓                   ↓                   ↓
Replicates:          Replicates:          Replicates:
[users, temp]        [users, temp]        [users]      ← INCONSISTENT!


Option 2: Database storage (what I did)
========================================

PostgreSQL publishes: [users, logs, temp]
                           ↓
                    ┌──────────────┐
                    │  shardConfig │
                    │  ignored:    │
                    │  [logs]      │
                    └──────────────┘
                           ↓

Rolling deployment (all nodes read same config):
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Node A    │    │   Node B    │    │   Node C    │
│   (v1.0)    │    │   (v1.0)    │    │   (v1.1)    │
└─────────────┘    └─────────────┘    └─────────────┘
      ↓                   ↓                   ↓
Replicates:          Replicates:          Replicates:
[users, temp]        [users, temp]        [users, temp] ← CONSISTENT!

The database approach ensures all nodes agree on what to replicate, even mid-deployment. This matters because ignored tables, like publications, are replication config, not app config. App config affects how a node runs (ports, log levels), so it can safely vary between nodes; replication config affects what data exists, so it must be consistent or different nodes end up with different data.
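
As a rough illustration of this approach (the field and function names here are hypothetical, not the real InternalShardConfig schema), every node reads the same stored config at startup instead of consulting its own environment:

// Sketch only: each zero-cache node reads the replication boundary from the
// change database, so a mid-deployment node cannot diverge from its peers.
interface InternalShardConfigLike {
  publications: string[];
  ignoredTables: string[]; // fully qualified, e.g. "public.audit_logs"
}

async function loadReplicationBoundary(
  readShardConfig: () => Promise<InternalShardConfigLike>,
): Promise<{publications: string[]; ignoredTables: ReadonlySet<string>}> {
  const config = await readShardConfig();
  return {
    publications: config.publications,
    ignoredTables: new Set(config.ignoredTables),
  };
}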

First PR here, might be missing context. Happy to refactor if storing in database seems overkill - just felt like ignored tables belong with publications since they both define the replication boundary.

Testing

export ZERO_UPSTREAM_DB="postgresql://user:pass@localhost:5432/mydb"
export ZERO_IGNORED_PUBLICATION_TABLES='["public.audit_logs"]'

Check:

  • Logs show "Skipping initial sync for ignored table"
  • Tables exist but empty
  • Changes don't replicate to ignored tables
  • Changing ignored tables triggers resync: "Dropping shard to change ignored tables"

Adds ZERO_IGNORED_PUBLICATION_TABLES environment variable to exclude specific
tables from replication while preserving schema compatibility.

Key features:
• 🎯 Tables are created but remain empty (schema preserved)
• ⚡ Initial sync skips data copying for ignored tables
• 🔄 Replication changes are dropped for ignored tables
• ✅ Works because SQLite foreign keys are disabled by default

Usage:
export ZERO_IGNORED_PUBLICATION_TABLES='["audit_logs", "staging.imports"]'

This minimal implementation (~50 lines) provides table filtering without
modifying SQL queries, making it safer and simpler than deep integration.
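
For intuition, here is a sketch of what "initial sync skips data copying" could look like in the table-copy loop; createTable, copyTableData, and the table shape are stand-ins rather than the real initial-sync code (the log message matches the one noted under Testing).

// Sketch: create every published table in SQLite, but only copy rows for
// tables that are not ignored. Ignored tables contribute 0 rows to the totals.
type PublishedTable = {schema: string; name: string};

async function syncTables(
  tables: PublishedTable[],
  ignored: ReadonlySet<string>,
  createTable: (t: PublishedTable) => Promise<void>,
  copyTableData: (t: PublishedTable) => Promise<number>,
  log: (msg: string) => void,
): Promise<number> {
  let totalRows = 0;
  for (const t of tables) {
    await createTable(t); // schema is always created, ignored or not
    if (ignored.has(`${t.schema}.${t.name}`)) {
      log(`Skipping initial sync for ignored table ${t.schema}.${t.name}`);
      continue;
    }
    totalRows += await copyTableData(t);
  }
  return totalRows;
}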

vercel bot commented Aug 12, 2025

@Karavil is attempting to deploy a commit to the Rocicorp Team on Vercel.

A member of the Team first needs to authorize it.

Karavil added 18 commits August 12, 2025 18:44
- Add #isTableIgnored() helper method in ChangeMaker
- Add helper functions in initial-sync for table filtering
- Remove redundant checks across all operations
- Remove useless test file that was testing Set behavior
- Create ignored-tables.ts with shared utilities
- Remove duplicated table expansion logic
- Single source of truth for table filtering logic
- Remove confusing auto-expansion of table names
- Use direct Set matching: 'users' matches any schema, 'public.users' matches specific
- Eliminates bugs with table names containing dots
- Much simpler and more predictable behavior
- Require schema.table format (e.g., 'public.users')
- Add validation to reject simple table names
- Remove ambiguity - you must specify exactly which schema
- Simpler implementation with only exact matches
- Remove class method redefinition (#isTableIgnored)
- Use shared isTableIgnored function from ignored-tables.ts
- Pass ignoredTables Set as parameter for all calls
- Improves code consistency and maintainability
- Move ignoredTables from Zero config layer to shard config in database
- Eliminates layering violation where pg abstraction imported Zero config
- Add ignoredTables to InternalShardConfig schema and shardConfig table
- Include migration (v11) to add column for existing shards
- Pass ignoredTables from Zero config during shard initialization
- ChangeMaker and initial-sync now read from InternalShardConfig
- Provides consistency across distributed nodes
- Cleaner architecture with proper separation of concerns
…sing

- Add .map() transform to internalShardConfigSchema to convert array to Set
- Remove buildIgnoredTablesSet function as it's no longer needed
- ChangeMaker and initial-sync now use the Set directly from config
- More efficient - Set is created once during schema parsing
- Cleaner code with less manual conversion
- Add check to compare requested vs replicated ignored tables
- Drop shard and throw AutoResetSignal when mismatch detected
- Prevents stale data from remaining in SQLite when tables become ignored
- Follows same pattern as publication changes
- Use equals() from set-utils instead of deepEqual with arrays
- Create Sets from both sides for proper Set comparison
- Cleaner and more consistent with existing Set operations in codebase
- Use .optional(() => []) pattern consistently for default empty array
- Remove unnecessary fallbacks since value is always defined
- ShardConfig.ignoredTables is now always an array, never undefined
- Cleaner code without || [] checks everywhere
- Filter out ignored tables before processing instead of inside map
- Cleaner code - no fake Promise.resolve for ignored tables
- More efficient - only process tables that actually need copying
- Same behavior - ignored tables still logged and contribute 0 to totals
- Remove obvious comments that just repeat what the code does
- Keep comments that explain why or provide important context
- Code is self-explanatory, no need for comment
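
A minimal sketch tying together the validation, exact Set matching, and resync check described in the commits above; setsEqual stands in for the equals() helper from set-utils, and resetReplica stands in for dropping the shard and throwing AutoResetSignal.

// Sketch: ignored tables must be schema-qualified, and a change in the
// configured set relative to what the replica was built with forces a resync.
function parseIgnoredTables(names: readonly string[]): Set<string> {
  for (const name of names) {
    if (!name.includes('.')) {
      throw new Error(
        `Ignored table "${name}" must be schema-qualified, e.g. "public.audit_logs"`,
      );
    }
  }
  return new Set(names);
}

function setsEqual(a: ReadonlySet<string>, b: ReadonlySet<string>): boolean {
  return a.size === b.size && [...a].every(v => b.has(v));
}

// Hypothetical startup check, following the same pattern as publication changes.
function checkIgnoredTablesUnchanged(
  requested: ReadonlySet<string>,
  replicated: ReadonlySet<string>,
  resetReplica: (reason: string) => never,
): void {
  if (!setsEqual(requested, replicated)) {
    resetReplica('Dropping shard to change ignored tables');
  }
}
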
Added three focused integration tests to verify ignored tables behavior:
• Ignored tables excluded from initial sync - verifies tables are created but remain empty
• Changes to ignored tables are dropped - confirms replication filters out changes
• AutoReset on changed ignored tables - ensures resync when config changes

Tests verify the minimal implementation approach where filtering happens at the application level rather than modifying SQL queries.
Successfully added three comprehensive integration tests that verify:

• Ignored tables are created but remain empty during initial sync
• Changes to ignored tables are properly dropped during replication
• Configuration changes to ignored tables trigger a full resync

All tests pass on PostgreSQL 16 using testcontainers. The tests confirm
that the minimal implementation correctly filters data at the application
level while maintaining table schema for consistency.
• Refactored startReplication to use optional object parameter
• Updated all test calls to use { ignoredTables: [...] } pattern
• Added ignoredTables to all ShardConfig objects in initial-sync tests

test: add comprehensive initial-sync tests for ignored tables

Added two targeted tests directly in initial-sync.pg-test.ts:

• 'ignored tables are created but not synced' - verifies that:
  - Regular tables get their data synced during initial sync
  - Ignored tables are created but remain empty
  - Ignored tables configuration is persisted in shardConfig

• 'multiple ignored tables' - verifies that:
  - Multiple tables can be ignored simultaneously
  - Only non-ignored tables receive data during sync
  - All tables (ignored and regular) have their schema created

These tests verify the behavior at the initialSync function level,
complementing the integration tests in change-source.pg-test.ts.
Implements comprehensive support for ignoring specific tables from PostgreSQL publication-based replication.

Key features:
• Tables defined in IGNORED_PUBLICATION_TABLES env var are created but data is skipped
• Changes to ignored tables are dropped during replication
• Changing the ignored tables list triggers automatic full resync
• Named arguments pattern for better API ergonomics

Tests added:
• Ignored tables excluded from initial sync
• Changes to ignored tables are dropped during replication
• AutoReset triggered when ignored tables list changes
• Multiple ignored tables handled correctly
• Integration tests in both change-source and initial-sync

Implementation follows application-level filtering approach for compatibility
with existing publication infrastructure.
Added tests for complex scenarios involving ignored tables and publications:
• Ignored table with row filter in publication - ensures ignored takes precedence
• Ignored table in multiple publications - verifies table stays empty regardless
• Exact table name matching - confirms no partial matching (test_table vs test_table_2)
• Schema qualification - tests ignoring tables in specific schemas only

All tests verify that ignored tables are created but remain empty, even when:
- Row filters would normally allow some data through
- Multiple publications reference the same table
- Table names are similar but not exact matches
- Tables with same name exist in different schemas
@Karavil Karavil marked this pull request as ready for review August 13, 2025 14:31
Made ignoredTables optional in ShardConfig to avoid breaking existing code:
• Changed ShardConfig type to make ignoredTables optional
• Updated all usages to handle undefined with fallback to empty array/set
• Fixed PostgreSQL ARRAY[] type casting with explicit ::TEXT[] cast
• Fixed duplicate import in init.ts
• Added missing ignoredPublicationTables to pusher.test.ts config

All type checks and tests now pass successfully.
Re-enabled the change-source/pg test suite that was previously skipped.
All tests pass including the new ignored tables functionality tests.
Updated documentation in multiple places:
• Added ZERO_APP_IGNORED_PUBLICATION_TABLES to zbugs/.env.example
• Enhanced zero-config.ts description with env var name and resync note
• Created Configuration section in zero-cache README with detailed usage

Documentation covers:
• Environment variable format (JSON array)
• Requirement for fully qualified table names
• Behavior (tables created but empty)
• Use cases (audit logs, temp data, analytics)
• Important note about full resync on changes
Added ZERO_APP_IGNORED_PUBLICATION_TABLES environment variable wherever
ZERO_APP_PUBLICATIONS is referenced:
• GitHub Actions workflows (prod, sandbox, gigabugs)
• SST config for deployment
• zbugs .env.example file
• Simplified config description to be concise
Restored comprehensive documentation in zero-config.ts including:
• Clear format and examples
• Multiple use cases (audit logs, temp data, analytics, etc.)
• Important notes about schema qualification and resync behavior

This documentation helps users understand how to effectively use the feature.
The ignoredTables field should remain optional in the ShardConfig when
ignoredPublicationTables is not provided in the config. This ensures
backward compatibility and cleaner type handling.

Using conditional spread operator to only include ignoredTables when
ignoredPublicationTables is present in the config.
This repository uses npm, not bun. The bun.lock file was accidentally added.
The migration intentionally leaves ignoredTables empty to trigger a resync.
This ensures the SQLite replica is completely rebuilt without stale data
from newly-ignored tables.
Fixed migration rocicorp#11 to use sql() wrapper for proper identifier quoting.
Also updated test expectations to match new schema version 11.
const SHARD_NUM = 1;

- describe.skip('change-source/pg', {timeout: 30000, retry: 3}, () => {
+ describe('change-source/pg', {timeout: 30000, retry: 3}, () => {
@Karavil (author) commented on this change:

ah, whoops? not sure why this was being skipped... I can undo it? but it seemed useful to run these tests.

@aboodman

Curious about this design decision:

Tables in IGNORED_PUBLICATION_TABLES get created in SQLite but stay empty - all changes are dropped.

Why create the table in SQLite?

@aboodman

Thank you very much for the contribution. Exciting!

@darkgnotic is the best person to review this, but he is on vacation right now. That is why this hasn't been reviewed so far.

From a product pov can you explain a bit more why the publication approach is hard to use? I believe you that it is, I'm just curious...

Is it because you don't want to have to remember to update the PG publication when you add a table? Why is it not possible to maintain the publication as part of the rest of your schema management?


Karavil commented Aug 14, 2025

Curious about this design decision:

Tables in IGNORED_PUBLICATION_TABLES get created in SQLite but stay empty - all changes are dropped.

Why create the table in SQLite?

I was mostly worried about:

  1. Client schema is defined; table A exists
  2. Zero deploys with table A added to the ignore list; the table no longer exists in SQLite
  3. Client breaks?

Not super committed to this behavior though—can make changes!

@aboodman


Karavil commented Aug 14, 2025

Thank you very much for the contribution. Exciting!

@darkgnotic is the best person to review this, but he is on vacation right now. That is why this hasn't been reviewed so far.

From a product pov can you explain a bit more why the publication approach is hard to use? I believe you that it is, I'm just curious...

Is it because you don't want to have to remember to update the PG publication when you add a table? Why is it not possible to maintain the publication as part of the rest of your schema management?

Honestly, it's just more to think about! We have to use custom publications right now because of the sheer amount of data we have on file (initial syncs take ages if we don't do this). And I like that Zero is simple! I'd rather not create and maintain a publication for a few tables I want to ignore.

I do agree that a better abstraction around generating publications could work for this (we use Drizzle Zero; it would plug in nicely there). But it's just another point of friction for experimenting with Zero. Now I have to migrate my database before I can deploy Zero!

@aboodman


Karavil commented Aug 14, 2025

I could also see this playing nicely with your cloud offering in the future? It'd be pretty much plug and play:

-> Create a publication for all tables
-> Select tables to ignore (don't worry, you can edit this later!)
-> Zero instance ready

Replaced 'as string[]' type casts with proper ShardConfig type annotations
in test files. This follows TypeScript best practices and improves type safety.

Karavil commented Aug 14, 2025

The naming here is also a bit weird. Should it be ZERO_APP_IGNORED_TABLES instead of ZERO_APP_IGNORED_PUBLICATION_TABLES? Naming it that way (without the Postgres context) would imply that this has to be supported for all databases in the future, so I was wary of it. Happy to change it though.

@darkgnotic

Hi @Karavil. Thank you for a great proposal and well thought out implementation. I appreciate (and agree with) the design decisions you detailed in the PR description.

I also agree that this would be a useful feature to provide in a cloud offering, and went through the exercise of what that might look like at a high level, were we to implement it.

At the end of the day, the problem boils down to a deficiency in the Postgres API (e.g. for CREATE PUBLICATION). However, Postgres does provide a way to achieve this: you can create an EVENT TRIGGER on the CREATE TABLE event, and add the table to your publication (when desired) in the triggered function.

The advantage of implementing it at the Postgres layer is that:

  • There would be less code and logic to maintain. It would be a matter of adding a table to a publication, rather than intercepting multiple replication points / commands.
  • It would be more efficient, by avoiding the bandwidth and serialization cost of the data of ignored tables (which, as you point out, can be large).

Would you be up for trying this approach? And if you're willing, we'd love to figure out how to make it available for other users, whether it be through documented setup examples, a cli, or something that zero-cache does under the covers during the initial setup.
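
For readers unfamiliar with event triggers, here is a rough sketch of the Postgres-side setup described above, written as a small TypeScript install script; the publication name (zero_pub), the hard-coded ignore list, and the use of a postgres.js-style client are assumptions for illustration only, not part of this PR.

// Sketch: an event trigger that adds newly created tables to the publication
// unless they are on an ignore list, keeping all filtering inside Postgres.
import postgres from 'postgres';

async function installAutoPublishTrigger(dbUrl: string): Promise<void> {
  const sql = postgres(dbUrl);
  // Function that runs at ddl_command_end and adds the new table to the publication.
  await sql.unsafe(`
    CREATE OR REPLACE FUNCTION zero_add_tables_to_publication()
    RETURNS event_trigger LANGUAGE plpgsql AS $fn$
    DECLARE
      obj record;
    BEGIN
      FOR obj IN
        SELECT * FROM pg_event_trigger_ddl_commands()
        WHERE command_tag = 'CREATE TABLE'
      LOOP
        -- Skip tables that should never replicate, e.g. audit logs.
        IF obj.object_identity NOT IN ('public.audit_logs') THEN
          EXECUTE format('ALTER PUBLICATION zero_pub ADD TABLE %s', obj.object_identity);
        END IF;
      END LOOP;
    END;
    $fn$`);
  // Event trigger that fires the function whenever a table is created.
  await sql.unsafe(`
    CREATE EVENT TRIGGER zero_auto_publish
      ON ddl_command_end
      WHEN TAG IN ('CREATE TABLE')
      EXECUTE FUNCTION zero_add_tables_to_publication()`);
  await sql.end();
}

With something like this in place, zero-cache could keep consuming a publication that already excludes the unwanted tables, avoiding the bandwidth and serialization cost of their data entirely.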

@aboodman aboodman closed this Oct 10, 2025