feat: Add support for ignoring tables during Zero Cache replication #4727
Conversation
Adds a `ZERO_IGNORED_PUBLICATION_TABLES` environment variable to exclude specific tables from replication while preserving schema compatibility.

Key features:
• 🎯 Tables are created but remain empty (schema preserved)
• ⚡ Initial sync skips data copying for ignored tables
• 🔄 Replication changes are dropped for ignored tables
• ✅ Works because SQLite foreign keys are disabled by default

Usage: `export ZERO_IGNORED_PUBLICATION_TABLES='["audit_logs", "staging.imports"]'`

This minimal implementation (~50 lines) provides table filtering without modifying SQL queries, making it safer and simpler than deep integration.
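As a rough sketch (not the actual zero-cache config parser), the env var is just a JSON array of table names that ends up in a Set for cheap lookups:

```ts
// Hedged sketch: parse the JSON-array env var once into a Set.
const raw = process.env.ZERO_IGNORED_PUBLICATION_TABLES ?? '[]';
const ignoredTables = new Set<string>(JSON.parse(raw) as string[]);

// With the usage above: ignoredTables.has('staging.imports') === true
```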
@Karavil is attempting to deploy a commit to the Rocicorp Team on Vercel. A member of the Team first needs to authorize it.
- Add #isTableIgnored() helper method in ChangeMaker
- Add helper functions in initial-sync for table filtering
- Remove redundant checks across all operations
- Remove a now-useless test file that was only testing Set behavior

- Create ignored-tables.ts with shared utilities
- Remove duplicated table expansion logic
- Single source of truth for table filtering logic

- Remove confusing auto-expansion of table names
- Use direct Set matching: 'users' matches any schema, 'public.users' matches a specific schema
- Eliminates bugs with table names containing dots
- Much simpler and more predictable behavior

- Require schema.table format (e.g., 'public.users')
- Add validation to reject simple table names
- Removes ambiguity: you must specify exactly which schema
- Simpler implementation with only exact matches
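A hedged sketch of what the shared helper and the exact-match rule might look like (file, function names, and the one-dot validation are illustrative, not the actual zero-cache code):

```ts
// Hypothetical shape of a shared ignored-tables.ts module.

// Validate at config-parse time: every entry must be fully qualified.
export function validateIgnoredTables(tables: readonly string[]): Set<string> {
  for (const t of tables) {
    // Assumes exactly one dot separating schema and table, e.g. 'public.users'.
    if (!/^[^.]+\.[^.]+$/.test(t)) {
      throw new Error(
        `Ignored table "${t}" must use the schema.table format (e.g. 'public.users')`,
      );
    }
  }
  return new Set(tables);
}

// Exact matching only: 'public.users' never matches 'public.users_archive'.
export function isTableIgnored(
  ignoredTables: ReadonlySet<string>,
  schema: string,
  table: string,
): boolean {
  return ignoredTables.has(`${schema}.${table}`);
}
```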
- Remove class method redefinition (#isTableIgnored)
- Use the shared isTableIgnored function from ignored-tables.ts
- Pass the ignoredTables Set as a parameter for all calls
- Improves code consistency and maintainability

- Move ignoredTables from the Zero config layer to the shard config in the database
- Eliminates a layering violation where the pg abstraction imported Zero config
- Add ignoredTables to the InternalShardConfig schema and shardConfig table
- Include migration (v11) to add the column for existing shards
- Pass ignoredTables from Zero config during shard initialization
- ChangeMaker and initial-sync now read from InternalShardConfig
- Provides consistency across distributed nodes
- Cleaner architecture with proper separation of concerns

…sing
- Add a .map() transform to internalShardConfigSchema to convert the array to a Set
- Remove the buildIgnoredTablesSet function as it's no longer needed
- ChangeMaker and initial-sync now use the Set directly from config
- More efficient: the Set is created once during schema parsing
- Cleaner code with less manual conversion
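A hedged sketch of that transform, assuming a valita-style schema library (the import path, surrounding fields, and exact chaining are assumptions; zero-cache uses its own valita wrapper):

```ts
import * as v from '@badrap/valita'; // assumption: the codebase's valita-style helpers

// Illustrative slice of an internal shard-config schema: ignoredTables is
// stored as a JSON array but converted to a Set during parsing.
export const internalShardConfigSchema = v
  .object({
    publications: v.array(v.string()),
    ignoredTables: v.array(v.string()).optional(() => []), // default empty array
  })
  .map(config => ({
    ...config,
    // Convert once, at parse time, so callers get a Set directly.
    ignoredTables: new Set(config.ignoredTables),
  }));

export type InternalShardConfig = v.Infer<typeof internalShardConfigSchema>;
```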
- Add a check comparing requested vs. replicated ignored tables
- Drop the shard and throw AutoResetSignal when a mismatch is detected
- Prevents stale data from remaining in SQLite when tables become ignored
- Follows the same pattern as publication changes

- Use equals() from set-utils instead of deepEqual with arrays
- Create Sets from both sides for a proper Set comparison
- Cleaner and more consistent with existing Set operations in the codebase
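Roughly, the mismatch check might look like the following sketch; the `equals` helper and `AutoResetSignal` class are minimal stand-ins for the utilities named above, not the real implementations:

```ts
// Minimal stand-in for the set-utils equals() helper.
function equals<T>(a: ReadonlySet<T>, b: ReadonlySet<T>): boolean {
  return a.size === b.size && [...a].every(x => b.has(x));
}

// Stand-in for zero-cache's AutoResetSignal error.
class AutoResetSignal extends Error {}

function checkIgnoredTables(
  requested: readonly string[], // from the current Zero config
  replicated: readonly string[], // recorded in the replica's shard config
): void {
  if (!equals(new Set(requested), new Set(replicated))) {
    // Same pattern as publication changes: force a full resync so no stale
    // rows from newly-ignored tables linger in SQLite.
    throw new AutoResetSignal(
      `ignored tables changed from [${[...replicated]}] to [${[...requested]}]`,
    );
  }
}
```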
- Use the .optional(() => []) pattern consistently for a default empty array
- Remove unnecessary fallbacks since the value is always defined
- ShardConfig.ignoredTables is now always an array, never undefined
- Cleaner code without || [] checks everywhere

- Filter out ignored tables before processing instead of inside the map
- Cleaner code: no fake Promise.resolve for ignored tables
- More efficient: only process tables that actually need copying
- Same behavior: ignored tables are still logged and contribute 0 to totals
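In spirit, the initial-sync change is the difference between branching inside the map and filtering up front; a hedged sketch (the table type and copy function are placeholders):

```ts
type Table = {schema: string; name: string};

// Stub standing in for the real per-table copy step (returns rows copied).
async function copyTableData(_t: Table): Promise<number> {
  return 0;
}

async function copyTables(
  tables: readonly Table[],
  ignoredTables: ReadonlySet<string>,
): Promise<number> {
  // Filter first: ignored tables never enter the copy pipeline, so there's
  // no need for a fake Promise.resolve(0) branch inside the map.
  const toCopy = tables.filter(t => {
    const ignored = ignoredTables.has(`${t.schema}.${t.name}`);
    if (ignored) {
      console.info(`skipping data copy for ignored table ${t.schema}.${t.name}`);
    }
    return !ignored;
  });
  const counts = await Promise.all(toCopy.map(copyTableData));
  return counts.reduce((sum, n) => sum + n, 0); // ignored tables contribute 0
}
```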
- Remove obvious comments that just repeat what the code does
- Keep comments that explain why or provide important context
- Code is self-explanatory, no need for comment
Added three focused integration tests to verify ignored-tables behavior:
• Ignored tables excluded from initial sync: verifies tables are created but remain empty
• Changes to ignored tables are dropped: confirms replication filters out changes
• AutoReset on changed ignored tables: ensures a resync when the config changes

The tests verify the minimal implementation approach, where filtering happens at the application level rather than by modifying SQL queries.

Successfully added three comprehensive integration tests that verify:
• Ignored tables are created but remain empty during initial sync
• Changes to ignored tables are properly dropped during replication
• Configuration changes to ignored tables trigger a full resync

All tests pass on PostgreSQL 16 using testcontainers. The tests confirm that the minimal implementation correctly filters data at the application level while maintaining table schema for consistency.

• Refactored startReplication to use an optional object parameter
• Updated all test calls to use the { ignoredTables: [...] } pattern
• Added ignoredTables to all ShardConfig objects in initial-sync tests

test: add comprehensive initial-sync tests for ignored tables

Added two targeted tests directly in initial-sync.pg-test.ts:
• 'ignored tables are created but not synced' verifies that:
  - Regular tables get their data synced during initial sync
  - Ignored tables are created but remain empty
  - The ignored-tables configuration is persisted in shardConfig
• 'multiple ignored tables' verifies that:
  - Multiple tables can be ignored simultaneously
  - Only non-ignored tables receive data during sync
  - All tables (ignored and regular) have their schema created

These tests exercise the behavior at the initialSync function level, complementing the integration tests in change-source.pg-test.ts.
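The optional-object-parameter refactor mentioned above follows a common TypeScript pattern; a hedged sketch (the real startReplication signature takes more than what is shown here):

```ts
// An options bag with defaults keeps call sites readable and lets new knobs
// (like ignoredTables) be added without breaking existing callers.
type StartReplicationOptions = {
  ignoredTables?: readonly string[];
};

async function startReplication(
  shardID: string, // illustrative required argument
  {ignoredTables = []}: StartReplicationOptions = {},
): Promise<void> {
  console.info(`starting replication for shard ${shardID}`, {ignoredTables});
  // ...replication setup elided...
}

// Call sites name the option explicitly:
await startReplication('0', {ignoredTables: ['public.audit_logs']});
await startReplication('0'); // defaults still apply
```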
Implements comprehensive support for ignoring specific tables from PostgreSQL publication-based replication.

Key features:
• Tables listed in the IGNORED_PUBLICATION_TABLES env var are created but their data is skipped
• Changes to ignored tables are dropped during replication
• Changing the ignored-tables list triggers an automatic full resync
• Named-arguments pattern for better API ergonomics

Tests added:
• Ignored tables excluded from initial sync
• Changes to ignored tables are dropped during replication
• AutoReset triggered when the ignored-tables list changes
• Multiple ignored tables handled correctly
• Integration tests in both change-source and initial-sync

The implementation follows an application-level filtering approach for compatibility with the existing publication infrastructure.

Added tests for complex scenarios involving ignored tables and publications:
• Ignored table with a row filter in its publication: ensures ignoring takes precedence
• Ignored table in multiple publications: verifies the table stays empty regardless
• Exact table-name matching: confirms no partial matching (test_table vs. test_table_2)
• Schema qualification: tests ignoring tables in specific schemas only

All tests verify that ignored tables are created but remain empty, even when:
- Row filters would normally allow some data through
- Multiple publications reference the same table
- Table names are similar but not exact matches
- Tables with the same name exist in different schemas
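For illustration, the exact-match expectations could be asserted with unit-level checks like the sketch below, using vitest and the hypothetical isTableIgnored helper from the earlier sketch; the PR's actual tests run against a live Postgres container instead:

```ts
import {describe, expect, test} from 'vitest';

// Hypothetical helper (see the earlier ignored-tables.ts sketch).
const isTableIgnored = (ignored: Set<string>, schema: string, table: string) =>
  ignored.has(`${schema}.${table}`);

describe('ignored table matching', () => {
  const ignored = new Set(['public.test_table', 'zero.clients']);

  test('exact names only, no partial matching', () => {
    expect(isTableIgnored(ignored, 'public', 'test_table')).toBe(true);
    expect(isTableIgnored(ignored, 'public', 'test_table_2')).toBe(false);
  });

  test('schema-qualified: same name in another schema is not ignored', () => {
    expect(isTableIgnored(ignored, 'staging', 'test_table')).toBe(false);
  });
});
```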
Made ignoredTables optional in ShardConfig to avoid breaking existing code:
• Changed the ShardConfig type to make ignoredTables optional
• Updated all usages to handle undefined with a fallback to an empty array/set
• Fixed PostgreSQL ARRAY[] type casting with an explicit ::TEXT[] cast
• Fixed a duplicate import in init.ts
• Added the missing ignoredPublicationTables to the pusher.test.ts config

All type checks and tests now pass.
Re-enabled the change-source/pg test suite that was previously skipped. All tests pass including the new ignored tables functionality tests.
Updated documentation in multiple places:
• Added ZERO_APP_IGNORED_PUBLICATION_TABLES to zbugs/.env.example
• Enhanced the zero-config.ts description with the env var name and a resync note
• Created a Configuration section in the zero-cache README with detailed usage

The documentation covers:
• Environment variable format (JSON array)
• The requirement for fully qualified table names
• Behavior (tables created but empty)
• Use cases (audit logs, temp data, analytics)
• An important note about the full resync on changes

Added the ZERO_APP_IGNORED_PUBLICATION_TABLES environment variable wherever ZERO_APP_PUBLICATIONS is referenced:
• GitHub Actions workflows (prod, sandbox, gigabugs)
• SST config for deployment
• zbugs .env.example file
• Simplified the config description to be concise

Restored comprehensive documentation in zero-config.ts, including:
• A clear format and examples
• Multiple use cases (audit logs, temp data, analytics, etc.)
• Important notes about schema qualification and resync behavior

This documentation helps users understand how to use the feature effectively.
The ignoredTables field should remain unset in the ShardConfig when ignoredPublicationTables is not provided in the config. This ensures backward compatibility and cleaner type handling. A conditional spread is used so that ignoredTables is only included when ignoredPublicationTables is present in the config.
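The conditional spread mentioned above is a standard TypeScript idiom; a hedged sketch (the config and field names follow this PR's terminology, not necessarily the exact code):

```ts
type ZeroConfig = {ignoredPublicationTables?: readonly string[]};
type ShardConfig = {
  publications: readonly string[];
  ignoredTables?: readonly string[];
};

function toShardConfig(
  config: ZeroConfig,
  publications: readonly string[],
): ShardConfig {
  return {
    publications,
    // Spread an empty object when the option is absent, so the key is
    // omitted entirely rather than set to undefined.
    ...(config.ignoredPublicationTables
      ? {ignoredTables: config.ignoredPublicationTables}
      : {}),
  };
}
```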
This repository uses npm, not bun. The bun.lock file was accidentally added.
The migration intentionally leaves ignoredTables empty to trigger a resync. This ensures the SQLite replica is completely rebuilt without stale data from newly-ignored tables.
Fixed migration rocicorp#11 to use sql() wrapper for proper identifier quoting. Also updated test expectations to match new schema version 11.
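A hedged sketch of what such a migration step might look like with postgres.js-style identifier quoting; the schema name, column type, default, and surrounding migration framework are all assumptions:

```ts
import postgres from 'postgres';

const sql = postgres(process.env.ZERO_UPSTREAM_DB!); // assumed connection string

// Migration v11 (sketch): add the ignoredTables column to the shard-config
// table. Wrapping names in sql() quotes them as identifiers rather than
// interpolating them as parameter values.
async function migrateToV11(shardSchema: string): Promise<void> {
  await sql`
    ALTER TABLE ${sql(shardSchema)}."shardConfig"
      ADD COLUMN IF NOT EXISTS ${sql('ignoredTables')} TEXT[] DEFAULT '{}'`;
}
```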
```diff
 const SHARD_NUM = 1;

-describe.skip('change-source/pg', {timeout: 30000, retry: 3}, () => {
+describe('change-source/pg', {timeout: 30000, retry: 3}, () => {
```
ah, whoops? not sure why this was being skipped... I can undo it? but it seemed useful to run these tests.
Curious about this design decision:
Why create the table in SQLite?
Thank you very much for the contribution. Exciting! @darkgnotic is the best person to review this, but he is on vacation right now. That is why this hasn't been reviewed so far. From a product pov, can you explain a bit more why the publication approach is hard to use? I believe you that it is, I'm just curious... Is it because you don't want to have to remember to update the PG publication when you add a table? Why is it not possible to maintain the publication as part of the rest of your schema management?
I was mostly worried about:
Not super committed to this behavior though, can make changes!
Honestly, it's just more to think about! We have to use custom publications right now because of the sheer amount of data we have on file (initial syncs take ages if we don't do this). And I like that Zero is simple! I'd rather not create and maintain a publication for a few tables I want to ignore. I do agree that a better abstraction around generating publications could work for this (we use Drizzle Zero; it would plug in nicely there). But it's just another point of friction for experimenting with Zero. Now I have to migrate my database before I can deploy Zero!
I could also see this playing nicely with your cloud offering in the future? It'd be pretty much plug and play:
-> Create a publication for all tables
Replaced 'as string[]' type casts with proper ShardConfig type annotations in test files. This follows TypeScript best practices and improves type safety.
The naming here is also a bit weird. Should it be
Hi @Karavil. Thank you for a great proposal and a well-thought-out implementation. I appreciate (and agree with) the design decisions you detailed in the PR description. I also agree that this would be a useful feature to provide in a cloud offering, and I went through the exercise of what that might look like at a high level, were we to implement it. At the end of the day, the problem boils down to a deficiency in the Postgres API. The advantage of implementing it at the Postgres layer is that:
Would you be up for trying this approach? And if you're willing, we'd love to figure out how to make it available for other users, whether through documented setup examples, a CLI, or something that zero-cache does under the covers during the initial setup.
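For a rough idea of the Postgres-layer alternative being discussed, here is a sketch of maintaining a publication that covers everything except an exclude list, as part of regular schema management. It assumes postgres.js and a hypothetical IGNORED set; a real setup (Drizzle migrations, a CLI, or zero-cache itself) would differ:

```ts
import postgres from 'postgres';

const sql = postgres(process.env.DATABASE_URL!); // assumed connection string
const IGNORED = new Set(['public.audit_logs', 'staging.imports']); // hypothetical

async function syncPublication(name: string): Promise<void> {
  // List user tables, skipping the ones we want to exclude from replication.
  const tables = await sql<{schema: string; name: string}[]>`
    SELECT schemaname AS schema, tablename AS name
    FROM pg_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')`;
  const kept = tables.filter(t => !IGNORED.has(`${t.schema}.${t.name}`));
  if (kept.length === 0) {
    return; // nothing to publish
  }

  // Recreate the publication so it matches the desired table list.
  await sql.begin(async tx => {
    await tx`DROP PUBLICATION IF EXISTS ${tx(name)}`;
    await tx.unsafe(
      `CREATE PUBLICATION "${name}" FOR TABLE ${kept
        .map(t => `"${t.schema}"."${t.name}"`)
        .join(', ')}`,
    );
  });
}
```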
Summary
Adds support for ignoring specific tables from PostgreSQL publications during replication. Tables in `IGNORED_PUBLICATION_TABLES` get created in SQLite but stay empty: all changes are dropped. Useful for excluding audit logs or other high-volume tables.

Stored the ignored tables in the database with publications in `InternalShardConfig`. They define the replication boundary together: publications control what Postgres sends, ignored tables control what Zero Cache accepts.

Important: Changing ignored tables triggers a full resync (like changing publications). This prevents stale data from remaining in SQLite when a table becomes ignored.
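For illustration, the stored shape can be thought of roughly like this (field names follow the PR description; the actual zero-cache types differ in detail):

```ts
// Both fields travel together in the replica's internal shard config, so every
// node derives the same replication boundary from the same source of truth.
type InternalShardConfig = {
  publications: readonly string[]; // what Postgres sends
  ignoredTables: readonly string[]; // what Zero Cache keeps out ('schema.table')
};
```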
Design Decision
Ignored tables are like publications - both define what data replicates. Publications say what Postgres sends, ignored tables say what Zero Cache keeps. If nodes disagree on either during deployment, you get inconsistent data (some nodes have audit_logs, others don't).
Wrestled with where to store the config:
The database approach ensures all nodes agree on what to replicate, even mid-deployment. This matters because ignored tables (like publications) are replication config, not app config. App config affects how a node runs (ports, log levels) - fine to vary. Replication config affects what data exists - must be consistent or you get different data on different nodes.
First PR here, might be missing context. Happy to refactor if storing in database seems overkill - just felt like ignored tables belong with publications since they both define the replication boundary.
Testing
Check: