Replicates PostgreSQL primary
database into Elasticsearch/OpenSearch index read-only replica
.
Does not rely on additional database queries or external queues, using exclusively logical replication protocol, allowing almost realtime consistent synchronisation without external dependencies.
Focused on speed and efficiency.
- Consistent and fault tolerant, without dependencies
Thanks to PostgreSQL replication slots mechanisms. - Initial (re)indexing
UsingCOPY
command - Uses native PG protocol
Both Text and Binary form. - Full DB types support
json fields, composite types, arrays, enums... Except of arrays of composite types. - Native Parent/Child join
Including document
_id
androuting
control - Limited denormalization & document modifications. check inlining
- Bulk requests Data is flushed to Elasticsearch/OpenSearch in bulk.
You can
- set
routing
and document_id
fields; - rename or skip fields;
- define parent/child
join
field; - inline rows as object into parent document;
- set custom inlining script;
set templated fields(planned)json-path names(planned)
Using COMMENT
s in your database schema. Check
syntax and description
Variable | Default | Description |
---|---|---|
PG_SLOT | pg2es | replication slot name |
PG_PUBLICATION | search | publication name |
PGHOST | localhost | |
PGPORT | 5432 | |
PGDATABASE | - | |
PGUSER | - | |
PGPASSWORD | - | |
SEARCH_HOST | - | URL or host of ElasticSearch/OpenSearch |
SEARCH_USERNAME | - | optional |
SEARCH_PASSWORD | - | optional |
SEARCH_BULK_SIZE | 4 | (MB) Bulk request size limit. |
SEARCH_PUSH_INTERVAL | 30s | idle push interval, when there is no enough rows for full bulk request. |
SEARCH_PUSH_THROTTLE | 500ms | hard limit. At most one request during this period. |
SEARCH_PUSH_DEBOUNCE | 500ms | delays bulk after idle, to fetch related data. |
LOG_FORMAT | json | json or cli |
LOG_LEVEL | warn | from debug to fatal |
- The script is single threaded* (not a bottleneck)... Separate goroutine is used to make ES requests.
- Links between Database <-> Schema <-> Table <-> Column, shoudld be considered read only, and safe for multithread use... (not yet)
- It's fast. All the the efforts shuld be towards readability, reliability and functionality.
- No 1:1 inlines (yet)
- Delete document deletes all inlines (AKA DELETE CASCADE), and they can not be restored.