v1.2.0
Upgrade Guide
NOTE: this guide applies only to upgrading CeresDB from v1.1.0 to v1.2.0; ignore it if you are deploying a brand new CeresDB cluster with v1.2.0.
v1.2.0 contains some incompatible changes, so it's important to upgrade carefully:
- First, stop all the instances of CeresDB and CeresMeta;
- Upgrade the CeresMeta first by referring to the Upgrade Guide of CeresMeta;
- When upgrading CeresDB, update its config:
  - Rename the config section [analytic.compaction_config] to [analytic.compaction] if you use it;
  - Add the config section [cluster_deployment.etcd_client] if your CeresDB cluster runs in WithMeta mode:
[cluster_deployment.etcd_client]
server_addrs = ['127.0.0.1:2379']
root_path = "/rootPath"
NOTE: the root_path must be /rootPath if you are upgrading from v1.1.0.
- After updating the CeresDB config, start the CeresDB server.
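The two config changes above can be sketched as a small text rewrite. This is only an illustration, not a tool shipped with CeresDB: `migrate_config` and its `with_meta` flag are hypothetical names, the rewrite only handles the exact section header (not nested sub-sections), and the etcd address is the sample value from this guide, which should be replaced with the real one.

```python
# Hypothetical helper for illustration; not part of CeresDB.
OLD_SECTION = "[analytic.compaction_config]"
NEW_SECTION = "[analytic.compaction]"
ETCD_SECTION = "[cluster_deployment.etcd_client]"

# Sample values copied from this guide; adjust server_addrs for your cluster.
ETCD_BLOCK = """[cluster_deployment.etcd_client]
server_addrs = ['127.0.0.1:2379']
root_path = "/rootPath"
"""


def migrate_config(text: str, with_meta: bool) -> str:
    """Return the v1.1.0 config text rewritten for v1.2.0."""
    lines = []
    for line in text.splitlines():
        # Rewrite only the section header itself, never comments or values.
        if line.strip() == OLD_SECTION:
            line = line.replace(OLD_SECTION, NEW_SECTION)
        lines.append(line)
    result = "\n".join(lines) + "\n"

    # Append the etcd client section once, and only for WithMeta clusters.
    has_etcd = any(l.strip() == ETCD_SECTION for l in lines)
    if with_meta and not has_etcd:
        result += "\n" + ETCD_BLOCK
    return result


# "some_option" is a placeholder key, not a real CeresDB option.
old_config = "[analytic.compaction_config]\nsome_option = 1\n"
new_config = migrate_config(old_config, with_meta=True)
print(new_config)
```

Running the helper a second time on its own output leaves the text unchanged, so it is safe to apply repeatedly while preparing the upgrade.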
Major Features
- Enhancement on InfluxQL support:
- Support query with aggregators;
- #854 optimize influxql planner to load all tables on demand instead of loading them when initializing the planner;
- Replace influxdb_iox with CeresDB/influxql to remove unnecessary dependencies introduced by influxdb_iox;
- Enhancement on proxy module:
- Implement the proxy as a separate module;
- Support forwarding table requests in proxy;
- Support read and write on partition table in proxy;
- Recover the metadata of partition table from CeresMeta instead of CeresDB in proxy;
- Improvement of write performance:
- #822 solves the problem that compaction schedule triggered by flush procedure may block the write procedure;
- #814 is a big change set that replaces the write queue with a table-level lock to reduce write contention;
- #843 adjusts the flush strategy to avoid frequent write stalls;
- #861 introduces level 1 for SSTs; currently level 0 SSTs, which are generated by flushing, don't contain complex indexes (e.g. xor-filter), leading to faster flushing and fewer write stalls;
- Enhancement on observability:
- #774 introduces the hotspot recorder that can be used to find out the top tables with the highest write/read throughput in a specific time window;
- #827 #831 provide more metrics for all the stages of the write procedure, which can be used to troubleshoot write performance problems, and the grafana dashboard config has already been updated;
- #817 introduces the CPU profiler, and the flamegraph of CPU can be generated easily just by an HTTP request to CeresDB server;
- Support the new mechanism of failover and load balancing; for more details, refer to the [Release Note v1.2.0] of CeresMeta:
- #706 #853 implement distributed shard locks based on ETCD; a shard can only be opened or closed while its shard lock is held, which completely avoids data corruption caused by multiple shard leaders;
- Support automatic failover of CeresDB nodes, that is to say, service recovery is handled automatically without any manual intervention;
- Support automatic load balancing based on consistent hashing, which ensures that shards are evenly distributed across the nodes of the cluster when the number of cluster nodes increases or decreases;
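To illustrate why consistent hashing suits the load balancing described above, here is a minimal, generic sketch of a hash ring with virtual nodes. It is not CeresMeta's implementation: the `HashRing` class, the `vnodes` parameter, the node names and the shard key format are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash so the placement does not change between runs.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring (illustrative, not CeresMeta's code)."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node owns several points so shards spread more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, shard_id):
        if not self._ring:
            raise ValueError("ring has no nodes")
        # Pick the first point clockwise from the shard's hash.
        idx = bisect.bisect_right(self._ring, (_hash(f"shard-{shard_id}"), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
before = {s: ring.node_for(s) for s in range(64)}

ring.add_node("node-d")
after = {s: ring.node_for(s) for s in range(64)}

# Shards either stay where they were or move to the newly added node.
moved = [s for s in before if before[s] != after[s]]
print(f"{len(moved)} of {len(before)} shards moved to the new node")
```

The useful property is that adding or removing a node only reassigns the shards adjacent to that node's points on the ring, so the rest of the cluster keeps serving its shards undisturbed.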
Thanks
Heartfelt thanks to @zouxiang1993 for the effort in helping troubleshoot write performance issues.
What's Changed
- fix: simplify the logs in query path (#770) by @zouxiang1993 in #776
- fix: remove FixedSizeArena by @ShiKaiWi in #772
- chore(deps): bump time from 0.1.44 to 0.3.15 by @dependabot in #761
- feat: add default schema config by @jiacai2050 in #782
- fix: remove body limit for influxql request by @jiacai2050 in #783
- feat: add integration tests for influxql request by @jiacai2050 in #784
- feat: add java integration tests by @jiacai2050 in #786
- chore(deps): bump log4j-core from 2.8.2 to 2.17.1 in /integration_tests/sdk/java by @dependabot in #789
- chore(deps): bump junit from 4.12 to 4.13.1 in /integration_tests/sdk/java by @dependabot in #788
- fix: timestamp column should not be auto added by @chunshao90 in #787
- chore: route use read_runtime by @chunshao90 in #794
- feat: influxql support show measurements by @jiacai2050 in #795
- chore: bump version to 1.1.0 by @jiacai2050 in #797
- feat: impl getTableInfo in remoteEngine service by @chunshao90 in #793
- feat: add rust sdk test by @Rachelint in #791
- fix: avoid error when disk cache miss by @ShiKaiWi in #790
- feat: impl get_table_info in remote_engine_client by @chunshao90 in #798
- fix: avoid send empty record batch to client by @ShiKaiWi in #796
- chore: remove useless cluster_version by @chunshao90 in #804
- refactor: make tsbs more configurable by @ShiKaiWi in #805
- fix: avoid break when drop wal table failed by @MachaelLee in #806
- feat: implement route interface in http protocol by @MachaelLee in #803
- refactor: bump datafusion, add influxql aggregator support by @jiacai2050 in #778
- fix: add router when build request context for mysql by @jiacai2050 in #809
- feat: hotspot recorder by @MachaelLee in #774
- feat: introduce TableOperator to encapsulate operation of tables by @Rachelint in #808
- feat: expose rocksdb background jobs option by @jiacai2050 in #812
- feat: integration test support env filter by @jiacai2050 in #811
- chore: bump datafusion by @jiacai2050 in #810
- feat: convert nanoseconds to milliseconds automatically by @dust1 in #780
- feat: add cpu profiler by @jiacai2050 in #817
- feat: upgrade rust-rocksdb by @ShiKaiWi in #821
- feat: avoid blocking the write procedure because of compaction schedule by @ShiKaiWi in #822
- feat: query partition table with proxy in grpc service by @chunshao90 in #802
- feat: influxql support fill syntax by @jiacai2050 in #824
- feat: install dev dependencies in make file by @MachaelLee in #815
- chore: remove unused dependency by @chunshao90 in #823
- feat: replace bg runtime with default and compact runtime by @ShiKaiWi in #826
- chore: add commit id of nightly docker image by @chunshao90 in #829
- chore: add write batch metrics by @jiacai2050 in #827
- feat: http query with proxy by @chunshao90 in #807
- feat: add metrics for write procedure by @ShiKaiWi in #831
- feat: impl prom query with proxy by @chunshao90 in #833
- feat: support write partition table in grpc service by @chunshao90 in #828
- fix: improve remote write performance by using separate runtime by @ShiKaiWi in #837
- chore: update ob client version by @MachaelLee in #835
- chore: remove unnecessary deps by @jiacai2050 in #838
- chore(deps): bump h2 from 0.3.16 to 0.3.17 by @dependabot in #841
- feat: tsbs support more write options by @ShiKaiWi in #839
- feat: support write batch in remote engine by @Rachelint in #840
- feat: serialize table operations by lock rather than queue by @ShiKaiWi in #814
- feat: avoid frequent write stall by @ShiKaiWi in #843
- fix: wrong default write batch size for run_tsbs by @ShiKaiWi in #845
- chore: clean forward configs by @jiacai2050 in #847
- feat: refactor manifest to get snapshot in memory by @Rachelint in #825
- chore: rename module sql to query_frontend by @Rachelint in #849
- feat: forward request in grpc write by @chunshao90 in #844
- chore: bump obkv client version by @MachaelLee in #850
- feat: support domain name as the ceresdb node addr by @ShiKaiWi in #852
- refactor: implement the distributed lock of shard by @ZuLiangWang in #706
- feat: compaction support different level by @jiacai2050 in #848
- fix: avoid panic when convert prom result by @jiacai2050 in #851
- feat: only collecting all tables on demand in influxql planner by @Rachelint in #854
- refactor: shard lock module by @ShiKaiWi in #853
- feat: support prom remote query forward by @jiacai2050 in #855
- feat: support querying partition table in prom query and http query by @chunshao90 in #857
- fix: build filter when needed by @jiacai2050 in #861
- feat: rename the compaction_config to compaction and adjust interval by @ShiKaiWi in #862
- refactor: implement prom remote query by convert to datafusion plan directly by @jiacai2050 in #860
- refactor: remove runtime from request context by @jiacai2050 in #859
- test: add prometheus integration tests by @jiacai2050 in #864
- chore: proxy as a separate module by @chunshao90 in #865
- fix: fix write partition table by @chunshao90 in #869
- chore: add some commands in Makefile by @chunshao90 in #866
- fix: fix evict logic in remote client by @Rachelint in #872
- fix: drop partition table by @chunshao90 in #871
- chore!: rename table name in table_kv based wal by @Rachelint in #868
- Revert "chore!: rename table name in table_kv based wal" by @ShiKaiWi in #873
Full Changelog: v1.1.0...v1.2.0