Releases: StarRocks/starrocks
Release notes 2.1.7
Release date: May 26, 2022
Improvements
For window functions whose frame is set to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, StarRocks previously cached all data of a partition before performing the calculation, which consumed a large amount of memory when the partition was large. StarRocks has been optimized not to cache all data of the partition in this situation. #5829
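The reason this frame does not need the whole partition in memory can be sketched as follows. This is a conceptual illustration in Python, not StarRocks code; the function name `running_sum` and the sample rows are hypothetical. For a frame that runs from the start of the partition to the current row, an aggregate such as SUM can be maintained with a single accumulator while rows stream through.

```python
from typing import Iterable, Iterator


def running_sum(partition: Iterable[int]) -> Iterator[int]:
    """Emulate SUM(x) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

    Rows are consumed one at a time and only one accumulator is kept,
    so the partition is never buffered in full.
    """
    total = 0
    for value in partition:
        total += value
        yield total


# Rows of a single partition, assumed to arrive already in ORDER BY order.
rows = [3, 1, 4, 1, 5]
print(list(running_sum(rows)))  # [3, 4, 8, 9, 14]
```

Memory use in this sketch is constant regardless of partition size, which is the property the improvement relies on.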
Bug Fixes
The following bugs are fixed:
- When data is loaded into a table that uses the Primary Key model, data processing errors may occur if the creation time of each data version stored in the system does not increase monotonically, for reasons such as the system time being moved backward or related unknown bugs. Such data processing errors cause backends (BEs) to stop. #6046
- Some graphical user interface (GUI) tools automatically configure the set_sql_limit variable. As a result, the SQL statement ORDER BY LIMIT is ignored, and consequently an incorrect number of rows are returned for queries. #5966
- If the DROP SCHEMA statement is executed on a database, the database is forcibly deleted and cannot be restored. #6201
- When JSON-formatted data is loaded, BEs stop if the data contains JSON format errors. For example, key-value pairs are not separated by commas (,). #6098
- When a large amount of data is being loaded in a highly concurrent manner, tasks that are run to write data to disks are piled up on BEs. In this situation, the BEs may stop. #3877
- StarRocks estimates the amount of memory that is required before it performs a schema change on a table. If the table contains a large number of STRING fields, the estimate may be inaccurate. If the estimate exceeds the maximum memory that is allowed for a single schema change operation, schema change operations that should run properly fail with errors. #6322
- After a schema change is performed on a table that uses the Primary Key model, a "duplicate key xxx" error may occur when data is loaded into that table. #5878
- If low-cardinality optimization is performed during Shuffle Join operations, partitioning errors may occur. #4890
- If a colocation group (CG) contains a large number of tables and data is frequently loaded into the tables, the CG may not be able to stay in the stable state. In this case, the JOIN statement does not support Colocate Join operations. StarRocks has been optimized to wait for a little longer during data loading. This way, the integrity of the tablet replicas to which data is loaded can be maximized.
Full Changelog: 2.1.6...2.1.7
Thanks to
@Astralidea, @HangyuanLiu, @Linkerist, @Youngwb, @chaoyli, @decster, @dirtysalt, @gengjun-git, @meegoo, @rickif, @sevev, @stdpain, @trueeyu, @xiaoyong-z
Release notes 2.0.6
Release date: May 25, 2022
Bug Fixes
The following bugs are fixed:
- Some graphical user interface (GUI) tools automatically configure the set_sql_limit variable. As a result, the SQL statement ORDER BY LIMIT is ignored, and consequently an incorrect number of rows are returned for queries. #5966
- If a colocation group (CG) contains a large number of tables and data is frequently loaded into the tables, the CG may not be able to stay in the stable state. In this case, the JOIN statement does not support Colocate Join operations. StarRocks has been optimized to wait for a little longer during data loading. This way, the integrity of the tablet replicas to which data is loaded can be maximized.
- If a few replicas fail to be loaded due to reasons such as heavy loads or high network latencies, cloning on these replicas is triggered. In this case, deadlocks may occur. As a result, the loads on processes may be low while a large number of requests time out. #5646 #6290
- After the schema of a table that uses the Primary Key model is changed, a "duplicate key xxx" error may occur when data is loaded into that table. #5878
- If the DROP SCHEMA statement is executed on a database, the database is forcibly deleted and cannot be restored. #6201
Full Changelog: 2.0.5...2.0.6
Thanks to
@Astralidea, @Linkerist, @chaoyli, @decster, @dirtysalt, @gengjun-git, @sevev, @stdpain
Release notes 2.2.0
New Features
- [Preview] Resource groups are supported. By using resource groups to control CPU and memory usage, StarRocks can achieve resource isolation and rational use of resources when different tenants perform complex and simple queries in the same cluster.
- [Preview] Java UDFs (user-defined functions) are supported. StarRocks supports writing UDFs in Java, extending StarRocks' functions.
- [Preview] The Primary Key model supports partial updates when data is loaded by using Stream Load, Broker Load, or Routine Load. In real-time data update scenarios such as updating orders and joining multiple streams, partial updates allow users to update only a few columns.
- [Preview] JSON data types and JSON functions are supported.
- External tables based on Apache Hudi are supported, which further improves the data lake analytics experience.
- The following functions are supported:
- ARRAY functions, including array_agg, array_sort, array_distinct, array_join, reverse, array_slice, array_concat, array_difference, array_overlap, and array_intersect
- BITMAP functions, including bitmap_max and bitmap_min
- Other functions, including retention and square
Improvements
- The CBO's Parser and Analyzer are restructured, the code structure is optimized, and syntax such as INSERT with CTE is supported. As a result, the performance of complex queries, such as queries that reuse common table expressions (CTEs), is optimized.
- The query performance of Apache Hive external tables that are based on object storage (AWS S3, Alibaba Cloud OSS, Tencent COS) is optimized. After optimization, the performance of object storage-based queries is comparable to that of HDFS-based queries. Also, late materialization of ORC files is supported, which improves the query performance on small files.
- When external tables are used to query Apache Hive, StarRocks supports automatic and incremental updating of cached metastore data by consuming Hive Metastore events, such as data changes and partition changes. It also supports querying DECIMAL and ARRAY data in Apache Hive.
- The performance of the UNION ALL operator is optimized, delivering an improvement of 2 to 25 times.
- The pipeline engine, which can adaptively adjust query parallelism, is released, and its profile is optimized. The pipeline engine can improve performance for small queries in high-concurrency scenarios.
- StarRocks supports the loading of CSV files with multi-character row delimiters.
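To make the last item concrete, the following Python sketch shows what a multi-character row delimiter means for a CSV payload. It only illustrates the idea and does not reproduce how StarRocks parses files; the function `parse_rows`, its parameter names, and the `|#|` delimiter are hypothetical.

```python
from typing import List


def parse_rows(data: str, row_delimiter: str, column_separator: str = ",") -> List[List[str]]:
    """Split raw text into rows on a (possibly multi-character) row delimiter,
    then split each row into columns."""
    if not row_delimiter:
        raise ValueError("row_delimiter must not be empty")
    rows = data.split(row_delimiter)
    # A trailing delimiter leaves one empty string at the end; drop it.
    if rows and rows[-1] == "":
        rows.pop()
    return [row.split(column_separator) for row in rows]


# Three rows terminated by the three-character delimiter "|#|".
raw = "1,alice|#|2,bob|#|3,carol|#|"
print(parse_rows(raw, row_delimiter="|#|"))
```

A multi-character delimiter is useful when field values may themselves contain newline characters, because a single `\n` can then no longer mark the end of a row unambiguously.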
Bug Fixes
The following bugs are fixed:
- Deadlocks occur when data is loaded and changes are committed into tables that use the Primary Key model. #4998
- Some FE (including BDBJE) stability issues. #4428, #4666, #2
- The return value overflows when the SUM function is used to calculate a large amount of data. #3944
- The return values of ROUND and TRUNCATE functions have precision issues. #4256
- Some bugs detected by SQLancer. Please see SQLancer related issues.
Others
- The Flink connector flink-connector-starrocks supports Flink 1.14.
Release notes 2.0.5
Release date: May 13, 2022
Upgrade recommendation: Some critical bugs related to the correctness of stored data or data queries have been fixed in this version. It is recommended that you upgrade your StarRocks cluster in time.
Bug Fixes
The following bugs are fixed:
- [Critical Bug] Data may be lost as a result of BE failures. This bug is fixed by introducing a mechanism that is used to publish a specific version to multiple BEs at a time. #3140
- [Critical Bug] If tablets are migrated in specific data ingestion phases, data continues to be written to the original disk on which the tablets are stored. As a result, data is lost, and queries cannot be run properly. #5160
- [Critical Bug] When you run queries after you perform multiple DELETE operations, you may obtain incorrect query results if optimization on low-cardinality columns is performed for the queries. #5712
- [Critical Bug] If a query contains a JOIN clause that is used to combine a column with DOUBLE values and a column with VARCHAR values, the query result may be incorrect. #5809
- In certain circumstances, when you load data into your StarRocks cluster, some replicas of specific versions are marked as valid by the FEs before taking effect. At this time, if you query data of the specific versions, StarRocks cannot find the data and reports errors. #5153
- If a parameter in the SPLIT function is set to NULL, the BEs of your StarRocks cluster may stop running. #4092
- After your cluster is upgraded from Apache Doris 0.13 to StarRocks 1.19.x and keeps running for a period of time, a further upgrade to StarRocks 2.0.1 may fail. #5309
Thanks to:
@ABingHuang, @Astralidea, @HangyuanLiu, @Pslydhh, @Seaven, @Youngwb, @adzfolc, @decster, @gengjun-git, @kangkaisen, @mergify, @miomiocat, @mofeiatwork, @rickif, @satanson, @sevev, @stdpain
Release notes 2.1.6
Release date: May 10, 2022
Bug Fixes
The following bugs are fixed:
- When you run queries after you perform multiple DELETE operations, you may obtain incorrect query results if optimization on low-cardinality columns is performed for the queries. #5712
- If tablets are migrated in specific data ingestion phases, data continues to be written to the original disk on which the tablets are stored. As a result, data is lost, and queries cannot be run properly. #5160
- If you convert values between the DECIMAL and STRING data types, the return values may have an unexpected precision. #5608
- If you multiply a DECIMAL value by a BIGINT value, an arithmetic overflow may occur. A few adjustments and optimizations are made to fix this bug. #4211
Thanks to
@ABingHuang, @Astralidea, @HangyuanLiu, @Seaven, @ZiheLiu, @caneGuy, @gengjun-git, @mergify, @satanson, @sevev, @silverbullet233, @stdpain
Release notes 2.1.5
Release date: April 27, 2022
Bug Fixes
The following bugs are fixed:
- The calculation result is not correct when decimal multiplication overflows. After the bug is fixed, NULL is returned when decimal multiplication overflows.
- When the collected statistics deviate considerably from the actual statistics, the priority of Colocate Join can be lower than that of Broadcast Join. As a result, the query planner may not choose Colocate Join even when it is the more appropriate join strategy. #4817
- Queries fail because the plan for complex expressions is wrong when more than 4 tables are joined.
- BEs may stop working under Shuffle Join when the shuffle column is a low-cardinality column. #4890
- BEs may stop working when the SPLIT function uses a NULL parameter. #4092
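The fixed behavior for decimal multiplication, returning NULL on overflow rather than a wrong value, can be sketched in Python as follows. This is an illustration only and not StarRocks code: the helper `checked_decimal_multiply` is hypothetical, `None` stands in for SQL NULL, and the 38-digit limit is an assumption about the widest DECIMAL type.

```python
from decimal import Decimal, localcontext
from typing import Optional

# Assumption: the widest DECIMAL type holds at most 38 significant digits.
MAX_PRECISION = 38


def checked_decimal_multiply(a: Decimal, b: Decimal) -> Optional[Decimal]:
    """Return a * b, or None (standing in for SQL NULL) if the exact product
    needs more than MAX_PRECISION digits."""
    with localcontext() as ctx:
        # Wide enough to hold the exact product of two 38-digit operands.
        ctx.prec = 2 * MAX_PRECISION
        product = a * b
    if not product.is_finite():
        return None
    scale = max(0, -product.as_tuple().exponent)     # digits after the point
    integer_digits = max(product.adjusted() + 1, 0)  # digits before the point
    if integer_digits + scale > MAX_PRECISION:
        return None
    return product


print(checked_decimal_multiply(Decimal("12.5"), Decimal("4")))          # 50.0
print(checked_decimal_multiply(Decimal(10) ** 20, Decimal(10) ** 20))   # None
```

Returning an explicit null makes the overflow visible to the caller, whereas a silently wrapped or truncated product looks like a valid result.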
Thanks to:
@ABingHuang, @Astralidea, @HangyuanLiu, @Linkerist, @Seaven, @Youngwb, @adzfolc, @chaoyli, @decster, @gengjun-git, @kangkaisen, @liuyehcf, @meegoo, @mergify, @miomiocat, @mofeiatwork, @rickif, @satanson, @sevev, @stdpain, @trueeyu, @wyb
Release notes 2.0.4
Release date: April 18, 2022
Bug Fixes
The following bugs are fixed:
- After columns are deleted, new partitions are added, and tablets are cloned, the columns' unique IDs in old and new tablets may not be the same, which may cause BEs to stop working because the system uses a shared tablet schema. #4514
- When data is loaded into a StarRocks external table, the FE stops working if the configured FE of the target StarRocks cluster is not the Leader. #4573
- Query results may be incorrect when a schema change and a materialized view creation are performed on a Duplicate Key table at the same time. #4839
- The problem of possible data loss due to BE failure (solved by using Batch publish version). #3140
Release notes 2.1.4
Release date: April 8, 2022
New Feature
- The UUID_NUMERIC function is supported, which returns a LARGEINT value. Compared with the UUID function, the performance of the UUID_NUMERIC function can be improved by nearly 2 orders of magnitude.
Bug Fixes
The following bugs are fixed:
- After columns are deleted, new partitions are added, and tablets are cloned, the columns' unique IDs in old and new tablets may not be the same, which may cause BEs to stop working because the system uses a shared tablet schema. #4514
- When data is loaded into a StarRocks external table, the FE stops working if the configured FE of the target StarRocks cluster is not the Leader. #4573
- The results of the CAST function are different in StarRocks versions 1.19 and 2.1. #4701
- Query results may be incorrect when a schema change and a materialized view creation are performed on a Duplicate Key table at the same time. #4839
Release notes 2.1.3
Release date: March 19, 2022
Bug Fixes
The following bugs are fixed:
- The problem of possible data loss due to BE failure (solved by using Batch publish version). #3140
- Some queries may cause memory limit exceeded errors due to inappropriate execution plans.
- The checksum between replicas may be inconsistent in different compaction processes. #3438
- Queries may fail in some situations when JSON reorder projection is not processed correctly. #4056
Release notes 2.0.3
Release date: March 14, 2022
Bug Fixes
The following bugs are fixed: