Release date: March 9, 2023
Optimized the inference of storage_medium. When BEs use both SSD and HDD as storage devices, if the property storage_cooldown_time is specified, StarRocks sets storage_medium to SSD. Otherwise, StarRocks sets storage_medium to HDD. #18649
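The inference rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this note, not StarRocks code: the function name infer_storage_medium and the sample cooldown value are hypothetical, and the sketch covers only the case where BEs use both SSD and HDD.

```python
from typing import Optional


def infer_storage_medium(storage_cooldown_time: Optional[str]) -> str:
    """Hypothetical model of the rule for clusters whose BEs use both SSD and HDD.

    storage_cooldown_time is None when the property is not specified.
    """
    # Specified cooldown time -> SSD; otherwise -> HDD.
    return "SSD" if storage_cooldown_time is not None else "HDD"


# Illustrative values only.
print(infer_storage_medium("2023-12-31 00:00:00"))
print(infer_storage_medium(None))
```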
The following bugs are fixed:
- A query may fail if ARRAY data from Parquet files in data lakes is queried. #17626 #17788 #18051
- A Stream Load job initiated by a program hangs and the FE does not receive the HTTP request sent by the program. #18559
- An error may occur when an Elasticsearch external table is queried. #13727
- BEs may crash if an expression encounters an error during initialization. #11396
- A query may fail if the SQL statement uses an empty array literal []. #18550
- After StarRocks is upgraded from version 2.2 or later to version 2.3.9 or later, an error "No match for <expr> with operand types xxx and xxx" may occur when a Routine Load job is created with a calculation expression specified in the COLUMN parameter. #17856
- A load job hangs after a BE restarts. #18488
- When a SELECT statement uses an OR operator in the WHERE clause, extra partitions are scanned. #18610
Release date: February 20, 2023
- During a schema change, if a tablet clone is triggered and the BE nodes on which the tablet replicas reside change, the schema change fails. #16948
- The string returned by the group_concat() function is truncated. #16948
- When you use Broker Load to load data from HDFS through Tencent Big Data Suite (TBDS), an error "invalid hadoop.security.authentication.tbds.securekey" occurs, indicating that StarRocks cannot access HDFS by using the authentication information provided by TBDS. #14125 #15693
- In some cases, CBO may use incorrect logic to compare whether two operators are equivalent. #17227 #17199
- When you connect to a non-Leader FE node and send the SQL statement USE <catalog_name>.<database_name>, the non-Leader FE node forwards the SQL statement, with <catalog_name> excluded, to the Leader FE node. As a result, the Leader FE node chooses to use the default_catalog and eventually fails to find the specified database. #17302
Release date: February 2, 2023
The following bugs are fixed:
- When resources are released after a large query finishes, there is a low probability that other queries are slowed down. This issue is more likely to occur if resource groups are enabled or the large query ends unexpectedly. #16454 #16602
- For a primary key table, if a replica's metadata version falls behind, StarRocks incrementally clones the missing metadata from other replicas to this replica. In this process, StarRocks pulls a large number of versions of metadata, and if too many versions of metadata accumulate without timely GC, excessive memory may be consumed and consequently the BEs may encounter OOM exceptions. #15935
- If an FE sends an occasional heartbeat to a BE, and the heartbeat connection times out, the FE considers the BE unavailable, leading to transaction failures on the BE. #16386
- When you use a StarRocks external table to load data between StarRocks clusters, if the source StarRocks cluster is in an earlier version and the target StarRocks cluster is in a later version (2.2.8 ~ 2.2.11, 2.3.4 ~ 2.3.7, 2.4.1 or 2.4.2), the data loading fails. #16173
- BEs crash when multiple queries run concurrently and memory usage is relatively high. #16047
- When dynamic partitioning is enabled for a table and some partitions are dynamically deleted, if you execute TRUNCATE TABLE, an error "NullPointerException" is returned. Meanwhile, if you load data into the table, the FEs crash and cannot restart. #16822
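For the external-table loading fix in this release (#16173), a quick way to check whether a target cluster falls in the listed versions is sketched below. It is a minimal sketch under stated assumptions: the ranges are copied from the note above and treated as inclusive, the helper names are hypothetical, and the check looks only at the target version (the issue also requires the source cluster to be in an earlier version).

```python
# Affected target-cluster versions from the note above, as inclusive ranges.
AFFECTED_TARGET_RANGES = [
    ((2, 2, 8), (2, 2, 11)),
    ((2, 3, 4), (2, 3, 7)),
    ((2, 4, 1), (2, 4, 2)),
]


def parse_version(text):
    """Turn '2.3.5' into (2, 3, 5) so that comparison is numeric, not textual."""
    return tuple(int(part) for part in text.split("."))


def is_affected_target(version):
    """Return True if the target version lies in one of the listed ranges."""
    v = parse_version(version)
    return any(low <= v <= high for low, high in AFFECTED_TARGET_RANGES)


print(is_affected_target("2.2.10"))
print(is_affected_target("2.3.8"))
```

Versions are compared as integer tuples because a plain string comparison would order "2.2.10" before "2.2.8".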
Release date: December 30, 2022
The following bugs are fixed:
- The column that is allowed to be NULL in a StarRocks table is incorrectly set to NOT NULL in a view created from that table. #15749
- A new tablet version is generated when data is loaded into StarRocks. However, the FE may not yet detect the new tablet version and still requires BEs to read the historical version of the tablet. If the garbage collection mechanism removes the historical version, the query cannot find the historical version and an error "Not found: get_applied_rowsets(version xxxx) failed tablet:xxx #version:x [xxxxxxx]" is returned. #15726
- FE takes up too much memory when data is frequently loaded. #15377
- For aggregate queries and multi-table JOIN queries, the statistics are not collected accurately and CROSS JOIN occurs in the execution plans, resulting in long query latency. #12067 #14780
Release date: December 22, 2022
- The Pipeline execution engine supports INSERT INTO statements. To enable it, set the FE configuration item enable_pipeline_load_for_insert to true. #14723
- The memory used by Compaction for the primary key table is reduced. #13861 #13862
- Deprecated the FE parameter default_storage_medium. The storage medium of a table is automatically inferred by the system. #14394
The following bugs are fixed:
- BEs may hang when the resource group feature is enabled and multiple resource groups run queries at the same time. #14905
- When you create a materialized view by using CREATE MATERIALIZED VIEW AS SELECT, if the SELECT clause does not use aggregate functions and uses GROUP BY (for example, CREATE MATERIALIZED VIEW test_view AS SELECT a,b from test group by b,a order by a;), all BE nodes crash. #13743
- If you restart the BE immediately after you use INSERT INTO to frequently load data into the primary key table to make data changes, the BE may restart very slowly. #15128
- If only JRE is installed on the environment and JDK is not installed, queries fail after FE restarts. After the bug is fixed, FE cannot restart in that environment and it returns error "JAVA_HOME can not be jre". To successfully restart FE, you need to install JDK on the environment. #14332
- Queries cause BE crashes. #14221
- exec_mem_limit cannot be set to an expression. #13647
- You cannot create a sync refreshed materialized view based on subquery results. #13507
- The comments for columns are deleted after you refresh the Hive external table. #13742
- During a correlated JOIN, the right table is processed before the left table and the right table is very large. If compaction is performed on the left table while the right table is being processed, the BE node crashes. #14070
- If the Parquet file column names are case-sensitive, and the query condition uses upper-case column names from the Parquet file, the query returns no result. #13860 #14773
- During bulk loading, if the number of connections to Broker exceeds the default maximum number of connections, Broker is disconnected and the loading job fails with an error message "list path error". #13911
- When BEs are highly loaded, the metric for resource groups starrocks_be_resource_group_running_queries may be incorrect. #14043
- If the query statement uses OUTER JOIN, it may cause the BE node to crash. #14840
- After you create an asynchronous materialized view by using StarRocks 2.4 and then roll StarRocks back to 2.3, the FE may fail to start. #14400
- When the primary key table uses delete_range, and the performance is not good, it may slow down data reading from RocksDB and cause high CPU usage. #15130
Release date: November 30, 2022
- Colocate Join supports Equi Join. #13546
- Fix the problem that primary key index files are too large due to continuously appending WAL records when data is frequently loaded. #12862
- FE scans all tablets in batches and releases db.readLock between batches, so that it does not hold db.readLock for too long. #13070
The following bugs are fixed:
- When a view is created based directly on the result of UNION ALL, and the UNION ALL operator's input columns include NULL values, the schema of the view is incorrect since the data type of columns is NULL_TYPE rather than UNION ALL's input columns. #13917
- The query results of SELECT * FROM ... and SELECT * FROM ... LIMIT ... are inconsistent. #13585
- External tablet metadata synchronized to FE may overwrite local tablet metadata, which causes data loading from Flink to fail. #12579
- BE nodes crash when null filter in Runtime Filter handles literal constants. #13526
- An error is returned when you execute CTAS. #12388
- The metric ScanRows collected by the pipeline engine in the audit log may be wrong. #12185
- The query result is incorrect when you query compressed HIVE data. #11546
- Queries time out and StarRocks responds slowly after a BE node crashes. #12955
- The error of Kerberos authentication failure occurs when you use Broker Load to load data. #13355
- Too many OR predicates cause statistics estimation to take too long. #13086
- BE node crashes if Broker Load loads ORC files (Snappy compression) that contain uppercase column names. #12724
- An error is returned when unloading or querying a Primary Key table takes more than 30 minutes. #13403
- The backup task fails when you back up large data volumes to HDFS by using a broker. #12836
- The data that StarRocks reads from Iceberg may be incorrect, which is caused by the parquet_late_materialization_enable parameter. #13132
- An error "failed to init view stmt" is returned when a view is created. #13102
- An error is returned when you use JDBC to connect to StarRocks and execute SQL statements. #13526
- The query times out because the query involves too many buckets and uses tablet hint. #13272
- A BE node crashes and cannot be restarted, and in the meantime, the loading job into a newly built table reports an error. #13701
- All BE nodes crash when a materialized view is created. #13184
- When you execute ALTER ROUTINE LOAD to update the offset of consumed partitions, an error "The specified partition 1 is not in the consumed partitions" may be returned, and followers eventually crash. #12227
Release date: November 10, 2022
- The error message provides a solution when StarRocks fails to create a Routine Load job because the number of running Routine Load jobs exceeds the limit. #12204
- The query fails when StarRocks queries data from Hive and fails to parse CSV files. #13013
The following bugs are fixed:
- The query may fail if HDFS file paths contain (). #12660
- The result of ORDER BY ... LIMIT ... OFFSET is incorrect when the subquery contains LIMIT. #9698
- StarRocks is case-insensitive when querying ORC files. #12724
- BE may crash when RuntimeFilter is closed without invoking the prepare method. #12906
- BE may crash because of memory leak. #12906
- The query result may be incorrect after you add a new column and immediately delete data. #12907
- BE may crash because of sorting data. #11185
- If StarRocks and the MySQL client are not on the same LAN, the loading job created by using INSERT INTO SELECT cannot be terminated successfully by executing KILL only once. #11879
- The metric ScanRows collected by the pipeline engine in the audit log may be wrong. #12185
Release date: September 27, 2022
The following bugs are fixed:
- Query result may be inaccurate when you query a Hive external table stored as a text file. #11546
- Nested arrays are not supported when you query Parquet files. #10983
- Queries may time out if concurrent queries that read data from StarRocks and from external data sources are routed to the same resource group, or if a single query reads data from both StarRocks and external data sources. #10983
- When the Pipeline execution engine is enabled by default, the parameter parallel_fragment_exec_instance_num is changed to 1. This slows down data loading by using INSERT INTO. #11462
- BE may crash if there are mistakes when an expression is initialized. #11396
- The error heap-buffer-overflow may occur if you execute ORDER BY LIMIT. #11185
- Schema change fails if you restart Leader FE in the meantime. #11561
Release date: September 7, 2022
- Late materialization is supported to accelerate range filter-based queries on external tables in Parquet format. #9738
- The SHOW AUTHENTICATION statement is added to display user authentication-related information. #9996
- A configuration item is provided to control whether StarRocks recursively traverses all data files for the bucketed Hive table from which StarRocks queries data. #10239
- The resource group type realtime is renamed as short_query. #10247
- StarRocks no longer distinguishes between uppercase letters and lowercase letters in Hive external tables by default. #10187
The following bugs are fixed:
- Queries on an Elasticsearch external table may unexpectedly exit when the table is divided into multiple shards. #10369
- StarRocks throws errors when sub-queries are rewritten as common table expressions (CTEs). #10397
- StarRocks throws errors when a large amount of data is loaded. #10370 #10380
- When the same Thrift service IP address is configured for multiple catalogs, deleting one catalog invalidates the incremental metadata updates in the other catalogs. #10511
- The statistics of memory consumption from BEs are inaccurate. #9837
- StarRocks throws errors for queries on Primary Key tables. #10811
- Queries on logical views are not allowed even when you have SELECT permissions on these views. #10563
- StarRocks does not impose limits on the naming of logical views. Now logical views need to follow the same naming conventions as tables. #10558
- Added the BE configuration item max_length_for_bitmap_function (default value: 1000000) for bitmap functions and the BE configuration item max_length_for_to_base64 (default value: 200000) for base64, to prevent crashes. #10851
Release date: August 22, 2022
- Broker Load supports transforming the List type in Parquet files into non-nested ARRAY data type. #9150
- Optimized the performance of JSON-related functions (json_query, get_json_string, and get_json_int). #9623
- Optimized the error message: During a query on Hive, Iceberg, or Hudi, if the data type of the column to query is not supported by StarRocks, the system throws an exception on the column. #10139
- Reduced the scheduling latency of resource groups to optimize resource isolation performance. #10122
The following bugs are fixed:
- Wrong result is returned from the query on Elasticsearch external tables due to incorrect pushdown of the limit operator. #9952
- Query on Oracle external tables fails when the limit operator is used. #9542
- BE is blocked when all Kafka Brokers are stopped during a Routine Load. #9935
- BE crashes during a query on a Parquet file whose data type mismatches that of the corresponding external table. #10107
- Query times out because the scan range of external tables is empty. #10091
- The system throws an exception when the ORDER BY clause is included in a sub-query. #10180
- Hive Metastore hangs when Hive metadata is reloaded asynchronously. #10132
Release date: July 29, 2022
- The Primary Key model supports complete DELETE WHERE syntax. For more information, see DELETE.
- The Primary Key model supports persistent primary key indexes. You can choose to persist the primary key index on disk rather than in memory, significantly reducing memory usage. For more information, see Primary Key model.
- Global dictionary can be updated during real-time data ingestion, optimizing query performance and delivering 2X query performance for string data.
- The CREATE TABLE AS SELECT statement can be executed asynchronously. For more information, see CREATE TABLE AS SELECT.
- Support the following resource group-related features:
  - Monitor resource groups: You can view the resource group of the query in the audit log and obtain the metrics of the resource group by calling APIs. For more information, see Monitor and Alerting.
  - Limit the consumption of large queries on CPU, memory, and I/O resources: You can route queries to specific resource groups based on the classifiers or by configuring session variables. For more information, see Resource group.
- JDBC external tables can be used to conveniently query data in Oracle, PostgreSQL, MySQL, SQLServer, ClickHouse, and other databases. StarRocks also supports predicate pushdown, improving query performance. For more information, see External table for a JDBC-compatible database.
- [Preview] A new Data Source Connector framework is released to support external catalogs. You can use external catalogs to directly access and query Hive data without creating external tables. For more information, see Use catalogs to manage internal and external data.
- Added the following functions:
- The compaction mechanism can merge large volumes of metadata more quickly. This prevents metadata squeezing and excessive disk usage that can occur shortly after frequent data updates.
- Optimized the performance of loading Parquet files and compressed files.
- Optimized the mechanism of creating materialized views. After the optimization, materialized views can be created at a speed up to 10 times faster than before.
- Optimized the performance of the following operators:
  - TopN and sort operators
  - Equivalence comparison operators that contain functions can use Zone Map indexes when these operators are pushed down to scan operators.
- Optimized Apache Hive™ external tables.
  - When Apache Hive™ tables are stored in Parquet, ORC, or CSV format, schema changes caused by ADD COLUMN or REPLACE COLUMN on Hive can be synchronized to StarRocks when you execute the REFRESH statement on the corresponding Hive external table. For more information, see Hive external table.
  - hive.metastore.uris can be modified for Hive resources. For more information, see ALTER RESOURCE.
- Optimized the performance of Apache Iceberg external tables. A custom catalog can be used to create an Iceberg resource. For more information, see Apache Iceberg external table.
- Optimized the performance of Elasticsearch external tables. Sniffing the addresses of the data nodes in an Elasticsearch cluster can be disabled. For more information, see Elasticsearch external table.
- When the sum() function accepts a numeric string, it implicitly converts the numeric string.
- The year(), month(), and day() functions support the DATE data type.
Fixed the following bugs:
- CPU utilization surges due to an excessive number of tablets.
- Issues that cause "fail to prepare tablet reader" to occur.
- The FEs fail to restart. #5642 #4969 #5580
- The CTAS statement cannot be run successfully when the statement includes a JSON function. #6498
- StarGo, a cluster management tool, can deploy, start, upgrade, and roll back clusters and manage multiple clusters. For more information, see Deploy StarRocks with StarGo.
- The pipeline engine is enabled by default when you upgrade StarRocks to version 2.3 or deploy StarRocks. The pipeline engine can improve the performance of simple queries in high concurrency scenarios and complex queries. If you detect significant performance regressions when using StarRocks 2.3, you can disable the pipeline engine by executing the SET GLOBAL statement to set enable_pipeline_engine to false.
- The SHOW GRANTS statement is compatible with the MySQL syntax and displays the privileges assigned to a user in the form of GRANT statements.
- It is recommended that the BE configuration item memory_limitation_per_thread_for_schema_change use the default value 2 GB. Data is written to disk when the data volume exceeds this limit. Therefore, if you have previously set this parameter to a larger value, it is recommended that you set it to 2 GB. Otherwise, a schema change task may take up a large amount of memory.
To roll back to the previous version that was used before the upgrade, add the ignore_unknown_log_id parameter to the fe.conf file of each FE and set the parameter to true. The parameter is required because new types of logs are added in StarRocks v2.2.0. If you do not add the parameter, you cannot roll back to the previous version. We recommend that you set the ignore_unknown_log_id parameter to false in the fe.conf file of each FE after checkpoints are created. Then, restart the FEs to restore the FEs to the previous configurations.