Labels
kind/feature: Categorizes issue or PR as related to a new feature.
Description
Behavior Changes
- Add the environment variable SKIP_CHECK_ULIMIT to skip the ulimit value verification check within the BE process. This applies only to the Docker quick-start scenario. [feat](docker)Add a BE ENV item 'SKIP_CHECK_ULIMIT' for Docker to start quickly #45267
- Add the enable_cooldown_replica_affinity session variable to control replica affinity selection for queries under cold-hot separation.
- Add the FE configurations restore_job_compressed_serialization and backup_job_compressed_serialization to solve the FE OOM problem during backup and restore when a database has an extremely large number of tablets. Downgrading is not possible after these configurations are enabled.
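The release notes do not describe how the two configurations work internally; their names suggest that backup and restore job metadata is compressed when serialized. The sketch below illustrates that general idea under that assumption only: it uses JSON plus gzip (a stand-in chosen for illustration, not necessarily Doris's format) and a hypothetical job with many repetitive tablet entries, which is the kind of payload that compresses well.

```python
import gzip
import json


def serialize_job(job: dict, compressed: bool) -> bytes:
    """Serialize a job description to bytes, optionally gzip-compressed."""
    raw = json.dumps(job, sort_keys=True).encode("utf-8")
    return gzip.compress(raw) if compressed else raw


def deserialize_job(data: bytes, compressed: bool) -> dict:
    """Inverse of serialize_job; the reader must know which form was written."""
    raw = gzip.decompress(data) if compressed else data
    return json.loads(raw.decode("utf-8"))


# Hypothetical backup job tracking many tablets with repetitive metadata.
job = {
    "job_id": 1,
    "tablets": [
        {"tablet_id": i, "state": "PENDING", "snapshot_path": "/data/snapshot"}
        for i in range(10000)
    ],
}

plain = serialize_job(job, compressed=False)
packed = serialize_job(job, compressed=True)
print(len(plain), len(packed))
```

The sketch also hints at why such a change can be one-way: a reader that only understands the uncompressed form cannot parse the compressed bytes, which is consistent with (though not proof of) the note that downgrading is not possible.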
New Features
- The Arrow Flight protocol supports accessing BE through a load-balancing device. [fix](arrow-flight-sql) Arrow flight server supports data forwarding when BE uses public vip #43281
- Now lambda expressions support capturing external columns ([opt](lambda) let lambda expression support refer outer slot #45186).
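Capturing an external column means the lambda body may refer to a column of the current row that is not one of the lambda's own parameters, for example an expression along the lines of array_map(x -> x + k, arr), where k is a column of the same row. The Python sketch below only models that per-row semantics on in-memory dictionaries; the helper name array_map_rows and the column names are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def array_map_rows(rows: List[Row], array_col: str,
                   fn: Callable[[object, Row], object]) -> List[list]:
    """Apply fn to each element of row[array_col]; fn also receives the enclosing
    row, which is what lets the lambda refer to an outer column."""
    return [[fn(x, row) for x in row[array_col]] for row in rows]


rows = [
    {"k": 10, "arr": [1, 2, 3]},
    {"k": 100, "arr": [4, 5]},
]

# Models array_map(x -> x + k, arr): k is read from the same row, outside the lambda.
result = array_map_rows(rows, "arr", lambda x, row: x + row["k"])
print(result)  # [[11, 12, 13], [104, 105]]
```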
Improvements
Lakehouse
- Update the Hudi version to 0.15 and optimize the query planning performance of Hudi tables.
- Optimize the read performance of MaxCompute partitioned tables ([enchement](mc)Optimize reading of maxcompute partition tables. #45148).
- Add the session variable enable_text_validate_utf8, which can be used to skip UTF-8 encoding validation for CSV-format data. ([enchement](utf8)import enable_text_validate_utf8 session var #45537)
- Optimize the performance of Parquet file lazy materialization when the filtering rate is high. (branch-2.1: [fix](parquet-reader) Fixed the issue of excessive scanning data in late materialization case of parquet reader #46121 #46183)
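Lazy (late) materialization decodes the predicate column first and decodes the remaining columns only for rows that pass the filter, so a high filtering rate means most of the other columns are never read. The sketch below shows that two-phase idea on in-memory lists; ColumnReader and scan_lazy are hypothetical names, and the sketch ignores Parquet pages, row groups, and the specific over-scanning issue fixed by the linked PRs.

```python
from typing import Callable, Dict, List


class ColumnReader:
    """Hypothetical column reader that counts how many values it decodes."""

    def __init__(self, values: List[object]):
        self.values = values
        self.decoded = 0

    def read(self, row_ids: List[int]) -> List[object]:
        self.decoded += len(row_ids)
        return [self.values[i] for i in row_ids]


def scan_lazy(columns: Dict[str, ColumnReader], filter_col: str,
              pred: Callable[[object], bool]) -> List[Dict[str, object]]:
    all_ids = list(range(len(columns[filter_col].values)))
    # Phase 1: decode only the predicate column and find surviving row ids.
    filter_values = columns[filter_col].read(all_ids)
    survivors = [i for i, v in zip(all_ids, filter_values) if pred(v)]
    # Phase 2: decode the other columns only for the surviving rows.
    materialized = {filter_col: [filter_values[i] for i in survivors]}
    for name, reader in columns.items():
        if name != filter_col:
            materialized[name] = reader.read(survivors)
    return [dict(zip(materialized, vals)) for vals in zip(*materialized.values())]


cols = {
    "id": ColumnReader(list(range(1000))),
    "payload": ColumnReader([f"row-{i}" for i in range(1000)]),
}
rows = scan_lazy(cols, "id", lambda v: v % 100 == 0)  # keeps 1% of rows
print(len(rows), cols["payload"].decoded)  # 10 10
```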
Asynchronous Materialized Views
- Now it supports manually refreshing partitions that do not exist in an asynchronous materialized view ([enhance](mtmv)Change the way to verify the existence of partition names when refreshing MTMV #45290).
- Optimize the performance of transparent rewrite planning ([opt](mtmv) Optimize plan generate when create mtmv and use mtmv cache when collect table of mtmv #44786).
Query Optimizer
- Improve the adaptive ability of runtime filters ([feat](nereids)set runtime filter wait time according to table row count and table type #42640).
- Add the ability to generate original column filter conditions from filter conditions on max/min aggregate function columns ([Feat](nereids) add max/min filter push down rewrite rule #39252).
- Add the ability to extract single-side filter conditions from join predicates ([improvement](nereids) support extract from disjunction in join on condition #38479).
- Optimize the ability of predicate derivation on set operators to better generate filter predicates ([Feat](nereids) support pull up predicate from set operator #39450).
- Optimize the exception handling ability of statistic information collection and usage to avoid generating unexpected execution plans when collection exceptions occur ([improvement](statistics)External table getRowCount return -1 when row count is not available or row count is 0. #43009 [fix](nereids) fix bug: convert stringLikeLiteral to double #43776 [improvement](statistics)Skip auto analyze empty table. #43865 [improvement](statistics)Change auto analyze max width to 300 and health threshold to 90. #42104 [improvement](statistics)Support auto analyze columns that haven't been analyzed for a long time. #42399 [feat](nereids) adjust min/max for partition key #41729).
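Two of the rewrites listed above rest on simple equivalences. If a query outputs only max(a) per group and filters on HAVING max(a) > c, applying a > c to the raw rows first keeps each surviving group's maximum and drops exactly the groups that would fail the HAVING clause. Likewise, if every disjunct of an inner-join ON condition constrains the same side, the implied single-side filter can be applied before the join. The Python sketch below checks both equivalences on small data; it is an illustration of the logic, not the Nereids rule implementation, and it assumes an inner join (a filter on the preserved side of an outer join is not generally safe to push down).

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group key g, value a)


def group_max(rows: List[Row]) -> Dict[str, int]:
    """GROUP BY g, computing max(a) per group."""
    out: Dict[str, int] = {}
    for g, a in rows:
        out[g] = a if g not in out else max(out[g], a)
    return out


def having_max_gt(rows: List[Row], c: int) -> Dict[str, int]:
    # Original plan: aggregate all rows, then apply HAVING max(a) > c.
    return {g: m for g, m in group_max(rows).items() if m > c}


def max_filter_pushed_down(rows: List[Row], c: int) -> Dict[str, int]:
    # Rewritten plan: apply a > c to the raw rows, then aggregate.
    return group_max([(g, a) for g, a in rows if a > c])


def join_on_disjunction(t1: List[int], t2: List[int]) -> List[Tuple[int, int]]:
    # Inner join ON (t1.a = 1 AND t2.b = 2) OR (t1.a = 3 AND t2.b = 4)
    return [(a, b) for a in t1 for b in t2
            if (a == 1 and b == 2) or (a == 3 and b == 4)]


def join_with_extracted_filter(t1: List[int], t2: List[int]) -> List[Tuple[int, int]]:
    # Every disjunct constrains t1.a, so t1.a IN (1, 3) is implied and can be
    # applied to t1 alone before the join, shrinking the join input.
    return join_on_disjunction([a for a in t1 if a in (1, 3)], t2)


rows = [("x", 5), ("x", 12), ("y", 3), ("z", 20), ("z", 11)]
print(having_max_gt(rows, 10) == max_filter_pushed_down(rows, 10))  # True
left, right = [1, 2, 3, 4], [2, 4, 5]
print(join_on_disjunction(left, right) == join_with_extracted_filter(left, right))  # True
```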
Query Execution Engine
- Optimize the execution of queries with limit to end faster and avoid unnecessary data scanning ([fix](planner) query should be cancelled if limit reached (#44338) #45222).
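The principle behind ending a LIMIT query early is that the consumer stops pulling rows once it has enough, so the producer never scans the rest. The sketch below shows this with a lazy Python generator and itertools.islice; CountingScanner is a hypothetical stand-in for a scan operator, and Doris's actual mechanism (cancelling the query when the limit is reached) is more involved than this pull-based model.

```python
from itertools import islice


class CountingScanner:
    """Hypothetical scanner that records how many rows it has produced."""

    def __init__(self, total_rows: int):
        self.total_rows = total_rows
        self.produced = 0

    def __iter__(self):
        for i in range(self.total_rows):
            self.produced += 1
            yield i


scanner = CountingScanner(1_000_000)
first_rows = list(islice(scanner, 10))  # models LIMIT 10
print(scanner.produced)  # 10
```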
Storage Management
- CCR now supports more operations, such as rename table, rename column, modify comment, drop view, and drop rollup.
- Improve the accuracy of the broker load import progress and the performance when importing multiple compressed files.
- Improve the routine load timeout strategy and thread-pool usage to prevent routine load timeout failures and their impact on queries.
Others
- The Docker quick-start image supports starting without setting environment parameters. Add the environment variable SKIP_CHECK_ULIMIT to skip the swap, max_map_count, and ulimit-related verification checks in the start_be.sh script and within the BE process. This applies only to the Docker quick-start scenario. [feat](docker)Modify the init_be and start_be scripts to meet the requirements for rapid Docker startup. #45269
- Add the new LDAP configuration ldap_group_filter for custom group filtering. branch-3.0: [Improvement](LDAP Auth)Enhance LDAP authentication with a configurable group filter #43292
- Optimize the performance when using Ranger ([enhance](auth)Optimize the authentication logic of Ranger Doris #41207).
- Fix the inaccurate statistics of scan bytes in the audit log ([opt](scan) unify the local and remote scan bytes stats for all scanners for 2.1 #45167).
- Now, the default values of columns can be correctly displayed in the COLUMNS system table ([improvement](information_schema)Support show default value in information_schema. #44849).
- Now, the definition of views can be correctly displayed in the VIEWS system table ([improvement](information_schema)Show view definition in information_schema.views. #45857).
- The admin user can no longer be deleted ([fix](auth)Prohibit deleting admin user #44751).
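An environment-variable escape hatch like SKIP_CHECK_ULIMIT typically sits in front of a startup check that reads the process's open-file limit. The sketch below shows that pattern in Python using the standard resource module (Unix-only). The release notes do not state which value enables the variable or what threshold is checked, so the "true" convention and MIN_NOFILE are hypothetical placeholders, not Doris's actual behavior.

```python
import os
import resource
from typing import Mapping, Optional

# Hypothetical minimum open-file limit, for illustration only.
MIN_NOFILE = 60000


def ulimit_check_passes(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if startup may proceed under a hypothetical open-file-limit check."""
    env = os.environ if env is None else env
    # Hypothetical convention: the value "true" (case-insensitive) skips the check.
    if env.get("SKIP_CHECK_ULIMIT", "").lower() == "true":
        return True
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= MIN_NOFILE


print(ulimit_check_passes({"SKIP_CHECK_ULIMIT": "true"}))  # True
print(ulimit_check_passes({}))  # depends on the host's open-file limit
```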
Bug Fixes
Lakehouse
Hive
- Fix the problem of being unable to query Hive views created by Spark (branch-2.1: [fix](hive) support query hive view created by spark #43553).
- Fix the problem of being unable to correctly read some Hive Transaction tables ([fix](hive)fix hive insert only translaction table. #45753).
- Fix the problem of incorrect partition pruning when Hive table partitions contain special characters ([fix](hive)fix hive catalog miss partition that have special characters. #42906).
Iceberg
- Fix the problem of being unable to create Iceberg tables in a Kerberos-authenticated environment ([feat](catalog)Support Pre-Execution Authentication for HMS Type Iceberg Catalog Operations. #43445).
- Fix the problem of inaccurate count(*) queries when there are dangling deletes in Iceberg tables in some cases ([fix](iceberg)Fix count(*) error with dangling delete problem #44039).
- Fix the problem of query errors due to column name mismatches in Iceberg tables in some cases ([fix](iceberg)Bring field_id with parquet files And fix map type's key optional #44470).
- Fix the problem of being unable to read Iceberg tables when their partitions are modified in some cases ([enchement](iceberg)support read iceberg partition evolution table. #45367).
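A dangling delete is a delete file whose target data file is no longer part of the table snapshot (for example, after the data file was rewritten by compaction). One plausible way count(*) goes wrong, offered here as an assumption rather than a description of the actual fix, is a metadata-only count that subtracts every delete file's record count, including dangling ones. The sketch below contrasts that with a count that applies only deletes whose target file is still live; it simplifies by assuming each position delete file targets a single data file and that no position is deleted twice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    path: str
    record_count: int


@dataclass
class PositionDeleteFile:
    target_path: str   # data file these deletes refer to (simplified: one target)
    record_count: int  # number of deleted positions


def naive_count(data: List[DataFile], deletes: List[PositionDeleteFile]) -> int:
    # Subtracts all deletes, including ones whose target file no longer exists.
    return sum(f.record_count for f in data) - sum(d.record_count for d in deletes)


def count_ignoring_dangling(data: List[DataFile], deletes: List[PositionDeleteFile]) -> int:
    live = {f.path for f in data}
    applicable = [d for d in deletes if d.target_path in live]
    return sum(f.record_count for f in data) - sum(d.record_count for d in applicable)


data_files = [DataFile("a.parquet", 100), DataFile("b.parquet", 50)]
delete_files = [
    PositionDeleteFile("a.parquet", 10),
    PositionDeleteFile("c.parquet", 30),  # dangling: c.parquet is not in the snapshot
]
print(naive_count(data_files, delete_files))              # 110
print(count_ignoring_dangling(data_files, delete_files))  # 140
```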
Paimon
- Fix the problem that the Paimon Catalog cannot access Alibaba Cloud OSS-HDFS ([Fix](PaimonCatalog) fix the problem that paimon catalog can not access to OSS-HDFS #42585).
Hudi
- Fix the problem of ineffective partition pruning in Hudi tables in some cases ([fix](hudi)Add hudi catalog read partition table partition prune #44669).
JDBC
- Fix the problem of being unable to obtain tables using the JDBC Catalog after enabling the case-insensitive table name feature in some cases ([2.1][improvement](jdbc catalog) Optimize JdbcCatalog case mapping stability #43256).
MaxCompute
- Fix the problem of ineffective partition pruning in MaxCompute tables in some cases ([enchement](maxcompute)add mc catalog read partition table partition prune #44508).
Others
- Fix the problem of FE memory leaks caused by Export tasks in some cases ([fix](Export) fix a memory leak in the FE because of the ExportJob #44019).
- Fix the problem of being unable to access S3 object storage using the https protocol in some cases ([fix](s3) do not replace https scheme if specified #44242).
- Fix the problem of the inability to automatically refresh Kerberos authentication tickets in some cases ([feat](catalog)Replace HadoopUGI with HadoopKerberosAuthenticator to Support Kerberos Ticket Auto-Renewal #44916).
- Fix the problem of errors when reading Hadoop Block compressed format files in some cases ([fix](hive) fix block decompressor bug #45289).
- When querying ORC-formatted data, CHAR-type predicates are no longer pushed down, to avoid possible result errors ([Fix](ORC) Not push down fixed char type in orc reader #45484).
Asynchronous Materialized Views
- Fix the problem that when there is a CTE in the materialized view definition, it cannot be refreshed ([fix](mtmv) Fix refresh materialized view fail when mv def contains cte #44857).
- Fix the problem that when columns are added to the base table, the asynchronous materialized view cannot hit the transparent rewrite ([fix](mtmv) Fix mv rewrite fail when base table add column #44867).
- Fix the problem that when the same filter predicate is included in different positions in a query, the transparent rewrite fails ([fix](mtmv) Fix filter position different but same causing rewritten by materialized view fail #44575).
- Fix the problem that when column aliases are used in filter predicates or join predicates, the transparent rewrite cannot be performed ([fix](mtmv) Fix rewrite fail by materialized view when filter or join condition has alias #44779).
Inverted Index
- Fix the incorrect error handling in inverted index compaction [fix](inverted index) Modify Error Handling for File Open Failure #45773
- Fix the problem that inverted index construction fails due to lock-waiting timeout [improvement](build index)Optimize failed task check on same tablet (#42295) #43589
- Fix the problem of inverted index write crashes in abnormal situations [opt](inverted index)Optimize code to get rid of heap use after free (#45745) #46075
- Fix the null-pointer problem of the match function with special parameters [fix](inverted index) Fix Null Pointer Exception in function match #45774
- Fix problems related to the variant inverted index and disable the use of the index v1 format for variants [fix](variant) fix index in variant (#43375) #43971 [fix] (inverted index) Disallow variant columns from using inverted index format v1 (#43599) #45179
- Fix the problem of crashes when setting gram_size = 65535 for the ngram bloomfilter index [fix](ngram bloomfilter) fix narrow conversion for ngram bf_size #43480 #43654
- Fix the problem of incorrect calculation of DATE and DATETIME for the bloomfilter index [fix] (bloom filter) Fix the bloom filter calculation for date and datetime (#43351) #43622
- Fix the problem that dropping a column does not automatically drop the bloomfilter index [fix](bloom filter)Fix drop column with bloom filter index (#44361) #44478
- Reduce the memory footprint when writing the bloomfilter index [opt](bloomfilter index) optimize memory usage for bloom filter index writer #45833 #46047
Semi-Structured Data
- Optimize memory usage and reduce the memory consumption of the variant data type [Opt](TabletSchema) reuse TabletColumn info to reduce mem (#42448) #43349, [opt](Variant) avoid unnecessary mem for variant extracted columns (#… #44585, [Opt](SegmentIterator) clear and release iterators memory footprint in advance when EOF (#44768) #45734
- Optimize the performance of variant schema copy [Optimize](Variant) optimize schema update performance (#45480) #45731
- Do not use variant as a key when automatically inferring tablet keys [Fix](Variant) create table should not automatically add variant to key #44736
- Fix the problem of changing variant from NOT NULL to NULL [Opt](SegmentIterator) clear and release iterators memory footprint in advance when EOF (#44768) #45734
- Fix the problem of incorrect type inference of lambda functions [fix](function) fixed some nested type func's param type which is not suitable and make result wrong #45798
- Fix the coredump problem at the boundary conditions of the ipv6_cidr_to_range function [fix](ip) fix ip nullable param without check (#44700) #46252
Query Optimizer
- Fix the potential deadlock problem caused by mutual exclusion of table read locks, and optimize the locking logic ([opt](Nereids) lock table in ascending order of table IDs #45045 [fix](mtmv)fix mtmv deadlock issue #43376 [fix](mtmv) Fix get mv read lock too late when rewritten by materialized view #44164 [enhance](mtmv)Optimize MTMV lock logic #44967 [enhance](mtmv)When drop MTMV, no longer wait for task cancel to complete #45995).
- Fix the problem that the SQL Cache function incorrectly uses constant folding, resulting in incorrect results when using functions containing time formats ([fix](sql_cache) fix sql cache result wrong of from_unixtime(col, 'yyyy-MM-dd HH:mm:ss') #44631).
- Fix the problem of incorrect optimization of comparison expressions in edge cases, which may lead to incorrect results ([fix](Nereids) simplify comparison predicate do wrong cast #44054 [fix](nereids) fix months_add/ months_sub/ years_add/years_sub compute wrong result because SimplifyArithmeticComparisonRule #44725 branch-2.1: [fix](nereids) fix compare with long min for simplify comparison rule #44922 [fix](nereids) fix comparison with date like #45735 [fix](nereids) fix compare with date like overflow #45868).
- Fix the problem of incorrect audit logs for high-concurrency point queries [fix](auditlog) set isQuery to true when query is short circuited (#42647) #43345 [Improve](auditlog) audit log print real sql in prepared statement #44588
- Fix the problem of continuous error reporting after an exception occurs in high-concurrency point queries [Opt](ShortCircuit) opt some serialization and fix error when prepare… #44582
- Fix the problem of incorrect prepared statements for some fields [Fix](ShortCircuit) fix prepared statement with partial arguments prepared (#45371) #45732
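One of the deadlock fixes above is titled "lock table in ascending order of table IDs". Acquiring locks in a single global order is a standard deadlock-avoidance technique: if every caller sorts the tables it needs, no two callers can each hold a lock the other is waiting for. The Python sketch below illustrates the technique with exclusive locks (Doris uses read/write locks, but the ordering argument is the same); TableLockManager and its methods are hypothetical names.

```python
import threading
from typing import Dict, Iterable, List


class TableLockManager:
    """Hypothetical per-table locks, always acquired in ascending table-ID order."""

    def __init__(self) -> None:
        self._locks: Dict[int, threading.Lock] = {}
        self._guard = threading.Lock()

    def _lock_for(self, table_id: int) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(table_id, threading.Lock())

    def lock_tables(self, table_ids: Iterable[int]) -> List[threading.Lock]:
        # Sorting (and de-duplicating) gives every caller the same global order.
        acquired = []
        for tid in sorted(set(table_ids)):
            lock = self._lock_for(tid)
            lock.acquire()
            acquired.append(lock)
        return acquired

    @staticmethod
    def unlock(acquired: List[threading.Lock]) -> None:
        for lock in reversed(acquired):
            lock.release()


manager = TableLockManager()


def worker(ids: List[int], rounds: int = 1000) -> None:
    for _ in range(rounds):
        held = manager.lock_tables(ids)
        manager.unlock(held)


# The threads name the tables in opposite orders; without sorting, this can deadlock.
t1 = threading.Thread(target=worker, args=([1, 2],))
t2 = threading.Thread(target=worker, args=([2, 1],))
t1.start()
t2.start()
t1.join(timeout=10)
t2.join(timeout=10)
```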
Query Execution Engine
- Fix the problem of incorrect results of regular expressions and like functions for special characters. [fix](hyperscan) Fix hyper scan fall back to re2 #44547
- Fix the problem that the SQL Cache may return incorrect results when switching databases. [fix](cache) fix same sql return wrong result when switch database with use db and enable sql cache #44782
- Fix the problem of incorrect results of the cut_ipv6 function. [Bug](function) fix cut_ipv6 function error about modify the input column data #43921
- Fix the problem of casting from numeric types to bool types. [fix](DECIMAL) error DECIMAL cat to BOOLEAN (#44326) #46275
- Fix a series of problems related to arrow flight. [Fix](branch-2.1) Fix arrow-flight-sql to use pipeline #45661 [fix](arrow-flight-sql) Fix query result is empty and not return query error message #45023 [fix](arrow-flight-sql) Fix FE not found arrow flight schema #43960 [fix](arrow-flight-sql) Fix Doris NULL column conversion to arrow batch #43929
- Fix the problem of incorrect results in some cases when the hash table of hashjoin exceeds 4G. https://github.com/apache/doris/pull/46461/files
- Fix the overflow problem of the convert_to function for Chinese characters. [fix](mem) heap-buffer-overflow for function convert_to #46405
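The release notes do not explain why a hash table larger than 4G produced incorrect results. As a generic illustration only (an assumption, not the documented root cause), 4 GiB is exactly where a byte offset or index stored in an unsigned 32-bit field silently wraps around. The sketch uses ctypes, which performs no overflow checking, to simulate such a field.

```python
import ctypes

FOUR_GIB = 1 << 32  # 4294967296


def store_offset_u32(offset: int) -> int:
    """Simulate storing an offset in an unsigned 32-bit field (no overflow check)."""
    return ctypes.c_uint32(offset).value


largest = store_offset_u32(FOUR_GIB - 1)    # largest value that fits
wrapped = store_offset_u32(FOUR_GIB + 123)  # silently wraps around
print(largest, wrapped)  # 4294967295 123
```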
Storage Management
- Fix the problem that high-concurrency DDL may cause FE startup failure.
- Fix the problem that auto-increment columns may have duplicate values.
- Fix the problem that routine load cannot use newly added BEs during cluster expansion.
Permission Management
- Fix the problem of frequent access to the Ranger service when using Ranger as the authentication plugin ([fix](ranger) make RangerDorisAccessController as singleton to avoid more and more ranger policy refresher #45645).
Others
- Fix the potential memory leak problem when enable_jvm_monitor=true is enabled on the BE side ([fix](jvm)fix jvm metrics memory leak. #44311).
morningman