[Good First Issue] StarRocks Hands-on Tasks 2024 #40894
Description
Hi Rockstars,
This is a list of proposed hands-on tasks. If you're new to StarRocks and eager to engage with the community, these issues are well suited for diving in :) They are a good way to gain hands-on experience and become familiar with StarRocks development. This is also an open list; you are welcome to propose more tasks.
Please ping @kateshaowanjou or @wangsimo0 to book an issue, and add a comment in the issue you picked so it won't be assigned to others. Always discuss the design with the community before you start developing; some of these issues are quite large, so don't hesitate to ask the community for help.
External Catalog related issues
Information Schema
External Catalog
In version 3.2 and later, StarRocks improves compatibility with more BI tools by supporting the information_schema database in external catalogs. This feature is a valuable way to obtain structural information. Several views within information_schema currently return empty results; work is underway to support these views and ensure comprehensive coverage.
Because StarRocks follows the MySQL protocol, it aligns with MySQL's layout for information_schema. We should maintain compatibility with MySQL, provide as much information as we can, and optimize for efficiency to minimize time consumption.
- Columns view
- Views view
Default Catalog
- View in information_schema.tables_config [Enhancement] VIEW in information_schema.tables_config #49447
Trino's Compatibility Issues
In version 3.0 and later, StarRocks supports a Trino SQL dialect mode; however, ongoing enhancements are needed to further improve this functionality.
New Functions
- inverse_normal_cdf and normal_cdf @amoghmargoor: [function] inverse_normal_cdf and normal_cdf #38989
- typeof @MicePilot: [function] typeof #36245
- regexp_split: Better Trino compatibility request #37089
- boolor_agg, boolxor_agg, booland_agg: Support for BOOLOR_AGG, BOOLXOR_AGG, BOOLAND_AGG aggregate function #22949
- from_iso8601_date(string), from_iso8601_timestamp(string): [function]from_iso8601_date,from_iso8601_timestamp #40877
- array_agg in window function @mygrsun: [window function] array_agg in window function #40881
- cardinality in HLL data type: [function]cardinality(HLL) #40879
- count(distinct) window function @yangzho12138: [Feature Request] windows funciton support statement count(distinct col) over #46105
- from_unixtime_milliseconds: [function] Support millisecond unix to datetime #48634
- inet_aton: [Function] inet_acton. #49664
- array_generate to support date/datetime as input and interval as step: [Function enhancement] array_generate to support date/datetime as input and interval as step. #49575
- least and greatest function enhancement: [Feature] least and greatest function enhanced #50570
- Use qualify row_number() ... not support select *: Use qualify row_number() ... not support select * #51703
- min(x, n), max(x, n): Support trino min(x, n), max(x,n) function #52591
- strpos enhancement: Suport trino strpos(string, substring, instance) function #52604
- regexp_count: Support trino regexp_count(string, pattern) function #52603
- try: Support Trino try expression #54268
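To make the expected semantics of one of the requested functions concrete, here is a Python sketch of the behavior of Trino's normal_cdf(mean, sd, v) and inverse_normal_cdf(mean, sd, p) using only the standard library. This is a reference for the intended behavior, not StarRocks implementation code; the three-argument signatures follow Trino's math functions.

```python
from statistics import NormalDist

def normal_cdf(mean: float, sd: float, v: float) -> float:
    """P(X <= v) for X ~ Normal(mean, sd), i.e. Trino's normal_cdf."""
    return NormalDist(mean, sd).cdf(v)

def inverse_normal_cdf(mean: float, sd: float, p: float) -> float:
    """Quantile function: the v such that normal_cdf(mean, sd, v) == p."""
    return NormalDist(mean, sd).inv_cdf(p)

print(normal_cdf(0.0, 1.0, 0.0))          # 0.5, the standard normal median
print(inverse_normal_cdf(0.0, 1.0, 0.5))  # 0.0, inverse of the above
```

A StarRocks implementation would live in the BE as a builtin scalar function, but this is the numeric contract tests should check against.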
Function Mapping
| Trino's function/expression | StarRocks' function/expression | Comment | Assignee |
|---|---|---|---|
| map_agg(key, value) → map<K,V> | map() | | @Jcnessss |
| show schemas from <catalog_name> | show databases from <catalog_name> (Show databases from <catalog_name> #40868) | | |
| array_sort(array(T), function(T, T, int)) → array(T) | array_sortby(<lambda function>, array0 [, array1...]) | This one needs to pay attention to the input order. | |
| sequence(start, stop), sequence(start, stop, step) (integer data types) | array_generate([start,] end [, step]) | | |
| last_day_of_month(x) → date | last_day(x, 'month') | | |
| map_from_entries(array(row(K, V))) → map(K, V) | map_from_arrays | This one needs to pay attention to the transformation: SELECT map_from_entries(ARRAY[(1, 'x'), (2, 'y')]); equals SELECT map_from_arrays([1,2],['x','y']); | |
| current_catalog | catalog() | thanks to @macroguo-ghy | |
| current_schema | database() | thanks to @macroguo-ghy | |
| slice(x, start, length) → array | array_slice(input, offset, length) | | |
| approx_set(x) → HyperLogLog | HLL_HASH(column_name) | | |
| empty_approx_set() → HyperLogLog | HLL_EMPTY() | | |
| merge(HyperLogLog) → HyperLogLog | HLL_RAW_AGG(hll) | | |
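A dialect translator ultimately needs a lookup like the mapping table above. The Python sketch below shows the simplest piece of that, name-level substitution, using entries taken from the table; real translation also needs argument reordering (e.g. array_sort's lambda) and expression rewriting, which is omitted here.

```python
# Name-level mapping from Trino functions to StarRocks equivalents,
# taken from the Function Mapping table. This is only the trivial part
# of a dialect translator: argument reordering and literal rewriting
# (e.g. last_day_of_month(x) -> last_day(x, 'month')) are not handled.
TRINO_TO_STARROCKS = {
    "last_day_of_month": "last_day",
    "current_catalog":   "catalog",
    "current_schema":    "database",
    "slice":             "array_slice",
    "approx_set":        "hll_hash",
    "empty_approx_set":  "hll_empty",
    "merge":             "hll_raw_agg",
}

def map_function_name(trino_name: str) -> str:
    """Return the StarRocks function name; unmapped names pass through."""
    return TRINO_TO_STARROCKS.get(trino_name.lower(), trino_name)

print(map_function_name("current_catalog"))  # catalog
print(map_function_name("abs"))              # abs
```

In StarRocks itself this rewriting happens in the FE's Trino parser/AST transformer rather than via a flat dict, so treat this only as a summary of the table.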
Other Enhancements
- Apache Ranger's policy translator
StarRocks supports using the Hive service in Ranger to control access to Hive tables. However, we have found that some community users still want to manage all privileges in the StarRocks Ranger service, so we need a translator (possibly a script) that converts Hive service policies into StarRocks service policies.
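A minimal sketch of what such a translator script could do is below. The top-level policy fields (service, serviceType, resources, policyItems) follow the policy JSON exposed by Ranger's REST API; the "catalog" resource key and the "starrocks" service name are assumptions for illustration, not the StarRocks Ranger plugin's confirmed schema.

```python
import copy

def hive_policy_to_starrocks(policy: dict, catalog: str = "hive_catalog") -> dict:
    """Translate one Ranger Hive-service policy into a StarRocks-service
    policy by re-targeting the service and nesting the Hive database/table
    resources under an external catalog.

    NOTE: the "catalog" resource key and the service names are assumed
    for illustration; check the actual plugin service definition.
    """
    out = copy.deepcopy(policy)            # never mutate the source policy
    out["service"] = "starrocks"
    out["serviceType"] = "starrocks"
    resources = out.get("resources", {})
    # Hive tables show up in StarRocks under an external catalog.
    resources["catalog"] = {"values": [catalog], "isExcludes": False}
    out["resources"] = resources
    return out

hive_policy = {
    "service": "hivedev",
    "serviceType": "hive",
    "name": "sales-readonly",
    "resources": {
        "database": {"values": ["sales"], "isExcludes": False},
        "table": {"values": ["*"], "isExcludes": False},
    },
    "policyItems": [{"accesses": [{"type": "select"}], "users": ["analyst"]}],
}

translated = hive_policy_to_starrocks(hive_policy)
print(translated["service"], translated["resources"]["catalog"]["values"])
```

A real script would fetch policies via Ranger's REST API, translate them, and POST them back to the StarRocks service, with error handling and de-duplication.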
- Add catalog information in FE's query_detail @happut
After enabling query detail collection with admin set frontend config("enable_collect_query_detail_info"="true"), a user can get query details using curl -uroot: http://172.26.81.138:8030/api/query_detail?event_time=<unixtimestamp_value>. The returned information looks like ...."database":"simo","sql":"insert into abc values (1,2),(2,3)","user":"root"....
There is no catalog information in the response; a field like "catalog":"default_catalog" should be added.
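To make the gap concrete, the sketch below parses a record shaped like the query_detail snippet above and falls back to "default_catalog" when the field is absent; the fallback value is an assumption about what clients would do today, and the "catalog" key is exactly what this task proposes the FE should start emitting.

```python
import json

# A record shaped like the query_detail output quoted above.
sample = '{"database": "simo", "sql": "insert into abc values (1,2),(2,3)", "user": "root"}'

record = json.loads(sample)
# Today the FE omits the catalog, so a client can only guess; this task
# proposes emitting e.g. "catalog": "default_catalog" in each record.
catalog = record.get("catalog", "default_catalog")
print(catalog)  # default_catalog, because the key is missing
```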
Apache Hudi & Delta Lake Capabilities
- Add Hudi sink (✨ HIGH priority)
- Add Delta Lake sink (✨ HIGH priority)
More Connectors
- Oracle catalog
- Kudu catalog @predator4ann
- StarRocks catalog
- Greenplum catalog
- SQL Server catalog
- ClickHouse catalog
- Trino catalog
- DB2 catalog
- Druid catalog
- OceanBase catalog
- SAP HANA catalog
More Capabilities
- Hive UDF compatibility
- Spark SQL compatible structure
- Hive SQL compatible structure
- Impala SQL compatible structure