[Good First Issue] StarRocks Hands-on Tasks 2024 #40894

Open
@wangsimo0

Description

Hi Rockstars,

This is a list of proposed hands-on tasks. If you're new to StarRocks and eager to engage with the community, these issues are well suited for you to dive into :) They are a good way to gain hands-on experience and become familiar with StarRocks development. This is also an open list, so you are welcome to propose more tasks.

Please mention @kateshaowanjou or @wangsimo0 to book an issue, and add a comment in the issue you picked so that it won't be assigned to others. Always discuss the design with the community before you start developing. Some of the issues are really big, so don't hesitate to seek help from the community.

External Catalog related issues

Information Schema

External Catalog

In version 3.2 and later, StarRocks improves compatibility with BI tools by supporting the information_schema database in External Catalogs. This feature is a valuable way to obtain structured metadata. Several views within information_schema currently return empty results, and work is underway to support these views more completely.
StarRocks follows the MySQL protocol, so its information_schema support aligns with MySQL's. We should maintain compatibility with MySQL, provide as much information as we can, and keep these queries efficient to minimize the time they take.

  • Columns view
  • Views view
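
As a rough illustration of the kind of lookup a BI tool issues against these two views, here is a minimal Python sketch that only builds the SQL text. It assumes the MySQL-style column names (TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, ORDINAL_POSITION, VIEW_DEFINITION) and three-part catalog.database.table naming. The helper names and the catalog, database, and table names are hypothetical, and the quoting is for illustration only.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier (illustrative, MySQL-style)."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(value: str) -> str:
    """Single-quote a string literal (illustrative, not hardened)."""
    return "'" + value.replace("'", "''") + "'"


def columns_query(catalog: str, database: str, table: str) -> str:
    """Build a lookup against the columns view of an external catalog."""
    return (
        "SELECT COLUMN_NAME, DATA_TYPE "
        f"FROM {quote_ident(catalog)}.information_schema.columns "
        f"WHERE TABLE_SCHEMA = {quote_literal(database)} "
        f"AND TABLE_NAME = {quote_literal(table)} "
        "ORDER BY ORDINAL_POSITION"
    )


def views_query(catalog: str, database: str) -> str:
    """Build a lookup against the views view of an external catalog."""
    return (
        "SELECT TABLE_NAME, VIEW_DEFINITION "
        f"FROM {quote_ident(catalog)}.information_schema.views "
        f"WHERE TABLE_SCHEMA = {quote_literal(database)}"
    )


# "hive_catalog", "sales", and "orders" are made-up names.
print(columns_query("hive_catalog", "sales", "orders"))
print(views_query("hive_catalog", "sales"))
```

Whether an external catalog actually returns rows for these queries is exactly what the two tasks above are about.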

Default Catalog

Trino's Compatibility Issues

In version 3.0 and later, StarRocks supports Trino's SQL_dialect mode; however, ongoing enhancements are necessary to further optimize this functionality.

New Functions

Function Mapping

| Trino's function/expression | StarRocks' function/expression | Comment | Assignee |
|---|---|---|---|
| map_agg(key, value) → map<K,V> | map() | | @Jcnessss |
| show schemas from <catalog_name> | Show databases from <catalog_name> | #40868 | |
| array_sort(array(T), function(T, T, int)) -> array(T) | array_sortby(, array0 [, array1...]) | This one needs to pay attention to the input order. | |
| sequence(start, stop), sequence(start, stop, step) (integer data types) | array_generate([start,] end [, step]) | | |
| last_day_of_month(x) → date | last_day(x,'month'); | | |
| map_from_entries(array(row(K, V))) -> map(K, V) | map_from_arrays | This one needs to pay attention to the transformation. SELECT map_from_entries(ARRAY[(1, 'x'), (2, 'y')]); is equivalent to SELECT map_from_arrays([1,2],['x','y']); | |
| current_catalog | catalog() | | thanks to @macroguo-ghy |
| current_schema | database() | | thanks to @macroguo-ghy |
| slice(x, start, length) → array | array_slice(input, offset, length) | | |
| approx_set(x) → HyperLogLog | HLL_HASH(column_name) | | |
| empty_approx_set() → HyperLogLog | HLL_EMPTY() | | |
| merge(HyperLogLog) → HyperLogLog | HLL_RAW_AGG(hll) | | |
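
The map_from_entries row is not a plain rename: the list of (key, value) entries has to be split into one array of keys and one array of values. The Python sketch below illustrates that reshaping on literal values only. The function names are hypothetical and this is not how the StarRocks parser implements the rewrite.

```python
from typing import Any, List, Sequence, Tuple


def entries_to_arrays(entries: Sequence[Tuple[Any, Any]]) -> Tuple[List[Any], List[Any]]:
    """Split (key, value) entries into parallel key and value lists."""
    keys = [key for key, _ in entries]
    values = [value for _, value in entries]
    return keys, values


def sql_literal(value: Any) -> str:
    """Render an int or str as a SQL literal (illustration only)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def to_map_from_arrays(entries: Sequence[Tuple[Any, Any]]) -> str:
    """Render the map_from_arrays call equivalent to map_from_entries(entries)."""
    keys, values = entries_to_arrays(entries)
    key_sql = ",".join(sql_literal(key) for key in keys)
    value_sql = ",".join(sql_literal(value) for value in values)
    return f"map_from_arrays([{key_sql}],[{value_sql}])"


# Same entries as the example in the table.
print(to_map_from_arrays([(1, "x"), (2, "y")]))
# prints: map_from_arrays([1,2],['x','y'])
```

A real rewrite would work on expression trees rather than literals, since the entries can be arbitrary expressions or a column.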

Other Enhancements

  • Apache Ranger's policy translator

StarRocks supports using the Hive service in Ranger to control access to Hive tables. However, we have found that some community users still want to manage all privileges in the StarRocks Ranger service. So we need a translator (maybe a script).

  • Add catalog information in FE's query_detail @happut

After enabling query detail collection with admin set frontend config("enable_collect_query_detail_info"="true"), users can get query details with curl -uroot: http://172.26.81.138:8030/api/query_detail?event_time=<unixtimestamp_value> . The returned information looks like ...."database":"simo","sql":"insert into abc values (1,2),(2,3)","user":"root"....
It contains no catalog information, such as "catalog":"default_catalog".
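
To make the requested change concrete, here is a small Python sketch that builds the query_detail URL and shows what a record might look like once a "catalog" field is present. The helper names and the example timestamp are hypothetical. The real change belongs in the FE code that produces the record, and the sketch only illustrates the target shape of the output.

```python
import json
from urllib.parse import urlencode


def query_detail_url(host: str, port: int, event_time: int) -> str:
    """Build the FE query_detail URL for a given unix timestamp."""
    query = urlencode({"event_time": event_time})
    return f"http://{host}:{port}/api/query_detail?{query}"


def with_catalog(record: dict, default: str = "default_catalog") -> dict:
    """Return a copy of a record with a "catalog" field (assumed field name)."""
    enriched = dict(record)
    enriched.setdefault("catalog", default)
    return enriched


# Fields taken from the excerpt above; 1700000000 is a made-up timestamp.
record = json.loads(
    '{"database":"simo","sql":"insert into abc values (1,2),(2,3)","user":"root"}'
)
print(query_detail_url("172.26.81.138", 8030, 1700000000))
print(json.dumps(with_catalog(record)))
```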

Apache Hudi & Delta Lake Capabilities

  • Add Hudi sink (✨ HIGH priority)
  • Add Delta Lake sink (✨ HIGH priority)

More Connectors

  • Oracle catalog
  • Kudu catalog @predator4ann
  • StarRocks catalog
  • Greenplum catalog
  • SQL Server catalog
  • ClickHouse catalog
  • Trino catalog
  • DB2 catalog
  • Druid catalog
  • OceanBase catalog
  • SAP HANA catalog

More Capabilities

  • Hive UDF compatibility
  • Spark SQL compatible structure
  • Hive SQL compatible structure
  • Impala SQL compatible structure
