
When a query has a LIMIT, use the source row count to decide whether to fetch all data in a single pass #96

Open
hn5092 opened this issue Jan 7, 2020 · 0 comments

hn5092 commented Jan 7, 2020

https://github.com/Kyligence/KAP/issues/17384 The bitmap values are very large but the row count is small, so fetching one partition at a time still makes the query slow.
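A minimal sketch of the decision the issue proposes, assuming the planner can see a row-count statistic for the source. All names here are illustrative, not the actual KAP/Spark implementation:

```python
from typing import Optional

def choose_fetch_plan(source_row_count: int, limit: Optional[int]) -> str:
    """Illustrative sketch: decide between fetching all partitions at once
    ("single-pass") and the default incremental scan ("per-partition")."""
    if limit is None:
        # No LIMIT: stream partition by partition as usual.
        return "per-partition"
    if source_row_count <= limit:
        # The whole source fits under the LIMIT (e.g. few rows, even if each
        # bitmap value is huge), so per-partition fetches only add round
        # trips; grab everything in one pass instead.
        return "single-pass"
    return "per-partition"

print(choose_fetch_plan(source_row_count=50, limit=500))         # single-pass
print(choose_fetch_plan(source_row_count=1_000_000, limit=500))  # per-partition
```

The point of the heuristic is that the per-partition strategy amortizes well only when many rows must be skipped; with few, wide rows it just multiplies latency.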

@hn5092 hn5092 self-assigned this Jan 7, 2020
chenzhx pushed a commit to chenzhx/spark that referenced this issue Feb 24, 2022
…in parquet

### What changes were proposed in this pull request?
Spark should remove the field-name check when reading/writing parquet files.

### Why are the changes needed?
Support Spark reading existing parquet files with special characters in column names.

### Does this PR introduce _any_ user-facing change?
Yes. For formats such as parquet, users can now read existing files with special characters in column names, and can reference such a column by wrapping it in backquotes, e.g. `` `max(t)` ``, or alias it as in `` `max(t)` AS `max_t` `` and then use plain `max_t`.
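As an illustration of the quoting rule described above (this helper is not part of the PR), a small function that backtick-quotes a column name for Spark SQL when it is not a bare identifier; doubling an embedded backtick is assumed to be the escape convention:

```python
import re

# A bare identifier needs no quoting: letters, digits, underscores,
# not starting with a digit. Anything else gets backtick-quoted.
BARE_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def quote_identifier(name: str) -> str:
    """Backtick-quote `name` for Spark SQL unless it is a bare identifier.
    Embedded backticks are escaped by doubling (assumed convention)."""
    if BARE_IDENTIFIER.match(name):
        return name
    return "`" + name.replace("`", "``") + "`"

print(quote_identifier("max(t)"))  # -> `max(t)`
print(quote_identifier("max_t"))   # -> max_t
```

With such quoting, a column literally named `max(t)` in an existing parquet file can be selected and aliased to an ordinary name for downstream use.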

### How was this patch tested?

Added UT

Closes apache#35229 from AngersZhuuuu/SPARK-27442.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>