[GLUTEN-1897][CH] Respect hdfs-site.xml configs for clickhouse libhdfs3 #1900
Conversation
|
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format? See also: |
|
Run Gluten Clickhouse CI |
|
First, we apply 4 Spark Hadoop configs to the ClickHouse backend:
The spark.hadoop.* configs above may come from spark-submit options or from the hdfs-site.xml file (the former takes priority). If a config key is not specified in either place, the default value in CH libhdfs3 is applied. |
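The precedence described above can be sketched as follows. This is only an illustration of the lookup order, not the code in this PR: the class name `HdfsConfPrecedence`, the method `resolve`, and the key `spark.hadoop.dfs.example.key` are all hypothetical, and the two maps stand in for whatever Spark actually collected from spark-submit and hdfs-site.xml.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; illustrates lookup order only, not the PR's implementation.
final class HdfsConfPrecedence {
    // Returns the value for a spark.hadoop.* key, preferring the spark-submit
    // value over the hdfs-site.xml value. An empty result means neither source
    // set the key, in which case the libhdfs3 default would be left in effect.
    static Optional<String> resolve(String key,
                                    Map<String, String> submitConf,
                                    Map<String, String> hdfsSiteConf) {
        String fromSubmit = submitConf.get(key);
        if (fromSubmit != null) {
            return Optional.of(fromSubmit);
        }
        return Optional.ofNullable(hdfsSiteConf.get(key));
    }

    public static void main(String[] args) {
        String key = "spark.hadoop.dfs.example.key"; // hypothetical key name
        Map<String, String> submit = Collections.singletonMap(key, "from-submit");
        Map<String, String> site = Collections.singletonMap(key, "from-site");
        Map<String, String> none = Collections.emptyMap();

        // Both sources set the key: the spark-submit value is chosen.
        System.out.println(resolve(key, submit, site));
        // Only hdfs-site.xml sets the key: its value is chosen.
        System.out.println(resolve(key, none, site));
        // Neither source sets the key: nothing is passed on, so the native default applies.
        System.out.println(resolve(key, none, none));
    }
}
```

Returning an empty `Optional` rather than a hard-coded fallback keeps the defaults owned by libhdfs3, which matches the statement that unspecified keys use the CH libhdfs3 default.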
|
Run Gluten Clickhouse CI |
|
Run Gluten Clickhouse CI |
|
@zzcclp do you have time to review this PR? Thanks! |
|
Downloading files failed. Could you help retrigger the failed UTs, @zzcclp?
backends-clickhouse/src/main/java/io/glutenproject/vectorized/CHNativeExpressionEvaluator.java
|
@taiyang-li Could you rebase to the main branch? This issue should be fixed. |
|
Run Gluten Clickhouse CI |
|
Run Gluten Clickhouse CI |
zzcclp
left a comment
LGTM
…s3 (apache#1900) Respect hdfs-site.xml configs for clickhouse libhdfs3
* [VL] CI: Set environment variables (e.g. http proxy settings) for all shell types (#1924)
* [CH] Enable ch DecimalExpressionSuite (#1426)
  Enable ch DecimalExpressionSuite
* [VL][BUILD] Update vcpkg scripts for development workflow (#1913)
  * Fix missing vcpkg depends for velox stack trace
    Add depened libelf
    Downgrade libdwarf to 20210528
    Upgrade folly to 2022.11.14.00
  * Enable debug build for vcpkg
  * Ignore clangd
  * Upgrade maven version for vcpkg setup script
* [UT] Fix a config overwritten issue in UT (#1936)
* [MINOR] change logInfo to logDebug in RowToArrowColumnarExec#javaConvert (#1915)
* [GLUTEN-1780][CH] Add CH backend UT statistic (#1781)
  * Add UT statistic
  * add more expression ut
  * fix review code
  * fix style
  * fix rebase
  * fix review
* [GLUTEN-1934][CH] separate spark shims from gluten jar (#1935)
* [GLUTEN-1928][Core] Support to add the custom expressions transformer by Spark conf (#1937)
  Support to add the custom expressions transformer by Spark conf: Add a spark.gluten.sql.columnar.extended.expressions.transformer to specify the extended expression transformer class; Close #1928.
* [GLUTEN-1929][CH][TEST]Fix: Enable C++ Unit test (#1930)
  * [GLUTEN-1929][CH][TEST]Fix: Enable C++ Unit test
  * split benchmark testing and unit testing
  * set ENABLE_CPP_TEST to OFF, so that don't affect benchmark pipeline
  * Fix Build
  * it looks like there is an OOM issue when building benchmark
* [GLUTEN-1932][CH] Support excel format parser (#1940)
* [GLUTEN-1897][CH] Respect hdfs-site.xml configs for clickhouse libhdfs3 (#1900)
  Respect hdfs-site.xml configs for clickhouse libhdfs3
* [GLUTEN-1336][VL] add more spark3.3 UT (#1846)
  This patch adds more spark 3.3 unit tests to gluten. The excluded ones will be fixed in a following patch.
  ```
  - Various partition value types
  - Various inferred partition value types
  - Various partition value types
  - Various inferred partition value types
  - SPARK-32908: maximum target error in percentile_approx
  - SPARK-36825, SPARK-36854: year-month/day-time intervals written and read as INT32/INT64
  - support batch reads for schema
  - SPARK-36182: read TimestampNTZ as TimestampLTZ
  - SPARK-36797: Union should resolve nested columns as top-level columns
  - SPARK-37371: UnionExec should support columnar if all children support columnar
  - SPARK-36280: Remove redundant aliases after RewritePredicateSubquery
  - SPARK-36182: read TimestampNTZ as TimestampLTZ
  - SPARK-39833: pushed filters with project without filter columns
  - SPARK-36825, SPARK-36854: year-month/day-time intervals written and read as INT32/INT64
  - support batch reads for schema
  - SPARK-36794: Ignore duplicated key when building relation for semi/anti hash join
  ```
* [VL][BUILD] Pop!_OS as Ubuntu alias
* [VL] Doc: Update VeloxNotSupport.md about limitations of spilling
* Workaround: set ENABLE_CAPNP to OFF for working with latest clickhouse code see ClickHouse/ClickHouse#50963. (#1960)
  we can reopen it either #50963 is fixed or upgrading compitler to clang 16
* [VL] Refactor VeloxMemoryPool and VeloxInitializer (#1952)
* [GLUTEN-1957][CH] Add -am param to build dependent modules of shims/spark33 (#1958)
* [GLUTEN-1336][VL] CI: move slow tests into another job for Spark3.3 (#1961)
  * CI: move slow tests into another job for Spark3.3
* [GLUTEN-1964][CH]Fix: support non-ASCIIString (#1965)
* [VL][Doc] Add a troubleshooting document (#1971)
* [GLUTEN-1980][DOCS] Update the documentation for compiling Gluten+Velox with Docker (#1981)
  - add missing sudo installation on centos7
  - update maven to 3.8.8 as 3.6.3 binary is removed
  - add missing instructions on enable HDFS
* [GLUTEN-1947][VL] Parquet should respect user-specified write options (#1948)
* [GLUTEN-1725][CH] Fix temporary storage path to be fixed as '/tmp/libch' (#1737)
  * [GLUTEN-1725][CH] Change temporary storage path from '/tmp/libch' to the current working dir
  * fix
  * fix
* [VL] [BUILD] Allow use custom ARROW_HOME (#1967)
* [GLUTEN-1928][Core] Followup: fix the custom expressions transformer ut bug (#1969)
  Support to add the custom expressions transformer by Spark conf: Add a spark.gluten.sql.columnar.extended.expressions.transformer to specify the extended expression transformer class;
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20230616) (#1984)
  Co-authored-by: kyligence-git <gluten@kyligence.io>
* [GLUTEN-1938][CH] Fix hive reader read empty as null (#1993)
* minor, add jvm xss configuration in gluten-ut to avoid stackoverflow (#1987)
* [UT][VL] Exclude unstable test (#1996)
* add config use_current_directory_as_tmp (#1992)
* [GLUTEN-1898][CH] S3 client support per bucket configs and support as… (#1899)
  * [GLUTEN-1898][CH] S3 client support per bucket configs and support assume role access
  * fix bug
  * update clickhouse commit id
  * fix stash
  * allow endpoint not contain https prefix
* [GLUTEN-2002][VL] Build: add boost sort lib in static linking job (#1995)
  * add missing boost sort lib in static linking job
    Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * trigger github tests if change dev scripts
    Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  ---------
  Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
* [VL] Disable printing conf by default (#1968)
* [GLUTEN-1963][CH] fix clang16 compile error (#2003)
* [GLUTEN-1500][VL] Integrate with Velox arbitration API (#1741)
  Co-authored-by: Hongze Zhang <hongze.zhang@intel.com>
* [VL] fix the crash problem caused by SIMD instructions execution (#1990)
  * Memory Aligned alloc
  ---------
  Co-authored-by: zuochunwei <zuochunwei@meituan.com>
* [GLUTEN-1945] Support summarizing the supported spark built-in functions (#1946)
* [GLUTEN-CORE][VL] Minor refactor c2r codes to improve readability (#2010)
* [GLUTEN-1476][VL] Enable scan on struct and map types (#1824)
* [VL] Adopt Spark local-cluster run mode in gluten-it (#1988)
* [VL]Doc: How to prioritize loading Gluten jars in Spark (#1933)
* Revert "[VL] fix the crash problem caused by SIMD instructions execution (#1990)" (#2016)
  This reverts commit 35cfbf3.
* [GLUTEN-1972][CORE] Log gluten build info (#1973)
  This patch logs the current build information, thus developers can find the binary is based on which commit and also the compilation env.
  ======================================
  Gluten build info:
  Gluten version: 0.5.0-SNAPSHOT
  GCC version: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
  Java version: 1.8.0_362
  Scala version: 2.12.15
  Spark version: 3.3.1
  Hadoop version: 2.7.4
  Build branch: build-info
  Build revision: 0e21fe8a96ff16973436a1b094e5c50be7c19a52
  Build revision time: 2023-06-15 20:02:22 +0800
  Build date: 2023-06-15T12:03:39Z
  Velox branch: main
  Velox revision: f258c6f8e2b8efa7a55a811151715cadd9cf50c3
  Velox revision time: 2023-05-30 11:23:21 +0800
  ======================================
* [GLUTEN-2025][VL] disable sort in window operations
* Minor: upgrade guava to 32.0.1-jre (#2026)
  This patch upgrades guava to 32.0.1-jre to fix the security issue
  Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
* Fix get velox code from branch-1.0
---------
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Co-authored-by: Hongze Zhang <hongze.zhang@intel.com>
Co-authored-by: Shuai li <loneylee@live.cn>
Co-authored-by: Lingfeng Zhang <lingfeng.zhang@intel.com>
Co-authored-by: PHILO-HE <feilong.he@intel.com>
Co-authored-by: leesf <490081539@qq.com>
Co-authored-by: Wenzheng Liu <lwz9103@163.com>
Co-authored-by: Zhichao Zhang <zhangzc@apache.org>
Co-authored-by: Chang chen <chang.chen@kyligence.io>
Co-authored-by: 李扬 <654010905@qq.com>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: zuochunwei <zchw100@qq.com>
Co-authored-by: Yuan <yuan.zhou@intel.com>
Co-authored-by: wang-zhun <61445191+wang-zhun@users.noreply.github.com>
Co-authored-by: ulysses <ulyssesyou18@gmail.com>
Co-authored-by: exmy <xumovens@gmail.com>
Co-authored-by: Yang Zhang <34979747+Yohahaha@users.noreply.github.com>
Co-authored-by: kyligence-git <gluten@kyligence.io>
Co-authored-by: Rui Mo <rui.mo@intel.com>
Co-authored-by: Hongbin Ma <mahongbin@apache.org>
Co-authored-by: Kerwin Zhang <xiyu.zk@alibaba-inc.com>
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
Co-authored-by: JiaKe <ke.a.jia@intel.com>
Co-authored-by: Zhen Li <10524738+zhli1142015@users.noreply.github.com>


What changes were proposed in this pull request?
(Fixes: #1898)