SPARK-1121 Only add avro if the build is for Hadoop 0.23.X and SPARK_YARN is set #6
ScrapCodes wants to merge 2 commits into apache:master
Conversation
Merged build triggered.
Merged build started.
docs/building-with-maven.md
Outdated
Mind being more specific here? "You will have to manually add a dependency on (org.apache.avro, avro, 1.7.4)."
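For sbt users, the equivalent manual addition would look roughly like the snippet below; this is a sketch using the coordinates quoted above, not wording taken from the docs patch itself.

```scala
// Hypothetical sbt equivalent of manually adding the avro dependency.
// The version (1.7.4) is the one quoted in the review comment above.
libraryDependencies += "org.apache.avro" % "avro" % "1.7.4"
```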
Overall looks good, but I gave some minor comments.
Jenkins, test this please.
Build triggered.
Build started.
project/SparkBuild.scala
Outdated
Would you mind restructuring this to be called maybeAvro and having it return a sequence of dependencies (that might be empty)? I'm just asking because @sryza will need to do something similar for the Hadoop dependencies, and it will be cleaner to have something like:
libraryDependencies ++= maybeAvro
libraryDependencies ++= maybeHadoop
rather than a bunch of if statements.
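A minimal sketch of what such a helper could look like in project/SparkBuild.scala; the exact Hadoop-version check, environment-variable handling, and avro version are assumptions for illustration, not the code in this PR.

```scala
import sbt._

// Hypothetical helper returning an optional avro dependency.
// hadoopVersion is assumed to be the Hadoop version string already
// resolved elsewhere in SparkBuild.scala.
def maybeAvro(hadoopVersion: String): Seq[ModuleID] =
  if (sys.env.contains("SPARK_YARN") && hadoopVersion.startsWith("0.23."))
    Seq("org.apache.avro" % "avro" % "1.7.4")
  else
    Seq.empty

// Usage, matching the pattern suggested above:
//   libraryDependencies ++= maybeAvro(hadoopVersion)
```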
@ScrapCodes thanks for looking into this! Added some suggestions inline.
Build finished.
All automated tests passed.
Build triggered.
Merge 0.8.0-candidate-csd branch to master-csd
SPY-287 updated streaming iterable
Minor changes to get more tests passing.
No; since sbt does not have it by default, I thought we could have it for convenience.
Build triggered.
Build started.
Build triggered.
Build finished.
One or more automated tests failed.
Build triggered.
Build started.
Build triggered.
Build finished.
All automated tests passed.
Thanks @ScrapCodes, looks good.
Hmm... appears it does not merge cleanly.
…nce-enhancement change executors requests policy
### What changes were proposed in this pull request?
This PR aims to make `semanticEquals` work correctly on `GetMapValue` expressions over literal maps backed by `ArrayBasedMapData` and `GenericArrayData`.
### Why are the changes needed?
This is a regression from Apache Spark 1.6.x.
```scala
scala> sc.version
res1: String = 1.6.3
scala> sqlContext.sql("SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', 'v1')[k]").show
+---+
|_c0|
+---+
| v1|
+---+
```
Apache Spark 2.x through 3.0.1 raises a `RuntimeException` for the following queries.
```sql
CREATE TABLE t USING ORC AS SELECT map('k1', 'v1') m, 'k1' k
SELECT map('k1', 'v1')[k] FROM t GROUP BY 1
SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', 'v1')[k]
SELECT map('k1', 'v1')[k] a FROM t GROUP BY a
```
**BEFORE**
```scala
Caused by: java.lang.RuntimeException: Couldn't find k#3 in [keys: [k1], values: [v1][k#3]#6]
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:85)
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:79)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
```
**AFTER**
```sql
spark-sql> SELECT map('k1', 'v1')[k] FROM t GROUP BY 1;
v1
Time taken: 1.278 seconds, Fetched 1 row(s)
spark-sql> SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', 'v1')[k];
v1
Time taken: 0.313 seconds, Fetched 1 row(s)
spark-sql> SELECT map('k1', 'v1')[k] a FROM t GROUP BY a;
v1
Time taken: 0.265 seconds, Fetched 1 row(s)
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs with the newly added test case.
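For reference, a rough spark-shell sketch of the equality property involved (Catalyst class names as I understand them; this is an illustration under those assumptions, not the test added by this PR):

```scala
import org.apache.spark.sql.catalyst.expressions.{GetMapValue, Literal}
import org.apache.spark.sql.types.{MapType, StringType}

// Two semantically identical lookups into a literal map. The regression was
// that semanticEquals could return false when the literal maps were
// materialized through different internal representations (ArrayBasedMapData
// vs. GenericArrayData), so the grouping key no longer matched the aggregate
// expression and binding failed with the error shown above.
val m1 = Literal.create(Map("k1" -> "v1"), MapType(StringType, StringType))
val m2 = Literal.create(Map("k1" -> "v1"), MapType(StringType, StringType))

GetMapValue(m1, Literal("k1")).semanticEquals(GetMapValue(m2, Literal("k1")))
// expected: true
```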
Closes #30246 from dongjoon-hyun/SPARK-33338.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 42c0b17)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Skip capturing the Maven repo config for views.

### Why are the changes needed?
Due to the bad network, we always use the third-party Maven repo to run tests, e.g.:
```
build/sbt "test:testOnly *SQLQueryTestSuite" -Dspark.sql.maven.additionalRemoteRepositories=xxxxx
```
It fails with this error message:
```
[info] - show-tblproperties.sql *** FAILED *** (128 milliseconds)
[info]   show-tblproperties.sql
[info]   Expected "...rredTempViewNames [][]", but got "...rredTempViewNames [][
[info]   view.sqlConfig.spark.sql.maven.additionalRemoteRepositories xxxxx]" Result did not match for query #6
[info]   SHOW TBLPROPERTIES view (SQLQueryTestSuite.scala:464)
```
It is not necessary to capture the Maven config into the view since it is a session-level config.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test passes:
```
build/sbt "test:testOnly *SQLQueryTestSuite" -Dspark.sql.maven.additionalRemoteRepositories=xxx
```
Closes #31856 from ulysses-you/skip-maven-config.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
Backport of [#31856](#31856) for branch-3.1 (same description as above).

Closes #31879 from ulysses-you/SPARK-34766-3-1.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…6327 VINITUS-241 patch SPARK-36327
* initial change of grammar to support string collation
…to the `hive-thriftserver` module to fix the Maven daily test

### What changes were proposed in this pull request?
This PR adds bouncycastle-related test dependencies to the `hive-thriftserver` module to fix the Maven daily test.

### Why are the changes needed?
`sql-on-files.sql` added the following statement in #47480, which caused the Maven daily test to fail:
https://github.com/apache/spark/blob/2363aec0c14ead24ade2bfa23478a4914f179c00/sql/core/src/test/resources/sql-tests/inputs/sql-on-files.sql#L10
- https://github.com/apache/spark/actions/runs/10094638521/job/27943309504
- https://github.com/apache/spark/actions/runs/10095571472/job/27943298802
```
- sql-on-files.sql *** FAILED ***
  "" did not contain "Exception" Exception did not match for query #6
  CREATE TABLE sql_on_files.test_orc USING ORC AS SELECT 1,
  expected: , but got: java.sql.SQLException
  org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8542.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8542.0 (TID 8594) (localhost executor driver): java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
    at test.org.apache.spark.sql.execution.datasources.orc.FakeKeyProvider$Factory.createProvider(FakeKeyProvider.java:127)
    at org.apache.hadoop.crypto.key.KeyProviderFactory.get(KeyProviderFactory.java:96)
    at org.apache.hadoop.crypto.key.KeyProviderFactory.getProviders(KeyProviderFactory.java:68)
    at org.apache.orc.impl.HadoopShimsCurrent.createKeyProvider(HadoopShimsCurrent.java:97)
    at org.apache.orc.impl.HadoopShimsCurrent.getHadoopKeyProvider(HadoopShimsCurrent.java:131)
    at org.apache.orc.impl.CryptoUtils$HadoopKeyProviderFactory.create(CryptoUtils.java:158)
    at org.apache.orc.impl.CryptoUtils.getKeyProvider(CryptoUtils.java:141)
    at org.apache.orc.impl.WriterImpl.setupEncryption(WriterImpl.java:1015)
    at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:164)
    at org.apache.orc.OrcFile.createWriter(OrcFile.java:1078)
    at org.apache.spark.sql.execution.datasources.orc.OrcOutputWriter.<init>(OrcOutputWriter.scala:49)
    at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:89)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:180)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:165)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:391)
    at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:107)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:901)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:901)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
    at org.apache.spark.scheduler.Task.run(Task.scala:146)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
  Caused by: java.lang.ClassNotFoundException: org.bouncycastle.jce.provider.BouncyCastleProvider
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
    ... 32 more
```
Because we have configured `hadoop.security.key.provider.path` as `test:///` in the parent `pom.xml`
(https://github.com/apache/spark/blob/5ccf9ba958f492c1eb4dde22a647ba75aba63d8e/pom.xml#L3165-L3166),
`KeyProviderFactory#getProviders` will use `FakeKeyProvider$Factory` to create instances of `FakeKeyProvider`
(https://github.com/apache/spark/blob/5ccf9ba958f492c1eb4dde22a647ba75aba63d8e/sql/core/src/test/resources/META-INF/services/org.apache.hadoop.crypto.key.KeyProviderFactory#L18).
During the initialization of `FakeKeyProvider`, it first initializes its superclass `org.apache.hadoop.crypto.key.KeyProvider`, which leads to the loading of the `BouncyCastleProvider` class. Therefore, we need to add bouncycastle-related test dependencies in the `hive-thriftserver` module.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test with this PR:
```
build/mvn -Phive -Phive-thriftserver clean install -DskipTests
build/mvn -Phive -Phive-thriftserver clean install -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite -pl sql/hive-thriftserver
```
```
Run completed in 6 minutes, 52 seconds.
Total number of tests run: 243
Suites: completed 2, aborted 0
Tests: succeeded 243, failed 0, canceled 0, ignored 20, pending 0
All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47496 from LuciferYang/thrift-bouncycastle.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…anRelationPushDown

### What changes were proposed in this pull request?
Add the timezone information to a cast expression when the destination type requires it.

### Why are the changes needed?
When current_timestamp() is materialized as a string, the timezone information is gone (e.g., 2024-12-27 10:26:27.684158), which prohibits further optimization rules from being applied to the affected data source. For example:
```
Project [1735900357973433#10 AS current_timestamp()#6]
+- 'Project [cast(2025-01-03 10:32:37.973433#11 as timestamp) AS 1735900357973433#10]
   +- RelationV2[2025-01-03 10:32:37.973433#11] xxx
```
This query fails to execute because the injected cast expression lacks the timezone information.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49549 from changgyoopark-db/SPARK-50870.

Authored-by: changgyoopark-db <changgyoo.park@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
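A rough sketch of the idea behind the fix, assuming Catalyst's `Cast`, `TimeZoneAwareExpression`, and `SQLConf` APIs; this is illustrative only, and the helper name is invented rather than taken from the actual rule in the PR.

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
import org.apache.spark.sql.internal.SQLConf

// When a Cast is injected during push-down and casting to the target type is
// timezone-sensitive, attach the session time zone so the expression can be
// bound and executed later instead of failing.
def withSessionTimeZone(e: Expression): Expression = e match {
  case c: Cast if c.timeZoneId.isEmpty && Cast.needsTimeZone(c.child.dataType, c.dataType) =>
    c.withTimeZone(SQLConf.get.sessionLocalTimeZone)
  case other => other
}
```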
No description provided.