
Enabled incremental build that comes with sbt 0.13.2 #525


Closed
wants to merge 1 commit

Conversation

ScrapCodes
Member

More info at sbt/sbt#1010.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14430/

@markhamstra
Contributor

Did you actually find any benefit to adding this option? Previously, I tried the same thing that you are doing in this PR (just a few lines earlier in the file, right after the javacOptions), ran several duplicate incremental builds from various starting states, differing in whether withNameHashing was turned on or not, and I never found any clear advantage in terms of compile time in that small sample.

@ScrapCodes
Member Author

Yes, I did. To try it out, put sbt in continuous compilation mode (e.g. `sbt ~compile`), then make a minor change to SparkContext that does not affect any other file, for example make a private field public. Try doing this with and without the option; I am sure you will see the difference in compilation times for yourself.
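For reference, a minimal sketch of the sbt 0.13.2 name-hashing setting under discussion (the exact placement in project/SparkBuild.scala is an assumption here, not a quote of this PR's diff):

```scala
// project/SparkBuild.scala (sketch): enable sbt 0.13.2's name-hashing-based
// incremental compiler, so that a signature-local change (e.g. making a
// private field public) recompiles fewer downstream files.
incOptions := incOptions.value.withNameHashing(true)
```

With `sbt ~compile` running, the difference shows up as a shorter list of recompiled sources after such a change.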

@ScrapCodes
Member Author

@markhamstra I was curious: are you convinced?

@pwendell
Contributor

Okay, since there doesn't seem to be anything bad about adding this, I think we can try it out. This is only a developer-facing change and shouldn't affect users at all. We can turn it off if there are issues.

asfgit pushed a commit that referenced this pull request May 11, 2014
More info at sbt/sbt#1010.

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #525 from ScrapCodes/sbt-inc-opt and squashes the following commits:

ba8fa42 [Prashant Sharma] Enabled incremental build that comes with sbt 0.13.2
(cherry picked from commit 70bcdef)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@asfgit asfgit closed this in 70bcdef May 11, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
More info at sbt/sbt#1010.

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes apache#525 from ScrapCodes/sbt-inc-opt and squashes the following commits:

ba8fa42 [Prashant Sharma] Enabled incremental build that comes with sbt 0.13.2
@ScrapCodes ScrapCodes deleted the sbt-inc-opt branch June 3, 2015 06:09
helenyugithub pushed a commit to helenyugithub/spark that referenced this pull request Aug 20, 2019
Revert removal of legacy timestamp formatter
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
…pache#525)

OpenLab is a platform for open-source SDKs/tools;
disable the job for testing the private cloud.

Closes: theopenlab/openlab#266
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020
* Adding DataSource Reader Support

* Update SparkSessionExt.scala

* creating a package object

* Update MapRDBSpark.scala

* full path to avoid name collision

* refactorings
agirish pushed a commit to HPEEzmeral/apache-spark that referenced this pull request May 5, 2022
…cript

K8S-1077 (apache#598)

* K8S-1077 - use single k8s secret with user info

MapR [SPARK-651] Replacing joda-time-*.jar with joda-time-2.10.3.jar.

MapR [SPARK-638] Wrong permissions when creating files under directory
with GID bit set.

MapR [SPARK-627] SparkHistoryServer-2.4 is getting 403 Unauthorized home page for users(spark.ui.view.acls) via spark-submit

MapR [SPARK-639] Default headers are adding two times

MapR [SPARK-629] Spark UI for job lose CSS styles

MapR [MS-925] After upgrade to MEP 6.2 (Spark 2.4.0) can no longer
consume Kafka / MapR Streams.

MapR [SPARK-626] Update kafka dependencies for Spark 2.4.4.0 in release MEP-6.3.0

MapR [SPARK-340] Jetty web server version at Spark should be updated to v9.4.X

MapR [SPARK-617] an't use ssl via spark beeline

MapR [SPARK-617] Can't use ssl via spark beeline

MapR [SPARK-620] Replace core dependency in Spark-2.4.4

MapR [SPARK-621] Fix multiple XML configuration initialization for custom headers (apache#575)

Use X-XSS-Protection, X-Content-Type-Options, Content-Security-Policy and Strict-Transport-Security configuration only in case cluster security is enabled OR spark.ui.security.headers.enabled is set to true.
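A small sketch of how a developer might opt in to those headers on an otherwise non-secure cluster (the property name is taken from the commit text above; this is an illustration, not MapR documentation):

```scala
import org.apache.spark.SparkConf

// Sketch: turn on the hardened UI response headers described above even when
// cluster security is not enabled (property name as given in the commit text).
val conf = new SparkConf()
  .set("spark.ui.security.headers.enabled", "true")
```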

MapR [SPARK-595] Spark cannot access hs2 through zookeeper

Revert "MapR [SPARK-595] Spark cannot access hs2 through zookeeper (apache#577)"

MapR [SPARK-595] Spark cannot access hs2 through zookeeper

MapR [SPARK-620] Replace core dependency in Spark-2.4.

MapR [SPARK-619] Move absent commits from 2.4.3 branch to 2.4.4 (apache#574)

* Adding SQL API to write to kafka from Spark (apache#567)

* Branch 2.4.3 extended kafka and examples (apache#569)

* The v2 API is in its own package

- the v2 api is in a different package
- the old functionality is available in a separated package

* v2 API examples

- All the examples are using the newest API.
- I have removed the old examples since they are not relevant anymore and the same functionality is shown in the new examples using the new API.

* MapR [SPARK-619] Move absent commits from 2.4.3 branch to 2.4.4

CORE-321. Add custom http header support for jetty.

MapR [SPARK-609] Port Apache Spark-2.4.4 changes to the MapR Spark-2.4.4 branch

Adding multi table loader (apache#560)

* Adding multi table loader

- This allows us to load multiple matching tables into one Union DataFrame.

If we have the following MFS structure:

```
/clients/client_1/data.table
/clients/client_2/data.table
```
we can load a union dataframe by doing `loadFromMapRDB("/clients/*/*.table")`

* Fixing the path to the reader

MapR [SPARK-588] Spark thriftserver fails when work with hive-maprdb json table

MapR [SPARK-598] Spark can't add needed properties to hive-site.xml

MAPR-SPARK-596: Change HBase compatible version for Spark 2.4.3

MapR [SPARK-592] Add possibility to use start-thriftserver.sh script with 2304 port

MapR [SPARK-584] MaprDB connector's setHintUsingIndex method doesn't work as expected

MapR [SPARK-583] MaprDB connector's loadFromMaprDB function for Java API doesn't work as expected

SPARK-579 info about ssl_trustore is added for metrics

MapR [SPARK-552] Failed to get broadcast_11_piece0 of broadcast_11

SPARK-569 Generation of SSL certificates for spark UI

MapR [SPARK-575] Warning messages in spark workspace after the second attempt to login to job's UI

Update zookeeper version

Adding `joinWithMapRDBTable` function (apache#529)

The related documentation of this function is here: https://github.com/anicolaspp/MapRDBConnector#joinwithmaprdbtable.

The main idea is that, given a dataframe (no matter how it was constructed), we can join it with a MapR-DB table. This function looks at the join query and loads only those records from MapR-DB that will join, instead of loading the full table and then joining in memory. In other words, we only load what we know will be joined.
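A conceptual sketch of that pushdown idea using plain Spark (this is not the connector's implementation; the paths, formats, and column names are placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Sketch: instead of loading the whole table and joining in memory, collect the
// join keys from the driving DataFrame and push them down as a filter, so only
// rows that can actually participate in the join are read from the source.
val spark = SparkSession.builder().getOrCreate()
val df = spark.table("clients")                                       // placeholder driving DataFrame
val keys = df.select("id").distinct().collect().map(_.get(0))

val table = spark.read.format("parquet").load("/path/to/data.table")  // placeholder source
val pruned = table.filter(col("_id").isin(keys: _*))
val joined = df.join(pruned, df("id") === pruned("_id"))
```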

Adding DataSource Reader Support (apache#525)

* Adding DataSource Reader Support

* Update SparkSessionExt.scala

* creating a package object

* Update MapRDBSpark.scala

* full path to avoid name collision

* refactorings

MapR [SPARK-451] Spark hadoop/core dependency updates

MapR [SPARK-566] Move absent commits from 2.4.0 branch

MapR [SPARK-561] Spark 2.4.3 porting to MapR

MapR [SPARK-561] Spark 2.4.3 porting to MapR

MapR [SPARK-558] Render application UI init page if driver is not up

MapR [SPARK-541] Avoid duplication of the first unexpired record

MapR [COLD-150][K8S] Fix metrics copy

MapR [K8S-893] Hide plain text password from logs

MapR [SPARK-540] Include 'avro' artifacts

MapR [SPARK-536] PySpark streaming package for kafka-0-10 added

K8S-853: Enable spark metrics for external tenant

MapR [SPARK-531] Remove duplicating entries from classpath in ClasspathFilter

MapR [SPARK-516] Spark jobs failure using yarn mode on kerberos fixed

MapR [SPARK-462] Spark and SparkHistoryServer allow week ciphers, which can allow man in the middle attack

[SPARK-508] MapR-DB OJAI Connector for Spark isNull condition returns incorrect result

MapR [SPARK-510] nonmapr "admin" users not able to view other user logs in SHS

SPARK-460: Spark Metrics for CollectD Configuration for collecting Spark metrics

SPARK-463 MAPR_MAVEN_REPO variable for specifying mapR repository

MapR [SPARK-492] Spark 2.4.0.0 configure.sh has error messages

MapR [SPARK-515][K8S] Remove configure.sh call for k8s

MapR [SPARK-515] Move configuring spark-env.sh back to the private-pkg

MapR [SPARK-515] Move configuring spark-env.sh back to the private-pkg

MapR [SPARK-514] Recovery from checkpoint is broken

MapR [SPARK-445] Messages loss fixed by reverting [MAPR-32290] changes from kafka09 package (apache#460)

* MapR [SPARK-445] Revert "[MAPR-32290] Spark processing offsets when messages are already TTL in the first batch (apache#376)"

This reverts commit e8d59b9.

* MapR [SPARK-445] Revert "[MAPR-32290] Spark processing offsets when messages are already ttl in first batch (apache#368)"

This reverts commit b282a8b.

MapR [SPARK-445] Messages loss fixed by reverting [MAPR-32290] changes from kafka10 package

MapR [SPARK-469] Fix NPE in generated classes by reverting "[SPARK-23466][SQL] Remove redundant null checks in generated Java code by GenerateUnsafeProjection" (apache#455)

This reverts commit c5583fd.

MapR [SPARK-482] Spark streaming app fails to start by UnknownTopicOrPartitionException with checkpoint

MapR [SPARK-496] Spark HS UI doesn't work

MapR [SPARK-416] CVE-2018-1320 vulnerability in Apache Thrift

MapR [SPARK-486][K8S] Fix sasl encryption error on Kubernetes

MapR [SPARK-481] Cannot run spark configure.sh on Client node

MapR [K8S-637][K8S] Add configure.sh configuration in spark-defaults.conf for job runtime

MapR [SPARK-465] Error messages after update of spark 2.4

MapR [SPARK-465] Error messages after update of spark 2.4

MapR [SPARK-464] Can't submit spark 2.4 jobs from mapr-client

[SPARK-466] SparkR errors fixed

MapR [SPARK-456] Spark shell can't be started

SPARK-417 impersonation fixes for spark executor. Impersonation is mo… (apache#433)

* SPARK-417 impersonation fixes for spark executor. Impersonation is moved from HadoopRDD.compute() method to org.apache.spark.executor.Executor.run() method

* SPARK-363 Hive version changed to '1.2.0-mapr-spark-MEP-6.0.0'

[SPARK-449] Kafka offset commit issue fixed

MapR [SPARK-287] Move logic of creating /apps/spark folder from installer's scripts to the configure.sh

MapR [SPARK-221] Investigate possibility to move creating of the spark-env.sh from private-pkg to configure.sh

MapR [SPARK-430] PID files should be under /opt/mapr/pid

MapR [SPARK-446] Spark configure.sh doesn't start/stop Spark services

MapR [SPARK-434] Move absent commits from 2.3.2 branch (apache#425)

* MapR [SPARK-352] Spark shell fails with "NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream" if java is not available in PATH

* MapR [SPARK-350] Deprecate Spark Kafka-09 package

* MapR [SPARK-326] Investigate possibility of writing Java example for the MapRDB OJAI connector

* [SPARK-356] Merge mapr changes from kafka-09 package into the kafka-10

* SPARK-319 Fix for sparkR version check

* MapR [SPARK-349] Update OJAI client to v3 for Spark MapR-DB JSON connector

* MapR [SPARK-367] Move absent commits from 2.3.1 branch

* MapR [SPARK-137] Analyze the warning during compilation of OJAI connector

* MapR [SPARK-369] Spark 2.3.2 fails with error related to zookeeper

* [MAPR-26258] hbasecontext.HBaseDistributedScanExample fails

* [SPARK-24355] Spark external shuffle server improvement to better handle block fetch requests

* MapR [SPARK-374] Spark Hive example fails when we submit job from another(simple) cluster user

* MapR [SPARK-434] Move absent commits from 2.3.2 branch

* MapR [SPARK-434] Move absent commits from 2.3.2 branch

* MapR [SPARK-373] Unexpected behavior during job running in standalone cluster mode

* MapR [SPARK-419] Update hive-maprdb-json-handler jar for spark 2.3.2.0 and spark 2.2.1

* MapR [SPARK-396] Interface change of sendToKafka

* MapR [SPARK-357] consumer groups are prepended with a "service_" prefix

* MapR [SPARK-429] Changes in maprdb connector are the cause of broken backward compatibility

* MapR [SPARK-427] Update kafka in Spark-2.4.0 to the 1.1.1-mapr

* MapR [SPARK-434] Move absent commits from 2.3.2 branch

* Move absent commits from 2.3.2 branch

* MapR [SPARK-434] Move absent commits from 2.3.2 branch

* Move absent commits from 2.3.2 branch

* Move absent commits from 2.3.2 branch

MapR [SPARK-427] Update kafka in Spark-2.4.0 to the 1.1.1-mapr

MapR [SPARK-379] Spark 2.4 4-digit version

MapR [PIC-48][K8S] Port k8s changes to 2.4.0

[PIC-48] Create user for k8s driver and executor if required

[PIC-48] Create user for k8s driver and executor if required

Revert "Remove spark.ui.filters property"

This reverts commit d8941ba36c3451cdce15d18d6c1a52991de3b971.

[SPARK-351] Copy kubernetes start scripts anyway

PIC-34: Rename default configmap name to be consistent with mapr-kubernetes

[SPARK-23668][K8S] Add config option for passing through k8s Pod.spec.imagePullSecrets (apache#355)

Pass through the `imagePullSecrets` option to the k8s pod in order to allow user to access private image registries.

See https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/

Unit tests + manual testing.

Manual testing procedure:
1. Have private image registry.
2. Spark-submit application with no `spark.kubernetes.imagePullSecret` set. Do `kubectl describe pod ...`. See the error message:
```
Error syncing pod, skipping: failed to "StartContainer" for "spark-kubernetes-driver" with ErrImagePull: "rpc error: code = 2 desc = Error: Status 400 trying to pull repository ...: \"{\\n  \\\"errors\\\" : [ {\\n    \\\"status\\\" : 400,\\n    \\\"message\\\" : \\\"Unsupported docker v1 repository request for '...'\\\"\\n  } ]\\n}\""
```
3. Create secret `kubectl create secret docker-registry ...`
4. Spark-submit with `spark.kubernetes.imagePullSecret` set to the new secret. See that deployment was successful.

Author: Andrew Korzhuev <andrew.korzhuev@klarna.com>
Author: Andrew Korzhuev <korzhuev@andrusha.me>

Closes apache#20811 from andrusha/spark-23668-image-pull-secrets.

[SPARK-321] Change default value of spark.mapr.ssl.secret.prefix property

[PIC-32] Spark on k8s with MapR secure cluster

Update entrypoint.sh with correct spark version (apache#340)

This PR has minor fix to correct the spark version string

[SPARK-274] Create home directory for user who submitted job

[MAPR-SPARK-230] Implement security for Spark on Kubernetes

Run Spark job with specify the username for driver and executor

Read cluster configs from configMap

Run configure.sh script from entrypoint.sh

Remove spark.kubernetes.driver.pod.commands property

Add Spark properties for executor and driver environment variable

MapR [SPARK-296] Structured Streaming memory leak

Revert "[MAPR-SPARK-210] Rename sprk-defaults.conf to spark-defaults.conf.tem…" (apache#252)

* Revert "[MAPR-SPARK-176] Fix Spark Project Catalyst unit tests (apache#251)"

This reverts commit 5de05075cd14abf8ac65046a57a5d76617818fbe.

* Revert "[MAPR-SPARK-210] Rename sprk-defaults.conf to spark-defaults.conf.template (apache#249)"

This reverts commit 1baa677d727e89db7c605ffbae9a9eba00337ad0.

[MAPR-SPARK-210] Rename sprk-defaults.conf to spark-defaults.conf.template

MapR [SPARK-379] Port Spark to 2.4.0

MapR [SPARK-341] Spark 2.3.2 porting

[MAPR-32290] Spark processing offsets when messages are already TTL in the first batch

* Bug 32263 - Seek called on unsubscribed partitions

[MSPARK-331] Remove snapshot versions of mapr dependencies from Spark-2.3.1

[MAPR-32290] Spark processing offsets when messages are already ttl in first batch

MapR [SPARK-325] Add examples for work with the MapRDB JSON connector into the Spark project

[ATS-449] Unit test for EBF 32013 created.

MAPR-SPARK-311: Spark beeline uses default ssl truststore instead of mapr ssl truststore

Bug 32355 - Executor tab empty on Spark UI

[SPARK-318] Submitting Spark jobs from Oozie fails due to ClassNotFoundException

Bug 32014 - Spark Consumer fails with java.lang.AssertionError

Revert "[SPARK-306] Kafka clients 1.0.1 present in jars directory for Spark 2.3.1" (apache#341)

* Revert "[SPARK-306] Kafka clients 1.0.1 present in jars directory for Spark 2.3.1 (apache#335)"

This reverts commit 832411e.

Bug 32014 - Spark Consumer fails with java.lang.AssertionError (apache#326) (apache#336)

* MapR [32014] Spark Consumer fails with java.lang.AssertionError

[SPARK-306] Kafka clients 1.0.1 present in jars directory for Spark 2.3.1

DEVOPS-2768 temporarily removed curl for file downloading

[SPARK-302] Local privilege escalation

MapR [SPARK-297] Added unit test for empty value conversion

MapR [SPARK-297] Empty values are loaded as non-null

MapR [SPARK-296] Structured Streaming memory leak

2.3.1 spark 289 (apache#318)

* MapR [SPARK-289] Fix unit test for Spark-2.3.1

[SPARK-130] MapRDB connector - NPE while saving Pair RDD with 'null' values

MapR [SPARK-283] Unit tests fail during initialization SSL properties.

[SPARK-212] SparkHiveExample fails when we run it twice

MapR [SPARK-282] Remove maprfs and hadoop jars from mapr spark package

MapR [SPARK-278] Spark submit fails for jobs with python

MapR [SPARK-279] Can't connect to spark thrift server with new spark and hive packages

MapR [SPARK-276] Update zookeeper dependency to v.3.4.11 for spark 2.3.1

MapR [SPARK-272] Use only client passwords from ssl-client.xml

MapR [SPARK-266] Spark jobs can't finish correctly, when there is an error during job running

MapR [SPARK-263] Add possibility to use keyPassword which is different from keyStorePassword

[MSPARK-31632] RM UI showing broken page for Spark jobs

MapR [SPARK-261] Use mapr-security-web for getting passwords.

MapR [SPARK-259] Spark application doesn't finish correctly

MapR [SPARK-268] Update Spark version for Warden

change project version to 2.3.1-mapr-SNAPSHOT

MapR [SPARK-256] Spark doesn't work on yarn mode

MapR [SPARK-255] Installer fresh install 610/600 secure fails to start "mapr-spark-thriftserver", "mapr-spark-historyserver"

Mapr [SPARK-248] MapRDBTableScanRDD fails to convert to Scala Dataframe when using where clause

MapR [SPARK-225] Hadoop credentials provider usage for hiding passwords at spark-defaults

MapR [SPARK-214] Hive-2.1 properties can't be read from a hive-site.xml as Spark uses Hive-1.2

MapR [SPARK-216] Spark thriftserver fails when work with hive-maprdb json table

SPARK-244 (apache#278)

Provide ability to use MapR-Negotiation authentication for Spark HistoryServer

MapR [SPARK-226] Spark - pySpark Security Vulnerability

MapR [SPARK-220] SparkR fails with UDF functions bug fixed

MapR [SPARK-227] KafkaUtils.createDirectStream fails with kafka-09

MapR [SPARK-183] Spark Integration for Kafka 0.10 unit tests disabled

MapR [SPARK-182] Spark Project External Kafka Producer v09 unit tests fixed

MapR [SPARK-179] Spark Integration for Kafka 0.9 unit tests fixed

MapR [SPARK-181] Kafka 0.10 Structured Streaming unit tests fixed

[MSPARK-31305] Spark History server NOT loading applications submitted by users other than 'mapr'

MapR [SPARK-175] Fix Spark Project Streaming unit tests

[MAPR-SPARK-176] Fix Spark Project Catalyst unit tests

[MAPR-SPARK-178] Fix Spark Project Hive unit tests

MapR [SPARK-174] Spark Core unit tests fixed

Changed version for spark-kafka connector.

MapR [SPARK-202] Update MapR Spark to 2.3.0

Fixed compile time errors in tests

Change project version

[SPARK-198] Update hadoop dependency version to 2.7.0-mapr-1803 for Spark 2.2.1

MapR [SPARK-188] Couldn't connect to thrift server via spark beeline on kerberos cluster

MapR [SPARK-143] Spark History Server does not require login for secured-by-default clusters

MapR [SPARK-186] Update OJAI versions to the latest for Spark-2.2.1 OJAI Connector

MapR [SPARK-191] Incorrect work of MapR-DB Sink 'complete' and 'update' modes fixed

MapR [SPARK-170] StackOverflowException in equals method in DBMapValue

2.2.1 build fixed (apache#231)

* MapR [SPARK-164] Update Kafka version to 1.0.1-mapr in Spark Kafka Producer module

MapR [SPARK-161] Include Kafka Structured streaming jar to Spark package.

MapR [SPARK-155] Change Spark Master port from 8080

MapR [SPARK-153] Exception in spark job with configured labels on yarn-client mode

MapR [SPARK-152] Incorrect date string parsing fixed

MapR [SPARK-21] Structured Streaming MapR-DB Sink created

MapR [SPARK-135]  Spark 2.2 with MapR Streams ( Kafka 1.0) (apache#218)

* MapR [SPARK-135] Spark 2.2 with MapR Streams (Kafka 1.0)
Added functionality of MapR-Streams specific EOF handling.

MapR [SPARK-143] Spark History Server does not require login for secured-by-default clusters

Disable build failure if scalastyle checking fails.

MapR [SPARK-16] Change Spark version in Warden files and configure.sh

MapR [SPARK-144] Add insertToMapRDB method for rdd for Java API

[MAPR-30536]  Spark SQL queries on Map column fails after upgrade

MapR [SPARK-139] Remove "update" related APIs from connector

MapR [SPARK-140] Change the option name "tableName" to "tablePath" in the Spark/MapR-DB connectors.

MapR [SPARK-121] Spark OJAI JAVA: update functionality removed

MapR [SPARK-118] Spark OJAI Python: missed DataFrame import while moving imports in order to fix MapR [ZEP-101] interpreter issue

MapR [SPARK-118] Spark OJAI Python: move MapR DB Connector class importing in order to fix MapR [ZEP-101] interpreter issue

MapR [SPARK-117] Spark OJAI Python: Save functionality implementation

MapR [SPARK-131] Exception when try to save JSON table with Binary _id field

Spark OJAI JAVA: load to RDD, save from RDD implementation (apache#195)

* MapR [SPARK-124] Loading to JavaRDD implemented
* MapR [SPARK-124] MapRDBJavaSparkContext constructor changed
* MapR [SPARK-124] implemented RDD[Row] saving

MapR [SPARK-118] Spark OJAI Python: Read implementation

MapR [SPARK-128] MapRDB connector - wrong handle of null fields when nullable is false

* MapR [SPARK-121] Spark OJAI JAVA: Read to Dataset functionality implementation
* Minor refactoring

MapR [SPARK-125] Default value of idFieldPath parameter is not handled

MapR [SPARK-113] Hit java.lang.UnsupportedOperationException: empty.reduceLeft during loadFromMapRDB

Spark Mapr-DB connector was refactored according to Scala style
Removed code duplication

[MSPARK-107]idField information is lost in MapRDBDataFrameWriterFunctions.saveToMapRDB

configure.sh takes options to change ports

Kafka client excluded from package because correct version is located in "mapr classpath"

Changed Kafka version in Kafka producer module.

Branch spark 69 (apache#170)

* Fixing the wrong type casting of TimeStamp to OTimeStamp when read from spark dataFrame.

* SPARK-69: Problem with license when we try to read from json and write to maprdb

remove creating the /usr/local/spark link from configure.sh. This link will be created by private-pkg

remove include-maprdb from default profiles

added profiles in maprdb pom file instead of two pom files

Fixed maprdb connector dependencies.

Fixing the wrong type casting of TimeStamp to OTimeStamp when read from spark dataFrame.

changed port for spark-thriftserver as it conflicts with hive server

changed port for spark-thriftserver as it conflicts with hive server

remove .not_configured_yet file after success

Ojai connector fixed required java version

[MSPARK-45] Move Spark-OJAI connector code to Spark github repo (apache#132)

* SPARK-45 Move Spark-OJAI connector code to Spark github repo

* Fixing pom versions for maprdb spark connector.

* Changes made to the connector code to be compatible with 5.2.* and 6.0 clients.

Spark 2.1.0 mapr 29106 (apache#150)

* [SPARK-20922][CORE] Add whitelist of classes that can be deserialized by the launcher.

Blindly deserializing classes using Java serialization opens the code up to
issues in other libraries, since just deserializing data from a stream may
end up executing code (think readObject()).

Since the launcher protocol is pretty self-contained, there's just a handful
of classes it legitimately needs to deserialize, and they're in just two
packages, so add a filter that throws errors if classes from any other
package show up in the stream.

This also maintains backwards compatibility (the updated launcher code can
still communicate with the backend code in older Spark releases).

Tested with new and existing unit tests.
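A minimal sketch of that whitelisting idea (an illustration only, not the actual launcher code):

```scala
import java.io.{InputStream, InvalidClassException, ObjectInputStream, ObjectStreamClass}

// Sketch: only classes from an allowed set of packages may be deserialized;
// anything else is rejected before it can be instantiated.
class FilteredObjectInputStream(in: InputStream, allowedPackages: Seq[String])
  extends ObjectInputStream(in) {

  override protected def resolveClass(desc: ObjectStreamClass): Class[_] = {
    val name = desc.getName
    if (allowedPackages.exists(p => name.startsWith(p))) {
      super.resolveClass(desc)
    } else {
      throw new InvalidClassException(name, "class not allowed for deserialization")
    }
  }
}
```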

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#18166 from vanzin/SPARK-20922.

(cherry picked from commit 8efc6e9)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>

(cherry picked from commit 772a9b9)

* [SPARK-20922][CORE][HOTFIX] Don't use Java 8 lambdas in older branches.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#18178 from vanzin/SPARK-20922-hotfix.

Added security by default for historyserver

use waitForConsumerAssignment() instead of consumer.poll(0) for spark-29052

change MAPR_HADOOP_CLASSPATH in configure.sh for creating it by mapr-classpath.sh

change MAPR_HADOOP_CLASSPATH in configure.sh for creating it by mapr-classpath.sh

changes for mapr-classpath.sh

changes for mapr-classpath.sh

configure.sh changes

[SPARK-39] Classpath filter was added

Fixed impersonation when data read from MapR-DB via Spark-Hive.

added configure.sh and warden.spark-thriftserver.conf

hive-hbase-handler added to Spark jars

Fixed "Single message comes late"

28339 bug fixed

Spark streaming skipped message with zero offset from Kafka 0.9

[MSPARK-9] Initial fix for Spark unit tests

Bump dependencies after ECO-1703 release

[SPARK-33] Streaming example fixed

[MAPR-26060] Fixed case when mapr-streams make gaps in offsets

ported features from kafka 10 to kafka 9

[MAPR-26289][SPARK-2.1] Streaming general improvements (apache#93)

* Added include-kafka-09 profile to Assembly
* Set default poll timeout to 120s

Set default HBase version to 1.1.8

Changes from Kafka10  package were ported to Kafka09 package.

[MAPR-26053] Include MapR Classes to the default value of spark.sql.hive.metastore.sharedPrefixes

[MAPR-25807] Spark-Warehouse path computes incorrectly

Add MapR-SASL support for Thrift Server

Adding scala library.

[MAPR-25713] Spark might try to load MapR Class Loader multiple times and fail

[MAPR-25311] Bump Spark dependencies after ECO-1611 release

[MINOR] Fix spark-jars.sh script

[MAPR-24603] Could not launch beeline shell after starting spark thrift server

fixed syntax error in V09DirectKafkaWordCount example

Spark 2.0.1 MAPR-streams Python API

[MAPR-24415] SPARK_JAVA_OPTS is deprecated

Kafka streaming producer added.

Minor fix for previous commit

Added script for MAPR-24374

Some minor changes to spark-defaults.conf

Changed default HBase version to 1.1.1 in compatibility.version

Streaming example was refactored

[MAPR-24470] HiveFromSpark test fails in yarn-cluster mode

Added MapR Repo

[MAPR-22940] Failed to connect spark beeline (after spark thrift server is started) on Kerberos cluster

[MAPR-18865] Unable to submit spark apps from Windows client

Skip maven clean task on the parent module

New: Issue with running Hive commands in Spark

This is fixed in SPARK-7819
Isolated Hive Client Loader appears to cause Native Library
libMapRClient.4.0.2-mapr.so already loaded in another classloader error

Spark warden.services.conf should have dependency on cldb

Remove DFS shuffle settings.

These settings are not used right now.

Copy every file in the conf directory into the distribution package.

Create spark-defaults.conf for MapR

Settings to enable DFS shuffle on MapR.

Support hbase classpath computation in util script.

Adding external conf and scripts.

Enable SPARK_HIVE mode while building.

This is needed to bundle datanucleus jars needed for hive table creation.

Build Spark on MapR.
- make-distribution.sh takes an environment variable to enable profiles -
  MVN_PROFILE_ARG
- Added warden conf files under ext-conf.
- Updated pom.xml to use right set of jars and version.

Spark Master failed to start in HA mode

Updated Apache Curator version

Added spark streaming integration with kafka 0.9 and mapr-streams

Added MapR Repo
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Dec 16, 2022
…aging (Sort with expressions) (apache#525)

RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Apr 7, 2023
…aging (Sort with expressions) (apache#525)

* [SPARK-39784][SQL] Put Literal values on the right side of the data source filter after translating Catalyst Expression to data source filter

### What changes were proposed in this pull request?

Even though the literal value could be on both sides of the filter, e.g. both `a > 1` and `1 < a` are valid, after translating Catalyst Expression to data source filter, we want the literal value on the right side so it's easier for the data source to handle these filters. We do this kind of normalization for V1 Filter. We should have the same behavior for V2 Filter.

Before this PR, for the filters that have literal values on the right side, e.g. `1 > a`, we keep it as is. After this PR, we will normalize it to `a < 1` so the data source doesn't need to check each of the filters (and do the flip).

### Why are the changes needed?
I think we should follow V1 Filter's behavior, normalize the filters during catalyst Expression to DS Filter translation time to make the literal values on the right side, so later on, data source doesn't need to check every single filter to figure out if it needs to flip the sides.
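A minimal sketch of that normalization in terms of Catalyst's comparison expressions (an illustration of the idea, not the code path this PR changes):

```scala
import org.apache.spark.sql.catalyst.expressions._

// Sketch: if the literal ended up on the left-hand side, swap the operands and
// flip the comparison so the literal is always on the right.
def literalToRight(e: Expression): Expression = e match {
  case GreaterThan(l: Literal, r)        => LessThan(r, l)          // 1 > a  ==>  a < 1
  case GreaterThanOrEqual(l: Literal, r) => LessThanOrEqual(r, l)
  case LessThan(l: Literal, r)           => GreaterThan(r, l)
  case LessThanOrEqual(l: Literal, r)    => GreaterThanOrEqual(r, l)
  case EqualTo(l: Literal, r)            => EqualTo(r, l)
  case other                             => other
}
```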

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?
new test

Closes apache#37197 from huaxingao/flip.

Authored-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>

* [SPARK-39836][SQL] Simplify V2ExpressionBuilder by extract common method

### What changes were proposed in this pull request?
Currently, `V2ExpressionBuilder` has a lot of similar code; we can extract it into one common method.

We can simplify the implementation with the common method.

### Why are the changes needed?
Simplify `V2ExpressionBuilder` by extract common method.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update inner implementation.

### How was this patch tested?
N/A

Closes apache#37249 from beliefer/SPARK-39836.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39858][SQL] Remove unnecessary `AliasHelper` or `PredicateHelper` for some rules

### What changes were proposed in this pull request?
While using `AliasHelper`, I found that some rules inherit it without actually using it.

This PR removes unnecessary `AliasHelper` or `PredicateHelper` in the following cases:
- The rule inherit `AliasHelper` instead of using it. In this case, we can remove `AliasHelper` directly.
- The rule inherit `PredicateHelper` instead of using it. In this case, we can remove `PredicateHelper` directly.
- The rule inherit `AliasHelper` and `PredicateHelper`. In fact, `PredicateHelper` already extends `AliasHelper`. In this case, we can remove `AliasHelper`.
- The rule inherit `OperationHelper` and `PredicateHelper`. In fact, `OperationHelper` already extends `PredicateHelper`. In this case, we can remove `PredicateHelper`.
- The rule inherit `PlanTest` and `PredicateHelper`. In fact, `PlanTest` already extends `PredicateHelper`.  In this case, we can remove `PredicateHelper`.
- The rule inherit `QueryTest` and `PredicateHelper`. In fact, `QueryTest` already extends `PredicateHelper`.  In this case, we can remove `PredicateHelper`.

### Why are the changes needed?
Remove unnecessary `AliasHelper` or `PredicateHelper` for some rules

### Does this PR introduce _any_ user-facing change?
'No'.
Just improve the inner implementation.

### How was this patch tested?
N/A

Closes apache#37272 from beliefer/SPARK-39858.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39784][SQL][FOLLOW-UP] Use BinaryComparison instead of Predicate (if) for type check

### What changes were proposed in this pull request?
follow up this [comment](apache#37197 (comment))

### Why are the changes needed?
code simplification

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test

Closes apache#37278 from huaxingao/followup.

Authored-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-39909] Organize the check of push down information for JDBCV2Suite

### What changes were proposed in this pull request?
This PR changes the check method from `check(one_large_string)` to `check(small_string1, small_string2, ...)`

### Why are the changes needed?
It helps us check the results individually and makes the code clearer.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
existing tests

Closes apache#37342 from yabola/fix.

Authored-by: chenliang.lu <marssss2929@gmail.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>

* [SPARK-39961][SQL] DS V2 push-down translate Cast if the cast is safe

### What changes were proposed in this pull request?
Currently, DS V2 push-down translates `Cast` only if ANSI mode is true.
In fact, if the cast is safe (e.g. casting a number to string, or an int to long), we can translate it too.

This PR calls `Cast.canUpCast` so that we can translate `Cast` to a V2 `Cast` safely.

Note: the rule `SimplifyCasts` optimizes away some safe casts, e.g. int to long, so we may not always see the `Cast`.
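A small sketch of the safety check described here (an assumption about shape only; it is not the exact guard in `V2ExpressionBuilder`):

```scala
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.types._

// Sketch: only treat a cast as translatable when it cannot lose information.
def safeToTranslate(from: DataType, to: DataType): Boolean = Cast.canUpCast(from, to)

// e.g. safeToTranslate(IntegerType, LongType) is true,
//      while safeToTranslate(LongType, IntegerType) is false (it could overflow).
```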

### Why are the changes needed?
Add the range for DS V2 push down `Cast`.

### Does this PR introduce _any_ user-facing change?
'Yes'.
`Cast` could be pushed down to data source in more cases.

### How was this patch tested?
Test cases updated.

Closes apache#37388 from beliefer/SPARK-39961.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-38901][SQL] DS V2 supports push down misc functions

### What changes were proposed in this pull request?

Currently, Spark has some misc functions. Please refer to
https://github.com/apache/spark/blob/2f8613f22c0750c00cf1dcfb2f31c431d8dc1be7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L688

These functions are shown below:
`AES_ENCRYPT,`
`AES_DECRYPT`,
`SHA1`,
`SHA2`,
`MD5`,
`CRC32`

Function|PostgreSQL|ClickHouse|H2|MySQL|Oracle|Redshift|Snowflake|DB2|Vertica|Exasol|SqlServer|Yellowbrick|Mariadb|Singlestore|
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
`AesEncrypt`|Yes|Yes|Yes|Yes|Yes|NO|Yes|Yes|NO|NO|NO|Yes|Yes|Yes|
`AesDecrypt`|Yes|Yes|Yes|Yes|Yes|NO|Yes|Yes|NO|NO|NO|Yes|Yes|Yes|
`Sha1`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Sha2`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Md5`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Crc32`|No|Yes|No|Yes|NO|Yes|NO|Yes|NO|NO|NO|NO|NO|Yes|

DS V2 should support pushing down these misc functions.

### Why are the changes needed?

DS V2 supports pushing down misc functions.

### Does this PR introduce _any_ user-facing change?

'No'.
New feature.

### How was this patch tested?

New tests.

Closes apache#37169 from chenzhx/misc.

Authored-by: chenzhx <chen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39964][SQL] DS V2 pushdown should unify the translate path

### What changes were proposed in this pull request?
Currently, DS V2 pushdown has two translate paths: `DataSourceStrategy.translateAggregate`, used to translate aggregate functions, and `V2ExpressionBuilder`, used to translate other functions and expressions. We can unify them.

After this PR, translation has only one code path, which is easier for developers to write and read.

### Why are the changes needed?
Unify the translate path for DS V2 pushdown.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update the inner implementation.

### How was this patch tested?
N/A

Closes apache#37391 from beliefer/SPARK-39964.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with expressions)

### What changes were proposed in this pull request?
Currently, DS V2 aggregate push-down cannot work with DS V2 Top N push-down (`ORDER BY col LIMIT m`) or DS V2 Paging push-down (`ORDER BY col LIMIT m OFFSET n`).
If we can push down an aggregate together with Top N or Paging, performance will be better.

This PR only lets an aggregate be pushed down when the ORDER BY expressions are also GROUP BY expressions.

The idea of this PR is:
1. When we hand the expected outputs to `ScanBuilderHolder`, hold a map from those expected outputs back to the original expressions (containing the original columns).
2. When we try to push down Top N or Paging, restore the original expressions for `SortOrder` (see the sketch below).
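A conceptual sketch of step 2 above (not the actual rule in Spark; it only shows the substitution idea):

```scala
import org.apache.spark.sql.catalyst.expressions._

// Sketch: given the map from the aggregate's output attributes back to the
// original (grouping) expressions, rewrite a SortOrder so that Top N / Paging
// can still be pushed down below the pushed aggregate.
def restoreSortOrder(order: SortOrder, outputToOrigin: Map[Attribute, Expression]): SortOrder =
  order.transform {
    case a: Attribute if outputToOrigin.contains(a) => outputToOrigin(a)
  }.asInstanceOf[SortOrder]
```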

### Why are the changes needed?
Let DS V2 aggregate push-down work with Top N or Paging (sort with group expressions), so users get better performance.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New test cases.

Closes apache#37320 from beliefer/SPARK-39819_new.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39929][SQL] DS V2 supports push down string functions (non ANSI)

**What changes were proposed in this pull request?**

Support more commonly used string functions:

BIT_LENGTH
CHAR_LENGTH
CONCAT

The mainstream databases that support these functions are shown below.

Function | PostgreSQL | ClickHouse | H2 | MySQL | Oracle | Redshift | Presto | Teradata | Snowflake | DB2 | Vertica | Exasol | SqlServer | Yellowbrick | Impala | Mariadb | Druid | Pig | SQLite | Influxdata | Singlestore | ElasticSearch
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
BIT_LENGTH | Yes | Yes | Yes | Yes | Yes | no | no | no | no | Yes | Yes | Yes | no | Yes | no | Yes | no | no | no | no | no | Yes
CHAR_LENGTH | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | no | Yes | Yes | Yes | Yes
CONCAT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | no | no | no | Yes | Yes

**Why are the changes needed?**
DS V2 supports pushing down string functions.

**Does this PR introduce any user-facing change?**
'No'.
New feature.

**How was this patch tested?**
New tests.

Closes apache#37427 from zheniantoushipashi/SPARK-39929.

Authored-by: biaobiao.sun <1319027852@qq.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-38899][SQL][FOLLOWUP] Fix bug extract datetime in DS V2 pushdown

### What changes were proposed in this pull request?

[SPARK-38899](apache#36663) supports the extract function in the JDBC data source.
But the implementation is incorrect.
Without this fix, the new test case added in this PR fails.
The test case is shown below.
```
test("scan with filter push-down with date time functions")  {
    val df9 = sql("SELECT name FROM h2.test.datetime WHERE " +
      "dayofyear(date1) > 100 order by dayofyear(date1) limit 1")
    checkFiltersRemoved(df9)
    val expectedPlanFragment9 =
      "PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], " +
      "PushedTopN: ORDER BY [EXTRACT(DAY_OF_YEAR FROM DATE1) ASC NULLS FIRST] LIMIT 1,"
    checkPushedInfo(df9, expectedPlanFragment9)
    checkAnswer(df9, Seq(Row("alex")))
  }
```

The failing test output is shown below.
```
"== Parsed Logical Plan ==
'GlobalLimit 1
+- 'LocalLimit 1
   +- 'Sort ['dayofyear('date1) ASC NULLS FIRST], true
      +- 'Project ['name]
         +- 'Filter ('dayofyear('date1) > 100)
            +- 'UnresolvedRelation [h2, test, datetime], [], false

== Analyzed Logical Plan ==
name: string
GlobalLimit 1
+- LocalLimit 1
   +- Project [name#x]
      +- Sort [dayofyear(date1#x) ASC NULLS FIRST], true
         +- Project [name#x, date1#x]
            +- Filter (dayofyear(date1#x) > 100)
               +- SubqueryAlias h2.test.datetime
                  +- RelationV2[NAME#x, DATE1#x, TIME1#x] h2.test.datetime test.datetime

== Optimized Logical Plan ==
Project [name#x]
+- RelationV2[NAME#x] test.datetime

== Physical Plan ==
*(1) Scan org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$145f6181a [NAME#x] PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], PushedTopN: ORDER BY [org.apache.spark.sql.connector.expressions.Extract3b95fce9 ASC NULLS FIRST] LIMIT 1, ReadSchema: struct<NAME:string>

" did not contain "PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], PushedTopN: ORDER BY [EXTRACT(DAY_OF_YEAR FROM DATE1) ASC NULLS FIRST] LIMIT 1,"
```

### Why are the changes needed?

Fix an implementation bug.
The cause of the bug is that the Extract function does not implement the toString method when being pushed down to the JDBC data source.

### Does this PR introduce _any_ user-facing change?

'No'.
New feature.

### How was this patch tested?

New test case.

Closes apache#37469 from chenzhx/spark-master.

Authored-by: chenzhx <chen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* code update

Signed-off-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: huaxingao <huaxin_gao@apple.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: chenliang.lu <marssss2929@gmail.com>
Co-authored-by: biaobiao.sun <1319027852@qq.com>
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Dec 8, 2023
…aging (Sort with expressions) (apache#525)

* [SPARK-39784][SQL] Put Literal values on the right side of the data source filter after translating Catalyst Expression to data source filter

Even though the literal value could be on both sides of the filter, e.g. both `a > 1` and `1 < a` are valid, after translating Catalyst Expression to data source filter, we want the literal value on the right side so it's easier for the data source to handle these filters. We do this kind of normalization for V1 Filter. We should have the same behavior for V2 Filter.

Before this PR, for the filters that have literal values on the right side, e.g. `1 > a`, we keep it as is. After this PR, we will normalize it to `a < 1` so the data source doesn't need to check each of the filters (and do the flip).

I think we should follow V1 Filter's behavior, normalize the filters during catalyst Expression to DS Filter translation time to make the literal values on the right side, so later on, data source doesn't need to check every single filter to figure out if it needs to flip the sides.

no

new test

Closes apache#37197 from huaxingao/flip.

Authored-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>

* [SPARK-39836][SQL] Simplify V2ExpressionBuilder by extract common method

Currently, `V2ExpressionBuilder` have a lot of similar code, we can extract them as one common method.

We can simplify the implement with the common method.

Simplify `V2ExpressionBuilder` by extract common method.

'No'.
Just update inner implementation.

N/A

Closes apache#37249 from beliefer/SPARK-39836.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39858][SQL] Remove unnecessary `AliasHelper` or `PredicateHelper` for some rules

When I using `AliasHelper`, I find that some rules inherit it instead of using it.

This PR removes unnecessary `AliasHelper` or `PredicateHelper` in the following cases:
- The rule inherit `AliasHelper` instead of using it. In this case, we can remove `AliasHelper` directly.
- The rule inherit `PredicateHelper` instead of using it. In this case, we can remove `PredicateHelper` directly.
- The rule inherit `AliasHelper` and `PredicateHelper`. In fact, `PredicateHelper` already extends `AliasHelper`. In this case, we can remove `AliasHelper`.
- The rule inherit `OperationHelper` and `PredicateHelper`. In fact, `OperationHelper` already extends `PredicateHelper`. In this case, we can remove `PredicateHelper`.
- The rule inherit `PlanTest` and `PredicateHelper`. In fact, `PlanTest` already extends `PredicateHelper`.  In this case, we can remove `PredicateHelper`.
- The rule inherit `QueryTest` and `PredicateHelper`. In fact, `QueryTest` already extends `PredicateHelper`.  In this case, we can remove `PredicateHelper`.

Remove unnecessary `AliasHelper` or `PredicateHelper` for some rules

'No'.
Just improve the inner implementation.

N/A

Closes apache#37272 from beliefer/SPARK-39858.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39784][SQL][FOLLOW-UP] Use BinaryComparison instead of Predicate (if) for type check

Follow-up to this [comment](apache#37197 (comment))

code simplification

No

Existing test

Closes apache#37278 from huaxingao/followup.

Authored-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-39909] Organize the check of push down information for JDBCV2Suite

This PR changes the check method from `check(one_large_string)` to `check(small_string1, small_string2, ...)`

It lets us check the results individually and makes the code clearer.
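
A minimal sketch of the shape of such a check (illustrative only, not the exact suite helper, which takes a DataFrame and inspects its explain output):

```
// Asserting each expected fragment separately makes a failure point at the specific missing
// piece of pushed-down information instead of one large string comparison.
def checkFragments(explainOutput: String, expectedFragments: String*): Unit =
  expectedFragments.foreach { fragment =>
    assert(explainOutput.contains(fragment), s"explain output does not contain: $fragment")
  }

// checkFragments(plan, "PushedFilters: [DEPT IS NOT NULL]", "PushedAggregates: [MAX(SALARY)]")
```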

no

existing tests

Closes apache#37342 from yabola/fix.

Authored-by: chenliang.lu <marssss2929@gmail.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>

* [SPARK-39961][SQL] DS V2 push-down translate Cast if the cast is safe

Currently, DS V2 push-down translates `Cast` only if ANSI mode is enabled.
In fact, if the cast is safe (e.g. casting a number to string, or an int to long), we can translate it too.

This PR calls `Cast.canUpCast` so that we can translate `Cast` to the V2 `Cast` safely.

Note: the rule `SimplifyCasts` optimizes away some safe casts, e.g. casting int to long, so we may not see the `Cast` at all.
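
Since the change hinges on `Cast.canUpCast`, a small sketch of the kind of check it performs (assumes spark-catalyst on the classpath; the examples are illustrative):

```
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.types.{IntegerType, LongType, StringType}

// Widening an int to a long never loses information, so it is safe to push down.
assert(Cast.canUpCast(IntegerType, LongType))

// Casting a number to a string is also lossless, matching the "cast number to string" example above.
assert(Cast.canUpCast(IntegerType, StringType))

// Narrowing a long to an int can overflow, so such a cast should not be translated.
assert(!Cast.canUpCast(LongType, IntegerType))

// Casting a string to an int can fail at runtime, so it is not an up-cast either.
assert(!Cast.canUpCast(StringType, IntegerType))
```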

Broaden the range of `Cast` expressions that DS V2 can push down.

'Yes'.
`Cast` could be pushed down to data source in more cases.

Test cases updated.

Closes apache#37388 from beliefer/SPARK-39961.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-38901][SQL] DS V2 supports push down misc functions

Currently, Spark has some misc functions. Please refer to
https://github.com/apache/spark/blob/2f8613f22c0750c00cf1dcfb2f31c431d8dc1be7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L688

These functions are shown below:
`AES_ENCRYPT`,
`AES_DECRYPT`,
`SHA1`,
`SHA2`,
`MD5`,
`CRC32`

Function|PostgreSQL|ClickHouse|H2|MySQL|Oracle|Redshift|Snowflake|DB2|Vertica|Exasol|SqlServer|Yellowbrick|Mariadb|Singlestore|
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
`AesEncrypt`|Yes|Yes|Yes|Yes|Yes|NO|Yes|Yes|NO|NO|NO|Yes|Yes|Yes|
`AesDecrypt`|Yes|Yes|Yes|Yes|Yes|NO|Yes|Yes|NO|NO|NO|Yes|Yes|Yes|
`Sha1`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Sha2`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Md5`|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
`Crc32`|No|Yes|No|Yes|NO|Yes|NO|Yes|NO|NO|NO|NO|NO|Yes|

DS V2 should support pushing down these misc functions.
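
A hedged example of what this enables. It assumes a JDBC V2 catalog has been registered under the name `h2` (e.g. via `spark.sql.catalog.h2 = org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog`) and that a table `h2.test.employee` exists; the table, column, and digest literal are placeholders.

```
import org.apache.spark.sql.SparkSession

// Sketch only: requires the "h2" JDBC V2 catalog and table described above to exist.
val spark = SparkSession.builder().master("local[*]").appName("misc-pushdown-sketch").getOrCreate()

val df = spark.sql(
  """SELECT name
    |FROM h2.test.employee
    |WHERE md5(cast(name AS BINARY)) = '<hex-digest-placeholder>'""".stripMargin)

// With this change, the predicate may show up under PushedFilters in the explain output
// (for dialects that can compile MD5) instead of being evaluated by Spark after the scan.
df.explain(true)
```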

DS V2 supports pushing down misc functions.

'No'.
New feature.

New tests.

Closes apache#37169 from chenzhx/misc.

Authored-by: chenzhx <chen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39964][SQL] DS V2 pushdown should unify the translate path

Currently, DS V2 pushdown has two translation paths: `DataSourceStrategy.translateAggregate`, used to translate aggregate functions, and `V2ExpressionBuilder`, used to translate other functions and expressions. We can unify them.

After this PR, translation goes through a single code path, which is easier for developers to write and read.

Unify the translation path for DS V2 pushdown.

'No'.
Just update the inner implementation.

N/A

Closes apache#37391 from beliefer/SPARK-39964.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39819][SQL] DS V2 aggregate push down can work with Top N or Paging (Sort with expressions)

Currently, DS V2 aggregate push-down cannot work with DS V2 Top N push-down (`ORDER BY col LIMIT m`) or DS V2 Paging push-down (`ORDER BY col LIMIT m OFFSET n`).
If we can push down an aggregate together with Top N or Paging, performance will be better.

This PR only lets an aggregate be pushed down together with ORDER BY expressions that are also GROUP BY expressions.

The ideas of this PR are:
1. When we set the expected outputs of `ScanBuilderHolder`, also hold the map from the expected outputs to the original expressions (which contain the original columns).
2. When we try to push down Top N or Paging, restore the original expressions for `SortOrder`.
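
A hedged example of the kind of query that can now be fully pushed down, under the same "registered `h2` JDBC V2 catalog" assumption as the earlier sketch; `dept` and `salary` are placeholder column names.

```
import org.apache.spark.sql.SparkSession

// Sketch only: requires the "h2" JDBC V2 catalog and a test.employee table to exist.
val spark = SparkSession.builder().master("local[*]").appName("agg-topn-pushdown-sketch").getOrCreate()

// ORDER BY uses only the GROUP BY expression, so with this change both the aggregate and the
// Top N (ORDER BY ... LIMIT) can be pushed to the data source instead of being split.
val top2Depts = spark.sql(
  """SELECT dept, MAX(salary) AS max_salary
    |FROM h2.test.employee
    |GROUP BY dept
    |ORDER BY dept
    |LIMIT 2""".stripMargin)

// The explain output may then contain PushedAggregates/PushedGroupByExpressions together
// with a PushedTopN entry, rather than Spark sorting and limiting after the scan.
top2Depts.explain(true)
```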

Let DS V2 aggregate push-down work with Top N or Paging (sort with group expressions), so users can get better performance.

'No'.
New feature.

New test cases.

Closes apache#37320 from beliefer/SPARK-39819_new.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39929][SQL] DS V2 supports push down string functions (non ANSI)

**What changes were proposed in this pull request?**

Support more commonly used string functions:

BIT_LENGTH
CHAR_LENGTH
CONCAT

Mainstream database support for these functions is shown below.

Function | PostgreSQL | ClickHouse | H2 | MySQL | Oracle | Redshift | Presto | Teradata | Snowflake | DB2 | Vertica | Exasol | SqlServer | Yellowbrick | Impala | Mariadb | Druid | Pig | SQLite | Influxdata | Singlestore | ElasticSearch
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
BIT_LENGTH | Yes | Yes | Yes | Yes | Yes | no | no | no | no | Yes | Yes | Yes | no | Yes | no | Yes | no | no | no | no | no | Yes
CHAR_LENGTH | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | no | Yes | Yes | Yes | Yes
CONCAT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | no | no | no | Yes | Yes
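
A hedged illustration under the same assumptions as the earlier sketches (a JDBC V2 catalog registered as `h2`; table and column names are placeholders).

```
import org.apache.spark.sql.SparkSession

// Sketch only: requires the "h2" JDBC V2 catalog and a test.employee table to exist.
val spark = SparkSession.builder().master("local[*]").appName("string-pushdown-sketch").getOrCreate()

// CHAR_LENGTH and CONCAT are now translatable to their V2 counterparts, so a predicate like
// this one can be compiled and pushed down to databases that support those functions.
val df = spark.sql(
  """SELECT name
    |FROM h2.test.employee
    |WHERE char_length(concat(name, '_suffix')) > 10""".stripMargin)

// For dialects that support it, the filter may appear under PushedFilters rather than being
// evaluated by Spark after the scan.
df.explain(true)
```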

**Why are the changes needed?**
DS V2 supports pushing down string functions.

**Does this PR introduce any user-facing change?**
'No'.
New feature.

**How was this patch tested?**
New tests.

Closes apache#37427 from zheniantoushipashi/SPARK-39929.

Authored-by: biaobiao.sun <1319027852@qq.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-38899][SQL][FOLLOWUP] Fix bug extract datetime in DS V2 pushdown

[SPARK-38899](apache#36663) added support for the extract function in the JDBC data source,
but the implementation is incorrect.
This PR adds a test case that fails without the fix.
The test case is shown below.
```
test("scan with filter push-down with date time functions")  {
    val df9 = sql("SELECT name FROM h2.test.datetime WHERE " +
      "dayofyear(date1) > 100 order by dayofyear(date1) limit 1")
    checkFiltersRemoved(df9)
    val expectedPlanFragment9 =
      "PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], " +
      "PushedTopN: ORDER BY [EXTRACT(DAY_OF_YEAR FROM DATE1) ASC NULLS FIRST] LIMIT 1,"
    checkPushedInfo(df9, expectedPlanFragment9)
    checkAnswer(df9, Seq(Row("alex")))
  }
```

The failing test output is shown below.
```
"== Parsed Logical Plan ==
'GlobalLimit 1
+- 'LocalLimit 1
   +- 'Sort ['dayofyear('date1) ASC NULLS FIRST], true
      +- 'Project ['name]
         +- 'Filter ('dayofyear('date1) > 100)
            +- 'UnresolvedRelation [h2, test, datetime], [], false

== Analyzed Logical Plan ==
name: string
GlobalLimit 1
+- LocalLimit 1
   +- Project [name#x]
      +- Sort [dayofyear(date1#x) ASC NULLS FIRST], true
         +- Project [name#x, date1#x]
            +- Filter (dayofyear(date1#x) > 100)
               +- SubqueryAlias h2.test.datetime
                  +- RelationV2[NAME#x, DATE1#x, TIME1#x] h2.test.datetime test.datetime

== Optimized Logical Plan ==
Project [name#x]
+- RelationV2[NAME#x] test.datetime

== Physical Plan ==
*(1) Scan org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$145f6181a [NAME#x] PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], PushedTopN: ORDER BY [org.apache.spark.sql.connector.expressions.Extract3b95fce9 ASC NULLS FIRST] LIMIT 1, ReadSchema: struct<NAME:string>

" did not contain "PushedFilters: [DATE1 IS NOT NULL, EXTRACT(DAY_OF_YEAR FROM DATE1) > 100], PushedTopN: ORDER BY [EXTRACT(DAY_OF_YEAR FROM DATE1) ASC NULLS FIRST] LIMIT 1,"
```

Fix an implementation bug.
The reason for the bug is that the Extract expression does not implement the toString method used when pushing down to the JDBC data source.
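
A minimal sketch of the kind of toString the fix needs to provide (toy class, not the actual `org.apache.spark.sql.connector.expressions.Extract`):

```
// Toy stand-in for the connector's Extract expression: without a proper toString, the plan
// string falls back to the default Object representation (the Extract3b95fce9 seen above);
// rendering it as EXTRACT(field FROM source) makes the pushed-down info readable and testable.
case class ExtractSketch(field: String, source: String) {
  override def toString: String = s"EXTRACT($field FROM $source)"
}

// ExtractSketch("DAY_OF_YEAR", "DATE1").toString == "EXTRACT(DAY_OF_YEAR FROM DATE1)"
```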

'No'.
New feature.

New test case.

Closes apache#37469 from chenzhx/spark-master.

Authored-by: chenzhx <chen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* code update

Signed-off-by: huaxingao <huaxin_gao@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: huaxingao <huaxin_gao@apple.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: chenliang.lu <marssss2929@gmail.com>
Co-authored-by: biaobiao.sun <1319027852@qq.com>
udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024
…cript

K8S-1077 (apache#598)

* K8S-1077 - use single k8s secret with user info

MapR [SPARK-651] Replacing joda-time-*.jar with joda-time-2.10.3.jar.

MapR [SPARK-638] Wrong permissions when creating files under directory
with GID bit set.

MapR [SPARK-627] SparkHistoryServer-2.4 is getting 403 Unauthorized home page for users(spark.ui.view.acls) via spark-submit

MapR [SPARK-639] Default headers are adding two times

MapR [SPARK-629] Spark UI for job lose CSS styles

MapR [MS-925] After upgrade to MEP 6.2 (Spark 2.4.0) can no longer
consume Kafka / MapR Streams.

MapR [SPARK-626] Update kafka dependencies for Spark 2.4.4.0 in release MEP-6.3.0

MapR [SPARK-340] Jetty web server version at Spark should be updated to v9.4.X

MapR [SPARK-617] Can't use ssl via spark beeline

MapR [SPARK-617] Can't use ssl via spark beeline

MapR [SPARK-620] Replace core dependency in Spark-2.4.4

MapR [SPARK-621] Fix multiple XML configuration initialization for (apache#575)

custom headers. Use X-XSS-Protection, X-Content-Type-Options
Content-Security-Policy and Strict-Transport-Security configuration
only in case: cluster security is enabled OR
spark.ui.security.headers.enabled set to true.

MapR [SPARK-595] Spark cannot access hs2 through zookeeper

Revert "MapR [SPARK-595] Spark cannot access hs2 through zookeeper (apache#577)"

MapR [SPARK-595] Spark cannot access hs2 through zookeeper

MapR [SPARK-620] Replace core dependency in Spark-2.4.

MapR [SPARK-619] Move absent commits from 2.4.3 branch to 2.4.4 (apache#574)

* Adding SQL API to write to kafka from Spark (apache#567)

* Branch 2.4.3 extended kafka and examples (apache#569)

* The v2 API is in its own package

- the v2 api is in a different package
- the old functionality is available in a separated package

* v2 API examples

- All the examples are using the newest API.
- I have removed the old examples since they are not relevant any more and the same functionality is shown in the new examples using the new API.

* MapR [SPARK-619] Move absent commits from 2.4.3 branch to 2.4.4

CORE-321. Add custom http header support for jetty.

MapR [SPARK-609] Port Apache Spark-2.4.4 changes to the MapR Spark-2.4.4 branch

Adding multi table loader (apache#560)

* Adding multi table loader

- This allows us to load multiple matching tables into one Union DataFrame.

If we have the following MFS structure:

```
/clients/client_1/data.table
/clients/client_2/data.table
```
we can load a union dataframe by doing `loadFromMapRDB("/clients/*/*.table")`

* Fixing the path to the reader

MapR [SPARK-588] Spark thriftserver fails when work with hive-maprdb json table

MapR [SPARK-598] Spark can't add needed properties to hive-site.xml

MAPR-SPARK-596: Change HBase compatible version for Spark 2.4.3

MapR [SPARK-592] Add possibility to use start-thriftserver.sh script with 2304 port

MapR [SPARK-584] MaprDB connector's setHintUsingIndex method doesn't work as expected

MapR [SPARK-583] MaprDB connector's loadFromMaprDB function for Java API doesn't work as expected

SPARK-579 info about ssl_trustore is added for metrics

MapR [SPARK-552] Failed to get broadcast_11_piece0 of broadcast_11

SPARK-569 Generation of SSL certificates for spark UI

MapR [SPARK-575] Warning messages in spark workspace after the second attempt to login to job's UI

Update zookeeper version

Adding `joinWithMapRDBTable` function (apache#529)

The related documentation of this function is here https://github.com/anicolaspp/MapRDBConnector#joinwithmaprdbtable.

The main idea is that, given a dataframe (no matter how it was constructed), we can join it with a MapR-DB table. This function looks at the join query and loads only those records from MapR-DB that will participate in the join, instead of loading the full table and then joining in memory. In other words, we only load what we know will be joined.

Adding DataSource Reader Support (apache#525)

* Adding DataSource Reader Support

* Update SparkSessionExt.scala

* creating a package object

* Update MapRDBSpark.scala

* fully path to avoid name collition

* refactorings

MapR [SPARK-451] Spark hadoop/core dependency updates

MapR [SPARK-566] Move absent commits from 2.4.0 branch

MapR [SPARK-561] Spark 2.4.3 porting to MapR

MapR [SPARK-561] Spark 2.4.3 porting to MapR

MapR [SPARK-558] Render application UI init page if driver is not up

MapR [SPARK-541] Avoid duplication of the first unexpired record

MapR [COLD-150][K8S] Fix metrics copy

MapR [K8S-893] Hide plain text password from logs

MapR [SPARK-540] Include 'avro' artifacts

MapR [SPARK-536] PySpark streaming package for kafka-0-10 added

K8S-853: Enable spark metrics for external tenant

MapR [SPARK-531] Remove duplicating entries from classpath in ClasspathFilter

MapR [SPARK-516] Spark jobs failure using yarn mode on kerberos fixed

MapR [SPARK-462] Spark and SparkHistoryServer allow weak ciphers, which can allow man in the middle attack

[SPARK-508] MapR-DB OJAI Connector for Spark isNull condition returns incorrect result

MapR [SPARK-510] nonmapr "admin" users not able to view other user logs in SHS

SPARK-460: Spark Metrics for CollectD Configuration for collecting Spark metrics

SPARK-463 MAPR_MAVEN_REPO variable for specifying mapR repository

MapR [SPARK-492] Spark 2.4.0.0 configure.sh has error messages

MapR [SPARK-515][K8S] Remove configure.sh call for k8s

MapR [SPARK-515] Move configuring spark-env.sh back to the private-pkg

MapR [SPARK-515] Move configuring spark-env.sh back to the private-pkg

MapR [SPARK-514] Recovery from checkpoint is broken

MapR [SPARK-445] Messages loss fixed by reverting [MAPR-32290] changes from kafka09 package (apache#460)

* MapR [SPARK-445] Revert "[MAPR-32290] Spark processing offsets when messages are already TTL in the first batch (apache#376)"

This reverts commit e8d59b9.

* MapR [SPARK-445] Revert "[MAPR-32290] Spark processing offsets when messages are already ttl in first batch (apache#368)"

This reverts commit b282a8b.

MapR [SPARK-445] Messages loss fixed by reverting [MAPR-32290] changes from kafka10 package

MapR [SPARK-469] Fix NPE in generated classes by reverting "[SPARK-23466][SQL] Remove redundant null checks in generated Java code by GenerateUnsafeProjection" (apache#455)

This reverts commit c5583fd.

MapR [SPARK-482] Spark streaming app fails to start by UnknownTopicOrPartitionException with checkpoint

MapR [SPARK-496] Spark HS UI doesn't work

MapR [SPARK-416] CVE-2018-1320 vulnerability in Apache Thrift

MapR [SPARK-486][K8S] Fix sasl encryption error on Kubernetes

MapR [SPARK-481] Cannot run spark configure.sh on Client node

MapR [K8S-637][K8S] Add configure.sh configuration in spark-defaults.conf for job runtime

MapR [SPARK-465] Error messages after update of spark 2.4

MapR [SPARK-465] Error messages after update of spark 2.4

MapR [SPARK-464] Can't submit spark 2.4 jobs from mapr-client

[SPARK-466] SparkR errors fixed

MapR [SPARK-456] Spark shell can't be started

SPARK-417 impersonation fixes for spark executor. Impersonation is mo… (apache#433)

* SPARK-417 impersonation fixes for spark executor. Impersonation is moved from HadoopRDD.compute() method to org.apache.spark.executor.Executor.run() method

* SPARK-363 Hive version changed to '1.2.0-mapr-spark-MEP-6.0.0'

[SPARK-449] Kafka offset commit issue fixed

MapR [SPARK-287] Move logic of creating /apps/spark folder from installer's scripts to the configure.sh

MapR [SPARK-221] Investigate possibility to move creating of the spark-env.sh from private-pkg to configure.sh

MapR [SPARK-430] PID files should be under /opt/mapr/pid

MapR [SPARK-446] Spark configure.sh doesn't start/stop Spark services

MapR [SPARK-434] Move absent commits from 2.3.2 branch (apache#425)

* MapR [SPARK-352] Spark shell fails with "NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream" if java is not available in PATH

* MapR [SPARK-350] Deprecate Spark Kafka-09 package

* MapR [SPARK-326] Investigate possibility of writing Java example for the MapRDB OJAI connector

* [SPARK-356] Merge mapr changes from kafka-09 package into the kafka-10

* SPARK-319 Fix for sparkR version check

* MapR [SPARK-349] Update OJAI client to v3 for Spark MapR-DB JSON connector

* MapR [SPARK-367] Move absent commits from 2.3.1 branch

* MapR [SPARK-137] Analyze the warning during compilation of OJAI connector

* MapR [SPARK-369] Spark 2.3.2 fails with error related to zookeeper

* [MAPR-26258] hbasecontext.HBaseDistributedScanExample fails

* [SPARK-24355] Spark external shuffle server improvement to better handle block fetch requests

* MapR [SPARK-374] Spark Hive example fails when we submit job from another(simple) cluster user

* MapR [SPARK-434] Move absent commits from 2.3.2 branch

* MapR [SPARK-434] Move absent commits from 2.3.2 branch

* MapR [SPARK-373] Unexpected behavior during job running in standalone cluster mode

* MapR [SPARK-419] Update hive-maprdb-json-handler jar for spark 2.3.2.0 and spark 2.2.1

* MapR [SPARK-396] Interface change of sendToKafka

* MapR [SPARK-357] consumer groups are prepended with a "service_" prefix

* MapR [SPARK-429] Changes in maprdb connector are the cause of broken backward compatibility

* MapR [SPARK-427] Update kafka in Spark-2.4.0 to the 1.1.1-mapr

* MapR [SPARK-434] Move absent commits from 2.3.2 branch

* Move absent commits from 2.3.2 branch

* MapR [SPARK-434] Move absent commits from 2.3.2 branch

* Move absent commits from 2.3.2 branch

* Move absent commits from 2.3.2 branch

MapR [SPARK-427] Update kafka in Spark-2.4.0 to the 1.1.1-mapr

MapR [SPARK-379] Spark 2.4 4-digit version

MapR [PIC-48][K8S] Port k8s changes to 2.4.0

[PIC-48] Create user for k8s driver and executor if required

[PIC-48] Create user for k8s driver and executor if required

Revert "Remove spark.ui.filters property"

This reverts commit d8941ba36c3451cdce15d18d6c1a52991de3b971.

[SPARK-351] Copy kubernetes start scripts anyway

PIC-34: Rename default configmap name to be consistent with mapr-kubernetes

[SPARK-23668][K8S] Add config option for passing through k8s Pod.spec.imagePullSecrets (apache#355)

Pass through the `imagePullSecrets` option to the k8s pod in order to allow user to access private image registries.

See https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/

Unit tests + manual testing.

Manual testing procedure:
1. Have private image registry.
2. Spark-submit application with no `spark.kubernetes.imagePullSecret` set. Do `kubectl describe pod ...`. See the error message:
```
Error syncing pod, skipping: failed to "StartContainer" for "spark-kubernetes-driver" with ErrImagePull: "rpc error: code = 2 desc = Error: Status 400 trying to pull repository ...: \"{\\n  \\\"errors\\\" : [ {\\n    \\\"status\\\" : 400,\\n    \\\"message\\\" : \\\"Unsupported docker v1 repository request for '...'\\\"\\n  } ]\\n}\""
```
3. Create secret `kubectl create secret docker-registry ...`
4. Spark-submit with `spark.kubernetes.imagePullSecret` set to the new secret. See that deployment was successful.

Author: Andrew Korzhuev <andrew.korzhuev@klarna.com>
Author: Andrew Korzhuev <korzhuev@andrusha.me>

Closes apache#20811 from andrusha/spark-23668-image-pull-secrets.

[SPARK-321] Change default value of spark.mapr.ssl.secret.prefix property

[PIC-32] Spark on k8s with MapR secure cluster

Update entrypoint.sh with correct spark version (apache#340)

This PR has minor fix to correct the spark version string

[SPARK-274] Create home directory for user who submitted job

[MAPR-SPARK-230] Implement security for Spark on Kubernetes

Run Spark job with specify the username for driver and executor

Read cluster configs from configMap

Run configure.sh script form entrypoint.sh

Remove spark.kubernetes.driver.pod.commands property

Add Spark properties for executor and driver environment variable

MapR [SPARK-296] Structured Streaming memory leak

Revert "[MAPR-SPARK-210] Rename sprk-defaults.conf to spark-defaults.conf.tem…" (apache#252)

* Revert "[MAPR-SPARK-176] Fix Spark Project Catalyst unit tests (apache#251)"

This reverts commit 5de05075cd14abf8ac65046a57a5d76617818fbe.

* Revert "[MAPR-SPARK-210] Rename sprk-defaults.conf to spark-defaults.conf.template (apache#249)"

This reverts commit 1baa677d727e89db7c605ffbae9a9eba00337ad0.

[MAPR-SPARK-210] Rename sprk-defaults.conf to spark-defaults.conf.template

MapR [SPARK-379] Port Spark to 2.4.0

MapR [SPARK-341] Spark 2.3.2 porting

[MAPR-32290] Spark processing offsets when messages are already TTL in the first batch

* Bug 32263 - Seek called on unsubscribed partitions

[MSPARK-331] Remove snapshot versions of mapr dependencies from Spark-2.3.1

[MAPR-32290] Spark processing offsets when messages are already ttl in first batch

MapR [SPARK-325] Add examples for work with the MapRDB JSON connector into the Spark project

[ATS-449] Unit test for EBF 32013 created.

MAPR-SPARK-311: Spark beeline uses default ssl truststore instead of mapr ssl truststore

Bug 32355 - Executor tab empty on Spark UI

[SPARK-318] Submitting Spark jobs from Oozie fails due to ClassNotFoundException

Bug 32014 - Spark Consumer fails with java.lang.AssertionError

Revert "[SPARK-306] Kafka clients 1.0.1 present in jars directory for Spark 2.3.1" (apache#341)

* Revert "[SPARK-306] Kafka clients 1.0.1 present in jars directory for Spark 2.3.1 (apache#335)"

This reverts commit 832411e.

Bug 32014 - Spark Consumer fails with java.lang.AssertionError (apache#326) (apache#336)

* MapR [32014] Spark Consumer fails with java.lang.AssertionError

[SPARK-306] Kafka clients 1.0.1 present in jars directory for Spark 2.3.1

DEVOPS-2768 temporarily removed curl for file downloading

[SPARK-302] Local privilege escalation

MapR [SPARK-297] Added unit test for empty value conversion

MapR [SPARK-297] Empty values are loaded as non-null

MapR [SPARK-296] Structured Streaming memory leak

2.3.1 spark 289 (apache#318)

* MapR [SPARK-289] Fix unit test for Spark-2.3.1

[SPARK-130] MapRDB connector - NPE while saving Pair RDD with 'null' values

MapR [SPARK-283] Unit tests fail during initialization SSL properties.

[SPARK-212] SparkHiveExample fails when we run it twice

MapR [SPARK-282] Remove maprfs and hadoop jars from mapr spark package

MapR [SPARK-278] Spark submit fails for jobs with python

MapR [SPARK-279] Can't connect to spark thrift server with new spark and hive packages

MapR [SPARK-276] Update zookeeper dependency to v.3.4.11 for spark 2.3.1

MapR [SPARK-272] Use only client passwords from ssl-client.xml

MapR [SPARK-266] Spark jobs can't finish correctly, when there is an error during job running

MapR [SPARK-263] Add possibility to use keyPassword which is different from keyStorePassword

[MSPARK-31632] RM UI showing broken page for Spark jobs

MapR [SPARK-261] Use mapr-security-web for getting passwords.

MapR [SPARK-259] Spark application doesn't finish correctly

MapR [SPARK-268] Update Spark version for Warden

change project version to 2.3.1-mapr-SNAPSHOT

MapR [SPARK-256] Spark doesn't work on yarn mode

MapR [SPARK-255] Installer fresh install 610/600 secure fails to start "mapr-spark-thriftserver", "mapr-spark-historyserver"

Mapr [SPARK-248] MapRDBTableScanRDD fails to convert to Scala Dataframe when using where clause

MapR [SPARK-225] Hadoop credentials provider usage for hiding passwords at spark-defaults

MapR [SPARK-214] Hive-2.1 properties can't be read from a hive-site.xml as Spark uses Hive-1.2

MapR [SPARK-216] Spark thriftserver fails when work with hive-maprdb json table

SPARK-244 (apache#278)

Provide ability to use MapR-Negotiation authentication for Spark HistoryServer

MapR [SPARK-226] Spark - pySpark Security Vulnerability

MapR [SPARK-220] SparkR fails with UDF functions bug fixed

MapR [SPARK-227] KafkaUtils.createDirectStream fails with kafka-09

MapR [SPARK-183] Spark Integration for Kafka 0.10 unit tests disabled

MapR [SPARK-182] Spark Project External Kafka Producer v09 unit tests fixed

MapR [SPARK-179] Spark Integration for Kafka 0.9 unit tests fixed

MapR [SPARK-181] Kafka 0.10 Structured Streaming unit tests fixed

[MSPARK-31305] Spark History server NOT loading applications submitted by users other than 'mapr'

MapR [SPARK-175] Fix Spark Project Streaming unit tests

[MAPR-SPARK-176] Fix Spark Project Catalyst unit tests

[MAPR-SPARK-178] Fix Spark Project Hive unit tests

MapR [SPARK-174] Spark Core unit tests fixed

Changed version for spark-kafka connector.

MapR [SPARK-202] Update MapR Spark to 2.3.0

Fixed compile time errors in tests

Change project version

[SPARK-198] Update hadoop dependency version to 2.7.0-mapr-1803 for Spark 2.2.1

MapR [SPARK-188] Couldn't connect to thrift server via spark beeline on kerberos cluster

MapR [SPARK-143] Spark History Server does not require login for secured-by-default clusters

MapR [SPARK-186] Update OJAI versions to the latest for Spark-2.2.1 OJAI Connector

MapR [SPARK-191] Incorrect work of MapR-DB Sink 'complete' and 'update' modes fixed

MapR [SPARK-170] StackOverflowException in equals method in DBMapValue

2.2.1 build fixed (apache#231)

* MapR [SPARK-164] Update Kafka version to 1.0.1-mapr in Spark Kafka Producer module

MapR [SPARK-161] Include Kafka Structured streaming jar to Spark package.

MapR [SPARK-155] Change Spark Master port from 8080

MapR [SPARK-153] Exception in spark job with configured labels on yarn-client mode

MapR [SPARK-152] Incorrect date string parsing fixed

MapR [SPARK-21] Structured Streaming MapR-DB Sink created

MapR [SPARK-135]  Spark 2.2 with MapR Streams ( Kafka 1.0) (apache#218)

* MapR [SPARK-135] Spark 2.2 with MapR Streams (Kafka 1.0)
Added functionality of MapR-Streams specific EOF handling.

MapR [SPARK-143] Spark History Server does not require login for secured-by-default clusters

Disable build failure if scalastyle checking fails.

MapR [SPARK-16] Change Spark version in Warden files and configure.sh

MapR [SPARK-144] Add insertToMapRDB method for rdd for Java API

[MAPR-30536]  Spark SQL queries on Map column fails after upgrade

MapR [SPARK-139] Remove "update" related APIs from connector

MapR [SPARK-140] Change the option name "tableName" to "tablePath" in the Spark/MapR-DB connectors.

MapR [SPARK-121] Spark OJAI JAVA: update functionality removed

MapR [SPARK-118] Spark OJAI Python: missed DataFrame import while moving imports in order to fix MapR [ZEP-101] interpreter issue

MapR [SPARK-118] Spark OJAI Python: move MapR DB Connector class importing in order to fix MapR [ZEP-101] interpreter issue

MapR [SPARK-117] Spark OJAI Python: Save functionality implementation

MapR [SPARK-131] Exception when try to save JSON table with Binary _id field

Spark OJAI JAVA: load to RDD, save from RDD implementation (apache#195)

* MapR [SPARK-124] Loading to JavaRDD implemented
* MapR [SPARK-124] MapRDBJavaSparkContext constructor changed
* MapR [SPARK-124] implemented RDD[Row] saving

MapR [SPARK-118] Spark OJAI Python: Read implementation

MapR [SPARK-128] MapRDB connector - wrong handle of null fields when nullable is false

* MapR [SPARK-121] Spark OJAI JAVA: Read to Dataset functionality implementation
* Minor refactoring

MapR [SPARK-125] Default value of idFieldPath parameter is not handled

MapR [SPARK-113] Hit java.lang.UnsupportedOperationException: empty.reduceLeft during loadFromMapRDB

Spark Mapr-DB connector was refactored according to Scala style
Removed code duplication

[MSPARK-107]idField information is lost in MapRDBDataFrameWriterFunctions.saveToMapRDB

configure.sh takes options to change ports

Kafka client excluded from package because correct version is located in "mapr classpath"

Changed Kafka version in Kafka producer module.

Branch spark 69 (apache#170)

* Fixing the wrong type casting of TimeStamp to OTimeStamp when read from spark dataFrame.

* SPARK-69: Problem with license when we try to read from json and write to maprdb

remove creating /usr/local/spark link from configure.sh. This link will be created by private-pkg

remove include-maprdb from default profiles

added profiles in maprdb pom file instead of two pom files

Fixed maprdb connector dependencies.

Fixing the wrong type casting of TimeStamp to OTimeStamp when read from spark dataFrame.

changed port for spark-thriftserver as it conflicts with hive server

changed port for spark-thriftserver as it conflicts with hive server

remove .not_configured_yet file after success

Ojai connector fixed required java version

[MSPARK-45] Move Spark-OJAI connector code to Spark github repo (apache#132)

* SPARK-45 Move Spark-OJAI connector code to Spark github repo

* Fixing pom versions for maprdb spark connector.

* Changes made to the connector code to be compatible with 5.2.* and 6.0 clients.

Spark 2.1.0 mapr 29106 (apache#150)

* [SPARK-20922][CORE] Add whitelist of classes that can be deserialized by the launcher.

Blindly deserializing classes using Java serialization opens the code up to
issues in other libraries, since just deserializing data from a stream may
end up executing code (think readObject()).

Since the launcher protocol is pretty self-contained, there's just a handful
of classes it legitimately needs to deserialize, and they're in just two
packages, so add a filter that throws errors if classes from any other
package show up in the stream.
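
A hedged sketch of the whitelisting pattern described here (illustrative only, not the launcher's actual class):

```
import java.io.{InputStream, InvalidClassException, ObjectInputStream, ObjectStreamClass}

// Only classes from an allowed set of packages may be deserialized; anything else fails fast.
// A real implementation also has to permit JDK types, primitives, and array descriptors.
class WhitelistedObjectInputStream(in: InputStream, allowedPackages: Seq[String])
    extends ObjectInputStream(in) {

  override protected def resolveClass(desc: ObjectStreamClass): Class[_] = {
    val name = desc.getName
    if (allowedPackages.exists(p => name.startsWith(p)) || name.startsWith("java.lang."))
      super.resolveClass(desc)
    else
      throw new InvalidClassException(name, "class is not allowed by the deserialization whitelist")
  }
}
```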

This also maintains backwards compatibility (the updated launcher code can
still communicate with the backend code in older Spark releases).

Tested with new and existing unit tests.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#18166 from vanzin/SPARK-20922.

(cherry picked from commit 8efc6e9)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>

(cherry picked from commit 772a9b9)

* [SPARK-20922][CORE][HOTFIX] Don't use Java 8 lambdas in older branches.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#18178 from vanzin/SPARK-20922-hotfix.

Added security by default for historyserver

use waitForConsumerAssignment() instead of consumer.poll(0) for spark-29052

change MAPR_HADOOP_CLASSPATH in configure.sh for creating it by mapr-classpath.sh

change MAPR_HADOOP_CLASSPATH in configure.sh for creating it by mapr-classpath.sh

changes for mapr-classpath.sh

changes for mapr-classpath.sh

configure.sh changes

[SPARK-39] Classpath filter was added

Fixed impersonation when data read from MapR-DB via Spark-Hive.

added configure.sh and warden.spark-thriftserver.conf

hive-hbase-handler added to Spark jars

Fixed "Single message comes late"

28339 bug fixed

Spark streaming skipped message with zero offset from Kafka 0.9

[MSPARK-9] Initial fix for Spark unit tests

Bump dependencies after ECO-1703 release

[SPARK-33] Streaming example fixed

[MAPR-26060] Fixed case when mapr-streams make gaps in offsets

ported features from kafka 10 to kafka 9

[MAPR-26289][SPARK-2.1] Streaming general improvements (apache#93)

* Added include-kafka-09 profile to Assembly
* Set default poll timeout to 120s

Set default HBase verison to 1.1.8

Changes from Kafka10  package were ported to Kafka09 package.

[MAPR-26053] Include MapR Classes to the default value of spark.sql.hive.metastore.sharedPrefixes

[MAPR-25807] Spark-Warehouse path computes incorrectly

Add MapR-SASL support for Thrift Server

Adding scala library.

[MAPR-25713] Spark might try to load MapR Class Loader multiple times and fail

[MAPR-25311] Bump Spark dependencies after ECO-1611 release

[MINOR] Fix spark-jars.sh script

[MAPR-24603] Could not launch beeline shell after starting spark thrift server

fixed syntax error in V09DirectKafkaWordCount example

Spark 2.0.1 MAPR-streams Python API

[MAPR-24415] SPARK_JAVA_OPTS is deprecated

Kafka streaming producer added.

Minor fix for previous commit

Added script for MAPR-24374

Some minor changes to spark-defaults.conf

Changed default HBase version to 1.1.1 in compatibility.version

Streaming example was refactored

[MAPR-24470] HiveFromSpark test fails in yarn-cluster mode

Added MapR Repo

[MAPR-22940] Failed to connect spark beeline (after spark thrift server is started) on Kerberos cluster

[MAPR-18865] Unable to submit spark apps from Windows client

Skip maven clean task on the parent module

New: Issue with running Hive commands in Spark

This is fixed in SPARK-7819
Isolated Hive Client Loader appears to cause Native Library
libMapRClient.4.0.2-mapr.so already loaded in another classloader error

Spark warden.services.conf should have dependency on cldb

Remove DFS shuffle settings.

These settings are not used right now.

Copy every file in the conf directory into the distribution package.

Create spark-defaults.conf for MapR

Settings to enable DFS shuffle on MapR.

Support hbase classpath computation in util script.

Adding external conf and scripts.

Enable SPARK_HIVE mode while building.

This is needed to bundle datanucleus jars needed for hive table creation.

Build Spark on MapR.
- make-distribution.sh takes an environment variable to enable profiles -
  MVN_PROFILE_ARG
- Added warden conf files under ext-conf.
- Updated pom.xml to use right set of jars and version.

Spark Master failed to start in HA mode

Updated Apache Curator version

Added spark streaming integration with kafka 0.9 and mapr-streams

Added MapR Repo