[SPARK-1468] Modify the partition function used by partitionBy. #371

tyro89 · 2014-04-09T21:08:45Z

Make partitionBy use a tweaked version of hash as its default partition function
since the python hash function does not consistently assign the same value
to None across python processes.

Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468
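For readers skimming the thread, here is a minimal sketch of the idea described above. It is not the code from the patch: the names `stable_hash` and `partition_for` are hypothetical, and mapping `None` to 0 is an assumed choice made only for illustration.

```python
def stable_hash(key):
    """Hypothetical stand-in for the tweaked partition function.

    Assumption: None is pinned to a fixed value (0 here) so that every
    Python process agrees on it; all other keys fall through to hash().
    """
    if key is None:
        return 0
    return hash(key)


def partition_for(key, num_partitions, hash_func=stable_hash):
    """Pick the target partition for a key, as a hash partitioner would."""
    return hash_func(key) % num_partitions


# None goes to the same partition no matter which process computes it.
none_partition = partition_for(None, 8)
int_partition = partition_for(5, 4)
```

The fallback to `hash()` keeps the behavior unchanged for every key other than `None`, which is the only case this sketch tries to stabilize.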


AmplabJenkins · 2014-04-09T21:12:23Z

Merged build triggered.

AmplabJenkins · 2014-04-09T21:12:30Z

Merged build started.

AmplabJenkins · 2014-04-09T22:07:39Z

Merged build finished.

AmplabJenkins · 2014-04-09T22:07:39Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13962/

tyro89 · 2014-04-10T14:58:57Z

Not sure why the build is failing; I'm pretty sure this change doesn't touch either of those two things.

pwendell · 2014-04-10T17:33:56Z

Jenkins, retest this please.

AmplabJenkins · 2014-04-10T17:38:12Z

Merged build triggered.

AmplabJenkins · 2014-04-10T17:38:21Z

Merged build started.

AmplabJenkins · 2014-04-10T18:17:38Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-04-10T18:17:38Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14003/

pwendell · 2014-04-10T22:22:54Z

@tyro89 Thanks for the fix, makes sense. Would you mind creating a JIRA for this on the Spark issue tracker? Also if there is a symptom or error that this causes that would be helpful to know (I'd guess it's just seeing the None key in multiple places on the reduce side of the shuffle).

Otherwise if people run into this it will be hard for them to learn where/when it was fixed.
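A rough way to picture the symptom guessed at above is the simulation below. It is a sketch, not PySpark code: `make_worker_hash`, `shuffle`, and `partitions_holding` are hypothetical helpers, and the per-worker values for `hash(None)` are made up to stand in for two Python processes that disagree.

```python
from collections import defaultdict


def make_worker_hash(none_value):
    """Simulate one Python process in which hash(None) equals none_value."""
    def worker_hash(key):
        return none_value if key is None else hash(key)
    return worker_hash


def shuffle(records_by_worker, hash_funcs, num_partitions):
    """Route each worker's (key, value) records using that worker's hash."""
    partitions = defaultdict(list)
    for records, hash_func in zip(records_by_worker, hash_funcs):
        for key, value in records:
            partitions[hash_func(key) % num_partitions].append((key, value))
    return dict(partitions)


def partitions_holding(key, partitions):
    """List the partitions that received at least one record for key."""
    return sorted(p for p, recs in partitions.items()
                  if any(k == key for k, _ in recs))


records_by_worker = [[(None, 1), (1, 10)], [(None, 2), (1, 20)]]

# Two workers that disagree on hash(None): the None key is split up.
split = shuffle(records_by_worker, [make_worker_hash(2), make_worker_hash(3)], 4)

# Two workers that agree on hash(None): the None key stays together.
together = shuffle(records_by_worker, [make_worker_hash(0), make_worker_hash(0)], 4)

print(partitions_holding(None, split))
print(partitions_holding(None, together))
```

With the disagreeing workers, a reduce over the `None` key would see partial results in more than one partition, while the integer key `1` is unaffected in both runs.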

tyro89 · 2014-04-10T23:02:40Z

@pwendell opened jira https://issues.apache.org/jira/browse/SPARK-1468

mateiz · 2014-06-03T19:06:59Z

Jenkins, test this please

mateiz · 2014-06-03T19:07:37Z

Sorry for the delay, just re-testing this before merging it.

AmplabJenkins · 2014-06-03T19:08:02Z

Merged build triggered.

AmplabJenkins · 2014-06-03T19:19:03Z

Merged build started.

AmplabJenkins · 2014-06-03T20:14:01Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-03T20:14:01Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15395/

Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes.

Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468

Author: Erik Selin <erik.selin@jadedpixel.com>

Closes #371 from tyro89/consistent_hashing and squashes the following commits:

201c301 [Erik Selin] Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes.

(cherry picked from commit 8edc9d0)
Signed-off-by: Matei Zaharia <matei@databricks.com>

mateiz · 2014-06-03T20:33:57Z

Thanks Erik! Merged this into branch-0.9, 1.0 and master.



1. Do not use the uuid directly; get the id by querying by name.
2. Cannot create a flavor in public clouds, so let the tests fail first.
3. Only add one playbook, terraform-provider-openstack-acceptance-test-public-clouds, for all public clouds.
4. Add post.yaml to clean up the resources after the acctests.

Closes: theopenlab/openlab#125
Closes: theopenlab/openlab#136


…codegen vs interpreted (apache#371)

### What changes were proposed in this pull request?

When an overflow occurs casting long to timestamp, codegen and interpreted modes behave differently:

```
scala> Seq(Long.MaxValue, Long.MinValue).toDF("v").repartition(1).selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
+--------------------+-------------------+---------------+
|v                   |ts                 |unix_micros(ts)|
+--------------------+-------------------+---------------+
|9223372036854775807 |1969-12-31 20:59:59|-1000000       |
|-9223372036854775808|1969-12-31 21:00:00|0              |
+--------------------+-------------------+---------------+

scala> spark.conf.set("spark.sql.codegen.wholeStage", false)

scala> spark.conf.set("spark.sql.codegen.factoryMode", "NO_CODEGEN")

scala> Seq(Long.MaxValue, Long.MinValue).toDF("v").repartition(1).selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
+--------------------+-----------------------------+--------------------+
|v                   |ts                           |unix_micros(ts)     |
+--------------------+-----------------------------+--------------------+
|9223372036854775807 |+294247-01-10 01:00:54.775807|9223372036854775807 |
|-9223372036854775808|-290308-12-21 15:16:20.224192|-9223372036854775808|
+--------------------+-----------------------------+--------------------+
```

To align the behavior, this PR changes the codegen function to be the same as the interpreted one (https://github.com/apache/spark/blob/f0090c95ad4eca18040104848117a7da648ffa3c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L687)

### Why are the changes needed?

This is necessary to be consistent in all cases.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

With unit tests and manually.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#45294 from planga82/bugfix/spark47063_cast_codegen.

Authored-by: Pablo Langa <soypab@gmail.com>
(cherry picked from commit f18d945)
Signed-off-by: Kent Yao <yao@apache.org>
Co-authored-by: Pablo Langa <soypab@gmail.com>

Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes.

201c301

tyro89 changed the title ~~Modify the partition function used by partitionBy.~~ [SPARK-1468] Modify the partition function used by partitionBy. Apr 10, 2014

asfgit closed this in 8edc9d0 Jun 3, 2014

mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 3, 2018

Merge pull request apache#371 from palantir/jt/fix-case-sensitive-literals [SPARK-24151] fix case sensitive literals

9c32b5b

arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020

[MSPARK-331] Remove snapshot versions of mapr dependencies from Spark-2.3.1 (apache#371)

0b21a8a
