@tyro89

@tyro89 tyro89 commented Apr 9, 2014

Make partitionBy use a tweaked version of hash as its default partition function
since the python hash function does not consistently assign the same value
to None across python processes.

Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468
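The idea behind the fix can be sketched in Python as follows. This is not the code from the PR. `stable_hash` and `partition_for` are hypothetical names, and the tuple-combining seed and multiplier are arbitrary constants. The sketch pins `None` to a constant and hashes tuples element by element, so a key like `(None, 1)` also gets the same value in every worker. All other keys fall back to the built-in `hash`.

```python
def stable_hash(x):
    """Hypothetical process-independent hash (a sketch, not the PR's exact code)."""
    if x is None:
        # Pin None to a constant instead of relying on hash(None).
        return 0
    if isinstance(x, tuple):
        # Combine element hashes ourselves so tuples containing None stay stable too.
        h = 0x345678  # arbitrary seed
        for item in x:
            h ^= stable_hash(item)
            h = (h * 1000003) & 0xFFFFFFFF  # arbitrary multiplier; keep 32 bits
        return h ^ len(x)
    # Everything else falls back to the built-in hash.
    return hash(x)


def partition_for(key, num_partitions):
    """Map a key to a partition index in [0, num_partitions)."""
    return stable_hash(key) % num_partitions
```

With this sketch, `partition_for(None, 8)` returns `0` in every process. One caveat the sketch does not address: keys such as `str` still go through the built-in `hash`, and on Python 3 string hashing is randomized per process unless `PYTHONHASHSEED` is fixed.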

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13962/

@tyro89
Author

tyro89 commented Apr 10, 2014

Not sure why the build is failing; I'm pretty sure this change doesn't touch either of those two things.

@pwendell
Contributor

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14003/

@pwendell
Contributor

@tyro89 Thanks for the fix, makes sense. Would you mind creating a JIRA for this on the Spark issue tracker? Also if there is a symptom or error that this causes that would be helpful to know (I'd guess it's just seeing the None key in multiple places on the reduce side of the shuffle).

Otherwise if people run into this it will be hard for them to learn where/when it was fixed.
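The symptom guessed at above can be simulated without Spark. In this hedged Python sketch, each mapper process is modeled by its own hash function. `worker_a_hash` and `worker_b_hash` are hypothetical stand-ins that give `None` different values (12345 and 67890), the way two separate Python processes might, while `fixed_hash` pins it to 0. All names and numbers are illustrative.

```python
from collections import defaultdict


def shuffle(mapper_outputs, num_partitions, hash_fns):
    """Route each mapper's (key, value) records to reduce partitions.

    hash_fns[i] stands in for the hash function of the i-th worker process.
    """
    partitions = defaultdict(list)
    for records, key_hash in zip(mapper_outputs, hash_fns):
        for key, value in records:
            partitions[key_hash(key) % num_partitions].append((key, value))
    return dict(partitions)


def partitions_holding(partitions, key):
    """Return the set of partition ids that received at least one record for key."""
    return {pid for pid, recs in partitions.items() if any(k == key for k, _ in recs)}


def worker_a_hash(k):
    return 12345 if k is None else hash(k)  # hypothetical per-process value


def worker_b_hash(k):
    return 67890 if k is None else hash(k)  # a different hypothetical value


def fixed_hash(k):
    return 0 if k is None else hash(k)  # None pinned to a constant


mapper_outputs = [[(None, 1), (7, 2)], [(None, 3), (7, 4)]]
inconsistent = shuffle(mapper_outputs, 4, [worker_a_hash, worker_b_hash])
consistent = shuffle(mapper_outputs, 4, [fixed_hash, fixed_hash])
print(sorted(partitions_holding(inconsistent, None)))  # [1, 2]
print(sorted(partitions_holding(consistent, None)))    # [0]
```

With inconsistent hashes, the reduce side sees the `None` key in two partitions, so a per-key aggregation would likely emit two separate results for it. With a fixed hash, every `None` record lands in one partition.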

@tyro89
Author

tyro89 commented Apr 10, 2014

@tyro89 tyro89 changed the title from "Modify the partition function used by partitionBy." to "[SPARK-1468] Modify the partition function used by partitionBy." Apr 10, 2014
@mateiz
Contributor

mateiz commented Jun 3, 2014

Jenkins, test this please

@mateiz
Contributor

mateiz commented Jun 3, 2014

Sorry for the delay, just re-testing this before merging it.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15395/

@asfgit asfgit closed this in 8edc9d0 Jun 3, 2014
asfgit pushed a commit that referenced this pull request Jun 3, 2014
Make partitionBy use a tweaked version of hash as its default partition function
since the python hash function does not consistently assign the same value
to None across python processes.

Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468

Author: Erik Selin <erik.selin@jadedpixel.com>

Closes #371 from tyro89/consistent_hashing and squashes the following commits:

201c301 [Erik Selin] Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes.

(cherry picked from commit 8edc9d0)
Signed-off-by: Matei Zaharia <matei@databricks.com>
asfgit pushed a commit that referenced this pull request Jun 3, 2014
@mateiz
Contributor

mateiz commented Jun 3, 2014

Thanks Erik! Merged this into branch-0.9, 1.0 and master.

pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 3, 2018
…erals

[SPARK-24151] fix case sensitive literals
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
1. do not use the UUID directly; get the ID by querying by name
2. flavors cannot be created in public clouds, so let those tests fail first
3. add only one playbook, terraform-provider-openstack-acceptance-test-public-clouds, for all public clouds
4. add post.yaml to clean up the resources after the acctests

Closes: theopenlab/openlab#125
Closes: theopenlab/openlab#136
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…codegen vs interpreted (apache#371)

### What changes were proposed in this pull request?

When an overflow occurs while casting a long to a timestamp, codegen and interpreted execution behave differently:

```
scala> Seq(Long.MaxValue, Long.MinValue).toDF("v").repartition(1).selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
+--------------------+-------------------+---------------+
|v                   |ts                 |unix_micros(ts)|
+--------------------+-------------------+---------------+
|9223372036854775807 |1969-12-31 20:59:59|-1000000       |
|-9223372036854775808|1969-12-31 21:00:00|0              |
+--------------------+-------------------+---------------+

scala> spark.conf.set("spark.sql.codegen.wholeStage", false)

scala> spark.conf.set("spark.sql.codegen.factoryMode", "NO_CODEGEN")

scala> Seq(Long.MaxValue, Long.MinValue).toDF("v").repartition(1).selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
+--------------------+-----------------------------+--------------------+
|v                   |ts                           |unix_micros(ts)     |
+--------------------+-----------------------------+--------------------+
|9223372036854775807 |+294247-01-10 01:00:54.775807|9223372036854775807 |
|-9223372036854775808|-290308-12-21 15:16:20.224192|-9223372036854775808|
+--------------------+-----------------------------+--------------------+

```

To align the behavior, this PR changes the codegen function to be the same as the interpreted one (https://github.com/apache/spark/blob/f0090c95ad4eca18040104848117a7da648ffa3c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L687).
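A minimal Python emulation of the two code paths reproduces the `unix_micros(ts)` values in the tables above. It assumes the old codegen multiplied seconds by 1,000,000 with plain 64-bit arithmetic, which wraps on overflow, and that the interpreted path saturates at the long bounds, as `java.util.concurrent.TimeUnit.SECONDS.toMicros` is documented to do. The helper names are hypothetical.

```python
LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1
MICROS_PER_SECOND = 1_000_000


def wrapping_mul(a, b):
    """Emulate JVM signed 64-bit multiplication, which silently wraps on overflow."""
    r = (a * b) & 0xFFFFFFFFFFFFFFFF  # keep only the low 64 bits
    return r - 2**64 if r > LONG_MAX else r  # reinterpret as a signed long


def saturating_mul(a, b):
    """Clamp the exact product to the long range instead of wrapping."""
    return max(LONG_MIN, min(LONG_MAX, a * b))


for seconds in (LONG_MAX, LONG_MIN):
    print(seconds, wrapping_mul(seconds, MICROS_PER_SECOND), saturating_mul(seconds, MICROS_PER_SECOND))
# 9223372036854775807 -1000000 9223372036854775807
# -9223372036854775808 0 -9223372036854775808
```

The wrapped values match the codegen output (`-1000000` and `0`), and the saturated values match the interpreted output.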

### Why are the changes needed?

This is necessary for the behavior to be consistent in all cases.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

With a unit test and manually.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#45294 from planga82/bugfix/spark47063_cast_codegen.

Authored-by: Pablo Langa <soypab@gmail.com>

(cherry picked from commit f18d945)

Signed-off-by: Kent Yao <yao@apache.org>
Co-authored-by: Pablo Langa <soypab@gmail.com>
4 participants