Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49719][SQL] Make UUID and SHUFFLE accept integer seed #48166

Closed
wants to merge 1 commit into from

Conversation

zhengruifeng
Copy link
Contributor

@zhengruifeng zhengruifeng commented Sep 19, 2024

What changes were proposed in this pull request?

Make UUID and SHUFFLE accept integer seed

Why are the changes needed?

In most cases, seed accept both int and long, but UUID and SHUFFLE only accept long seed

In [1]: spark.sql("SELECT RAND(1L), RAND(1), SHUFFLE(array(1, 20, 3, 5), 1L), UUID(1L)").show()
+------------------+------------------+---------------------------+--------------------+
|           rand(1)|           rand(1)|shuffle(array(1, 20, 3, 5))|              uuid()|
+------------------+------------------+---------------------------+--------------------+
|0.6363787615254752|0.6363787615254752|              [20, 1, 3, 5]|1ced31d7-59ef-4bb...|
+------------------+------------------+---------------------------+--------------------+


In [2]: spark.sql("SELECT UUID(1)").show()
...
AnalysisException: [INVALID_PARAMETER_VALUE.LONG] The value of parameter(s) `seed` in `UUID` is invalid: expects a long literal, but got "1". SQLSTATE: 22023; line 1 pos 7
...

In [3]: spark.sql("SELECT SHUFFLE(array(1, 20, 3, 5), 1)").show()
...
AnalysisException: [INVALID_PARAMETER_VALUE.LONG] The value of parameter(s) `seed` in `shuffle` is invalid: expects a long literal, but got "1". SQLSTATE: 22023; line 1 pos 7
...

Does this PR introduce any user-facing change?

yes

after this fix:

In [2]: spark.sql("SELECT SHUFFLE(array(1, 20, 3, 5), 1L), SHUFFLE(array(1, 20, 3, 5), 1), UUID(1L), UUID(1)").show()
+---------------------------+---------------------------+--------------------+--------------------+
|shuffle(array(1, 20, 3, 5))|shuffle(array(1, 20, 3, 5))|              uuid()|              uuid()|
+---------------------------+---------------------------+--------------------+--------------------+
|              [20, 1, 3, 5]|              [20, 1, 3, 5]|1ced31d7-59ef-4bb...|1ced31d7-59ef-4bb...|
+---------------------------+---------------------------+--------------------+--------------------+

How was this patch tested?

added tests

Was this patch authored or co-authored using generative AI tooling?

no

@@ -81,6 +81,7 @@ trait ExpressionWithRandomSeed extends Expression {

private[catalyst] object ExpressionWithRandomSeed {
def expressionToSeed(e: Expression, source: String): Option[Long] = e match {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this helper function is only used in uuid and shuffle for now

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @zhengruifeng .
Merged to master.

@zhengruifeng zhengruifeng deleted the int_seed branch September 20, 2024 01:04
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?
Make `UUID` and `SHUFFLE` accept integer `seed`

### Why are the changes needed?
In most cases, `seed` accept both int and long, but `UUID` and `SHUFFLE` only accept long seed

```py
In [1]: spark.sql("SELECT RAND(1L), RAND(1), SHUFFLE(array(1, 20, 3, 5), 1L), UUID(1L)").show()
+------------------+------------------+---------------------------+--------------------+
|           rand(1)|           rand(1)|shuffle(array(1, 20, 3, 5))|              uuid()|
+------------------+------------------+---------------------------+--------------------+
|0.6363787615254752|0.6363787615254752|              [20, 1, 3, 5]|1ced31d7-59ef-4bb...|
+------------------+------------------+---------------------------+--------------------+

In [2]: spark.sql("SELECT UUID(1)").show()
...
AnalysisException: [INVALID_PARAMETER_VALUE.LONG] The value of parameter(s) `seed` in `UUID` is invalid: expects a long literal, but got "1". SQLSTATE: 22023; line 1 pos 7
...

In [3]: spark.sql("SELECT SHUFFLE(array(1, 20, 3, 5), 1)").show()
...
AnalysisException: [INVALID_PARAMETER_VALUE.LONG] The value of parameter(s) `seed` in `shuffle` is invalid: expects a long literal, but got "1". SQLSTATE: 22023; line 1 pos 7
...
```

### Does this PR introduce _any_ user-facing change?
yes

after this fix:
```py
In [2]: spark.sql("SELECT SHUFFLE(array(1, 20, 3, 5), 1L), SHUFFLE(array(1, 20, 3, 5), 1), UUID(1L), UUID(1)").show()
+---------------------------+---------------------------+--------------------+--------------------+
|shuffle(array(1, 20, 3, 5))|shuffle(array(1, 20, 3, 5))|              uuid()|              uuid()|
+---------------------------+---------------------------+--------------------+--------------------+
|              [20, 1, 3, 5]|              [20, 1, 3, 5]|1ced31d7-59ef-4bb...|1ced31d7-59ef-4bb...|
+---------------------------+---------------------------+--------------------+--------------------+
```

### How was this patch tested?
added tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#48166 from zhengruifeng/int_seed.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants