[SPARK-32167][SQL] Fix GetArrayStructFields to respect inner field's nullability together #28992

Closed
wants to merge 1 commit into from

cloud-fan
Contributor

What changes were proposed in this pull request?

Fix nullability of GetArrayStructFields. It should consider both the original array's containsNull and the inner field's nullability.
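The rule stated above can be sketched as a small model. This is a minimal illustration in plain Python, not Spark's implementation: `StructField`, `ArrayType`, and `get_array_struct_fields_type` are hypothetical stand-ins that only borrow the names of Spark's types. The assumption, taken from the description, is that the result array may contain nulls when either the source array may contain null elements or the extracted field is itself nullable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructField:
    """Hypothetical stand-in for a struct field: name, type name, nullability."""
    name: str
    data_type: str
    nullable: bool


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical stand-in for an array type.

    element_type is either a tuple of StructField (array of structs)
    or a plain type name (array of primitives).
    """
    element_type: object
    contains_null: bool


def get_array_struct_fields_type(array: ArrayType, field_name: str) -> ArrayType:
    """Return the modeled type of `array.field_name` for an array of structs."""
    field = next(f for f in array.element_type if f.name == field_name)
    # Assumed rule from the PR description: a null can come from a null
    # struct element OR from a null value in the extracted field.
    return ArrayType(field.data_type, array.contains_null or field.nullable)


# Same shape as the added test: containsNull = false, inner field nullable.
arr = ArrayType((StructField("i", "int", nullable=True),), contains_null=False)
result = get_array_struct_fields_type(arr, "i")
print(result.contains_null)  # True
```

Looking only at the array's `contains_null` flag would report `False` for this input, which is the mismatch the PR addresses.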

Why are the changes needed?

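Fix a correctness issue: the result type could report containsNull = false even though extracting a nullable inner field can produce null values.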

Does this PR introduce any user-facing change?

Yes. See the added test.

How was this patch tested?

A new unit test and an end-to-end test.

@cloud-fan
Contributor Author

cc @maropu @viirya @dongjoon-hyun

Contributor

@rednaxelafx rednaxelafx left a comment

LGTM with a question inline

test("SPARK-32167: get field from an array of struct") {
  val innerStruct = new StructType().add("i", "int", nullable = true)
  val schema = new StructType().add("arr", ArrayType(innerStruct, containsNull = false))
  val df = spark.createDataFrame(List(Row(Seq(Row(1), Row(null)))).asJava, schema)
Contributor

A bit curious why asJava is needed here?

Contributor Author

Because the createDataFrame overload that takes rows plus a schema only accepts a Java list, not a Scala Seq or List. We should probably fix that.
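Why the test data in this thread, Seq(Row(1), Row(null)), exercises the bug can be seen with a tiny model of the extraction itself. The following is a minimal sketch in plain Python, not Spark code: dicts stand in for struct rows, and `extract_field` is a hypothetical helper. The array has no null elements, matching containsNull = false, yet the extracted values still include a null because field i is nullable.

```python
from typing import Any, Dict, List, Optional


def extract_field(structs: List[Optional[Dict[str, Any]]], name: str) -> List[Any]:
    """Model of pulling one field out of every struct in an array.

    A null struct yields a null value; otherwise the field value is
    returned as-is, which may itself be null.
    """
    return [None if s is None else s[name] for s in structs]


# Mirrors the test data: two non-null structs, the second with a null field.
arr = [{"i": 1}, {"i": None}]
values = extract_field(arr, "i")
print(values)  # [1, None]
```

Every element of `arr` is non-null, so a result type derived only from the array's own null flag would wrongly promise that `values` contains no nulls.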

Member

@maropu maropu left a comment

Nice catch! LGTM

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM

@HyukjinKwon HyukjinKwon changed the title from "[SPARK-32167][SQL] fix nullability of GetArrayStructFields" to "[SPARK-32167][SQL] Fix GetArrayStructFields to respect inner field's nullability together" on Jul 3, 2020
Member

@gengliangwang gengliangwang left a comment

Nice catch!

Member

@viirya viirya left a comment

Nice catch!

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for the fix, @cloud-fan.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thanks. I also verified locally.

SparkQA commented Jul 5, 2020

Test build #124926 has finished for PR 28992 at commit bc3542a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 5, 2020

retest this please

@dongjoon-hyun
Member

Oh, the failure looks relevant. Could you check and update SelectedFieldSuite, @cloud-fan?

org.scalatest.exceptions.TestFailedException: Expected 
StructField(col3,ArrayType(StructType(StructField(field1,StructType(StructField(subfield1,IntegerType,false)),true)),false),false), but got 
StructField(col3,ArrayType(StructType(StructField(field1,StructType(StructField(subfield1,IntegerType,false)),true)),true),false)

SparkQA commented Jul 6, 2020

Test build #124978 has finished for PR 28992 at commit bc3542a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 6, 2020

Test build #125066 has finished for PR 28992 at commit eb81598.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 6, 2020

retest this please

SparkQA commented Jul 6, 2020

Test build #125073 has finished for PR 28992 at commit eb81598.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@dongjoon-hyun
Member

Retest this please.

SparkQA commented Jul 6, 2020

Test build #125099 has started for PR 28992 at commit eb81598.

@shaneknapp
Contributor

test this please

SparkQA commented Jul 6, 2020

Test build #125106 has finished for PR 28992 at commit eb81598.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

SparkQA commented Jul 7, 2020

Test build #125115 has finished for PR 28992 at commit eb81598.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun pushed a commit that referenced this pull request Jul 7, 2020
…nullability together

### What changes were proposed in this pull request?

Fix nullability of `GetArrayStructFields`. It should consider both the original array's `containsNull` and the inner field's nullability.

### Why are the changes needed?

Fix a correctness issue.

### Does this PR introduce _any_ user-facing change?

Yes. See the added test.

### How was this patch tested?

a new UT and end-to-end test

Closes #28992 from cloud-fan/bug.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5d296ed)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Member

Merged to master/3.0.
Could you make a backport to branch-2.4, @cloud-fan ?

cloud-fan added a commit to cloud-fan/spark that referenced this pull request Jul 7, 2020
…nullability together

Fix nullability of `GetArrayStructFields`. It should consider both the original array's `containsNull` and the inner field's nullability.

Fix a correctness issue.

Yes. See the added test.

a new UT and end-to-end test

Closes apache#28992 from cloud-fan/bug.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5d296ed)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
cloud-fan added a commit to cloud-fan/spark that referenced this pull request Jul 8, 2020
dongjoon-hyun pushed a commit that referenced this pull request Jul 8, 2020
…ld's nullability together

### What changes were proposed in this pull request?

Backport #28992 to 2.4

Fix nullability of `GetArrayStructFields`. It should consider both the original array's `containsNull` and the inner field's nullability.

### Why are the changes needed?

Fix a correctness issue.

### Does this PR introduce _any_ user-facing change?

Yes. See the added test.

### How was this patch tested?

a new UT and end-to-end test

Closes #29019 from cloud-fan/port.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Jul 8, 2020
### What changes were proposed in this pull request?

This PR aims to disable SBT `unidoc` generation testing in the Jenkins environment because it is flaky there and is not used for the official documentation generation. Also, GitHub Actions already has the correct test coverage for the official documentation generation.

- #28848 (comment) (amp-jenkins-worker-06)
- #28926 (comment) (amp-jenkins-worker-06)
- #28969 (comment) (amp-jenkins-worker-06)
- #28975 (comment) (amp-jenkins-worker-05)
- #28986 (comment) (amp-jenkins-worker-05)
- #28992 (comment) (amp-jenkins-worker-06)
- #28993 (comment) (amp-jenkins-worker-05)
- #28999 (comment) (amp-jenkins-worker-04)
- #29010 (comment) (amp-jenkins-worker-03)
- #29013 (comment) (amp-jenkins-worker-04)
- #29016 (comment) (amp-jenkins-worker-05)
- #29025 (comment) (amp-jenkins-worker-04)
- #29042 (comment) (amp-jenkins-worker-03)

### Why are the changes needed?

Apache Spark's `release-build.sh` generates the official documentation with the following command.
- https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh#L341

```bash
PRODUCTION=1 RELEASE_VERSION="$SPARK_VERSION" jekyll build
```

That build, in turn, runs the following `unidoc` command for the Scala/Java API docs.
- https://github.com/apache/spark/blob/master/docs/_plugins/copy_api_dirs.rb#L30

```ruby
system("build/sbt -Pkinesis-asl clean compile unidoc") || raise("Unidoc generation failed")
```

However, the PR builder has `jekyll build` disabled and has different test coverage instead.
```python
# determine if docs were changed and if we're inside the amplab environment
# note - the below commented out until *all* Jenkins workers can get `jekyll` installed
# if "DOCS" in changed_modules and test_env == "amplab_jenkins":
#    build_spark_documentation()
```

```
Building Unidoc API Documentation
========================================================================
[info] Building Spark unidoc using SBT with these arguments:
-Phadoop-3.2 -Phive-2.3 -Pspark-ganglia-lgpl -Pkubernetes -Pmesos
-Phadoop-cloud -Phive -Phive-thriftserver -Pkinesis-asl -Pyarn unidoc
```

### Does this PR introduce _any_ user-facing change?

No. (This is used only for testing and not used in the official doc generation.)

### How was this patch tested?

Pass Jenkins without invoking doc generation.

Closes #29017 from dongjoon-hyun/SPARK-DOC-GEN.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
…nullability together

Ref: LIHADOOP-56842
(cherry picked from commit 146062d)

Backport apache#28992 to 2.4

Fix nullability of `GetArrayStructFields`. It should consider both the original array's `containsNull` and the inner field's nullability.

Fix a correctness issue.

Yes. See the added test.

a new UT and end-to-end test

Closes apache#29019 from cloud-fan/port.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

RB=2459030
BUG=LIHADOOP-56842
G=spark-reviewers
R=zolin,ekrogen
A=ekrogen
9 participants