
[SPARK-32209][SQL] Re-use GetTimestamp in ParseToDate #28999

Status: Closed (1 commit, merged)

Conversation

MaxGekk (Member) commented on Jul 5, 2020

What changes were proposed in this pull request?

Replace the combination of the SecondsToTimestamp and UnixTimestamp expressions with GetTimestamp in ParseToDate.

Why are the changes needed?

Eliminates unnecessary parsing overhead: the conversion chain string -> timestamp -> long (seconds) -> timestamp -> date becomes string -> timestamp -> date.
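As a rough standalone illustration (plain Python `datetime`, not Spark's Catalyst expressions; the date string and format below are made up for the example), both chains yield the same date, but the old one takes two extra hops through epoch seconds:

```python
from datetime import datetime

s, fmt = "2020-07-05 12:34:56", "%Y-%m-%d %H:%M:%S"

# Old chain: string -> timestamp -> long (seconds) -> timestamp -> date
ts = datetime.strptime(s, fmt)                # parse string to timestamp
secs = int(ts.timestamp())                    # timestamp to epoch seconds
d_old = datetime.fromtimestamp(secs).date()   # seconds back to timestamp, then date

# New chain: string -> timestamp -> date
d_new = datetime.strptime(s, fmt).date()

assert d_old == d_new                         # same result, fewer conversions
```

The redundant round-trip through epoch seconds is what reusing GetTimestamp avoids in the actual patch.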

Does this PR introduce any user-facing change?

No

How was this patch tested?

By existing test suites such as DateFunctionsSuite.

SparkQA commented on Jul 5, 2020

Test build #124922 has finished for PR 28999 at commit 87a3752.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon (Member): retest this please

SparkQA commented on Jul 6, 2020

Test build #124995 has finished for PR 28999 at commit 87a3752.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu (Member) commented on Jul 6, 2020

retest this please

SparkQA commented on Jul 6, 2020

Test build #125014 has finished for PR 28999 at commit 87a3752.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon (Member): retest this please

SparkQA commented on Jul 6, 2020

Test build #125021 has finished for PR 28999 at commit 87a3752.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk (Member, Author) commented on Jul 6, 2020

jenkins, retest this, please

SparkQA commented on Jul 6, 2020

Test build #125046 has finished for PR 28999 at commit 87a3752.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk (Member, Author) commented on Jul 6, 2020

jenkins, retest this, please

SparkQA commented on Jul 6, 2020

Test build #125129 has started for PR 28999 at commit 87a3752.

The change removes the old expression chain from the `ParseToDate` constructor:

```diff
@@ -1450,8 +1450,7 @@ case class ParseToDate(left: Expression, format: Option[Expression], child: Expr
   extends RuntimeReplaceable {

   def this(left: Expression, format: Expression) {
     this(left, Option(format),
-      Cast(SecondsToTimestamp(UnixTimestamp(left, format)), DateType))
```

Reviewer (Member): Hi, @MaxGekk. If you don't mind, please make a JIRA issue for this. That will give us traceability.


Reviewer (Member): Yeah, I was thinking about that too. Maybe it's best to file a JIRA.

MaxGekk (Author): done

MaxGekk (Member, Author) commented on Jul 7, 2020

jenkins, retest this, please

SparkQA commented on Jul 7, 2020

Test build #125180 has finished for PR 28999 at commit 87a3752.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon (Member): retest this please

MaxGekk changed the title from "[MINOR][SQL] Re-use GetTimestamp in ParseToDate" to "[SPARK-32209][SQL] Re-use GetTimestamp in ParseToDate" on Jul 7, 2020
SparkQA commented on Jul 7, 2020

Test build #125192 has finished for PR 28999 at commit 87a3752.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun (Member) left a review comment:

+1, LGTM. Merged to master. Thank you all.

dongjoon-hyun added a commit that referenced this pull request Jul 8, 2020
### What changes were proposed in this pull request?

This PR aims to disable SBT `unidoc` generation testing in the Jenkins environment because it is flaky there and is not used for the official documentation generation. GitHub Actions provides the correct test coverage for the official documentation generation.

- #28848 (comment) (amp-jenkins-worker-06)
- #28926 (comment) (amp-jenkins-worker-06)
- #28969 (comment) (amp-jenkins-worker-06)
- #28975 (comment) (amp-jenkins-worker-05)
- #28986 (comment) (amp-jenkins-worker-05)
- #28992 (comment) (amp-jenkins-worker-06)
- #28993 (comment) (amp-jenkins-worker-05)
- #28999 (comment) (amp-jenkins-worker-04)
- #29010 (comment) (amp-jenkins-worker-03)
- #29013 (comment) (amp-jenkins-worker-04)
- #29016 (comment) (amp-jenkins-worker-05)
- #29025 (comment) (amp-jenkins-worker-04)
- #29042 (comment) (amp-jenkins-worker-03)

### Why are the changes needed?

Apache Spark's `release-build.sh` generates the official documentation using the following command.
- https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh#L341

```bash
PRODUCTION=1 RELEASE_VERSION="$SPARK_VERSION" jekyll build
```

This in turn runs the following `unidoc` command for the Scala/Java API docs.
- https://github.com/apache/spark/blob/master/docs/_plugins/copy_api_dirs.rb#L30

```ruby
system("build/sbt -Pkinesis-asl clean compile unidoc") || raise("Unidoc generation failed")
```

However, the PR builder has `jekyll build` disabled and instead relies on different test coverage.
```python
# determine if docs were changed and if we're inside the amplab environment
# note - the below commented out until *all* Jenkins workers can get `jekyll` installed
# if "DOCS" in changed_modules and test_env == "amplab_jenkins":
#    build_spark_documentation()
```

```
Building Unidoc API Documentation
========================================================================
[info] Building Spark unidoc using SBT with these arguments:
-Phadoop-3.2 -Phive-2.3 -Pspark-ganglia-lgpl -Pkubernetes -Pmesos
-Phadoop-cloud -Phive -Phive-thriftserver -Pkinesis-asl -Pyarn unidoc
```

### Does this PR introduce _any_ user-facing change?

No. (This is used only for testing and not used in the official doc generation.)

### How was this patch tested?

Pass the Jenkins without doc generation invocation.

Closes #29017 from dongjoon-hyun/SPARK-DOC-GEN.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
MaxGekk deleted the ParseToDate-parse-timestamp branch on December 11, 2020.