Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-9167][SQL] use UTC Calendar in stringToDate #7488

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Copy link
Contributor

fix 2 bugs introduced in #7353

  1. we should use UTC Calendar when cast string to date . Before [SPARK-8995][SQL] cast date strings like '2015-01-01 12:15:31' to date #7353 , we use DateTimeUtils.fromJavaDate(Date.valueOf(s.toString)) to cast string to date, and fromJavaDate will call millisToDays to avoid the time zone issue. Now we use DateTimeUtils.stringToDate(s), we should create a Calendar with UTC in the begging.
  2. we should not change the default time zone in test cases. The threadLocalLocalTimeZone and threadLocalTimestampFormat in DateTimeUtils will only be evaluated once for each thread, so we can't set the default time zone back anymore.

@cloud-fan
Copy link
Contributor Author

cc @tarekauel @davies

@@ -377,6 +377,7 @@ object DateTimeUtils {
}
val c = Calendar.getInstance()
c.set(segments(0), segments(1) - 1, segments(2), 0, 0, 0)
Some((c.getTimeInMillis / 1000 / 3600 / 24).toInt)
c.set(Calendar.MILLISECOND, 0)
Some(millisToDays(c.getTimeInMillis))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of calling millisToDays, we could create a Calendar with UTC in the begging to avoid the timezone issues. Forget to comment on this when reviewing.

@cloud-fan cloud-fan changed the title [SPARK-9167][SQL] call millisToDays in stringToDate [SPARK-9167][SQL] use UTC Calendar in stringToDate Jul 18, 2015
@SparkQA
Copy link

SparkQA commented Jul 18, 2015

Test build #37704 has finished for PR 7488 at commit 21ef293.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class NaiveBayes(override val uid: String)
    • class KMeans(override val uid: String) extends Estimator[KMeansModel] with KMeansParams
    • class IntArrayParam(parent: Params, name: String, doc: String, isValid: Array[Int] => Boolean)
    • class KMeansModel(JavaModel):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed):
    • class PCA(JavaEstimator, HasInputCol, HasOutputCol):
    • class PCAModel(JavaModel):
    • abstract class Expression extends TreeNode[Expression] with Product
    • trait Generator extends Expression
    • abstract class UnaryLogExpression(f: Double => Double, name: String)
    • case class Conv(numExpr: Expression, fromBaseExpr: Expression, toBaseExpr: Expression)
    • case class Log(child: Expression) extends UnaryLogExpression(math.log, "LOG")
    • case class Log10(child: Expression) extends UnaryLogExpression(math.log10, "LOG10")
    • case class Log1p(child: Expression) extends UnaryLogExpression(math.log1p, "LOG1P")
    • trait NamedExpression extends Expression
    • abstract class Attribute extends LeafExpression with NamedExpression
    • case class IsNaN(child: Expression) extends UnaryExpression
    • abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging with Product
    • abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Product with Serializable
    • trait HashSemiJoin

@SparkQA
Copy link

SparkQA commented Jul 18, 2015

Test build #37707 has finished for PR 7488 at commit 9cd6005.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please.

@SparkQA
Copy link

SparkQA commented Jul 18, 2015

Test build #37711 has finished for PR 7488 at commit 9cd6005.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 18, 2015

Test build #34 has finished for PR 7488 at commit 9cd6005.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class NaiveBayes(override val uid: String)
    • class KMeans(override val uid: String) extends Estimator[KMeansModel] with KMeansParams
    • class IntArrayParam(parent: Params, name: String, doc: String, isValid: Array[Int] => Boolean)
    • logDebug("isMulticlass = " + metadata.isMulticlass)
    • * (i.e., if isMulticlass && isSpaceSufficientForAllCategoricalSplits),
    • logDebug("isMulticlass = " + metadata.isMulticlass)
    • class KMeansModel(JavaModel):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed):
    • class PCA(JavaEstimator, HasInputCol, HasOutputCol):
    • class PCAModel(JavaModel):
    • abstract class Expression extends TreeNode[Expression] with Product
    • abstract class UnsafeProjection extends Projection
    • case class FromUnsafeProjection(fields: Seq[DataType]) extends Projection
    • abstract class BaseProjection extends Projection
    • class SpecificProjection extends $
    • class SpecificProjection extends $
    • trait Generator extends Expression
    • abstract class UnaryLogExpression(f: Double => Double, name: String)
    • case class Conv(numExpr: Expression, fromBaseExpr: Expression, toBaseExpr: Expression)
    • case class Log(child: Expression) extends UnaryLogExpression(math.log, "LOG")
    • case class Log10(child: Expression) extends UnaryLogExpression(math.log10, "LOG10")
    • case class Log1p(child: Expression) extends UnaryLogExpression(math.log1p, "LOG1P")
    • trait NamedExpression extends Expression
    • abstract class Attribute extends LeafExpression with NamedExpression
    • case class IsNaN(child: Expression) extends UnaryExpression
    • abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging with Product
    • abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Product with Serializable
    • trait HashSemiJoin

tarekbecker added a commit to tarekbecker/spark that referenced this pull request Jul 18, 2015
@yjshen
Copy link
Member

yjshen commented Jul 18, 2015

LGTM, tried this locally and fix the CastSuite test failure due to #7353 , hope to be merged ASAP.

@rxin
Copy link
Contributor

rxin commented Jul 18, 2015

Thanks - I've merged this.

@asfgit asfgit closed this in 692378c Jul 18, 2015
asfgit pushed a commit that referenced this pull request Jul 19, 2015
…][SPARK-8179][SPARK-8177][SPARK-8178][SPARK-9115][SQL] date functions

Jira:
https://issues.apache.org/jira/browse/SPARK-8199
https://issues.apache.org/jira/browse/SPARK-8184
https://issues.apache.org/jira/browse/SPARK-8183
https://issues.apache.org/jira/browse/SPARK-8182
https://issues.apache.org/jira/browse/SPARK-8181
https://issues.apache.org/jira/browse/SPARK-8180
https://issues.apache.org/jira/browse/SPARK-8179
https://issues.apache.org/jira/browse/SPARK-8177
https://issues.apache.org/jira/browse/SPARK-8179
https://issues.apache.org/jira/browse/SPARK-9115

Regarding `day`and `dayofmonth` are both necessary?

~~I am going to add `Quarter` to this PR as well.~~ Done.

~~As soon as the Scala coding is reviewed and discussed, I'll add the python api.~~ Done

Author: Tarek Auel <tarek.auel@googlemail.com>
Author: Tarek Auel <tarek.auel@gmail.com>

Closes #6981 from tarekauel/SPARK-8199 and squashes the following commits:

f7b4c8c [Tarek Auel] [SPARK-8199] fixed bug in tests
bb567b6 [Tarek Auel] [SPARK-8199] fixed test
3e095ba [Tarek Auel] [SPARK-8199] style and timezone fix
256c357 [Tarek Auel] [SPARK-8199] code cleanup
5983dcc [Tarek Auel] [SPARK-8199] whitespace fix
6e0c78f [Tarek Auel] [SPARK-8199] removed setTimeZone in tests, according to cloud-fans comment in #7488
4afc09c [Tarek Auel] [SPARK-8199] concise leap year handling
ea6c110 [Tarek Auel] [SPARK-8199] fix after merging master
70238e0 [Tarek Auel] Merge branch 'master' into SPARK-8199
3c6ae2e [Tarek Auel] [SPARK-8199] removed binary search
fb98ba0 [Tarek Auel] [SPARK-8199] python docstring fix
cdfae27 [Tarek Auel] [SPARK-8199] cleanup & python docstring fix
746b80a [Tarek Auel] [SPARK-8199] build fix
0ad6db8 [Tarek Auel] [SPARK-8199] minor fix
523542d [Tarek Auel] [SPARK-8199] address comments
2259299 [Tarek Auel] [SPARK-8199] day_of_month alias
d01b977 [Tarek Auel] [SPARK-8199] python underscore
56c4a92 [Tarek Auel] [SPARK-8199] update python docu
e223bc0 [Tarek Auel] [SPARK-8199] refactoring
d6aa14e [Tarek Auel] [SPARK-8199] fixed Hive compatibility
b382267 [Tarek Auel] [SPARK-8199] fixed bug in day calculation; removed set TimeZone in HiveCompatibilitySuite for test purposes; removed Hive tests for second and minute, because we can cast '2015-03-18' to a timestamp and extract a minute/second from it
1b2e540 [Tarek Auel] [SPARK-8119] style fix
0852655 [Tarek Auel] [SPARK-8119] changed from ExpectsInputTypes to implicit casts
ec87c69 [Tarek Auel] [SPARK-8119] bug fixing and refactoring
1358cdc [Tarek Auel] Merge remote-tracking branch 'origin/master' into SPARK-8199
740af0e [Tarek Auel] implement date function using a calculation based on days
4fb66da [Tarek Auel] WIP: date functions on calculation only
1a436c9 [Tarek Auel] wip
f775f39 [Tarek Auel] fixed return type
ad17e96 [Tarek Auel] improved implementation
c42b444 [Tarek Auel] Removed merge conflict file
ccb723c [Tarek Auel] [SPARK-8199] style and fixed merge issues
10e4ad1 [Tarek Auel] Merge branch 'master' into date-functions-fast
7d9f0eb [Tarek Auel] [SPARK-8199] git renaming issue
f3e7a9f [Tarek Auel] [SPARK-8199] revert change in DataFrameFunctionsSuite
6f5d95c [Tarek Auel] [SPARK-8199] fixed year interval
d9f8ac3 [Tarek Auel] [SPARK-8199] implement fast track
7bc9d93 [Tarek Auel] Merge branch 'master' into SPARK-8199
5a105d9 [Tarek Auel] [SPARK-8199] rebase after #6985 got merged
eb6760d [Tarek Auel] Merge branch 'master' into SPARK-8199
f120415 [Tarek Auel] improved runtime
a8edebd [Tarek Auel] use Calendar instead of SimpleDateFormat
5fe74e1 [Tarek Auel] fixed python style
3bfac90 [Tarek Auel] fixed style
356df78 [Tarek Auel] rely on cast mechanism of Spark. Simplified implementation
02efc5d [Tarek Auel] removed doubled code
a5ea120 [Tarek Auel] added python api; changed test to be more meaningful
b680db6 [Tarek Auel] added codegeneration to all functions
c739788 [Tarek Auel] added support for quarter SPARK-8178
849fb41 [Tarek Auel] fixed stupid test
638596f [Tarek Auel] improved codegen
4d8049b [Tarek Auel] fixed tests and added type check
5ebb235 [Tarek Auel] resolved naming conflict
d0e2f99 [Tarek Auel] date functions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants