-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-20763][SQL]The function of month
and day
return the value which is not we expected.
#17997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
month
and day
return an error valuemonth
and day
return the value which is not we expected.
Point taken, but do other DBs support dates before 1970? it does highly depend on historical calendars. The right solution isn't a hack in one place here but using Calendar properly, if anything. |
@srowen I have tested in mysql, it can support dates before 1970. mysql> select day("1582-04-18"); |
(year, dayInYear) | ||
private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, Int, Int) = { | ||
val date = new Date(daysToMillis(daysSince1970)) | ||
val YMD = date.toString.trim.split("-") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would cause massive performance regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes,this cause performance worse than before
3f2d8d6
to
a8d6e04
Compare
I guess this would also affect date-related functions such as |
@ueshin Yes, I will do it ,thanks |
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { | |||
} | |||
} | |||
checkEvaluation(DayOfYear(Literal.create(null, DateType)), null) | |||
|
|||
checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-04-28 13:10:15").getTime))), 118) | |||
checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-4 13:10:15").getTime))), 277) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the dates we need to test here should be the boundaries like 1582-10-04
and 1582-10-15
.
Btw, let's add 0
for 1-digit date, e.g. 1582-10-04
instead of 1582-10-4
.
@@ -96,6 +99,8 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { | |||
} | |||
} | |||
} | |||
checkEvaluation(Year(Literal(new Date(sdf.parse("1582-01-01 13:10:15").getTime))), 1582) | |||
checkEvaluation(Year(Literal(new Date(sdf.parse("1582-12-31 13:10:15").getTime))), 1582) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here should be 1581-12-31
and 1582-01-01
, for example, and the rest should be in a similar way.
@@ -116,6 +121,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { | |||
} | |||
} | |||
} | |||
|
|||
checkEvaluation(Quarter(Literal(new Date(sdf.parse("1582-03-28 13:10:15").getTime))), 1) | |||
checkEvaluation(Quarter(Literal(new Date(sdf.parse("1582-09-30 13:10:15").getTime))), 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using 1582-09-30
and 1582-10-01
for this?
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { | |||
} | |||
} | |||
checkEvaluation(DayOfYear(Literal.create(null, DateType)), null) | |||
|
|||
checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 13:10:15").getTime))), 288) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether this value is correct or not. should be 278
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure too, maybe 278
is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @cloud-fan @gatorsmile
Do you have any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we check with other databases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan In mysql ,the rusult is :
mysql> select dayofyear("1982-10-04");
+-------------------------+
| dayofyear("1982-10-04") |
+-------------------------+
| 277 |
+-------------------------+
1 row in set (0.00 sec)
mysql> select dayofyear("1982-10-15");
+--------------------------+
| dayofyear("1982-10-15") |
+--------------------------+
| 288 |
+--------------------------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 278
is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan Because in history,the period(5.10.1582 ~ 14.10.1582) is not exist.
But maybe 288
is more human readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's follow mysql
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks @cloud-fan
@@ -603,7 +603,13 @@ object DateTimeUtils { | |||
*/ | |||
private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, Int) = { | |||
// add the difference (in days) between 1.1.1970 and the artificial year 0 (-17999) | |||
val daysNormalized = daysSince1970 + toYearZero | |||
var daysSince1970Tmp = daysSince1970 | |||
// In history,the period(5.10.1582 ~ 14.10.1582) is not exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this representation may be confusing for all of the people in the world. Can we use 1582-10-5 or like Oct. 5, 1582?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only about comment, and I think 1582-10-5 or Oct. 5, 1582 is more human readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will do ,thanks @kiszk @cloud-fan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In history,the period(5.10.1582 ~ 14.10.1582) is not exist
->
Since Julian calendar was replaced with the Gregorian calendar, the 10 days after Oct. 4 were skipped.
205d3d0
to
8c99316
Compare
ok to test |
@@ -603,7 +603,13 @@ object DateTimeUtils { | |||
*/ | |||
private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, Int) = { | |||
// add the difference (in days) between 1.1.1970 and the artificial year 0 (-17999) | |||
val daysNormalized = daysSince1970 + toYearZero | |||
var daysSince1970Tmp = daysSince1970 | |||
// In history,the period(1582-10-05 ~ 1582-10-14) is not exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In history,the period(5.10.1582 ~ 14.10.1582) is not exist
->
Since Julian calendar was replaced with the Gregorian calendar, the 10 days after Oct. 4 were skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile ok,thanks
The current impl is still missing the special handling of that miracle period. See the outputs of Oracle:
|
@gatorsmile I have tested like this: Also,i tested in mysql: mysql> select dayofyear('1582-10-5'); mysql> select dayofyear('1582-10-6'); mysql> select dayofyear('1582-10-14'); mysql> select dayofyear('1582-10-15'); |
After checking DB2, it behaves the same as MySQL. LGTM |
… which is not we expected. ## What changes were proposed in this pull request? spark-sql>select month("1582-09-28"); spark-sql>10 For this case, the expected result is 9, but it is 10. spark-sql>select day("1582-04-18"); spark-sql>28 For this case, the expected result is 18, but it is 28. when the date before "1582-10-04", the function of `month` and `day` return the value which is not we expected. ## How was this patch tested? unit tests Author: liuxian <liu.xian3@zte.com.cn> Closes #17997 from 10110346/wip_lx_0516. (cherry picked from commit ea3b1e3) Signed-off-by: Xiao Li <gatorsmile@gmail.com>
Thanks! Merging to master/2.2. |
Could you submit a backport PR to 2.1? Thanks! |
OK,l Will do it, thanks. @gatorsmile |
…turn the value which is not we expected. What changes were proposed in this pull request? This PR is to backport #17997 to Spark 2.1 when the date before "1582-10-04", the function of month and day return the value which is not we expected. How was this patch tested? unit tests Author: liuxian <liu.xian3@zte.com.cn> Closes #18054 from 10110346/wip-lx-0522.
… which is not we expected. ## What changes were proposed in this pull request? spark-sql>select month("1582-09-28"); spark-sql>10 For this case, the expected result is 9, but it is 10. spark-sql>select day("1582-04-18"); spark-sql>28 For this case, the expected result is 18, but it is 28. when the date before "1582-10-04", the function of `month` and `day` return the value which is not we expected. ## How was this patch tested? unit tests Author: liuxian <liu.xian3@zte.com.cn> Closes apache#17997 from 10110346/wip_lx_0516.
What changes were proposed in this pull request?
spark-sql>select month("1582-09-28");
spark-sql>10
For this case, the expected result is 9, but it is 10.
spark-sql>select day("1582-04-18");
spark-sql>28
For this case, the expected result is 18, but it is 28.
when the date before "1582-10-04", the function of
month
andday
return the value which is not we expected.How was this patch tested?
unit tests