Skip to content

[SPARK-20763][SQL]The function of month and day return the value which is not we expected. #17997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

10110346
Copy link
Contributor

@10110346 10110346 commented May 16, 2017

What changes were proposed in this pull request?

spark-sql>select month("1582-09-28");
spark-sql>10
For this case, the expected result is 9, but it is 10.

spark-sql>select day("1582-04-18");
spark-sql>28
For this case, the expected result is 18, but it is 28.

when the date before "1582-10-04", the function of month and day return the value which is not we expected.

How was this patch tested?

unit tests

@10110346 10110346 changed the title [SPARK-20763][SQL]The function of month and day return an error value [SPARK-20763][SQL]The function of month and day return the value which is not we expected. May 16, 2017
@srowen
Copy link
Member

srowen commented May 16, 2017

Point taken, but do other DBs support dates before 1970? it does highly depend on historical calendars. The right solution isn't a hack in one place here but using Calendar properly, if anything.

@10110346
Copy link
Contributor Author

@srowen I have tested in mysql, it can support dates before 1970.
mysql> select month("1582-09-28");
+---------------------+
| month("1582-09-28") |
+---------------------+
| 9 |
+---------------------+

mysql> select day("1582-04-18");
+-------------------+
| day("1582-04-18") |
+-------------------+
| 18 |
+-------------------+

(year, dayInYear)
private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, Int, Int) = {
val date = new Date(daysToMillis(daysSince1970))
val YMD = date.toString.trim.split("-")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would cause massive performance regression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes,this cause performance worse than before

@10110346
Copy link
Contributor Author

10110346 commented May 17, 2017

@srowen @rxin So, by contrast, maybe make a hack in one place is a better solution

@10110346 10110346 force-pushed the wip_lx_0516 branch 2 times, most recently from 3f2d8d6 to a8d6e04 Compare May 17, 2017 02:21
@ueshin
Copy link
Member

ueshin commented May 17, 2017

I guess this would also affect date-related functions such as dayofyear, year, quater and weekofyear, etc.
Could you add tests for them?

@10110346
Copy link
Contributor Author

@ueshin Yes, I will do it ,thanks

@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
}
}
checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)

checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-04-28 13:10:15").getTime))), 118)
checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-4 13:10:15").getTime))), 277)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the dates we need to test here should be the boundaries like 1582-10-04 and 1582-10-15.

Btw, let's add 0 for 1-digit date, e.g. 1582-10-04 instead of 1582-10-4.

@@ -96,6 +99,8 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
}
}
}
checkEvaluation(Year(Literal(new Date(sdf.parse("1582-01-01 13:10:15").getTime))), 1582)
checkEvaluation(Year(Literal(new Date(sdf.parse("1582-12-31 13:10:15").getTime))), 1582)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here should be 1581-12-31 and 1582-01-01, for example, and the rest should be in a similar way.

@@ -116,6 +121,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
}
}
}

checkEvaluation(Quarter(Literal(new Date(sdf.parse("1582-03-28 13:10:15").getTime))), 1)
checkEvaluation(Quarter(Literal(new Date(sdf.parse("1582-09-30 13:10:15").getTime))), 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using 1582-09-30 and 1582-10-01 for this?

@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
}
}
checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)

checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 13:10:15").getTime))), 288)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether this value is correct or not. should be 278?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure too, maybe 278 is better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @cloud-fan @gatorsmile
Do you have any ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we check with other databases?

Copy link
Contributor Author

@10110346 10110346 May 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan In mysql ,the rusult is :
mysql> select dayofyear("1982-10-04");
+-------------------------+
| dayofyear("1982-10-04") |
+-------------------------+
| 277 |
+-------------------------+
1 row in set (0.00 sec)

mysql> select dayofyear("1982-10-15");
+--------------------------+
| dayofyear("1982-10-15") |
+--------------------------+
| 288 |
+--------------------------+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 278 is better?

Copy link
Contributor Author

@10110346 10110346 May 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan Because in history,the period(5.10.1582 ~ 14.10.1582) is not exist.
But maybe 288 is more human readable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's follow mysql

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks @cloud-fan

@@ -603,7 +603,13 @@ object DateTimeUtils {
*/
private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, Int) = {
// add the difference (in days) between 1.1.1970 and the artificial year 0 (-17999)
val daysNormalized = daysSince1970 + toYearZero
var daysSince1970Tmp = daysSince1970
// In history,the period(5.10.1582 ~ 14.10.1582) is not exist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this representation may be confusing for all of the people in the world. Can we use 1582-10-5 or like Oct. 5, 1582?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only about comment, and I think 1582-10-5 or Oct. 5, 1582 is more human readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will do ,thanks @kiszk @cloud-fan

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In history,the period(5.10.1582 ~ 14.10.1582) is not exist
->
Since Julian calendar was replaced with the Gregorian calendar, the 10 days after Oct. 4 were skipped.

@10110346 10110346 force-pushed the wip_lx_0516 branch 2 times, most recently from 205d3d0 to 8c99316 Compare May 18, 2017 05:29
@ueshin
Copy link
Member

ueshin commented May 18, 2017

ok to test

@@ -603,7 +603,13 @@ object DateTimeUtils {
*/
private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, Int) = {
// add the difference (in days) between 1.1.1970 and the artificial year 0 (-17999)
val daysNormalized = daysSince1970 + toYearZero
var daysSince1970Tmp = daysSince1970
// In history,the period(1582-10-05 ~ 1582-10-14) is not exist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In history,the period(5.10.1582 ~ 14.10.1582) is not exist
->
Since Julian calendar was replaced with the Gregorian calendar, the 10 days after Oct. 4 were skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile ok,thanks

@gatorsmile
Copy link
Member

The current impl is still missing the special handling of that miracle period. See the outputs of Oracle:

Result Set 34
TO_CHAR(TO_DATE('1582-10-04','YYYY-MM-DD'),'DDD')
277

Result Set 35
TO_CHAR(TO_DATE('1582-10-05','YYYY-MM-DD'),'DDD')
288

Result Set 36
TO_CHAR(TO_DATE('1582-10-11','YYYY-MM-DD'),'DDD')
288

Result Set 37
TO_CHAR(TO_DATE('1582-10-12','YYYY-MM-DD'),'DDD')
288

Result Set 38
TO_CHAR(TO_DATE('1582-10-13','YYYY-MM-DD'),'DDD')
288

Result Set 39
TO_CHAR(TO_DATE('1582-10-14','YYYY-MM-DD'),'DDD')
288

Result Set 40
TO_CHAR(TO_DATE('1582-10-15','YYYY-MM-DD'),'DDD')
288

Result Set 41
TO_CHAR(TO_DATE('1582-10-16','YYYY-MM-DD'),'DDD')
289

@10110346
Copy link
Contributor Author

10110346 commented May 19, 2017

@gatorsmile I have tested like this:
val dav = Date.valueOf("1582-10-04"); val date = new Date(dav.getTime); println(date.toString)
the output is :1582-10-04
val dav = Date.valueOf("1582-10-05"); val date = new Date(dav.getTime); println(date.toString)
the output is :1582-10-15
val dav = Date.valueOf("1582-10-14"); val date = new Date(dav.getTime); println(date.toString)
the output is :1582-10-24
val dav = Date.valueOf("1582-10-15"); val date = new Date(dav.getTime); println(date.toString)
the output is :1582-10-15

Also,i tested in mysql:
mysql> select dayofyear('1582-10-4');
+------------------------+
| dayofyear('1582-10-4') |
+------------------------+
| 277 |
+------------------------+
1 row in set (0.00 sec)

mysql> select dayofyear('1582-10-5');
+------------------------+
| dayofyear('1582-10-5') |
+------------------------+
| 278 |
+------------------------+
1 row in set (0.00 sec)

mysql> select dayofyear('1582-10-6');
+------------------------+
| dayofyear('1582-10-6') |
+------------------------+
| 279 |
+------------------------+
1 row in set (0.00 sec)

mysql> select dayofyear('1582-10-14');
+-------------------------+
| dayofyear('1582-10-14') |
+-------------------------+
| 287 |
+-------------------------+
1 row in set (0.00 sec)

mysql> select dayofyear('1582-10-15');
+-------------------------+
| dayofyear('1582-10-15') |
+-------------------------+
| 288 |
+-------------------------+

@gatorsmile
Copy link
Member

After checking DB2, it behaves the same as MySQL.

LGTM

asfgit pushed a commit that referenced this pull request May 19, 2017
… which is not we expected.

## What changes were proposed in this pull request?
spark-sql>select month("1582-09-28");
spark-sql>10
For this case, the expected result is 9, but it is 10.

spark-sql>select day("1582-04-18");
spark-sql>28
For this case, the expected result is 18, but it is 28.

when the date  before "1582-10-04", the function of `month` and `day` return the value which is not we expected.

## How was this patch tested?
unit tests

Author: liuxian <liu.xian3@zte.com.cn>

Closes #17997 from 10110346/wip_lx_0516.

(cherry picked from commit ea3b1e3)
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
@gatorsmile
Copy link
Member

Thanks! Merging to master/2.2.

@gatorsmile
Copy link
Member

Could you submit a backport PR to 2.1? Thanks!

@asfgit asfgit closed this in ea3b1e3 May 19, 2017
@10110346
Copy link
Contributor Author

OK,l Will do it, thanks. @gatorsmile

@10110346 10110346 deleted the wip_lx_0516 branch May 22, 2017 00:56
@10110346 10110346 restored the wip_lx_0516 branch May 22, 2017 00:57
asfgit pushed a commit that referenced this pull request May 23, 2017
…turn the value which is not we expected.

What changes were proposed in this pull request?

This PR is to backport #17997 to Spark 2.1

when the date before "1582-10-04", the function of month and day return the value which is not we expected.
How was this patch tested?

unit tests

Author: liuxian <liu.xian3@zte.com.cn>

Closes #18054 from 10110346/wip-lx-0522.
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
… which is not we expected.

## What changes were proposed in this pull request?
spark-sql>select month("1582-09-28");
spark-sql>10
For this case, the expected result is 9, but it is 10.

spark-sql>select day("1582-04-18");
spark-sql>28
For this case, the expected result is 18, but it is 28.

when the date  before "1582-10-04", the function of `month` and `day` return the value which is not we expected.

## How was this patch tested?
unit tests

Author: liuxian <liu.xian3@zte.com.cn>

Closes apache#17997 from 10110346/wip_lx_0516.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants