Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MySQL] last_record with timestamp column stored wrong timestamp. #126

Open
hiroyuki-sato opened this issue Dec 7, 2017 · 3 comments
Open

Comments

@hiroyuki-sato
Copy link
Member

hiroyuki-sato commented Dec 7, 2017

  • Embulk: 0.8.39
  • embulk-input-mysql: 0.8.2
  • MySQL: 5.7.19
  • Server and Client timezone: JST.
  • sample_data: sample.txt

Those data is JST timezone.

mysql> select * from in_mysql_test;
+------+---------------------+-------+
| c0   | c1                  | c2    |
+------+---------------------+-------+
|    1 | 2017-12-01 00:00:00 | dummy |
|    2 | 2017-12-01 00:01:00 | dummy |
...
| 1440 | 2017-12-01 23:59:00 | dummy |
| 1441 | 2017-12-02 00:00:00 | dummy |
+------+---------------------+-------+

in:
  type: mysql
  host: 127.0.0.1
  user: {{ env.mysql_user }}
  password: {{ env.mysql_password }}
  table: in_mysql_test
  database: embulk_test
  options: { useLegacyDatetimeCode: false }
  incremental_columns:
  - c1
  incremental: true

out:
  type: stdout

Execute embulk run -o diff.yml input_mysql.yml.liquid

in:
  last_record: ['2017-12-01T06:00:00.000000Z']
out: {}
1439,2017-12-01 14:58:00,dummy
1440,2017-12-01 14:59:00,dummy
1441,2017-12-01 15:00:00,dummy
2017-12-07 17:31:58.364 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2017-12-07 17:31:58.370 +0900 [INFO] (main): Committed.
2017-12-07 17:31:58.371 +0900 [INFO] (main): Next config diff: {"in":{"last_record":["2017-12-01T06:00:00.000000Z"]},"out":{}}

The last_record should be 2017-12-01T15:00:00.000000Z but It set 2017-12-01T06:00:00.000000Z

And second execution log

2017-12-07 20:05:33.310 +0900 [INFO] (0018:task-0000): SQL: SELECT * FROM `in_mysql_test` WHERE ((`c1` > ?)) ORDER BY `c1`
2017-12-07 20:05:33.311 +0900 [INFO] (0018:task-0000): Parameters: ["2017-12-01T06:00:00.000000Z"]

Reference

@hiroyuki-sato
Copy link
Member Author

Hello, @ganchiku

Thank you for your comment,
Does this mean the following change?

It seems to work well in my environment.
@hito4t What do you think this change?

diff --git a/embulk-input-mysql/src/main/java/org/embulk/input/mysql/getter/MySQLTimestampTimestampIncrementalHandler.java b/embulk-input-mysql/src/main/java/org/embulk/input/mysql/getter/MySQLTimestampTimestampIncrementalHandler.java
index f128ee1..9033c8f 100644
--- a/embulk-input-mysql/src/main/java/org/embulk/input/mysql/getter/MySQLTimestampTimestampIncrementalHandler.java
+++ b/embulk-input-mysql/src/main/java/org/embulk/input/mysql/getter/MySQLTimestampTimestampIncrementalHandler.java
@@ -22,8 +22,8 @@ public class MySQLTimestampTimestampIncrementalHandler
     @Override
     public org.embulk.spi.time.Timestamp utcTimestampFromSessionTime(long epochSecond, int nano)
     {
-        long sec = sessionTimeZone.convertLocalToUTC(epochSecond * 1000, false) / 1000;
-        return org.embulk.spi.time.Timestamp.ofEpochSecond(sec, nano);
+//        long sec = sessionTimeZone.convertLocalToUTC(epochSecond * 1000, false) / 1000;
+        return org.embulk.spi.time.Timestamp.ofEpochSecond(epochSecond, nano);
     }

     @Override

@hito4t

After applying above change, The last_record result is 2017-12-01T15:00:00.000000Z.
It expects value.

I want to verify one thing.
Does this timestamp parameter change to local timezone before query?
(Change 2017-12-01T15:00:00.000000Z -> 2017-12-02T00:00:00+0900)

2017-12-08 18:48:06.486 +0900 [INFO] (0018:task-0000): SQL: SELECT * FROM `in_mysql_test` WHERE ((`c1` > ?)) ORDER BY `c1`
2017-12-08 18:48:06.487 +0900 [INFO] (0018:task-0000): Parameters: ["2017-12-01T15:00:00.000000Z"]

Because when I execute the following SQL, It returns the wrong result.
But this plugin returns no data.(correct)

SELECT * FROM `in_mysql_test` WHERE ((`c1` > '2017-12-01T15:00:00.000000Z' )) ORDER BY `c1`;
+------+---------------------+-------+
| c0   | c1                  | c2    |
+------+---------------------+-------+
|  902 | 2017-12-01 15:01:00 | dummy |
|  903 | 2017-12-01 15:02:00 | dummy |
|  904 | 2017-12-01 15:03:00 | dummy |
|  905 | 2017-12-01 15:04:00 | dummy |

@ganchiku
Copy link
Contributor

@hiroyuki-sato

Thank you for your investigation, and I'm sorry that I have deleted my previous comment on this issue. I am still skeptical that my configuration might be bad.

But it was follows:

Thank you for raising this issue. I'm new to Embulk, and now trying to research how to use this software.

I looked into the embulk-input-mysql, and found something strange, and would like to share it.

https://github.com/embulk/embulk-input-jdbc/blob/master/embulk-input-mysql/src/main/java/org/embulk/input/mysql/getter/MySQLTimestampTimestampIncrementalHandler.java#L23

In `utcTimestampFromSessionTime` method, if the MySQL time_zone is JST, both `sessionTimeZone.convertLocalToUTC` and `org.embulk.spi.time.Timestamp.ofEpochSecond`  do UTC calculation.

If I removed `sessionTimeZone.convertLocalToUTC` from the method, it works, but I don't know if it is right fix for this problem. I also tried if MySQL time_zone is UTC, and it also works.

I can make PR for the fix, or if you have better idea, please provide how.

Thanks,

I will look into this issue more.

Thank you.

@hito4t
Copy link
Contributor

hito4t commented Dec 14, 2017

@ganchiku @hiroyuki-sato
Thank you for the information!

As you wrote, we shoudn't convert timestamp in utcTimestampFromSessionTime method because the timestamp returned from the server will be already valid.

If you make PR, I'll merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants