Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time_zone=SYSTEM has an influence on timestamp. #29427

Open
Leavrth opened this issue Nov 4, 2021 · 10 comments
Open

time_zone=SYSTEM has an influence on timestamp. #29427

Leavrth opened this issue Nov 4, 2021 · 10 comments

Comments

@Leavrth
Copy link
Contributor

Leavrth commented Nov 4, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

prepare data:

create database sql_mode;
create table time_zone(id int primary key, a timestamp);
set @@session.time_zone = "+8:00";
insert into time_zone(id, a) values (1, '1990-04-15 01:30:12');
set @@session.time_zone = "-7:00";
insert into time_zone(id, a) values (2, '1990-04-15 01:30:12');

test:

mysql> set @@session.time_zone= "SYSTEM";
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT cast(TIMEDIFF(NOW(6), UTC_TIMESTAMP(6)) as time);
+--------------------------------------------------+
| cast(TIMEDIFF(NOW(6), UTC_TIMESTAMP(6)) as time) |
+--------------------------------------------------+
| 08:00:00                                         |
+--------------------------------------------------+
1 row in set (0.01 sec)

mysql> show variables like "%time_zone%";
+------------------+---------------+
| Variable_name    | Value         |
+------------------+---------------+
| system_time_zone | Asia/Shanghai |
| time_zone        | SYSTEM        |
+------------------+---------------+
2 rows in set (0.01 sec)

mysql> select * from sql_mode.time_zone;
+----+---------------------+
| id | a                   |
+----+---------------------+
|  1 | 1990-04-15 01:30:12 |
|  2 | 1990-04-15 17:30:12 |
+----+---------------------+
2 rows in set (0.01 sec)

mysql> select * from sql_mode.time_zone where id = 1 or id = 2;
+----+---------------------+
| id | a                   |
+----+---------------------+
|  1 | 1990-04-15 02:30:12 |
|  2 | 1990-04-15 17:30:12 |
+----+---------------------+
2 rows in set (0.00 sec)

mysql> set @@session.time_zone= "+8:00";
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT cast(TIMEDIFF(NOW(6), UTC_TIMESTAMP(6)) as time);
+--------------------------------------------------+
| cast(TIMEDIFF(NOW(6), UTC_TIMESTAMP(6)) as time) |
+--------------------------------------------------+
| 08:00:00                                         |
+--------------------------------------------------+
1 row in set (0.00 sec)

mysql> show variables like "%time_zone%";
+------------------+---------------+
| Variable_name    | Value         |
+------------------+---------------+
| system_time_zone | Asia/Shanghai |
| time_zone        | +8:00         |
+------------------+---------------+
2 rows in set (0.01 sec)

mysql> select * from sql_mode.time_zone;
+----+---------------------+
| id | a                   |
+----+---------------------+
|  1 | 1990-04-15 01:30:12 |
|  2 | 1990-04-15 16:30:12 |
+----+---------------------+
2 rows in set (0.00 sec)

mysql> select * from sql_mode.time_zone where id = 1 or id = 2;
+----+---------------------+
| id | a                   |
+----+---------------------+
|  1 | 1990-04-15 01:30:12 |
|  2 | 1990-04-15 16:30:12 |
+----+---------------------+
2 rows in set (0.00 sec)

2. What did you expect to see? (Required)

the results of timestamp should be the same.

3. What did you see instead (Required)

  1. SELECT with WHERE or not should have the same result when time_zone = SYSTEM.
  2. These 2 time_zone has the same timeoffset, but the results of timestamp are different.

4. What is your TiDB version? (Required)

v5.1.1, v5.2.0

@Leavrth Leavrth added the type/bug The issue is confirmed as a bug. label Nov 4, 2021
@kennytm
Copy link
Contributor

kennytm commented Nov 4, 2021

China was using DST in the year 1990, which springs forward from 1990-04-15T02:00:00+08 to 1990-04-15T03:00:00+09 and fall backs from 1990-09-16T02:00:00+09 to 1990-09-16T01:00:00+08.

The above is the rule used when the tz-database version is 2018f or above. Before this version, the spring forward / fall back time was wrongly set to 00:00:00 local time (eggert/tz@937286d).

If your system's tz version is below 2018f, the second "bug" is not a bug at all. Upgrade your system.

The "first bug" may be caused by using the coprocessor where TiKV and TiDB are using different tz-database versions.

@winoros
Copy link
Member

winoros commented Nov 5, 2021

The first bug is related to the PointGet executor.
If you change the where clause to a range filter. You would get the same result with the statement without the where clause.

@tiancaiamao
Copy link
Contributor

For the "first bug", I can't reproduce the different result with / without where

@kennytm
Copy link
Contributor

kennytm commented Nov 5, 2021

i suppose your system's tz version needs to be strictly below 2018f to reproduce.

TiKV is currently using chrono-tz 0.5.1 (chronotope/chrono-tz@119d365) which embeds tz version 2018i. perhaps you could reproduce using time zones which have changed between 2018i and 2021e.

@tiancaiamao
Copy link
Contributor

For the "first bug", I can't reproduce the different result with / without where

It can reproduce in one of our internal testing cluster.

The root cause for "1 SELECT with WHERE or not should have the same result when time_zone = SYSTEM" is that when timezone is "Asia/Shanghai", and the encoded UTC time value is 2240854276563795969, the calculated results in TiKV layer and TiDB layer is different.

This is how I debug:

dlv attach $PID, add break point and get the result of query "select * from t where id = 1", this is a point get and the kv value part from tikv is []uint8 len: 17, cap: 24, [128,0,1,0,0,0,2,8,0,0,0,0,140,23,157,68,25]
Decoding to types.Time, it's {coreTime: 2240854276563795969}, this will be convert from (UTC -> Asia/Shanghai)
The final result would be "1990-04-15 01:30:12"

Add break point and debug another code path, query "select * from t where id = 1 or id > 100", this will be the coprocessor code path. The result is calculated in TiKV layer, and the final result is "1990-04-15 02:30:12"

There is a session variable, tidb_enable_chunk_rpc, which is used the old protocol. When it's false, data is load from tikv to tidb and to chunk in TiDB :

mysql> set @@tidb_enable_chunk_rpc = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from timezone where id = 1 or id > 100;
+----+---------------------+
| id | a                   |
+----+---------------------+
|  1 | 1990-04-15 02:30:12 |
+----+---------------------+
1 row in set (0.01 sec)

mysql> set @@tidb_enable_chunk_rpc = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from timezone where id = 1 or id > 100;
+----+---------------------+
| id | a                   |
+----+---------------------+
|  1 | 1990-04-15 01:30:12 |
+----+---------------------+
1 row in set (0.00 sec)

It's now clear that decode the raw KV data to chunk, when handled in TiKV or TiDB, the result is different.

I test more timezone and find that 'Asia/Shanghai' seems to be the only case that TiDB and TiKV calculated the different result. All other timezone like "+8:00" "-8:00" "Australia/Perth" "America/Chicago" etc, the result is the same between TiDB and TiKV.

@kennytm
Copy link
Contributor

kennytm commented Nov 9, 2021

@tiancaiamao

All other timezone like "+8:00" "-8:00" "Australia/Perth" "America/Chicago" etc, the result is the same between TiDB and TiKV.

obviously you haven't performed targeted checks against changes since 2018i 😏.

example of difference when the TiDB node's tzdata version is 2019c or above:

set @@time_zone = '+00:00';
drop table if exists t;
create table t(id int primary key, a timestamp);
insert into t values 
(1, '1985-04-19 21:30:00'), 
(2, '1985-04-19 22:30:00'), 
(3, '1985-04-19 23:30:00');

set @@time_zone = 'Europe/Istanbul';  -- not "SYSTEM" but anyway
select * from t;
/*
+----+---------------------+
| id | a                   |
+----+---------------------+
|  1 | 1985-04-20 00:30:00 |
|  2 | 1985-04-20 01:30:00 |
|  3 | 1985-04-20 02:30:00 |
+----+---------------------+
3 rows in set (0.04 sec)
*/

select * from t where id in (1, 2, 3);
/*
+----+---------------------+
| id | a                   |
+----+---------------------+
|  1 | 1985-04-19 23:30:00 |
|  2 | 1985-04-20 00:30:00 |
|  3 | 1985-04-20 02:30:00 |
+----+---------------------+
3 rows in set (0.04 sec)
*/

@tiancaiamao
Copy link
Contributor

Do our TiDB distribution ship with a tzdata across every instance, when using TiUP or Cloud?
Considering this bug, if not, the result is always undetermined...

@morgo
Copy link
Contributor

morgo commented Nov 11, 2021

Do our TiDB distribution ship with a tzdata across every instance, when using TiUP or Cloud?
Considering this bug, if not, the result is always undetermined...

I believe we currently read the zoneinfo files provided by the OS on each instance, but GO does support embedding the tzdata database: https://pkg.go.dev/time/tzdata

If users keep their TiDB binaries up to date (and uniform), that would be one way of guaranteeing consistency.

@kennytm
Copy link
Contributor

kennytm commented Nov 17, 2021

The issue of the "first bug" is that both (a) TiDB's tzdata depends on the OS and (2) TiKV is stuck at 2018i (the latest chrono-tz 0.6.0 embeds 2021a; the newest tzdata is 2021e). So even if we embed the Go tzdata into TiDB, the versions can still be non-uniform if we are not pinning to 2018i (or 2021a if we upgrade chrono-tz at some point).

There are several solutions I can think of,

  1. BY DESIGN. WORKS AS IS. WON'T FIX.

    • Pros: We don't need to do any implementation, yay
    • Cons: People will get different result depending on the optimizer.
  2. Embed the same tzdata version to TiDB and TiKV (and TiFlash and TiSpark etc)

    • Pros: We can be sure the user can always get consistent result.
    • Pros: Perhaps the easiest solution without changing the implementation, just changing the build instruction is enough.
    • Cons: The TiDB binary grows by 450 KB.
    • Cons: Requires special preparation of the build environment to ensure all components are using the same tzdata version.
    • Cons: The TZ rules can become outdated, and the only way the user can update the TZ rules is by upgrading the TiDB cluster. (Of particular note is that parts of both the United States and the European Union are considering to abolish DST so updating the TZ rules will be important for customers over these regions.)
  3. TiDB sends the entire TZif rule to TiKV when pushing down time-zone dependent operations

    • Pros: Ensure single-source-to-truth without bloating the binary size
    • Pros: TiKV can get rid of chrono-tz
    • Cons: The TZif rule is comparatively huge (~ kilobytes)
    • Cons: TiDB needs to serialize the TZif rule, and TiKV needs to parse the TZif rule.
  4. Implement MySQL Server Time Zone Support

    • Pros: We actually follow the existing MySQL pattern and thus can reuse existing tools and documentations
    • Pros: Users can update the time zones easily.
    • Cons: Users need to populate mysql.time_zone* and manually maintain these databases.
    • Cons: TiDB and TiKV need to parse the mysql.time_zone* tables.
    • Cons: TiKV needs to do SELECT mysql.time_zone* in order to do the time-zone conversion.
  5. Disallow coprocessor push-down for any operation involving time zones.

    • Pros: This ensures only a single component touches time-zone-related code.
    • Pros: TiKV can get rid of chrono-tz
    • Cons: Many functions are no longer be eligible for co-processor, such as CAST(string AS DATETIME) and DATE_FORMAT.

@tiancaiamao
Copy link
Contributor

I vote for 2

@jebter jebter added affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects 5.4.x versions. labels Jan 11, 2022
@jebter jebter added the sig/execution SIG execution label Jul 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants