Skip to content

Index analysis doesn't work with toHour and Timezone condition #38333

@pavelnemirovsky

Description

Describe what's wrong
Filter condition doesn't work correctly when functions like toHour are involved, I have another examples involved toDayOfWeek function as well where produced results are incorrect but I'll report in separate ticket about it.

Does it reproduce on recent release?
Yes, verified as part of v22.3-lts

How to reproduce

  • Which ClickHouse server version to use 21.12.4.1
  • Which interface to use, if matters: clickhouse-client
  • Queries to run that lead to unexpected result and
Table definition
clickhouse-db-02.server.internal :) show create db.articles_ext_data;

SHOW CREATE TABLE db.articles_ext_data

Query id: 383c56fa-21e0-4dae-bc78-9eb9adfd03b2

[clickhouse-db-02.server.internal] 2022.06.23 08:19:47.831837 [ 65324 ] {383c56fa-21e0-4dae-bc78-9eb9adfd03b2} <Debug> executeQuery: (from 127.0.0.1:43080) show create db.articles_ext_data;
[clickhouse-db-02.server.internal] 2022.06.23 08:19:47.831935 [ 65324 ] {383c56fa-21e0-4dae-bc78-9eb9adfd03b2} <Trace> ContextAccess (default): Access granted: SHOW COLUMNS ON db.articles_ext_data
┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE db.articles_ext_data
(
    `internal_id` String,
    `timestamp` Nullable(DateTime('UTC')),
    `url` Nullable(String),
    `data_provider` String,
    `document_length` UInt32,
    `domain_name` String,
    `is_near_duplicate` UInt8,
    `publish_date` DateTime('UTC'),
    `lang` Nullable(String),
    `frames.label` Array(String),
    `frames.score` Array(Float64),
    `frames.version` Array(UInt32),
    `frames.role` Array(Array(String)),
    `frames.value` Array(Array(String)),
    `frames.entity_id` Array(Array(UInt32)),
    `frames.salience_score` Array(Array(Float64)),
    `tags.id` Array(UInt32),
    `frames.num_mentions` Array(UInt32),
    `tags.name` Array(String),
    `tags.score` Array(Float64),
    `tags.tagger` Array(String),
    `tags.checksum` Array(String),
    `tags.type` Array(String),
    `kpis.entity_id` Array(UInt32),
    `kpis.salience_score` Array(Float64),
    `kpis.num_mentions` Array(UInt32)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/replicated/db/articles_ext_data', 'clickhouse-db-02.server.internal')
PARTITION BY toYYYYMMDD(publish_date)
PRIMARY KEY cityHash64(internal_id)
ORDER BY cityHash64(internal_id)
SAMPLE BY cityHash64(internal_id)
SETTINGS index_granularity = 8192 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
[clickhouse-db-02.server.internal] 2022.06.23 08:19:47.833999 [ 65324 ] {383c56fa-21e0-4dae-bc78-9eb9adfd03b2} <Information> executeQuery: Read 1 rows, 1.22 KiB in 0.002124662 sec., 470 rows/sec., 575.46 KiB/sec.
[clickhouse-db-02.server.internal] 2022.06.23 08:19:47.834029 [ 65324 ] {383c56fa-21e0-4dae-bc78-9eb9adfd03b2} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.

1 rows in set. Elapsed: 0.003 sec.
# Working Use Case
SELECT
    toHour(toTimeZone(publish_date, 'UTC')) AS toHour_UTC,
    toHour(toTimeZone(publish_date, 'Asia/Jerusalem')) AS toHour_Israel
FROM db.articles_ext_data
WHERE (publish_date >= toTimeZone(toDateTime('2021-07-01 00:00:00'), 'Asia/Jerusalem')) AND (publish_date < toTimeZone(toDateTime('2021-09-30 23:59:59'), 'Asia/Jerusalem'))
HAVING toHour_UTC = 5
ORDER BY toHour_UTC DESC
LIMIT 10

Query id: 9032228d-a5ae-465b-985d-6cb5d8369ec8

[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.827400 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> executeQuery: (from 127.0.0.1:43080) SELECT toHour(toTimeZone(publish_date, 'UTC')) AS toHour_UTC, toHour(toTimeZone(publish_date, 'Asia/Jerusalem')) AS toHour_Israel FROM db.articles_ext_data WHERE (publish_date >= toTimeZone(toDateTime('2021-07-01 00:00:00'), 'Asia/Jerusalem')) AND (publish_date < toTimeZone(toDateTime('2021-09-30 23:59:59'), 'Asia/Jerusalem')) HAVING toHour_UTC = 5 ORDER BY toHour_UTC DESC LIMIT 10;
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.828408 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Trace> ContextAccess (default): Access granted: SELECT(publish_date) ON db.articles_ext_data
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.829203 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.829851 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): Key condition: unknown, unknown, and, unknown, and
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.841597 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): MinMax index condition: (column 0 in [1625097600, +Inf)), (column 0 in (-Inf, 1633046398]), and, (toHour(toTimezone(column 0)) in [5, 5]), and
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.843542 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): Selected 117/3088 parts by partition key, 117 parts by primary key, 5858/5858 marks by primary key, 5858 marks to read from 117 ranges
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.845501 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): Reading approx. 6585652 rows with 32 streams
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.853734 [ 53567 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> MergingSortedTransform: Merge sorted 1 blocks, 10 rows in 0.007999958 sec., 1250.0065625344535 rows/sec., 15.63 KiB/sec
┌─toHour_UTC─┬─toHour_Israel─┐
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
└────────────┴───────────────┘
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.854394 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Information> executeQuery: Read 6585652 rows, 25.12 MiB in 0.026937251 sec., 244481220 rows/sec., 932.62 MiB/sec.
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.854421 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.

10 rows in set. Elapsed: 0.028 sec. Processed 6.59 million rows, 26.34 MB (235.49 million rows/s., 941.98 MB/s.)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------

# Not Working Use Case
SELECT
    toHour(toTimeZone(publish_date, 'UTC')) AS toHour_UTC,
    toHour(toTimeZone(publish_date, 'Asia/Jerusalem')) AS toHour_Israel
FROM db.articles_ext_data
WHERE (publish_date >= toTimeZone(toDateTime('2021-07-01 00:00:00'), 'Asia/Jerusalem')) AND (publish_date < toTimeZone(toDateTime('2021-09-30 23:59:59'), 'Asia/Jerusalem'))
HAVING toHour_Israel = 8
ORDER BY toHour_Israel DESC
LIMIT 10

Query id: c1c83e59-af83-40cc-b93e-d2a774186fa1

[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.523436 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Debug> executeQuery: (from 127.0.0.1:43080) SELECT toHour(toTimeZone(publish_date, 'UTC')) AS toHour_UTC, toHour(toTimeZone(publish_date, 'Asia/Jerusalem')) AS toHour_Israel FROM db.articles_ext_data WHERE (publish_date >= toTimeZone(toDateTime('2021-07-01 00:00:00'), 'Asia/Jerusalem')) AND (publish_date < toTimeZone(toDateTime('2021-09-30 23:59:59'), 'Asia/Jerusalem')) HAVING toHour_Israel = 8 ORDER BY toHour_Israel DESC LIMIT 10
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.524450 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Trace> ContextAccess (default): Access granted: SELECT(publish_date) ON db.articles_ext_data
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.525283 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.526006 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): Key condition: unknown, unknown, and, unknown, and
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.537831 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): MinMax index condition: (column 0 in [1625097600, +Inf)), (column 0 in (-Inf, 1633046398]), and, (toHour(toTimezone(column 0)) in [8, 8]), and
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.537893 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): Selected 0/3088 parts by partition key, 0 parts by primary key, 0/0 marks by primary key, 0 marks to read from 0 ranges
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.538562 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
Ok.

0 rows in set. Elapsed: 0.016 sec.

Expected behavior
The expected results should be identical to the first query response regardless of condition toHour_UTC = 5 or toHour_Israel = 8 applies

Additional context
On small tables with syntenic data, I was unable to reproduce the issue, I also tried to copy data from one table to another with the same structure and after data migration completion the issue reproduces consistently.
Might be related to #10977

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

Labels

bugConfirmed user-visible misbehaviour in official releasecomp-datetimedate & time & timezone relatedcomp-optimizersQuery optimizations

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions