Performance Issue: Slow Query Execution Despite Using `LIMIT` #3274

fredrikIOT · 2024-01-31T13:41:13Z

What type of bug is this?

Performance issue

What subsystems are affected?

Standalone mode

Minimal reproduce step

Populate a test database with data simulating 20000 sensors across 50 locations, with each location having its own table and each sensor having its own tag.
Insert data at a rate of 20000 data points per second (400 for each table), spread over 50 tables.
Run this simulation continuously for a few days.
Execute the query SELECT * FROM "location1" ORDER BY ts DESC LIMIT 100;.

What did you expect to see?

Expected efficient use of LIMIT and OFFSET, where the query execution should be quick, only processing the first 100 rows.

What did you see instead?

The query execution is slow, taking multiple seconds (8634 ms) despite using LIMIT. It seems that the LIMIT clause does not stop the query processing after reaching the limit but only restricts the number of rows returned.

Performance Metrics:

When running SELECT * FROM "location1" ORDER BY ts DESC LIMIT 100;. Metrics:

Returned	Metrics
Selected 100 rows	in 8634 ms

For comparison: When query the count of the table using SELECT count(*) FROM "location1";:

Returned	Metrics	Number of rows (result of the query)
Selected 1 rows	in 2333 ms	181459600

SQL Analyze:

EXPLAIN ANALYZE 
SELECT * FROM "location1" limit 5 offset 1

Result:

plan_type	plan
Plan with Metrics	MergeScanExec: peers=[4475355922432(1042, 0), ], metrics=[output_rows=100, ready_time=87.566768ms, finish_time=4.524580126s, first_consume_time=4.524555159s]

What operating system did you use?

Docker (Ubuntu)

What version of GreptimeDB did you use?

docker image: greptime/greptimedb:nightly-20240126-8bade8f8e4

Relevant log output and stack trace

No response

The text was updated successfully, but these errors were encountered:

waynexia · 2024-01-31T16:12:12Z

Hi @fredrikIOT, appreciate your detailed feedback!

I noticed you also run SELECT * FROM "location1" limit 5 offset 1 and it's slower than SELECT count(*) FROM "location1"; I'm wondering if it is reproducible? And how large is the result set? According to the analyze result it has output_rows=100 100 rows. Would the SQL SELECT * FROM "location1" limit 5 offset 1 return 100 rows (without the ANALYZE) or 5 rows

MichaelScofield · 2024-02-01T02:58:51Z

@fredrikIOT Can you share the SQL of creating table "location1"?

fredrikIOT · 2024-02-01T07:25:02Z

Hi @fredrikIOT, appreciate your detailed feedback!

I noticed you also run SELECT * FROM "location1" LIMIT 5 offset 1 and it's slower than SELECT count(*) FROM "location1"; I'm wondering if it is reproducible? And how large is the result set? According to the analyze result it has output_rows=100 100 rows. Would the SQL SELECT * FROM "location1" limit 5 offset 1 return 100 rows (without the ANALYZE) or 5 rows

@waynexia I need to correct an error in my previous message. I referenced the wrong query and result. Here's the accurate information:
Correct Query:
EXPLAIN ANALYZE SELECT * FROM "location1" ORDER BY ts DESC LIMIT 5 offset 1
Correct Result:
MergeScanExec: peers=[4629974745088(1078, 0), ], metrics=[output_rows=5, ready_time=379.715193ms, finish_time=9.498069549s, first_consume_time=9.498045694s]

While reviewing this, I've noticed the significant impact of the ORDER BY clause on the query performance. Here are some observations:

SELECT * FROM "location1" ORDER BY ts DESC LIMIT 5 offset 1 uses 5372 ms
SELECT * FROM "location1" LIMIT 5 offset 1 uses 213 ms
SELECT COUNT(*) FROM "location1"; uses 2787 ms

So, I guess the question of whether COUNT takes more time than LIMIT is no longer valid. However, it is reproducible that COUNT is slower when using ORDER BY.

fredrikIOT · 2024-02-01T07:56:24Z

@fredrikIOT Can you share the SQL of creating table "location1"?

@MichaelScofield Here's the SQL for creating the "location1" table:

CREATE TABLE IF NOT EXISTS "location1" (
  "location_uid" STRING NULL,
  "room_uid" STRING NULL,
  "log_uid" STRING NULL,
  "sensor_name" STRING NULL,
  "value" DOUBLE NULL,
  "ts" TIMESTAMP(3) NOT NULL,
  TIME INDEX ("ts"),
  PRIMARY KEY ("location_uid", "room_uid", "log_uid", "sensor_name")
) ENGINE=mito WITH(regions = 1)

As I mentioned in my previous message, the performance issue seems to be primarily due to the ORDER BY clause. However, having the table structure might provide further insights into any other potential performance implications.

waynexia · 2024-02-02T03:18:31Z

So, I guess the question of whether COUNT takes more time than LIMIT is no longer valid. However, it is reproducible that COUNT is slower when using ORDER BY.

Yes, that's correct. The main reason that SELECT count(*) FROM "location1"; runs faster than SELECT * FROM "location1" ORDER BY ts DESC LIMIT 100; is the ORDER BY ts. Unfortunately we haven't optimized for this scenario -- though we know it's important for time series scenarios. We used to have one optimization that could speed up ORDER BY ts by a fascinating percentage, but it's not adapted yet after a major refactor 🫣

We can keep this issue open as a tracker.

killme2008 · 2024-03-04T11:14:29Z

Do we have any plan to optimize for this case? @waynexia @evenyag

evenyag · 2024-03-04T14:37:11Z

I'm afraid that I don't have enough time to finish it in v0.8, maybe I can optimize some cases in v0.9.

evenyag added C-performance Category Performance A-storage Involves code in storage engines labels Mar 27, 2024

evenyag mentioned this issue Apr 24, 2024

fix: push down order hint of the query again #3797

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance Issue: Slow Query Execution Despite Using `LIMIT` #3274

Performance Issue: Slow Query Execution Despite Using `LIMIT` #3274

fredrikIOT commented Jan 31, 2024

waynexia commented Jan 31, 2024

MichaelScofield commented Feb 1, 2024

fredrikIOT commented Feb 1, 2024 •

edited

Loading

fredrikIOT commented Feb 1, 2024 •

edited

Loading

waynexia commented Feb 2, 2024

killme2008 commented Mar 4, 2024

evenyag commented Mar 4, 2024

Performance Issue: Slow Query Execution Despite Using LIMIT #3274

Performance Issue: Slow Query Execution Despite Using LIMIT #3274

Comments

fredrikIOT commented Jan 31, 2024

What type of bug is this?

What subsystems are affected?

Minimal reproduce step

What did you expect to see?

What did you see instead?

What operating system did you use?

What version of GreptimeDB did you use?

Relevant log output and stack trace

waynexia commented Jan 31, 2024

MichaelScofield commented Feb 1, 2024

fredrikIOT commented Feb 1, 2024 • edited Loading

fredrikIOT commented Feb 1, 2024 • edited Loading

waynexia commented Feb 2, 2024

killme2008 commented Mar 4, 2024

evenyag commented Mar 4, 2024

Performance Issue: Slow Query Execution Despite Using `LIMIT` #3274

Performance Issue: Slow Query Execution Despite Using `LIMIT` #3274

fredrikIOT commented Feb 1, 2024 •

edited

Loading

fredrikIOT commented Feb 1, 2024 •

edited

Loading