Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

substantial decline in performance of percentile_agg #397

Closed
Joyfred opened this issue Apr 22, 2022 · 11 comments
Closed

substantial decline in performance of percentile_agg #397

Joyfred opened this issue Apr 22, 2022 · 11 comments
Labels
bug Something isn't working
Milestone

Comments

@Joyfred
Copy link

Joyfred commented Apr 22, 2022

Relevant system information:

  • OS: Ubuntu 11.2.0
  • PostgreSQL version (output of SELECT version();): 12.10
  • TimescaleDB Toolkit version: 1.3.1
  • Installation method: "Timescale Cloud"

Describe the bug
Computation time of queries involving percentile_agg has increased by 5x - 10x recently. It has declined to an extent that percentile_cont is sometimes better performing now, which was not the case earlier.

To Reproduce
Steps to reproduce the behavior:

(Query 1)
Execute below query in timescaledb_toolkit v1.3.1. Time taken to execute is ~30s. whereas in v1.2 its just ~18s

SELECT
	approx_percentile (0.5, percentile_agg (inp))
FROM (
	SELECT
		random() * 1000 AS inp
	FROM
		generate_series(1, 100000000)
) subquery;

(Query 2)
In the above approach you might find the difference to be slightly insignificant. If so, please execute below query in v1.3.1. Time taken to execute is ~1600s (sometimes its executing forever & never completes) whereas in v1.2 its just ~116s

CREATE MATERIALIZED view _check AS SELECT
	grp1
	, grp2
	, grp3
	, grp4
	, percentile_agg (inp1) as agg1
	, percentile_agg (inp2) as agg2
	, percentile_agg (inp3) as agg3
	, percentile_agg (inp4) as agg4
FROM (
	SELECT
		random() * 10000 AS inp1
		, random() * 989 AS inp2
		, random() * 793 AS inp3
		, random() * 329 AS inp4
		, (random() * 38000)::int grp1
		, (random() * 8)::int grp2
		, (random() * 6)::int grp3
		, (random() * 12)::int grp4
	FROM
		generate_series(1, 13000000)
) subquery
GROUP BY
	grp1
	, grp2
	, grp3
	, grp4;
@Joyfred Joyfred added the bug Something isn't working label Apr 22, 2022
@jerryxwu
Copy link
Contributor

@Joyfred Unrelated but I'm curious. Are you actually on Ubuntu 11.2.0 or that's a typo? It's not an LTS version and quite old.

@Joyfred
Copy link
Author

Joyfred commented Apr 23, 2022

@Joyfred Unrelated but I'm curious. Are you actually on Ubuntu 11.2.0 or that's a typo? It's not an LTS version and quite old.

@jerryxwu
By OS I believe you mean the one used in the DB service rt?
On executing select version();, below output was received

PostgreSQL 12.10 (Ubuntu 12.10-1.pgdg21.10+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, 64-bit

Looks like both Ubuntu 12.10 & Ubuntu 11.2 are not LTS versions.

But in case if you mean the OS in which I use to access the DB service, it MacOS 11.4

@Joyfred
Copy link
Author

Joyfred commented Apr 23, 2022

@jerryxwu

To make testing more simpler, you can also reproduce this issue using this below query, rather ingesting data to a table

SELECT
	approx_percentile (0.5, percentile_agg (inp))
FROM (
	SELECT
		random() * 1000 AS inp
	FROM
		generate_series(1, 100000000)
) subquery;

In my local(toolkit v1.2) its executed ~18s but in timescale-cloud(tookit v1.3.0) service it took ~29s.

Earlier in timescale-cloud it had on-par/better performance when compared to my local. I'm comparing execution time with my local instance, since I'm not aware on how to downgrade timescaledb_toolkit version in timescale cloud. If needed & possible I can test the same.

@Joyfred
Copy link
Author

Joyfred commented Apr 24, 2022

I've updated the steps to reproduce section so that its easier to reproduce.

@jerryxwu
Copy link
Contributor

@Joyfred thanks to your repro cases, one thing that I can confirm is that I do not see meaningful perf difference between toolkit 1.2 & 1.3.1. I also took the opportunity to compare them with 1.5.1 & 1.6.0. At least one my local laptop, I see the perf of both 1.5.1 & 1.6.0 is about 2x of 1.2 & 1.3.1. Is there any chance you could give the latest 1.6.0 a try?

Regarding select version(); reported:

PostgreSQL 12.10 (Ubuntu 12.10-1.pgdg21.10+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, 64-bit

They are Postgre version and GCC version. I know it's very easy to read them as Ubuntu version.

@Joyfred
Copy link
Author

Joyfred commented Apr 25, 2022

@jerryxwu Can users upgrade toolkit versions in timescale-cloud. If so can you point me to the docs.

@jerryxwu
Copy link
Contributor

jerryxwu commented Apr 25, 2022

@Joyfred you should be able upgrade to a new version by either (within psql)

  • ALTER EXTENSION timescaledb_toolkit UPDATE TO "version number"; or
  • ALTER EXTENSION timescaledb_toolkit UPDATE to the default version

You can find the relevant documentation here , although it requires some improvement.

You can see a list of versions available by select * from pg_available_extension_versions where name like 'timescale_toolkit%';

@jerryxwu
Copy link
Contributor

jerryxwu commented Apr 25, 2022

@Joyfred In my early testing, I overlooked the difference in PG versions. That appears to be playing the main role for the perf difference that I was seeing. Specifically, the same query that you used early

SELECT
    time_bucket('1 h'::interval, ts) as bucket,
    api_id,
    approx_percentile(0.5, percentile_agg(response_time_ms) FILTER (WHERE response_time_ms > 1)) as approx_median
FROM response_times
GROUP BY 1, 2
ORDER BY 3 DESC LIMIT 15;

performed 2x better with {PG 13.5 + TimescaleDB 2.5.0 + Toolkit 1.2} than with {PG 12.9 + TimescaleDB 2.5.0 + Toolkit 1.2} running on the same hardware.

Once again, I don't see a meaningful perf difference between Toolkit 1.2 & 1.3 when other parameters are kept the same.

@jerryxwu
Copy link
Contributor

@Joyfred we still recommend upgrading to the latest version as there have been a number of feature additions and bug fixes since 1.3 including an important security update in 1.5.2 .

@Joyfred
Copy link
Author

Joyfred commented Apr 26, 2022

@jerryxwu I've updated the toolkit version to 1.6 in timescale-cloud as recommended {PG 12.0 + TimescaleDB 2.6.1 + Toolkit 1.6.0}, but still there is substantial performance difference, I couldn't see any improvement. I think something's wrong with the service, because I have lower version of toolkit in my local {PG 13.4 + TimescaleDB 2.4.1 + Toolkit 1.2} which is performing much better. If possible can you test in our timescale-cloud service.

@jerryxwu
Copy link
Contributor

jerryxwu commented Apr 27, 2022

@Joyfred I just did a test on the timescale cloud {2CPU, 8G RAM} running {PG 14.2 + TimescaleDB 2.6.1 + Toolkit 1.5.1 & 1.6.0}. I don't see meaningful perf difference between Toolkit 1.5.1 & 1.6.0. More importantly, for both versions of Toolkit, the query

SELECT
    time_bucket('1 h'::interval, ts) as bucket,
    api_id,
    approx_percentile(0.5, percentile_agg(response_time_ms) FILTER (WHERE response_time_ms > 1)) as approx_median
FROM response_times
GROUP BY 1, 2
ORDER BY 3 DESC LIMIT 15;

runs ~ 3x faster than

SELECT
    time_bucket('1 h'::interval, ts) as bucket,
    api_id,
    percentile_cont(0.5) WITHIN GROUP (ORDER BY response_time_ms) median
FROM response_times
GROUP BY 1, 2
ORDER BY 3 DESC LIMIT 15;

At this point, we can pretty safe to conclude that the performance difference that you've experienced after the recent refresh is not because of the toolkit version change. I'd encourage you to file a bug report to support@timescale.com so they can diagnose this further for you. You can link this issue to your bug report so that our support team has more context.

@jfjoly jfjoly added this to the 1.7.0 milestone May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants