Performance Improvement Proposal for queries related to the Task Table #20505
Comments
Thank you for the improvement, this makes sense. One question: does
Hi @Vad1mo, important points:
In summary: you can migrate a JSON column to JSONB simply by altering the table. However, it is not possible to revert, since JSONB does not store the JSON in its raw format. If you want the option to revert later, first make a copy of the table.
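A minimal sketch of that approach, using the table and column names from this thread (the backup table name is illustrative):

```sql
-- Keep a raw-format copy first: JSONB normalizes the stored text
-- (key order, whitespace, duplicate keys), so the conversion is lossy.
CREATE TABLE task_backup AS SELECT * FROM task;

-- The forward migration is a single ALTER.
ALTER TABLE task ALTER COLUMN extra_attrs SET DATA TYPE jsonb;

-- To revert, restore the original raw text from the backup table.
```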
Hi, here is the EXPLAIN output for one of the SELECT * FROM task queries, and the difference in CPU usage between today (yellow line) and the previous week (gray dotted line):
Thank you for the update; now I think everything is clear. Still, my recommendation is to change the data type (to jsonb) instead of performing the conversion inside the query. In our performance tests, changing the data type reduced CPU consumption by almost 50% compared to how it was implemented before.
Hi @peresureda, we tried it on our staging env; indeed, with the conversion to jsonb and the added index it is much faster, thank you. @wy65701436 @Vad1mo, do you think this change could be included in a 2.10 version? Thank you.
Query analysis: before applying the fix:
…trs_report_uuids fixes goharbor#20505 Signed-off-by: stonezdj <stone.zhang@broadcom.com>
@stonezdj @wy65701436, thank you for the swift fix of the issues. It is not clear to me why, in addition to changing the query, we cannot also change the data type for generally better performance:
ALTER TABLE task ALTER COLUMN extra_attrs SET DATA TYPE jsonb;
CREATE INDEX indx_task_jsonb ON task USING GIN(extra_attrs jsonb_path_ops);
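With that index in place, a containment query of the shape used here should be able to use the GIN index instead of a sequential scan (the UUID literal below is a placeholder):

```sql
-- jsonb_path_ops GIN indexes support the @> containment operator.
EXPLAIN
SELECT * FROM task
WHERE extra_attrs @> '{"report_uuids": ["<uuid>"]}'::jsonb;
-- The plan should show a Bitmap Index Scan on indx_task_jsonb
-- rather than a Seq Scan on task.
```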
We are reverting to the previous situation, where we cast the object on each execution in order to use an index. Tests show that CPU consumption dropped dramatically by avoiding this cast and converting the data type instead. Am I missing any place where this field needs to stay JSON and cannot be migrated to JSONB? @stonezdj, @wy65701436?
I would also suggest transitioning to the jsonb data type; it is clearly the recommended option for dealing with JSON in PostgreSQL, specifically for performance reasons. There are also other opportunities to switch to jsonb.
The PR merged in 2.11 only fixed the unused-index issue. For converting json to jsonb and adding more indexes, we need to do more testing to verify it.
Since we didn't encounter any performance issues prior to v2.10.1, the index on report_uuids has proven beneficial. The recent commit did not leverage this existing index, leading to performance problems. The fix is to rewrite the SQL so that the DB engine uses the report_uuids index. This approach avoids updating the DB schema, which would require additional upgrade work and testing. The fix will be in v2.9.5, v2.10.3, and v2.11.0.
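A hypothetical sketch of what such a rewrite looks like, assuming idx_task_extra_attrs_report_uuids is an expression index (the exact index definition is an assumption here, not quoted from the fix):

```sql
-- Assumed index definition:
--   CREATE INDEX idx_task_extra_attrs_report_uuids
--     ON task USING gin ((extra_attrs::jsonb -> 'report_uuids'));
--
-- An expression index is only considered by the planner when the query
-- repeats the indexed expression verbatim, so the rewritten query must
-- filter on that same expression:
SELECT * FROM task
WHERE extra_attrs::jsonb -> 'report_uuids' @> '["<uuid>"]'::jsonb;
```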
@stonezdj @wy65701436, thank you for the additional info; as a quick fix this is a good way forward. However, I am quite confident that we can optimize this further. @flbla @thcdrt @peresureda, you have rather large data sets; can you provide some data points?
This would also open up opportunities for further improvements, as we have other JSON columns that we query.
Hi @Vad1mo, this morning I ran extra tests with the different scenarios discussed, and I can provide more data:
Execution Plan:
Hi @Vad1mo, I built a 2.10 image with the rollback of the SELECT FROM task query. The task table contained 11313 rows when I started my tests.
Then I launched some GC, "Scan All", and replication actions:
Next, I tested the same built image with the data type changed to jsonb, the indexes added, and my pg_stat_statements view reset:
After that, I launched some GC, "Scan All", and replication actions:
As you can see, the average time:
I then switched back to the 2.10.2 version (the version with the slowness) while keeping the jsonb type and indexes, and reset pg_stat_statements:
After some actions on Harbor (Scan All, GC, replications, ...):
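The measurement loop described above can be sketched as follows (a sketch, assuming the pg_stat_statements extension is installed; the mean_exec_time column name is the PostgreSQL 13+ spelling):

```sql
-- Reset counters before each test run.
SELECT pg_stat_statements_reset();

-- ...run the GC / Scan All / replication workload in Harbor...

-- Then compare the average execution times of the task queries.
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
WHERE query ILIKE '%from task%'
ORDER BY mean_exec_time DESC
LIMIT 10;
```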
So from my point of view, it could be good to do the following:
We have reverted to using the existing index defined on the table. Please upgrade to the latest version (v2.11) to validate the performance changes. If the performance still does not meet expectations, we can continue considering the use of the JSONB type. However, based on our testing results, we have observed a significant performance improvement after the revert.
goharbor#20545) Adjust the query by UUID sql so that it can use the idx_task_extra_attrs_report_uuids fixes goharbor#20505 Signed-off-by: stonezdj <stone.zhang@broadcom.com>
Hi @wy65701436, after reverting the manual changes (changing the type from JSONB back to JSON and removing the index) and applying the patch, we observed that the performance of the ScanAll process is very similar to what we had with the JSONB + index improvement. Where there is a small difference is in resource usage: as we mentioned, the execution cost of transforming to JSONB at runtime is higher than already having the data in that format.
CPU usage during ScanAll before the upgrade, with jsonb + index:
CPU usage during ScanAll with 2.11, without the manual improvements:
From here, the decision is yours.
@peresureda, thanks for your data; it is helpful for our further investigation. The performance problem was resolved after upgrading to v2.11, but you still see high CPU usage; do I understand correctly? cc @stonezdj
@stonezdj
Yes,
I don't remember the CPU and memory figures, but in my previous tests I got significantly better times with jsonb + GIN indexes: #20505 (comment)
SQL: the performance of this SQL doesn't impact the ScanAll API, only the list. If any user wants to apply this change, it is fine to apply it in the database directly; no code change is required:
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days. |
Is your feature request related to a problem? Please describe.
When you have the Trivy scanner activated and want to perform one of the following operations:
The following query is executed in the database:
SELECT * FROM task WHERE extra_attrs::jsonb @> cast( $1 as jsonb )
This query becomes very heavy for the database CPU, and in some cases Harbor times out when the number of artifacts is large.
This is because the extra_attrs column is of type JSON, which cannot be GIN-indexed directly.
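The per-row cost is visible in the plan: before any index is added, every execution has to parse and cast each row's JSON text (the UUID literal is a placeholder):

```sql
-- With extra_attrs of type json, the @> query forces a per-row cast:
EXPLAIN ANALYZE
SELECT * FROM task
WHERE extra_attrs::jsonb @> '{"report_uuids": ["<uuid>"]}'::jsonb;
-- Expect a Seq Scan on task with the ::jsonb cast in the Filter line;
-- the cast repeats for every row on every execution.
```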
Solution Proposal
Since the JSON and JSONB data types accept the same input and produce the same output, my recommendation is to change the data type to JSONB to allow indexing:
ALTER TABLE task ALTER COLUMN extra_attrs SET DATA TYPE jsonb;
CREATE INDEX indx_task_jsonb ON task USING GIN(extra_attrs jsonb_path_ops);
Improvements
With this improvement, we have achieved the following results: