Skip to content

Optimize file and assessment size query #5068

@bjester

Description

@bjester

Current behavior

The following query performs poorly on Studio's hotfixes DB. Six queries averaged 6.12 minutes per call.

SELECT
  SUM("contentcuration_file"."file_size") AS "resource_size",
  SUM(T4."file_size") AS "assessment_size",
  COUNT("contentcuration_assessmentitem"."id") AS "assessment_count"
FROM
  "contentcuration_contentnode"
LEFT OUTER JOIN
  "contentcuration_file"
ON
  ("contentcuration_contentnode"."id" = "contentcuration_file"."contentnode_id")
LEFT OUTER JOIN
  "contentcuration_assessmentitem"
ON
  ("contentcuration_contentnode"."id" = "contentcuration_assessmentitem"."contentnode_id")
LEFT OUTER JOIN
  "contentcuration_file" T4
ON
  ("contentcuration_assessmentitem"."id" = T4."assessment_item_id")
WHERE
  ("contentcuration_contentnode"."lft" >= $1
    AND "contentcuration_contentnode"."lft" <= $2
    AND "contentcuration_contentnode"."tree_id" = $3)

Desired behavior

The query could be better served as 2 or more individual queries. For example, resource_size could be queried to avoid the table join and instead use an EXISTS clause and/or CTE. The same for assessment_size and assessment_count, although the addition of the 'count' would require a join, but it could still be a CTE. In any case, the CTE should filter on an indexed field. Any EXISTS subquery should also filter on an indexed field.

First and foremost, identify the corresponding Django ORM query and ensure all of its queried fields are necessary.

Value add

This allows us to continue the process of upgrading our postgres server to a more recent version.

References

Query ID: 13142966431373741158

Image

Metadata

Metadata

Assignees

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions