Prevent sqlalchemy Transparent SQL Compilation Caching from filling up during purge #71015

bdraco · 2022-04-28T17:57:11Z

Proposed change

Since each purge cycle can result in many differently sized
unioned queries to find unused attribute ids, sqlalchemy's
Transparent SQL Compilation Caching would be a bit too helpful
and cache each one, which results in an extra 500MB of memory used.
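
As a rough illustration of why the cache grows, here is a stdlib-only toy model. Plain SQL strings stand in for SQLAlchemy statements and a dict keyed on the statement text stands in for the compiled cache. SQLAlchemy keys its cache on statement structure rather than text, so this is only an analogy, and `build_union_sql`, `distinct_statements`, and `MAX_ROWS` are hypothetical names made up for this sketch.

```python
# Toy model: a dict keyed on statement text stands in for the compiled cache.
SUBQUERY = "SELECT min(attributes_id) FROM states WHERE attributes_id = ?"


def build_union_sql(count: int) -> str:
    """Return a UNION ALL of `count` identical one-placeholder subqueries."""
    return " UNION ALL ".join([SUBQUERY] * count)


def distinct_statements(batch_sizes) -> int:
    """Count how many different statement texts the given batch sizes produce."""
    cache = {}
    for size in batch_sizes:
        sql = build_union_sql(size)
        cache.setdefault(sql, "compiled form would live here")
    return len(cache)


# Hypothetical upper bound on the number of ids checked per purge cycle.
MAX_ROWS = 50
# Every distinct batch size adds its own cache entry.
print(distinct_statements(range(1, MAX_ROWS + 1)))
# A fixed batch size keeps reusing a single entry.
print(distinct_statements([MAX_ROWS] * 50))
```

In this model the number of cached entries tracks the number of distinct batch sizes, which is the growth pattern described above.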

Fixes the memory jump reported here and analyzed here

We now generate a single query using sqlalchemy's lambda_stmt
and fill unused values with NULLs to ensure that only one query is
cached and we only have to compile the query once.
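
A minimal sketch of the NULL-padding idea, using the stdlib `sqlite3` module instead of SQLAlchemy so it runs on its own. The slot count `SLOTS`, the helper names `pad_ids` and `find_unused`, the simplified `states` schema, and the use of `UNION ALL` are all assumptions for illustration; the PR itself builds the statement with `lambda_stmt`.

```python
import sqlite3

# Hypothetical slot count for illustration; the PR's real width is not shown here.
SLOTS = 4
SUBQUERY = "SELECT min(attributes_id) FROM states WHERE attributes_id = ?"
# Built once: the statement text never changes, whatever the number of ids.
FIND_USED_SQL = " UNION ALL ".join([SUBQUERY] * SLOTS)


def pad_ids(attribute_ids, slots=SLOTS):
    """Pad the id list with None (bound as SQL NULL) up to `slots` entries."""
    ids = list(attribute_ids)
    if len(ids) > slots:
        raise ValueError(f"at most {slots} ids per query")
    return ids + [None] * (slots - len(ids))


def find_unused(conn, attribute_ids):
    """Return the ids that no row in `states` references."""
    rows = conn.execute(FIND_USED_SQL, pad_ids(attribute_ids)).fetchall()
    # `attributes_id = NULL` matches nothing, so padded slots yield NULL rows.
    used = {row[0] for row in rows if row[0] is not None}
    return set(attribute_ids) - used


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE states (state_id INTEGER PRIMARY KEY, attributes_id INTEGER)"
)
conn.executemany("INSERT INTO states (attributes_id) VALUES (?)", [(1,), (1,), (3,)])
print(find_unused(conn, [1, 2, 3]))
print(find_unused(conn, [2]))
```

Because a comparison with NULL never matches, the padded slots contribute only NULL rows, which are filtered out before the set difference.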

Because of how lambda_stmt works, I had to write out the entire
statement, since it relies on the bind values being passed
to the function as Python arguments:
https://docs.sqlalchemy.org/en/14/core/connections.html#quick-guidelines-for-lambdas

If there is any way to improve this without having to write it out,
I haven't been able to find it after beating my head against the wall
for hours (I have 12+ hours into this PR). If anyone has a better way, please help!

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes Auto_purge from database takes much longer then before 2022.4.x #70409 -- (the problem was reported in the original issue in Auto_purge from database takes much longer then before 2022.4.x #70409 (comment) instead of a new issue)
This PR is related to issue:
Link to documentation pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
The code has been formatted using Black (black --fast homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
Untested files have been added to .coveragerc.

The integration reached or maintains the following Integration Quality Scale:

No score or internal
🥈 Silver
🥇 Gold
🏆 Platinum

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

…p during purge

Since each purge cycle can result in a different number of unioned queries to find unused attribute ids, sqlalchemy's Transparent SQL Compilation Caching would cache each one. We now generate a single query and fill unused values with NULLs to ensure that only one query is cached and we only have to compile the query once.

probot-home-assistant · 2022-04-28T17:57:18Z

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (recorder) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)

bdraco · 2022-04-28T19:18:10Z

Well it doesn't fill up the cache but now we don't cache that statement at all which is a cpu problem

bdraco · 2022-04-28T19:27:17Z

this is taking forever to test since I have to generate a database that takes long enough to purge and watch the memory, cpu, and profile during the purge

bdraco · 2022-04-28T19:28:37Z

Latest push is performing well

bdraco · 2022-04-28T19:29:10Z

Memory is now stable during purge

bdraco · 2022-04-28T21:15:11Z

homeassistant/components/recorder/purge.py

@@ -112,6 +114,118 @@ def _select_event_state_and_attributes_ids_to_purge(
    return event_ids, state_ids, attributes_ids


+def _generate_find_attr_lambda(attribute_ids: list[int]) -> StatementLambdaElement:


This works and it's fast, but I need to figure out how to avoid writing it all out

bdraco · 2022-04-28T22:36:16Z

This works really well and doesn't have the memory problem

bdraco · 2022-04-28T22:37:52Z

I haven't been able to come up with a solution that doesn't involve writing out the whole query.

I tried generating it and using params, but it was at least an order of magnitude slower

bdraco · 2022-04-28T22:51:51Z

This might be one of those cases where it's better to bypass the ORM and generate the query ourselves, but I guess it's better to have a written-out query than to try to deal with the complexity of all the supported DBMS ourselves

bdraco · 2022-04-29T01:07:55Z

homeassistant/components/recorder/purge.py

+
+    https://docs.sqlalchemy.org/en/14/core/connections.html#quick-guidelines-for-lambdas
+    """
+    return lambda_stmt(


The lambda_stmt is actually looking at the code inside the lambda to generate the cache key so it seems like we are stuck writing this out if we want a single cache key with bindparams

bdraco · 2022-04-29T01:10:55Z

homeassistant/components/recorder/purge.py

+    return select(func.min(States.attributes_id)).where(States.attributes_id == attr)
+
+
+def _generate_find_attr_lambda(


The docs for sqlalchemy say

Avoid referring to non-SQL constructs inside of lambdas as they are not cacheable by default .... The best way to resolve the above situation is to not refer to foo inside of the lambda, and refer to it outside instead:

>>> def my_stmt(foo):
...     x_param, y_param = foo.x, foo.y
...     stmt = lambda_stmt(lambda: select(func.max(x_param, y_param)))
...     return stmt

So even if I pass this as a list I'm stuck unpacking them as attr1, attr2, attr3, attr4, .... = attrs outside the lambda
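
A small plain-Python sketch of the difference, with no SQLAlchemy involved. It only shows what each lambda closes over: the whole list in one case, individual scalars in the other. `pad`, `with_list`, and `with_unpacking` are hypothetical helpers, and three slots is an arbitrary choice.

```python
def pad(values, slots):
    """Pad with None so the unpacking below always sees `slots` values."""
    return list(values) + [None] * (slots - len(values))


def with_list(attrs):
    # The lambda closes over the list itself: one free variable, a non-SQL object.
    return lambda: (attrs[0], attrs[1], attrs[2])


def with_unpacking(attrs):
    # Unpack outside the lambda so each value is its own closed-over scalar.
    attr1, attr2, attr3 = pad(attrs, 3)
    return lambda: (attr1, attr2, attr3)


print(with_list([7, 8, 9]).__code__.co_freevars)
print(with_unpacking([7]).__code__.co_freevars)
print(with_unpacking([7])())
```

The unpacked form gives the lambda one free variable per slot, which is the shape the quoted guideline asks for.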

bdraco · 2022-04-29T03:10:04Z

Memory testing looks solid. Actually saw memory going down from gc during the purge

bdraco · 2022-04-29T03:10:23Z

Tested with timescale db / postgres addon and purge is still fast + memory looks good

bdraco · 2022-04-29T03:11:20Z

cpu during purge also looks good

bdraco · 2022-04-29T03:12:23Z

I don't love the syntax here, but I can't find a better way that doesn't involve using sqlalchemy internals, which seems risky. I just don't like how the code looks even though it functions well.

balloob · 2022-04-29T04:34:49Z

homeassistant/components/recorder/purge.py

-    to_remove = attributes_ids - {state[0] for state in id_query.all()}
+        # We used to generate a query based on how many attribute_ids to find but
+        # that meant sqlalchemy Transparent SQL Compilation Caching was working against
+        # us by cached up to MAX_ROWS_TO_PURGE different statements which could be


Not familiar with SQLAlchemy, but would it be possible to execute a query without it being cached? We don't purge that often, so having slightly more expensive query generation is fine.

Ah, you wrote that above already, dealing with the dialects. That's fair, we can keep it as is. People will have questions when they see it 😅

My first thought was to disable the cache.

Unfortunately the query generates so many ORM objects (also the same reason the cache gets so big), generating it every time was more expensive run-time wise than all the other python code run in a purge cycle combined.

Sqlalchemy clearly isn't optimized for this use case.

I also tried caching a query with many bindparams, but the code that generates the actual query from the cache clones the cached version and visits every part to fill in the bind values, so that turned out to be even more expensive.

The only way I found that resulted in a single cache key (the lambda) was how it's currently implemented.

homeassistant added the cla-signed label Apr 28, 2022

probot-home-assistant bot added core has-tests integration: recorder labels Apr 28, 2022

bdraco added this to the 2022.5.0 milestone Apr 28, 2022

simplify expression

df266f9

restore caching

d68e5ca

bdraco added 3 commits April 28, 2022 14:39

use an anonymous value to reduce sqlalchemy overhead

ce2231d

tweak

6e64672

tweak

e85d254

balloob changed the title ~~Prevent sqlalchemy Transparent SQL Compilation Caching from filling up during pruge~~ Prevent sqlalchemy Transparent SQL Compilation Caching from filling up during purge Apr 28, 2022

bdraco added 8 commits April 28, 2022 16:32

Revert "tweak"

5d7df00

This reverts commit e85d254.

100 block size

1f59da4

Revert "100 block size"

66106b4

This reverts commit 1f59da4.

Revert "Revert "tweak""

17e7158

This reverts commit 5d7df00.

write it back out again as the other method did not work

1c43462

tweak

4a220e3

correct typing

db25f23

merge

2b2fc93

read the graph wrong its only 500 not 800

e101b02

bdraco added 2 commits April 28, 2022 18:19

reduce some more

564ffd4

Merge branch 'dev' into sqlalchemy_caching

88cad5d

bdraco marked this pull request as ready for review April 29, 2022 01:08

bdraco requested a review from a team as a code owner April 29, 2022 01:08

balloob reviewed Apr 29, 2022

balloob approved these changes Apr 29, 2022

balloob merged commit b9c7a89 into home-assistant:dev Apr 29, 2022

balloob added the cherry-picked label Apr 29, 2022

balloob pushed a commit that referenced this pull request Apr 29, 2022

Prevent sqlalchemy Transparent SQL Compilation Caching from filling u…

0b144be

…p during purge (#71015)

github-actions bot locked and limited conversation to collaborators Apr 30, 2022
