-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase the efficiency of the SqlaGroup.nodes
iterator
#4094
Increase the efficiency of the SqlaGroup.nodes
iterator
#4094
Conversation
The iterator was calling `dbnodes.__iter__()` which was violting the lazy loading of the nodes as they are needed during iteration leading to a huge load time for big groups even when only a slice is demanded.
Codecov Report
@@ Coverage Diff @@
## develop #4094 +/- ##
===========================================
- Coverage 78.80% 78.79% -0.00%
===========================================
Files 463 463
Lines 34414 34413 -1
===========================================
- Hits 27115 27114 -1
Misses 7299 7299
Continue to review full report at Codecov.
|
So, I don't understand how this changes so much in the performance. It was after all only passing the iterator to an intermediate variable, right? Not the whole list. What am I missing? |
I did some profiling to fully understand but it is quite complex and I also couldn't completely understand it. Calling the |
Additional database...what? Did you miss a word there or I'm not understanding something. Also, re-testing:
What do you mean "for a regression"? In any case, is there a way to set up maybe a separate suite of automatic tests that runs on develop on a nightly basis (so say, from 2-6 am) instead of with every PR/commit? Maybe we could build some performance tests that way (so, the test itself doesn't seem to be the slow part but building the DB, correct? so this would have to be designed so that for this suite the DB is built only once and is used in all performance tests). Anyways, yes, this is a two:one line change, it doesn't get any more specific than this, change aproved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ascii art allowed?
_
_| |
_| | |
| | | |
| | | | __
| | | |/ \
| /\ \
| / \/
| \ /\
| \/ /
\ /
| /
| |
|
Fixes #4090
The iterator was calling
dbnodes.__iter__()
which was violtingthe lazy loading of the nodes as they are needed during iteration
leading to a huge load time for big groups even when only a slice
is demanded. For reference, on a group with roughly half a million
nodes, getting a single one through
group.nodes[0]
went fromtaking a minute to a few milliseconds.
P.S.: not sure how to test this. Could think of a dedicated performance test on Jenkins, but for a regression I am not sure it is worth effort