Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase the efficiency of the SqlaGroup.nodes iterator #4094

Merged

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented May 20, 2020

Fixes #4090

The iterator was calling dbnodes.__iter__() which was violting
the lazy loading of the nodes as they are needed during iteration
leading to a huge load time for big groups even when only a slice
is demanded. For reference, on a group with roughly half a million
nodes, getting a single one through group.nodes[0] went from
taking a minute to a few milliseconds.

P.S.: not sure how to test this. Could think of a dedicated performance test on Jenkins, but for a regression I am not sure it is worth effort

The iterator was calling `dbnodes.__iter__()` which was violting
the lazy loading of the nodes as they are needed during iteration
leading to a huge load time for big groups even when only a slice
is demanded.
@codecov
Copy link

codecov bot commented May 20, 2020

Codecov Report

Merging #4094 into develop will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4094      +/-   ##
===========================================
- Coverage    78.80%   78.79%   -0.00%     
===========================================
  Files          463      463              
  Lines        34414    34413       -1     
===========================================
- Hits         27115    27114       -1     
  Misses        7299     7299              
Flag Coverage Δ
#django 70.71% <0.00%> (+0.01%) ⬆️
#sqlalchemy 71.59% <100.00%> (-<0.01%) ⬇️
Impacted Files Coverage Δ
aiida/orm/implementation/sqlalchemy/groups.py 86.67% <100.00%> (-0.07%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f57a0b9...7470541. Read the comment docs.

@ramirezfranciscof
Copy link
Member

So, I don't understand how this changes so much in the performance. It was after all only passing the iterator to an intermediate variable, right? Not the whole list. What am I missing?

@sphuber
Copy link
Contributor Author

sphuber commented May 20, 2020

I did some profiling to fully understand but it is quite complex and I also couldn't completely understand it. Calling the __iter__() on the dbnodes attribute, which is a sqlalchemy.orm.dynamic.AppenderQuery, was causing additional database each time _genfunction was being called. In the end it does not matter much, because dbnodes already behaves like an iterator and so there is no need to create another wrapping generator.

@ramirezfranciscof
Copy link
Member

was causing additional database each time

Additional database...what? Did you miss a word there or I'm not understanding something.

Also, re-testing:

P.S.: not sure how to test this. Could think of a dedicated performance test on Jenkins, but for a regression I am not sure it is worth effort

What do you mean "for a regression"? In any case, is there a way to set up maybe a separate suite of automatic tests that runs on develop on a nightly basis (so say, from 2-6 am) instead of with every PR/commit? Maybe we could build some performance tests that way (so, the test itself doesn't seem to be the slow part but building the DB, correct? so this would have to be designed so that for this suite the DB is built only once and is used in all performance tests).

Anyways, yes, this is a two:one line change, it doesn't get any more specific than this, change aproved.

Copy link
Member

@ramirezfranciscof ramirezfranciscof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ascii art allowed?

     _
   _| |
 _| | |
| | | |
| | | | __
| | | |/  \
|       /\ \
|      /  \/
|      \  /\
|       \/ /
 \        /
  |     /
  |    |

@sphuber
Copy link
Contributor Author

sphuber commented May 20, 2020

Is ascii art allowed?

 _______  __    _  _______  _______  __   __  ______    _______  _______  _______  ______     _______  __   __  _______  __    _ 
|       ||  |  | ||       ||       ||  | |  ||    _ |  |   _   ||       ||       ||      |   |       ||  | |  ||       ||  |  | |
|    ___||   |_| ||       ||   _   ||  | |  ||   | ||  |  |_|  ||    ___||    ___||  _    |  |    ___||  |_|  ||    ___||   |_| |
|   |___ |       ||       ||  | |  ||  |_|  ||   |_||_ |       ||   | __ |   |___ | | |   |  |   |___ |       ||   |___ |       |
|    ___||  _    ||      _||  |_|  ||       ||    __  ||       ||   ||  ||    ___|| |_|   |  |    ___||       ||    ___||  _    |
|   |___ | | |   ||     |_ |       ||       ||   |  | ||   _   ||   |_| ||   |___ |       |  |   |___  |     | |   |___ | | |   |
|_______||_|  |__||_______||_______||_______||___|  |_||__| |__||_______||_______||______|   |_______|  |___|  |_______||_|  |__|

@sphuber sphuber merged commit 0f23507 into aiidateam:develop May 20, 2020
@sphuber sphuber deleted the fix/4090/sqla-group-nodes-iterator branch May 20, 2020 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Iterator of the SqlaGroup implementation incorrectly preloads all nodes when slicing
2 participants