Increase the efficiency of the `SqlaGroup.nodes` iterator #4094

sphuber · 2020-05-20T13:45:10Z

The iterator was calling dbnodes.__iter__() which was violting
the lazy loading of the nodes as they are needed during iteration
leading to a huge load time for big groups even when only a slice
is demanded. For reference, on a group with roughly half a million
nodes, getting a single one through group.nodes[0] went from
taking a minute to a few milliseconds.

P.S.: not sure how to test this. Could think of a dedicated performance test on Jenkins, but for a regression I am not sure it is worth effort

The iterator was calling `dbnodes.__iter__()` which was violting the lazy loading of the nodes as they are needed during iteration leading to a huge load time for big groups even when only a slice is demanded.

codecov · 2020-05-20T14:03:48Z

Codecov Report

Merging #4094 into develop will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #4094      +/-   ##
===========================================
- Coverage    78.80%   78.79%   -0.00%     
===========================================
  Files          463      463              
  Lines        34414    34413       -1     
===========================================
- Hits         27115    27114       -1     
  Misses        7299     7299

Flag	Coverage Δ
#django	`70.71% <0.00%> (+0.01%)`	⬆️
#sqlalchemy	`71.59% <100.00%> (-<0.01%)`	⬇️

Impacted Files	Coverage Δ
aiida/orm/implementation/sqlalchemy/groups.py	`86.67% <100.00%> (-0.07%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f57a0b9...7470541. Read the comment docs.

ramirezfranciscof · 2020-05-20T14:51:03Z

So, I don't understand how this changes so much in the performance. It was after all only passing the iterator to an intermediate variable, right? Not the whole list. What am I missing?

sphuber · 2020-05-20T16:50:00Z

I did some profiling to fully understand but it is quite complex and I also couldn't completely understand it. Calling the __iter__() on the dbnodes attribute, which is a sqlalchemy.orm.dynamic.AppenderQuery, was causing additional database each time _genfunction was being called. In the end it does not matter much, because dbnodes already behaves like an iterator and so there is no need to create another wrapping generator.

ramirezfranciscof · 2020-05-20T16:59:18Z

was causing additional database each time

Additional database...what? Did you miss a word there or I'm not understanding something.

Also, re-testing:

P.S.: not sure how to test this. Could think of a dedicated performance test on Jenkins, but for a regression I am not sure it is worth effort

What do you mean "for a regression"? In any case, is there a way to set up maybe a separate suite of automatic tests that runs on develop on a nightly basis (so say, from 2-6 am) instead of with every PR/commit? Maybe we could build some performance tests that way (so, the test itself doesn't seem to be the slow part but building the DB, correct? so this would have to be designed so that for this suite the DB is built only once and is used in all performance tests).

Anyways, yes, this is a two:one line change, it doesn't get any more specific than this, change aproved.

ramirezfranciscof

Is ascii art allowed?

     _
   _| |
 _| | |
| | | |
| | | | __
| | | |/  \
|       /\ \
|      /  \/
|      \  /\
|       \/ /
 \        /
  |     /
  |    |

sphuber · 2020-05-20T18:45:56Z

Is ascii art allowed?

 _______  __    _  _______  _______  __   __  ______    _______  _______  _______  ______     _______  __   __  _______  __    _ 
|       ||  |  | ||       ||       ||  | |  ||    _ |  |   _   ||       ||       ||      |   |       ||  | |  ||       ||  |  | |
|    ___||   |_| ||       ||   _   ||  | |  ||   | ||  |  |_|  ||    ___||    ___||  _    |  |    ___||  |_|  ||    ___||   |_| |
|   |___ |       ||       ||  | |  ||  |_|  ||   |_||_ |       ||   | __ |   |___ | | |   |  |   |___ |       ||   |___ |       |
|    ___||  _    ||      _||  |_|  ||       ||    __  ||       ||   ||  ||    ___|| |_|   |  |    ___||       ||    ___||  _    |
|   |___ | | |   ||     |_ |       ||       ||   |  | ||   _   ||   |_| ||   |___ |       |  |   |___  |     | |   |___ | | |   |
|_______||_|  |__||_______||_______||_______||___|  |_||__| |__||_______||_______||______|   |_______|  |___|  |_______||_|  |__|

Increase the efficiency of the SqlaGroup.nodes iterator

7470541

The iterator was calling `dbnodes.__iter__()` which was violting the lazy loading of the nodes as they are needed during iteration leading to a huge load time for big groups even when only a slice is demanded.

sphuber requested review from giovannipizzi and ramirezfranciscof May 20, 2020 14:03

ramirezfranciscof approved these changes May 20, 2020

View reviewed changes

sphuber merged commit 0f23507 into aiidateam:develop May 20, 2020

sphuber deleted the fix/4090/sqla-group-nodes-iterator branch May 20, 2020 18:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Increase the efficiency of the `SqlaGroup.nodes` iterator #4094

Increase the efficiency of the `SqlaGroup.nodes` iterator #4094

sphuber commented May 20, 2020 •

edited

Loading

codecov bot commented May 20, 2020

ramirezfranciscof commented May 20, 2020

sphuber commented May 20, 2020

ramirezfranciscof commented May 20, 2020

ramirezfranciscof left a comment

sphuber commented May 20, 2020

Increase the efficiency of the SqlaGroup.nodes iterator #4094

Increase the efficiency of the SqlaGroup.nodes iterator #4094

Conversation

sphuber commented May 20, 2020 • edited Loading

codecov bot commented May 20, 2020

Codecov Report

ramirezfranciscof commented May 20, 2020

sphuber commented May 20, 2020

ramirezfranciscof commented May 20, 2020

ramirezfranciscof left a comment

Choose a reason for hiding this comment

sphuber commented May 20, 2020

Increase the efficiency of the `SqlaGroup.nodes` iterator #4094

Increase the efficiency of the `SqlaGroup.nodes` iterator #4094

sphuber commented May 20, 2020 •

edited

Loading