PERF-#4445: Stop recomputing both indices for axis-wide applies.#4460
PERF-#4445: Stop recomputing both indices for axis-wide applies.#4460mvashishtha wants to merge 2 commits intomodin-project:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4460 +/- ##
==========================================
- Coverage 86.82% 86.19% -0.63%
==========================================
Files 222 222
Lines 18041 18100 +59
==========================================
- Hits 15664 15602 -62
- Misses 2377 2498 +121
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
There was a problem hiding this comment.
What if we would pass new row lengths and new column widths from the QC?
There was a problem hiding this comment.
The QC probably shouldn't know anything about this (partition lengths) though. However, what do you think on this?
There was a problem hiding this comment.
func_may_change_complementary_index_size looks a bit clumsy.
…applies. Signed-off-by: mvashishtha <mahesh@ponder.io>
Signed-off-by: mvashishtha <mahesh@ponder.io>
68c4302 to
a1b861d
Compare
devin-petersohn
left a comment
There was a problem hiding this comment.
@mvashishtha I like the idea of splitting the functionality like this, but maybe we need a new operator instead of handling it like this?
The difference between the new operator and the current |
|
If we do not want to pass lengths from the QC to the core dataframe, I would suggest renaming |
|
@mvashishtha For the algebra, typically there are two things that qualify an operator:
We typically want each operator to have certain guarantees without requiring too much information from upper layers. This is why I believe it should be a different operator, since both metadata and data can change at the same time, which isn't something the current operators explicitly support. |
|
@devin-petersohn The new param I introduced to modin/modin/core/dataframe/pandas/dataframe/dataframe.py Lines 2101 to 2102 in cca9468 |
|
@mvashishtha for now, let's keep this fix local to the axis-wide apply. we should revisit after. |
Signed-off-by: mvashishtha mahesh@ponder.io
What do these changes do?
Stop recomputing both indices for axis-wide applies. These changes make the slow computation in the snippet from #4445 (comment) asynchronous.
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.pyblack --check modin/ asv_bench/benchmarks scripts/doc_checker.pygit commit -sdocs/development/architecture.rstis up-to-date