Redefining model parameter `compute`

Right now, setting `compute=True` triggers a sequential computation of the following steps:

1. the preprocessor scaler
2. optional NaN checks
3. SVD
4. scores and components

I’m starting to doubt the usefulness of the `compute` parameter in its current form. When @slevang managed to implement fully lazy evaluation, I thought it made sense to keep this parameter, but now I'm not so sure. My experience has been that it’s almost always faster to let Dask handle everything and find the most efficient way to compute, rather than forcing intermediate steps with `compute=True`. I plan to run some benchmarks when I get a chance to confirm this.

The only real benefit I see for this parameter might be in cross-decomposition models (like CCA, MCA, etc.), where PCA preprocessing is done on individual datasets to make subsequent steps more computationally feasible. Often, this results in a reduced PC space that fits into memory, so continuing with eager computation for the SVD, components, scores, etc., makes sense. Another advantage could be when defining the number of PCs based on explained variance rather than a fixed number—this requires computing individual PCA models upfront.

So, I’m thinking it might be more practical to redefine the `compute` parameter as something like `compute_preprocessor`. This would compute the following in one go, using Dask’s efficiency:

1. the preprocessor scaler
2. optional NaN checks
3. PCA preprocessing

From there, we can continue with eager computation for the SVD, scores, and components.

I’d love to hear your thoughts on this @slevang since you have probably much more experience with Dask’s magic! :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Redefining model parameter `compute` #187

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Redefining model parameter compute #187

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Redefining model parameter `compute` #187