Skip to content

Conversation

@Hakdag97
Copy link
Collaborator

Description

Added an implementation of the local outlier factor (lof) used for outlier classification. The bottleneck to reduce memory consumption lies in the pairwise distance matrix. Memory consumption can be significantly reduced by taking only the n smallest elements in the distance matrix into account (only these are needed to compute the lof). Thus, a new distance matrix, called cdist_small, was established that combines the functionality of cdist and topk, but has the advantage that the smallest n distances are dynamically choosen during computation without evaluating the whole distance matrix cdist at once.

Issue/s resolved: #1758

Changes proposed:

  • New implementation of lof
  • New implementation of distance matrix cdist_small with efficient memory consumption

Type of change

  • New feature

Does this change modify the behaviour of other functions? If so, which?

no

ClaudiaComito and others added 30 commits December 12, 2022 12:09
@github-actions
Copy link
Contributor

Thank you for the PR!

@codecov
Copy link

codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 81.67331% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.40%. Comparing base (7ced436) to head (e08dfaf).

Files with missing lines Patch % Lines
heat/classification/localoutlierfactor.py 59.00% 41 Missing ⚠️
heat/spatial/distance.py 94.18% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1825      +/-   ##
==========================================
- Coverage   91.73%   90.40%   -1.33%     
==========================================
  Files          89       90       +1     
  Lines       13942    14848     +906     
==========================================
+ Hits        12789    13423     +634     
- Misses       1153     1425     +272     
Flag Coverage Δ
unit 90.40% <81.67%> (-1.33%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Contributor

Thank you for the PR!

@github-actions
Copy link
Contributor

Thank you for the PR!

@github-actions
Copy link
Contributor

Thank you for the PR!

mrfh92
mrfh92 previously approved these changes Dec 19, 2025
Copy link
Collaborator

@mrfh92 mrfh92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments so far are mostly on the docs
I now approve to let the CI matrix run

Define if the distances on each process are calculated iteratively. For example, if ``chunks=2``, the
each processes will first compute one half of the distance matrix and then the second half.
Returns
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, we only have Parameters, Attributes, Notes and References, but no Raises or Returns section

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I have deleted them

idx : DNDarray
The indices used for advanced indexing.
Returns
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, we dont use Returns as section
(also at the other functions)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adapted this in all functions

self.assertTrue(condition)

# test with memory-efficient implementation
lof = LocalOutlierFactor(n_neighbors=n_neighbors, fully_distributed=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could move the memory-efficient lof to a different test such that, in case it fails, its clear which configuration failed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@github-project-automation github-project-automation bot moved this from In Progress to Merge queue in Roadmap Dec 19, 2025
Copy link
Collaborator

@mrfh92 mrfh92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • lines 267-282 in _chunk_wise_topk (i.e. the whole "else-block" for chunks!=1) seem not to get tested.
  • lines 266-277 in _map_idx_to_proc are not covered.

If these parts of the code are somehow core ideas of the changes (are they?), they should be covered to ensure that they work.

@github-actions
Copy link
Contributor

Thank you for the PR!

@github-actions
Copy link
Contributor

Thank you for the PR!

@github-actions
Copy link
Contributor

github-actions bot commented Jan 5, 2026

Thank you for the PR!

@github-actions
Copy link
Contributor

github-actions bot commented Jan 5, 2026

Thank you for the PR!

@github-actions
Copy link
Contributor

github-actions bot commented Jan 5, 2026

Thank you for the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Merge queue

Development

Successfully merging this pull request may close these issues.

Implementation of local outlier factor

5 participants