Make the imputation iterative #8

nicolassidoux · 2025-02-05T18:00:28Z

When given a dataset to impute, the imputer first checks whether a big enough index can be built. If not, it temporarily excludes enough features to reach this threshold and tries again. If an index still can't be built, it uses fallback values as defined by the user's strategy.
What counts as a "big enough index" is decided by a threshold that can be passed when constructing the imputer.
The imputation is then performed again for any feature that was left behind in the previous step, which makes the process iterative.
If an index can't be built at all (extremely sparse data), the fallback strategy simply imputes the data with the mean or median (as implemented by @Zethson).
The tests were improved to include some common tests used in ehrapy, and @Zethson added many tests for edge cases.

Signed-off-by: Lukas Heumos <lukas.heumos@posteo.net>
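The loop described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the fknni implementation or API: the names `iterative_knn_impute`, `_fallback_fill`, `min_data_ratio` and `strategy` are assumptions, a brute-force NumPy nearest-neighbour search on each row's observed coordinates stands in for the FAISS index, "big enough" is assumed to mean at least `max(n_neighbors, ceil(min_data_ratio * n_rows))` complete donor rows, and features are excluded greedily, sparsest first.

```python
import numpy as np


def _fallback_fill(X, strategy="mean"):
    """Hypothetical helper: fill remaining NaNs column-wise with the mean or
    median of the observed values. All-NaN columns are left untouched here."""
    stat = np.nanmedian if strategy == "median" else np.nanmean
    for j in range(X.shape[1]):
        col = X[:, j]  # basic slice -> view, so writes go into X
        missing = np.isnan(col)
        if missing.any() and not missing.all():
            col[missing] = stat(col)


def iterative_knn_impute(X, n_neighbors=2, min_data_ratio=0.5, strategy="mean"):
    """Hypothetical sketch of iterative kNN imputation (not fknni's API).

    A brute-force NumPy search over observed coordinates stands in for FAISS.
    """
    X = np.array(X, dtype=float)  # np.array copies: the caller's data is not modified
    n_rows, n_cols = X.shape
    # Assumed meaning of a "big enough index": this many complete donor rows.
    min_rows = max(n_neighbors, int(np.ceil(min_data_ratio * n_rows)))

    while np.isnan(X).any():
        missing_before = int(np.isnan(X).sum())
        # Rank features from most to least complete.
        order = np.argsort(np.isnan(X).sum(axis=0), kind="stable")
        # Keep as many features as possible while the index stays big enough;
        # the sparsest features are temporarily excluded for a later round.
        feats = None
        for k in range(n_cols, 0, -1):
            donors = ~np.isnan(X[:, order[:k]]).any(axis=1)
            if donors.sum() >= min_rows:
                feats = order[:k]
                break
        if feats is None or not np.isnan(X[:, feats]).any():
            break  # no usable index covers a feature that still has gaps

        sub = X[:, feats]  # fancy indexing -> copy of the selected features
        donor_vals = sub[donors]
        for r in np.flatnonzero(np.isnan(sub).any(axis=1)):
            obs = ~np.isnan(sub[r])
            if not obs.any():
                continue  # nothing to query with in this round
            # Euclidean distance on the coordinates this row actually has.
            dist = np.sqrt(((donor_vals[:, obs] - sub[r, obs]) ** 2).sum(axis=1))
            nearest = np.argsort(dist, kind="stable")[:n_neighbors]
            sub[r, ~obs] = donor_vals[nearest][:, ~obs].mean(axis=0)
        X[:, feats] = sub

        if int(np.isnan(X).sum()) == missing_before:
            break  # no progress: stop iterating
    # Whatever the index-based rounds could not fill gets the fallback strategy.
    _fallback_fill(X, strategy)
    return X
```

In this sketch, the per-round progress check guarantees termination: each round either fills at least one value or hands everything left over to the mean/median fallback. Features excluded in one round become imputable in the next, because the features filled in the meantime no longer shrink the set of complete donor rows.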

Zethson

Generally great! Let's see what your benchmarks give us.

It's midnight and I'm too tired to verify the correctness of your implementation but I can have another look after the benchmarks.

src/fknni/faiss/faiss.py

codecov · 2025-02-06T11:21:18Z

Codecov Report

Attention: Patch coverage is 91.01124% with 8 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@8fb7fe6).

Files with missing lines	Patch %	Lines
src/fknni/faiss/faiss.py	91.01%	8 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main       #8   +/-   ##
=======================================
  Coverage        ?   92.45%           
=======================================
  Files           ?        3           
  Lines           ?      106           
  Branches        ?        0           
=======================================
  Hits            ?       98           
  Misses          ?        8           
  Partials        ?        0

Files with missing lines	Coverage Δ
src/fknni/faiss/faiss.py	`92.07% <91.01%> (ø)`
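As a quick sanity check on the report's figures: the project totals satisfy Hits + Misses = Lines, and a patch coverage of 91.01124% with 8 missing lines implies 89 changed lines, 81 of them covered. The 89 and 81 are derived here from the stated percentage, not reported by Codecov itself.

```python
# Figures copied from the Codecov report above.
hits, misses, lines = 98, 8, 106
patch_missing = 8
patch_ratio = 0.9101124  # "Patch coverage is 91.01124%"

# Project coverage is hits / lines (Hits + Misses = Lines in this report).
project_coverage = hits / lines
# Number of changed lines implied by the patch ratio: missing / (1 - ratio).
patch_lines = round(patch_missing / (1 - patch_ratio))
patch_hits = patch_lines - patch_missing

print(f"project: {hits}/{lines} = {project_coverage:.2%}")
print(f"patch:   {patch_hits}/{patch_lines} = {patch_hits / patch_lines:.5%}")
```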

Zethson

Just a few more minor things and I'd love to merge this. Thank you!

src/fknni/faiss/faiss.py

tests/test_faiss_imputation.py

src/fknni/faiss/faiss.py

Zethson and others added 10 commits January 17, 2025 21:17

Add more tests

63b63f4

Signed-off-by: Lukas Heumos <lukas.heumos@posteo.net>

Add more tests

e1ef7b4

Signed-off-by: Lukas Heumos <lukas.heumos@posteo.net>

Update CI

d077c4e

Signed-off-by: Lukas Heumos <lukas.heumos@posteo.net>

Fix extremely sparse case

c5f06bf

Signed-off-by: Lukas Heumos <lukas.heumos@posteo.net>

Refactor

d474513

Signed-off-by: Lukas Heumos <lukas.heumos@posteo.net>

Refactor

f2fdd3b

Signed-off-by: Lukas Heumos <lukas.heumos@posteo.net>

Notebook update

ce28448

Signed-off-by: Lukas Heumos <lukas.heumos@posteo.net>

CP

cdcfd49

CP

95706ba

Ready for preliminary tests

048deb1

nicolassidoux linked an issue Feb 5, 2025 that may be closed by this pull request

Imputation sometimes returns an incompletely imputed dataset #7

Closed

nicolassidoux requested a review from Zethson February 5, 2025 18:01

Assert only for no valid neighbors found by FAISS

60cd7cc

Zethson reviewed Feb 5, 2025

nicolas.sidoux and others added 4 commits February 6, 2025 10:58

After @Zethson review

75909d8

Updated tests

9e9efb0

Impute a copy of X instead of in place

4798803

Added missing dependency

1bccf4e

nicolas.sidoux added 2 commits February 6, 2025 12:29

Removed override as it's unsupported by python 10

680beef

Removed override as it's unsupported by python 10

cd25540

Zethson reviewed Feb 19, 2025

Zethson marked this pull request as ready for review February 19, 2025 19:51

After @Zethson review

df817bc

Zethson reviewed Feb 19, 2025

src/fknni/faiss/faiss.py

Zethson mentioned this pull request Feb 19, 2025

Fix Faiss edge cases #6

Closed

nicolassidoux changed the title ~~Fix faiss edge cases (alternative)~~ Fix faiss edge cases, Switch to iterative imputation Feb 19, 2025

After @Zethson review

3068c88

nicolassidoux changed the title ~~Fix faiss edge cases, Switch to iterative imputation~~ Made the imputation iterative Feb 19, 2025

nicolassidoux changed the title ~~Made the imputation iterative~~ Make the imputation iterative Feb 20, 2025

nicolassidoux mentioned this pull request Feb 20, 2025

Create a notebook for recent improvements #11

Open

Zethson merged commit 6879667 into main Feb 20, 2025
4 checks passed

nicolassidoux deleted the fix/faiss_edge_cases_alternative branch February 20, 2025 09:54

2 participants
