Collaborator

@nicolassidoux nicolassidoux commented Feb 5, 2025

  • When given a dataset to impute, the imputer checks whether a big enough index can be built. If not, it temporarily excludes enough features to reach this threshold and tries again. If an index still can't be built, it uses fallback values as defined by the user's strategy.
  • What counts as a "big enough index" is determined by a threshold that can be passed when constructing the imputer.
  • The imputation is then performed again for any feature that was left behind in the previous step, making the process iterative.
  • For extremely sparse data where an index can't be built at all, a fallback strategy (as implemented by @Zethson) imputes the data with a simple statistic (mean or median).
  • The tests were improved to include some of the common tests used in ehrapy; @Zethson also added many tests for edge cases.
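
For readers skimming the description, here is a minimal sketch of the loop the bullets above describe. It is not the code from this PR: the function name `iterative_knn_impute`, the parameters `min_index_rows` and `fallback`, and the rule of dropping the sparsest feature first are all assumptions made for illustration, and a brute-force NumPy search stands in for the faiss index.

```python
import numpy as np


def iterative_knn_impute(X, n_neighbors=2, min_index_rows=3, fallback="mean"):
    """Hypothetical sketch of threshold-based iterative kNN imputation.

    Assumption: brute-force NumPy distances replace the faiss index, and
    min_index_rows plays the role of the "big enough index" threshold.
    """
    X = np.array(X, dtype=float)  # copy so the caller's data stays untouched
    column_stat = np.nanmean if fallback == "mean" else np.nanmedian

    while np.isnan(X).any():
        # Start from all features and drop the sparsest ones (an assumed
        # heuristic) until enough fully observed rows remain for an index.
        features = list(range(X.shape[1]))
        while features:
            complete = ~np.isnan(X[:, features]).any(axis=1)
            if complete.sum() >= min_index_rows:
                break
            sparsest = max(features, key=lambda j: np.isnan(X[:, j]).sum())
            features.remove(sparsest)

        sub = X[:, features]  # fancy indexing returns a copy
        if not features or not np.isnan(sub).any():
            # No index can be built, or the index cannot fill anything new:
            # fall back to a per-column statistic for what is still missing.
            rows, cols = np.where(np.isnan(X))
            X[rows, cols] = column_stat(X, axis=0)[cols]
            break

        index_rows = sub[complete]  # the rows that make up the "index"
        for i in np.where(np.isnan(sub).any(axis=1))[0]:
            observed = ~np.isnan(sub[i])
            # Squared distance on the features this row actually has.
            dists = ((index_rows[:, observed] - sub[i, observed]) ** 2).sum(axis=1)
            nearest = np.argsort(dists)[:n_neighbors]
            sub[i, ~observed] = index_rows[nearest][:, ~observed].mean(axis=0)
        X[:, features] = sub  # features left behind are retried next pass

    return X


if __name__ == "__main__":
    data = [
        [1.0, 2.0, np.nan],
        [1.0, 2.0, 10.0],
        [5.0, 6.0, np.nan],
        [5.0, 6.0, 50.0],
        [np.nan, 6.0, 50.0],
    ]
    print(iterative_knn_impute(data, n_neighbors=1, min_index_rows=3))
```

In this toy input only two rows are fully observed, so the first pass leaves the last column out, fills the first column from the remaining features, and the second pass then has enough complete rows to impute the last column.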

Zethson and others added 10 commits January 17, 2025 21:17
Signed-off-by: Lukas Heumos <lukas.heumos@posteo.net>
@nicolassidoux nicolassidoux linked an issue Feb 5, 2025 that may be closed by this pull request
@nicolassidoux nicolassidoux requested a review from Zethson February 5, 2025 18:01
Owner

@Zethson Zethson left a comment

Generally great! Let's see what your benchmarks give us.

It's midnight and I'm too tired to verify the correctness of your implementation but I can have another look after the benchmarks.

codecov bot commented Feb 6, 2025

Codecov Report

Attention: Patch coverage is 91.01124% with 8 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@8fb7fe6). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/fknni/faiss/faiss.py | 91.01% | 8 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main       #8   +/-   ##
=======================================
  Coverage        ?   92.45%           
=======================================
  Files           ?        3           
  Lines           ?      106           
  Branches        ?        0           
=======================================
  Hits            ?       98           
  Misses          ?        8           
  Partials        ?        0           
| Files with missing lines | Coverage Δ |
|---|---|
| src/fknni/faiss/faiss.py | 92.07% <91.01%> (ø) |

Owner

@Zethson Zethson left a comment

Just a few more minor things and I'd love to merge this. Thank you!

@Zethson Zethson marked this pull request as ready for review February 19, 2025 19:51
@Zethson Zethson mentioned this pull request Feb 19, 2025
@nicolassidoux nicolassidoux changed the title from "Fix faiss edge cases (alternative)" to "Fix faiss edge cases, Switch to iterative imputation" Feb 19, 2025
@nicolassidoux nicolassidoux changed the title from "Fix faiss edge cases, Switch to iterative imputation" to "Made the imputation iterative" Feb 19, 2025
@nicolassidoux nicolassidoux changed the title from "Made the imputation iterative" to "Make the imputation iterative" Feb 20, 2025
@Zethson Zethson merged commit 6879667 into main Feb 20, 2025
4 checks passed
@nicolassidoux nicolassidoux deleted the fix/faiss_edge_cases_alternative branch February 20, 2025 09:54

Development

Successfully merging this pull request may close these issues.

Imputation sometimes returns an incompletely imputed dataset

2 participants