knn classification #310

s-weil · 2023-10-27T09:52:26Z

Thank you for contributing to FSharp.Stats. Please take the time to tell us a bit more about your PR.

Closes 300

Please list the changes introduced in this PR

added first version for the KNN classification

Description

The KNN algorithm is implemented in several versions (see ML\Unsupervised\KNN.fs), namely

for arrays
for sequences
via a classification object (with convenience methods)

Code documentation, examples and unit tests are provided.

NOTE: Further specialisations of the algorithm for improved performance are possible (such as a vector version). Moreover, one could consider parallelized predictions (which may only make sense for a large point cloud).

The project builds without problems on your machine
Added unit tests regarding the added features

cleaned up, added docs and code examples, adjusted tests

codecov-commenter · 2023-10-27T10:11:45Z

Codecov Report

Attention: 32 lines in your changes are missing coverage. Please review.

Comparison is base (4719e96) 47.00% compared to head (8f887a5) 47.16%.
Report is 2 commits behind head on developer.

Files	Patch %	Lines
src/FSharp.Stats/ML/Unsupervised/KNN.fs	28.20%	25 Missing and 3 partials ⚠️
tests/FSharp.Stats.Tests/ML.fs	94.28%	0 Missing and 4 partials ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##           developer     #310      +/-   ##
=============================================
+ Coverage      47.00%   47.16%   +0.15%     
=============================================
  Files            148      149       +1     
  Lines          16458    16567     +109     
  Branches        2219     2230      +11     
=============================================
+ Hits            7736     7813      +77     
- Misses          8052     8077      +25     
- Partials         670      677       +7

smoothdeveloper · 2023-10-27T10:52:26Z

src/FSharp.Stats/ML/Unsupervised/KNN.fs

+            (x        : 'a) 
+            : 'l option =
+
+            if Seq.isEmpty points || Seq.length points <> Seq.length labels || k <= 0 then


Would it be better to pass on to the consumer the need to put both points and labels in a single sequence of tuples?

It would make the interface conform more closely to the array implementation and save two calls to Seq.length.

Many thanks for the valuable feedback!
Indeed, I also believe it would be clearer to have a single sequence of tuples, and the array version uses the type 'LabeledPoint' (which is nothing more than a more descriptive tuple).

The current design was motivated by the existing Python versions, see e.g. here (Step 2), which use separate arguments, and by the need to construct the tuple seq if you start with separate data (of course, the converse argument also holds).

That being said, I am happy to change it as suggested. What do you think?

I agree with @smoothdeveloper that it would be beneficial if both points and labels are passed as a single sequence. This reduces user errors caused by accidentally re-sorting one of the collections. If you could make this small adjustment, I'll be happy to merge.

Thanks!
I replaced the separate params with a tupled sequence.
I also replaced the type LabeledPoint with plain tuples, because it seemed to be just overhead.
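
For readers following the thread, here is a minimal sketch of what a tupled-sequence signature could look like. It is not the code merged in this PR: the name `predictTupled`, the parameter order, and the plain majority vote are assumptions made for illustration. With one sequence of tuples, the length comparison between points and labels is no longer needed.

```fsharp
// Hypothetical sketch, not the PR's implementation.
// `predictTupled` and its parameter order are assumed names/choices.
let predictTupled
    (distance      : 'a -> 'a -> float)
    (k             : int)
    (labeledPoints : seq<'a * 'l>)
    (x             : 'a)
    : 'l option =

    // Only two guards remain; points and labels cannot get out of sync.
    if k <= 0 || Seq.isEmpty labeledPoints then
        None
    else
        labeledPoints
        |> Seq.sortBy (fun (p, _) -> distance p x)  // nearest points first
        |> Seq.truncate k                           // keep at most k neighbours
        |> Seq.countBy snd                          // count votes per label
        |> Seq.maxBy snd                            // label with the most votes
        |> fst
        |> Some

// Usage with one-dimensional points and absolute difference as distance.
let data = [ (0.0, "a"); (0.1, "a"); (0.2, "a"); (5.0, "b"); (5.1, "b") ]
let absDiff (a : float) (b : float) = abs (a - b)
let label = predictTupled absDiff 3 data 0.05   // Some "a"
```

Tie-breaking between equally frequent labels is left unspecified in this sketch.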

smoothdeveloper · 2023-10-27T10:59:56Z

src/FSharp.Stats/ML/Unsupervised/KNN.fs

+
+    open FSharp.Stats.DistanceMetrics
+
+    module Array =


Technically, it would be better to put the RequireQualifiedAccess attribute on the Array and Seq modules as well, since it is otherwise possible to open KNN.Array/Seq.

I'm assuming:

those members aren't destined for raw usage from client code

the attribute on KNN is there to keep ambiguous usages of predict from surfacing in client code

we are encouraging the usage of Classifier type

Again, thanks!

So indeed I added the RequireQualifiedAccess to prevent misuse of the Classifier and predict functions.
Moving the RequireQualifiedAccess to Array and Seq would leave the Classifier 'unprotected' (currently it needs to be constructed with KNN.Classifier). I could move it to Array too, or to some new module, or simply rename it to KnnClassifier.

The intention behind the Classifier versus the Array.predict / Seq.predict versions was to offer a more Python-style version alongside a more functional-style one. Moreover, the Classifier provides convenience methods to construct the data in the required format, and to run single and multiple predictions.
Ultimately I am not sure which version is best in terms of 'user experience'.

Happy for your thoughts and feedback!

I really like the option to have this convenience layer! I think it is okay to have RequireQualifiedAccess added to KNN even if one could open KNN.Array. It keeps Classifier protected and prevents confusion.
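
To make the trade-off in this thread concrete, here is a small sketch of the variant smoothdeveloper suggested, with the attribute on both the outer and the nested module. The module layout is simplified and the `predict` body is a nearest-neighbour (k = 1) stand-in, not the PR's code.

```fsharp
// Hypothetical, simplified layout; the real KNN.fs contains more members.
[<RequireQualifiedAccess>]
module KNN =

    [<RequireQualifiedAccess>]
    module Array =
        // Nearest-neighbour lookup (k = 1) to keep the sketch short.
        let predict
            (distance      : 'a -> 'a -> float)
            (labeledPoints : ('a * 'l) array)
            (x             : 'a)
            : 'l option =
            if Array.isEmpty labeledPoints then
                None
            else
                labeledPoints
                |> Array.minBy (fun (p, _) -> distance p x)
                |> snd
                |> Some

// Callers have to spell out the full path, so `predict` never appears unqualified.
let nearest =
    KNN.Array.predict (fun (a : float) b -> abs (a - b)) [| (0.0, "a"); (5.0, "b") |] 4.0

// With the attributes in place, `open KNN` and `open KNN.Array` are compile errors.
```

If only the outer `KNN` module carries the attribute, as discussed above, `open KNN.Array` remains possible while `KNN.Classifier` still has to be written qualified.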

smoothdeveloper · 2023-10-27T11:00:40Z

src/FSharp.Stats/ML/Unsupervised/KNN.fs

+        member this.fit(lps : LabeledPoint<'a, 'l> array) =
+            this.labeledPoints <- lps
+
+        member this.fit(points : 'a array, labels : 'l array) =


I suggest adding a comment explaining that it will fail if the two arrays aren't of the same length.

Good point, will add it

It would be great to have XML comments here 👍

Added XML comments
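
As an illustration of the kind of documentation requested here, the following sketch shows an XML comment on a two-array `fit` overload that states the length requirement. The `Classifier` type below is a simplified, hypothetical stand-in; the explicit length check, the `LabeledPoints` property, and the error message are assumptions, not the merged code.

```fsharp
/// Hypothetical, simplified stand-in for the PR's Classifier type.
type Classifier<'a, 'l>() =
    let mutable labeledPoints : ('a * 'l) array = Array.empty

    /// <summary>The labeled points stored by the last call to fit.</summary>
    member this.LabeledPoints = labeledPoints

    /// <summary>Stores the training data given as separate arrays.</summary>
    /// <param name="points">The training points.</param>
    /// <param name="labels">The labels; labels.[i] belongs to points.[i].</param>
    /// <exception cref="System.ArgumentException">
    /// Thrown when points and labels differ in length.
    /// </exception>
    member this.fit(points : 'a array, labels : 'l array) =
        // Explicit check (an assumption of this sketch) for a clearer error message;
        // Array.zip would also raise on arrays of different lengths.
        if points.Length <> labels.Length then
            invalidArg "labels" "points and labels must have the same length"
        labeledPoints <- Array.zip points labels

// Usage
let clf = Classifier<float, string>()
clf.fit([| 0.0; 5.0 |], [| "a"; "b" |])
```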

bvenn · 2023-10-30T13:58:57Z

src/FSharp.Stats/ML/Unsupervised/KNN.fs

+
+    open FSharp.Stats.DistanceMetrics
+
+    module Array =


I really like the option to have this convenience layer! I think it is okay to have RequireQualifiedAccess added to KNN even if one could open KNN.Array. It keeps Classifier protected and prevents confusion.

bvenn · 2023-11-06T08:19:29Z

src/FSharp.Stats/ML/Unsupervised/KNN.fs

+            (x        : 'a) 
+            : 'l option =
+
+            if Seq.isEmpty points || Seq.length points <> Seq.length labels || k <= 0 then


I agree with @smoothdeveloper that it would be beneficial if both points and labels are passed as a single sequence. This reduces user errors caused by accidentally re-sorting one of the collections. If you could make this small adjustment, I'll be happy to merge.

bvenn · 2023-11-06T08:20:09Z

src/FSharp.Stats/ML/Unsupervised/KNN.fs

+        member this.fit(lps : LabeledPoint<'a, 'l> array) =
+            this.labeledPoints <- lps
+
+        member this.fit(points : 'a array, labels : 'l array) =


It would be great to have XML comments here 👍

bvenn · 2023-11-15T09:05:51Z

Thanks for this awesome and well-prepared addition @s-weil!

s-weil added 2 commits October 17, 2023 14:15

add KNN classifier and logic to ML Unsupervised, add unit tests

4b6f1cf

documentation, tests, clean up

9919c4d

cleaned up, added docs and code examples, adjusted tests

smoothdeveloper reviewed Oct 27, 2023

bvenn requested changes Nov 8, 2023

PR feedback - xml docs, tupled params, remove labeld point

8f887a5

bvenn approved these changes Nov 15, 2023

bvenn merged commit f9971c5 into fslaborg:developer Nov 15, 2023
2 checks passed

s-weil deleted the feature/knn branch November 15, 2023 09:14
