Added Statistical Hierarchical Clusterer #18
Hi Dalibor, thank you for all the work. Here are a few comments/thoughts:
Thanks! -Michael |
Gr8 gr8. Anyways, we are not in a rush with this. I noticed that the SHC C++ code somehow doesn't compile nicely with mingw, yet it compiles perfectly under Unix/Linux OSes, which leaves me puzzled. I mean, mingw should be the Linux compilers and libraries ported to Windows... I need to see what happens there. You guessed right, I work on Unix all the time, so I don't run into these issues. I'll throw ggplot2 out, since it is used only to nicely plot the SHC clustering results. This is probably not needed.
About the DSC registry: I understand the need for a central point where users can query for clusterers based on their needs. For that reason I think that classifying only into micro and macro clusterer categories is not sufficient. As a business user I want to see which clusterers support my needs and what capabilities they have. I think the DSC registry should be more extensive and give users some way to select clusterers based on many attributes, such as:
SHC is equally a clusterer and an outlier detector. That is the reason for "DSC_Outlier". But this is an interesting debate. For example, COD and MCOD are ONLY outlier detectors (with some rudimentary clustering capabilities), while SHC is all-in-one. Maybe "DSCO_SHC" and "DS_Outlier" (the abstract class)? We are definitely adding some new capabilities to the stream package, and I recognize that this should be introduced carefully. There will be no turning back later on. |
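To make the registry idea above concrete, here is a minimal, language-neutral sketch in Python of selecting clusterers by capability instead of only by micro/macro category. It is not the stream package's R API: the names `RegistryEntry`, `REGISTRY`, `find_clusterers` and the capability labels are hypothetical, and the entries only restate what this thread says about SHC and MCOD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """One registered algorithm and the capabilities it advertises (hypothetical schema)."""
    name: str
    capabilities: frozenset


# Illustrative entries only; capability labels are assumptions made for this sketch.
REGISTRY = [
    # Described in this thread as a clusterer, an outlier detector, and single pass.
    RegistryEntry("SHC", frozenset({"clustering", "outlier_detection", "single_pass"})),
    # Described in this thread as primarily an outlier detector.
    RegistryEntry("MCOD", frozenset({"outlier_detection"})),
    # Purely hypothetical placeholder for a plain clusterer.
    RegistryEntry("ExampleClusterer", frozenset({"clustering"})),
]


def find_clusterers(registry, required):
    """Return the names of entries that offer every capability in `required`."""
    required = frozenset(required)
    return [entry.name for entry in registry if required <= entry.capabilities]


if __name__ == "__main__":
    print(find_clusterers(REGISTRY, {"outlier_detection"}))
    print(find_clusterers(REGISTRY, {"clustering", "outlier_detection"}))
```

A set-based query like this keeps outlier detection as its own capability, separate from the micro/macro distinction, which matches the point made in the discussion.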
OK. Let me know if the following is good for the next release:
Once that is done, then you can adapt your new code (should only be a few lines). Let me know if you think this is a consistent way to deal with this and I will go ahead. -Michael |
A few more questions:
|
So, in stream 4.0 we introduced the abstract DSC_Outlier (which can be renamed as you suggested). It can be found in DSC.R and has the following methods:
Outliers are indeed NOT perceived as micro or macro clusters and are a totally distinct category. We also need that for the outlier accuracy indices, which are added to the stream package in 4.0 as well. DSC_SinglePass |
I have added the Statistical Hierarchical Clusterer, which is also an outlier detector and a single pass clusterer.
You can find it as DSC_SHC in the source code.
Dalibor.