Efficient and fuzzy clustering based on the CLARA algorithm
- Authors: Maximilian Weigert, Alexander Bauer, Jana Gauss
- Contributors: Theresa Kriecherbauer, Asmik Nalmpatian
- Version: 1.0.1
The 'fuzzyclara' package tackles two common issues in applied cluster analysis.
First, it includes routines for fuzzy clustering, which avoid the common hard
clustering assumption that each observation is a clear member of exactly one
cluster. Instead, membership probabilities indicate the extent to which the
characteristics of each observation are shaped by the characteristics of several
'typical' clusters. Second, estimating classical clustering algorithms is often
difficult or outright infeasible on large datasets with thousands of
observations. Subsampling-based algorithms building on the CLARA algorithm are
implemented to make the estimation feasible in such situations.
Building on these two points, the 'fuzzyclara' package offers routines for all
aspects of a cluster analysis, including the use of user-defined distance
functions and diverse visualization techniques.
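The two ideas above can be illustrated with a minimal sketch. It does not use the 'fuzzyclara' API. As a stand-in, it assumes the 'cluster' package (shipped with R) for the CLARA step and derives membership scores with the standard fuzzy c-means formula. The simulated data, the object names and the fuzzifier value m = 2 are assumptions made for illustration only.

```r
library(cluster)  # recommended package shipped with R; provides clara()

set.seed(42)
# Simulated data (illustrative only): two Gaussian groups in two dimensions
n_per_group <- 1000
x <- rbind(
  matrix(rnorm(2 * n_per_group, mean = 0), ncol = 2),
  matrix(rnorm(2 * n_per_group, mean = 4), ncol = 2)
)

# CLARA: run k-medoids on several small subsamples and keep the medoids
# that fit the full dataset best
fit <- clara(x, k = 2, samples = 10, sampsize = 100, rngR = TRUE)

# Euclidean distance from every observation to each medoid (n x k matrix)
dist_to_medoids <- sapply(seq_len(nrow(fit$medoids)), function(j) {
  sqrt(rowSums(sweep(x, 2, fit$medoids[j, ])^2))
})

# Fuzzy c-means style membership scores with fuzzifier m > 1 (assumed m = 2);
# eps guards against division by zero for the medoids themselves
m <- 2
eps <- 1e-12
inv <- (dist_to_medoids + eps)^(-2 / (m - 1))
membership <- inv / rowSums(inv)

# Each row sums to 1; values near 0.5 mark observations between clusters
head(round(membership, 3))
```

Observations close to a medoid receive a membership score near 1 for that cluster, while observations between the two groups receive more balanced scores. Only 100 observations per subsample enter the medoid search, which is what keeps the approach feasible for large datasets.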
For an overview of the package's functionality, see the JOSS publication or the package vignette.
The most current version from GitHub can be installed via
devtools::install_github("bauer-alex/fuzzyclara")
# potential installation problems (specifically on macOS) might be resolved
# by installing some dependency packages first
install.packages(c("vegclust", "ggwordcloud", "ggpubr", "factoextra"))
If you encounter problems with the package, find bugs, or have suggestions for additional functionality, please open a GitHub issue. Alternatively, feel free to contact us directly via email.
Contributions (via pull requests or otherwise) are welcome. Please adhere to the Advanced R style guide when contributing code. Before you open a pull request or share your updates with us, please make sure that all unit tests pass without errors or warning messages. You can run the unit tests by calling
devtools::test()