FSML (Fortran Statistics and Machine Learning) is a scientific toolkit consisting of common statistical and machine learning procedures, including basic statistics (e.g., mean, variance, correlation), common statistical tests (e.g., t-test, Mann–Whitney U), linear parametric methods and models (e.g., multiple OLS regression, discriminant analysis), and non-linear statistical and machine learning procedures (e.g., k-means clustering).
- Common statistics and machine learning techniques (as used in modern research).
- Familiar/intuitive interface (similarities to popular Python or R libs).
- Core procedures are kept pure (to simplify parallelisation and testing), while impure wrappers handle optional arguments and errors for safe conventional use.
- Minimal requirements/dependencies (Fortran 2008 or later, and the Fortran standard library, stdlib).
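The pure-core/impure-wrapper split mentioned above can be sketched roughly as follows. This is only an illustration of the pattern, not FSML's actual code: the module name `demo_stats`, the procedures `core_var` and `demo_var`, and the `ddof` argument are all hypothetical names chosen for this example. Returning NaN on invalid input is likewise an assumption about how a wrapper might signal errors.

```fortran
module demo_stats
  ! Hypothetical illustration of the pattern; NOT FSML's actual API.
  use, intrinsic :: iso_fortran_env, only: dp => real64, error_unit
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: dp, core_var, demo_var

contains

  ! Pure core: no I/O, no side effects, every argument is required.
  pure function core_var(x, ddof) result(v)
    real(dp), intent(in) :: x(:)   ! sample values
    real(dp), intent(in) :: ddof   ! delta degrees of freedom
    real(dp) :: v
    real(dp) :: xbar
    xbar = sum(x) / real(size(x), dp)
    v = sum((x - xbar)**2) / (real(size(x), dp) - ddof)
  end function core_var

  ! Impure wrapper: supplies defaults for optional arguments and
  ! validates input before delegating to the pure core.
  function demo_var(x, ddof) result(v)
    real(dp), intent(in) :: x(:)
    real(dp), intent(in), optional :: ddof
    real(dp) :: v
    real(dp) :: ddof_w
    ddof_w = 1.0_dp                      ! default: sample variance
    if (present(ddof)) ddof_w = ddof
    if (real(size(x), dp) <= ddof_w) then
      ! Assumed error convention for this sketch: warn and return NaN.
      write(error_unit, '(a)') 'demo_var: too few elements'
      v = ieee_value(1.0_dp, ieee_quiet_nan)
      return
    end if
    v = core_var(x, ddof_w)
  end function demo_var

end module demo_stats

program demo
  use demo_stats, only: dp, demo_var
  implicit none
  real(dp) :: x(5)
  x = [1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp, 5.0_dp]
  print *, demo_var(x)               ! sample variance (ddof = 1)
  print *, demo_var(x, ddof=0.0_dp)  ! population variance (ddof = 0)
end program demo
```

Because `core_var` is `pure`, it can be called safely from `do concurrent` constructs or OpenMP regions and tested in isolation, while callers who want defaults and input checks use the wrapper.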
FSML has five thematic modules: basic statistics (STS), hypothesis tests (TST), linear procedures (LIN), non-linear procedures (NLP), and statistical distribution functions (DST).
The FSML Handbook includes a short tutorial, detailed API documentation, information for contributors, and licence (MIT) details. The documentation pages were generated by FORD.
The aim is to create an easy-to-use library for modern Fortran applications that covers many statistics and machine learning procedures that are commonly used in research.
FSML started as an effort to rewrite, restructure, clean up, and enhance old Fortran code I have written over the past 15 years, and to bundle and publish it as a well-organised and well-documented library.
The published research below uses some of the code that was reworked for this project:
- Mutz and Ehlers (2019) (k-means and hierarchical clustering, and discriminant analysis).
- Mutz et al. (2015) (multiple regression in a cross-validation and bootstrap setting, principal component analysis, and Bayesian classifier).
FSML currently covers procedures for basic statistics (STS), statistical distributions (DST), statistical tests (TST), procedures that rely heavily on linear algebra (LIN), and non-linear algorithmic procedures (NLP). See the full list here. Planned additions include machine learning framework extensions (e.g., cross-validation) and further procedures for the NLP module.
FSML is offered as an FPM package with examples and tests.
