Scikit-longitudinal (Sklong) is a machine learning library designed to analyse
longitudinal data (currently focused on classification tasks). It offers tools and models for processing, analysing,
and making predictions from longitudinal data, with a user-friendly interface that
integrates with the Scikit-learn ecosystem.
Wait, what is longitudinal data, in layman's terms?
Longitudinal data is a "time-lapse" view of the same subject, entity, or group tracked over successive time periods, similar to checking in on patients to see how they change. For instance, doctors may monitor a patient's blood pressure, weight, and cholesterol every year for a decade to identify health trends or risk factors. Such data is often more useful for predicting future outcomes than a one-time survey because it captures evolution, patterns, and potential cause-and-effect relationships over time.
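To make the idea concrete, here is a minimal sketch in plain Python, independent of Sklong. The patient records, the `systolic_bp_w<N>` column naming, and the `trend` helper are all hypothetical and chosen only for illustration: each record holds the same measurement repeated across three waves, which is what lets us look at change over time.

```python
# Hypothetical records: one row per patient, one column per (feature, wave).
# The "_w<N>" suffix is an illustrative naming convention, not a Sklong requirement.
patients = [
    {"id": "p1", "systolic_bp_w1": 120, "systolic_bp_w2": 128, "systolic_bp_w3": 135},
    {"id": "p2", "systolic_bp_w1": 118, "systolic_bp_w2": 117, "systolic_bp_w3": 119},
]


def trend(record, feature, waves):
    """Return the change in `feature` between the first and last wave."""
    first = record[f"{feature}_w{waves[0]}"]
    last = record[f"{feature}_w{waves[-1]}"]
    return last - first


# Change in systolic blood pressure per patient across the three waves.
changes = {p["id"]: trend(p, "systolic_bp", [1, 2, 3]) for p in patients}
print(changes)  # {'p1': 15, 'p2': 1}
```

A one-time survey would only give a single column per feature, so the upward drift for `p1` would be invisible.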
Not enough?
- For more scientific details, you can refer to our paper published in the Journal of Open Source Software (JOSS).
- For more technical details, visit the official documentation.
Note
Want to use Jupyter Notebook, Marimo, Google Colab, or JupyterLab?
Head to the Getting Started section of the documentation, where we explain it all! 🎉
Additionally, note that Scikit-longitudinal works on Python 3.10 through 3.13.
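If you are unsure whether your interpreter falls in that range, a quick check along these lines can help before installing. This is only a sketch: the `python_is_supported` helper and the two constants are hypothetical names, not part of Sklong, and the bounds simply restate the versions mentioned above.

```python
import sys

# Bounds taken from the supported range stated above (3.10 through 3.13).
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 13)


def python_is_supported(version=None):
    """Return True if the (major, minor) version lies within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if not python_is_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is outside the supported range.")
```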
To install Scikit-longitudinal:
✅ Install the latest version:
pip install Scikit-longitudinal
To install a specific version:
pip install Scikit-longitudinal==0.1.0
Need Ray-backed parallelism? Install the optional extra:
pip install Scikit-longitudinal[parallelisation]
Parallel features automatically prompt you to install this extra when it is missing.
Here's how to analyse longitudinal data with Scikit-longitudinal:
from sklearn.metrics import classification_report

from scikit_longitudinal.data_preparation import LongitudinalDataset
from scikit_longitudinal.estimators.ensemble.lexicographical.lexico_gradient_boosting import LexicoGradientBoostingClassifier

dataset = LongitudinalDataset('./stroke.csv')  # Note this is a fictional dataset. Use yours!
dataset.load_data_target_train_test_split(
    target_column="class_stroke_wave_4",
)

# Pre-set or manually set your temporal dependencies
dataset.setup_features_group(input_data="elsa")

model = LexicoGradientBoostingClassifier(
    features_group=dataset.feature_groups(),
    threshold_gain=0.00015,  # Refer to the API for more hyper-parameters and their meaning
)
model.fit(dataset.X_train, dataset.y_train)
y_pred = model.predict(dataset.X_test)

# Classification report
print(classification_report(dataset.y_test, y_pred))
If you use Sklong in your research, please cite our paper:
@article{Provost2025,
doi = {10.21105/joss.08481},
url = {https://doi.org/10.21105/joss.08481},
year = {2025},
publisher = {The Open Journal},
volume = {10},
number = {112},
pages = {8481},
author = {Provost, Simon and Freitas, Alex A.},
title = {Scikit-Longitudinal: A Machine Learning Library for Longitudinal Classification in Python},
journal = {Journal of Open Source Software}
}
Scikit-longitudinal is licensed under the MIT License.