Merged
26 commits
- d90c284 add barcode check (JelmerBot, Nov 24, 2025)
- b87eba3 show non-zero self distances (JelmerBot, Nov 24, 2025)
- 5f0b782 add trial with other pruning strategies (JelmerBot, Nov 25, 2025)
- 9b51235 update pruning explanation; update barcode trial; (JelmerBot, Nov 26, 2025)
- e8ffc2e update plotting for paper template; (JelmerBot, Dec 12, 2025)
- 2e14a15 add 3.14 builds (JelmerBot, Dec 12, 2025)
- cf814e0 exclude docs from sdist and wheels (JelmerBot, Dec 15, 2025)
- 5f0b927 update density profile figure (JelmerBot, Dec 15, 2025)
- 6d95c27 prep for release (JelmerBot, Dec 16, 2025)
- 14c7235 update figure sizing; (JelmerBot, Dec 16, 2025)
- e1db4c9 prep for public builds (JelmerBot, Dec 16, 2025)
- 4cdae6f update violin plot (JelmerBot, Dec 16, 2025)
- a71c739 update doc dependencies (JelmerBot, Dec 16, 2025)
- d2cad92 skip free-threading builds (JelmerBot, Dec 16, 2025)
- bdb548c fix build matrix (JelmerBot, Dec 16, 2025)
- 64a94ec update macos deployment target (JelmerBot, Dec 16, 2025)
- abdb58c try platform independent test data; (JelmerBot, Dec 16, 2025)
- b1578d2 revert trial; (JelmerBot, Dec 16, 2025)
- 4af39c4 run debug script on macos (JelmerBot, Dec 16, 2025)
- f24bfd7 fix debug script (JelmerBot, Dec 16, 2025)
- 373eb2a try larger min_samples (JelmerBot, Dec 16, 2025)
- b46e813 reduce test sensitivity to MST ordering; fix edge-case in cluster lay… (JelmerBot, Dec 16, 2025)
- 5323af5 debug macos test plots (JelmerBot, Dec 16, 2025)
- 0a71406 continue after failing tests; (JelmerBot, Dec 16, 2025)
- 0666c40 increase tolerances after manually inspecting differences; (JelmerBot, Dec 16, 2025)
- 8e331d3 restore workflow (JelmerBot, Dec 16, 2025)
20 changes: 11 additions & 9 deletions .github/workflows/_build_wheels.yml
@@ -8,16 +8,18 @@ jobs:
permissions:
contents: read
runs-on: ${{
(matrix.platform == 'macosx' && matrix.arch == 'x64' && 'macosx-15-intel') ||
(matrix.platform == 'macosx' && matrix.arch == 'arm64' && 'macosx-latest') ||
(matrix.platform == 'macosx' && matrix.arch == 'arm64' && 'macos-latest') ||
(matrix.platform == 'win' && matrix.arch == 'x64' && 'windows-latest') ||
(matrix.platform == 'win' && matrix.arch == 'arm64' && 'windows-11-arm64') ||
(endsWith(matrix.platform, 'linux') && matrix.arch == 'x64' && 'ubuntu-latest') ||
(endsWith(matrix.platform, 'linux') && matrix.arch == 'arm64' && 'ubuntu-24.04-arm64') }}
(endsWith(matrix.platform, 'linux') && matrix.arch == 'x64' && 'ubuntu-latest')}}
strategy:
matrix:
arch: [x64] # add arm64 and macosx when repo goes public
platform: [manylinux, win] # no scikit-learn wheels for musllinux
include: # no scikit-learn wheels for musllinux
- platform: win # arm64 runners not readily available
arch: x64 # macos-x64 is deprecated by Apple
- platform: manylinux
arch: x64
- platform: macosx
arch: arm64

steps:
- uses: actions/checkout@v5
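GitHub Actions expressions have no ternary operator, so the `runs-on` value is assembled from `cond && 'label'` terms joined with `||`: each term evaluates to its label when the condition holds and to `false` otherwise, and `||` returns the first truthy term. Python's `and`/`or` short-circuit the same way, so the runner selection for the three combinations in the new `include:` list can be sketched as below. `select_runner` is a hypothetical name used only for illustration; the runner labels are copied from the workflow.

```python
def select_runner(platform: str, arch: str):
    """Hypothetical mirror of the workflow's `runs-on` expression.

    Each `cond and label` term yields `label` when `cond` holds and False
    otherwise; `or` then returns the first truthy term, like the
    `&&` / `||` chain in a GitHub Actions expression.
    """
    return (
        (platform == "macosx" and arch == "arm64" and "macos-latest")
        or (platform == "win" and arch == "x64" and "windows-latest")
        or (platform.endswith("linux") and arch == "x64" and "ubuntu-latest")
    )


# The three combinations listed under `include:` in the new build matrix.
for platform, arch in [("win", "x64"), ("manylinux", "x64"), ("macosx", "arm64")]:
    print(platform, arch, "->", select_runner(platform, arch))

# A combination without a matching term falls through to False.
print("win arm64 ->", select_runner("win", "arm64"))
```

The idiom only works because every label is a non-empty, and therefore truthy, string; a term whose value were falsy would silently fall through to the next one.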
@@ -32,14 +34,14 @@ jobs:
run: |
brew install libomp
echo "OpenMP_ROOT=$(brew --prefix libomp)" >> $GITHUB_ENV
echo "MACOSX_DEPLOYMENT_TARGET=${{ matrix.arch == 'arm64' && '14.0' || '13.0' }}" >> $GITHUB_ENV
echo "MACOSX_DEPLOYMENT_TARGET=${{ matrix.arch == 'arm64' && '15.0' || '13.0' }}" >> $GITHUB_ENV

- name: Build wheels
uses: pypa/cibuildwheel@v3.3.0
env:
CIBW_ARCHS: "native"
CIBW_BUILD: "*${{ matrix.platform }}*"
CIBW_SKIP: "*314*" # no scikit-learn wheels for Python 3.14(t) (yet)
CIBW_SKIP: "*t-${{ matrix.platform }}*" # skip free-threading builds
CIBW_BUILD_VERBOSITY : "1" # show build output for debugging
PIP_ONLY_BINARY: ":all:" # avoid compiling dependencies
CIBW_ENVIRONMENT_PASS_LINUX: "PIP_ONLY_BINARY" # also in the manylinux container
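cibuildwheel selects builds by matching `CIBW_BUILD` and `CIBW_SKIP` against build identifiers of the form `<python tag>-<platform tag>`, where free-threaded CPython builds carry a `t` suffix on the Python tag (for example `cp313t-manylinux_x86_64`). The sketch below approximates that shell-style matching with Python's `fnmatch.fnmatchcase` to show which identifiers survive for `matrix.platform == 'manylinux'`. It assumes cibuildwheel's own matcher behaves like `fnmatch` for these simple patterns, and the identifier list is illustrative rather than the exact set cibuildwheel enumerates.

```python
from fnmatch import fnmatchcase

platform = "manylinux"  # stands in for ${{ matrix.platform }}
build_pattern = f"*{platform}*"   # CIBW_BUILD
skip_pattern = f"*t-{platform}*"  # new CIBW_SKIP: free-threaded builds only

# Illustrative build identifiers (assumption: not the full cibuildwheel list).
identifiers = [
    "cp312-manylinux_x86_64",
    "cp313-manylinux_x86_64",
    "cp313t-manylinux_x86_64",
    "cp314-manylinux_x86_64",
    "cp314t-manylinux_x86_64",
    "cp314-musllinux_x86_64",
]

# Keep identifiers selected by CIBW_BUILD and not removed by CIBW_SKIP.
selected = [
    ident
    for ident in identifiers
    if fnmatchcase(ident, build_pattern) and not fnmatchcase(ident, skip_pattern)
]
print(selected)

# The previous skip pattern dropped every Python 3.14 build, free-threaded or not.
print(fnmatchcase("cp314-manylinux_x86_64", "*314*"))
```

Replacing `"*314*"` with the free-threading pattern is what lets the regular 3.14 wheels build, matching the new `Python :: 3.14` classifier in `pyproject.toml`.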
2 changes: 1 addition & 1 deletion .github/workflows/pull_request.yml
@@ -23,7 +23,7 @@ jobs:
steps:
- uses: actions/checkout@v5
- name: Install linters
run: pip install black[jupyter]==25.1
run: pip install black[jupyter]==25.1
- uses: wearerequired/lint-action@v2
with:
black: true
28 changes: 25 additions & 3 deletions README.md
@@ -1,13 +1,15 @@
[![PyPi version](https://badge.fury.io/py/plscan.svg)](https://badge.fury.io/py/plscan)
![Conda version](https://anaconda.org/conda-forge/plscan/badges/version.svg)
[![Repository DOI](https://zenodo.org/badge/xxx.svg)](https://zenodo.org/doi/xxx/zenodo.yyy)

# Persistent Leaf Spatial Clustering for Applications with Noise

This library provides a new clustering algorithm based on HDBSCAN*. The primary
advantages of PLSCAN over the standard ``hdbscan`` library are:

- PLSCAN automatically finds the optimal minimum cluster size.
- PLSCAN can easily use all available cores to speed up computation;
- PLSCAN has much faster implementations of tree condensing and cluster extraction;
- PLSCAN can easily use all available cores to speed up computation.
- PLSCAN has much faster implementations of tree condensing and cluster extraction.
- PLSCAN does not rely on JIT compilation.

To use PLSCAN, you only need to set the ``min_samples`` parameter. This
@@ -130,6 +132,14 @@ Also update the `~/.zshrc` config file with:
export OpenMP_ROOT=$(brew --prefix)/opt/libomp
```

or pass `OpenMP_ROOT` as cmake argument:

```bash
pip install --no-deps --no-build-isolation \
-C cmake.args="-DOpenMP_ROOT=$(brew --prefix)/opt/libomp" \
-ve .
```

### Windows

The default MSVC C++ compiler on windows does not support
@@ -160,7 +170,19 @@ with the optional Clang compiler support enabled.

## Citing

TODO
When using this work, please cite our (upcoming) preprint:

```bibtex
@article{bot2025plscan,
title = {Persistent Multiscale Density-based Clustering},
author = {Dani{\"{e}}l M. Bot and Leland McInnes and Jan Aerts},
year = {2025},
month = {12},
archiveprefix = {arXiv},
eprint = {TODO},
primaryclass = {cs.CL}
}
```

## Licensing

1,515 changes: 1,338 additions & 177 deletions docs/_paper_figures.ipynb

Large diffs are not rendered by default.

416 changes: 416 additions & 0 deletions docs/_trial_barcodes.ipynb

Large diffs are not rendered by default.

529 changes: 529 additions & 0 deletions docs/_trial_persistence_pruning.ipynb

Large diffs are not rendered by default.

45 changes: 18 additions & 27 deletions docs/demo_computational_performance.ipynb

Large diffs are not rendered by default.

247 changes: 89 additions & 158 deletions docs/demo_parameter_sensitivity.ipynb

Large diffs are not rendered by default.

58 changes: 29 additions & 29 deletions docs/demo_selection_strategies.ipynb

Large diffs are not rendered by default.

Binary file modified docs/images/benchmark_time.pdf
Binary file modified docs/images/leaf_tree_explainer_1.pdf
Binary file modified docs/images/leaf_tree_explainer_2.pdf
Binary file modified docs/images/parameter_sensitivity_articles_1442_5.pdf
Binary file modified docs/images/parameter_sensitivity_articles_1442_80.pdf
Binary file modified docs/images/parameter_sensitivity_audioset.pdf
Binary file modified docs/images/parameter_sensitivity_authorship.pdf
Binary file removed docs/images/parameter_sensitivity_boxplot.pdf
Binary file modified docs/images/parameter_sensitivity_cardiotocography.pdf
Binary file modified docs/images/parameter_sensitivity_cell_cycle_237.pdf
Binary file modified docs/images/parameter_sensitivity_cifar_10.pdf
Binary file modified docs/images/parameter_sensitivity_ecoli.pdf
Binary file modified docs/images/parameter_sensitivity_elegans.pdf
Binary file modified docs/images/parameter_sensitivity_fashion_mnist.pdf
Binary file modified docs/images/parameter_sensitivity_header.pdf
Binary file modified docs/images/parameter_sensitivity_iris.pdf
Binary file modified docs/images/parameter_sensitivity_mfeat_factors.pdf
Binary file modified docs/images/parameter_sensitivity_mfeat_karhunen.pdf
Binary file modified docs/images/parameter_sensitivity_mnist.pdf
Binary file modified docs/images/parameter_sensitivity_newsgroups.pdf
Binary file modified docs/images/parameter_sensitivity_scaled.pdf
Binary file modified docs/images/parameter_sensitivity_semeion.pdf
Binary file modified docs/images/parameter_sensitivity_violinplot.pdf
Binary file modified docs/images/parameter_sensitivity_yeast_galactose.pdf
Binary file modified docs/images/plscan_cluster_layers.pdf
Binary file modified docs/images/plscan_condensed_trees.pdf
Binary file modified docs/images/plscan_density_smoothing.pdf
Binary file added docs/images/plscan_density_smoothing_data.pdf
Binary file modified docs/images/plscan_larger_layers.pdf
Binary file modified docs/images/plscan_leaf_tree.pdf
Binary file modified docs/images/plscan_parameter_example.pdf
Binary file modified docs/images/plscan_persistence_trace.pdf
22 changes: 19 additions & 3 deletions docs/index.rst
@@ -36,7 +36,7 @@

local_development

|PyPI version|
|PyPI version| |Conda version| |DOI badge|

Persistent Leaves Spatial Clustering of Applications with Noise
===============================================================
@@ -126,7 +126,20 @@ strengths.
Citing
------

TODO
When using this work, please cite our (upcoming) preprint:

.. code-block:: bibtex

@article{bot2025plscan,
title = {Persistent Multiscale Density-based Clustering},
author = {Dani{\"{e}}l M. Bot and Leland McInnes and Jan Aerts},
year = {2025},
month = {12},
archiveprefix = {arXiv},
eprint = {TODO},
primaryclass = {cs.CL}
}


Licensing
---------
@@ -135,4 +148,7 @@ The ``plscan`` package has a 3-Clause BSD license.

.. |PyPI version| image:: https://badge.fury.io/py/plscan.svg
:target: https://badge.fury.io/py/plscan

.. |Conda version| image:: https://anaconda.org/conda-forge/plscan/badges/version.svg
:target: https://anaconda.org/conda-forge/plscan
.. |DOI badge| image:: https://zenodo.org/badge/xxx.svg
:target: https://zenodo.org/doi/xxx/zenodo.yyy
30 changes: 21 additions & 9 deletions docs/lib/plotting.py
@@ -7,7 +7,10 @@

# LaTeX font sizes on 10pt document:
# https://latex-tutorial.com/changing-font-size/
fontsize = dict(tiny=5, script=7, footnote=8, small=9, normal=10)
# for the pre-print template!
# fontsize = dict(tiny=5, script=7, footnote=8, small=9, normal=10)
# for the journal template!
fontsize = dict(tiny=6, script=8, footnote=9, small=10, normal=10.95)


def configure_matplotlib():
@@ -38,16 +41,24 @@ def configure_matplotlib():
"savefig.format": "png",
"font.family": "serif",
"text.usetex": True,
# For the pre-print template:
# "text.latex.preamble": dedent(
# r"""
# \usepackage[english]{babel}
# \usepackage[T1]{fontenc}
# \usepackage[varqu,varl]{inconsolata}
# \usepackage[
# theoremfont,trueslanted,largesc,p,
# amsthm,smallerops
# ]{newpx}
# \usepackage[scr=rsfso]{mathalpha}
# \usepackage[stretch=10,shrink=10,tracking,spacing,kerning,babel]{microtype}
# """
# ),
# For the journal template:
"text.latex.preamble": dedent(
r"""
\usepackage[english]{babel}
\usepackage[T1]{fontenc}
\usepackage[varqu,varl]{inconsolata}
\usepackage[
theoremfont,trueslanted,largesc,p,
amsthm,smallerops
]{newpx}
\usepackage[scr=rsfso]{mathalpha}
\usepackage[stretch=10,shrink=10,tracking,spacing,kerning,babel]{microtype}
"""
),
@@ -61,7 +72,8 @@ def sized_fig(width=0.5, aspect=0.618, dpi=None):
"""Create a figure with width as fraction of A4 page."""
if dpi is None:
dpi = 150
page_width_inch = 6.9305
# page_width_inch = 6.93050 # For the pre-print template
page_width_inch = 6.00117 # For the journal template
w = width * page_width_inch
h = aspect * w
return plt.figure(figsize=(w, h), dpi=dpi)
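`sized_fig` sets the figure width as a fraction of `page_width_inch` and the height as `aspect` times that width, so switching from the pre-print template (6.93050 in) to the journal template (6.00117 in) shrinks every figure by the same factor while preserving its aspect ratio. Both constants are well below the full A4 width of about 8.27 in, so they presumably describe each template's text width. The sketch below reproduces that arithmetic without matplotlib; the `TEMPLATES` table and `figure_size` helper are hypothetical, and the values are copied from `docs/lib/plotting.py` rather than toggled by commenting lines in and out.

```python
# Hypothetical template table; values copied from docs/lib/plotting.py.
TEMPLATES = {
    "preprint": {
        "page_width_inch": 6.93050,
        "fontsize": dict(tiny=5, script=7, footnote=8, small=9, normal=10),
    },
    "journal": {
        "page_width_inch": 6.00117,
        "fontsize": dict(tiny=6, script=8, footnote=9, small=10, normal=10.95),
    },
}


def figure_size(width=0.5, aspect=0.618, template="journal"):
    """Return (w, h) in inches using the same arithmetic as sized_fig."""
    w = width * TEMPLATES[template]["page_width_inch"]
    h = aspect * w
    return w, h


for name in TEMPLATES:
    w, h = figure_size(0.5, template=name)
    print(f"{name}: {w:.3f} x {h:.3f} in")
```

In `sized_fig` the resulting pair is passed on as `plt.figure(figsize=(w, h), dpi=dpi)`.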
8 changes: 8 additions & 0 deletions docs/local_development.rst
@@ -50,6 +50,14 @@ Also update the ``~/.zshrc`` config file with:

export OpenMP_ROOT=$(brew --prefix)/opt/libomp

or pass `OpenMP_ROOT` as cmake argument:

.. code-block:: bash

pip install --no-deps --no-build-isolation \
-C cmake.args="-DOpenMP_ROOT=$(brew --prefix)/opt/libomp" \
-ve .

Windows
-------

8 changes: 4 additions & 4 deletions docs/using_multiple_components.ipynb
@@ -45,7 +45,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 6,
"id": "c91190a6",
"metadata": {},
"outputs": [
@@ -64,7 +64,7 @@
"from sklearn.neighbors import NearestNeighbors\n",
"\n",
"knn = NearestNeighbors(n_neighbors=50).fit(X).kneighbors(X)\n",
"c = PLSCAN(metric='precomputed').fit(knn)\n",
"c = PLSCAN(metric=\"precomputed\").fit(knn)\n",
"\n",
"plt.scatter(*X.T, c=c.labels_ % 10, s=1, linewidth=0, cmap=\"tab10\")\n",
"plt.axis(\"off\")\n",
@@ -83,7 +83,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 7,
"id": "2e0b7ea7",
"metadata": {},
"outputs": [
@@ -105,7 +105,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 8,
"id": "148fdcbc",
"metadata": {},
"outputs": [
14 changes: 11 additions & 3 deletions pyproject.toml
@@ -20,13 +20,14 @@ dependencies = [
"scikit-learn>=1.6,<2",
]
classifiers = [
"Development Status :: 4 - Beta",
"Development Status :: 5 - Production/Stable",
"Programming Language :: C++",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Programming Language :: Python :: 3.14",
"Operating System :: OS Independent",
"Intended Audience :: Science/Research",
"Intended Audience :: Developers",
@@ -36,7 +37,14 @@ classifiers = [

[project.optional-dependencies]
tests = ["pytest", "networkx", "pandas"]
docs = ["furo", "sphinx", "pandoc", "nbsphinx", "sphinx-copybutton"]
docs = [
"furo",
"sphinx",
"pandoc",
"nbsphinx",
"sphinx-copybutton",
"sphinx_autodoc_typehints",
]

[project.urls]
Documentation = "https://JelmerBot.github.io/plscan"
@@ -48,7 +56,7 @@ minimum-version = "0.4"
build-dir = "build/{wheel_tag}"
wheel.py-api = "cp312"
wheel.exclude = ["**.h", "**.cpp", "**/CMakeLists.txt"]
sdist.exclude = [".*", ".*/", "docs/data/", "docs/lib/", "docs/images/"]
sdist.exclude = [".*", ".*/", "docs/"]
metadata.version.provider = "scikit_build_core.metadata.setuptools_scm"

[tool.setuptools_scm]
5 changes: 4 additions & 1 deletion src/plscan/sklearn.py
@@ -476,8 +476,11 @@ def cluster_layers(

"""
check_is_fitted(self, "_persistence_trace")
# Pad persistence with zero so maxima at the edges can be detected as peaks
x, y = self._persistence_trace
peaks = find_peaks(y, height=height, threshold=threshold, **kwargs)[0]
zero = np.array([0], dtype=y.dtype)
signal = np.hstack((zero, y, zero))
peaks = find_peaks(signal, height=height, threshold=threshold, **kwargs)[0] - 1

if min_size is not None:
peaks = peaks[x[peaks] >= min_size]
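`scipy.signal.find_peaks` only reports a sample as a peak when both direct neighbours are lower, so the first and last entries of the persistence trace can never be returned. The change in `cluster_layers` pads the trace with a zero on each side and shifts the resulting indices back by one. The toy trace below is made up for illustration; the padding and index shift follow the diff.

```python
import numpy as np
from scipy.signal import find_peaks

# Toy persistence trace (illustrative values): the global maximum is at index 0.
y = np.array([3.0, 2.0, 1.0, 2.0, 1.5])

# Without padding, the edge maximum has no left neighbour and is skipped.
unpadded = find_peaks(y)[0]

# Pad with a zero on both sides so edge samples get a lower neighbour,
# then subtract 1 to map indices back onto the unpadded trace.
zero = np.array([0], dtype=y.dtype)
padded = find_peaks(np.hstack((zero, y, zero)))[0] - 1

print(unpadded, padded)
```

Zero is a natural padding value here, presumably because persistence values are non-negative: the pad is never higher than an edge sample, and an edge sample that is itself zero is still not reported as a peak.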
Binary file removed tests/baseline_images/test_plots/condensed_tree.png
Binary file modified tests/baseline_images/test_plots/condensed_tree_args.png
Binary file modified tests/baseline_images/test_plots/condensed_tree_dens.png
Binary file modified tests/baseline_images/test_plots/condensed_tree_dist.png
Binary file modified tests/baseline_images/test_plots/condensed_tree_rank.png
Binary file modified tests/baseline_images/test_plots/leaf_tree.png
Binary file modified tests/baseline_images/test_plots/leaf_tree_args.png
Binary file modified tests/baseline_images/test_plots/persistence_trace.png
Binary file modified tests/baseline_images/test_plots/persistence_trace_args.png