This library requires PHP 8.3 or newer. Support for older PHP versions, such as the PHP 7.2+ support that [markrogoyski/math-php](https://github.com/markrogoyski/math-php) provides, is not planned.
## Installation
```bash
composer require tomkyle/binning
```
## Usage
The **BinSelection** class provides several methods for determining the optimal number of bins and the optimal bin width for a histogram. You can either call a specific method directly or use the general `suggestBins()` and `suggestBinWidth()` methods with different strategies.
### Determine Bin Width
Use the **suggestBinWidth** method to get the *optimal bin width* for the selected strategy. It returns the bin width, often referred to as 𝒉, as a float value.
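
To illustrate what such a bin width looks like, here is a minimal, self-contained sketch of the Freedman–Diaconis width 𝒉 = 2·IQR / ∛𝒏 in plain PHP. It deliberately does not call this library: `quantile()` and `freedmanDiaconisWidth()` are hypothetical helper names, and the linear-interpolation quartile definition is an assumption that may differ from what the library uses internally.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: quantile of an already sorted list,
 * using linear interpolation between neighbouring values.
 *
 * @param list<float> $sorted
 */
function quantile(array $sorted, float $p): float
{
    $position = (count($sorted) - 1) * $p;
    $lower = (int) floor($position);
    $upper = (int) ceil($position);
    $fraction = $position - $lower;

    return $sorted[$lower] + $fraction * ($sorted[$upper] - $sorted[$lower]);
}

/**
 * Hypothetical helper: Freedman–Diaconis bin width h = 2 * IQR / n^(1/3).
 *
 * @param list<float> $data
 */
function freedmanDiaconisWidth(array $data): float
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('At least two values are required.');
    }

    sort($data); // sorts the local copy only

    $iqr = quantile($data, 0.75) - quantile($data, 0.25);

    return 2.0 * $iqr / ($n ** (1 / 3));
}

$h = freedmanDiaconisWidth([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]);
printf("h = %.2f\n", $h); // IQR is 3.5 and the cube root of 8 is 2, so h is about 3.50
```

Note that a sample whose IQR is zero yields a width of zero, so real code would need a fallback for that case.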

The *Rice Rule* uses the cube root of the sample size and generally yields more bins than *Sturges’ Rule*. The formula follows the one taught by David M. Lane at Rice University. **N.B.** This variant does not seem to be the original *Rice Rule*; *Terrell–Scott’s* (1985) formula appears to be. The two variants can yield different results under certain circumstances. Lane’s variant from the early 2000s is, however, the more commonly cited one.
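
For reference, the two variants are usually written as follows. This is a sketch of the commonly cited forms, not taken from this library's source code; 𝒏 is the sample size, 𝒌 the number of bins, and rounding up is assumed as the convention.

```math
k_{\text{Rice}} = \left\lceil 2\,\sqrt[3]{n} \right\rceil
\qquad
k_{\text{Terrell-Scott}} = \left\lceil \sqrt[3]{2n} \right\rceil
```

For 𝒏 = 1000, for example, the first form gives 2 · 10 = 20 bins, while the second gives ⌈12.6⌉ = 13 bins.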

| Method | Strengths and weaknesses |
|---|---|
|**Freedman–Diaconis**| Uses the IQR to set 𝒉, so it is robust against outliers and adapts to data spread. <br />⚠️ May under-smooth heavily skewed or multi-modal data when the IQR is small, because a small IQR yields very narrow bins. |
|**Sturges’ Rule**| Very simple, works well for roughly normal, moderate-sized datasets. <br />⚠️ Ignores outliers and underestimates bin count for large or skewed samples. |
|**Rice Rule**| Independent of data shape and easy to compute. <br />⚠️ Prone to over- or under-smoothing when the distribution is heavy-tailed or skewed. |
|**Terrell–Scott**| Similar approach to the *Rice Rule* but with asymptotically optimal MISE properties; gives more bins than Sturges and adapts better at large 𝒏. <br />⚠️ Still ignores skewness and outliers. |
|**Square Root Rule**| Simply the square root of the sample size 𝒏, so it requires no distributional estimates. <br />⚠️ May produce too few bins for complex distributions, or too many for very noisy data. |
|**Doane’s Rule**| Extends *Sturges’ Rule* by adding a skewness correction, which improves performance on asymmetric data. <br />⚠️ Requires estimating the third moment (skewness), which can be unstable for small 𝒏. |
|**Scott’s Rule**| Uses the standard deviation to minimize MISE, providing a good balance for unimodal, symmetric data. <br />⚠️ Sensitive to outliers (inflated $\sigma$) and may underperform on skewed distributions. |
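
The rules in the table that depend only on the sample size 𝒏 can be compared with a few lines of plain PHP. The following is a minimal sketch that does not use this library: the function names are hypothetical, and the formulas are the commonly cited textbook forms, assumed here rather than taken from the library's source.

```php
<?php

declare(strict_types=1);

// Hypothetical helpers for illustration only, not part of tomkyle/binning.

/** Sturges' Rule: k = ceil(log2(n)) + 1 */
function sturgesBins(int $n): int
{
    return (int) ceil(log($n, 2)) + 1;
}

/** Rice Rule (Lane's variant): k = ceil(2 * n^(1/3)) */
function riceBins(int $n): int
{
    return (int) ceil(2 * $n ** (1 / 3));
}

/** Terrell–Scott: k = ceil((2n)^(1/3)) */
function terrellScottBins(int $n): int
{
    return (int) ceil((2 * $n) ** (1 / 3));
}

/** Square Root Rule: k = ceil(sqrt(n)) */
function squareRootBins(int $n): int
{
    return (int) ceil(sqrt($n));
}

$n = 200;
printf("Sturges:       %d\n", sturgesBins($n));      // 9
printf("Rice:          %d\n", riceBins($n));         // 12
printf("Terrell-Scott: %d\n", terrellScottBins($n)); // 8
printf("Square root:   %d\n", squareRootBins($n));   // 15
```

Even at a modest sample size of 200, the suggested bin counts range from 8 to 15, which is why the choice of rule matters in practice.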
## Literature
Rubia, J.M.D.L. (2024):
**Rice University Rule to Determine the Number of Bins.**