Commit 2ac56d4 (Update README.md), 1 parent 2e1746c

README.md: 1 file changed, 108 additions & 33 deletions
- Terrell-Scott’s Rule (1985)
- Rice University Rule

## Requirements

This library requires PHP 8.3 or newer. Support for older PHP versions, such as [markrogoyski/math-php](https://github.com/markrogoyski/math-php) provides for PHP 7.2+, is not planned.

## Installation

```bash
composer require tomkyle/binning
```

## Usage

The **BinSelection** class provides several methods for determining the optimal number of bins and the optimal bin width for histogram creation. You can either call a specific method directly or use the general `suggestBins()` and `suggestBinWidth()` methods with different strategies.

### Determine Bin Width

Use the **suggestBinWidth** method to get the *optimal bin width* based on the selected method. It returns the bin width, often referred to as 𝒉, as a float value.

---

### Explicit method calls

You can also call the specific methods directly to get the bin width 𝒉 or number of bins 𝒌.

---

#### 1. Pearson’s Square Root Rule (1892)

Simple rule using the square root of the sample size.

$$
k = \left \lceil \sqrt{n} \; \right \rceil
$$

```php
$k = BinSelection::squareRoot($data);
```
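
As a quick sanity check, the rule can be evaluated by hand. The following standalone sketch reproduces the arithmetic in plain PHP (it is not a library call, and the sample size of 150 is hypothetical):

```php
<?php
// Square Root Rule by hand: k = ceil(sqrt(n)).
$n = 150;                       // hypothetical sample size
$k = (int) ceil(sqrt($n));      // sqrt(150) ≈ 12.25, so k = 13
echo $k, PHP_EOL;
```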

---

#### 2. Sturges’s Rule (1926)

Based on the logarithm of the sample size. Good for normal distributions.

$$
k = 1 + \left \lceil \; \log_2(n) \; \right \rceil
$$

```php
$k = BinSelection::sturges($data);
```
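
To see the formula at work, here is the arithmetic in plain PHP (not a library call), with a hypothetical sample size of 1000:

```php
<?php
// Sturges's Rule by hand: k = 1 + ceil(log2(n)).
$n = 1000;                          // hypothetical sample size
$k = 1 + (int) ceil(log($n, 2));    // log2(1000) ≈ 9.97, so k = 1 + 10 = 11
echo $k, PHP_EOL;
```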

---

#### 3. Doane’s Rule (1976)

Improvement of *Sturges*’ rule that accounts for data skewness.

$$
k = 1 + \left\lceil \; \log_2(n) + \log_2\left(1 + \frac{|g_1|}{\sigma_{g_1}}\right) \; \right \rceil
$$

```php
// Using sample-based calculation (default)
$k = BinSelection::doane($data);

// Using population-based calculation
$k = BinSelection::doane($data, population: true);
```
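
The skewness correction can be reproduced in plain PHP. This sketch assumes $g_1$ is the moment skewness $m_3 / m_2^{3/2}$ and $\sigma_{g_1} = \sqrt{6(n-2)/((n+1)(n+3))}$; the library’s estimators (sample vs. population) may differ in detail, and the dataset is made up for illustration:

```php
<?php
// Doane's Rule by hand on a small right-skewed sample.
$data = [1, 1, 2, 2, 3, 10];       // hypothetical dataset
$n    = count($data);
$mean = array_sum($data) / $n;

$m2 = $m3 = 0.0;
foreach ($data as $x) {
    $m2 += ($x - $mean) ** 2;
    $m3 += ($x - $mean) ** 3;
}
$m2 /= $n;
$m3 /= $n;

$g1      = $m3 / $m2 ** 1.5;                            // moment skewness
$sigmaG1 = sqrt(6 * ($n - 2) / (($n + 1) * ($n + 3)));  // std. error of g1

$k = 1 + (int) ceil(log($n, 2) + log(1 + abs($g1) / $sigmaG1, 2));
echo $k, PHP_EOL;
```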

---

#### 4. Scott’s Rule (1979)

Based on the standard deviation and sample size. Good for continuous data.

$$
h = \frac{3.49\,\hat{\sigma}}{\sqrt[3]{n}}
$$

$$
R = \max_i x_i - \min_i x_i
$$

$$
k = \left \lceil \frac{R}{h} \right \rceil
$$

The result is an array with keys `width`, `bins`, `range`, and `stddev`. Map them to variables like so:

```php
list($h, $k, $R, $stddev) = BinSelection::scott($data);
```
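
The rule is easy to verify by hand. This standalone sketch assumes $\hat{\sigma}$ is the sample standard deviation (with $n-1$ denominator); the library’s estimator may differ, and the dataset is made up for illustration:

```php
<?php
// Scott's Rule by hand: h = 3.49 * sigma / n^(1/3), k = ceil(R / h).
$data = [2, 4, 4, 4, 5, 5, 7, 9];      // hypothetical dataset
$n    = count($data);
$mean = array_sum($data) / $n;

$ss = 0.0;
foreach ($data as $x) {
    $ss += ($x - $mean) ** 2;
}
$sigma = sqrt($ss / ($n - 1));          // sample std. deviation ≈ 2.14

$h = 3.49 * $sigma / $n ** (1 / 3);     // bin width ≈ 3.73
$R = max($data) - min($data);           // range = 7
$k = (int) ceil($R / $h);               // ceil(7 / 3.73) = 2
echo $k, PHP_EOL;
```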

---

#### 5. Freedman-Diaconis Rule (1981)

Based on the interquartile range (IQR). Robust against outliers.

$$
IQR = Q_3 - Q_1
$$

$$
h = 2 \times \frac{\mathrm{IQR}}{\sqrt[3]{n}}
$$

$$
R = \max_i x_i - \min_i x_i
$$

$$
k = \left \lceil \frac{R}{h} \right \rceil
$$

The result is an array with keys `width`, `bins`, `range`, and `IQR`. Map them to variables like so:

```php
list($h, $k, $R, $IQR) = BinSelection::freedmanDiaconis($data);
```
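
Quantile conventions vary between implementations, so results can differ slightly. This standalone sketch takes the quartiles as the medians of the lower and upper halves of the sorted data; the library’s quantile method may use a different convention, and the dataset and the `$median` helper are made up for illustration:

```php
<?php
// Freedman-Diaconis by hand: h = 2 * IQR / n^(1/3), k = ceil(R / h).
$data = [1, 2, 3, 4, 5, 6, 7, 8];      // hypothetical dataset
sort($data);
$n    = count($data);
$half = intdiv($n, 2);

// Median of a sorted array (helper defined just for this sketch).
$median = function (array $a): float {
    $m = count($a);
    $i = intdiv($m, 2);
    return $m % 2 ? (float) $a[$i] : ($a[$i - 1] + $a[$i]) / 2;
};

$q1  = $median(array_slice($data, 0, $half));    // 2.5
$q3  = $median(array_slice($data, $n - $half));  // 6.5
$iqr = $q3 - $q1;                                // 4

$h = 2 * $iqr / $n ** (1 / 3);                   // 8 / 2 = 4
$R = max($data) - min($data);                    // 7
$k = (int) ceil($R / $h);                        // ceil(1.75) = 2
echo $k, PHP_EOL;
```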
181232

182233

183234

235+
---
236+
237+
238+
184239
#### 6. Terrell-Scott’s Rule (1985)

Uses the cube root of the sample size and generally provides more bins than *Sturges*. This is the original *Rice Rule*:

$$
k = \left \lceil \; \sqrt[3]{2n} \enspace \right \rceil = \left \lceil \; (2n)^{1/3} \; \right \rceil
$$

```php
$k = BinSelection::terrellScott($data);
```
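
In plain PHP, the arithmetic looks like this (the sample size of 100 is hypothetical):

```php
<?php
// Terrell-Scott's Rule by hand: k = ceil((2n)^(1/3)).
$n = 100;                                // hypothetical sample size
$k = (int) ceil((2 * $n) ** (1 / 3));    // 200^(1/3) ≈ 5.85, so k = 6
echo $k, PHP_EOL;
```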

---

#### 7. Rice University Rule

Uses the cube root of the sample size and generally provides more bins than *Sturges*. This is the formula as taught by David M. Lane at Rice University. **N.B.** This *Rice Rule* does not appear to be the original one; *Terrell-Scott’s* (1985) seems to be. The two variants can yield different results under certain circumstances, but Lane’s variant from the early 2000s is the more commonly cited:

$$
k = 2 \times \left \lceil \; \sqrt[3]{n} \enspace \right \rceil = 2 \times \left \lceil \; n^{1/3} \; \right \rceil
$$

```php
$k = BinSelection::rice($data);
```
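
The divergence between the two variants is easy to demonstrate in plain PHP. For a hypothetical sample size of 100, Lane’s variant yields 10 bins while Terrell-Scott yields 6:

```php
<?php
// Lane's Rice Rule vs. the original Terrell-Scott rule for the same n.
$n = 100;                                          // hypothetical sample size

$lane         = 2 * (int) ceil($n ** (1 / 3));     // 2 * ceil(4.64) = 10
$terrellScott = (int) ceil((2 * $n) ** (1 / 3));   // ceil(5.85)     = 6

echo $lane, ' vs ', $terrellScott, PHP_EOL;
```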

---

## Method Selection Guidelines

| Rule | Strengths & Weaknesses |
| --------------------- | ------------------------------------------------------------ |
| **Freedman–Diaconis** | Uses the IQR to set 𝒉, so it is robust against outliers and adapts to data spread. <br />⚠️ May over‐smooth heavily skewed or multi‐modal data when the IQR is small. |
| **Sturges’ Rule** | Very simple; works well for roughly normal, moderate-sized datasets. <br />⚠️ Ignores outliers and underestimates the bin count for large or skewed samples. |
| **Rice Rule** | Independent of data shape and easy to compute. <br />⚠️ Prone to over‐ or under‐smoothing when the distribution is heavy‐tailed or skewed. |
| **Terrell–Scott** | Similar approach to the *Rice Rule* but with asymptotically optimal MISE properties; gives more bins than Sturges and adapts better at large 𝒏. <br />⚠️ Still ignores skewness and outliers. |
| **Square Root Rule** | Simply the square root of 𝒏, so it requires no distributional estimates. <br />⚠️ May produce too few bins for complex distributions, or too many for very noisy data. |
| **Doane’s Rule** | Extends *Sturges’ Rule* with a skewness correction, improving performance on asymmetric data. <br />⚠️ Requires estimating the third moment (skewness), which can be unstable for small 𝒏. |
| **Scott’s Rule** | Uses the standard deviation to minimize MISE, providing a good balance for unimodal, symmetric data. <br />⚠️ Sensitive to outliers (inflated $\sigma$) and may underperform on skewed distributions. |

## Literature

Rubia, J.M.D.L. (2024):
**Rice University Rule to Determine the Number of Bins.**
Open Journal of Statistics, 14, 119–149.
DOI: [10.4236/ojs.2024.141006](https://doi.org/10.4236/ojs.2024.141006)

Wikipedia:
**Histogram / Number of bins and width**
https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width

## Practical Example

```php
// …
foreach ($methods as $name => $method) {
    // …
}
```

## Error Handling

All methods will throw `InvalidArgumentException` for invalid inputs:

```php
try {
    // …
} catch (InvalidArgumentException $e) {
    // …
}
```