From 293a4799509d7223e80d68b802067f8d82e78e5a Mon Sep 17 00:00:00 2001 From: "Miss Islington (bot)" <31488909+miss-islington@users.noreply.github.com> Date: Tue, 26 Mar 2024 00:56:36 +0100 Subject: [PATCH] [3.12] Sync main docs and docstring for median_grouped(). (gh-117214) (gh-117241) --- Doc/library/statistics.rst | 77 +++++++++++++++++++------------------- 1 file changed, 39 insertions(+), 38 deletions(-) diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst index a79b94c85cd168..d0274e8b20f74e 100644 --- a/Doc/library/statistics.rst +++ b/Doc/library/statistics.rst @@ -79,7 +79,7 @@ or sample. :func:`median` Median (middle value) of data. :func:`median_low` Low median of data. :func:`median_high` High median of data. -:func:`median_grouped` Median, or 50th percentile, of grouped data. +:func:`median_grouped` Median (50th percentile) of grouped data. :func:`mode` Single mode (most common value) of discrete or nominal data. :func:`multimode` List of modes (most common values) of discrete or nominal data. :func:`quantiles` Divide data into intervals with equal probability. @@ -329,55 +329,56 @@ However, for reading convenience, most of the examples show sorted sequences. be an actual data point rather than interpolated. -.. function:: median_grouped(data, interval=1) +.. function:: median_grouped(data, interval=1.0) - Return the median of grouped continuous data, calculated as the 50th - percentile, using interpolation. If *data* is empty, :exc:`StatisticsError` - is raised. *data* can be a sequence or iterable. + Estimates the median for numeric data that has been `grouped or binned + `_ around the midpoints + of consecutive, fixed-width intervals. - .. doctest:: + The *data* can be any iterable of numeric data with each value being + exactly the midpoint of a bin. At least one value must be present. - >>> median_grouped([52, 52, 53, 54]) - 52.5 + The *interval* is the width of each bin. - In the following example, the data are rounded, so that each value represents - the midpoint of data classes, e.g. 1 is the midpoint of the class 0.5--1.5, 2 - is the midpoint of 1.5--2.5, 3 is the midpoint of 2.5--3.5, etc. With the data - given, the middle value falls somewhere in the class 3.5--4.5, and - interpolation is used to estimate it: + For example, demographic information may have been summarized into + consecutive ten-year age groups with each group being represented + by the 5-year midpoints of the intervals: .. doctest:: - >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]) - 3.7 - - Optional argument *interval* represents the class interval, and defaults - to 1. Changing the class interval naturally will change the interpolation: + >>> from collections import Counter + >>> demographics = Counter({ + ... 25: 172, # 20 to 30 years old + ... 35: 484, # 30 to 40 years old + ... 45: 387, # 40 to 50 years old + ... 55: 22, # 50 to 60 years old + ... 65: 6, # 60 to 70 years old + ... }) + ... + + The 50th percentile (median) is the 536th person out of the 1071 + member cohort. That person is in the 30 to 40 year old age group. + + The regular :func:`median` function would assume that everyone in the + tricenarian age group was exactly 35 years old. A more tenable + assumption is that the 484 members of that age group are evenly + distributed between 30 and 40. For that, we use + :func:`median_grouped`: .. doctest:: - >>> median_grouped([1, 3, 3, 5, 7], interval=1) - 3.25 - >>> median_grouped([1, 3, 3, 5, 7], interval=2) - 3.5 - - This function does not check whether the data points are at least - *interval* apart. - - .. impl-detail:: - - Under some circumstances, :func:`median_grouped` may coerce data points to - floats. This behaviour is likely to change in the future. - - .. seealso:: + >>> data = list(demographics.elements()) + >>> median(data) + 35 + >>> round(median_grouped(data, interval=10), 1) + 37.5 - * "Statistics for the Behavioral Sciences", Frederick J Gravetter and - Larry B Wallnau (8th Edition). + The caller is responsible for making sure the data points are separated + by exact multiples of *interval*. This is essential for getting a + correct result. The function does not check this precondition. - * The `SSMEDIAN - `_ - function in the Gnome Gnumeric spreadsheet, including `this discussion - `_. + Inputs may be any numeric type that can be coerced to a float during + the interpolation step. .. function:: mode(data)