[MRG] fixed default values in kmeans doc #15754

Merged: 3 commits, Dec 10, 2019
27 changes: 14 additions & 13 deletions sklearn/cluster/_k_means.py
@@ -654,11 +654,12 @@ class KMeans(TransformerMixin, ClusterMixin, BaseEstimator):
Parameters
----------

- n_clusters : int, optional, default: 8
+ n_clusters : int, default=8
The number of clusters to form as well as the number of
centroids to generate.

- init : {'k-means++', 'random' or an ndarray}
+ init : {'k-means++', 'random'} or ndarray of shape \
+ (n_clusters, n_features), default='k-means++'
Method for initialization, defaults to 'k-means++':

'k-means++' : selects initial cluster centers for k-mean
@@ -671,19 +672,19 @@ class KMeans(TransformerMixin, ClusterMixin, BaseEstimator):
If an ndarray is passed, it should be of shape (n_clusters, n_features)
and gives the initial centers.

- n_init : int, default: 10
+ n_init : int, default=10
Number of time the k-means algorithm will be run with different
centroid seeds. The final results will be the best output of
n_init consecutive runs in terms of inertia.

- max_iter : int, default: 300
+ max_iter : int, default=300
Maximum number of iterations of the k-means algorithm for a
single run.

- tol : float, default: 1e-4
+ tol : float, default=1e-4
Relative tolerance with regards to inertia to declare convergence.

- precompute_distances : {'auto', True, False}
+ precompute_distances : 'auto' or bool, default='auto'
Precompute distances (faster but takes more memory).

'auto' : do not precompute distances if n_samples * n_clusters > 12
@@ -694,15 +695,15 @@ class KMeans(TransformerMixin, ClusterMixin, BaseEstimator):

False : never precompute distances.

- verbose : int, default 0
+ verbose : int, default=0
Verbosity mode.

- random_state : int, RandomState instance or None (default)
+ random_state : int, RandomState instance, default=None
Determines random number generation for centroid initialization. Use
an int to make the randomness deterministic.
See :term:`Glossary <random_state>`.

- copy_x : bool, optional
+ copy_x : bool, default=True
When pre-computing distances it is more numerically accurate to center
the data first. If copy_x is True (default), then the original data is
not modified, ensuring X is C-contiguous. If False, the original data
@@ -711,28 +712,28 @@ class KMeans(TransformerMixin, ClusterMixin, BaseEstimator):
the data mean, in this case it will also not ensure that data is
C-contiguous which may cause a significant slowdown.

- n_jobs : int or None, optional (default=None)
+ n_jobs : int, default=None
The number of jobs to use for the computation. This works by computing
each of the n_init runs in parallel.

``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
``-1`` means using all processors. See :term:`Glossary <n_jobs>`
for more details.

- algorithm : "auto", "full" or "elkan", default="auto"
+ algorithm : {"auto", "full", "elkan"}, default="auto"
K-means algorithm to use. The classical EM-style algorithm is "full".
The "elkan" variation is more efficient by using the triangle
inequality, but currently doesn't support sparse data. "auto" chooses
"elkan" for dense data and "full" for sparse data.

Attributes
----------
- cluster_centers_ : array, [n_clusters, n_features]
+ cluster_centers_ : ndarray of shape (n_clusters, n_features)
Coordinates of cluster centers. If the algorithm stops before fully
converging (see ``tol`` and ``max_iter``), these will not be
consistent with ``labels_``.

- labels_ : array, shape (n_samples,)
+ labels_ : ndarray of shape (n_samples,)
Labels of each point

inertia_ : float
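The change in this diff is mostly mechanical: the older spellings `default: X`, `default X` and `optional (default=X)` all become `, default=X`. As a rough illustration of that convention, here is a minimal sketch of a normalizer for a single parameter type line. The helper `normalize_default` and its regular expression are hypothetical; they are not part of scikit-learn or of this PR, and they only cover the spellings that appear in the hunks above.

```python
import re

# Hypothetical helper, not part of scikit-learn: it only recognises the
# older "default" spellings that appear in this diff.
_OLD_DEFAULT = re.compile(
    r"""
    (?:,\s*optional)?      # drop a redundant "optional" marker
    ,?\s*\(?               # optional comma and opening parenthesis
    default[:\s=]\s*       # "default:", "default " or "default="
    (?P<value>[^)\s]+)     # the default value itself
    \)?\s*$                # optional closing parenthesis, end of line
    """,
    re.VERBOSE,
)


def normalize_default(type_line: str) -> str:
    """Rewrite 'name : type, default: X' style lines to 'name : type, default=X'."""
    name, sep, spec = type_line.partition(" : ")
    if not sep:
        # Not a "name : type" line; leave it untouched.
        return type_line
    new_spec = _OLD_DEFAULT.sub(
        lambda m: f", default={m.group('value')}", spec
    )
    return f"{name} : {new_spec}"


if __name__ == "__main__":
    for line in (
        "n_clusters : int, optional, default: 8",
        "verbose : int, default 0",
        "n_jobs : int or None, optional (default=None)",
    ):
        print(normalize_default(line))
```

Lines without a stated default, such as `copy_x : bool, optional`, are left unchanged by this sketch, because the default value has to be looked up in the signature rather than inferred from the text. The sketch also keeps `or None` in the type, whereas the PR drops it for `n_jobs`.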