doc update

pandas-dev · jreback · Nov 29, 2018 · Oct 30, 2018 · Nov 6, 2018 · Nov 11, 2018
commit 95f19bc41d69ff74f13e24b2da88f8aa7887d62a
diff --git a/doc/source/10min.rst b/doc/source/10min.rst
@@ -121,17 +121,33 @@ Display the index, columns:
    df.index
    df.columns
 
-:attr:`DataFrame.values` gives a NumPy representation of the underlying data.
-However, this can be an expensive operation when your :class:`DataFrame` has
-columns with different data types. **NumPy arrays have a single dtype for
-the entire array, so accessing ``df.values`` may have to coerce data**. We
-recommend using ``df.values`` only when you know that your data has a single
-data type.
+:meth:`DataFrame.to_numpy` gives a NumPy representation of the underlying data.
+Note that this can be an expensive operation when your :class:`DataFrame` has
+columns with different data types, which comes down to a fundamental difference
+between pandas and NumPy: **NumPy arrays have one dtype for the entire array,
+while pandas DataFrames have one dtype per column**. When you call
+:meth:`DataFrame.to_numpy`, pandas will find the NumPy dtype that can hold *all*
+of the dtypes in the DataFrame. This may end up being ``object``, which requires
+casting every value to a Python object.
+
+For ``df``, our :class:`DataFrame` of all floating-point values,
+:meth:`DataFrame.to_numpy` is fast and doesn't require copying data.
 
 .. ipython:: python
 
-   df.values
+   df.to_numpy()
+
+For ``df2``, the :class:`DataFrame` with multiple dtypes,
+:meth:`DataFrame.to_numpy` is relatively expensive.
+
+.. ipython:: python
+
+   df2.to_numpy()
+
+.. note::
 
+   :meth:`DataFrame.to_numpy` does *not* include the index or column
+   labels in the output.
 
 :func:`~DataFrame.describe` shows a quick statistic summary of your data:
 

diff --git a/doc/source/basics.rst b/doc/source/basics.rst
@@ -69,7 +69,7 @@ thought of as containers for arrays, which hold the actual data and do the
 actual computation. For many types, the underlying array is a
 :class:`numpy.ndarray`. However, pandas and 3rd party libraries may *extend*
 NumPy's type system to add support for custom arrays
-(see :ref:`dsintro.data_types`).
+(see :ref:`basics.dtypes`).
 
 To get the actual data inside a :class:`Index` or :class:`Series`, use
 the **array** property
@@ -1951,17 +1951,29 @@ dtypes
 ------
 
 For the most part, pandas uses NumPy arrays and dtypes for Series or individual
-columns of a DataFrame. The main types allowed in pandas objects are ``float``,
-``int``, ``bool``, and ``datetime64[ns]`` (note that NumPy does not support
-timezone-aware datetimes).
-
-In addition to NumPy's types, pandas :ref:`extends <extending.extension-types>`
-NumPy's type-system for a few cases.
-
-* :ref:`Categorical <categorical>`
-* :ref:`Datetime with Timezone <timeseries.timezone_series>`
-* :ref:`Period <timeseries.periods>`
-* :ref:`Interval <indexing.intervallindex>`
+columns of a DataFrame. NumPy provides support for ``float``,
+``int``, ``bool``, ``timedelta64[ns]`` and ``datetime64[ns]`` (note that NumPy
+does not support timezone-aware datetimes).
+
+Pandas and third-party libraries *extend* NumPy's type system in a few places.
+This section describes the extensions pandas has made internally.
+See :ref:`extending.extension-types` for how to write your own extension that
+works with pandas. See :ref:`ecosystem.extensions` for a list of third-party
+libraries that have implemented an extension.
+
+The following table lists all of pandas' extension types. See the respective
+documentation sections for more on each type.
+
+=================== ========================= ================== ============================= =============================
+Kind of Data        Data Type                 Scalar             Array                         Documentation
+=================== ========================= ================== ============================= =============================
+tz-aware datetime   :class:`DatetimeTZDtype`  :class:`Timestamp` :class:`arrays.DatetimeArray` :ref:`timeseries.timezone`
+Categorical         :class:`CategoricalDtype` (none)             :class:`Categorical`          :ref:`categorical`
+period (time spans) :class:`PeriodDtype`      :class:`Period`    :class:`arrays.PeriodArray`   :ref:`timeseries.periods`
+sparse              :class:`SparseDtype`      (none)             :class:`arrays.SparseArray`   :ref:`sparse`
+intervals           :class:`IntervalDtype`    :class:`Interval`  :class:`arrays.IntervalArray` :ref:`advanced.intervalindex`
+nullable integer    :class:`Int64Dtype`, ...  (none)             :class:`arrays.IntegerArray`  :ref:`integer_na`
+=================== ========================= ================== ============================= =============================
 
 Pandas uses the ``object`` dtype for storing strings.
 

diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
@@ -142,7 +142,7 @@ However, operations such as slicing will also slice the index.
    We will address array-based indexing like ``s[[4, 3, 1]]``
    in :ref:`section <indexing>`.
 
-Like a NumPy array, a pandas Series as a :attr:`Series.dtype`.
+Like a NumPy array, a pandas Series has a :attr:`~Series.dtype`.
 
 .. ipython:: python
 
@@ -151,7 +151,8 @@ Like a NumPy array, a pandas Series as a :attr:`Series.dtype`.
 This is often a NumPy dtype. However, pandas and 3rd-party libraries
 extend NumPy's type system in a few places, in which case the dtype would
 be a :class:`~pandas.api.extensions.ExtensionDtype`. Some examples within
-pandas are :ref:`categorical` and :ref:`integer_na`. See :ref:`dsintro.data_type` for more.
+pandas are :ref:`categorical` and :ref:`integer_na`. See :ref:`basics.dtypes`
+for more.
 
 If you need the actual array backing a ``Series``, use :attr:`Series.array`.
 
@@ -160,7 +161,7 @@ If you need the actual array backing a ``Series``, use :attr:`Series.array`.
    s.array
 
 Again, this is often a NumPy array, but may instead be a
-:class:`~pandas.api.extensions.ExtensionArray`. See :ref:`dsintro.data_type` for more.
+:class:`~pandas.api.extensions.ExtensionArray`. See :ref:`basics.dtypes` for more.
 Accessing the array can be useful when you need to do some operation without the
 index (to disable :ref:`automatic alignment <dsintro.alignment>`, for example).
 
@@ -859,35 +860,6 @@ completion mechanism so they can be tab-completed:
     In [5]: df.fo<TAB>
     df.foo1  df.foo2
 
-.. _dsintro.data_type:
-
-Data Types
-----------
-
-Pandas type system is mostly built on top of `NumPy's <https://docs.scipy.org/doc/numpy-1.15.1/reference/arrays.dtypes.html>`__.
-NumPy provides the basic arrays and data types for numeric
-string, *tz-naive* datetime, and others types of data.
-
-Pandas and third-party libraries *extend* NumPy's type system in a few places.
-This section describes the extensions pandas has made internally.
-See :ref:`extending.extension-types` for how to write your own extension that
-works with pandas. See :ref:`ecosystem.extensions` for a list of third-party
-libraries that have implemented an extension.
-
-The following table lists all of pandas extension types. See the respective
-documentation sections for more on each type.
-
-=================== ========================= ================== ============================= =============================
-Kind of Data        Data Type                 Scalar             Array                         Documentation
-=================== ========================= ================== ============================= =============================
-tz-aware datetime   :class:`DatetimeArray`    :class:`Timestamp` :class:`arrays.DatetimeArray` :ref:`timeseries.timezone`
-Categorical         :class:`CategoricalDtype` (none)             :class:`Categorical`          :ref:`categorical`
-period (time spans) :class:`PeriodDtype`      :class:`Period`    :class:`arrays.PeriodArray`   :ref:`timeseries.periods`
-sparse              :class:`SparseDtype`      (none)             :class:`arrays.SparseArray`   :ref:`sparse`
-intervals           :class:`IntervalDtype`    :class:`Interval`  :class:`arrays.IntervalArray` :ref:`advanced.intervalindex`
-nullable integer    :clsas:`Int64Dtype`, ...  (none)             :class:`arrays.IntegerArray`  :ref:`integer_na`
-=================== ========================= ================== ============================= =============================
-
 .. _basics.panel:
 
 Panel
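
The extension-type table that this hunk moves into basics.rst pairs each kind of data with a dtype and an array class. A minimal sketch for checking that pairing, assuming pandas 0.24 or later; the sample values and the ``examples`` name are made up for illustration, and sparse data is left out:

```python
import pandas as pd

# One small Series per kind of extension data listed in the table.
examples = {
    "categorical": pd.Series(["a", "b", "a"], dtype="category"),
    "tz-aware datetime": pd.Series(pd.date_range("2000", periods=2, tz="UTC")),
    "period": pd.Series(pd.period_range("2000-01", periods=2, freq="M")),
    "interval": pd.Series(pd.interval_range(start=0, end=2)),
    "nullable integer": pd.Series([1, None], dtype="Int64"),
}

# Print the dtype and the class of the backing array for each kind.
for kind, ser in examples.items():
    print(kind, "|", ser.dtype, "|", type(ser.array).__name__)
```

Each dtype printed here should be an instance of the class named in the table's "Data Type" column, and each array class should match the "Array" column.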

diff --git a/doc/source/whatsnew/v0.24.0.rst b/doc/source/whatsnew/v0.24.0.rst
@@ -67,6 +67,8 @@ as ``.values``).
    ser.array
    ser.to_numpy()
 
+See :ref:`basics.dtypes` and :ref:`dsintro.attrs` for more.
+
 .. _whatsnew_0240.enhancements.extension_array_operators:
 
 ``ExtensionArray`` operator support

diff --git a/pandas/core/base.py b/pandas/core/base.py
@@ -778,7 +778,7 @@ def array(self):
         Union[ndarray, ExtensionArray]
             This is the actual array stored within this object. This differs
             from ``.values`` which may require converting the data
-            to a different form. We recommend using :
+            to a different form.
 
         Notes
         -----

diff --git a/pandas/core/frame.py b/pandas/core/frame.py
@@ -1144,7 +1144,7 @@ def to_numpy(self):
         >>> df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
         >>> df.to_numpy()
 
-        When numeric and non-numeric types, the output array will
+        For a mix of numeric and non-numeric types, the output array will
         have object dtype.
 
         >>> df['C'] = pd.date_range('2000', periods=2)
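
The dtype behavior described in this docstring can be checked with a short sketch. It assumes pandas 0.24 or later, where ``DataFrame.to_numpy`` is available; the variable names are illustrative, and the data mirrors the docstring example.

```python
import pandas as pd

# Int and float columns share a common NumPy dtype, so the result is float64.
numeric = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
numeric_arr = numeric.to_numpy()
print(numeric_arr.dtype)

# Adding a datetime column leaves no common numeric dtype,
# so every value is boxed into a Python object.
mixed = numeric.copy()
mixed["C"] = pd.date_range("2000", periods=2)
mixed_arr = mixed.to_numpy()
print(mixed_arr.dtype)

# Neither array carries the index or the column labels.
print(numeric_arr.shape, mixed_arr.shape)
```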

diff --git a/pandas/core/generic.py b/pandas/core/generic.py
@@ -4928,6 +4928,10 @@ def values(self):
         """
         Return a Numpy representation of the DataFrame.
 
+        .. warning::
+
+           We recommend using :meth:`DataFrame.to_numpy` instead.
+
         Only the values in the DataFrame will be returned, the axes labels
         will be removed.
 

diff --git a/pandas/core/indexes/base.py b/pandas/core/indexes/base.py
@@ -724,8 +724,9 @@ def values(self):
 
         .. warning::
 
-           We recommend you use :attr:`Index.array` or
-           :meth:`Index.to_numpy` instead of ``.values``.
+           We recommend using :attr:`Index.array` or
+           :meth:`Index.to_numpy`, depending on whether you need
+           a reference to the underlying data or a NumPy array.
 
         Returns
         -------

diff --git a/pandas/core/series.py b/pandas/core/series.py
@@ -410,8 +410,13 @@ def ftypes(self):
     @property
     def values(self):
         """
-        Return Series as ndarray or ndarray-like
-        depending on the dtype
+        Return Series as ndarray or ndarray-like depending on the dtype.
+
+        .. warning::
+
+           We recommend using :attr:`Series.array` or
+           :meth:`Series.to_numpy`, depending on whether you need
+           a reference to the underlying data or a NumPy array.
 
         Returns
         -------
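
The choice the new warnings describe, ``.array`` for a reference to the underlying data versus ``.to_numpy()`` for a NumPy array, can be sketched as follows. This assumes pandas 0.24 or later; the Series contents and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series backed by an extension array (periods), not a NumPy array.
ser = pd.Series(pd.period_range("2000-01", periods=3, freq="M"))

# .array returns the array actually stored inside the Series.
backing = ser.array
print(type(backing).__name__)

# .to_numpy() always returns an ndarray; for periods that means an
# object-dtype array holding one Period scalar per element.
converted = ser.to_numpy()
print(type(converted).__name__, converted.dtype)
```

For a Series whose data is already a plain NumPy array, such as integers or floats, ``to_numpy()`` returns an ndarray of the same dtype, so the distinction matters mostly for extension types.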