This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Update the transform output formats documentation. (#395)
* Update the transform output formats documentation.

* Add whitespace change to restart CI run. The mac build did not start correctly.

* Add whitespace change to restart CI run. The mac build did not start correctly.

Co-authored-by: Gani Nazirov <ganinz@hotmail.com>
pieths and ganik committed Jan 2, 2020
1 parent 9387651 commit 5f1a6f9
Showing 1 changed file with 13 additions and 7 deletions.
src/python/docs/sphinx/concepts/datasources.rst (20 changes: 13 additions, 7 deletions)
@@ -120,15 +120,21 @@ Example:
 Output Data Types of Transforms
 -------------------------------
 
-The return type of all of the transforms is a ``pandas.DataFrame``, when they
-are used inside a `sklearn.pipeline.Pipeline
-<https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
-or when they are used individually.
-
-However, when used inside a :py:class:`nimbusml.Pipeline`, the outputs are often stored in
+When used inside a `sklearn.pipeline.Pipeline
+<https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_,
+the return type of all of the transforms is a ``pandas.DataFrame``.
+
+When used individually or inside a :py:class:`nimbusml.Pipeline`
+that contains only transforms, the default output is a ``pandas.DataFrame``. To instead output an
+`IDataView <https://github.com/dotnet/machinelearning/blob/master/docs/code/IDataViewImplementation.md>`_,
+pass ``as_binary_data_stream=True`` to either ``transform()`` or ``fit_transform()``.
+To output a sparse CSR matrix, pass ``as_csr=True``.
+See :py:class:`nimbusml.Pipeline` for more information.
+
+Note, when used inside a :py:class:`nimbusml.Pipeline`, the outputs are often stored in
 a more optimized :ref:`VectorDataViewType`, which minimizes data conversion to
 dataframes. When several transforms are combined inside an :py:class:`nimbusml.Pipeline`,
 the intermediate transforms will store the data in the optimized format and only
-the last transform will return a ``pandas.DataFrame``.
+the last transform will return a ``pandas.DataFrame`` (or IDataView/CSR; see above).
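
The updated text distinguishes the default dense ``pandas.DataFrame`` output from the sparse CSR matrix requested with ``as_csr=True``. The sketch below is only a rough illustration of that difference and does not call nimbusml. It uses pandas and SciPy to one-hot encode a small hypothetical ``color`` column (the kind of output that becomes mostly zeros as the number of categories grows) and converts it to a ``scipy.sparse.csr_matrix``. That ``as_csr=True`` returns this particular SciPy type is an assumption, not something stated in this commit.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy input: one categorical column (not taken from the nimbusml docs).
colors = pd.Series(["red", "green", "red", "blue"], name="color")

# Dense form: one float column per category, comparable in shape to a
# one-hot transform's pandas.DataFrame output.
dense = pd.get_dummies(colors, dtype=float)

# Sparse CSR form (assumed here to match the kind of object as_csr=True yields):
# only the non-zero entries are stored.
sparse = csr_matrix(dense.to_numpy())

print(dense.shape)     # rows x categories
print(sparse.nnz)      # number of stored (non-zero) entries
print(sparse.indptr)   # per-row start offsets into sparse.data / sparse.indices
```

A CSR matrix keeps only the non-zero values (``data``), their column positions (``indices``) and per-row offsets (``indptr``), so its memory use grows with the number of non-zeros rather than with rows times columns.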

