
DOCS-#4188: Modify tables in Supported APIs section #4286

Draft: wants to merge 5 commits into base: master
7 changes: 4 additions & 3 deletions .github/workflows/ci.yml
@@ -25,7 +25,7 @@ jobs:
python-version: "3.8.x"
architecture: "x64"
- run: pip install black
- run: black --check --diff modin/ asv_bench/benchmarks scripts/doc_checker.py
- run: black --check --diff modin/ asv_bench/benchmarks scripts/doc_checker.py scripts/supported_apis.py

build-docs:
name: build docs
@@ -54,8 +54,8 @@ jobs:
architecture: "x64"
# The `numpydoc` version here MUST match the versions in the dev requirements files.
- run: pip install pytest pytest-cov pydocstyle numpydoc==1.1.0 xgboost
- run: pytest scripts/test
- run: pip install -e .[all]
- run: pytest scripts/test
amyskov (Contributor Author), Mar 22, 2022:

This line was moved because scripts/test now needs pandas (pandas is installed as part of the Modin installation in the previous step).

- run: |
python scripts/doc_checker.py --add-ignore=D101,D102,D103,D105 --disable-numpydoc \
modin/pandas/dataframe.py modin/pandas/series.py \
@@ -98,6 +98,7 @@ jobs:
python scripts/doc_checker.py modin/core/execution/dispatching/factories/factories.py \
modin/core/execution/dispatching/factories/dispatcher.py \
- run: python scripts/doc_checker.py scripts/doc_checker.py
- run: python scripts/doc_checker.py scripts/supported_apis.py
- run: |
python scripts/doc_checker.py modin/experimental/pandas/io.py \
modin/experimental/pandas/numpy_wrap.py modin/experimental/pandas/__init__.py
@@ -132,7 +133,7 @@ jobs:
python-version: "3.8.x"
architecture: "x64"
- run: pip install flake8 flake8-print flake8-no-implicit-concat
- run: flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
- run: flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py scripts/supported_apis.py

test-api:
runs-on: ubuntu-latest
31 changes: 31 additions & 0 deletions docs/_templates/autosummary/class.rst
@@ -0,0 +1,31 @@
{% extends "!autosummary/class.rst" %}
Collaborator:

Should we have a template for general functions, IO, etc.?

Contributor:

I think we can keep the default templates for functions, since we list functions manually. The class template has to be overridden so that it adds the :toctree: option, which puts every class method in the sidebar, not only the class itself.


{% block methods %}
{% if methods %}
.. rubric:: {{ _('Methods') }}

.. autosummary::
:toctree:
{% for item in all_methods %}
{%- if not item.startswith('_') %}
{{ name }}.{{ item }}
{%- endif -%}
{%- endfor %}

{% endif %}
{% endblock %}

{% block attributes %}
{% if attributes %}
.. rubric:: {{ _('Attributes') }}

.. autosummary::
:toctree:
{% for item in all_attributes %}
{%- if not item.startswith('_') %}
{{ name }}.{{ item }}
{%- endif -%}
{%- endfor %}

{% endif %}
{% endblock %}
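For each public member of a class, the overridden template renders one ``autosummary`` entry under ``:toctree:``, so every method gets its own generated page. The plain-Python sketch below roughly approximates what the methods block emits: names starting with ``_`` are skipped, exactly as the ``{%- if not item.startswith('_') %}`` test does. The function ``render_methods_block`` is hypothetical and is not part of Sphinx or Modin.

```python
def render_methods_block(name, all_methods):
    """Hypothetical approximation of the template's methods block."""
    # Same filter as `{%- if not item.startswith('_') %}`: drop private/dunder names.
    public = [item for item in all_methods if not item.startswith("_")]
    if not public:
        # Mirrors `{% if methods %}`: emit nothing when there is nothing public.
        return ""
    lines = [".. rubric:: Methods", "", ".. autosummary::", "   :toctree:", ""]
    lines += [f"   {name}.{item}" for item in public]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_methods_block("DataFrame", ["abs", "_reduce", "__init__", "merge"]))
```

Because the template iterates over ``all_methods`` rather than ``methods``, the explicit underscore check is what keeps private helpers out of the generated pages.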
6 changes: 6 additions & 0 deletions docs/conf.py
@@ -42,12 +42,14 @@ def noop_decorator(*args, **kwargs):
import modin

from modin.config.__main__ import export_config_help
from scripts.supported_apis import generate_supported_apis_csvs

configs_file_path = os.path.abspath(
os.path.join(os.path.dirname(__file__), "flow/modin/configs_help.csv")
)
# Export configs help to create configs table in the docs/flow/modin/config.rst
export_config_help(configs_file_path)
generate_supported_apis_csvs()

project = "Modin"
copyright = "2018-2022, Modin Developers."
@@ -76,6 +78,7 @@ def noop_decorator(*args, **kwargs):
"sphinx.ext.mathjax",
"sphinx.ext.githubpages",
"sphinx.ext.graphviz",
"sphinx.ext.autosummary",
"sphinxcontrib.plantuml",
"sphinx_issues",
]
@@ -84,6 +87,9 @@ def noop_decorator(*args, **kwargs):
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

# Turn on sphinx.ext.autosummary
autosummary_generate = True

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
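``docs/conf.py`` now calls ``generate_supported_apis_csvs()`` from ``scripts/supported_apis.py`` before the build, but that script's body is not part of this diff. The sketch below is only a guess at the general idea: compare the public members of a reference class with those of an implementation and emit a CSV table. ``supported_apis_csv``, the column names and the stand-in classes are all hypothetical.

```python
import csv
import io


def supported_apis_csv(reference_cls, impl_cls):
    """Hypothetical sketch: one CSV row per public attribute of reference_cls."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Method", "Supported"])
    for name in sorted(dir(reference_cls)):
        if name.startswith("_"):
            continue  # skip private and dunder members
        # "Supported" here just means the implementation class has the attribute
        # (defined directly or inherited).
        supported = hasattr(impl_cls, name)
        writer.writerow([name, "Yes" if supported else "No"])
    return buffer.getvalue()


# Hypothetical stand-ins for a pandas class and its Modin counterpart.
class PandasLike:
    def abs(self): ...
    def merge(self): ...
    def _private(self): ...


class ModinLike:
    def abs(self): ...


if __name__ == "__main__":
    print(supported_apis_csv(PandasLike, ModinLike))
```

A real generator would also need to handle module-level functions and defaulting-to-pandas behavior, which a simple ``hasattr`` check cannot capture.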
4 changes: 4 additions & 0 deletions docs/development/architecture.rst
@@ -273,9 +273,13 @@ API
The API is the outer-most layer that faces users. The following classes contain Modin's implementation of the pandas API:

.. toctree::
:maxdepth: 1

/flow/modin/pandas/base
/flow/modin/pandas/dataframe
/flow/modin/pandas/series
/flow/modin/pandas/io
Collaborator:

What about modules such as groupby, resample and others that could be listed here as well?

Contributor:

Yes, we could probably add these modules the way pandas does, but I think that should be done in a separate PR.

Collaborator:

OK, let's do this as part of a separate issue so as not to bloat this PR.

/flow/modin/pandas/general

Module/Class View
-----------------
1 change: 0 additions & 1 deletion docs/flow/modin/pandas/base.rst
@@ -8,4 +8,3 @@ Public API

.. autoclass:: modin.pandas.base.BasePandasDataset
:noindex:
:members:
Collaborator:

Why is this removed?

Contributor:

This avoids excessive duplication, since all BasePandasDataset methods are already present in the DataFrame and Series sections.

Collaborator:

I see. We should probably add a note to the BasePandasDataset docstring saying that users shouldn't interact with this object directly. We should also correct the docstrings of this object's methods. For example, for abs we say "Return a BasePandasDataset with absolute numeric value of each element." whereas the returned object is either a DataFrame or a Series. This looks like a separate issue. Could you create one for it, please?

Contributor:

Sure, the note has been added and issue #4512 has been created.

9 changes: 7 additions & 2 deletions docs/flow/modin/pandas/dataframe.rst
@@ -3,10 +3,12 @@
DataFrame Module Overview
"""""""""""""""""""""""""

.. currentmodule:: modin.pandas

Modin's ``pandas.DataFrame`` API
''''''''''''''''''''''''''''''''

Modin's ``pandas.DataFrame`` API is backed by a distributed object providing an identical
Modin's ``pandas.DataFrame`` API is backed by one or more distributed objects providing an identical
API to pandas. After the user calls some ``DataFrame`` function, this call is internally
rewritten into a representation that can be processed in parallel by the partitions. These
results can be e.g., reduced to single output, identical to the single threaded
@@ -19,7 +21,10 @@ pandas ``DataFrame`` method output.
Public API
----------

.. autoclass:: modin.pandas.dataframe.DataFrame
.. autosummary::
:toctree: api/

DataFrame

Usage Guide
'''''''''''
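The ``DataFrame`` overview above describes a call being rewritten so that partitions can be processed in parallel. As a toy, pure-Python illustration of that idea for an element-wise method such as ``abs``, the sketch below splits rows into partitions, processes each one in a thread pool, and concatenates the partial results so that the output matches the single-threaded result. This is not Modin's implementation; ``split_rows``, ``apply_abs`` and ``parallel_abs`` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, n_parts):
    """Split a list of rows into at most n_parts contiguous row partitions."""
    size = max(1, -(-len(rows) // n_parts))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def apply_abs(partition):
    # Element-wise operation: each partition can be processed independently.
    return [[abs(v) for v in row] for row in partition]


def parallel_abs(rows, n_parts=4):
    parts = split_rows(rows, n_parts)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(apply_abs, parts))  # map() preserves partition order
    # Concatenating the partition results restores the original row order.
    return [row for part in results for row in part]
```

Element-wise methods need no reduction step at all; the per-partition results are simply stitched back together in order.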
33 changes: 33 additions & 0 deletions docs/flow/modin/pandas/general.rst
@@ -0,0 +1,33 @@
General Functions
~~~~~~~~~~~~~~~~~

.. currentmodule:: modin.pandas

Modin's general functions API is backed by one or more distributed objects providing an identical
API to pandas. After the user calls a general function, the call is internally
rewritten into a representation that can be processed in parallel by the partitions. The
results can then be, e.g., reduced to a single output identical to the single-threaded
pandas function output.

.. autosummary::
:toctree: api/

concat
crosstab
get_dummies
isna
isnull
lreshape
melt
merge
merge_asof
merge_ordered
notna
notnull
pivot
pivot_table
to_datetime
to_numeric
unique
value_counts
wide_to_long
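The "process partitions in parallel, then reduce" pattern described for general functions can be illustrated with ``value_counts``: count values inside each partition (map), then merge the partial counts (reduce). The sketch below is conceptual, pure Python, and not how Modin actually executes ``value_counts``; ``count_partition`` and ``parallel_value_counts`` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_partition(partition):
    # Map step: count values inside a single partition.
    return Counter(partition)


def parallel_value_counts(values, n_parts=4):
    size = max(1, -(-len(values) // n_parts))  # ceiling division
    parts = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor() as pool:
        partial_counts = list(pool.map(count_partition, parts))
    # Reduce step: merge the per-partition counts into a single result.
    total = reduce(lambda a, b: a + b, partial_counts, Counter())
    # Descending count; ties are broken by value here for a deterministic order.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
```

The reduce step is cheap because each partition contributes only its small table of counts, not its raw rows.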
36 changes: 36 additions & 0 deletions docs/flow/modin/pandas/io.rst
@@ -0,0 +1,36 @@
Input/Output
~~~~~~~~~~~~

.. currentmodule:: modin.pandas

Modin's I/O functions API is backed by one or more distributed objects providing an identical
API to pandas. After the user calls an I/O function, the call is internally
rewritten into a representation that can be processed in parallel by the partitions.
Once the I/O function call has finished, each partition contains a chunk of the data, and
these partitions can then be processed in parallel using the Modin API.

.. autosummary::
:toctree: api/

json_normalize
read_clipboard
read_csv
read_excel
read_feather
read_fwf
read_gbq
read_hdf
read_html
read_json
read_orc
read_parquet
read_pickle
read_sas
read_spss
read_sql
read_sql_query
read_sql_table
read_stata
read_table
read_xml
to_pickle
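To make the I/O description concrete, the sketch below parses CSV text into row chunks, one per "partition", and then runs a per-partition computation in parallel before combining the results. It is a simplified, hypothetical illustration (``read_csv_partitions`` and ``column_total`` are not Modin functions), and splitting on line boundaries assumes no quoted field contains a newline.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def read_csv_partitions(text, n_parts=2):
    """Hypothetical sketch: parse the header once, then split data lines into chunks."""
    lines = text.splitlines()  # assumes no quoted fields span multiple lines
    if not lines:
        return []
    header = next(csv.reader([lines[0]]))
    body = lines[1:]
    size = max(1, -(-len(body) // n_parts))  # ceiling division
    chunks = [body[i:i + size] for i in range(0, len(body), size)]

    def parse(chunk):
        # Each chunk is parsed independently into a list of dicts (one "partition").
        return [dict(zip(header, row)) for row in csv.reader(chunk)]

    with ThreadPoolExecutor() as pool:
        return list(pool.map(parse, chunks))


def column_total(partitions, column):
    # Process the partitions in parallel, then combine the per-partition sums.
    with ThreadPoolExecutor() as pool:
        partial = pool.map(lambda p: sum(int(r[column]) for r in p), partitions)
        return sum(partial)
```

Once loaded, each partition holds only its own chunk of rows, so later operations can work on the chunks independently.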
9 changes: 7 additions & 2 deletions docs/flow/modin/pandas/series.rst
@@ -3,10 +3,12 @@
Series Module Overview
""""""""""""""""""""""

.. currentmodule:: modin.pandas

Modin's ``pandas.Series`` API
'''''''''''''''''''''''''''''

Modin's ``pandas.Series`` API is backed by a distributed object providing an identical
Modin's ``pandas.Series`` API is backed by one or more distributed objects providing an identical
API to pandas. After the user calls some ``Series`` function, this call is internally rewritten
into a representation that can be processed in parallel by the partitions. These
results can be e.g., reduced to single output, identical to the single threaded
@@ -19,7 +21,10 @@ pandas ``Series`` method output.
Public API
----------

.. autoclass:: modin.pandas.series.Series
.. autosummary::
:toctree: api/

Series

Usage Guide
'''''''''''
1 change: 1 addition & 0 deletions docs/requirements-doc.txt
@@ -13,3 +13,4 @@ git+https://github.com/modin-project/modin.git@master#egg=modin[all]
sphinxcontrib_plantuml
sphinx-issues
xgboost
numpydoc==1.1.0