-
Notifications
You must be signed in to change notification settings - Fork 666
DOCS-#4188: Modify tables in Supported APIs section #4286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4286 +/- ##
==========================================
+ Coverage 86.36% 88.31% +1.94%
==========================================
Files 228 229 +1
Lines 18417 18693 +276
==========================================
+ Hits 15906 16508 +602
+ Misses 2511 2185 -326
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
14e039f
to
4be76e6
Compare
Hi @modin-project/modin-core and all interested reviewers, please see the initial concept of supported apis dynamic table creation (docs build). The concept proposals are next:
Parameter properties can be described by flags (flags description can be found here)
|
Hi @modin-project/modin-core ! What do you think about #4286 (comment)? It would be nice to have a decision on the second clause first since it blocks other PR actions. |
Note: #4286 (comment) was updated in accordance with the further plans. |
For functions that are completely default to pandas - how is this supposed to be marked? |
In cases when functions default to pandas we are going to mark these functions with |
Hi @modin-project/modin-core , please take a look at #4286 (comment): the first clause was updated according to the last offline discussion - the first efforts on docstrings filing with supported API data can be started as soon as we approve table format. |
Hi @modin-project/modin-core what do you think about the table format (the first proposal)? |
+----------+-----------------------------------------------------------------------------------------------+ | ||
| Flag | Meaning | | ||
+==========+===============================================================================================+ | ||
| Harmful | Usage of this parameter can be harmfull for performance of your application | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between Harmful and Partial? Also, I am wondering if there is a better and clearer name instead of Harmful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Harmful flag is intended to parameters which are completely unsupported (all combinations/parameter types), while Partial intended to the cases when some parameters combinations/types can be supported. Also we can use Partial flag in the readthedocs Supported APIs
combined tables for short method/function summary section to mark cases when some parameters are supported (note, this functionality is not implemented yet in this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding "default-to-pandas" flag - what do you think about "PoorPerf"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perf-poor
or Perf-harmful
look good to me. I wonder if there is a good one word explaining this?
- run: pip install -e .[all] | ||
- run: pytest scripts/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line is moved since we need pandas
in the scripts/test
now (pandas
is installed during modin installation on the previous step).
scripts/test/examples.py
Outdated
+-----------------+-----------------+----------------+----------------+----------------+ | ||
| Parameters | PandasOnRay | PandasOnDask | OmniSci | Notes | | ||
+=================+=================+================+================+================+ | ||
| parameter_name1 | P | H | HE | test multiline | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the designations in the table correct here? In other files , the designations were different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, test wasn't updated. Fixed.
e753327
to
cb3cb84
Compare
Hi Modin core team!
The first line ( |
89a6e64
to
786bc32
Compare
I wonder what should we do with eg |
@Garra1980, such a functionality should match |
Ah, missed this, thanks! |
] | ||
|
||
# these utilities just imported from pandas in `modin.pandas.__init__` | ||
pandas_utilities = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering can we somehow avoid hardcoding these names since all they are taken from init anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be tricky, i think - not all the objects imported from pandas in the init file are intended to be added in the "general functions" section (RangeIndex
for example - it is imported in the init file but shouldn't to be processed here), so we have to filter this list somehow and it can be hard to make without such hardcoding.
We need to rebase this PR on master, right? |
Recent changes were added, but there is no significant for this PR changes are merged yet. Did you expect something specific to be added? |
No, nothing specific, I was just thinking about keeping PR up-to-date |
7949a06
to
6a26586
Compare
/flow/modin/pandas/base | ||
/flow/modin/pandas/dataframe | ||
/flow/modin/pandas/series | ||
/flow/modin/pandas/io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about such modules as groupby, resample and others applicable to be put here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, probaly we can put these modules as pandas does, but i think it should be done in the separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's do this as part of a separate issue in order to not bloat this PR.
@@ -57,4 +57,5 @@ Public API | |||
'''''''''' | |||
|
|||
.. automodule:: modin.core.execution.dispatching.factories.factories | |||
:noindex: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was temporary workaround for some strange bug. When autosummary
extension was added in the conf.py
, ExperimentalPandasOnRayFactory
started to list two io_cls
attributes (link), that leads to docs build fail. Have no idea why this happens, further investigation is needed(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue will be fixed by #4510.
@@ -8,4 +8,3 @@ Public API | |||
|
|||
.. autoclass:: modin.pandas.base.BasePandasDataset | |||
:noindex: | |||
:members: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done to avoid excessive duplication since all BasePandasDataset
methods already present in the DataFrame
and Series
sections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. We should probably add a note in the docstring of BasePandasDataset
about the user shouldn't directly interact with this object. Also, we should correct the methods' docstrings of this object. For example, we say Return a
BasePandasDataset with absolute numeric value of each element.
for abs
whereas the return object is either a DataFrame
or Series
. This looks like a separate issue. Could you create one for this please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sure, note was added and issue #4512 created.
@alexander3774, please note that build docs job failed. |
6a26586
to
2cc1c1a
Compare
docs/flow/modin/pandas/io.rst
Outdated
|
||
Modin's I/O functions API is backed by a distributed object providing an identical | ||
API to pandas. After the user calls some general function, this call is internally | ||
rewritten into a representation that can be processed in parallel by the partitions. These |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last sentence probably requires a change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sentence was rewritten.
@@ -0,0 +1,31 @@ | |||
{% extends "!autosummary/class.rst" %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a template for general functions, IO, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can leave default templates for functions since we list functions manually (we need to rewrite class template to insert :toctree:
directive to present all class methods in the side bar, not only class).
of an operation that is listed as not implemented, feel free to open an issue on the | ||
`GitHub repository`_, or give a thumbs up to already created issues. Contributions are | ||
also welcome! | ||
The ``pd.DataFrame`` supported APIs table lists both implemented and not implemented methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now we do not list both implemented and not implemented methods. We only say about a degree of support for a parameter/method. Thus, this paragraph should be rephrased considering that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we can list fully implemeted method in the table (in this case we can leave execution cell empty). Not sure that understood you correctly, what do you mean?
the method in the left column. ``Y`` stands for yes, ``N`` stands for no, ``P`` stands | ||
for partial (meaning some parameters may not be supported yet), and ``D`` stands for | ||
default to pandas. | ||
Supported APIs table is structured as follows: The first column contains the method name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tables are structured in bit other way (link). Am I correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a table for all methods and a table for a particular method. We should reflect this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this part looks outdated. Fixed.
+-------------+-----------------------------------------------------------------------------------------------+ | ||
| Flag | Meaning | | ||
+=============+===============================================================================================+ | ||
| Supported | Parameter is supported, it's usage brings performance improvement | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this flag applicable to methods, isn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to use this flag in the Supported APIs table in the cases when method is fully supported, but it will deviate from suported parameters tables rule in the docstrings (where we leave supported parameter cell empty), so this flag was removed.
2788c8a
to
1b81754
Compare
1b81754
to
55b143a
Compare
Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4188: refactoring, call script during docs build Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4188: remove outdated table, change new table Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4188: handle notes Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4186: fix CI Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4185: add table for IO functions Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4185: add table for utility functions Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4185: remove class selectivity for get_methods Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4185: migrate existing io table Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4185: migrate existing utilities table Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4185: add docstrings for supported_apis functions Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4185: add test for the script Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4185: remove pandas dependency Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4185: use extended flags Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4185: modify table parameters Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4188: Apply suggestions from code review Co-authored-by: Yaroslav Igoshev <Poolliver868@mail.ru> Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4188: Update docs/supported_apis/dataframe_supported.rst Co-authored-by: Yaroslav Igoshev <Poolliver868@mail.ru> DOCS-modin-project#4188: fix CI Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4188: address review comment Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4188: add autosummaries Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4188: correct API section Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4188: add links in the supported apis tables Signed-off-by: Alexander Myskov <alexander.myskov@intel.com> DOCS-modin-project#4188: propose new table format Signed-off-by: alexander3774 <myskova977@gmail.com> DOCS-modin-project#4188: correct df/series base merge Signed-off-by: alexander3774 <myskova977@gmail.com> DOCS-modin-project#4188: refactor supported_apis.py Signed-off-by: alexander3774 <myskova977@gmail.com> DOCS-modin-project#4188: fix CI Signed-off-by: alexander3774 <myskova977@gmail.com> DOCS-modin-project#4188: drop module autosummary template Signed-off-by: alexander3774 <myskova977@gmail.com> FIX-modin-project#4330: fix warnings Signed-off-by: alexander3774 <myskova977@gmail.com> FIX-modin-project#4330: drop dummy tables Signed-off-by: alexander3774 <myskova977@gmail.com> FIX-modin-project#4330: add Supported flag Signed-off-by: alexander3774 <myskova977@gmail.com> FIX-modin-project#4330: use acronyms Signed-off-by: alexander3774 <myskova977@gmail.com> FIX-modin-project#4330: update rst tables Signed-off-by: alexander3774 <myskova977@gmail.com> DOCS-modin-project#4188: use pure pandas flag Signed-off-by: alexander3774 <myskova977@gmail.com> DOCS-modin-project#4188: Update docs/flow/modin/pandas/general.rst Co-authored-by: Yaroslav Igoshev <Poolliver868@mail.ru> DOCS-modin-project#4188: drop unrelated code Signed-off-by: alexander3774 <myskova977@gmail.com>
Signed-off-by: alexander3774 <myskova977@gmail.com>
Signed-off-by: alexander3774 <myskova977@gmail.com>
Co-authored-by: Yaroslav Igoshev <Poolliver868@mail.ru>
Signed-off-by: alexander3774 <myskova977@gmail.com>
55b143a
to
d868393
Compare
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst
is up-to-datedocs build: https://modin--4286.org.readthedocs.build/en/4286/supported_apis/dataframe_supported.html