DOCS-#3904: Improving Modin README #3929

Merged: 24 commits, merged on Jan 25, 2022
fix broken links in documentation
Signed-off-by: Naren Krishna <naren@ponder.io>
naren-ponder committed Jan 24, 2022
commit 7cbbc015dad149c8bb6a096bc70db22749dcd671
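The relative targets replaced in this commit are broken because of how Sphinx resolves the `:doc:` role: a target that starts with `/` is taken relative to the top-level source directory, while any other target is resolved against the directory of the document that contains it. Written inside `getting_started/faq.rst`, `getting_started/why_modin/pandas` therefore points at `getting_started/getting_started/why_modin/pandas`, and `../development/architecture` inside `getting_started/why_modin/pandas.rst` points at `getting_started/development/architecture`. The `/developer/` targets are absolute but name a directory that, judging by the corrected links, is actually called `development`. The Python sketch below approximates that resolution rule and flags targets that do not match a known set of document names. `resolve_doc_target`, `find_broken_doc_links`, and the hand-written `existing_docs` set are hypothetical illustrations, not Sphinx APIs, and the regular expression only covers the two `:doc:` forms that appear in this diff.

```python
import posixpath
import re

# Hypothetical pattern: matches :doc:`target` and :doc:`title <target>`,
# the two forms used in this diff. Other role syntaxes are not handled.
DOC_ROLE = re.compile(r":doc:`(?:[^`<]*<)?([^`>]+)>?`")


def resolve_doc_target(current_doc: str, target: str) -> str:
    """Approximate Sphinx's :doc: resolution (hypothetical helper, not a Sphinx API)."""
    if target.startswith("/"):
        # Absolute target: relative to the top-level source directory.
        return posixpath.normpath(target.lstrip("/"))
    # Relative target: resolved against the directory of the current document.
    base_dir = posixpath.dirname(current_doc)
    return posixpath.normpath(posixpath.join(base_dir, target))


def find_broken_doc_links(
    current_doc: str, rst_text: str, existing_docs: set[str]
) -> list[tuple[str, str]]:
    """Return (target, resolved) pairs whose resolved name is not a known document."""
    broken = []
    for target in DOC_ROLE.findall(rst_text):
        resolved = resolve_doc_target(current_doc, target.strip())
        if resolved not in existing_docs:
            broken.append((target, resolved))
    return broken


if __name__ == "__main__":
    # Hand-written set of document names, assumed for illustration only.
    existing_docs = {
        "getting_started/faq",
        "getting_started/why_modin/pandas",
        "development/architecture",
        "development/using_omnisci",
    }
    before = ("See this :doc:`page <getting_started/why_modin/pandas>` and "
              ":doc:`OmniSci</developer/using_omnisci>`.")
    after = ("See this :doc:`page </getting_started/why_modin/pandas>` and "
             ":doc:`OmniSci</development/using_omnisci>`.")
    print(find_broken_doc_links("getting_started/faq", before, existing_docs))  # both targets flagged
    print(find_broken_doc_links("getting_started/faq", after, existing_docs))   # []
```

Sphinx itself normally emits an "unknown document" warning for such targets during a build, so a script like this would only serve as a quick pre-commit check.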
10 changes: 5 additions & 5 deletions docs/getting_started/faq.rst
@@ -21,7 +21,7 @@ The :py:class:`~modin.pandas.dataframe.DataFrame` is a highly
scalable, parallel DataFrame. Modin transparently distributes the data and computation so
that you can continue using the same pandas API while being able to work with more data faster.
Modin lets you use all the CPU cores on your machine, and because it is lightweight, it
-often has less memory overhead than pandas. See this :doc:`page <getting_started/why_modin/pandas>` to
+often has less memory overhead than pandas. See this :doc:`page </getting_started/why_modin/pandas>` to
learn more about how Modin is different from pandas.

Why not just improve pandas?
@@ -54,14 +54,14 @@ with dataframes that don't fit into the available memory. As a result, pandas wo
for prototyping on a few MBs of data do not scale to tens or hundreds of GBs (depending on the size
of your machine). Modin supports operating on data that does not fit in memory, so that you can comfortably
work with hundreds of GBs without worrying about substantial slowdown or memory errors. For more information,
-see :doc:`out-of-memory support <getting_started/why_modin/out_of_core>` for Modin.
+see :doc:`out-of-memory support </getting_started/why_modin/out_of_core>` for Modin.

How does Modin compare to Dask DataFrame and Koalas?
""""""""""""""""""""""""""""""""""""""""""""""""""""

TLDR: Modin has better coverage of the pandas API, has a flexible backend, better ordering semantics,
and supports both row and column-parallel operations.
-Check out this :doc:`page <getting_started/why_modin/modin_vs_dask_vs_koalas>` detailing the differences!
+Check out this :doc:`page </getting_started/why_modin/modin_vs_dask_vs_koalas>` detailing the differences!

How does Modin work under the hood?
"""""""""""""""""""""""""""""""""""
@@ -75,7 +75,7 @@ The Modin Core DataFrame is our efficient DataFrame implementation that utilizes
which allows for distributing tasks and queries. From here, the Modin DataFrame works with engines like
Ray or Dask to execute computation, and then return the results to the user.

-For more details, take a look at our system :doc:`architecture <development/architecture>`.
+For more details, take a look at our system :doc:`architecture </development/architecture>`.

FAQs: How to use Modin?
-----------------------
@@ -172,7 +172,7 @@ How can I contribute to Modin?

**Modin is currently under active development. Requests and contributions are welcome!**

-If you are interested in contributing please check out the :doc:`Contributing Guide<development/contributing>`
+If you are interested in contributing please check out the :doc:`Contributing Guide</development/contributing>`
and then refer to the :doc:`Development Documentation</development/index>`,
where you can find system architecture, internal implementation details, and other useful information.
Also check out the `Github`_ to view open issues and make contributions.
6 changes: 3 additions & 3 deletions docs/getting_started/installation.rst
@@ -24,7 +24,7 @@ To install the most recent stable release run the following:

pip install -U modin # -U for upgrade in case you have an older version

-Modin can be used with :doc:`Ray</developer/using_pandas_on_ray>`, :doc:`Dask</developer/using_pandas_on_dask>`, or :doc:`OmniSci</developer/using_omnisci>` engines. If you don't have Ray_ or Dask_ installed, you will need to install Modin with one of the targets:
+Modin can be used with :doc:`Ray</development/using_pandas_on_ray>`, :doc:`Dask</development/using_pandas_on_dask>`, or :doc:`OmniSci</development/using_omnisci>` engines. If you don't have Ray_ or Dask_ installed, you will need to install Modin with one of the targets:

.. code-block:: bash

@@ -147,8 +147,8 @@ that these changes have not made it into a release and may not be completely sta
Windows
-------

-All Modin engines except :doc:`OmniSci</developer/using_omnisci>` are available both on Windows and Linux as mentioned above.
-Default engine on Windows is :doc:`Ray</developer/using_pandas_on_ray>`.
+All Modin engines except :doc:`OmniSci</development/using_omnisci>` are available both on Windows and Linux as mentioned above.
+Default engine on Windows is :doc:`Ray</development/using_pandas_on_ray>`.
It is also possible to use Windows Subsystem For Linux (WSL_), but this is generally
not recommended due to the limitations and poor performance of Ray on WSL, a roughly
2-3x worse than native Windows.
2 changes: 1 addition & 1 deletion docs/getting_started/troubleshooting.rst
@@ -20,7 +20,7 @@ Please note, that while Modin covers a large portion of the pandas API, not all

UserWarning: `DataFrame.asfreq` defaulting to pandas implementation.

-To understand which functions will lead to this warning, we have compiled a list of :doc:`currently supported methods </supported_apis/index>`. When you see this warning, Modin defaults to pandas by converting the Modin dataframe to pandas to perform the operation. Once the operation is complete in pandas, it is converted back to a Modin dataframe. These operations will have a high overhead due to the communication involved and will take longer than pandas. When this is happening, a warning will be given to the user to inform them that this operation will take longer than usual. You can learn more about this :doc:`here <supported_apis/defaulting_to_pandas>`.
+To understand which functions will lead to this warning, we have compiled a list of :doc:`currently supported methods </supported_apis/index>`. When you see this warning, Modin defaults to pandas by converting the Modin dataframe to pandas to perform the operation. Once the operation is complete in pandas, it is converted back to a Modin dataframe. These operations will have a high overhead due to the communication involved and will take longer than pandas. When this is happening, a warning will be given to the user to inform them that this operation will take longer than usual. You can learn more about this :doc:`here </supported_apis/defaulting_to_pandas>`.

If you would like to request a particular method be implemented, feel free to `open an
issue`_. Before you open an issue please make sure that someone else has not already
4 changes: 2 additions & 2 deletions docs/getting_started/using_modin/using_modin_locally.rst
@@ -67,7 +67,7 @@ cluster for you:

Finally, if you already have an Ray or Dask engine initialized, Modin will
automatically attach to whichever engine is available. If you are interested in using
-Modin with OmniSci engine, please refer to :doc:`these instructions </developer/using_omnisci>`. For additional information on other settings you can configure, see
+Modin with OmniSci engine, please refer to :doc:`these instructions </development/using_omnisci>`. For additional information on other settings you can configure, see
:doc:`this page </flow/modin/config>` for more details.

Advanced: Configuring the resources Modin uses
@@ -116,4 +116,4 @@ specify more processors than you have available on your machine; however this wi
improve the performance (and might end up hurting the performance of the system).

.. note::
-Make sure to update the ``MODIN_CPUS`` configuration and initialize your preferred engine before you start working with the first operation using Modin! Otherwise, Modin will opt for the default setting.
+Make sure to update the ``MODIN_CPUS`` configuration and initialize your preferred engine before you start working with the first operation using Modin! Otherwise, Modin will opt for the default setting.
2 changes: 1 addition & 1 deletion docs/getting_started/why_modin/pandas.rst
@@ -66,6 +66,6 @@ smaller code footprint while still guaranteeing that it covers the entire pandas
Modin has an internal algebra, which is roughly 15 operators, narrowed down from the
original >200 that exist in pandas. The algebra is grounded in both practical and
theoretical work. Learn more in our `VLDB 2020 paper`_. More information about this
-algebra can be found in the :doc:`../development/architecture` documentation.
+algebra can be found in the :doc:`architecture </development/architecture>` documentation.

.. _VLDB 2020 paper: https://arxiv.org/abs/2001.00888