DOCS-#3904: Improving Modin README #3929

Merged: 24 commits, Jan 25, 2022
Apply suggestions from code review 6
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
naren-ponder and dorisjlee committed Jan 24, 2022
commit 85fac028a0ea1ab4254deca76774877fdb93d244
7 changes: 4 additions & 3 deletions README.md
@@ -127,7 +127,8 @@ For the complete documentation on Modin, visit our [ReadTheDocs](https://modin.r

_Note: In local mode (without a cluster), Modin will create and manage a local (Dask or Ray) cluster for the execution._

-To use Modin, you do not need to know how many cores your system has and you do not need
+To use Modin, you do not need to specify how to distribute the data, or even know how many
+cores your system has. In fact, you can continue using your previous
to specify how to distribute the data. In fact, you can continue using your previous
pandas notebooks while experiencing a considerable speedup from Modin, even on a single
machine. Once you've changed your import statement, you're ready to use Modin just like
@@ -156,7 +157,7 @@ df = pd.read_csv("my_dataset.csv")

#### Modin is a DataFrame designed for datasets from 1MB to 1TB+

-Often data scientists have to use different tools
+Often data scientists have to switch between different tools
for operating on datasets of different sizes. Processing large dataframes with pandas
is slow, and pandas does not support working with dataframes that are too large to fit
into the available memory. As a result, pandas workflows that work well
@@ -174,7 +175,7 @@ scalability in a cluster.
We designed Modin to be modular so we can plug in different components as they develop
and improve:

-![Architecture](docs/img/modin_architecture.png)
+<img src="docs/img/modin_architecture.png" alt="Modin's architecture" width="75%"></img>

Visit the [Documentation](https://modin.readthedocs.io/en/latest/development/architecture.html) for
more information!