Suggested improvements for the Polars user guide book #2243

@MrPowers

The Polars user guide book is awesome. Great job making Polars approachable to newbies like me!

I reproduced a lot of the examples in the polars-fun repo so people have easy access to Jupyter notebooks where they can run these computations too.

Here are some of the issues / discussion points on the user guide:

Getting Started

This code snippet has a small typo (scan_scv should be scan_csv). Even with the typo fixed, the snippet doesn't work on my computer.

df = (pl.scan_scv("https://j.mp/iriscsv")
      .filter(pl.col("sepal_length") > 5)
      .groupby("species")
      .agg(pl.all().sum())
      .collect()
)

The error is "RuntimeError: Any(Io(Os { code: 2, kind: NotFound, message: "No such file or directory" }))" which I don't understand.

Expressions

Here's a code snippet in the guide:

df = df[
    [
        pl.col("names").n_unique().alias("unique_names_1"),
        pl.col("names").unique().count().alias("unique_names_2"),
    ]
]
print(df)

My personal preference is to avoid bracket notation and to avoid reassigning df. Perhaps we could use this snippet instead:

res_df = df.select(
    [
        pl.col("names").n_unique().alias("unique_names_1"),
        pl.col("names").unique().count().alias("unique_names_2"),
    ]
)
print(res_df)

Feel free to push back on this one.

Indexing

I don't really understand this section. I'd prefer to rewrite it with queries that run on an example DataFrame, similar to the Polars expressions section.

The Polars design decision to not have an index seems like one of the main differentiators from Pandas. I'd like to expand upon "They are not needed. Not having them makes things easier. Convince me otherwise" because it seems so important.

Here are some specific questions I'd find interesting:

  • Setting a Pandas index makes certain queries faster. Can Polars perform just as fast as Pandas, even when no index is set?
  • Pandas indexes make things more complicated. How can we demonstrate this with code snippets?

Other points to hit

I have other points to hit, but will save them for another PR. Let me know if you find this helpful. I'm really excited about this project and want to help make it as accessible and compelling as possible for new users to try!
