The Polars user guide book is awesome. Great job making Polars approachable to newbies like me!
I reproduced a lot of the examples in the polars-fun repo so people have easy access to Jupyter notebooks where they can run these computations too.
Here are some of the issues / discussion points on the user guide:
Getting Started
This code snippet has a small typo (change scan_scv to scan_csv). The snippet still doesn't work on my computer once the typo is fixed.
df = (pl.scan_scv("https://j.mp/iriscsv")
    .filter(pl.col("sepal_length") > 5)
    .groupby("species")
    .agg(pl.all().sum())
    .collect()
)
The error is "RuntimeError: Any(Io(Os { code: 2, kind: NotFound, message: "No such file or directory" }))" which I don't understand.
Expressions
Here's a code snippet in the guide:
df = df[
    [
        pl.col("names").n_unique().alias("unique_names_1"),
        pl.col("names").unique().count().alias("unique_names_2"),
    ]
]
print(df)
My personal preferences are to avoid bracket notation and avoid reassigning df. Perhaps we could use this snippet:
res_df = df.select(
    [
        pl.col("names").n_unique().alias("unique_names_1"),
        pl.col("names").unique().count().alias("unique_names_2"),
    ]
)
print(res_df)
Feel free to push back on this one.
Indexing
I don't really understand this section. I'd prefer to rewrite this section with queries that are run on an example DataFrame, similar to the Polars expressions section.
The Polars design decision to not have an index seems like one of the main differentiators from Pandas. I'd like to expand upon "They are not needed. Not having them makes things easier. Convince me otherwise" because it seems so important.
Here are some specific questions I'd find interesting:
- Setting a Pandas index makes certain queries faster. Can Polars perform just as fast as Pandas, even when the index isn't set?
- Pandas indexes make things more complicated. How can we demonstrate this with code snippets?
Other points to hit
I have other points to hit, but will save them for another PR. Let me know if you find this helpful. I'm really excited about this project and want to help make it as accessible and compelling as possible for new users to try!