+ Documentation: Python - Rust - Node.js - R | StackOverflow: Python - Rust - Node.js - R | User guide | Discord
-Polars is in rapid development, but it already supports most features needed for a useful DataFrame library. Do you
-miss something, please make an issue and/or sent a PR.
+## Polars: Blazingly fast DataFrames in Rust, Python, Node.js, R, and SQL
-## First run
-Take a look at the [10 minutes to Polars notebook](examples/10_minutes_to_polars.ipynb) to get you started.
-Want to run the notebook yourself? Clone the repo and run `$ cargo c && docker-compose up`. This will spin up a jupyter
-notebook on `http://localhost:8891`. The notebooks are in the `/examples` directory.
-
-Oh yeah.. and get a cup of coffee because compilation will take while during the first run.
+Polars is a DataFrame interface on top of an OLAP Query Engine implemented in Rust using
+[Apache Arrow Columnar Format](https://arrow.apache.org/docs/format/Columnar.html) as the memory
+model.
+- Lazy | eager execution
+- Multi-threaded
+- SIMD
+- Query optimization
+- Powerful expression API
+- Hybrid Streaming (larger-than-RAM datasets)
+- Rust | Python | NodeJS | R | ...
-## Documentation
-Want to know what features Polars support? [Check the current master docs](https://ritchie46.github.io/polars).
+To learn more, read the [user guide](https://docs.pola.rs/).
-Most features are described on the [DataFrame](https://ritchie46.github.io/polars/polars/frame/struct.DataFrame.html),
-[Series](https://ritchie46.github.io/polars/polars/series/enum.Series.html), and [ChunkedArray](https://ritchie46.github.io/polars/polars/chunked_array/struct.ChunkedArray.html)
-structs in that order. For `ChunkedArray` a lot of functionality is also defined by `Traits` in the
-[ops module](https://ritchie46.github.io/polars/polars/chunked_array/ops/index.html).
+## Python
-## Performance
-Polars is written to be performant. Below are some comparisons with the (also very fast) Pandas DataFrame library.
+```python
+>>> import polars as pl
+>>> df = pl.DataFrame(
+...     {
+...         "A": [1, 2, 3, 4, 5],
+...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
+...         "B": [5, 4, 3, 2, 1],
+...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
+...     }
+... )
-#### GroupBy
-
+# embarrassingly parallel execution & very expressive query language
+>>> df.sort("fruits").select(
+...     "fruits",
+...     "cars",
+...     pl.lit("fruits").alias("literal_string_fruits"),
+...     pl.col("B").filter(pl.col("cars") == "beetle").sum(),
+...     pl.col("A").filter(pl.col("B") > 2).sum().over("cars").alias("sum_A_by_cars"),
+...     pl.col("A").sum().over("fruits").alias("sum_A_by_fruits"),
+...     pl.col("A").reverse().over("fruits").alias("rev_A_by_fruits"),
+...     pl.col("A").sort_by("B").over("fruits").alias("sort_A_by_B_by_fruits"),
+... )
+shape: (5, 8)
+┌──────────┬──────────┬──────────────┬─────┬─────────────┬─────────────┬─────────────┬─────────────┐
+│ fruits   ┆ cars     ┆ literal_stri ┆ B   ┆ sum_A_by_ca ┆ sum_A_by_fr ┆ rev_A_by_fr ┆ sort_A_by_B │
+│ ---      ┆ ---      ┆ ng_fruits    ┆ --- ┆ rs          ┆ uits        ┆ uits        ┆ _by_fruits  │
+│ str      ┆ str      ┆ ---          ┆ i64 ┆ ---         ┆ ---         ┆ ---         ┆ ---         │
+│          ┆          ┆ str          ┆     ┆ i64         ┆ i64         ┆ i64         ┆ i64         │
+╞══════════╪══════════╪══════════════╪═════╪═════════════╪═════════════╪═════════════╪═════════════╡
+│ "apple"  ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 7           ┆ 4           ┆ 4           │
+│ "apple"  ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 7           ┆ 3           ┆ 3           │
+│ "banana" ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 8           ┆ 5           ┆ 5           │
+│ "banana" ┆ "audi"   ┆ "fruits"     ┆ 11  ┆ 2           ┆ 8           ┆ 2           ┆ 2           │
+│ "banana" ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 8           ┆ 1           ┆ 1           │
+└──────────┴──────────┴──────────────┴─────┴─────────────┴─────────────┴─────────────┴─────────────┘
+```
-#### Joins
-
+## SQL
-## Functionality
+```python
+>>> df = pl.scan_csv("docs/assets/data/iris.csv")
+>>> ## OPTION 1
+>>> # run SQL queries on frame-level
+>>> df.sql("""
+...     SELECT species,
+...         AVG(sepal_length) AS avg_sepal_length
+...     FROM self
+...     GROUP BY species
+...     """).collect()
+shape: (3, 2)
+┌────────────┬──────────────────┐
+│ species    ┆ avg_sepal_length │
+│ ---        ┆ ---              │
+│ str        ┆ f64              │
+╞════════════╪══════════════════╡
+│ Virginica  ┆ 6.588            │
+│ Versicolor ┆ 5.936            │
+│ Setosa     ┆ 5.006            │
+└────────────┴──────────────────┘
+>>> ## OPTION 2
+>>> # use pl.sql() to operate on the global context
+>>> df2 = pl.LazyFrame({
+...     "species": ["Setosa", "Versicolor", "Virginica"],
+...     "blooming_season": ["Spring", "Summer", "Fall"]
+...})
+>>> pl.sql("""
+... SELECT df.species,
+...     AVG(df.sepal_length) AS avg_sepal_length,
+...     df2.blooming_season
+... FROM df
+... LEFT JOIN df2 ON df.species = df2.species
+... GROUP BY df.species, df2.blooming_season
+... """).collect()
+```
-### Read and write CSV | JSON | IPC | Parquet
+SQL commands can also be run directly from your terminal using the Polars CLI:
-```rust
- use polars::prelude::*;
- use std::fs::File;
-
- fn example() -> Result