Commit

Revert "docs: Improve structure of user guide" (pola-rs#13945)
c-peters authored Jan 24, 2024
1 parent d6c6cef commit 25537c2
Showing 20 changed files with 360 additions and 354 deletions.
2 changes: 1 addition & 1 deletion docs/_build/overrides/404.html
@@ -217,6 +217,6 @@ <h2>404 - You're lost.</h2>
How you got here is a mystery. But you can click the button below
to go back to the homepage or use the search bar in the navigation menu to find what you are looking for.
</p>
<a class="md-button" href="/user-guide/overview/">Home</a>
<a class="md-button" href="/polars">Home</a>
</div>
{% endblock %}
2 changes: 1 addition & 1 deletion docs/api/index.md
@@ -11,7 +11,7 @@ It's the best place to look if you need information on a specific function.
## Python

The Python API reference is built using Sphinx.
It's available in [our docs](https://docs.pola.rs/py-polars/html/reference/index.html).
It's available on [GitHub Pages](https://docs.pola.rs/py-polars/html/reference/index.html).

## Rust

58 changes: 58 additions & 0 deletions docs/index.md
@@ -0,0 +1,58 @@
---
hide:
- navigation
---

# Polars

![logo](https://raw.githubusercontent.com/pola-rs/polars-static/master/logos/polars_github_logo_rect_dark_name.svg)

<h1 style="text-align:center">Blazingly Fast DataFrame Library </h1>
<div align="center">
<a href="https://docs.rs/polars/latest/polars/">
<img src="https://docs.rs/polars/badge.svg" alt="rust docs"/>
</a>
<a href="https://crates.io/crates/polars">
<img src="https://img.shields.io/crates/v/polars.svg"/>
</a>
<a href="https://pypi.org/project/polars/">
<img src="https://img.shields.io/pypi/v/polars.svg" alt="PyPI Latest Release"/>
</a>
<a href="https://doi.org/10.5281/zenodo.7697217">
<img src="https://zenodo.org/badge/DOI/10.5281/zenodo.7697217.svg" alt="DOI Latest Release"/>
</a>
</div>

Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:

- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies.
- **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
- **Out of Core**: Polars supports out-of-core data transformation with its streaming API, allowing you to process your results without requiring all your data to be in memory at the same time.
- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage.

## Performance :rocket: :rocket:

Polars is very fast, and in fact is one of the best performing solutions available.
See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project.

Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website.

## Example

{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}

## Community

Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:

--8<-- "docs/people.md"

## Contributing

We appreciate all contributions, from reporting bugs to implementing new features. Read our [contributing guide](development/contributing/index.md) to learn more.

## License

This project is licensed under the terms of the [MIT license](https://github.com/pola-rs/polars/blob/main/LICENSE).
23 changes: 13 additions & 10 deletions docs/src/python/user-guide/basics/expressions.py
@@ -6,16 +6,19 @@

df = pl.DataFrame(
{
"a": range(5),
"b": np.random.rand(5),
"a": range(8),
"b": np.random.rand(8),
"c": [
datetime(2025, 12, 1),
datetime(2025, 12, 2),
datetime(2025, 12, 3),
datetime(2025, 12, 4),
datetime(2025, 12, 5),
datetime(2022, 12, 1),
datetime(2022, 12, 2),
datetime(2022, 12, 3),
datetime(2022, 12, 4),
datetime(2022, 12, 5),
datetime(2022, 12, 6),
datetime(2022, 12, 7),
datetime(2022, 12, 8),
],
"d": [1, 2.0, float("nan"), -42, None],
"d": [1, 2.0, float("nan"), float("nan"), 0, -5, -42, None],
}
)
# --8<-- [end:setup]
@@ -33,12 +36,12 @@
# --8<-- [end:select3]

# --8<-- [start:exclude]
df.select(pl.exclude(["a", "c"]))
df.select(pl.exclude("a"))
# --8<-- [end:exclude]

# --8<-- [start:filter]
df.filter(
pl.col("c").is_between(datetime(2025, 12, 2), datetime(2025, 12, 3)),
pl.col("c").is_between(datetime(2022, 12, 2), datetime(2022, 12, 8)),
)
# --8<-- [end:filter]

7 changes: 3 additions & 4 deletions docs/src/python/user-guide/basics/reading-writing.py
@@ -6,12 +6,11 @@
{
"integer": [1, 2, 3],
"date": [
datetime(2025, 1, 1),
datetime(2025, 1, 2),
datetime(2025, 1, 3),
datetime(2022, 1, 1),
datetime(2022, 1, 2),
datetime(2022, 1, 3),
],
"float": [4.0, 5.0, 6.0],
"string": ["a", "b", "c"],
}
)

25 changes: 14 additions & 11 deletions docs/src/rust/user-guide/basics/expressions.rs
@@ -6,16 +6,19 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut rng = rand::thread_rng();

let df: DataFrame = df!(
"a" => 0..5,
"b"=> (0..5).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
"a" => 0..8,
"b"=> (0..8).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
"c"=> [
NaiveDate::from_ymd_opt(2025, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 6).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 7).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 8).unwrap().and_hms_opt(0, 0, 0).unwrap(),
],
"d"=> [Some(1.0), Some(2.0), None, Some(-42.), None]
"d"=> [Some(1.0), Some(2.0), None, None, Some(0.0), Some(-5.0), Some(-42.), None]
)
.unwrap();

@@ -43,17 +46,17 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let out = df
.clone()
.lazy()
.select([col("*").exclude(["a", "c"])])
.select([col("*").exclude(["a"])])
.collect()?;
println!("{}", out);
// --8<-- [end:exclude]

// --8<-- [start:filter]
let start_date = NaiveDate::from_ymd_opt(2025, 12, 2)
let start_date = NaiveDate::from_ymd_opt(2022, 12, 2)
.unwrap()
.and_hms_opt(0, 0, 0)
.unwrap();
let end_date = NaiveDate::from_ymd_opt(2025, 12, 3)
let end_date = NaiveDate::from_ymd_opt(2022, 12, 8)
.unwrap()
.and_hms_opt(0, 0, 0)
.unwrap();
6 changes: 3 additions & 3 deletions docs/src/rust/user-guide/basics/reading-writing.rs
@@ -9,9 +9,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut df: DataFrame = df!(
"integer" => &[1, 2, 3],
"date" => &[
NaiveDate::from_ymd_opt(2025, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
],
"float" => &[4.0, 5.0, 6.0]
)
130 changes: 130 additions & 0 deletions docs/user-guide/basics/expressions.md
@@ -0,0 +1,130 @@
# Expressions

`Expressions` are the core strength of Polars. They offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we cover the basic components that serve as building blocks (or, in Polars terminology, contexts) for all your queries:

- `select`
- `filter`
- `with_columns`
- `group_by`

To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](../concepts/contexts.md) and [Expressions](../concepts/expressions.md).

### Select statement

To select a column we need to do two things: first, define the `DataFrame` we want the data from; second, select the data that we need. In the example below we select `col('*')`, where the asterisk stands for all columns.

{{code_block('user-guide/basics/expressions','select',['select'])}}

```python exec="on" result="text" session="getting-started/expressions"
--8<-- "python/user-guide/basics/expressions.py:setup"
print(
--8<-- "python/user-guide/basics/expressions.py:select"
)
```

You can also name the specific columns that you want to return. There are two ways to do this. The first option is to pass the column names, as seen below.

{{code_block('user-guide/basics/expressions','select2',['select'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
--8<-- "python/user-guide/basics/expressions.py:select2"
)
```

The second option is to specify each column using `pl.col`. This option is shown below.

{{code_block('user-guide/basics/expressions','select3',['select'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
--8<-- "python/user-guide/basics/expressions.py:select3"
)
```

If you want to exclude an entire column from your view, you can simply use `exclude` in your `select` statement.

{{code_block('user-guide/basics/expressions','exclude',['select'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
--8<-- "python/user-guide/basics/expressions.py:exclude"
)
```

### Filter

The `filter` option allows us to create a subset of the `DataFrame`. We use the same `DataFrame` as earlier and we filter between two specified dates.

{{code_block('user-guide/basics/expressions','filter',['filter'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
--8<-- "python/user-guide/basics/expressions.py:filter"
)
```

With `filter` you can also create more complex filters that include multiple columns.

{{code_block('user-guide/basics/expressions','filter2',['filter'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
--8<-- "python/user-guide/basics/expressions.py:filter2"
)
```

### With_columns

`with_columns` allows you to create new columns for your analyses. We create two new columns, `e` and `b+42`. First we sum all values from column `b` and store the result in column `e`. After that we add `42` to the values of `b` and store these results in a new column `b+42`.

{{code_block('user-guide/basics/expressions','with_columns',['with_columns'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
--8<-- "python/user-guide/basics/expressions.py:with_columns"
)
```

### Group by

We will create a new `DataFrame` for the Group by functionality. This new `DataFrame` will include several 'groups' that we want to group by.

{{code_block('user-guide/basics/expressions','dataframe2',['DataFrame'])}}

```python exec="on" result="text" session="getting-started/expressions"
--8<-- "python/user-guide/basics/expressions.py:dataframe2"
print(df2)
```

{{code_block('user-guide/basics/expressions','group_by',['group_by'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
--8<-- "python/user-guide/basics/expressions.py:group_by"
)
```

{{code_block('user-guide/basics/expressions','group_by2',['group_by'])}}

```python exec="on" result="text" session="getting-started/expressions"
print(
--8<-- "python/user-guide/basics/expressions.py:group_by2"
)
```

### Combining operations

Below are some examples of how to combine operations to create the `DataFrame` you require.

{{code_block('user-guide/basics/expressions','combine',['select','with_columns'])}}

```python exec="on" result="text" session="getting-started/expressions"
--8<-- "python/user-guide/basics/expressions.py:combine"
```

{{code_block('user-guide/basics/expressions','combine2',['select','with_columns'])}}

```python exec="on" result="text" session="getting-started/expressions"
--8<-- "python/user-guide/basics/expressions.py:combine2"
```
18 changes: 18 additions & 0 deletions docs/user-guide/basics/index.md
@@ -0,0 +1,18 @@
# Introduction

This chapter is intended for new Polars users.
The goal is to provide a quick overview of the most common functionality.
Feel free to skip ahead to the [next chapter](../concepts/data-types/overview.md) to dive into the details.

!!! rust "Rust Users Only"

For historical reasons, the eager API in Rust is outdated. In the future, we would like to redesign it as a small wrapper around the lazy API (as is the design in Python / NodeJS). In the examples, we will use the lazy API instead with `.lazy()` and `.collect()`. For now you can ignore these two functions. If you want to know more about the lazy and eager APIs, go [here](../concepts/lazy-vs-eager.md).

To enable the lazy API, ensure the feature flag `lazy` is configured when installing Polars:
```
# Cargo.toml
[dependencies]
polars = { version = "x", features = ["lazy", ...]}
```

Because of Rust's ownership rules, we cannot reuse the same `DataFrame` multiple times in the examples. For simplicity, we call `clone()` to work around this. Note that this does not duplicate the data; it only increments a reference count (`Arc`).
26 changes: 26 additions & 0 deletions docs/user-guide/basics/joins.md
@@ -0,0 +1,26 @@
# Combining DataFrames

There are two ways `DataFrame`s can be combined depending on the use case: join and concat.

## Join

Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look at how to `join` two `DataFrames` into a single `DataFrame`. Our two `DataFrames` both have an 'id'-like column: `a` and `x`. We can use those columns to `join` the `DataFrames` in this example.

{{code_block('user-guide/basics/joins','join',['join'])}}

```python exec="on" result="text" session="getting-started/joins"
--8<-- "python/user-guide/basics/joins.py:setup"
--8<-- "python/user-guide/basics/joins.py:join"
```

To see more examples with other types of joins, go to the [User Guide](../transformations/joins.md).

## Concat

We can also `concatenate` two `DataFrames`. Vertical concatenation will make the `DataFrame` longer. Horizontal concatenation will make the `DataFrame` wider. Below you can see the result of a horizontal concatenation of our two `DataFrames`.

{{code_block('user-guide/basics/joins','hstack',['hstack'])}}

```python exec="on" result="text" session="getting-started/joins"
--8<-- "python/user-guide/basics/joins.py:hstack"
```