Revert "docs: Improve structure of user guide" (pola-rs#13945)

r-brink · Jan 24, 2024 · 25537c2
1 parent d6c6cef

Showing 20 changed files with 360 additions and 354 deletions.
diff --git a/docs/_build/overrides/404.html b/docs/_build/overrides/404.html
@@ -217,6 +217,6 @@ <h2>404 - You're lost.</h2>
       How you got here is a mystery. But you can click the button below
       to go back to the homepage or use the search bar in the navigation menu to find what you are looking for.
    </p>
-   <a class="md-button" href="/user-guide/overview/">Home</a>
+   <a class="md-button" href="/polars">Home</a>
 </div>
 {% endblock %}
diff --git a/docs/api/index.md b/docs/api/index.md
@@ -11,7 +11,7 @@ It's the best place to look if you need information on a specific function.
 ## Python
 
 The Python API reference is built using Sphinx.
-It's available in [our docs](https://docs.pola.rs/py-polars/html/reference/index.html).
+It's available on [GitHub Pages](https://docs.pola.rs/py-polars/html/reference/index.html).
 
 ## Rust
 

diff --git a/docs/index.md b/docs/index.md
@@ -0,0 +1,58 @@
+---
+hide:
+  - navigation
+---
+
+# Polars
+
+![logo](https://raw.githubusercontent.com/pola-rs/polars-static/master/logos/polars_github_logo_rect_dark_name.svg)
+
+<h1 style="text-align:center">Blazingly Fast DataFrame Library </h1>
+<div align="center">
+  <a href="https://docs.rs/polars/latest/polars/">
+    <img src="https://docs.rs/polars/badge.svg" alt="rust docs"/>
+  </a>
+  <a href="https://crates.io/crates/polars">
+    <img src="https://img.shields.io/crates/v/polars.svg"/>
+  </a>
+  <a href="https://pypi.org/project/polars/">
+    <img src="https://img.shields.io/pypi/v/polars.svg" alt="PyPI Latest Release"/>
+  </a>
+  <a href="https://doi.org/10.5281/zenodo.7697217">
+    <img src="https://zenodo.org/badge/DOI/10.5281/zenodo.7697217.svg" alt="DOI Latest Release"/>
+  </a>
+</div>
+
+Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:
+
+- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies.
+- **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
+- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
+- **Out of Core**: Polars supports out-of-core data transformation with its streaming API, allowing you to process your results without requiring all your data to be in memory at the same time.
+- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
+- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage.
+
+## Performance :rocket: :rocket:
+
+Polars is very fast, and in fact is one of the best performing solutions available.
+See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project.
+
+Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website.
+
+## Example
+
+{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}
+
+## Community
+
+Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:
+
+--8<-- "docs/people.md"
+
+## Contributing
+
+We appreciate all contributions, from reporting bugs to implementing new features. Read our [contributing guide](development/contributing/index.md) to learn more.
+
+## License
+
+This project is licensed under the terms of the [MIT license](https://github.com/pola-rs/polars/blob/main/LICENSE).
diff --git a/docs/src/python/user-guide/basics/expressions.py b/docs/src/python/user-guide/basics/expressions.py
@@ -6,16 +6,19 @@
 
 df = pl.DataFrame(
     {
-        "a": range(5),
-        "b": np.random.rand(5),
+        "a": range(8),
+        "b": np.random.rand(8),
         "c": [
-            datetime(2025, 12, 1),
-            datetime(2025, 12, 2),
-            datetime(2025, 12, 3),
-            datetime(2025, 12, 4),
-            datetime(2025, 12, 5),
+            datetime(2022, 12, 1),
+            datetime(2022, 12, 2),
+            datetime(2022, 12, 3),
+            datetime(2022, 12, 4),
+            datetime(2022, 12, 5),
+            datetime(2022, 12, 6),
+            datetime(2022, 12, 7),
+            datetime(2022, 12, 8),
         ],
-        "d": [1, 2.0, float("nan"), -42, None],
+        "d": [1, 2.0, float("nan"), float("nan"), 0, -5, -42, None],
     }
 )
 # --8<-- [end:setup]
@@ -33,12 +36,12 @@
 # --8<-- [end:select3]
 
 # --8<-- [start:exclude]
-df.select(pl.exclude(["a", "c"]))
+df.select(pl.exclude("a"))
 # --8<-- [end:exclude]
 
 # --8<-- [start:filter]
 df.filter(
-    pl.col("c").is_between(datetime(2025, 12, 2), datetime(2025, 12, 3)),
+    pl.col("c").is_between(datetime(2022, 12, 2), datetime(2022, 12, 8)),
 )
 # --8<-- [end:filter]
 

diff --git a/docs/src/python/user-guide/basics/reading-writing.py b/docs/src/python/user-guide/basics/reading-writing.py
@@ -6,12 +6,11 @@
     {
         "integer": [1, 2, 3],
         "date": [
-            datetime(2025, 1, 1),
-            datetime(2025, 1, 2),
-            datetime(2025, 1, 3),
+            datetime(2022, 1, 1),
+            datetime(2022, 1, 2),
+            datetime(2022, 1, 3),
         ],
         "float": [4.0, 5.0, 6.0],
-        "string": ["a", "b", "c"],
     }
 )
 

diff --git a/docs/src/rust/user-guide/basics/expressions.rs b/docs/src/rust/user-guide/basics/expressions.rs
@@ -6,16 +6,19 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     let mut rng = rand::thread_rng();
 
     let df: DataFrame = df!(
-        "a" => 0..5,
-        "b"=> (0..5).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
+        "a" => 0..8,
+        "b"=> (0..8).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
         "c"=> [
-            NaiveDate::from_ymd_opt(2025, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-            NaiveDate::from_ymd_opt(2025, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-            NaiveDate::from_ymd_opt(2025, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-            NaiveDate::from_ymd_opt(2025, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-            NaiveDate::from_ymd_opt(2025, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2022, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2022, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2022, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2022, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2022, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2022, 12, 6).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2022, 12, 7).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2022, 12, 8).unwrap().and_hms_opt(0, 0, 0).unwrap(),
         ],
-        "d"=> [Some(1.0), Some(2.0), None, Some(-42.), None]
+        "d"=> [Some(1.0), Some(2.0), None, None, Some(0.0), Some(-5.0), Some(-42.), None]
     )
     .unwrap();
 
@@ -43,17 +46,17 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     let out = df
         .clone()
         .lazy()
-        .select([col("*").exclude(["a", "c"])])
+        .select([col("*").exclude(["a"])])
         .collect()?;
     println!("{}", out);
     // --8<-- [end:exclude]
 
     // --8<-- [start:filter]
-    let start_date = NaiveDate::from_ymd_opt(2025, 12, 2)
+    let start_date = NaiveDate::from_ymd_opt(2022, 12, 2)
         .unwrap()
         .and_hms_opt(0, 0, 0)
         .unwrap();
-    let end_date = NaiveDate::from_ymd_opt(2025, 12, 3)
+    let end_date = NaiveDate::from_ymd_opt(2022, 12, 8)
         .unwrap()
         .and_hms_opt(0, 0, 0)
         .unwrap();

diff --git a/docs/src/rust/user-guide/basics/reading-writing.rs b/docs/src/rust/user-guide/basics/reading-writing.rs
@@ -9,9 +9,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     let mut df: DataFrame = df!(
         "integer" => &[1, 2, 3],
         "date" => &[
-                NaiveDate::from_ymd_opt(2025, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-                NaiveDate::from_ymd_opt(2025, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-                NaiveDate::from_ymd_opt(2025, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+                NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+                NaiveDate::from_ymd_opt(2022, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+                NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
         ],
         "float" => &[4.0, 5.0, 6.0]
     )

diff --git a/docs/user-guide/basics/expressions.md b/docs/user-guide/basics/expressions.md
@@ -0,0 +1,130 @@
+# Expressions
+
+`Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we will cover the basic components that serve as building blocks (or, in Polars terminology, contexts) for all your queries:
+
+- `select`
+- `filter`
+- `with_columns`
+- `group_by`
+
+To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](../concepts/contexts.md) and [Expressions](../concepts/expressions.md).
+
+### Select statement
+
+To select a column we need to do two things: first, define the `DataFrame` we want the data from; second, select the data that we need. In the example below we select `col('*')`. The asterisk stands for all columns.
+
+{{code_block('user-guide/basics/expressions','select',['select'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+--8<-- "python/user-guide/basics/expressions.py:setup"
+print(
+    --8<-- "python/user-guide/basics/expressions.py:select"
+)
+```
+
+You can also specify the specific columns that you want to return. There are two ways to do this. The first option is to pass the column names, as seen below.
+
+{{code_block('user-guide/basics/expressions','select2',['select'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+print(
+    --8<-- "python/user-guide/basics/expressions.py:select2"
+)
+```
+
+The second option is to specify each column using `pl.col`. This option is shown below.
+
+{{code_block('user-guide/basics/expressions','select3',['select'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+print(
+    --8<-- "python/user-guide/basics/expressions.py:select3"
+)
+```
+
+If you want to exclude an entire column from your view, you can simply use `exclude` in your `select` statement.
+
+{{code_block('user-guide/basics/expressions','exclude',['select'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+print(
+    --8<-- "python/user-guide/basics/expressions.py:exclude"
+)
+```
+
+### Filter
+
+The `filter` option allows us to create a subset of the `DataFrame`. We use the same `DataFrame` as earlier and we filter between two specified dates.
+
+{{code_block('user-guide/basics/expressions','filter',['filter'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+print(
+    --8<-- "python/user-guide/basics/expressions.py:filter"
+)
+```
+
+With `filter` you can also create more complex filters that include multiple columns.
+
+{{code_block('user-guide/basics/expressions','filter2',['filter'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+print(
+    --8<-- "python/user-guide/basics/expressions.py:filter2"
+)
+```
+
+### With_columns
+
+`with_columns` allows you to create new columns for your analyses. We create two new columns, `e` and `b+42`. First we sum all values from column `b` and store the result in column `e`. After that we add `42` to the values of `b`, creating a new column `b+42` to store these results.
+
+{{code_block('user-guide/basics/expressions','with_columns',['with_columns'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+print(
+    --8<-- "python/user-guide/basics/expressions.py:with_columns"
+)
+```
+
+### Group by
+
+We will create a new `DataFrame` for the Group by functionality. This new `DataFrame` will include several 'groups' that we want to group by.
+
+{{code_block('user-guide/basics/expressions','dataframe2',['DataFrame'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+--8<-- "python/user-guide/basics/expressions.py:dataframe2"
+print(df2)
+```
+
+{{code_block('user-guide/basics/expressions','group_by',['group_by'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+print(
+    --8<-- "python/user-guide/basics/expressions.py:group_by"
+)
+```
+
+{{code_block('user-guide/basics/expressions','group_by2',['group_by'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+print(
+    --8<-- "python/user-guide/basics/expressions.py:group_by2"
+)
+```
+
+### Combining operations
+
+Below are some examples of how to combine operations to create the `DataFrame` you require.
+
+{{code_block('user-guide/basics/expressions','combine',['select','with_columns'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+--8<-- "python/user-guide/basics/expressions.py:combine"
+```
+
+{{code_block('user-guide/basics/expressions','combine2',['select','with_columns'])}}
+
+```python exec="on" result="text" session="getting-started/expressions"
+--8<-- "python/user-guide/basics/expressions.py:combine2"
+```
diff --git a/docs/user-guide/basics/index.md b/docs/user-guide/basics/index.md
@@ -0,0 +1,18 @@
+# Introduction
+
+This chapter is intended for new Polars users.
+The goal is to provide a quick overview of the most common functionality.
+Feel free to skip ahead to the [next chapter](../concepts/data-types/overview.md) to dive into the details.
+
+!!! rust "Rust Users Only"
+
+    Due to historical reasons, the eager API in Rust is outdated. In the future, we would like to redesign it as a small wrapper around the lazy API (as is the design in Python / NodeJS). In the examples, we will use the lazy API instead with `.lazy()` and `.collect()`. For now you can ignore these two functions. If you want to know more about the lazy and eager API, go [here](../concepts/lazy-vs-eager.md).
+
+    To enable the lazy API, ensure you have the feature flag `lazy` configured when installing Polars:
+    ```
+    # Cargo.toml
+    [dependencies]
+    polars = { version = "x", features = ["lazy", ...]}
+    ```
+
+    Because of Rust's ownership rules, we cannot reuse the same `DataFrame` multiple times in the examples. For simplicity, we call `clone()` to work around this. Note that this does not duplicate the data but only increments a reference count (`Arc`).
diff --git a/docs/user-guide/basics/joins.md b/docs/user-guide/basics/joins.md
@@ -0,0 +1,26 @@
+# Combining DataFrames
+
+There are two ways `DataFrame`s can be combined depending on the use case: join and concat.
+
+## Join
+
+Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look at how to `join` two `DataFrames` into a single `DataFrame`. Our two `DataFrames` both have an 'id'-like column: `a` and `x`. We can use those columns to `join` the `DataFrames` in this example.
+
+{{code_block('user-guide/basics/joins','join',['join'])}}
+
+```python exec="on" result="text" session="getting-started/joins"
+--8<-- "python/user-guide/basics/joins.py:setup"
+--8<-- "python/user-guide/basics/joins.py:join"
+```
+
+To see more examples with other types of joins, go to the [User Guide](../transformations/joins.md).
+
+## Concat
+
+We can also `concatenate` two `DataFrames`. Vertical concatenation will make the `DataFrame` longer. Horizontal concatenation will make the `DataFrame` wider. Below you can see the result of a horizontal concatenation of our two `DataFrames`.
+
+{{code_block('user-guide/basics/joins','hstack',['hstack'])}}
+
+```python exec="on" result="text" session="getting-started/joins"
+--8<-- "python/user-guide/basics/joins.py:hstack"
+```