Commit: Added polars
ifnesi committed Jan 8, 2024
1 parent 689cc43 commit d655a35
Showing 2 changed files with 35 additions and 1 deletion.
5 changes: 4 additions & 1 deletion README.md
@@ -7,13 +7,16 @@ Python implementation of Gunnar's 1 billion row challenge:
## Performance (on a MacBook Pro M1 32GB)
| Interpreter | Script | user (s) | system (s) | cpu | total (s) |
| ----------- | ------ | -------- | ---------- | --- | --------- |
| python3 | calculateAveragePolars.py | 77.84 | 3.64 | 703% | 11.585 |
| pypy3 | calculateAveragePypy.py | ~~139.15~~<br>135.25 | ~~3.02~~<br>2.92 | ~~699%~~<br>735% | ~~20.323~~<br>18.782 |
| python3 | calculateAverageDuckDB.py | 186.78 | 4.21 | 806% | 23.673 |
| pypy3 | calculateAverage.py | ~~284.90~~<br>242.89 | ~~9.12~~<br>6.28 | ~~749%~~<br>780% | ~~39.236~~<br>31.926 |
| python3 | calculateAverage.py | ~~378.54~~<br>329.20 | ~~6.94~~<br>3.77 | ~~747%~~<br>793% | ~~51.544~~<br>41.941 |
| python3 | calculateAveragePypy.py | ~~573.77~~<br>510.93 | ~~2.70~~<br>1.88 | ~~787%~~<br>793% | ~~73.170~~<br>64.660 |

The script `calculateAveragePolars.py` was suggested by [Taufan](https://github.com/mtaufanr) in this [post](https://github.com/gunnarmorling/1brc/discussions/62#discussioncomment-8026402).

The script `calculateAveragePypy.py` was created by [donalm](https://github.com/donalm); it is more than 2x faster than the initial script (`calculateAverage.py`) when run in pypy3, and is even capable of beating the [DuckDB](https://duckdb.org/) implementation (`calculateAverageDuckDB.py`).

[Olivier Scalbert](https://github.com/oscalbert) made a simple but remarkably effective suggestion that improved performance by an average of 15% (the table above has been updated), thank you :slightly_smiling_face:

31 changes: 31 additions & 0 deletions calculateAveragePolars.py
@@ -0,0 +1,31 @@
import polars as pl


# Read data file
df = pl.scan_csv(
    "measurements.txt",
    separator=";",
    has_header=False,
    with_column_names=lambda cols: ["station_name", "measurement"],
)

# Group data
grouped = (
    df.group_by("station_name")
    .agg(
        pl.min("measurement").alias("min_measurement"),
        pl.mean("measurement").alias("mean_measurement"),
        pl.max("measurement").alias("max_measurement"),
    )
    .sort("station_name")
    .collect(streaming=True)
)

# Print final results
print("{", end="")
for data in grouped.iter_rows():
    print(
        f"{data[0]}={data[1]:.1f}/{data[2]:.1f}/{data[3]:.1f}",
        end=", ",
    )
# The two backspaces erase the trailing ", " left by the last iteration
print("\b\b} ")
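
For reference, the per-station min/mean/max aggregation that the Polars query expresses can be sketched in plain Python. This is a minimal illustration of the computation only (it is not the repository's `calculateAverage.py`, and the helper name `aggregate` is invented here); the real challenge input has one `station;measurement` line per row in `measurements.txt`.

```python
from collections import defaultdict


def aggregate(lines):
    """Compute per-station (min, mean, max) from 'station;measurement' lines."""
    stats = defaultdict(list)
    for line in lines:
        station, value = line.split(";")
        stats[station].append(float(value))
    # Sort by station name, matching the query's .sort("station_name")
    return {
        station: (min(values), sum(values) / len(values), max(values))
        for station, values in sorted(stats.items())
    }


sample = ["Hamburg;10.0", "Bulawayo;5.5", "Hamburg;20.0"]
print(aggregate(sample))
```

The Polars version performs the same grouping and aggregation, but lazily and in streaming mode (`collect(streaming=True)`), so the 1-billion-row file never has to fit in memory at once.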
