Commit

Olivier Scalbert's improvements
ifnesi committed Jan 6, 2024
1 parent 5852308 commit 976f93f
Showing 3 changed files with 73 additions and 42 deletions.
37 changes: 31 additions & 6 deletions README.md
@@ -7,10 +7,35 @@ Python implementation of Gunnar's 1 billion row challenge:
 ## Performance (on a MacBook Pro M1 32GB)
 | Interpreter | Script | user | system | cpu | total |
 | ----------- | ------ | ---- | ------ | --- | ----- |
-| pypy3 | calculateAveragePypy.py | 139.15s | 3.02s | 699% | 20.323 |
-| python3 | calculateAverageDuckDB.py | 186.78s | 4.21s | 806% | 23.673 |
-| pypy3 | calculateAverage.py | 284.90s | 9.12s | 749% | 39.236 |
-| python3 | calculateAverage.py | 378.54s | 6.94s | 747% | 51.544 |
-| python3 | calculateAveragePypy.py | 573.77s | 2.70s | 787% | 73.170 |
+| pypy3 | calculateAveragePypy.py | ~~139.15~~<br>135.25 | ~~3.02s~~<br>2.92 | ~~699%~~<br>735% | ~~20.323~~<br>18.782 |
+| python3 | calculateAverageDuckDB.py | 186.78 | 4.21 | 806% | 23.673 |
+| pypy3 | calculateAverage.py | ~~284.90~~<br>242.89 | ~~9.12~~<br>6.28 | ~~749%~~<br>780% | ~~39.236~~<br>31.926 |
+| python3 | calculateAverage.py | ~~378.54~~<br>329.20 | ~~6.94~~<br>3.77 | ~~747%~~<br>793% | ~~51.544~~<br>41.941 |
+| python3 | calculateAveragePypy.py | ~~573.77~~<br>510.93 | ~~2.70~~<br>1.88 | ~~787%~~<br>793% | ~~73.170~~<br>64.660 |

-The file `calculateAveragePypy.py` was created by [donalm](https://github.com/donalm), a +2x improved version of the initial script (`calculateAverage.py`) when running in pypy3, even capable of beating the implementation using [DuckDB](https://duckdb.org/) `calculateAverageDuckDB.py`.
+The file `calculateAveragePypy.py` was created by [donalm](https://github.com/donalm), a +2x improved version of the initial script (`calculateAverage.py`) when running in pypy3, even capable of beating the implementation using [DuckDB](https://duckdb.org/) `calculateAverageDuckDB.py`.
+
+[Olivier Scalbert](https://github.com/oscalbert) has made a simple but incredible suggestion where performance increased by an average of 15% (table above has been updated), thank you :slightly_smiling_face:
+
+His suggestions were to change from:
+```
+if measurement < result[location][0]:
+    result[location][0] = measurement
+if measurement > result[location][1]:
+    result[location][1] = measurement
+result[location][2] += measurement
+result[location][3] += 1
+```
+
+to:
+```
+_result = result[location]
+if measurement < _result[0]:
+    _result[0] = measurement
+if measurement > _result[1]:
+    _result[1] = measurement
+_result[2] += measurement
+_result[3] += 1
+```
+
+Python can be surprising sometimes.
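The README change above replaces several `result[location]` lookups per row with a single lookup bound to a local name. A minimal sketch for checking that effect in isolation follows. The function names, the sample station and the iteration count are illustrative assumptions and are not part of this commit; timings will vary by interpreter and machine.

```python
import timeit


def update_repeated_lookup(result, location, measurement):
    # Form before the commit: every result[location] is a separate dict lookup.
    if measurement < result[location][0]:
        result[location][0] = measurement
    if measurement > result[location][1]:
        result[location][1] = measurement
    result[location][2] += measurement
    result[location][3] += 1


def update_cached_lookup(result, location, measurement):
    # Form after the commit: one dict lookup, then list indexing on the alias.
    # _result refers to the same list object, so the dict entry is updated in place.
    _result = result[location]
    if measurement < _result[0]:
        _result[0] = measurement
    if measurement > _result[1]:
        _result[1] = measurement
    _result[2] += measurement
    _result[3] += 1


def make_result():
    # Hypothetical single-station accumulator: [min, max, sum, count].
    return {"Hamburg": [12.0, 12.0, 12.0, 1]}


if __name__ == "__main__":
    for fn in (update_repeated_lookup, update_cached_lookup):
        result = make_result()
        elapsed = timeit.timeit(lambda: fn(result, "Hamburg", 8.9), number=200_000)
        print(f"{fn.__name__}: {elapsed:.3f}s")
```

Both functions leave `result` in the same state; only the number of dict lookups per call differs.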
39 changes: 21 additions & 18 deletions calculateAverage.py
@@ -77,12 +77,13 @@ def _process_file_chunk(
                     1,
                 ] # min, max, sum, count
             else:
-                if measurement < result[location][0]:
-                    result[location][0] = measurement
-                if measurement > result[location][1]:
-                    result[location][1] = measurement
-                result[location][2] += measurement
-                result[location][3] += 1
+                _result = result[location]
+                if measurement < _result[0]:
+                    _result[0] = measurement
+                if measurement > _result[1]:
+                    _result[1] = measurement
+                _result[2] += measurement
+                _result[3] += 1
     return result


@@ -105,22 +106,24 @@ def process_file(
             if location not in result:
                 result[location] = measurements
             else:
-                if measurements[0] < result[location][0]:
-                    result[location][0] = measurements[0]
-                if measurements[1] > result[location][1]:
-                    result[location][1] = measurements[1]
-                result[location][2] += measurements[2]
-                result[location][3] += measurements[3]
+                _result = result[location]
+                if measurements[0] < _result[0]:
+                    _result[0] = measurements[0]
+                if measurements[1] > _result[1]:
+                    _result[1] = measurements[1]
+                _result[2] += measurements[2]
+                _result[3] += measurements[3]

     # Print final results
-    results_calculated = dict()
+    print("{", end="")
     for location, measurements in sorted(result.items()):
-        results_calculated[
-            location
-        ] = f"{measurements[0]:.1f}/{(measurements[2] / measurements[3]) if measurements[3] !=0 else 0:.1f}/{measurements[1]:.1f}"
-    return results_calculated
+        print(
+            f"{location}={measurements[0]:.1f}/{(measurements[2] / measurements[3]) if measurements[3] !=0 else 0:.1f}/{measurements[1]:.1f}",
+            end=", ",
+        )
+    print("\b\b} ")


 if __name__ == "__main__":
     cpu_count, *start_end = get_file_chunks("measurements.txt")
-    print(process_file(cpu_count, start_end[0]))
+    process_file(cpu_count, start_end[0])
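The new printing code emits `", "` after every entry and then uses `"\b\b} "` to back up over the last separator. That relies on a terminal interpreting the backspace characters; when output is redirected to a file, the `\b` bytes are written as-is. A possible alternative, sketched here as an assumption and not as what this commit does, is to build the entries and join them. The helper name `format_results` and the sample data are hypothetical.

```python
def format_results(result):
    """Return '{location=min/mean/max, ...}' for a dict of [min, max, sum, count] lists."""
    entries = []
    for location, (lowest, highest, total, count) in sorted(result.items()):
        # Same guard as the commit's f-string: avoid dividing by a zero count.
        mean = total / count if count != 0 else 0
        entries.append(f"{location}={lowest:.1f}/{mean:.1f}/{highest:.1f}")
    # join() places separators only between entries, so no trailing ", " needs erasing.
    return "{" + ", ".join(entries) + "}"


if __name__ == "__main__":
    # Hypothetical sample data in the [min, max, sum, count] layout used by the scripts.
    sample = {"Oslo": [-3.0, 9.0, 12.0, 4], "Abha": [5.0, 27.4, 64.8, 4]}
    print(format_results(sample))
```

For `calculateAveragePypy.py`, where the keys are `bytes`, the same sketch would need `location.decode("utf-8")` inside the f-string. Unlike the commit's version, this one does not print a trailing space after the closing brace.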
39 changes: 21 additions & 18 deletions calculateAveragePypy.py
@@ -105,12 +105,13 @@ def _process_file_chunk(
                     1,
                 ] # min, max, sum, count
             else:
-                if value < result[location][0]:
-                    result[location][0] = value
-                if value > result[location][1]:
-                    result[location][1] = value
-                result[location][2] += value
-                result[location][3] += 1
+                _result = result[location]
+                if value < _result[0]:
+                    _result[0] = value
+                if value > _result[1]:
+                    _result[1] = value
+                _result[2] += value
+                _result[3] += 1

             location = None

@@ -136,22 +137,24 @@ def process_file(
             if location not in result:
                 result[location] = measurements
             else:
-                if measurements[0] < result[location][0]:
-                    result[location][0] = measurements[0]
-                if measurements[1] > result[location][1]:
-                    result[location][1] = measurements[1]
-                result[location][2] += measurements[2]
-                result[location][3] += measurements[3]
+                _result = result[location]
+                if measurements[0] < _result[0]:
+                    _result[0] = measurements[0]
+                if measurements[1] > _result[1]:
+                    _result[1] = measurements[1]
+                _result[2] += measurements[2]
+                _result[3] += measurements[3]

     # Print final results
-    results_calculated = dict()
+    print("{", end="")
     for location, measurements in sorted(result.items()):
-        results_calculated[
-            location.decode("utf-8")
-        ] = f"{measurements[0]:.1f}/{(measurements[2] / measurements[3]) if measurements[3] !=0 else 0:.1f}/{measurements[1]:.1f}"
-    return results_calculated
+        print(
+            f"{location.decode('utf-8')}={measurements[0]:.1f}/{(measurements[2] / measurements[3]) if measurements[3] !=0 else 0:.1f}/{measurements[1]:.1f}",
+            end=", ",
+        )
+    print("\b\b} ")


 if __name__ == "__main__":
     cpu_count, *start_end = get_file_chunks("measurements.txt")
-    print(process_file(cpu_count, start_end[0]))
+    process_file(cpu_count, start_end[0])
