Intelligent Python Performance Optimization Tool
Automatically profile, analyze, and accelerate your Python code with minimal effort.
🎯 Specialized for CPU-intensive, computationally heavy algorithms - not all Python code benefits from optimization.
- Automatic Hotspot Detection: Uses `cProfile` integration to identify performance bottlenecks
- ML-Powered Suggestions: Advanced heuristics recommend the best optimization strategy
- Multi-Backend Support: Seamlessly applies Numba JIT, NumPy vectorization, and transpilation hints
- 📱 Modern GUI: User-friendly desktop application with real-time optimization preview
- 📓 Jupyter Magic: `%%pyspeed` cell magic for notebook-based development workflows
- 📊 Performance Comparison: Built-in benchmarking with statistical timing analysis
- 🔥 Numba JIT Compilation: Automatic `@numba.njit` decoration for numerical functions
- 📈 NumPy Vectorization: Transforms explicit loops into vectorized operations
- 🛠️ Transpilation Ready: Generates stubs for C++/Rust acceleration
- 🔧 Intelligent Targeting: Applies optimizations only where they matter most
Test Environment: AMD Ryzen 5 3600, Python 3.9, Numba 0.61.2
| Test Case | Original (s) | Optimized (s) | Speedup | Status |
|---|---|---|---|---|
| Recursive Fibonacci | 6.90 | 0.04 | 172x | ✅ Verified |
| Monte Carlo π | 91.0 | 0.98 | 92.4x | ✅ Verified |
| Matrix Multiplication | 43.0 | 0.61 | 71x | ✅ Verified |
| Image Convolution | 35.0 | 0.64 | 55x | ✅ Verified |
| Time Series Analysis | 47.0 | 0.98 | 48x | ✅ Verified |
| Pi Calculation (Leibniz) | 13.51 | 0.43 | 31.4x | ✅ Verified |
| Mandelbrot Set Fractal | 214.59 | 8.85 | 24.3x | ✅ Verified |
| Image Brightening | 17.34 | 0.96 | 18.1x | ✅ Verified |
Pi Calculation (Leibniz Series):
[11:27:34] INFO: Numba v0.61.2 is available.
[11:27:53] Applied transformations: Function 'calculate_pi' was transformed using: NUMBA
[11:27:58] ✅ Optimized run time: 1.71s (includes JIT compilation)
[CLI Run] Warmed-up execution: 0.43s → 31.4x speedup
Monte Carlo π Estimation:
[11:39:52 - 11:41:23] Profiling complete (~91 seconds pure Python)
[CLI Run] Optimized execution: 0.98s → 92.4x speedup
Matrix Multiplication (Triple-Nested Loops):
[11:51:37 - 11:52:20] Profiling complete (~43 seconds pure Python)
✅ Optimized run time: 1.21s (includes JIT compilation)
[CLI Run] Warmed-up execution: 0.61s → 71x speedup
vs. np.dot(): 0.0016s (PySpeed closed a massive performance gap)
Image Convolution (Quadruple-Nested Loops):
[11:55:42 - 11:56:17] Profiling complete (~35 seconds pure Python)
✅ Optimized run time: 1.34s (includes JIT compilation)
[CLI Run] Warmed-up execution: 0.64s → 55x speedup
Output: Generated convoluted_blurred_image.png (correctness verified)
Time Series Analysis (Rolling Window SMA):
[12:00:19 - 12:01:06] Profiling complete (~47 seconds pure Python)
✅ Optimized run time: 2.15s (includes JIT compilation)
[CLI Run] Warmed-up execution: 0.98s → 48x speedup
vs. pandas.rolling(): 0.16s (PySpeed closed 98% of the performance gap)
Recursive Fibonacci (Memoization):
✅ Applied @functools.lru_cache decorator
Median Original: 6.90s → Median Optimized: 0.04s
Speedup: 172x (exponential → linear complexity)
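The memoization above can be sketched by hand. This is an illustrative example of the reported transformation, not PySpeed's generated output; the function name `fib` is chosen here for brevity. Caching every subproblem collapses the exponential call tree into one call per distinct `n`:

```python
import functools

# Naive recursive Fibonacci wrapped in an unbounded cache: each fib(k)
# is computed exactly once, so the call tree shrinks from O(2^n) nodes
# to O(n) distinct evaluations.
@functools.lru_cache(maxsize=None)
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # returns instantly; uncached, this takes ~1.3M calls
```

Without the decorator, `fib(35)` already takes seconds in pure Python; with it, even `fib(500)` is effectively free (Python's big integers handle the growth).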
Image Brightening (NumPy Vectorization):
✅ Transformed nested loops into vectorized operations
Median Original: 17.34s → Median Optimized: 0.96s
Speedup: 18.1x (4K image processing)
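The loop-to-vectorization rewrite looks roughly like this. This is a hand-written sketch of the kind of transformation reported above (the function names and the small test image are illustrative, not PySpeed output):

```python
import numpy as np

def brighten_loop(img, factor):
    # Pixel-by-pixel version: slow because each iteration pays
    # Python interpreter overhead.
    out = np.empty_like(img)
    h, w = img.shape[:2]
    for i in range(h):
        for j in range(w):
            out[i, j] = np.clip(img[i, j] * factor, 0, 255)
    return out

def brighten_vectorized(img, factor):
    # Whole-array version: one multiply, one clip, one cast,
    # all executed in compiled NumPy code.
    return np.clip(img * factor, 0, 255).astype(img.dtype)

img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
bright = brighten_vectorized(img, 1.5)
```

On a real 4K frame (3840x2160x3) the loop version executes ~25 million Python-level iterations, which is where speedups of this magnitude come from.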
Mandelbrot Set Fractal (Ultimate Stress Test):
⚠️ Script too intensive to profile - switched to static analysis mode
✅ [RECOMMENDATION] Function is Numba-compatible (compilation verified)
Median Original: 214.59s (~3.6 minutes) → Median Optimized: 8.85s
Speedup: 24.3x (transforms unusable → production-ready)
1. Extreme Workload Handling: Proves PySpeed works on scripts that take minutes to run
2. Professional Robustness: Shows the tool gracefully handles profiling timeouts and switches to static analysis
3. Visual Impact: Mandelbrot fractals are universally recognized as computationally intensive and visually stunning
Real-World Relevance: Demonstrates PySpeed's value for:
- Interactive development (3+ minutes → <10 seconds)
- Production workloads (batch jobs become real-time)
- Research iteration (parameter tuning becomes feasible)
The 214.59 → 8.85 seconds transformation is particularly compelling because it crosses the psychological barrier from "grab coffee while it runs" to "watch it complete." This positions PySpeed as a tool that transforms workflows, not just optimizes code! 🚀
Key Insights:
- All CPU-bound algorithms achieved 18-172x speedups with zero manual optimization
- Multiple optimization types: JIT compilation, vectorization, and memoization all work seamlessly
- Complexity transformation: Converts exponential algorithms (Fibonacci) to linear performance
- Stress test resilience: Handles extreme workloads that timeout profiling (3+ minute runtimes)
- Gap bridging: PySpeed transforms unusable code (3-15 minutes) into production-ready performance (<10s)
- Near-native performance: Gets within 6x of professional C-backed libraries (pandas, NumPy)
```shell
# Clone the repository
git clone https://github.com/LMLK-seal/pyspeed.git
cd pyspeed

# Install in development mode
pip install -e .

# Core functionality
pip install customtkinter astor ipython requests

# Optional (recommended for full optimization support)
# NumPy is the fundamental package for scientific computing. It is required
# by Numba and used in many test cases. Version <2 is recommended for broad
# compatibility with other scientific libraries like SciPy.
pip install "numpy<2.0.0"
```

- Python: 3.8 or higher
- Operating System: Windows, macOS, Linux
- Memory: 512MB RAM minimum
- Dependencies: See `requirements.txt` for the complete list
Launch the graphical interface:

```shell
python pyspeed_gui.py
```

Workflow:
- 📂 Load Script → Open your Python file
- 🔍 Profile → Identify performance hotspots (for computationally heavy algorithms, profiling may time out - skip straight to Optimize)
- 🔧 Analyze & Optimize → Apply intelligent transformations
- ⚡ Compare → Benchmark original vs. optimized code
Load the extension in your notebook:

```python
%load_ext pyspeed
```

Basic Usage:

```python
%%pyspeed
def calculate_pi(n_terms: int):
    """CPU-intensive function perfect for optimization"""
    numerator = 4.0
    denominator = 1.0
    pi = 0.0
    for _ in range(n_terms):
        pi += numerator / denominator
        denominator += 2.0
        pi -= numerator / denominator
        denominator += 2.0
    return pi

# This will be automatically profiled and optimized
result = calculate_pi(1_000_000)
print(f"π ≈ {result}")
```

Performance Comparison:
```python
%%pyspeed --compare
import numpy as np

def slow_array_operation(a, b):
    """This loop will be vectorized automatically"""
    c = np.zeros_like(a)
    for i in range(len(a)):
        c[i] = a[i] * b[i] + np.sin(a[i])
    return c

# Generate test data
x = np.random.rand(100_000)
y = np.random.rand(100_000)
result = slow_array_operation(x, y)
```

Perfect for researchers working with:
- Numerical simulations
- Monte Carlo methods
- Signal processing algorithms
- Mathematical modeling
Accelerate your data workflows:
- Large dataset processing
- Feature engineering pipelines
- Custom aggregation functions
- Statistical computations
Optimize bottlenecks in:
- Real-time systems
- Game development
- Financial algorithms
- Computer graphics
Matrix Multiplication (Before):

```python
def slow_matrix_multiply(A, B):
    """Triple-nested loops - perfect Numba target"""
    rows_A, cols_A = len(A), len(A[0])
    rows_B, cols_B = len(B), len(B[0])
    result = [[0 for _ in range(cols_B)] for _ in range(rows_A)]
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                result[i][j] += A[i][k] * B[k][j]
    return result
```

After (automatically generated):

```python
import numba

@numba.njit
def slow_matrix_multiply(A, B):
    rows_A, cols_A = len(A), len(A[0])
    rows_B, cols_B = len(B), len(B[0])
    result = [[0 for _ in range(cols_B)] for _ in range(rows_A)]
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                result[i][j] += A[i][k] * B[k][j]
    return result
```

⚡ Result: 71x speedup (43s → 0.61s)
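The benchmarks above are marked "correctness verified." A simple way to do that kind of check yourself is to compare the naive kernel against a trusted reference on a small input. This sketch (names are illustrative, not PySpeed's test harness) uses NumPy's built-in matrix product as the reference:

```python
import numpy as np

def naive_matmul(A, B):
    # Same triple-nested loop as the README example, operating on
    # NumPy arrays so we can compare against A @ B directly.
    n, m = A.shape
    m2, p = B.shape
    assert m == m2, "inner dimensions must match"
    out = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                out[i, j] += A[i, k] * B[k, j]
    return out

rng = np.random.default_rng(0)
A = rng.random((20, 30))
B = rng.random((30, 10))
assert np.allclose(naive_matmul(A, B), A @ B)  # optimized vs. reference
```

Running the same `np.allclose` comparison before and after an automatic transformation is a cheap guard against an optimizer silently changing semantics.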
Before:

```python
import numpy as np

def element_wise_operation(a, b, c):
    result = np.zeros_like(a)
    for i in range(len(a)):
        result[i] = a[i] * b[i] + c[i]
    return result
```

After (automatically generated):

```python
def element_wise_operation(a, b, c):
    result = a * b + c  # Vectorized operation
    return result
```

⚡ Result: ~100x speedup for large arrays
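You can reproduce a speedup of this order with a rough micro-benchmark. This is an illustrative sketch, not the tool's built-in comparison; the exact ratio depends on array size and hardware:

```python
import timeit
import numpy as np

def element_wise_loop(a, b, c):
    # Interpreted loop: one Python-level iteration per element
    result = np.zeros_like(a)
    for i in range(len(a)):
        result[i] = a[i] * b[i] + c[i]
    return result

def element_wise_vec(a, b, c):
    # Single vectorized expression executed in compiled NumPy code
    return a * b + c

a, b, c = (np.random.rand(100_000) for _ in range(3))
t_loop = timeit.timeit(lambda: element_wise_loop(a, b, c), number=3)
t_vec = timeit.timeit(lambda: element_wise_vec(a, b, c), number=3)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```

Both versions produce identical results (`np.allclose` confirms it), which is the property any automatic loop-to-vectorization rewrite must preserve.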
File Organizer Script:

```python
import os
import shutil

def organize_files_by_extension(source_dir, target_dir):
    """PySpeed will NOT optimize this - and that's good!"""
    for filename in os.listdir(source_dir):
        if os.path.isfile(os.path.join(source_dir, filename)):
            extension = filename.split('.')[-1].lower()
            ext_dir = os.path.join(target_dir, extension)
            if not os.path.exists(ext_dir):
                os.makedirs(ext_dir)
            shutil.move(
                os.path.join(source_dir, filename),
                os.path.join(ext_dir, filename)
            )
```

Why PySpeed skips this:
- 📁 I/O Bound: Dominated by file system operations, not computation
- 🚫 String Operations: Heavy use of string manipulation (not numeric)
- 🔒 System Calls: `os.listdir` and `shutil.move` cannot be JIT-compiled
- ⚡ Already Efficient: The bottleneck is disk speed, not Python code

🎯 PySpeed Result: No modifications suggested - "No functions were modified based on current hotspots and heuristics."
Time Series Analysis:

```python
def calculate_sma_naive(data, window_size):
    """Naive sliding window - PySpeed transforms this"""
    sma = []
    for i in range(len(data) - window_size + 1):
        window_sum = sum(data[i:i + window_size])
        sma.append(window_sum / window_size)
    return sma
```

Performance Comparison:
- Pure Python: 47.0 seconds ❌
- PySpeed + Numba: 0.98 seconds ✅ (48x speedup)
- Pandas (C-backed): 0.16 seconds 🏆

🎯 Achievement: PySpeed closed 98% of the performance gap between pure Python and professional libraries, automatically transforming unusable code into production-ready performance.
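For reference, the naive O(n·w) sliding window above can also be rewritten by hand as an O(n) prefix-sum computation, which is roughly the trick C-backed rolling implementations exploit. This is an illustrative alternative, not what PySpeed generates (PySpeed JIT-compiles the original loop):

```python
import numpy as np

def calculate_sma_cumsum(data, window_size):
    # O(n) rolling mean: with prefix sums, each window sum becomes a
    # single subtraction instead of summing window_size elements.
    c = np.cumsum(np.concatenate(([0.0], np.asarray(data, dtype=float))))
    return (c[window_size:] - c[:-window_size]) / window_size

data = np.sin(np.linspace(0, 10, 1_000))
sma = calculate_sma_cumsum(data, 50)
print(len(sma))  # 951 windows for 1,000 points and window size 50
```

It returns the same values as `calculate_sma_naive` (up to floating-point rounding), illustrating that algorithmic rewrites and JIT compilation are complementary routes to the same result.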
PySpeed includes optional anonymous telemetry to improve optimization algorithms:

```json
{
  "telemetry_consent": true,
  "optimization_preferences": {
    "prefer_numba": true,
    "enable_experimental": false
  }
}
```

Configuration file location: `~/.pyspeed/config.json`

What's collected (anonymously):
- ✅ Code structure hashes (not source code)
- ✅ Optimization success rates
- ✅ Performance improvement metrics
- ❌ Never your actual source code
- ❌ Never personally identifiable information
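Reading a config like the one above is straightforward. This is a hypothetical helper, not part of PySpeed's public API; only the file location and the keys shown in the JSON snippet are taken from the section above, and it defaults to telemetry off when no file exists:

```python
import json
from pathlib import Path

# Defaults mirror the keys from the sample config; telemetry stays off
# unless the user has explicitly opted in.
DEFAULTS = {
    "telemetry_consent": False,
    "optimization_preferences": {
        "prefer_numba": True,
        "enable_experimental": False,
    },
}

def load_config(path=Path.home() / ".pyspeed" / "config.json"):
    """Merge the user's config file over the defaults, if it exists."""
    try:
        return {**DEFAULTS, **json.loads(Path(path).read_text())}
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEFAULTS)

cfg = load_config()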
We welcome contributions! Here's how to get started:
Found an issue? Please create a detailed bug report:
- Environment: Python version, OS, dependencies
- Reproduction: Minimal code example
- Expected vs. Actual: What should happen vs. what happens
Have an optimization idea? We'd love to hear it:
- Use Case: Describe the scenario
- Implementation: Technical approach (if known)
- Impact: Expected performance benefits
PySpeed uses a modular optimization pipeline:
```text
┌─────────────┐    ┌──────────────┐    ┌───────────────┐
│   Source    │───▶│   Profiler   │───▶│   Analyzer    │
│    Code     │    │  (cProfile)  │    │ (AST Walker)  │
└─────────────┘    └──────────────┘    └───────────────┘
                                               │
┌─────────────┐    ┌──────────────┐    ┌───────▼───────┐
│  Optimized  │◀───│ Transformer  │◀───│ ML Optimizer  │
│    Code     │    │ (AST Rewrite)│    │ (Heuristics)  │
└─────────────┘    └──────────────┘    └───────────────┘
```
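The Transformer stage rewrites the AST of the user's script. A toy sketch of that idea (not PySpeed's actual implementation; the class and function names here are invented for illustration) is a `NodeTransformer` that injects an `@numba.njit` decorator onto a chosen hotspot function:

```python
import ast

class NjitDecorator(ast.NodeTransformer):
    """Toy Transformer stage: add @numba.njit to one target function."""

    def __init__(self, target: str):
        self.target = target

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        if node.name == self.target:
            # Build the `numba.njit` attribute node and prepend it so it
            # becomes the outermost decorator.
            deco = ast.Attribute(value=ast.Name(id="numba", ctx=ast.Load()),
                                 attr="njit", ctx=ast.Load())
            node.decorator_list.insert(0, deco)
        return node

src = (
    "def hot_loop(n):\n"
    "    total = 0\n"
    "    for i in range(n):\n"
    "        total += i\n"
    "    return total\n"
)
tree = NjitDecorator("hot_loop").visit(ast.parse(src))
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # requires Python 3.9+
```

A real pipeline would additionally insert the `import numba` statement, verify the function is nopython-compatible, and unparse the whole module back to source.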
This project is licensed under the MIT License - see the LICENSE file for details.
- Numba Team: For the incredible JIT compilation framework
- NumPy Community: For the foundation of scientific Python
- AST Module Contributors: Making code transformation possible
- All Contributors: Who help make PySpeed better every day
Made with ❤️ by LMLK-seal
