Optimize gradient_descent

codeflash-ai[bot] · web-flow · commit 73c8fb65439f · 2025-11-24T23:55:46.000Z
The optimization dramatically improves performance by **replacing nested loops with vectorized NumPy operations**, achieving a **25815% speedup** (from 12.0 seconds to 46.3 milliseconds).

**Key optimizations applied:**

1. **Vectorized predictions**: Replaced the double nested loop for computing predictions with `X.dot(weights)`, leveraging NumPy's optimized BLAS routines instead of Python loops.

2. **Vectorized gradient calculation**: Eliminated another double nested loop by using `X.T.dot(errors) / m`, which computes the entire gradient vector in one operation.

3. **In-place weight updates**: Used vectorized subtraction `weights -= learning_rate * gradient` instead of element-wise loops.

**Why this is faster:**
- NumPy operations execute in optimized C code rather than interpreted Python loops
- BLAS libraries provide highly optimized matrix operations that utilize CPU cache efficiently
- Eliminates the overhead of millions of Python loop iterations (the profiler shows ~31M loop iterations in the original code)

**Performance characteristics from tests:**
- Excellent for large-scale problems (1000+ samples, 50+ features) where the vectorization advantage is most pronounced
- Maintains identical numerical behavior across all test cases (basic linear relationships, edge cases, large datasets)
- Particularly beneficial for the typical machine learning workloads with moderate to high iteration counts (500-1000 iterations)

The optimization transforms an O(iterations × m × n) nested loop implementation into efficient matrix operations, making it suitable for production machine learning pipelines where gradient descent is often called repeatedly.
diff --git a/src/numerical/optimization.py b/src/numerical/optimization.py
@@ -7,16 +7,8 @@ def gradient_descent(
     m, n = X.shape
     weights = np.zeros(n)
     for _ in range(iterations):
-        predictions = np.zeros(m)
-        for i in range(m):
-            for j in range(n):
-                predictions[i] += X[i, j] * weights[j]
+        predictions = X.dot(weights)
         errors = predictions - y
-        gradient = np.zeros(n)
-        for j in range(n):
-            for i in range(m):
-                gradient[j] += errors[i] * X[i, j]
-            gradient[j] /= m
-        for j in range(n):
-            weights[j] -= learning_rate * gradient[j]
+        gradient = X.T.dot(errors) / m
+        weights -= learning_rate * gradient
     return weights