========================
hls4ml Optimization API
========================

Pruning and weight sharing are effective techniques for reducing model footprint and computational requirements. The hls4ml Optimization API introduces hardware-aware pruning and weight sharing.
By defining custom objectives, the algorithm solves a Knapsack optimization problem aimed at maximizing model performance while keeping the target resource(s) to a minimum. Out-of-the-box objectives include network sparsity, GPU FLOPs, Vivado DSPs and memory utilization.
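
Conceptually, the choice of which weights (or structures) to keep can be framed as a 0-1 Knapsack problem. A simplified sketch of such a formulation (for intuition only; the exact formulation used internally may differ) is:

.. math::

    \max_{a} \sum_i a_i v_i \quad \text{subject to} \quad \sum_i a_i r_i \leq R, \quad a_i \in \{0, 1\}

where :math:`v_i` estimates the contribution of weight group :math:`i` to model performance, :math:`r_i` is its cost under the chosen objective (e.g. DSPs or FLOPs) and :math:`R` is the resource budget.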

The code blocks below showcase three use cases of the hls4ml Optimization API: network sparsity (unstructured pruning), GPU FLOPs (structured pruning) and Vivado DSP utilization (pattern pruning). First, we start with unstructured pruning:

.. code-block:: python

    import numpy as np
    from sklearn.metrics import accuracy_score
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.metrics import CategoricalAccuracy
    from tensorflow.keras.losses import CategoricalCrossentropy

    from hls4ml.optimization.keras import optimize_model
    from hls4ml.optimization.keras.utils import get_model_sparsity
    from hls4ml.optimization.attributes import get_attributes_from_keras_model

    from hls4ml.optimization.objectives import ParameterEstimator
    from hls4ml.optimization.scheduler import PolynomialScheduler

    # Define baseline model and load data
    # X_train, y_train = ...
    # X_val, y_val = ...
    # X_test, y_test = ...
    # baseline_model = ...

    # Evaluate baseline model
    y_baseline = baseline_model.predict(X_test)
    acc_base = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_baseline, axis=1))
    sparsity, layers = get_model_sparsity(baseline_model)
    print(f'Baseline Keras accuracy: {acc_base}')
    print(f'Baseline Keras sparsity, overall: {sparsity}')
    print(f'Baseline Keras sparsity, per-layer: {layers}')

    # Define training parameters
    # Epochs refers to the maximum number of epochs to train the model after imposing some sparsity
    # If the model is pre-trained, a good rule of thumb is to use between 1/3 and 1/2 of the number of epochs used to train the baseline model
    epochs = 10
    batch_size = 128
    optimizer = Adam()
    loss_fn = CategoricalCrossentropy(from_logits=True)

    # Define the metric to monitor, as well as whether it is increasing or decreasing
    # This distinction allows us to optimize both regression and classification models
    # e.g. in regression, minimize validation MSE; in classification, maximize accuracy
    metric, increasing = CategoricalAccuracy(), True
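
    # For a regression model, one might instead monitor a decreasing metric
    # (hypothetical alternative, not part of the original example):
    # from tensorflow.keras.metrics import MeanSquaredError
    # metric, increasing = MeanSquaredError(), False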

    # Relative tolerance (rtol) is the fraction of the baseline metric the optimized model must retain
    # e.g. with rtol = 0.975, the optimized model must achieve at least 97.5% of the baseline accuracy
    rtol = 0.975

    # A scheduler defines how the sparsity is incremented at each step
    # In this case, the maximum sparsity is 50% and it is applied at a polynomially decreasing rate, over 5 steps
    # If the final sparsity is unspecified, it is set to 100%
    # The optimization algorithm stops either when (i) the relative drop in performance exceeds the tolerance or (ii) the final sparsity is reached
    scheduler = PolynomialScheduler(5, final_sparsity=0.5)

    # Get model attributes
    model_attributes = get_attributes_from_keras_model(baseline_model)

    # Optimize model
    # ParameterEstimator is the objective and, in this case, the objective is to minimize the total number of parameters
    optimized_model = optimize_model(
        baseline_model, model_attributes, ParameterEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs, optimizer, loss_fn, metric, increasing, rtol
    )

    # Evaluate optimized model
    y_optimized = optimized_model.predict(X_test)
    acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
    sparsity, layers = get_model_sparsity(optimized_model)
    print(f'Optimized Keras accuracy: {acc_optimized}')
    print(f'Optimized Keras sparsity, overall: {sparsity}')
    print(f'Optimized Keras sparsity, per-layer: {layers}')
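
The per-step sparsity follows a polynomial decay between the initial and final sparsity, rising quickly at first and flattening as it approaches the target. A minimal sketch of such a schedule (for illustration only; the decay power here is an assumption, not necessarily the value hls4ml uses):

.. code-block:: python

    def polynomial_sparsity(step, max_steps, initial=0.0, final=0.5, power=3):
        # Polynomial decay, as in Zhu & Gupta (2017): large increments early on,
        # smaller ones as the schedule approaches the final sparsity
        return final + (initial - final) * (1 - step / max_steps) ** power

    # With 5 steps and a final sparsity of 50%, as in the example above:
    print([round(polynomial_sparsity(s, 5), 3) for s in range(1, 6)])
    # [0.244, 0.392, 0.468, 0.496, 0.5]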

In a similar manner, it is possible to target GPU FLOPs or Vivado DSPs. However, in those cases, sparsity is not equivalent to model sparsity; instead, it is the sparsity of the target resource. As an example, starting with a network utilizing 512 DSPs and a final sparsity of 50%, the optimized network will use 256 DSPs.
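
A quick sketch of that arithmetic (illustrative only):

.. code-block:: python

    # Resource sparsity measures the fraction of the target resource removed,
    # not the fraction of zero-valued weights
    baseline_dsp = 512
    final_sparsity = 0.5
    optimized_dsp = int(baseline_dsp * (1 - final_sparsity))  # 256 DSPs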

To optimize GPU FLOPs, the code is similar to the above:

.. code-block:: python

    from hls4ml.optimization.objectives.gpu_objectives import GPUFLOPEstimator

    # Optimize model
    # Note the change from ParameterEstimator to GPUFLOPEstimator
    optimized_model = optimize_model(
        baseline_model, model_attributes, GPUFLOPEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs, optimizer, loss_fn, metric, increasing, rtol
    )

    # Evaluate optimized model
    y_optimized = optimized_model.predict(X_test)
    acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
    print(f'Optimized Keras accuracy: {acc_optimized}')

    # Note the difference in the total number of parameters
    # Optimizing GPU FLOPs is equivalent to removing entire structures (filters, neurons) from the network
    baseline_model.summary()
    optimized_model.summary()

Finally, optimizing Vivado DSPs is possible, given an hls4ml config:

.. code-block:: python

    from hls4ml.utils.config import config_from_keras_model
    from hls4ml.optimization.objectives.vivado_objectives import VivadoDSPEstimator
    from hls4ml.optimization.attributes import get_attributes_from_keras_model_and_hls4ml_config

    # Create hls4ml config
    default_reuse_factor = 4
    default_precision = 'ac_fixed<16, 6>'
    hls_config = config_from_keras_model(
        baseline_model, granularity='name',
        default_precision=default_precision, default_reuse_factor=default_reuse_factor
    )
    hls_config['IOType'] = 'io_parallel'

    # Parse the hls4ml config into the model attributes, so the DSP estimator can use it
    model_attributes = get_attributes_from_keras_model_and_hls4ml_config(baseline_model, hls_config)

    # Optimize model
    # Note the change from ParameterEstimator to VivadoDSPEstimator
    optimized_model = optimize_model(
        baseline_model, model_attributes, VivadoDSPEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs, optimizer, loss_fn, metric, increasing, rtol
    )
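
Once optimized, the model can be converted and compiled with hls4ml as usual. A minimal sketch (the output directory and backend below are placeholder choices):

.. code-block:: python

    import hls4ml

    # Convert the optimized Keras model, reusing the config defined above
    hls_model = hls4ml.converters.convert_from_keras_model(
        optimized_model, hls_config=hls_config, output_dir='hls4ml_prj', backend='Vivado'
    )
    hls_model.compile()

    # Cross-check the HLS model against the optimized Keras model
    y_hls = hls_model.predict(np.ascontiguousarray(X_test))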
