The following observations are made from the performance test:

. **Performance Scaling**: The increase from 520 bytes to 8000 bytes (a 15.4x size increase) results in approximately a 9.8x performance degradation (19,173 ns/op vs 1,947 ns/op).
This represents sub-linear scaling, which suggests the implementation handles large data efficiently (a quick numeric sanity check of these ratios follows this list).

. **Instruction Count Scaling**: Instructions per operation increase from 27,681 to 285,220 (a 10.3x increase), closely matching the performance degradation, indicating the bottleneck is primarily computational rather than memory bandwidth.

. **Throughput Impact**: Operations per second decrease from 513,698 op/s to 52,158 op/s, representing a 9.8x reduction in throughput.

. **Cache Efficiency**: The IPC (Instructions Per Cycle) remains relatively stable (6.759 vs 7.094), suggesting the larger working set introduces no significant cache or memory stalls and the CPU pipeline stays well utilized despite the increased data size.

. **Branch Prediction**: The branch mis-prediction rate increases only slightly (from 0.0% to 0.4%), indicating minimal impact on branch prediction accuracy.
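
As a quick sanity check on the sub-linear-scaling claim above, the ratios can be recomputed directly from the reported figures. The snippet below is standalone and purely illustrative; it is not part of the benchmark suite.

[source,cpp]
-----
#include <cstdio>

int main()
{
    // Figures quoted in the observations above (520-byte vs 8000-byte pre-image).
    const double ns_per_op_520   = 1947.0;
    const double ns_per_op_8000  = 19173.0;
    const double ins_per_op_520  = 27681.0;
    const double ins_per_op_8000 = 285220.0;

    // The element grows 15.4x, but time and instruction count grow only ~9.8x
    // and ~10.3x, so the per-byte cost actually drops for the larger element.
    std::printf("size ratio:        %.1fx\n", 8000.0 / 520.0);
    std::printf("time ratio:        %.1fx\n", ns_per_op_8000 / ns_per_op_520);
    std::printf("instruction ratio: %.1fx\n", ins_per_op_8000 / ins_per_op_520);
    std::printf("ns per byte:       %.2f (520 B) vs %.2f (8000 B)\n",
                ns_per_op_520 / 520.0, ns_per_op_8000 / 8000.0);
    return 0;
}
-----
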
**Key**

[cols="1,6", options="header"]
|===
| Metric | Description
| ns/op | Nanoseconds per operation - average time it takes to complete one benchmark iteration
| op/s | Operations per second - throughput rate showing how many benchmark iterations can be completed per second
| err% | Error percentage - statistical margin of error in the measurement, indicating the reliability of the benchmark results
| ins/op | Instructions per operation - the number of CPU instructions executed for each benchmark iteration
| cyc/op | CPU cycles per operation - the number of CPU clock cycles consumed for each benchmark iteration
| total | Total benchmark time - the total wall-clock time spent running the entire benchmark in seconds
|===
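
Several of these metrics are directly related: op/s is the reciprocal of ns/op, and IPC is ins/op divided by cyc/op. The standalone snippet below makes those relationships concrete using figures quoted in the observations above; cyc/op for that run is not quoted, so it is derived here purely for illustration.

[source,cpp]
-----
#include <cstdio>

int main()
{
    // 8000-byte SHA256 case, using figures quoted in the observations above.
    const double ns_per_op  = 19173.0;
    const double ins_per_op = 285220.0;
    const double ipc        = 7.094;   // one of the reported IPC values

    // op/s is the reciprocal of ns/op (there are 1e9 ns in a second).
    std::printf("op/s   ~ %.0f\n", 1e9 / ns_per_op);   // ~52157, matching the reported 52,158 op/s
    // cyc/op follows from ins/op and IPC (derived, not reported above).
    std::printf("cyc/op ~ %.0f\n", ins_per_op / ipc);  // ~40206
    return 0;
}
-----
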
Even though 64 bytes doesn't require padding (it's exactly one SHA256 block), the ins/op still increases from 8,736 to 11,107 instructions. Here's why:

The increase from 8,736 to 11,107 instructions (roughly a 27% increase) suggests that, even without padding overhead, the additional data movement and processing of "real" data versus padded data adds a significant number of instructions.
This is a good example of how seemingly small changes in input size can affect the underlying implementation's code paths and optimization strategies.

NOTE: This test is likely irrelevant as per the latest BIP-0360: _To prevent OP_DUP from creating an 8 MB stack by duplicating stack elements larger than 520 bytes we define OP_DUP to fail on stack elements larger than 520 bytes_.

This test builds on the previous test (which hashes a large stack element) by duplicating that stack element before hashing it.

The following Bitcoin script is used to conduct this performance test:

-----
<pre-image array> OP_DUP OP_SHA256 OP_DROP OP_1
-----

When executed, this script first pushes the pre-image array of arbitrary data onto the stack.
Immediately after, an `OP_DUP` operation duplicates the pre-image array on the stack.
Then, the SHA256 hash function pops the duplicated pre-image array off the stack, hashes it, and pushes the result onto the top of the stack.
The `OP_DROP` operation then removes the hash result from the stack, and `OP_1` pushes the value 1 so that the script completes successfully.
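
For reference, a benchmark of this script could be wired into Bitcoin Core's bench framework roughly as follows. This is a sketch rather than the actual code in the p2qrh branch: the benchmark name, the 8000-byte element size, the fill byte, and the `BENCHMARK` registration arguments are assumptions, and the harness macros may differ between Core versions.

[source,cpp]
-----
#include <bench/bench.h>
#include <script/interpreter.h>
#include <script/script.h>
#include <script/script_error.h>

#include <cassert>
#include <vector>

// Hypothetical benchmark exercising <pre-image array> OP_DUP OP_SHA256 OP_DROP OP_1.
static void DupSha256LargeStackElement(benchmark::Bench& bench)
{
    // 8000-byte pre-image; evaluating a push this large only succeeds on a build
    // whose MAX_SCRIPT_ELEMENT_SIZE has been raised, as in the branch under test.
    const std::vector<unsigned char> preimage(8000, 0x61);

    CScript script;
    script << preimage << OP_DUP << OP_SHA256 << OP_DROP << OP_1;

    bench.run([&] {
        std::vector<std::vector<unsigned char>> stack;
        ScriptError err;
        const bool ok = EvalScript(stack, script, SCRIPT_VERIFY_NONE,
                                   BaseSignatureChecker(), SigVersion::BASE, &err);
        assert(ok);
    });
}

BENCHMARK(DupSha256LargeStackElement, benchmark::PriorityLevel::HIGH);
-----
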
* The additional instructions for OP_DUP are relatively small compared to the SHA256 computation

. Memory Operations:
+
The OP_DUP operation primarily affects memory operations rather than computational complexity.
This explains why the impact diminishes with larger data sizes where SHA256 computation dominates the performance.

This analysis shows that the OP_DUP operation has a measurable but manageable performance impact, especially for larger stack elements where the computational overhead of SHA256 dominates the overall execution time.
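
The copy-versus-hash asymmetry described above can be illustrated with a small standalone program. This is not code from the p2qrh branch; it only assumes Bitcoin Core's `CSHA256` helper from `crypto/sha256.h` and compares repeated copies of an 8000-byte buffer (roughly the work `OP_DUP` adds) against SHA256 over the same bytes.

[source,cpp]
-----
#include <crypto/sha256.h>  // Bitcoin Core's CSHA256 helper (assumed to be on the include path)

#include <chrono>
#include <cstdio>
#include <vector>

int main()
{
    const std::vector<unsigned char> element(8000, 0x61);
    constexpr int kIterations = 10000;
    using clock = std::chrono::steady_clock;
    volatile unsigned char sink = 0;  // keeps results observable so the loops are not optimized away

    // "OP_DUP-like" work: copy the element, as duplicating the top stack element does.
    const auto t0 = clock::now();
    for (int i = 0; i < kIterations; ++i) {
        std::vector<unsigned char> dup(element);
        sink = sink + dup.back();
    }
    const auto t1 = clock::now();

    // "OP_SHA256-like" work: hash the same bytes (one compression per 64-byte block).
    unsigned char digest[CSHA256::OUTPUT_SIZE];
    for (int i = 0; i < kIterations; ++i) {
        CSHA256().Write(element.data(), element.size()).Finalize(digest);
        sink = sink + digest[0];
    }
    const auto t2 = clock::now();

    const auto us = [](clock::duration d) {
        return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
    };
    std::printf("copy   x%d: %lld us\n", kIterations, static_cast<long long>(us(t1 - t0)));
    std::printf("sha256 x%d: %lld us\n", kIterations, static_cast<long long>(us(t2 - t1)));
    return 0;
}
-----
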
=== Procedure
* Testing is done using functionality found in the link:https://github.com/jbride/bitcoin/tree/p2qrh[p2qrh branch] of Bitcoin Core.
* Compilation of Bitcoin Core is done using the following `cmake` flags:
+
-----
$ cmake \
  -B build \
  -DWITH_ZMQ=ON \
  -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
  -DBUILD_BENCH=ON
-----

* Bench tests are conducted in a manner similar to the following: