Speed up compilation of all our WriteAttribute machinery. #11603

Merged

Conversation

bzbarsky-apple
Contributor

It turns out that instantiating fairly heavy-weight templates hundreds
of times is slow to compile.

Instead of having an instantiation per attribute, switch to only
instantiating the complex templates per type of attribute, with thin
per-attribute wrappers for auto-deriving the cluster id and attribute
id. This shaves over a minute of wall-clock time off compiling
chip-tool for me, and close to 2 minutes of total CPU time.
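
A minimal sketch of the pattern described above, not the actual code from this PR. Every name here (`WriteAttributeImpl`, `WriteAttribute`, `CallCount`, `WriteLog`, and the `*Info` structs with their ids) is a hypothetical stand-in chosen for illustration. The idea it tries to show: the expensive template is parameterized only on the value type, and a thin per-attribute wrapper supplies the cluster id and attribute id.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using ClusterId   = std::uint32_t;
using AttributeId = std::uint32_t;

// One record per simulated write, so the sketch has observable output.
struct WriteRecord
{
    ClusterId cluster;
    AttributeId attribute;
    std::string encoded;
};

inline std::vector<WriteRecord> & WriteLog()
{
    static std::vector<WriteRecord> log;
    return log;
}

// Per-type call counter: attributes that share a value type share this counter,
// which makes it visible that they go through the same instantiation.
template <typename T>
int & CallCount()
{
    static int count = 0;
    return count;
}

// Stand-in for the heavy-weight machinery. It depends only on the value type T,
// so the compiler instantiates it once per type, not once per attribute.
template <typename T>
void WriteAttributeImpl(ClusterId cluster, AttributeId attribute, const T & value)
{
    ++CallCount<T>();
    WriteLog().push_back({ cluster, attribute, std::to_string(static_cast<long long>(value)) });
}

// Hypothetical per-attribute metadata (ids are illustrative).
struct OnOffInfo
{
    using Type = bool;
    static constexpr ClusterId GetClusterId() { return 0x0006; }
    static constexpr AttributeId GetAttributeId() { return 0x0000; }
};

struct OnTimeInfo
{
    using Type = std::uint16_t;
    static constexpr ClusterId GetClusterId() { return 0x0006; }
    static constexpr AttributeId GetAttributeId() { return 0x4001; }
};

struct IdentifyTimeInfo
{
    using Type = std::uint16_t;
    static constexpr ClusterId GetClusterId() { return 0x0003; }
    static constexpr AttributeId GetAttributeId() { return 0x0000; }
};

// Thin per-attribute wrapper: it only derives the two ids and forwards,
// so instantiating it for every attribute is cheap.
template <typename AttributeInfo>
void WriteAttribute(const typename AttributeInfo::Type & value)
{
    WriteAttributeImpl<typename AttributeInfo::Type>(AttributeInfo::GetClusterId(), AttributeInfo::GetAttributeId(), value);
}
```

In this sketch, `WriteAttribute<OnTimeInfo>(120)` and `WriteAttribute<IdentifyTimeInfo>(30)` both forward to the single `WriteAttributeImpl<std::uint16_t>` instantiation, while `WriteAttribute<OnOffInfo>(true)` uses `WriteAttributeImpl<bool>`. The cost of the heavy template then scales with the number of distinct attribute types instead of the number of attributes.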

Problem

Slower compiles than we want.

Change overview

See above.

Testing

No behavior changes; did lots of compile-time measurements.

github-actions bot commented Nov 9, 2021

PR #11603: Size comparison from 89898f8 to fbc6e3d

Full report (9 builds for k32w, p6, qpg, telink)
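
| platform | target | config | section | 89898f8 | fbc6e3d | change | % change |
|---|---|---|---|---:|---:|---:|---:|
| k32w | lock-app | k32w061+debug | (read/write) | 592360 | 592360 | 0 | 0.0 |
| | | | .bss | 68524 | 68524 | 0 | 0.0 |
| | | | .data | 1880 | 1880 | 0 | 0.0 |
| | | | .text | 516156 | 516156 | 0 | 0.0 |
| | shell | k32w061+debug | (read/write) | 658016 | 658016 | 0 | 0.0 |
| | | | .bss | 79324 | 79324 | 0 | 0.0 |
| | | | .data | 1848 | 1848 | 0 | 0.0 |
| | | | .text | 571044 | 571044 | 0 | 0.0 |
| | lighting-app | k32w061+se05x+release | (read/write) | 699648 | 699648 | 0 | 0.0 |
| | | | .bss | 77996 | 77996 | 0 | 0.0 |
| | | | .data | 1912 | 1912 | 0 | 0.0 |
| | | | .text | 613940 | 613940 | 0 | 0.0 |
| p6 | all-clusters-app | default | (read/write) | 2299528 | 2299528 | 0 | 0.0 |
| | | | .bss | 112448 | 112448 | 0 | 0.0 |
| | | | .data | 2536 | 2536 | 0 | 0.0 |
| | | | .heap | 918360 | 918360 | 0 | 0.0 |
| | | | .text | 1257792 | 1257792 | 0 | 0.0 |
| | lock-app | default | (read/write) | 2212184 | 2212184 | 0 | 0.0 |
| | | | .bss | 101256 | 101256 | 0 | 0.0 |
| | | | .data | 2408 | 2408 | 0 | 0.0 |
| | | | .heap | 929680 | 929680 | 0 | 0.0 |
| | | | .text | 1170448 | 1170448 | 0 | 0.0 |
| qpg | lighting-app | qpg6100+debug | (read only) | 490776 | 490776 | 0 | 0.0 |
| | | | (read/write) | 114140 | 114140 | 0 | 0.0 |
| | | | .bss | 51152 | 51152 | 0 | 0.0 |
| | | | .data | 1012 | 1012 | 0 | 0.0 |
| | | | .text | 485456 | 485456 | 0 | 0.0 |
| | lock-app | qpg6100+debug | (read only) | 466988 | 466988 | 0 | 0.0 |
| | | | (read/write) | 114144 | 114144 | 0 | 0.0 |
| | | | .bss | 50096 | 50096 | 0 | 0.0 |
| | | | .data | 968 | 968 | 0 | 0.0 |
| | | | .text | 461668 | 461668 | 0 | 0.0 |
| | persistent-storage-app | qpg6100+debug | (read only) | 153400 | 153400 | 0 | 0.0 |
| | | | (read/write) | 114140 | 114140 | 0 | 0.0 |
| | | | .bss | 19616 | 19616 | 0 | 0.0 |
| | | | .data | 364 | 364 | 0 | 0.0 |
| | | | .text | 148080 | 148080 | 0 | 0.0 |
| telink | lighting-app | tlsr9518adk80d | (read/write) | 663750 | 663750 | 0 | 0.0 |
| | | | bss | 69272 | 69272 | 0 | 0.0 |
| | | | noinit | 33216 | 33216 | 0 | 0.0 |
| | | | text | 458596 | 458596 | 0 | 0.0 |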

github-actions bot commented Nov 9, 2021

PR #11603: Size comparison from 89898f8 to edb6f1c

Decreases (2 builds for linux)
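
| platform | target | config | section | 89898f8 | edb6f1c | change | % change |
|---|---|---|---|---:|---:|---:|---:|
| linux | chip-tool | debug | (read only) | 4995029 | 4615541 | -379488 | -7.6 |
| | | | .rodata | 242064 | 241360 | -704 | -0.3 |
| | | | .text | 4481717 | 4102933 | -378784 | -8.5 |
| | tv-app | debug | .data.rel.ro | 59448 | 59432 | -16 | -0.0 |
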
Full report (38 builds for efr32, esp32, k32w, linux, mbed, nrfconnect, p6, qpg, telink)
platform target config section 89898f8 edb6f1c change % change
efr32 lighting-app BRD4161A (read only) 742904 742904 0 0.0
(read/write) 116268 116268 0 0.0
.bss 114484 114484 0 0.0
.data 1784 1784 0 0.0
.text 742896 742896 0 0.0
BRD4161A+rpc (read only) 730440 730440 0 0.0
(read/write) 132892 132892 0 0.0
.bss 130988 130988 0 0.0
.data 1900 1900 0 0.0
.text 730432 730432 0 0.0
lock-app BRD4161A (read only) 722192 722192 0 0.0
(read/write) 114084 114084 0 0.0
.bss 112340 112340 0 0.0
.data 1744 1744 0 0.0
.text 722184 722184 0 0.0
window-app BRD4161A (read only) 723088 723088 0 0.0
(read/write) 114412 114412 0 0.0
.bss 112660 112660 0 0.0
.data 1748 1748 0 0.0
.text 723080 723080 0 0.0
esp32 all-clusters-app c3devkit (read only) 880694 880694 0 0.0
(read/write) 1306536 1306536 0 0.0
.dram0.bss 58464 58464 0 0.0
.dram0.data 16472 16472 0 0.0
.flash.rodata 198360 198360 0 0.0
.flash.text 880694 880694 0 0.0
.iram0.text 57526 57526 0 0.0
m5stack (read only) 911867 911867 0 0.0
(read/write) 423864 423864 0 0.0
.dram0.bss 60968 60968 0 0.0
.dram0.data 32108 32108 0 0.0
.flash.rodata 204624 204624 0 0.0
.flash.text 911867 911867 0 0.0
.iram0.text 125115 125115 0 0.0
k32w lighting-app k32w061+se05x+release (read/write) 699648 699648 0 0.0
.bss 77996 77996 0 0.0
.data 1912 1912 0 0.0
.text 613940 613940 0 0.0
lock-app k32w061+debug (read/write) 592360 592360 0 0.0
.bss 68524 68524 0 0.0
.data 1880 1880 0 0.0
.text 516156 516156 0 0.0
shell k32w061+debug (read/write) 658016 658016 0 0.0
.bss 79324 79324 0 0.0
.data 1848 1848 0 0.0
.text 571044 571044 0 0.0
linux all-clusters-app debug (read only) 1710601 1710601 0 0.0
(read/write) 126528 126528 0 0.0
.bss 57872 57872 0 0.0
.data 1042 1042 0 0.0
.data.rel.ro 62352 62352 0 0.0
.dynamic 592 592 0 0.0
.got 4088 4088 0 0.0
.init 27 27 0 0.0
.init_array 552 552 0 0.0
.rodata 139765 139765 0 0.0
.text 1437346 1437346 0 0.0
bridge-app debug+rpc (read only) 1298253 1298253 0 0.0
(read/write) 77072 77072 0 0.0
.bss 42768 42768 0 0.0
.data 1568 1568 0 0.0
.data.rel.ro 27760 27760 0 0.0
.dynamic 592 592 0 0.0
.got 3952 3952 0 0.0
.init 27 27 0 0.0
.init_array 408 408 0 0.0
.rodata 111540 111540 0 0.0
.text 1090821 1090821 0 0.0
chip-tool debug (read only) 4995029 4615541 -379488 -7.6
(read/write) 134760 134760 0 0.0
.bss 25840 25840 0 0.0
.data 2256 2256 0 0.0
.data.rel.ro 101232 101232 0 0.0
.dynamic 592 592 0 0.0
.got 4368 4368 0 0.0
.init 27 27 0 0.0
.init_array 432 432 0 0.0
.rodata 242064 241360 -704 -0.3
.text 4481717 4102933 -378784 -8.5
lighting-app debug+rpc (read only) 1557945 1557945 0 0.0
(read/write) 110088 110088 0 0.0
.bss 48432 48432 0 0.0
.data 1202 1202 0 0.0
.data.rel.ro 55168 55168 0 0.0
.dynamic 608 608 0 0.0
.got 4112 4112 0 0.0
.init 27 27 0 0.0
.init_array 528 528 0 0.0
.rodata 128977 128977 0 0.0
.text 1295410 1295410 0 0.0
ota-provider-app debug (read only) 1259721 1259721 0 0.0
(read/write) 75336 75336 0 0.0
.bss 44864 44864 0 0.0
.data 752 752 0 0.0
.data.rel.ro 24616 24616 0 0.0
.dynamic 592 592 0 0.0
.got 4016 4016 0 0.0
.init 27 27 0 0.0
.init_array 448 448 0 0.0
.rodata 113216 113216 0 0.0
.text 1050258 1050258 0 0.0
ota-requestor-app debug (read only) 1344281 1344281 0 0.0
(read/write) 79104 79104 0 0.0
.bss 47328 47328 0 0.0
.data 816 816 0 0.0
.data.rel.ro 25880 25880 0 0.0
.dynamic 592 592 0 0.0
.got 3992 3992 0 0.0
.init 27 27 0 0.0
.init_array 472 472 0 0.0
.rodata 124232 124232 0 0.0
.text 1121250 1121250 0 0.0
shell debug (read only) 789065 789065 0 0.0
(read/write) 65480 65480 0 0.0
.bss 23912 23912 0 0.0
.data 242 242 0 0.0
.data.rel.ro 36816 36816 0 0.0
.dynamic 592 592 0 0.0
.got 3528 3528 0 0.0
.init 27 27 0 0.0
.init_array 344 344 0 0.0
.rodata 78191 78191 0 0.0
.text 609362 609362 0 0.0
tv-app debug (read only) 1842281 1842281 0 0.0
(read/write) 407936 407936 0 0.0
.bss 340112 340112 0 0.0
.data 2736 2736 0 0.0
.data.rel.ro 59448 59432 -16 -0.0
.dynamic 592 592 0 0.0
.got 4408 4408 0 0.0
.init 27 27 0 0.0
.init_array 616 616 0 0.0
.rodata 156456 156456 0 0.0
.text 1541906 1541906 0 0.0
mbed all-clusters-app CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 2290856 2290856 0 0.0
.bss 179436 179436 0 0.0
.data 5232 5232 0 0.0
.heap 851776 851776 0 0.0
.text 1253456 1253456 0 0.0
lighting-app CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 2270952 2270952 0 0.0
.bss 172492 172492 0 0.0
.data 5584 5584 0 0.0
.heap 858368 858368 0 0.0
.text 1233552 1233552 0 0.0
lock-app CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 2248672 2248672 0 0.0
.bss 171388 171388 0 0.0
.data 5568 5568 0 0.0
.heap 859488 859488 0 0.0
.text 1211272 1211272 0 0.0
pigweed-app CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 1139744 1139744 0 0.0
.bss 11752 11752 0 0.0
.data 4368 4368 0 0.0
.heap 1020328 1020328 0 0.0
.text 103128 103128 0 0.0
shell CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 2048864 2048864 0 0.0
.bss 156456 156456 0 0.0
.data 4976 4976 0 0.0
.heap 875016 875016 0 0.0
.text 1011464 1011464 0 0.0
nrfconnect lighting-app nrf52840dk_nrf52840 (read/write) 862155 862155 0 0.0
bss 111460 111460 0 0.0
rodata 96924 96924 0 0.0
text 578128 578128 0 0.0
nrf52840dk_nrf52840+rpc (read/write) 824503 824503 0 0.0
bss 107812 107812 0 0.0
rodata 88104 88104 0 0.0
text 552276 552276 0 0.0
nrf5340dk_nrf5340_cpuapp (read/write) 787162 787162 0 0.0
bss 112832 112832 0 0.0
rodata 92180 92180 0 0.0
text 507600 507600 0 0.0
lock-app nrf52840dk_nrf52840 (read/write) 838831 838831 0 0.0
bss 110492 110492 0 0.0
rodata 93296 93296 0 0.0
text 559612 559612 0 0.0
nrf5340dk_nrf5340_cpuapp (read/write) 764142 764142 0 0.0
bss 111904 111904 0 0.0
rodata 88600 88600 0 0.0
text 489176 489176 0 0.0
pigweed-app nrf52840dk_nrf52840 (read/write) 497327 497327 0 0.0
bss 51824 51824 0 0.0
rodata 45780 45780 0 0.0
text 339436 339436 0 0.0
pump-app nrf52840dk_nrf52840 (read/write) 844955 844955 0 0.0
bss 110632 110632 0 0.0
rodata 95004 95004 0 0.0
text 563772 563772 0 0.0
pump-controller-app nrf52840dk_nrf52840 (read/write) 838699 838699 0 0.0
bss 110528 110528 0 0.0
rodata 93292 93292 0 0.0
text 559348 559348 0 0.0
shell nrf52840dk_nrf52840 (read/write) 776431 776431 0 0.0
bss 109280 109280 0 0.0
rodata 72564 72564 0 0.0
text 520004 520004 0 0.0
nrf5340dk_nrf5340_cpuapp (read/write) 691482 691482 0 0.0
bss 110264 110264 0 0.0
rodata 67204 67204 0 0.0
text 440612 440612 0 0.0
p6 all-clusters-app default (read/write) 2299528 2299528 0 0.0
.bss 112448 112448 0 0.0
.data 2536 2536 0 0.0
.heap 918360 918360 0 0.0
.text 1257792 1257792 0 0.0
lock-app default (read/write) 2212184 2212184 0 0.0
.bss 101256 101256 0 0.0
.data 2408 2408 0 0.0
.heap 929680 929680 0 0.0
.text 1170448 1170448 0 0.0
qpg lighting-app qpg6100+debug (read only) 490776 490776 0 0.0
(read/write) 114140 114140 0 0.0
.bss 51152 51152 0 0.0
.data 1012 1012 0 0.0
.text 485456 485456 0 0.0
lock-app qpg6100+debug (read only) 466988 466988 0 0.0
(read/write) 114144 114144 0 0.0
.bss 50096 50096 0 0.0
.data 968 968 0 0.0
.text 461668 461668 0 0.0
persistent-storage-app qpg6100+debug (read only) 153400 153400 0 0.0
(read/write) 114140 114140 0 0.0
.bss 19616 19616 0 0.0
.data 364 364 0 0.0
.text 148080 148080 0 0.0
telink lighting-app tlsr9518adk80d (read/write) 663750 663750 0 0.0
bss 69272 69272 0 0.0
noinit 33216 33216 0 0.0
text 458596 458596 0 0.0


@woody-apple woody-apple merged commit be75489 into project-chip:master Nov 9, 2021
@bzbarsky-apple bzbarsky-apple deleted the faster-write-compile branch November 9, 2021 23:35
PSONALl pushed a commit to PSONALl/connectedhomeip that referenced this pull request Dec 3, 2021
…ip#11603)
