Skip to content

Replace bit field manipulation functions with macros #913

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

akosthekiss
Copy link
Member

JerryScript-DCO-1.0-Signed-off-by: Akos Kiss akiss@inf.u-szeged.hu

@akosthekiss akosthekiss added enhancement An improvement performance Affects performance binary size Affects binary size labels Feb 22, 2016
@LaszloLango
Copy link
Contributor

I have related work in LaszloLango@1223b99
There is much optimization possibility in ecma_make_*_value functions too. My commit has ~10% performance gain, but the binary size is bigger with 8K.

@akosthekiss
Copy link
Member Author

This patch achieves quite good code size reduction on arm targets (2-8K on release builds).

filename master/text master/data pr/text pr/data gain/text gain/data
release.linux-armv7lhf/jerry 195943 52 187192 52 8751 0
release.linux-armv7lhf-cp/jerry 142148 52 136555 52 5593 0
release.linux-armv7lhf-cp_minimal/jerry 87957 0 85396 0 2561 0
release.linux-armv7lhf-mem_stats/jerry 196539 52 188297 52 8242 0
release.linux-armv7lhf-mem_stress_test/jerry 195575 52 187588 52 7987 0
release.linux-x86_64/jerry 278590 61 277340 52 1250 9
release.linux-x86_64-cp/jerry 210702 61 210010 52 692 9
release.linux-x86_64-cp_minimal/jerry 132993 0 132442 0 551 0
release.linux-x86_64-mem_stats/jerry 280071 61 278549 52 1522 9
release.linux-x86_64-mem_stress_test/jerry 279031 61 277469 52 1562 9
release.mcu_stm32f3-cp.bin/jerry 142748 72 140232 72 2516 0
release.mcu_stm32f3-cp_minimal.bin/jerry 82436 20 80912 20 1524 0
release.mcu_stm32f4-cp.bin/jerry 142748 72 140232 72 2516 0
release.mcu_stm32f4-cp_minimal.bin/jerry 82436 20 80912 20 1524 0
debug.linux-armv7lhf/jerry 734016 60 733252 60 764 0
debug.linux-armv7lhf-cp/jerry 537776 60 537020 60 756 0
debug.linux-armv7lhf-cp_minimal/jerry 368272 24 366996 24 1276 0
debug.linux-armv7lhf-mem_stats/jerry 737516 60 736660 60 856 0
debug.linux-armv7lhf-mem_stress_test/jerry 734064 60 733300 60 764 0
debug.linux-x86_64/jerry 817371 68 816185 68 1186 0
debug.linux-x86_64-cp/jerry 635223 68 633962 68 1261 0
debug.linux-x86_64-cp_minimal/jerry 443502 32 442300 32 1202 0
debug.linux-x86_64-mem_stats/jerry 821176 68 820710 68 466 0
debug.linux-x86_64-mem_stress_test/jerry 817391 68 816205 68 1186 0
debug.mcu_stm32f4-cp.bin/jerry 547212 76 546324 76 888 0
debug.mcu_stm32f4-cp_minimal.bin/jerry 367396 44 366500 44 896 0

Performance on x86_64/linux also improved slightly, especially on the longer running test cases (crypto-md5.js and crypto-sha1.js).

                  Benchmark |                   Perf(+ is better)
                 3d-cube.js |   0.259s ->   0.262s :  -1.093%  (+-1.765%) : [≈]
     access-binary-trees.js |   0.137s ->   0.132s :  +3.519%  (+-1.882%) : [+]
         access-fannkuch.js |   0.719s ->   0.717s :  +0.255%  (+-1.278%) : [≈]
            access-nbody.js |   0.316s ->   0.307s :  +2.847%  (+-0.907%) : [+]
bitops-3bit-bits-in-byte.js |   0.205s ->   0.203s :  +0.893%  (+-2.276%) : [≈]
     bitops-bits-in-byte.js |   0.262s ->   0.269s :  -2.610%  (+-2.358%) : [-]
      bitops-bitwise-and.js |   0.308s ->   0.296s :  +3.794%  (+-1.580%) : [+]
   controlflow-recursive.js |   0.090s ->   0.090s :  -0.557%  (+-7.179%) : [≈]
              crypto-aes.js |   0.508s ->   0.494s :  +2.854%  (+-1.115%) : [+]
              crypto-md5.js |   3.048s ->   2.736s : +10.226%  (+-0.374%) : [+]
             crypto-sha1.js |   1.351s ->   1.213s : +10.178%  (+-0.343%) : [+]
       date-format-tofte.js |   0.214s ->   0.208s :  +2.802%  (+-1.420%) : [+]
       date-format-xparb.js |   0.100s ->   0.097s :  +2.663%  (+-4.337%) : [≈]
             math-cordic.js |   0.293s ->   0.293s :  -0.057%  (+-2.025%) : [≈]
       math-partial-sums.js |   0.160s ->   0.159s :  +1.143%  (+-3.549%) : [≈]
      math-spectral-norm.js |   0.148s ->   0.148s :  +0.337%  (+-2.330%) : [≈]
            string-fasta.js |   0.449s ->   0.430s :  +4.306%  (+-1.561%) : [+]
            Geometric mean: |                 Speed up: 2.501% (+-0.650%) : [+]

* @param width width of the bit-field to be extracted
* @return bit-field's value
*/
#define jrt_extract_bit_field(container, lsb, width) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use uppercase for name of macros
#define JRT_EXTRACT_BIT_FIELD(container, lsb, width) \

* @return bit-field's value
*/
uint64_t __attr_const___
jrt_set_bit_field_value (uint64_t container, /**< container to insert bit-field to */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why could not make this inline (in release mode)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline declaration in this C file can only have effect if we have LTO. Macros work whatever compiler/linker or setting we use.

Moreover, another issue with the current function-based approach is that it always casts everything to uint64_t, even if we would like to work with a different type. And then we cast the returned uint64_t again back to our original type. That cast is not something a compiler can optimize away.

@akosthekiss akosthekiss force-pushed the bit-field-macros branch 2 times, most recently from f099c4d to a7e8a97 Compare February 22, 2016 15:44
@akosthekiss
Copy link
Member Author

Rebased and pushed. (Also: found a solution to get rid of not-so-nice type casts that were present in previous versions of this patch.)

* @return bit-field's value
*/
#define JRT_EXTRACT_BIT_FIELD(container, lsb, width) \
(((container) >> lsb) & ((((__typeof__ (container)) 1) << (width)) - 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeof is std c99?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, that you mention... Interestingly, the build system does not complain about it. I see two ways of dealing with this:

  • leave it in the currently proposed form, or
  • let the macros have one more argument requiring the type of the container explicitly.

This first approach is a bit less standard-conforming (but seems to be supported by all compiilers out there, at least by clang for sure), while the second one needs more maintenance if the type of the first argument changes (anywhere where the macro is used).

Opinions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer the second, since we want to maintain stdc99 support.

@akosthekiss
Copy link
Member Author

@zherczeg I've updated the patch as suggested. Does it look better?

* @return bit-field's value
*/
#define JRT_EXTRACT_BIT_FIELD(type, container, lsb, width) \
(((container) >> lsb) & ((((type) 1) << (width)) - 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why did you remove the following asserts?

JERRY_ASSERT (lsb < JERRY_BITSINBYTE * sizeof (uint64_t));
JERRY_ASSERT (width < JERRY_BITSINBYTE * sizeof (uint64_t));
JERRY_ASSERT ((lsb + width) <= JERRY_BITSINBYTE * sizeof (uint64_t));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(((container) >> lsb) & ((((type) 1u) << (width)) - 1u))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had two reasons to remove the asserts:

  • I didn't know how to insert them in the macro (which is an expression only, not a block of statements), and
  • during the first experiments with the patch the compiler did emit warnings (which were turned to errors) when lsb and/or width was 'incompatible' with the type of container, and I've considered them as good replacements of the asserts.

As for the 1us: in the first case, it is certainly superfluous, since 1 is cast to type. In the second case, it might also be better to type-cast it instead of using the hard-coded 1u. IMHO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the do { ... } while (0) trick could work here too. But if the compiler emit warnings, then it's Ok for me. The debug version of jerry runs really slow, so we should use assertions where they are necessary.

#define JRT_EXTRACT_BIT_FIELD(type, container, lsb, width) \
do \
{ \
  JERRY_ASSERT (lsb < JERRY_BITSINBYTE * sizeof (type)); \
  JERRY_ASSERT (width < JERRY_BITSINBYTE * sizeof (type)); \
  JERRY_ASSERT ((lsb + width) <= JERRY_BITSINBYTE * sizeof (type)); \
  (((container) >> lsb) & ((((type) 1) << (width)) - 1)) \
} while (0)

@LaszloLango
Copy link
Contributor

LGTM

@zherczeg
Copy link
Member

In general functions are easier to debug but I have no strong opinion, because I also use a lot of macros. I did not find the RPi2 results, am I missing something?

@bzsolt
Copy link
Member

bzsolt commented Feb 25, 2016

RPi2 results
master (76b37f3)

                  Benchmark |                  Perf(+ is better)
                 3d-cube.js |  2.627s ->   2.543s :  +3.173%  (+-3.506%) : [≈]
     access-binary-trees.js |  1.407s ->   1.370s :  +2.607%  (+-2.291%) : [+]
         access-fannkuch.js |  7.960s ->   7.967s :  -0.084%  (+-2.530%) : [≈]
            access-nbody.js |  3.070s ->   3.077s :  -0.217%  (+-1.078%) : [≈]
bitops-3bit-bits-in-byte.js |  1.730s ->   1.723s :  +0.386%  (+-1.912%) : [≈]
     bitops-bits-in-byte.js |  2.653s ->   2.603s :  +1.884%  (+-3.494%) : [≈]
      bitops-bitwise-and.js |  3.680s ->   3.663s :  +0.453%  (+-3.587%) : [≈]
   controlflow-recursive.js |  0.890s ->   0.870s :  +2.247%  (+-0.000%) : [+]
              crypto-aes.js |  4.730s ->   4.643s :  +1.832%  (+-2.769%) : [≈]
              crypto-md5.js | 21.490s ->  21.847s :  -1.660%  (+-0.982%) : [-]
             crypto-sha1.js |  9.750s ->   9.817s :  -0.684%  (+-3.406%) : [≈]
       date-format-tofte.js |  2.233s ->   2.223s :  +0.448%  (+-2.090%) : [≈]
       date-format-xparb.js |  1.087s ->   1.053s :  +3.068%  (+-4.240%) : [≈]
             math-cordic.js |  2.977s ->   2.943s :  +1.120%  (+-2.463%) : [≈]
       math-partial-sums.js |  1.660s ->   1.670s :  -0.602%  (+-0.000%) : [-]
      math-spectral-norm.js |  1.503s ->   1.457s :  +3.104%  (+-3.064%) : [+]
            string-fasta.js |  4.537s ->   4.467s :  +1.543%  (+-3.251%) : [≈]
            Geometric mean: |                Speed up: 1.106% (+-0.661%) : [+]

Binary sizes (bytes): 197,240 -> 193,144

@zherczeg
Copy link
Member

LGTM

JerryScript-DCO-1.0-Signed-off-by: Akos Kiss akiss@inf.u-szeged.hu
@akosthekiss akosthekiss merged commit c9f5950 into jerryscript-project:master Feb 29, 2016
@akosthekiss akosthekiss deleted the bit-field-macros branch February 29, 2016 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binary size Affects binary size enhancement An improvement performance Affects performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants