Skip to content

Compact Byte Code parser and executor for Jerry (release branch). #813

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

zherczeg
Copy link
Member

Compact Byte Code parser and executor for Jerry.

JerryScript-DCO-1.0-Signed-off-by: Zoltan Herczeg zherczeg@inf.u-szeged.hu
JerryScript-DCO-1.0-Signed-off-by: László Langó llango.u-szeged@partner.samsung.com
JerryScript-DCO-1.0-Signed-off-by: Tamas Gergely tgergely.u-szeged@partner.samsung.com
JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély zsborbely.u-szeged@partner.samsung.com
JerryScript-DCO-1.0-Signed-off-by: Roland Takacs rtakacs.u-szeged@partner.samsung.com
JerryScript-DCO-1.0-Signed-off-by: István Kádár ikadar@inf.u-szeged.hu

@zherczeg
Copy link
Member Author

Hi all,

this pull request contains the release branch of a Compact Byte Code (CBC) parser and executor of Jerry. The short summary is we achieved progress on all aspects of Jerry, including memory consumption, performance and binary size.

The bad news is that the patch is huge, but the good news we don't need to fix everything before landing it. It is enough to focus on the style issues. The rest can be done in followup patches. The biggest problem is loosing the change log after squashing, and this makes tracking bugs quite difficult.

In the followings we show the results of the CBC work.

Full Test262 conformance in debug mode with 30s timeout (in debug some tests runs several minutes, they would pass, but excluded now).

=== Summary ===

  • Ran 11725 tests
  • Passed 11332 tests (96.6%)
  • Failed 393 tests (3.4%)

This shows that the CBC engine is stable, and there is no risk of moving to CBC. Of course the remaining issues will be fixed in (much) smaller commits.

Binary size:

  • Current master binary size: 202436 bytes
  • CBC binary size: 183648 bytes
  • Gain: 9.2%

Parsing results of the concatenation of IoT.js on Raspberry Pi 2:

Current master:

  • Heap stats:
    Heap size = 260096 bytes
    Chunk size = 64 bytes
    Allocated chunks count = 0
    Allocated = 0 bytes
    Waste = 0 bytes
    Peak allocated chunks count = 1322
    Peak allocated = 76540 bytes
    Peak waste = 10436 bytes
  • Runtime: 0.84 sec

CBC Parser:

Heap stats:
Heap size = 260096 bytes
Chunk size = 64 bytes
Allocated chunks count = 0
Allocated = 0 bytes
Waste = 0 bytes
Peak allocated chunks count = 793
Peak allocated = 41337 bytes
Peak waste = 9451 bytes

Note: Total byte code size is 12650 bytes

  • Runtime: 0.14 sec

Gain:

  • Memory: 46% memory reduction
  • Runtime: 6 times faster

SunSpider results:

Benchmark RSS
(+ is better)
Perf
(+ is better)
3d-cube.js 112-> 88 (21.429) 3.166-> 2.771 (12.476)
access-binary-trees.js 84-> 80 (4.762) 1.854-> 1.584 (14.563)
access-fannkuch.js 40-> 36 (10.000) 9.169-> 7.946 (13.338)
access-nbody.js 48-> 40 (16.667) 4.229-> 3.26 (22.913)
bitops-3bit-bits-in-byte.js 32-> 24 (25.000) 2.326-> 1.906 (18.057)
bitops-bits-in-byte.js 32-> 24 (25.000) 3.117-> 2.871 (7.892)
bitops-bitwise-and.js 32-> 24 (25.000) 3.575-> 3.971 (-11.077)
controlflow-recursive.js 220-> 136 (38.182) 1.589-> 1.052 (33.795)
crypto-aes.js 128-> 116 (9.375) 5.248-> 4.797 (8.594)
crypto-md5.js 184-> 180 (2.174) 22.17->21.632 (2.427)
crypto-sha1.js 132-> 124 (6.061) 10.417-> 9.721 (6.681)
date-format-tofte.js 72-> 60 (16.667) 3.342-> 2.334 (30.162)
date-format-xparb.js 72-> 60 (16.667) 1.749-> 1.235 (29.388)
math-cordic.js 40-> 28 (30.000) 3.349-> 3.17 (5.345)
math-partial-sums.js 32-> 28 (12.500) 1.999-> 1.963 (1.801)
math-spectral-norm.js 40-> 32 (20.000) 2.158-> 1.616 (25.116)
string-fasta.js 48-> 44 (8.333) 5.596-> 4.613 (17.566)
Geometric mean: RSS reduction: 17.4965% Speed up: 14.8297%

Only the runtime of one test is decreased, but everything else is improved. There are several tests with 25+ % memory or performance improvement. We did not focus on SunSpider results, it seems this is the effect of using a simpler design.

I think we achieved more than we initially expected. The key advantage of Compct Byte Code is combining simplicity and efficiency, and this is proved by reducing both memory and binary size, and improved performance.

Please review the patch and help us with feedback. Lets land this work soon and continue the development on the master branch.

@seanshpark
Copy link
Contributor

👍

@lemmaa
Copy link
Contributor

lemmaa commented Jan 21, 2016

Great! hope to be landing well shortly! :)

@zherczeg
Copy link
Member Author

@huxinx if you have some time, could you measure this patch on intel?

@huxinx
Copy link
Contributor

huxinx commented Jan 22, 2016

There are build errors on X86. I just did a rough fix to make compiling pass.

cbc_integration_dev_release...huxinx:cbc_integration_dev_release

Run ./tools/run-perf-test.sh 10 times,
OS, ubuntu 15.04, 32 bit
CPU, Intel(R) Celeron(R) CPU N2820 @ 2.13GHz, 2 core

Benchmark RSS
(+ is better)
Perf
(+ is better)
3d-cube.js 120-> 84 (30.000) 1.06978->0.883556 (17.408)
access-binary-trees.js 88-> 76 (13.636) 0.652889->0.508889 (22.056)
access-fannkuch.js 44-> 32 (27.273) 3.45689->2.89156 (16.354)
bitops-3bit-bits-in-byte.js 36-> 24 (33.333) 0.901778-> 0.672 (25.481)
bitops-bits-in-byte.js 32-> 24 (25.000) 1.17689->0.997333 (15.257)
bitops-bitwise-and.js 36-> 28 (22.222) 1.22222->1.27111 (-4.000)
bitops-nsieve-bits.js 160-> 152 (5.000) 10.0116-> 10.12 (-1.083)
controlflow-recursive.js 244-> 140 (42.623) 0.553333-> 0.324 (41.446)
crypto-aes.js 136-> 116 (14.706) 2.07067->1.82933 (11.655)
crypto-md5.js 196-> 172 (12.245) 10.592->10.4169 (1.653)
crypto-sha1.js 136-> 128 (5.882) 4.80133->4.60444 (4.101)
date-format-tofte.js 80-> 60 (25.000) 1.18978->0.812889 (31.677)
date-format-xparb.js 76-> 64 (15.790) 0.636889->0.420889 (33.915)
math-cordic.js 44-> 32 (27.273) 1.28533-> 1.128 (12.240)
math-spectral-norm.js 44-> 36 (18.182) 0.788889->0.535556 (32.113)
string-base64.js 168-> 160 (4.762) 71.8538->73.3244 (-2.047)
Geometric mean: RSS reduction: 20.8973% Speed up: 17.2996%

Mon Jan 25 07:47:57 EST 2016

Binary size:
Current master binary size: 292k
CBC binary size: 272k

@zherczeg
Copy link
Member Author

The numbers are too similar in the two columns. Are you sure you run the two different binaries?

@zherczeg
Copy link
Member Author

And please measure them on the edison board.

@jiangzidong
Copy link
Contributor

on Edison Board

Benchmark RSS
(+ is better)
Perf
(+ is better)
3d-cube.js 112-> 76 (32.143) 5.16->4.18333 (18.928)
access-binary-trees.js 84-> 64 (23.810) 3.13667->2.38333 (24.017)
access-fannkuch.js 40-> 20 (50.000) 16.5833->13.5733 (18.151)
bitops-3bit-bits-in-byte.js 28-> 16 (42.857) 4.3->3.23333 (24.806)
bitops-bits-in-byte.js 28-> 16 (42.857) 5.73333->4.81333 (16.046)
bitops-bitwise-and.js 32-> 16 (50.000) 5.94667->6.14667 (-3.363)
controlflow-recursive.js 240-> 132 (45.000) 2.71667-> 1.5 (44.785)
crypto-aes.js 128-> 108 (15.625) 10.0167-> 8.51 (15.042)
crypto-md5.js 188-> 168 (10.638) 50.9033->50.0267 (1.722)
crypto-sha1.js 132-> 116 (12.121) 23.06->22.0933 (4.192)
date-format-tofte.js 76-> 52 (31.579) 5.73333->3.70333 (35.407)
date-format-xparb.js 72-> 56 (22.222) 3.05667-> 1.98 (35.224)
math-cordic.js 36-> 24 (33.333) 6.13667->5.30333 (13.580)
math-spectral-norm.js 40-> 24 (40.000) 3.77667->2.54667 (32.568)
string-fasta.js
Geometric mean: RSS reduction: 33.5484% Speed up: 21.2257%
  • the original run-perf-test cannot run on edison, because 1) the sleep in busybox don't support float argument, I use usleep instead. 2) the timeout command is not included in the yoctoOS, So don't use the timeout
  • in Edison and my x86 host, the string-fasta.js failed after the patch

@huxinx
Copy link
Contributor

huxinx commented Jan 25, 2016

Update SunSpider results on NUC. @zherczeg

@zherczeg
Copy link
Member Author

Thank you very much!

@zherczeg
Copy link
Member Author

Regarding failures, this patch is quite big (1.3Mb at the moment), so it will likely cause breaks. We will fix these issues. Perhaps not everything in this huge patch, but in follow-up patches. Anyway, we just achieved 98.1% test262 conformance in debug mode, and preparing a new release patch soon.

@zherczeg zherczeg force-pushed the cbc_integration_dev_release branch from 75207b5 to d3c75d1 Compare January 25, 2016 09:59
@zherczeg
Copy link
Member Author

Release patch is done. Latest results on RPi2:

Benchmark RSS
(+ is better)
Perf
(+ is better)
3d-cube.js 108-> 88 (18.518) 4.339-> 3.8 (12.422)
access-binary-trees.js 84-> 76 (9.524) 2.543-> 2.189 (13.921)
access-fannkuch.js 40-> 32 (20.000) 12.024->10.302 (14.321)
access-nbody.js 44-> 36 (18.182) 5.164-> 4.348 (15.802)
bitops-3bit-bits-in-byte.js 36-> 24 (33.333) 2.714-> 2.437 (10.206)
bitops-bits-in-byte.js 36-> 24 (33.333) 3.649-> 3.136 (14.059)
bitops-bitwise-and.js 36-> 16 (55.556) 4.503-> 4.409 (2.087)
controlflow-recursive.js 228-> 140 (38.596) 1.76-> 1.233 (29.943)
crypto-aes.js 132-> 112 (15.152) 5.673-> 5.094 (10.206)
crypto-md5.js 192-> 176 (8.333) 26.151->26.401 (-0.956)
crypto-sha1.js 132-> 124 (6.061) 11.355->11.198 (1.383)
date-format-tofte.js 76-> 56 (26.316) 3.952-> 2.711 (31.402)
date-format-xparb.js 72-> 56 (22.222) 2.297-> 1.543 (32.825)
math-cordic.js 40-> 24 (40.000) 3.974-> 4.173 (-5.008)
math-partial-sums.js 36-> 28 (22.222) 2.602-> 2.511 (3.497)
math-spectral-norm.js 44-> 32 (27.273) 2.69-> 2.235 (16.915)
string-fasta.js 48-> 40 (16.667) 7.214-> 5.987 (17.009)
Geometric mean: RSS reduction: 25.357% Speed up: 13.6276%

@sand1k
Copy link
Contributor

sand1k commented Jan 26, 2016

@zherczeg, good job!

We have several questions:

  1. Is snapshot supported by this patch? This is an important feature, which should be working.
  2. Could you please update the PR, so that original jerry's bytecode would be available? I think we should add a build-time flag, indicating which bytecode to use.

@zherczeg
Copy link
Member Author

Hi,

  1. The snapshot support is not mandatory feature, so plan to add it later.
  2. we are considering this feature.

@akosthekiss
Copy link
Member

I'd argue against keeping both implementations in the code base. That would give only a little advantage over keeping the two implementations in branches/forks but will be a hell to maintain, and will certainly not point toward a best unified solution. I'm much concerned that the two byte code approaches will keep living their own more and more divergent life.

@zherczeg zherczeg closed this Jan 27, 2016
@zherczeg zherczeg force-pushed the cbc_integration_dev_release branch from d3c75d1 to fec49f5 Compare January 27, 2016 07:21
@zherczeg
Copy link
Member Author

Sry, accidentally closing this.

@zherczeg zherczeg reopened this Jan 27, 2016
@tilmannOSG
Copy link

I strongly agree with @akiss77 and share his concerns. I don't think it makes sense to have two different byte code implementations in the tree.

@sand1k @egavrin @ruben-ayrapetyan In the last IoT.js/JerryScript workshop all parties agreed that if the compact byte code turns out to be a significant improvement it will replace the existing implementation. Those goals have been met so we should proceed accordingly.

Making both byte code implementations coexist is a significant amount of effort with no perceived gain. This effort is better spent on improving JerryScript.

The snapshot support is an important feature but it should not block the merge of the compact byte code. The current patch is big enough as it is and as there are no (known) use cases where the lack of snapshotting causes a regression (the overall benefit of snapshotting is also reduced due to the increased efficiency of the compact byte code implementation).

@lemmaa , you seem to be very keen on getting the compact byte code merged, what is your perspective on this?

@lemmaa
Copy link
Contributor

lemmaa commented Jan 29, 2016

@tilmannOSG I really want to land this patch ASAP. At the same time, I also think that we would better to keep both implementations, at least for a while. Because we need to have enough time to investigate CBC implementations from every perspective. So, how about keep both implementations until next workshop and do investigation and improvement? One big problem in this approach is I can't find what is efficient way to keep the both. Build time configuration looks too complex to make patch, and keep in separate branch also in maintenance.

From the other point of view, I think we need to consider making followings completed before landing.

  • same Test-262 pass rate to current BC.
  • snapshot support.
  • compact profile support (I can't sure is it already considered or not)

As for snapshot, CBC reduces footprint of byte code very big, but main purpose of snapshot is not reducing the footprint of byte code but moving byte code storage to FLASH from RAM. So It need to be implemented even CBC landed. One more, the effect of snapshot is much higher in large framework codes, like IoT.js, than small benchmark code. Yes, IoT.js is actual use case of snapshot. :)

@zherczeg, @egavrin Do you have good idea to keep both implementations?

@akosthekiss
Copy link
Member

@lemmaa AFAIK

  • test262 pass rate is better with CBC (with no regressions);
  • snapshot support is already implemented in the development branch (and this PR will get updated to incorporate that enhancement - together with various others); and
  • the same holds for compact profile support.

@zherczeg can confirm the above (and give an estimation when the next PR update from the dev branch is planned to happen)

@zherczeg
Copy link
Member Author

tests-262, jerry-release:

  • current Jerry master: 20 failures (excluding intl)
  • current cbc dev branch: 12 failures (excluding intl)

The 12 failures are also part of the 20, so new failures introduced.

  • We finished the snapshot support yesterday. Today we run SunSpider to check whether it works. It worked.
  • We also added compact profile because it is tested by "make precommit".

The difficulty for having multiple implementations in one trunk:

  • Header names (same name: conflict, different name: duplicates includes)
  • Testing (maintenance)
  • New features requires double implementation

I fully understand the risk of moving to a new implementation. The question is whether all my tools will work? Please try them and we will fix the issues you encounter in a short time. I think maintaining two implementations is much more resource consuming than fixing the issues. We can also check out the current byte code from git if we need it in the future.

For example, we have already been aware that one line needs to be changed in IoT.js for cbc: there is no need to include the zero terminator for strings passed to jerry_eval (zero is not a valid JavaScript character, and CBC correctly throws a syntax error). After that change, all IoT.js tests are passed. But you might encounter other issues as well. Perhaps we should open an IoT.js branch for cbc, and collect such fixes there.

@zherczeg zherczeg force-pushed the cbc_integration_dev_release branch from a6977f3 to 3d8075c Compare February 1, 2016 07:43
@zherczeg
Copy link
Member Author

zherczeg commented Feb 1, 2016

New commit with snapshot support. "make precommit" runs cleanly. Please try it and let us know if something does not work.

@tilmannOSG
Copy link

Yes, as mentioned before, having two byte code implementations in the tree is an absolute maintenance nightmare. This is not a practical way to move forward. JerryScript is a small project and we need to be careful to spend our limited resources on efforts which add significant value to the project.

As @zherczeg said, there's always a risk in moving to a new implementation. However, our current results indicate that the compact byte code implementation is a strict improvement both in correctness and performance. In addition, the authors of this patch already have a track record of addressing any identified issues quickly. Moving to the new byte code implementation is also fully transparent to the users.

This pull request has been open for one and a half weeks now and to my knowledge there was not a single issue reported.

Everyone who has concerns about this patch please test it with your code and report any issues you found.

@huxinx If you could test the new implementation with your own use cases we would highly appreciate any feedback :)

I propose that we allow for another week of testing and assuming no blocking issues are found we proceed with the merge.

@lemmaa
Copy link
Contributor

lemmaa commented Feb 1, 2016

@akiss77 @zherczeg Thanks for explanation, it looks like all issues cleared for me. :) Today, @wateret added cbc_integration_dev to daily benchmark graph of http://samsung.github.io/jerryscript/benchmark/benchmark.html?show=sum, for reference.

@egavrin We need your opinion about this, please.

@zherczeg
Copy link
Member Author

zherczeg commented Feb 1, 2016

Thank you @lemmaa. Progress is visible on those charts as well.

@zherczeg zherczeg force-pushed the cbc_integration_dev_release branch from 3d8075c to 5f45684 Compare February 1, 2016 12:12
@zherczeg
Copy link
Member Author

zherczeg commented Feb 1, 2016

We tried IoT.js with snapshot. It works after the following two changes:

  • Land Move jerry_port functions into jerry-core #824 (IoT.js does not compile with the latest Jerry).
  • In iotjs_module_process.cpp, the +1 after 'String code("", len1 + len2 + len3 + 1);' needs to be removed. The String automatically allocates memory for the terminator zero value.

Both changes are unrelated to CBC.

@huxinx
Copy link
Contributor

huxinx commented Feb 2, 2016

Latest results on NUC.

OS, ubuntu 15.04, 32 bit
CPU, Intel(R) Celeron(R) CPU N2820 @ 2.13GHz, 2 core

Benchmark RSS
(+ is better)
Perf
(+ is better)
3d-cube.js 120-> 92 (23.333) 1.06933->0.878222 (17.872)
access-binary-trees.js 88-> 76 (13.636) 0.653333->0.500889 (23.333)
access-fannkuch.js 44-> 36 (18.182) 3.46133->2.87778 (16.859)
bitops-3bit-bits-in-byte.js 32-> 28 (12.500) 0.901333->0.651111 (27.761)
bitops-bits-in-byte.js 36-> 28 (22.222) 1.17778->0.967111 (17.887)
bitops-bitwise-and.js 36-> 20 (44.444) 1.22->1.29556 (-6.193)
bitops-nsieve-bits.js 156-> 148 (5.128) 10.0191->10.1031 (-0.838)
controlflow-recursive.js 244-> 140 (42.623) 0.552->0.295556 (46.457)
crypto-aes.js 132-> 116 (12.121) 2.06667->1.83022 (11.441)
crypto-md5.js 192-> 180 (6.250) 10.5973->10.4102 (1.765)
crypto-sha1.js 136-> 132 (2.941) 4.80444->4.59733 (4.311)
date-format-tofte.js 80-> 68 (15.000) 1.18844->0.796889 (32.947)
date-format-xparb.js 80-> 60 (25.000) 0.637333->0.419556 (34.170)
math-cordic.js 40-> 28 (30.000) 1.28578->1.08844 (15.348)
math-spectral-norm.js 44-> 32 (27.273) 0.790667-> 0.52 (34.233)
string-fasta.js 56-> 44 (21.429) 2.15067-> 1.804 (16.119)
Geometric mean: RSS reduction: 21.0608% Speed up: 19.5892%

Tue Feb 2 06:29:51 EST 2016

@zherczeg
Copy link
Member Author

zherczeg commented Feb 2, 2016

Thank you for the measurement. I suppose that means the compilation issues you encountered before were fixed.

I hope @sand1k @egavrin @ruben-ayrapetyan give their positive feedback soon and we can land this four month of work. I am not aware of anything else that blocks the landing process.

@zherczeg
Copy link
Member Author

zherczeg commented Feb 4, 2016

Since no serious issues has been reported, we would like to land this patch tomorrow. If anybody encounters any bugs please report them and we will fix them. Please report only the bugs. I am aware that in a 40000 lines of patch there is a lot of opportunity to refactor certain parts, but that would delay the landing of this major advancement. Of course we can (should!) refactor those parts in smaller patches later.

JerryScript-DCO-1.0-Signed-off-by: László Langó llango.u-szeged@partner.samsung.com
JerryScript-DCO-1.0-Signed-off-by: Tamas Gergely tgergely.u-szeged@partner.samsung.com
JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély zsborbely.u-szeged@partner.samsung.com
JerryScript-DCO-1.0-Signed-off-by: Roland Takacs rtakacs.u-szeged@partner.samsung.com
JerryScript-DCO-1.0-Signed-off-by: István Kádár ikadar@inf.u-szeged.hu
JerryScript-DCO-1.0-Signed-off-by: Zoltan Herczeg zherczeg.u-szeged@partner.samsung.com
@zherczeg zherczeg force-pushed the cbc_integration_dev_release branch from 5f45684 to 876bd62 Compare February 5, 2016 09:14
@zherczeg
Copy link
Member Author

zherczeg commented Feb 5, 2016

Landed in 4d2dd22
Thank you for all help and feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants