-
Notifications
You must be signed in to change notification settings - Fork 683
Use CESU-8 lookup table to improve lit_get_unicode_char_size_by_utf8_… #793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -757,6 +757,15 @@ lit_utf8_string_code_unit_at (const lit_utf8_byte_t *utf8_buf_p, /**< utf-8 stri | |||
return code_unit; | |||
} /* lit_utf8_string_code_unit_at */ | |||
|
|||
/* CESU-8 number of bytes occupied lookup table */ | |||
const lit_utf8_size_t table[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems OK, but I would reduce the table size to uint8 from uint32 to reduce the table size from 64 to 16 bytes (the values are small anyway). Also the table should be aligned to 16 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zherczeg, why do we need the table alignment?
JerryScript-DCO-1.0-Signed-off-by: Xin Hu Xin.A.Hu@intel.com Run ./tools/run-perf-test.sh 10 times,
Tue Dec 29 12:23:25 EST 2015 To replace uint32 with uint8, align to 16 bytes. Done. |
…first_byte performance JerryScript-DCO-1.0-Signed-off-by: Xin Hu Xin.A.Hu@intel.com
(first_byte >> 4) == 3 || (first_byte >> 4) == 4 || | ||
(first_byte >> 4) == 5 || (first_byte >> 4) == 6 || | ||
(first_byte >> 4) == 7 || (first_byte >> 4) == 12 || | ||
(first_byte >> 4) == 13 || (first_byte >> 4) == 14)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no objection against big asserts, but in this case wouldn't be better to use (first_byte >> 4) <= 7 ? This is just a note, so if others prefer the current one, it is ok for me.
Wed Jan 21 23:57:18 UTC 1970
|
@@ -757,6 +757,15 @@ lit_utf8_string_code_unit_at (const lit_utf8_byte_t *utf8_buf_p, /**< utf-8 stri | |||
return code_unit; | |||
} /* lit_utf8_string_code_unit_at */ | |||
|
|||
/* CESU-8 number of bytes occupied lookup table */ | |||
const __attribute__ ((aligned (CESU_8_TABLE_MEM_ALIGNMENT))) lit_utf8_byte_t table[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the alignment attribute?
I think this patch has a new version without table. Anyway, alignment improves cache locality (ensures the table consumes only one page). |
@zherczeg, I see. Thanks for explanation. Then its better to explain the reason in comments. |
Let's close this PR. |
JerryScript-DCO-1.0-Signed-off-by: Xin Hu Xin.A.Hu@intel.com
Run ./tools/run-perf-test.sh 10 times,
OS, ubuntu 15.04, 32 bit
CPU, Intel(R) Celeron(R) CPU N2820 @ 2.13GHz, 2 core
(+ is better)
(+ is better)
Tue Dec 29 06:36:58 EST 2015
In lit_get_unicode_char_size_by_utf8_first_byte(), use a lookup table to get the cesu-8 length.
Then there is no if/else branch in lit_get_unicode_char_size_by_utf8_first_byte(),
and it gets chance to be inlined.
This lookup table is pretty small, which should not affect memory a lot.