Skip to content

Global object unescape routine #606

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 11, 2015

Conversation

szledan
Copy link
Contributor

@szledan szledan commented Aug 28, 2015

JerryScript-DCO-1.0-Signed-off-by: Szilard Ledan szledan.u-szeged@partner.samsung.com


/* 3. */
MEM_DEFINE_LOCAL_ARRAY (input_start_p, input_size, lit_utf8_byte_t);
ecma_string_to_utf8_string (input_string_p, input_start_p, (ssize_t) (input_size));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you, please, add assertion that the return value is not negative, as in 7064a2c?

@ruben-ayrapetyan ruben-ayrapetyan added normal ecma builtins Related to ECMA built-in routines development Feature implementation labels Aug 28, 2015
@ruben-ayrapetyan ruben-ayrapetyan added this to the ECMA builtins milestone Aug 28, 2015
@szledan
Copy link
Contributor Author

szledan commented Aug 28, 2015

I've updated the patch

/* 3. */
MEM_DEFINE_LOCAL_ARRAY (input_start_p, input_size, lit_utf8_byte_t);
ssize_t sz = ecma_string_to_utf8_string (input_string_p, input_start_p, (ssize_t) (input_size));
JERRY_ASSERT (sz > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string's size can be zero.

@zherczeg
Copy link
Member

Please try unescape("%ud800\udc00") === "\ud800\udc00"


lit_utf8_size_t lit_size = lit_code_point_to_utf8 (code_point, output_char_p);
output_char_p += lit_size;
output_length += lit_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that the output_length is synchronized with output_char_p. In the case, we could remove output_length and calculate it as output_char_p - input_start_p.

lit_utf8_size_t input_size = ecma_string_get_size (input_string_p);

/* 3. */
MEM_DEFINE_LOCAL_ARRAY (input_start_p, input_size, lit_utf8_byte_t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment why input_size >= output_size, and an assert at the end. I don't think this is trivial.

E.g. %xx is 3 byte long, and the maximum is 0xff, which encoded as 2 bytes in UTF8. Etc.

@szledan
Copy link
Contributor Author

szledan commented Sep 7, 2015

I've updated the patch.
Could you check it please?

/* The length of input string is always greater than output string
* so we re-use the input string buffer.
* E.g. %xx is 3 byte long, and the maximum is 0xff, which encoded
* as 2 bytes in UTF8. Etc. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just suggested an text idea. I think this would sound better:

The %xx is three byte long, and the maximum encoded value is 0xff, which maximum encoded length is two byte. Similar to this, the maximum encoded length of %uxxxx is four byte.

@zherczeg
Copy link
Member

zherczeg commented Sep 7, 2015

One minor thing and LGTM after that.

@szledan szledan force-pushed the global-unescape branch 2 times, most recently from f0f0e4a to b2ede9d Compare September 9, 2015 08:06
@galpeter
Copy link
Contributor

galpeter commented Oct 2, 2015

@zherczeg , do we you have other any comments?

@zherczeg
Copy link
Member

zherczeg commented Oct 2, 2015

This patch conflits with CESU8. I would wait to land that first, update this, and land it.

@egavrin
Copy link
Contributor

egavrin commented Oct 20, 2015

@zherczeg CESU8 landed

@szledan szledan force-pushed the global-unescape branch 2 times, most recently from 664e4de to 8d6dabe Compare November 2, 2015 12:54
@szledan
Copy link
Contributor Author

szledan commented Nov 2, 2015

I've updated the patch.
@dbatyai , could you check it please?

@egavrin egavrin assigned dbatyai and unassigned szledan Nov 2, 2015
* 8 found valid '%uwxyz' pattern
*/
uint8_t status = 0;
lit_code_point_t hex_digits = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's enough to use an ecma_char_t for hex_digits, the decoded value will always be less than 0xFFFF.

@dbatyai
Copy link
Member

dbatyai commented Nov 6, 2015

LGTM

@dbatyai dbatyai assigned egavrin and unassigned dbatyai Nov 6, 2015
@egavrin
Copy link
Contributor

egavrin commented Nov 9, 2015

make push

@egavrin egavrin assigned zherczeg and unassigned egavrin Nov 9, 2015
@zherczeg
Copy link
Member

LGTM

@egavrin egavrin assigned szledan and unassigned zherczeg Nov 10, 2015
JerryScript-DCO-1.0-Signed-off-by: Szilard Ledan szledan.u-szeged@partner.samsung.com
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Feature implementation ecma builtins Related to ECMA built-in routines normal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants