
Add functions for iterating utf-8 strings #293


Merged: 1 commit, Jul 3, 2015


sand1k
Contributor

@sand1k sand1k commented Jul 1, 2015

The new functions provide:

  • Forward iteration: lit_utf8_iterator_read_next, lit_utf8_iterator_read_next_and_incr, lit_utf8_iterator_incr.
  • Backward iteration: lit_utf8_iterator_read_prev, lit_utf8_iterator_read_prev_and_decr, lit_utf8_iterator_decr.
  • Saving and restoring a position: lit_utf8_iterator_get_pos, lit_utf8_iterator_restore_pos.
  • Retrieving the index of the code unit at the current iterator position: lit_utf8_iterator_get_index.
  • Checking for the beginning/end of a string: lit_utf8_iterator_is_bos, lit_utf8_iterator_is_eos.
  • Moving the iterator to the beginning/end of a string: lit_utf8_iterator_set_to_bos, lit_utf8_iterator_set_to_eos.
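The list above describes a bidirectional iterator over a UTF-8 byte buffer. The following C sketch illustrates the idea; it is not the code merged in this PR. The `utf8_iter_*` names, the `utf8_iter_t`/`utf8_iter_pos_t` structs and all signatures are hypothetical, loosely mirroring the `lit_utf8_iterator_*` names listed, and the input is assumed to be well-formed UTF-8. For simplicity `utf8_iter_get_index` counts code points; an ECMAScript engine may instead count UTF-16 code units, where a code point above U+FFFF occupies two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state; not JerryScript's actual structure. */
typedef struct
{
  const uint8_t *buf; /* UTF-8 bytes, assumed well-formed */
  size_t size;        /* buffer size in bytes */
  size_t offset;      /* byte offset of the current position; 0 = beginning */
  size_t index;       /* number of code points before offset */
} utf8_iter_t;

/* A saved position is the offset/index pair. */
typedef struct
{
  size_t offset;
  size_t index;
} utf8_iter_pos_t;

static utf8_iter_t utf8_iter_create (const uint8_t *buf, size_t size)
{
  utf8_iter_t it = { buf, size, 0, 0 }; /* iteration starts at offset 0 */
  return it;
}

static bool utf8_iter_is_bos (const utf8_iter_t *it) { return it->offset == 0; }
static bool utf8_iter_is_eos (const utf8_iter_t *it) { return it->offset >= it->size; }
static size_t utf8_iter_get_index (const utf8_iter_t *it) { return it->index; }

static utf8_iter_pos_t utf8_iter_get_pos (const utf8_iter_t *it)
{
  utf8_iter_pos_t pos = { it->offset, it->index };
  return pos;
}

static void utf8_iter_restore_pos (utf8_iter_t *it, utf8_iter_pos_t pos)
{
  it->offset = pos.offset;
  it->index = pos.index;
}

/* Decode the code point whose lead byte is at buf[pos]; store its byte length in *len. */
static uint32_t utf8_decode_at (const uint8_t *buf, size_t pos, size_t *len)
{
  uint8_t b = buf[pos];
  size_t n = (b < 0x80) ? 1 : ((b & 0xE0) == 0xC0) ? 2 : ((b & 0xF0) == 0xE0) ? 3 : 4;
  uint32_t cp = (n == 1) ? b : (b & (0xFFu >> (n + 1))); /* payload bits of the lead byte */
  for (size_t i = 1; i < n; i++)
  {
    cp = (cp << 6) | (buf[pos + i] & 0x3Fu); /* append 6 bits per continuation byte */
  }
  *len = n;
  return cp;
}

static uint32_t utf8_iter_read_next (const utf8_iter_t *it)
{
  size_t len;
  assert (!utf8_iter_is_eos (it));
  return utf8_decode_at (it->buf, it->offset, &len);
}

static uint32_t utf8_iter_read_next_and_incr (utf8_iter_t *it)
{
  size_t len;
  assert (!utf8_iter_is_eos (it));
  uint32_t cp = utf8_decode_at (it->buf, it->offset, &len);
  it->offset += len;
  it->index++;
  return cp;
}

static uint32_t utf8_iter_read_prev_and_decr (utf8_iter_t *it)
{
  size_t len;
  assert (!utf8_iter_is_bos (it));
  do /* skip continuation bytes (10xxxxxx) back to the previous lead byte */
  {
    it->offset--;
  } while (it->offset > 0 && (it->buf[it->offset] & 0xC0) == 0x80);
  it->index--;
  return utf8_decode_at (it->buf, it->offset, &len);
}

static void utf8_iter_set_to_bos (utf8_iter_t *it) { it->offset = 0; it->index = 0; }

/* Walks forward so that the code point index stays correct. */
static void utf8_iter_set_to_eos (utf8_iter_t *it)
{
  while (!utf8_iter_is_eos (it))
  {
    (void) utf8_iter_read_next_and_incr (it);
  }
}
```

Backward iteration works because UTF-8 is self-synchronizing: continuation bytes always match `10xxxxxx`, so stepping back until a non-continuation byte lands on the previous lead byte. In this sketch `utf8_iter_set_to_eos` walks the whole string to keep the index consistent, which costs O(n); caching the code point count would avoid that.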

@sand1k sand1k added the ecma core and development labels Jul 1, 2015
@@ -25,15 +25,81 @@
#define LIT_BYTE_NULL (0)

/**
* For the formal definition of Unicode transformation formats (UTF) see Section 3.9, Unicode Encoding Forms in The
* Unicode Standard (http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf#G7404, tables 3-6, 3-7).
Contributor


Why do you use Unicode 7.0? Since @ruben-ayrapetyan used data from Unicode 3.0 tables, should we regenerate them for 7.0?

Contributor Author


"ECMAScript source text is represented as a sequence of characters in the Unicode character encoding, version 3.0 or later."
The definitions in this file are the same for Unicode 3.0 and 7.0, so we can change this link to 3.0.

Contributor


Just add a comment about it.
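Tables 3-6 and 3-7 cited in the comment define how code point bits map to UTF-8 bytes and which byte sequences are well-formed. As an illustration (not code from this PR), a minimal C check that follows Table 3-7 of the linked Unicode 7.0 chapter might look like this; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte length of the well-formed UTF-8 sequence at the start of buf[0..size),
 * or 0 if those bytes are ill-formed according to Table 3-7. */
static size_t utf8_well_formed_len (const uint8_t *buf, size_t size)
{
  uint8_t lo = 0x80, hi = 0xBF; /* allowed range of the second byte */
  size_t n;

  if (size == 0)
  {
    return 0;
  }

  uint8_t b0 = buf[0];
  if (b0 <= 0x7F) return 1;                  /* U+0000..U+007F */
  else if (b0 >= 0xC2 && b0 <= 0xDF) n = 2;  /* U+0080..U+07FF */
  else if (b0 == 0xE0) { n = 3; lo = 0xA0; } /* reject overlong forms */
  else if (b0 == 0xED) { n = 3; hi = 0x9F; } /* reject surrogates U+D800..U+DFFF */
  else if (b0 >= 0xE1 && b0 <= 0xEF) n = 3;  /* E1..EC, EE..EF */
  else if (b0 == 0xF0) { n = 4; lo = 0x90; } /* reject overlong forms */
  else if (b0 >= 0xF1 && b0 <= 0xF3) n = 4;
  else if (b0 == 0xF4) { n = 4; hi = 0x8F; } /* reject values above U+10FFFF */
  else return 0;                             /* 80..C1 and F5..FF never lead */

  if (size < n || buf[1] < lo || buf[1] > hi)
  {
    return 0;
  }
  for (size_t i = 2; i < n; i++)
  {
    if (buf[i] < 0x80 || buf[i] > 0xBF)
    {
      return 0;
    }
  }
  return n;
}

static bool utf8_is_well_formed (const uint8_t *buf, size_t size)
{
  size_t pos = 0;
  while (pos < size)
  {
    size_t n = utf8_well_formed_len (buf + pos, size - pos);
    if (n == 0)
    {
      return false;
    }
    pos += n;
  }
  return true;
}
```

The restricted second-byte ranges after E0, ED, F0 and F4 are what reject overlong encodings, encoded surrogates and values above U+10FFFF; checking only the lead-byte and continuation-byte bit patterns would accept them.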

@egavrin egavrin added this to the Core ECMA features milestone Jul 2, 2015
0,
buf_size,
{
0,
Contributor


Why 0?

Contributor Author


Initial offset of iterator is 0.

@zherczeg
Member

zherczeg commented Jul 2, 2015

I have some comments on the names, but anything is fine with me.
+1 lgtm

@egavrin
Contributor

egavrin commented Jul 2, 2015

@sand1k, please update the naming and push.

@sand1k sand1k force-pushed the Andrey-ut8-string-iterators branch from 9ad76ba to 4375ad0 July 3, 2015 10:14
JerryScript-DCO-1.0-Signed-off-by: Andrey Shitov a.shitov@samsung.com
@sand1k sand1k force-pushed the Andrey-ut8-string-iterators branch from 4375ad0 to ae3eea8 July 3, 2015 10:25
@sand1k sand1k merged commit ae3eea8 into master Jul 3, 2015
@sand1k
Contributor Author

sand1k commented Jul 3, 2015

@egavrin, @zherczeg fixed and pushed.

@sand1k sand1k deleted the Andrey-ut8-string-iterators branch July 3, 2015 10:37
Labels
development (Feature implementation), ecma core (Related to core ECMA functionality)

3 participants