Skip to content

Conversation

@alexdowad
Copy link
Contributor

}
zend_string *ret = php_mb_convert_encoding_ex(Z_STRVAL_P(var), Z_STRLEN_P(var), to_encoding, from_encoding);
zval_ptr_dtor(orig_var);
ZVAL_STR(orig_var, ret);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I would like to ask someone who knows PHP internals well... am I handling the refcounts correctly?

mb_convert_variables replaces strings in an array or object. After converting the old zend_string to the new one, I am using zval_ptr_dtor to drop the reference to the old one, then ZVAL_STR to make the zval point to the new string.

I don't think ZVAL_STR modifies the refcount, and when the new zend_string is returned from php_mb_convert_encoding_ex, it should start with a refcount of 1, so hopefully we are OK here... 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to make sense, but you can always run some tests with Valgrind or ASAN to see if there is any issue. But been staring at this for a bit and it seems to check out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's a good idea.

// JIS string (BASE64 encoded)
$jis = base64_decode('GyRCRnxLXDhsJUYlLSU5JUgkRyQ5ISMbKEIwMTIzNBskQiM1IzYjNyM4IzkhIxsoQg==');
// EUC-JP string
$euc_jp = '日本語テキストです。0123456789。';
Copy link
Contributor Author

@alexdowad alexdowad Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting to see that GitHub is able to recognize this EUC-JP encoded string and display it correctly here...

On the other side, I've converted it to UTF-8 so that it will (hopefully) appear correctly in most text editors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is preferable to have *.phpt files encoded as UTF-8 whenever possible.

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Looks like a nice simplification, too.

// JIS string (BASE64 encoded)
$jis = base64_decode('GyRCRnxLXDhsJUYlLSU5JUgkRyQ5ISMbKEIwMTIzNBskQiM1IzYjNyM4IzkhIxsoQg==');
// EUC-JP string
$euc_jp = '日本語テキストです。0123456789。';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is preferable to have *.phpt files encoded as UTF-8 whenever possible.

@alexdowad
Copy link
Contributor Author

This was already merged.

@alexdowad alexdowad closed this Dec 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants