Skip to content

Enable string conversion in EUC-JP. #1296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 16, 2025

Conversation

YOCKOW
Copy link
Member

@YOCKOW YOCKOW commented May 16, 2025

Background: EUC-JP is not supported by OSS CoreFoundation, while it is supported by macOS Foundation Framework.
See #1016

This PR resolves the issue by calling ICU API if necessary.

@YOCKOW
Copy link
Member Author

YOCKOW commented May 16, 2025

@swift-ci Please test

@YOCKOW YOCKOW marked this pull request as ready for review May 16, 2025 09:25
@YOCKOW YOCKOW requested review from parkera and jmschonfeld May 16, 2025 09:26
@YOCKOW YOCKOW marked this pull request as draft May 20, 2025 03:09
YOCKOW added a commit to YOCKOW/swift-foundation that referenced this pull request May 20, 2025
@YOCKOW YOCKOW force-pushed the i-see-you-legacy-string-encodings branch from 7ae3f7f to b0a8981 Compare May 20, 2025 03:45
YOCKOW added a commit to YOCKOW/swift-foundation that referenced this pull request May 20, 2025
YOCKOW added a commit to YOCKOW/swift-foundation that referenced this pull request May 20, 2025
YOCKOW added a commit to YOCKOW/swift-foundation that referenced this pull request May 20, 2025
@YOCKOW
Copy link
Member Author

YOCKOW commented May 20, 2025

@swift-ci Please test

@YOCKOW
Copy link
Member Author

YOCKOW commented May 20, 2025

@parkera @jmschonfeld Thank you for reviewing.

In response to your feedback, I've made some changes:

  • Made it clear that ICU.StringConverter doesn't support UTF-* and US-ASCII.
  • Removed unnecessary nonisolated(unsafe) from the static property.
  • Added comments in func _icuMakeStringFromBytes_impl to clarify the safety of passing the pointer.
  • Let it delegate string conversion to ICU only if the encoding is EUC-JP and there's no FOUNDATION_FRAMEWORK (at this point).

@YOCKOW YOCKOW marked this pull request as ready for review May 20, 2025 08:16
@YOCKOW
Copy link
Member Author

YOCKOW commented May 27, 2025

Please let me request reviews again as per changes above.

@YOCKOW YOCKOW requested review from parkera and jmschonfeld May 27, 2025 02:19
Copy link
Contributor

@parkera parkera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good to me, but also want @jmschonfeld to chime in.

Copy link
Contributor

@jmschonfeld jmschonfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for putting this together!

@jmschonfeld jmschonfeld self-requested a review June 11, 2025 20:07
YOCKOW added a commit to YOCKOW/swift-foundation that referenced this pull request Jun 12, 2025
YOCKOW added a commit to YOCKOW/swift-foundation that referenced this pull request Jun 12, 2025
YOCKOW added a commit to YOCKOW/swift-foundation that referenced this pull request Jun 12, 2025
YOCKOW added a commit to YOCKOW/swift-foundation that referenced this pull request Jun 12, 2025
YOCKOW added a commit to YOCKOW/swift-foundation that referenced this pull request Jun 12, 2025
@YOCKOW YOCKOW force-pushed the i-see-you-legacy-string-encodings branch from 96de131 to bc1656d Compare June 12, 2025 07:41
@YOCKOW YOCKOW force-pushed the i-see-you-legacy-string-encodings branch from bc1656d to 844479e Compare June 12, 2025 07:48
@YOCKOW
Copy link
Member Author

YOCKOW commented Jun 12, 2025

@swift-ci Please test

Copy link
Contributor

@jmschonfeld jmschonfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - that resolves the build failures, but I have a comment about one issue in the tests that I've discovered as well (this should hopefully be the last thing, thanks for bearing with me)

XCTAssertNil(sushi.data(using: String._Encoding.japaneseEUC))
XCTAssertEqual(
sushi.data(using: String._Encoding.japaneseEUC, allowLossyConversion: true),
"Sushi?".data(using: .utf8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test unfortunately fails in the FOUNDATION_FRAMEWORK configuration because Foundation.framework's implementation of this conversion replaces both UTF-16 scalars of the emoji with ? rather than replacing the entire grapheme cluster with a single ?. I suspect that maybe ICU is replacing only with a single ? - do you happen to know if that's intentional? If so, for now we might need to vary the expected value to be Sushi?? in FOUNDATION_FRAMEWORK vs. Sushi? in !FOUNDATION_FRAMEWORK to account for the discrepancy between Foundation.framework and ICU (assuming we think this result from ICU is indeed correct and not a bug in how we're calling ICU)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my overlooking again, but I'm not sure if we should implement conversion just like Foundation framework.
I mean the behavior of Foundation framework looks buggy because it doesn't seem to handle surrogate pairs correctly. (🍣 is non-BMP but consists of only one scalar U+1F363. Does that "buggy" behavior come from the historical reason that NSString has been represented as a sequence of UTF–16??)
Of course, we can imitate such behavior by replacing UCNV_FROM_U_CALLBACK_SUBSTITUTE with a custom callback function.
However, let me point out that current swift-foundation's implementation for .ascii is code-point-basis, not UTF-16-value-basis:

import Foundation

let sushiEmoji = "🍣"
print(sushiEmoji.data(using: .ascii, allowLossyConversion: true)!.count)
// -> Prints "1"

Options...?

  • To keep consistent with current (e.g.).ascii implementation:
    • Divide the test case for FOUNDATION_FRAMEWORK and !FOUNDATION_FRAMEWORK.
  • To make it compatible with Foundation framework's behavior:
    • Implement a custom callback for ucnv_setFromUCallBack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I suspect this behavior comes from the fact that NSString is natively stored as UTF-16. For now, let's just divide the test case with a FOUNDATION_FRAMEWORK/!FOUNDATION_FRAMEWORK value for this expectation since I think enabling the conversion at all on non-Darwin is a good step even if the behavior is slightly different than Foundation.framework, and I can come back with a separate change to look into making this behavior the same across both platforms (while ensuring we don't break compatibility with existing clients).

@YOCKOW
Copy link
Member Author

YOCKOW commented Jun 15, 2025

@jmschonfeld
Thank you for the advice. I revised the test cases.

(I didn't know CI didn't execute test cases with FOUNDATION_FRAMEWORK.)

@YOCKOW
Copy link
Member Author

YOCKOW commented Jun 15, 2025

@swift-ci Please test

@jmschonfeld
Copy link
Contributor

Thanks for all of your updates - looks great to me now!

@jmschonfeld jmschonfeld merged commit 09a0524 into swiftlang:main Jun 16, 2025
15 checks passed
chloe-yeo pushed a commit to chloe-yeo/swift-foundation that referenced this pull request Jun 16, 2025
* Enable string conversion in EUC-JP.

Background: EUC-JP is not supported by OSS CoreFoundation, while it is supported by macOS Foundation Framework.
See swiftlang#1016

This commit resolves the issue by calling ICU API if necessary.

* ICU: Omit encodings that should be supported by FoundationEssentials.

In response to: swiftlang#1296 (comment)

* ICU: Remove unnecessary `nonisolated(unsafe)` from static property.

In response to: swiftlang#1296 (comment)

* Add comment to `func _icuMakeStringFromBytes_impl`.

In response to: swiftlang#1296 (comment)

* Delegate string conversion to ICU only when encoding is EUC-JP.

In response to: swiftlang#1296 (comment)

* Replace dynamic `_icu*` functions only if `!FOUNDATION_FRAMEWORK`.

In response to: swiftlang#1296 (comment)

* Divide test cases depending on `FOUNDATION_FRAMEWORK` for EUC_JP conversion.

In response to: swiftlang#1296 (comment)
@YOCKOW YOCKOW deleted the i-see-you-legacy-string-encodings branch June 17, 2025 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discrepancy: String.data(using: .japaneseEUC) returns nil on non-Darwin.
3 participants