Skip to content
This repository was archived by the owner on Jan 17, 2024. It is now read-only.

Utf8.fromUtf8(): allow passing length #61

Merged
merged 1 commit into from
Nov 18, 2020

Conversation

jpnurmi
Copy link
Contributor

@jpnurmi jpnurmi commented Nov 14, 2020

No description provided.

@jpnurmi
Copy link
Contributor Author

jpnurmi commented Nov 14, 2020

@jpnurmi
Copy link
Contributor Author

jpnurmi commented Nov 14, 2020

I had to rebase on top of #60 to be able to run tests.

@dcharkes
Copy link
Contributor

@jpnurmi that is a useful addition!

@lrhn Any guidance on whether this should be an optional argument, or whether null-terminated and non-null-terminated strings should use different static methods?

@jpnurmi jpnurmi force-pushed the strlen branch 2 times, most recently from 0dc5bd1 to c82f6bb Compare November 16, 2020 20:32
@jpnurmi
Copy link
Contributor Author

jpnurmi commented Nov 16, 2020

  • {int? length} didn't work with Dart SDK 2.10 from Flutter stable
  • {int length} wouldn't work with Dart SDK 2.12 from Flutter master

The latest revision uses {int length = 0} and checks length > 0 instead of null to make it work with both older and newer SDKs. I'm also curious about what's the correct way to handle this.

@dcharkes dcharkes requested a review from lrhn November 16, 2020 21:46
@dcharkes
Copy link
Contributor

package:ffi is kind of strange, because it works both with and without NNBD at the moment. We will only publish pre-releases opted in to NNBD of the package until we release a stable Dart version with NNBD.

Adding a new static method to differentiate between null-terminated and not will solve the problem for now. @lrhn what do you think from an API design point of view?

Alternatively, you can fork the package and work on your own branch for the time being.

@jpnurmi
Copy link
Contributor Author

jpnurmi commented Nov 16, 2020

I've had similar helpers in a couple of FFI-based projects, so I thought it would be nice to have in package:ffi, but I can live with the helpers if it's just causing trouble. :) I think non-null-terminated cases are pretty rare, after all, but it's nice to be able to avoid unnecessary strlen() calls in case the length is known.

Copy link

@lrhn lrhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an optional argument is fine, it really is two ways to specify the same input to the same function.

A default of -1 will work both before and after null safety, but that shouldn't be important.
You should not be trying to make code which works with both. A library is either null safe or not, and you must pick one.
The ffi package has SDK >= 2.12.0-dev so it is null safety enabled, and you should be able to use int? length = null. If that doesn't work, then null safety (of the ffi package or any other package) is simply not compatible with what you're using it for, and if it works today, it's an accident.

@jpnurmi
Copy link
Contributor Author

jpnurmi commented Nov 17, 2020

you should be able to use int? length = null

int? length is fine? Specifying explicitly null complains:

Don't explicitly initialize variables to null. dart(avoid_init_to_null)

@lrhn
Copy link

lrhn commented Nov 17, 2020

ACK, that's a lint, so you should probably follow it an omit the = null. The language itself has no problem with = null, but it is the default so you don't have to write it.

@jpnurmi
Copy link
Contributor Author

jpnurmi commented Nov 17, 2020

Let me know if I should push separately the doc changes that are not strictly related to this PR. The Utf8 & Utf16 class descriptions are side by side in the class list so I couldn't leave it like that...

@jpnurmi jpnurmi changed the title Utf8.fromUtf8(): allow passing the length if not 0-terminated Utf8.fromUtf8(): allow passing length Nov 17, 2020
Copy link

@lrhn lrhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this look fine, apart from referring to UTF-16 code units as "zero byte"s.
I've made some extra notes, but that's just drive-by comments on things you haven't touched. I'd let @dcharkes ponder those.
Also, it's his package, so I'll let him decide how to structure CLs (I'm a fan of monolithic CLs that fix everything in one go. Other people tend to disagree). 😁

@@ -8,15 +8,15 @@ import 'dart:typed_data';

import 'package:ffi/ffi.dart';

/// [Utf16] implements conversion between Dart strings and null-terminated
/// [Utf16] implements conversion between Dart strings and zero byte terminated
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It's not a zero byte when it's 16-bit. Maybe a a zero "word" or "code unit". Maybe just "zero terminated", then it's the zero value of the appropriate type.

///
/// If 'string' contains NULL bytes, the converted string will be truncated
/// If 'string' contains zero bytes, the converted string will be truncated
/// prematurely. Unpaired surrogate code points in [string] will be preserved
/// in the UTF-8 encoded result. See [Utf8Encoder] for details on encoding.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably say UTF-16 here.
(You don't have to fix all existing typos, I'm just mentioning it so someone, like @dcharkes, can fix it later). 😉

/// native function signatures.
class Utf16 extends Struct {
/// Convert a [String] to a Utf16-encoded null-terminated C string.
/// Convert a [String] to a UTF-16 encoded zero byte terminated C string.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto: zero terminated.

///
/// If 'string' contains NULL bytes, the converted string will be truncated
/// If 'string' contains zero bytes, the converted string will be truncated
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... contains a zero code unit ....

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually won't be truncated. It allocates and copies all the code units over.
If that result is interpreted as zero-terminated, then that result will be truncated, but if you know the length, you should be able to get it all back.
(Not your text either.)

/// Returns the length of a null-terminated string -- the number of (one-byte)
/// characters before the first null byte.
/// Returns the length of a zero byte terminated string — the number of
/// bytes before the first zero byte.
static int strlen(Pointer<Utf8> string) {
final Pointer<Uint8> array = string.cast<Uint8>();
final Uint8List nativeString = array.asTypedList(_maxSize);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: Maybe use indexOf(0) below instead of indexWhere(closure). Should have less overhead.

jpnurmi added a commit to jpnurmi/ffi that referenced this pull request Nov 17, 2020
Be consistent about zero- vs. null-terminated (the latter was more
commonly used, but the former was suggested in dart-archive#61). Furthermore,
align UTF-8/16 spelling and fix a couple of minor typos.
jpnurmi added a commit to jpnurmi/ffi that referenced this pull request Nov 17, 2020
Be consistent about zero- vs. null-terminated (the latter was more
commonly used, but using the former was suggested in dart-archive#61). Furthermore,
align UTF-8/16 spelling and fix a couple of minor typos and formatting
issues.
@jpnurmi jpnurmi mentioned this pull request Nov 17, 2020
jpnurmi added a commit to jpnurmi/ffi that referenced this pull request Nov 17, 2020
Be consistent about zero- vs. null-terminated (the latter was more
commonly used, but using the former was suggested in dart-archive#61). Furthermore,
align UTF-8/16 spelling and fix a couple of minor typos and formatting
issues.
@jpnurmi
Copy link
Contributor Author

jpnurmi commented Nov 17, 2020

I opened a separate less intrusive PR to update the docs #63. I'll rebase and cleanup this afterwards if @dcharkes agrees with the changes.

dcharkes pushed a commit that referenced this pull request Nov 18, 2020
Be consistent about zero- vs. null-terminated (the latter was more
commonly used, but using the former was suggested in #61). Furthermore,
align UTF-8/16 spelling and fix a couple of minor typos and formatting
issues.
The string could be non-zero-terminated, for example, or even if it is,
the length may be already known, in which case an unnecessary strlen()
call can be avoided.
@dcharkes dcharkes merged commit 7eae47d into dart-archive:master Nov 18, 2020
@jpnurmi jpnurmi deleted the strlen branch November 18, 2020 09:01
@jpnurmi
Copy link
Contributor Author

jpnurmi commented Nov 18, 2020

Thanks!

P.S. There's still one strlen() improvement idea buried in the comments:

lrhn (Lasse R.H. Nielsen) 15 hours ago Member
Idea: Maybe use indexOf(0) below instead of indexWhere(closure). Should have less overhead.

@dcharkes
Copy link
Contributor

@jpnurmi, 1 thing is missing for publishing a new version of the package. Please make a PR that bumps the version number to 0.2.0-nullsafety.1 and add an entry to CHANGELOG.md. Then I can release another pre-release (only usable in Dart 1.12).

Alternatively, you can just depend on a git hash instead (or your own branch) for the time being until Dart 1.12 is released to stable.

@dcharkes
Copy link
Contributor

Thanks!

P.S. There's still one strlen() improvement idea buried in the comments:

lrhn (Lasse R.H. Nielsen) 15 hours ago Member
Idea: Maybe use indexOf(0) below instead of indexWhere(closure). Should have less overhead.

Do you want to address this before I merge #64?

jpnurmi added a commit to jpnurmi/ffi that referenced this pull request Nov 18, 2020
As suggested in dart-archive#61, using indexOf(0) to search the zero byte has a bit
less overhead than indexWhere(closure).
@jpnurmi
Copy link
Contributor Author

jpnurmi commented Nov 18, 2020

I don't mind :) => #65

dcharkes pushed a commit that referenced this pull request Nov 18, 2020
As suggested in #61, using indexOf(0) to search the zero byte has a bit
less overhead than indexWhere(closure).
dcharkes pushed a commit to dart-lang/native that referenced this pull request Jan 16, 2024
Be consistent about zero- vs. null-terminated (the latter was more
commonly used, but using the former was suggested in dart-archive/ffi#61). Furthermore,
align UTF-8/16 spelling and fix a couple of minor typos and formatting
issues.
dcharkes pushed a commit to dart-lang/native that referenced this pull request Jan 16, 2024
The string could be non-zero-terminated, for example, or even if it is,
the length may be already known, in which case an unnecessary strlen()
call can be avoided.
dcharkes pushed a commit to dart-lang/native that referenced this pull request Jan 16, 2024
As suggested in dart-archive/ffi#61, using indexOf(0) to search the zero byte has a bit
less overhead than indexWhere(closure).
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Development

Successfully merging this pull request may close these issues.

4 participants