Utf8.fromUtf8(): allow passing length #61

jpnurmi · 2020-11-14T13:27:30Z

No description provided.

jpnurmi · 2020-11-14T13:31:34Z

For example: https://github.com/libexpat/libexpat/blob/master/expat/lib/expat.h#L269

jpnurmi · 2020-11-14T13:32:32Z

I had to rebase on top of #60 to be able to run tests.

lib/src/utf8.dart

dcharkes · 2020-11-16T15:31:45Z

@jpnurmi that is a useful addition!

@lrhn Any guidance on whether this should be an optional argument, or whether null-terminated and non-null-terminated strings should use different static methods?

jpnurmi · 2020-11-16T20:58:52Z

{int? length} didn't work with Dart SDK 2.10 from Flutter stable
{int length} wouldn't work with Dart SDK 2.12 from Flutter master

The latest revision uses {int length = 0} and checks length > 0 instead of null to make it work with both older and newer SDKs. I'm also curious about what's the correct way to handle this.

dcharkes · 2020-11-16T21:46:53Z

package:ffi is kind of strange, because it works both with and without NNBD at the moment. We will only publish pre-releases opted in to NNBD of the package until we release a stable Dart version with NNBD.

Adding a new static method to differentiate between null-terminated and not will solve the problem for now. @lrhn what do you think from an API design point of view?

Alternatively, you can fork the package and work on your own branch for the time being.

jpnurmi · 2020-11-16T22:31:04Z

I've had similar helpers in a couple of FFI-based projects, so I thought it would be nice to have in package:ffi, but I can live with the helpers if it's just causing trouble. :) I think non-null-terminated cases are pretty rare, after all, but it's nice to be able to avoid unnecessary strlen() calls in case the length is known.

lrhn

I think an optional argument is fine, it really is two ways to specify the same input to the same function.

A default of -1 will work both before and after null safety, but that shouldn't be important.
You should not be trying to make code which works with both. A library is either null safe or not, and you must pick one.
The ffi package has SDK >= 2.12.0-dev so it is null safety enabled, and you should be able to use int? length = null. If that doesn't work, then null safety (of the ffi package or any other package) is simply not compatible with what you're using it for, and if it works today, it's an accident.

lib/src/utf8.dart

test/utf8_test.dart

jpnurmi · 2020-11-17T12:37:42Z

you should be able to use int? length = null

int? length is fine? Specifying explicitly null complains:

Don't explicitly initialize variables to null. dart(avoid_init_to_null)

lrhn · 2020-11-17T12:45:43Z

ACK, that's a lint, so you should probably follow it an omit the = null. The language itself has no problem with = null, but it is the default so you don't have to write it.

lib/src/utf8.dart

jpnurmi · 2020-11-17T13:26:55Z

Let me know if I should push separately the doc changes that are not strictly related to this PR. The Utf8 & Utf16 class descriptions are side by side in the class list so I couldn't leave it like that...

lrhn

I think this look fine, apart from referring to UTF-16 code units as "zero byte"s.
I've made some extra notes, but that's just drive-by comments on things you haven't touched. I'd let @dcharkes ponder those.
Also, it's his package, so I'll let him decide how to structure CLs (I'm a fan of monolithic CLs that fix everything in one go. Other people tend to disagree). 😁

lrhn · 2020-11-17T17:23:21Z

lib/src/utf16.dart

@@ -8,15 +8,15 @@ import 'dart:typed_data';

 import 'package:ffi/ffi.dart';

-/// [Utf16] implements conversion between Dart strings and null-terminated
+/// [Utf16] implements conversion between Dart strings and zero byte terminated


Nit: It's not a zero byte when it's 16-bit. Maybe a a zero "word" or "code unit". Maybe just "zero terminated", then it's the zero value of the appropriate type.

lrhn · 2020-11-17T17:24:28Z

lib/src/utf16.dart

  ///
-  /// If 'string' contains NULL bytes, the converted string will be truncated
+  /// If 'string' contains zero bytes, the converted string will be truncated
  /// prematurely. Unpaired surrogate code points in [string] will be preserved
  /// in the UTF-8 encoded result. See [Utf8Encoder] for details on encoding.


Should probably say UTF-16 here.
(You don't have to fix all existing typos, I'm just mentioning it so someone, like @dcharkes, can fix it later). 😉

lrhn · 2020-11-17T17:27:50Z

lib/src/utf16.dart

 /// native function signatures.
 class Utf16 extends Struct {
-  /// Convert a [String] to a Utf16-encoded null-terminated C string.
+  /// Convert a [String] to a UTF-16 encoded zero byte terminated C string.


Ditto: zero terminated.

lrhn · 2020-11-17T17:28:12Z

lib/src/utf16.dart

  ///
-  /// If 'string' contains NULL bytes, the converted string will be truncated
+  /// If 'string' contains zero bytes, the converted string will be truncated


... contains a zero code unit ....

It actually won't be truncated. It allocates and copies all the code units over.
If that result is interpreted as zero-terminated, then that result will be truncated, but if you know the length, you should be able to get it all back.
(Not your text either.)

lrhn · 2020-11-17T17:32:38Z

lib/src/utf8.dart

-  /// Returns the length of a null-terminated string -- the number of (one-byte)
-  /// characters before the first null byte.
+  /// Returns the length of a zero byte terminated string &mdash; the number of
+  /// bytes before the first zero byte.
  static int strlen(Pointer<Utf8> string) {
    final Pointer<Uint8> array = string.cast<Uint8>();
    final Uint8List nativeString = array.asTypedList(_maxSize);


Idea: Maybe use indexOf(0) below instead of indexWhere(closure). Should have less overhead.

test/utf8_test.dart

Be consistent about zero- vs. null-terminated (the latter was more commonly used, but the former was suggested in dart-archive#61). Furthermore, align UTF-8/16 spelling and fix a couple of minor typos.

Be consistent about zero- vs. null-terminated (the latter was more commonly used, but using the former was suggested in dart-archive#61). Furthermore, align UTF-8/16 spelling and fix a couple of minor typos and formatting issues.

jpnurmi · 2020-11-17T20:10:06Z

I opened a separate less intrusive PR to update the docs #63. I'll rebase and cleanup this afterwards if @dcharkes agrees with the changes.

Be consistent about zero- vs. null-terminated (the latter was more commonly used, but using the former was suggested in #61). Furthermore, align UTF-8/16 spelling and fix a couple of minor typos and formatting issues.

The string could be non-zero-terminated, for example, or even if it is, the length may be already known, in which case an unnecessary strlen() call can be avoided.

jpnurmi · 2020-11-18T09:06:28Z

Thanks!

P.S. There's still one strlen() improvement idea buried in the comments:

lrhn (Lasse R.H. Nielsen) 15 hours ago Member
Idea: Maybe use indexOf(0) below instead of indexWhere(closure). Should have less overhead.

dcharkes · 2020-11-18T09:06:54Z

@jpnurmi, 1 thing is missing for publishing a new version of the package. Please make a PR that bumps the version number to 0.2.0-nullsafety.1 and add an entry to CHANGELOG.md. Then I can release another pre-release (only usable in Dart 1.12).

Alternatively, you can just depend on a git hash instead (or your own branch) for the time being until Dart 1.12 is released to stable.

dcharkes · 2020-11-18T09:39:18Z

Thanks!

P.S. There's still one strlen() improvement idea buried in the comments:

lrhn (Lasse R.H. Nielsen) 15 hours ago Member
Idea: Maybe use indexOf(0) below instead of indexWhere(closure). Should have less overhead.

Do you want to address this before I merge #64?

As suggested in dart-archive#61, using indexOf(0) to search the zero byte has a bit less overhead than indexWhere(closure).

jpnurmi · 2020-11-18T09:56:17Z

I don't mind :) => #65

As suggested in #61, using indexOf(0) to search the zero byte has a bit less overhead than indexWhere(closure).

Be consistent about zero- vs. null-terminated (the latter was more commonly used, but using the former was suggested in dart-archive/ffi#61). Furthermore, align UTF-8/16 spelling and fix a couple of minor typos and formatting issues.

The string could be non-zero-terminated, for example, or even if it is, the length may be already known, in which case an unnecessary strlen() call can be avoided.

As suggested in dart-archive/ffi#61, using indexOf(0) to search the zero byte has a bit less overhead than indexWhere(closure).

google-cla bot added the cla: yes label Nov 14, 2020

googlebot added the cla: yes label Nov 14, 2020

dcharkes reviewed Nov 16, 2020

View reviewed changes

lib/src/utf8.dart Outdated Show resolved Hide resolved

dcharkes mentioned this pull request Nov 16, 2020

Merge null safety branch to master #62

Merged

jpnurmi force-pushed the strlen branch 2 times, most recently from 0dc5bd1 to c82f6bb Compare November 16, 2020 20:32

dcharkes requested a review from lrhn November 16, 2020 21:46

lrhn reviewed Nov 17, 2020

View reviewed changes

lib/src/utf8.dart Outdated Show resolved Hide resolved

lib/src/utf8.dart Outdated Show resolved Hide resolved

lib/src/utf8.dart Outdated Show resolved Hide resolved

lib/src/utf8.dart Outdated Show resolved Hide resolved

test/utf8_test.dart Show resolved Hide resolved

jpnurmi force-pushed the strlen branch from c82f6bb to b4cf503 Compare November 17, 2020 12:33

lrhn reviewed Nov 17, 2020

View reviewed changes

lib/src/utf8.dart Outdated Show resolved Hide resolved

lrhn reviewed Nov 17, 2020

View reviewed changes

lib/src/utf8.dart Outdated Show resolved Hide resolved

jpnurmi changed the title ~~Utf8.fromUtf8(): allow passing the length if not 0-terminated~~ Utf8.fromUtf8(): allow passing length Nov 17, 2020

lrhn reviewed Nov 17, 2020

View reviewed changes

jpnurmi mentioned this pull request Nov 17, 2020

Tweak the docs #63

Merged

Utf8.fromUtf8(): allow passing the length

1f3a67a

The string could be non-zero-terminated, for example, or even if it is, the length may be already known, in which case an unnecessary strlen() call can be avoided.

jpnurmi force-pushed the strlen branch from 0ce6713 to 1f3a67a Compare November 18, 2020 08:50

dcharkes approved these changes Nov 18, 2020

View reviewed changes

dcharkes merged commit 7eae47d into dart-archive:master Nov 18, 2020

jpnurmi deleted the strlen branch November 18, 2020 09:01

jpnurmi added a commit to jpnurmi/ffi that referenced this pull request Nov 18, 2020

Minor Utf8.strlen() improvement

be60023

As suggested in dart-archive#61, using indexOf(0) to search the zero byte has a bit less overhead than indexWhere(closure).

jpnurmi mentioned this pull request Nov 18, 2020

Minor Utf8.strlen() improvement #65

Merged

dcharkes pushed a commit that referenced this pull request Nov 18, 2020

Minor Utf8.strlen() improvement (#65)

b34a792

As suggested in #61, using indexOf(0) to search the zero byte has a bit less overhead than indexWhere(closure).

dcharkes pushed a commit to dart-lang/native that referenced this pull request Jan 16, 2024

[ffi] Minor Utf8.strlen() improvement (dart-archive/ffi#65)

4832e4d

As suggested in dart-archive/ffi#61, using indexOf(0) to search the zero byte has a bit less overhead than indexWhere(closure).

Utf8.fromUtf8(): allow passing length #61

Utf8.fromUtf8(): allow passing length #61

Uh oh!

Conversation

jpnurmi commented Nov 14, 2020

Uh oh!

jpnurmi commented Nov 14, 2020

Uh oh!

jpnurmi commented Nov 14, 2020

Uh oh!

Uh oh!

dcharkes commented Nov 16, 2020

Uh oh!

jpnurmi commented Nov 16, 2020

Uh oh!

dcharkes commented Nov 16, 2020

Uh oh!

jpnurmi commented Nov 16, 2020

Uh oh!

lrhn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jpnurmi commented Nov 17, 2020

Uh oh!

lrhn commented Nov 17, 2020

Uh oh!

Uh oh!

Uh oh!

jpnurmi commented Nov 17, 2020

Uh oh!

lrhn left a comment

Choose a reason for hiding this comment

Uh oh!

lrhn Nov 17, 2020

Choose a reason for hiding this comment

Uh oh!

lrhn Nov 17, 2020

Choose a reason for hiding this comment

Uh oh!

lrhn Nov 17, 2020

Choose a reason for hiding this comment

Uh oh!

lrhn Nov 17, 2020

Choose a reason for hiding this comment

Uh oh!

lrhn Nov 17, 2020

Choose a reason for hiding this comment

Uh oh!

lrhn Nov 17, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jpnurmi commented Nov 17, 2020

Uh oh!

jpnurmi commented Nov 18, 2020

Uh oh!

dcharkes commented Nov 18, 2020

Uh oh!

dcharkes commented Nov 18, 2020

Uh oh!

jpnurmi commented Nov 18, 2020

Uh oh!

Uh oh!