Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solve recently introduce ASCII regression #80709

Merged
merged 3 commits into from
Jan 17, 2023

Conversation

adamsitnik
Copy link
Member

One of the goals I had in #75012 was to replace the usage of internal helpers with new public API calls.

Initially we wanted to make Ascii.GetIndexOfFirstNonAsciiChar public, but at the end we decided to keep it internal.

Here I am removing the redundant call to Ascii.GetIndexOfFirstNonAsciiChar(ROS) with a call to the helper method that accepts a pointer and the length. What we get is one less method call, one less span creation and pinning.

fixes #80247

* one less method call
* one less pinning
* don't create a Span out of pointer just to fix it and get the pointer
@ghost
Copy link

ghost commented Jan 16, 2023

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details

One of the goals I had in #75012 was to replace the usage of internal helpers with new public API calls.

Initially we wanted to make Ascii.GetIndexOfFirstNonAsciiChar public, but at the end we decided to keep it internal.

Here I am removing the redundant call to Ascii.GetIndexOfFirstNonAsciiChar(ROS) with a call to the helper method that accepts a pointer and the length. What we get is one less method call, one less span creation and pinning.

fixes #80247

Author: adamsitnik
Assignees: -
Labels:

area-System.Text.Encoding

Milestone: -

@ghost ghost assigned adamsitnik Jan 16, 2023
Copy link
Member

@tarekgh tarekgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I assume we have a good test coverage for the changed APIs.

@adamsitnik adamsitnik added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Jan 17, 2023
@adamsitnik
Copy link
Member Author

I've applied no merge label as I am about to apply the same pattern in few other places and it would be better to merge of all of them in a single PR.

@stephentoub
Copy link
Member

If this overhead resulted in a measurable regression, does that suggest there's something to be changed about the public APIs? Presumably everyone else using them would hit similar issues?

@adamsitnik
Copy link
Member Author

If this overhead resulted in a measurable regression, does that suggest there's something to be changed about the public APIs? Presumably everyone else using them would hit similar issues?

The overhead is quite specific to our needs and is visible only for small inputs. Most of the Encoding methods pin the buffer and then just work with raw pointers:

fixed (char* pChars = chars)
{
return GetByteCountCommon(pChars + index, count);
}

fixed (char* pChars = chars)
{
return GetByteCountCommon(pChars, chars!.Length);
}

fixed (char* charsPtr = &MemoryMarshal.GetReference(chars))
{
return GetByteCountCommon(charsPtr, chars.Length);
}

We introduced Span-accepting methods that are also pinning the memory and then just passing a pointer to the helper. So, the overhead comes from creating a span (from the pointer and length), pinning the memory (again), having one more method call (to our public method) and doing some basic checks again (like checking for empty span). It’s very little, but here it matters.

int firstNonAsciiIndex = Ascii.GetIndexOfFirstNonAsciiChar(new ReadOnlySpan<char>(pChars, charsLength));

fixed (byte* pBuffer = &MemoryMarshal.GetReference(buffer))
{
nuint idxOfFirstNonAsciiElement = GetIndexOfFirstNonAsciiByte(pBuffer, bufferLength);

I've checked if we could get rid of pinning, but plenty of our ASCII helpers assume that the pointer is fixed and implement optimizations that are searching for the first aligned address:
// Now adjust the read pointer so that future reads are aligned.

We could switch to using manager pointers and remove the optimizations that try to take advantage of the alignment, but it would be very hard to achieve performance parity.

FWIW I am not happy about making these changes, as it’s as step back s but it seems to be the lesser evil in this particular case.

@adamsitnik adamsitnik removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Jan 17, 2023
@adamsitnik adamsitnik merged commit f3a0f4f into dotnet:main Jan 17, 2023
@adamsitnik adamsitnik mentioned this pull request Jan 31, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regressions in System.Text.Perf_Utf8Encoding and System.Text.Tests.Perf_Encoding
3 participants