Skip to content

Conversation

@maartenbreddels
Copy link
Contributor

Following up on #7418 I tried and benchmarked a different way for

  • ascii_lower
  • ascii_upper

Before (lower is similar):

--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    4922843 ns      4918961 ns           10 bytes_per_second=3.1457G/s items_per_second=213.17M/s

After:

--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    1391272 ns      1390014 ns           10 bytes_per_second=11.132G/s items_per_second=754.363M/s

This is a 3.7x speedup (on a AMD machine).

Using http://quick-bench.com/JaDErmVCY23Z1tu6YZns_KBt0qU I found 4.6x speedup for clang 9, 6.4x for GCC 9.2.

Also, the test is expanded a bit to include a non-ascii codepoint, to make explicit it is fine to upper
or lower case a utf8 string. The non-overlap encoding of utf8 make this ok (see section 2.5 of Unicode
Standard Core Specification v13.0).

Following up on apache#7418 I tried and benchmarked a different way for
 * ascii_lower
 * ascii_upper

Before (lower is similar):
```
--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    4922843 ns      4918961 ns           10 bytes_per_second=3.1457G/s items_per_second=213.17M/s
```

After:
```
--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    1391272 ns      1390014 ns           10 bytes_per_second=11.132G/s items_per_second=754.363M/s

```

This is a 3.7x speedup (on a AMD machine).

Using http://quick-bench.com/JaDErmVCY23Z1tu6YZns_KBt0qU I found 4.6x speedup for clang 9, 6.4x for GCC 9.2.

Also, the test is expanded a bit to include a non-ascii codepoint, to make explicit it is fine to upper
or lower case a utf8 string. The non-overlap encoding of utf8 make this ok (see section 2.5 of Unicode
Standard Core Specification v13.0).
@github-actions
Copy link

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat. It appears that both gcc and clang auto-vectorize the conditional expression.

const uint8_t utf8_code_unit = *input++;
// Code units in the range [a-z] can only be an encoding of an ascii
// character/codepoint, not the 2nd, 3rd or 4th code unit (byte) of an different
// codepoint. This guaranteed by non-overal design of the unicode standard. (see
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"non-overlap"

@maartenbreddels
Copy link
Contributor Author

I also thought that we could do a bit check instead of the range check, e.g. code_unit & 0b11100000) == 0b01100000, but that would also transform the backtick for instance (binary value 0b1100000).

The generated code looks vectorized indeed. I didn't look into the details of the generated code by clang and GCC, it seems their performance is a bit different, so we might be able to squeeze out a bit more if we want. Happy to look into that later (create a new issue), but I rather spend my time on other functions now.

@pitrou
Copy link
Member

pitrou commented Jun 15, 2020

It's ok, there's no need to further optimize those functions.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@pitrou
Copy link
Member

pitrou commented Jun 15, 2020

Travis-CI failure is unrelated.

@pitrou pitrou closed this in d98b9c5 Jun 15, 2020
@pitrou
Copy link
Member

pitrou commented Jun 15, 2020

Thank you @maartenbreddels !

@maartenbreddels maartenbreddels deleted the ARROW-9131 branch June 15, 2020 13:32
@maartenbreddels
Copy link
Contributor Author

Thanks, let me know if my workflow is ok, or if I can make things go smoother.

PS: I am looking for a document describing the kernel design. I see these two cases (if (batch[0].kind() == Datum::ARRAY) { and the else clause, but I am not sure I fully understand this. But I'm not sure where this is described, if it is.

PS2: Are there alternative channels for quick/small questions, or is this fine?

@pitrou
Copy link
Member

pitrou commented Jun 15, 2020

I think the current approach (implement kernels one-by-one) is reasonable and manageable for us (and for you as well I hope).

I don't think there's much documentation for now around the kernel design. Basically, kernels should usually be able to process two kinds of inputs (represented as Datums): arrays and scalars. That said, arrays are the dominant case, so if we leave scalars unimplemented in a given kernel, that's not an urgent problem.

For quick development questions, we have a public chat instance at https://ursalabs.zulipchat.com/ - just register though and you can chat with the team. The main channel there is the "dev" channel, and Zulip allows you to create subtopics - don't hesitate to use those!

@maartenbreddels
Copy link
Contributor Author

I think the current approach (implement kernels one-by-one) is reasonable and manageable for us (and for you as well I hope).

No, this is fine.

That said, arrays are the dominant case, so if we leave scalars unimplemented in a given kernel, that's not an urgent problem.

👍

@wesm
Copy link
Member

wesm commented Jun 15, 2020

PS2: Are there alternative channels for quick/small questions, or is this fine?

Can we use dev@arrow.apache.org so everyone can see the discussion and it's searchable via Google later?

PS: I am looking for a document describing the kernel design.

Keep in mind that the new kernels framework is only a few weeks old (7ad49ee), so developer documentation is a bit behind, so don't hesitate to ask questions.

@maartenbreddels
Copy link
Contributor Author

Can we use dev@arrow.apache.org so everyone can see the discussion and it's searchable via Google later?

I'm ok with that if not considered too noisy (like small cmake questions etc).

@wesm
Copy link
Member

wesm commented Jun 15, 2020

It shouldn't be a problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants