8318158: RISC-V: implement roundD/roundF intrinsics #16382

Closed
wants to merge 14 commits into from

Conversation

omikhaltsova

@omikhaltsova omikhaltsova commented Oct 26, 2023

Please review this implementation of the roundD/roundF intrinsics for the RISC-V platform.

The table below shows that a NaN argument must be handled as a special case.

                                                  RISC-V                            Java
                                        (FCVT.W.S)    (FCVT.L.D)  (long round(double a)) (int round(float a))
Minimum valid input (after rounding)     −2^31         −2^63         Long.MIN_VALUE       Integer.MIN_VALUE
Maximum valid input (after rounding)      2^31 − 1      2^63 − 1     Long.MAX_VALUE       Integer.MAX_VALUE
Output for out-of-range negative input   −2^31         −2^63         Long.MIN_VALUE       Integer.MIN_VALUE
Output for −∞                            −2^31         −2^63         Long.MIN_VALUE       Integer.MIN_VALUE
Output for out-of-range positive input    2^31 − 1      2^63 − 1     Long.MAX_VALUE       Integer.MAX_VALUE
Output for +∞                             2^31 − 1      2^63 − 1     Long.MAX_VALUE       Integer.MAX_VALUE
Output for NaN                            2^31 − 1      2^63 − 1           0                      0
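
The Java column of the table can be checked directly against `Math.round`. The following is a minimal sketch: the class name `RoundSpecialCases` and the helper `fcvtLikeSpecialCase` are hypothetical, and the helper only models the RISC-V column of the table (NaN converts to 2^31 − 1); it is not an emulation of the instruction.

```java
// Hypothetical demo class: checks the "Java" column of the table and
// contrasts it with the NaN row of the RISC-V column.
final class RoundSpecialCases {

    // Models only the special-case rows of the FCVT.W.S column:
    // NaN yields 2^31 - 1, everything else saturates like a Java cast.
    // The rounding step itself is deliberately left out.
    static int fcvtLikeSpecialCase(float a) {
        return Float.isNaN(a) ? Integer.MAX_VALUE : (int) a;
    }

    public static void main(String[] args) {
        float[] inputs = {
            Float.NaN, Float.NEGATIVE_INFINITY, Float.POSITIVE_INFINITY, -3.0e9f, 3.0e9f
        };
        for (float a : inputs) {
            System.out.println(a + ": Math.round = " + Math.round(a)
                    + ", FCVT.W.S column = " + fcvtLikeSpecialCase(a));
        }
    }
}
```

Only the NaN row differs between the two columns, which is why the intrinsic needs an explicit NaN check.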

Running the benchmark with the second, fixed implementation on the T-Head RVB-ICE board shows the following performance improvement:

Before

Benchmark                              (TESTSIZE)   Mode  Cnt    Score   Error   Units
FpRoundingBenchmark.test_round_double        2048  thrpt   15   59.555  0.179  ops/ms
FpRoundingBenchmark.test_round_float         2048  thrpt   15   49.760  0.103  ops/ms

After

Benchmark                              (TESTSIZE)   Mode  Cnt    Score   Error   Units
FpRoundingBenchmark.test_round_double        2048  thrpt   15  110.956  0.186  ops/ms
FpRoundingBenchmark.test_round_float         2048  thrpt   15  115.947  0.122  ops/ms

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8318158: RISC-V: implement roundD/roundF intrinsics (Enhancement - P4)

Reviewers

Contributors

  • Vladimir Kempik <vkempik@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16382/head:pull/16382
$ git checkout pull/16382

Update a local copy of the PR:
$ git checkout pull/16382
$ git pull https://git.openjdk.org/jdk.git pull/16382/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16382

View PR using the GUI difftool:
$ git pr show -t 16382

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16382.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Oct 26, 2023

👋 Welcome back omikhaltcova! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 26, 2023

@omikhaltsova The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Oct 26, 2023
@omikhaltsova omikhaltsova changed the title from "8318158: RISC-V: implement roundD/roundF intrisics" to "8318158: RISC-V: implement roundD/roundF intrinsics" Oct 26, 2023
@omikhaltsova omikhaltsova marked this pull request as ready for review October 27, 2023 11:38
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 27, 2023
Member

@luhenry luhenry left a comment

On wording, RoundingMode::rne says "round to Nearest, ties to Even", while Math.round(float) says "round to Nearest, ties to positive infinity". Are these equivalent? Do we have a test covering that?

@theRealAph
Contributor

Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform. As shown below the output for RISC-V instructions and Java methods differs only for NaN argument.

I doubt that. Check the result for all x in float, x < 0 && abs(x) < 0x1.0p23f
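
The difference between the two tie-breaking rules can be seen in plain Java. This is a minimal sketch under an assumption: `Math.rint` (round to nearest, ties to even) is used as a stand-in for a round-to-nearest-even conversion, and the class and method names are hypothetical. It does not execute any RISC-V instruction.

```java
// Hypothetical demo: compares Math.round (ties toward positive infinity)
// with a ties-to-even conversion built on Math.rint.
final class RneVersusJavaRound {

    // Round to nearest, ties to even, then convert to int.
    // Stand-in for an RNE float-to-int conversion.
    static int roundTiesToEven(float x) {
        return (int) Math.rint(x);
    }

    public static void main(String[] args) {
        float[] ties = { -2.5f, -1.5f, -0.5f, 0.5f, 1.5f, 2.5f };
        for (float x : ties) {
            System.out.printf("x = %5.1f  Math.round = %2d  ties-to-even = %2d%n",
                    x, Math.round(x), roundTiesToEven(x));
        }
    }
}
```

The two results disagree on some exact halves, for example -1.5f and 2.5f, so the two rounding rules are not equivalent.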

@omikhaltsova
Author

Yes, you are both right, this is an incorrect implementation. I compared the output of the assembler instructions fcvt.w.s/fcvt.l.d with that of Java's Math.round(), paying attention to the range mentioned above. The results are different. Thank you for pointing out this mistake!

@omikhaltsova
Author

/contributor add @VladimirKempik

@openjdk

openjdk bot commented Nov 13, 2023

@omikhaltsova
Contributor Vladimir Kempik <vkempik@openjdk.org> successfully added.

@omikhaltsova
Author

gentle ping, please take a look at this pr!

@VladimirKempik

Could a reviewer take another look, please?

@RealFYang
Member

RealFYang commented Dec 5, 2023

Unfortunately, I observed a performance regression on the SiFive Unmatched board.

Before:

FpRoundingBenchmark.test_ceil                2048  thrpt   15  39.243 ± 0.506  ops/ms
FpRoundingBenchmark.test_floor               2048  thrpt   15  39.448 ± 0.076  ops/ms
FpRoundingBenchmark.test_rint                2048  thrpt   15  39.411 ± 0.134  ops/ms
FpRoundingBenchmark.test_round_double        2048  thrpt   15  31.329 ± 0.085  ops/ms
FpRoundingBenchmark.test_round_float         2048  thrpt   15  31.328 ± 0.031  ops/ms

After:

FpRoundingBenchmark.test_ceil                2048  thrpt   15  39.375 ± 0.125  ops/ms
FpRoundingBenchmark.test_floor               2048  thrpt   15  39.407 ± 0.076  ops/ms
FpRoundingBenchmark.test_rint                2048  thrpt   15  39.387 ± 0.235  ops/ms
FpRoundingBenchmark.test_round_double        2048  thrpt   15  23.940 ± 0.025  ops/ms
FpRoundingBenchmark.test_round_float         2048  thrpt   15  30.629 ± 0.021  ops/ms
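
To put the size of the regression in relative terms, the scores above can be turned into a percentage change. This is a small illustrative sketch; the class and method names are hypothetical, and the inputs are the test_round_double and test_round_float scores quoted in this comment.

```java
// Hypothetical helper: relative change between two throughput scores.
final class ThroughputChange {

    // Percentage change from 'before' to 'after' (both in ops/ms).
    // Negative values mean the throughput dropped.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("test_round_double: %+.1f%%%n", percentChange(31.329, 23.940));
        System.out.printf("test_round_float:  %+.1f%%%n", percentChange(31.328, 30.629));
    }
}
```

With these numbers the double variant loses roughly a quarter of its throughput, while the float variant drops by only a few percent.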

@omikhaltsova
Author

@RealFYang I've reproduced this performance regression on VisionFive 2. The results are as follows:

 Before
Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
FpRoundingBenchmark.test_round_double        2048  thrpt   15  39.335 ± 0.122  ops/ms
FpRoundingBenchmark.test_round_float         2048  thrpt   15  39.327 ± 0.138  ops/ms
After
FpRoundingBenchmark.test_round_double        2048  thrpt   15  30.004 ± 0.192  ops/ms
FpRoundingBenchmark.test_round_float         2048  thrpt   15  38.489 ± 0.120  ops/ms

@theRealAph
Contributor

That is, to say the very least, surprising. I'd use -prof:perfasm to find out why.

@VladimirKempik

-prof:perfasm doesn't work as is on U74 boards (HiFive and VisionFive 2); there are some problems with the cycles event.
This works: -prof perfasm:"events=cpu-clock"
but it's a software event. Still, it's better than nothing.

@theRealAph
Contributor

It is. We should not simply accept something like this without trying to understand the reason.

@Hamlin-Li

I think usage of static rounding mode here is fine for riscv.

In typical implementations, writes to the dynamic rounding mode CSR state will serialize the pipeline. Static rounding modes are used to implement specialized arithmetic operations that often have to switch frequently between different rounding modes.

-- from “F” Standard Extension

@Hamlin-Li

Hamlin-Li commented Dec 21, 2023

Looks good.
Can you add some more comments for java_round_float (or double)? There was a lot of discussion here, but none of that information is in the code.

@omikhaltsova
Author

@Hamlin-Li I've just left some comments. Is this what was required? Could you take a look, please?!

@Hamlin-Li

Thanks for updating.
Yes, something like that makes it better.

Maybe add more comments about the + 0.5 trick? You could refer to the comments at https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L5899.

BTW, some minor comments:

  1. can you move the code added in src/hotspot/cpu/riscv/macroAssembler_riscv.cpp up to line 4241? Just to move it out of the block of a bunch of macro definitions.
  2. And, comment style, maybe change from /**/ back to //, which is consistent with other comments for non-macro code.

@omikhaltsova
Author

@Hamlin-Li Thanks for your advice! Fixed. IMHO this will be enough; otherwise, for a clearer understanding, a specific example would have to be given in the comments showing that rounding with ties to positive infinity is equivalent to calling fadd with RDN followed by fcvt with RDN.

@Hamlin-Li

For normal cases, I guess RUP of riscv will work, but for some corner cases we need the + 0.5 trick, am I right?
But the comments only say that some inputs produce incorrect results, which is unclear for potential readers and future maintainers.
So, can you add some information about this corner case to the comments? A simple example will definitely help here.

@omikhaltsova
Author

@Hamlin-Li No, that’s wrong: RUP of riscv won’t work, it’ll give incorrect results. Look at the rounding of some values below for example:

JAVA Math.round: src =  0.345200  dst =  0
JAVA Math.round: src = -0.555550  dst = -1
JAVA Math.round: src = -1.500000  dst = -1
JAVA Math.round: src =  1.500000  dst =  2
JAVA Math.round: src = -1.345000  dst = -1
JAVA Math.round: src = -1.450000  dst = -1
JAVA Math.round: src = -0.444460  dst =  0
JAVA Math.round: src = -0.999990  dst = -1
JAVA Math.round: src =  0.999999  dst =  1
JAVA Math.round: src =  0.000000  dst =  0
JAVA Math.round: src =  0.001000  dst =  0
JAVA Math.round: src = -0.001000  dst =  0
FCVT.W.S RUP: src =  0.345200  dst =  1
FCVT.W.S RUP: src = -0.555550  dst =  0
FCVT.W.S RUP: src = -1.500000  dst = -1
FCVT.W.S RUP: src =  1.500000  dst =  2
FCVT.W.S RUP: src = -1.345000  dst = -1
FCVT.W.S RUP: src = -1.450000  dst = -1
FCVT.W.S RUP: src = -0.444460  dst =  0
FCVT.W.S RUP: src = -0.999990  dst =  0
FCVT.W.S RUP: src =  0.999990  dst =  1
FCVT.W.S RUP: src =  0.000000  dst =  0
FCVT.W.S RUP: src =  0.001000  dst =  1
FCVT.W.S RUP: src = -0.001000  dst =  0
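
The mismatch can be reproduced in plain Java. This is a minimal sketch under an assumption: `Math.ceil`, which rounds toward positive infinity, is used as a stand-in for FCVT.W.S with the RUP rounding mode. The class and method names are hypothetical.

```java
// Hypothetical demo: Math.round versus rounding toward positive infinity.
final class RupVersusJavaRound {

    // Rounds toward positive infinity and converts to int.
    // Stand-in for FCVT.W.S with RUP; not the instruction itself.
    static int roundUp(float x) {
        return (int) Math.ceil(x);
    }

    public static void main(String[] args) {
        float[] inputs = { 0.3452f, -0.55555f, -1.5f, 1.5f, -0.99999f, 0.001f };
        for (float x : inputs) {
            String verdict = (Math.round(x) == roundUp(x)) ? "same" : "DIFFERENT";
            System.out.println("src = " + x + "  Math.round = " + Math.round(x)
                    + "  round-up = " + roundUp(x) + "  " + verdict);
        }
    }
}
```

Rounding up agrees with Math.round only when the fractional part is at least one half (or zero), so it cannot replace the + 0.5 approach.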

@omikhaltsova
Author

fadd_s requires the explicit rounding mode RDN (round down, towards −∞) because adding 0.5f to some floats exceeds the precision of a float, so rounding takes place. With the default rounding mode RNE (round to nearest, ties to even), this leads to incorrect results for some inputs (etalon is the expected value):

error: src = 8388609.000000  dst = 8388610  etalon = 8388609
error: src = 8388611.000000  dst = 8388612  etalon = 8388611
error: src = 8388613.000000  dst = 8388614  etalon = 8388613
error: src = 8388615.000000  dst = 8388616  etalon = 8388615
error: src = 8388617.000000  dst = 8388618  etalon = 8388617
error: src = 8388619.000000  dst = 8388620  etalon = 8388619
error: src = 8388621.000000  dst = 8388622  etalon = 8388621
error: src = 8388623.000000  dst = 8388624  etalon = 8388623
error: src = 8388625.000000  dst = 8388626  etalon = 8388625
error: src = 8388627.000000  dst = 8388628  etalon = 8388627
error: src = 8388629.000000  dst = 8388630  etalon = 8388629
error: src = 8388631.000000  dst = 8388632  etalon = 8388631
error: src = 8388633.000000  dst = 8388634  etalon = 8388633
error: src = 8388635.000000  dst = 8388636  etalon = 8388635
error: src = 8388637.000000  dst = 8388638  etalon = 8388637
error: src = 8388639.000000  dst = 8388640  etalon = 8388639
etc.

Let’s consider two of them with RNE for fadd.s:

fadd.s rne (src + 0.5f): src = 8388609.000000  dst = 8388610.000000
fcvt.w.s rdn: src = 8388610.000000 dst = 8388610
RESULT: 8388610  (JAVA Math.round: 8388609)
fadd.s rne (src + 0.5f): src = 8388611.000000  dst = 8388612.000000
fcvt.w.s rdn: src = 8388612.000000 dst = 8388612
RESULT: 8388612  (JAVA Math.round: 8388611)

If RDN is set for fadd.s, then:

fadd.s rdn (src + 0.5f): src = 8388609.000000  dst = 8388609.000000
fcvt.w.s rdn: src = 8388609.000000 dst = 8388609
RESULT: 8388609  (JAVA Math.round: 8388609)
fadd.s rdn (src + 0.5f): src = 8388611.000000  dst = 8388611.000000
fcvt.w.s rdn: src = 8388611.000000 dst = 8388611
RESULT: 8388611  (JAVA Math.round: 8388611)
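
The effect of the rounding mode on the addition can be reproduced in Java. This is a sketch under stated assumptions: Java's float addition rounds to nearest, ties to even, so `x + 0.5f` plays the role of fadd.s with RNE; the RDN variant is emulated by adding in double and stepping down when narrowing to float rounded up. The class and method names are hypothetical, and `Math.floor` stands in for fcvt.w.s with RDN.

```java
// Hypothetical demo: why the addition of 0.5f must round down.
final class FaddRoundingMode {

    // x + 0.5f with Java's default float arithmetic (RNE).
    static float addHalfRne(float x) {
        return x + 0.5f;
    }

    // Emulates x + 0.5f rounded toward negative infinity (RDN):
    // form the sum in double, narrow to float, and step down by one
    // float if the narrowing rounded up.
    static float addHalfRdn(float x) {
        double sum = (double) x + 0.5;
        float narrowed = (float) sum;
        return ((double) narrowed > sum) ? Math.nextDown(narrowed) : narrowed;
    }

    // Rounds toward negative infinity and converts to int.
    static int floorToInt(float x) {
        return (int) Math.floor(x);
    }

    public static void main(String[] args) {
        float[] inputs = { 8388609f, 8388611f };
        for (float x : inputs) {
            System.out.println("src = " + x
                    + "  RNE add: " + floorToInt(addHalfRne(x))
                    + "  RDN add: " + floorToInt(addHalfRdn(x))
                    + "  Math.round: " + Math.round(x));
        }
    }
}
```

For these inputs the exact sum lies halfway between two adjacent floats, so RNE picks the even neighbour above the input while RDN keeps the input unchanged.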

@omikhaltsova
Author

In addition, some examples with RDN for fadd.s and fcvt.w.s:

fadd.s (src + 0.5f): src = 0.345200  dst = 0.845200
fcvt.w.s: src = 0.845200 dst = 0
RESULT: 0  (JAVA Math.round: 0)

fadd.s (src + 0.5f): src = -0.555550  dst = -0.055550
fcvt.w.s: src = -0.055550 dst = -1
RESULT: -1  (JAVA Math.round: -1)

fadd.s (src + 0.5f): src = -1.500000  dst = -1.000000
fcvt.w.s: src = -1.000000 dst = -1
RESULT: -1  (JAVA Math.round: -1)

fadd.s (src + 0.5f): src = 1.500000  dst = 2.000000
fcvt.w.s: src = 2.000000 dst = 2
RESULT: 2  (JAVA Math.round: 2)

fadd.s (src + 0.5f): src = -1.345000  dst = -0.845000
fcvt.w.s: src = -0.845000 dst = -1
RESULT: -1  (JAVA Math.round: -1)

fadd.s (src + 0.5f): src = -1.450000  dst = -0.950000
fcvt.w.s: src = -0.950000 dst = -1
RESULT: -1  (JAVA Math.round: -1)

fadd.s (src + 0.5f): src = -0.444460  dst = 0.055540
fcvt.w.s: src = 0.055540 dst = 0
RESULT: 0  (JAVA Math.round: 0)

fadd.s (src + 0.5f): src = -0.999990  dst = -0.499990
fcvt.w.s: src = -0.499990 dst = -1
RESULT: -1  (JAVA Math.round: -1)

fadd.s (src + 0.5f): src = 0.999990  dst = 1.499990
fcvt.w.s: src = 1.499990 dst = 1
RESULT: 1  (JAVA Math.round: 1)

fadd.s (src + 0.5f): src = 0.000000  dst = 0.500000
fcvt.w.s: src = 0.500000 dst = 0
RESULT: 0  (JAVA Math.round: 0)

fadd.s (src + 0.5f): src = 0.001000  dst = 0.501000
fcvt.w.s: src = 0.501000 dst = 0
RESULT: 0  (JAVA Math.round: 0)

fadd.s (src + 0.5f): src = -0.001000  dst = 0.499000
fcvt.w.s: src = 0.499000 dst = 0
RESULT: 0  (JAVA Math.round: 0)

@omikhaltsova
Author

@Hamlin-Li @RealFYang Could you take a look once again, please, whether these comments are sufficient or something else is needed?

@Hamlin-Li Hamlin-Li left a comment

Thanks for updating.

In fact, according to the Java API spec, the corner cases also include Integer/Long.MIN_VALUE and Integer/Long.MAX_VALUE.

Can you add some comments like below?

It also works for -2.1474836E9f, which corresponds to -2147483648 (Integer.MIN_VALUE), and for even smaller float values;
it also works for 2.1474836E9f, which corresponds to 2147483647 (Integer.MAX_VALUE), and for even greater float values;

BTW, a minor comment: I think you mean java.lang.Math or j.l.Math instead of java.math.

Otherwise it looks good to me.

@omikhaltsova
Author

@Hamlin-Li thank you for reviewing! You suggested writing comments similar to aarch64 #16382 (comment). IMHO java.math.round doesn't cause any confusion, but I've just replaced java.math.round with java.lang.Math.round to be more accurate.
Concerning comments about Integer.MIN_VALUE/Integer.MAX_VALUE, I don't think they are worth writing because:

  • no other platform contains such comments;
  • in that case the other special cases would have to be mentioned as well, such as +/-0, +/-subnormal numbers, signaling/quiet NaN, and +/-inf;
  • during this review the entire 32-bit range was tested against the current Java implementation and @RealFYang rechecked and confirmed it.
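
A comparison of this kind can be sketched in plain Java. This is not the test used in the review: it is a hypothetical emulation (add 0.5 rounding down, then floor) compared with `Math.round` over a strided sample of float bit patterns plus a few special values. The class and method names and the stride are assumptions made for illustration.

```java
// Hypothetical sampled check of "add 0.5 with RDN, then convert with RDN"
// against Math.round(float).
final class SampledRoundCheck {

    // Emulates the two-step approach: the sum is formed in double and
    // rounded down to float, then floored and converted to int.
    // The cast to int maps NaN to 0 and saturates out-of-range values.
    static int roundViaRdn(float x) {
        double sum = (double) x + 0.5;
        float narrowed = (float) sum;
        if ((double) narrowed > sum) {
            narrowed = Math.nextDown(narrowed);
        }
        return (int) Math.floor(narrowed);
    }

    // Counts disagreements with Math.round over every stride-th bit pattern.
    static long countMismatches(int stride) {
        long mismatches = 0;
        for (long bits = Integer.MIN_VALUE; bits <= Integer.MAX_VALUE; bits += stride) {
            float x = Float.intBitsToFloat((int) bits);
            if (roundViaRdn(x) != Math.round(x)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        float[] special = {
            0.0f, -0.0f, Float.MIN_VALUE, -Float.MIN_VALUE, Float.NaN,
            Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, -2.1474836E9f, 2.1474836E9f
        };
        for (float x : special) {
            System.out.println(x + ": emulation = " + roundViaRdn(x)
                    + ", Math.round = " + Math.round(x));
        }
        System.out.println("sampled mismatches: " + countMismatches(4099));
    }
}
```

A stride of 1 would cover the entire 32-bit range at the cost of a much longer run; if the emulation is faithful, the mismatch count should be zero either way.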

@Hamlin-Li

Just FYI, there is a java.math package.

For special cases,

  1. We're implementing an intrinsic for a Java API whose Javadoc clearly defines the special cases, so it's better to state clearly how we handle them.
  2. For other special cases, unless riscv behaves differently from the Java spec, I don't think it's necessary to mention them either, but it does no harm if they are mentioned.

@omikhaltsova
Author

Thank you all very much for the review!

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Dec 29, 2023
@openjdk

openjdk bot commented Dec 29, 2023

@omikhaltsova
Your change (at version ba6e21a) is now ready to be sponsored by a Committer.

@VladimirKempik

/sponsor

@openjdk

openjdk bot commented Dec 29, 2023

Going to push as commit 19147f3.
Since your change was applied there have been 511 commits pushed to the master branch:

  • 2a59243: 8322734: A redundant return in method padWithLen
  • 4fc6b0f: 8068958: Timestamp.from(Instant) should throw when conversion is not possible
  • 28c82bf: 8322661: Build broken due to missing jvmtiExport.hpp after JDK-8320139
  • 7263e25: 8322490: cleanup CastNode construction
  • f695ca5: 8321151: JDK-8294427 breaks Windows L&F on all older Windows versions
  • 93fedc1: 8321802: (zipfs) Add validation of incorrect LOC signature in ZipFileSystem
  • 1230853: 8322163: runtime/Unsafe/InternalErrorTest.java fails on Alpine after JDK-8320886
  • dce7a57: 8321683: Tests fail with AssertionError in RangeWithPageSize
  • c53f845: 8322539: Parallel: Remove duplicated methods in PSAdaptiveSizePolicy
  • 84c2379: 8320139: [JVMCI] VmObjectAlloc is not generated by intrinsics methods which allocate objects
  • ... and 501 more: https://git.openjdk.org/jdk/compare/1802cb566e956febebc181da26a666bea4942e87...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 29, 2023
@openjdk openjdk bot closed this Dec 29, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Dec 29, 2023
@openjdk

openjdk bot commented Dec 29, 2023

@VladimirKempik @omikhaltsova Pushed as commit 19147f3.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
