Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the Power select evaluators #5497

Merged
merged 11 commits into from
Sep 14, 2020

Conversation

aviansie-ben
Copy link
Contributor

This PR performs major refactors primarily focusing on the *select evaluators in the Power codegen:

  • Split floating-point and integer select evaluation into two separate evaluators
  • Switch from a control flow pseudoinstruction to internal control flow for the branching sequence
  • Add an optimization to switch the order of the operands to avoid register shuffles due to gprClobberEvaluate
  • Exploit isel to perform branchless integer selects on POWER10
  • Add a general API for evaluating a node to a condition register bit, reusing infrastructure from the newly refactored compare evaluators
  • Move compare operand flipping into evaluateThreeWayIntCompareToConditionRegister
  • Improve Tril test coverage for these evaluators

@fjeremic
Copy link
Contributor

@aviansie-ben it looks like at least x86 does not support the fselect evaluators that you implemented tests for. We'll likely have to skip them on some of the platforms and open up issues to track adding support. I'll go ahead and launch builds now to see which other platforms will fail that we also need to skip on.

@genie-omr build all

@aviansie-ben
Copy link
Contributor Author

@genie-omr build all

@aviansie-ben
Copy link
Contributor Author

@genie-omr build all

@andrewcraik
Copy link
Contributor

@aviansie-ben looks like your changes will clash with the changes in PR #5474 - could you see if you can incorporate the fix for making sure the aselect correctly marks its result as collected based on its children? @rpshukla perhaps your change can drop the POWER update if @aviansie-ben includes the fix in the refactor? Just wanted to get these two changes deconflicted.

@aviansie-ben
Copy link
Contributor Author

@andrewcraik Just looked into it and it seems that my refactored version already fixes that issue. Only thing that was missing was the assert relating to internal pointers, so I've gone ahead and added that.

@andrewcraik
Copy link
Contributor

FYI @rpshukla I think your PR can drop the POWER codegen part of the change since @aviansie-ben has handled it here.

Copy link
Contributor

@gita-omr gita-omr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add ZEROCHK to the PR title?

compiler/p/codegen/ControlFlowEvaluator.cpp Outdated Show resolved Hide resolved
The previous Tril tests for select opcodes were incomplete and did not
test a number of corner cases. For instance, only select opcodes with a
compare as their first child were being tested. These tests have been
expanded to cover more cases.

Signed-off-by: Ben Thomas <ben@benthomas.ca>
The Z codegen currently has a known bug with sub-integer compares as the
first child of an integral select opcode. The tests for this have been
temporarily disabled until this can be resolved.

Signed-off-by: Ben Thomas <ben@benthomas.ca>
The x86 codegen currently has a known bug with sub-integer compares as
the first child of an integral select opcode. The tests for this have
been temporarily disabled until this can be resolved.

Signed-off-by: Ben Thomas <ben@benthomas.ca>
Signed-off-by: Benjamin Thomas <ben@benthomas.ca>
In certain evaluators (e.g. *select), a selector value needs to be used
as a condition in ICF. Previously, compare nodes would need to be
evaluated to a GPR, then the GPR would be compared against 0. This was
obviously inefficient, since it's possible to evaluate most comparisons
directly to a condition register in 1-2 instructions. Some evaluators
had special handling for this, but this handling was not uniform and
would frequently miss corner cases.

To address this and allow for better optimization opportunities, a new
helper has been added which evaluates a node to a condition register and
returns the condition which is true if the node was non-zero. For
compare nodes, the comparison will be performed directly into a
condition register. For other types of nodes, the old behaviour of
evaluating the node to a GPR and comparing that against 0 will be used.

Signed-off-by: Ben Thomas <ben@benthomas.ca>
In order to avoid FXU rejects on POWER6 chips, the Power codegen
contains logic to flip the order of operands when performing a
comparison when determined to be necessary.

Unfortunately, the implementation of this optimization left much to be
desired. The generateTrg1Src2Instruction helper would set a special flag
on the target register and silently flip the compare operand order if it
was deemed to be necessary. The generateConditionalBranch helper (and
related helpers) would then check this flag to determine whether they
needed to flip the branch. This is obviously very surprising and also
doesn't correctly handle CR logical instructions.

Since the evaluation of comparisons to a condition register has now been
centralized in evaluateIntCompareToConditionRegister, the logic can now
be moved there, with the condition returned being flipped rather than
using hacky flags. This should prevent any unfortunate mistakes in the
future.

Signed-off-by: Ben Thomas <ben@benthomas.ca>
The Power evaluators for *select opcodes have been significantly
refactored to be much simpler. Additionally a number of new
optimizations have been added to reduce the overhead of these opcodes:

1. Selector nodes which are compares are now evaluated directly to a
   condition register to avoid the overhead of evaluating them and then
   comparing the result against 0.
2. The register for one of the operands is reused if possible to avoid
   an extra mr/fmr instruction.

Signed-off-by: Ben Thomas <ben@benthomas.ca>
The previous implementation of iselect node evaluation used a control
flow instruction called iselect to implement its control flow. Since the
new implementation uses ICF instead, this instruction is no longer
needed.

Signed-off-by: Ben Thomas <ben@benthomas.ca>
Power has long had an instruction for branchlessly selecting between two
GPRs based on a condition register bit. Unfortunately, performance of
this instruction left much to be desired until POWER10. Now that POWER10
has improved performance of this instruction, exploitation of it can be
added to the relevant select evaluators.

Signed-off-by: Ben Thomas <ben@benthomas.ca>
The select evaluator in the Power codegen cannot handle select nodes
with children that are internal pointers, since information about the
pinning array pointer would be lost. An assert has been added to verify
that internal pointers are not passed to this evaluator.

Signed-off-by: Ben Thomas <ben@benthomas.ca>
@aviansie-ben
Copy link
Contributor Author

Oops, the ZEROCHK changes weren't actually supposed to be part of this PR and were meant to be in a future PR 😅. I've rebased to correct merge conflicts, remove those ZEROCHK changes, and squash some of the Tril test fixes.

@fjeremic fjeremic self-assigned this Sep 3, 2020
@aviansie-ben
Copy link
Contributor Author

@gita-omr Can you please complete your review of this PR so that it can move ahead?

@fjeremic
Copy link
Contributor

@genie-omr build plinux,aix

There are some switch statements on enum classes in
ControlFlowEvaluator.cpp that were previously missing default cases.
While these switches are exhaustive, the way enum classes work in C++
means that technically any int can be used, so default cases are needed
to make sure that such errors are caught.

Signed-off-by: Ben Thomas <ben@benthomas.ca>
@aviansie-ben
Copy link
Contributor Author

@genie-omr build plinux,aix

@fjeremic fjeremic merged commit 801b40d into eclipse-omr:master Sep 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants