This repository was archived by the owner on Apr 28, 2025. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 102
Improve sqrt
and sqrtf
by using hardware more often.
#222
Closed
Closed
Changes from all commits
Commits
Show all changes
17 commits
Select commit
Hold shift + click to select a range
ec032cb
Put the usual badges in the README
Lokathor 229a61d
Use hardware intrinsic much more often
Lokathor 3894b79
Update sqrt.rs
Lokathor a78db4a
Update sqrtf.rs
Lokathor f2a29be
types are important
Lokathor d81d027
formatting fix
Lokathor 0c504f8
make core_intriniscs always available on Nightly
Lokathor 162148a
flip the stable/ unstable meaning
Lokathor e6e7ecb
Merge branch 'master' into sqrt-cfg-if
Lokathor 30c6ecb
Merge branch 'master' into sqrt-cfg-if
Lokathor 80665dc
update for the new "unstable" feature.
Lokathor 719c6b7
fix the soft-float feature name
Lokathor 321eeb8
update when hardware is assumed to have sqrt support
Lokathor d59b2d4
narrow down the panic check
Lokathor 04134b9
move the panic check to just our software func
Lokathor 0fb136d
downgrade no-panic version to 0.1.8
Lokathor 02e2eee
on ARM, f64 doesn't always have hardware sqrt
Lokathor File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I may not have been clear enough earlier, but
soft-float
andhard-float
are not features, I don't think thattarget_feature = "soft-float"
is ever set (nor forhard-float
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I got the name of those target features from
rustc --print target-features --target TARGET
, so I think at least rustc thinks they exist.How else should we detect for hard or soft float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately
-C target-feature
, what--print target-features
does, is not equivalent to#[cfg(target_feature = "..")]
. I'd recommend looking at compiled code to see how the#[cfg]
here isn't working.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I see now that
target_feature = "soft-float"
will always evaluate to false (for example), but that brings us back to the question: how do we respect the hard-float/soft-float configuration of the target? If we can't do that, we can't ever be sure that a hardware instruction exists to be called.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I don't always have the answers to all the questions, I'm just pointing out how this isn't working as intended. I don't know how this would be detected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's totally fine if you don't know it all! You just happen to usually know the most, and you're "here already". We'll have to ask around I guess.
Without an answer to this problem we're just plain stuck. It would be incorrect to call back to the LLVM intrinsic if the arch's version of hardware floating point isn't enabled, since LLVM will end up calling us again and it's just infinite recursion.
The only real relief to be had is that if there is hardware sqrt then LLVM won't ever call us here, so we could just skip trying to divert back to LLVM at all. Except that's where we started basically and that's what could potentially give the performance regressions that we're trying to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened an issue to seek a wider audience for support: rust-lang/rust#64514
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I'm understanding the issue it's that if we export
sqrt
andsqrtf
fromcompiler_builtins
unconditionally then any C code that's linked will use this version and it might perform slower than what it's already using. I think the solution is to not exportsqrt
andsqrtf
unconditionally not only to avoid the performance issue but also because C code could depend on the specific error handling of the libm it was compiled against that this code can't emulate.core
already depends onfmod
so I thinksqrt
can and should be handled in the same way.