-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPC-GAP: Allow configuration of TLS and improve native TLS performance on macOS #3502
Conversation
My understanding is that this contains PR #2845, correct? If so, it really would be nice if we could get that PR finished and merged, before building new stuff atop of it... |
Strictly speaking, this PR does not actually depend on the other one, but is committed on top of that branch for convenience. I am already working on cleaning up the other PR as part of my current work to make multi-threaded Oscar a possibility, so that we can move on from that. |
d99b7e5
to
83ee9a8
Compare
7eb6ab0
to
f132902
Compare
f132902
to
de252d1
Compare
This is now a revised version that mirrors the Mono logic. It can also now be more easily extended to equally support Windows implementations. |
bd302fb
to
51464e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit 'Update .gitignore and .travis.yml files from master.' needs a new commit message.
Otherwise, this is shaping up well, but we need to figure out how to get it into a mergeable state, see my comments.
src/system.h
Outdated
#endif | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very unimportant suggestion, feel free to ignore: but I try to restrict to at most 2 empty lines in a row.
etc/ci-prepare.sh
Outdated
unward/bin/unward --inplace src | ||
# commit the result to prevent the docomp test from triggering | ||
git commit -m "Unward" src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this means people who want to use the code in this PR still need to have unward, and run it, and then have lots of modifications in their tree. That's IMHO the major blocker for merging this PR, and until we resolve it, there is no way this can be merged. So I see two ways forward:
- We run
unward
and commit the modifications once and for all, and move on (my understanding always was that this is the plan?) - We determine that we will have to keep using
unward
for the foreseeable. In that case, we will have to use it likeward
, and thus need to restore the parts of the build system which transparently handled applyingward
(now:unward
) to sources followed by compilation (instead of working inlaces, the results would be placed intogen/src/
; and build rules would ensuregen/src/FOO.c
gets rebuilt wheneversrc/FOO.c
or one of its header dependencies is touched.
Thing is, we need people to use this code so it gets tested, so keeping it in a PR indefinitely is not a good way forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is something that I don't have a good solution for myself. My plan was to not put the unward changes in for a transition period. The reason is that if we need to change something about how they work (e.g. include an option for disabling guards for performance reasons), they mess up diffs and git blame, as they're pretty extensive.
But that leads to the problem that if you build HPC-GAP without unward, you get a near immediate crash, as an unnecessary read or write guard fails (one that gets removed by unward).
Possible solution: add a --without-guards
configure option. That allows builds to (unsafely) work without unward for the transition period and would also later allow HPC-GAP to be built without guards for performance reasons if performance is critical.
If we were to put unward into the build system, it already supports a -o <dir>
option to place the transformed sources (and all untransformed ones, too) into a separate directory, say gen/
. This takes less than a second, is incremental (it doesn't overwrite files that didn't get changed and therefore doesn't change their time stamps) and thus make-friendly, and is in principle possible.
But this would again introduce an external dependency, which I'm not very enthusiastic about (even though unward, unlike ward, is itself dependency-free and only requires a C++-98 or later compiler). A large part of the raison d'être for unward is, after all, that it can be dropped from the build system eventually and does not continue to be a maintenance burden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds as if you are aiming for option 1 eventually. But the big question then is: when would we be ready for it? What's the hard criterium for making that decision? I keep coming back to this very same questions since when you first started talking about unward. You keep saying it's meant to be used once, but also keep saying that it may need to be changed, and then re-run. Unless there is a clear goal as to when it's "done", I don't see a solution out of this dilemma, other than to accept that the notion of "unward will be run once and then discarded" is hopeless. Hence option 2.
At the same time, it's also totally unclear to me what major changes to unward output you think might be needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two possible changes that I'm contemplating right now and that may require changes in the output:
- Allow disabling of guards at compile time.
- Add
#ifdef HPCGAP
around the inline functions, so that they don't get compiled when not built with HPC-GAP. This was unnecessary when they were still static, but now they're not static anymore and should not show up in the code regardless.
My goal with unward was to keep the choice open-ended. This is why I specifically supported both the following options:
- Make the process idempotent, so that you can rerun unward if necessary.
- Add a
-o
option to allow for use as a preprocessor in the build system (i.e. like ward).
This means that we wouldn't be forced by design to pick one or the other.
As for the process, I was hoping to basically get the following done:
- Add the current PR (cleaned up) to GAP.
- Ask a couple of people to test that it works with unward (there aren't many who are using HPC-GAP, but there are a couple at least).
- If that's the case, then after a short delay (couple of weeks) add the unward changes in a separate patch.
I am totally fine with rolling them into the current PR, too, if you think that's the best way forward. In the end, you probably have a better view of the big picture as far as GAP is concerned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rbehrends (CC: @stevelinton @ChrisJefferson ) regarding your first two points, it seems you already added --enable-guards
/ --disable-guards
(which is perfectly fine by me). Note sure around which inline functions you want(ed?) to add #ifdef HPCGAP
, but I guess it's fine, too.
Regarding how to proceed: My main concern about merging this PR is how it would affect people who are using HPC-GAP right now, but who do not yet know about unward and thus are not using it. Will they get a horribly broken HPC-GAP, or just one that's more or less on the same level as what we have in master
right now anyway?
If it was the latter, there is indeed nothing major stopping this from being merged (my remaining remarks are all very minor nitpicks and should be easy to resolve).
However, it seems to be the former (starting GAP built from your branch w/o running unward causes an immediate segfault due to violated guards). That makes me somewhat more reserved about the idea of merging this as-is. It'd be still OK by me if there was a very concrete timeline for it, and also a plan B should we determined that result of the merge is unusable. But my main concern is who the "couple of people" would be we'd ask to test it, and what makes us think they'd be more likely to test it then, than they would be to test this PR / your branch if we asked them explicitly to test it?
Until we come up with a good strategy to deal with that, it might indeed be best to split this commit into two: a non-controversial commit we could merge right away (which would contain the TLS changes, the renaming of HAVE_NATIVE_TLS
to USE_NATIVE_TLS
and a bunch of other minor changes), and the rest (mainly: the guards changes).
791ac67
to
d9203e4
Compare
@rbehrends the commit here is titled "Make HPC-GAP work without Ward." but that's not really what this is about, given that we already are not using ward... |
etc/ci-prepare.sh
Outdated
unward/bin/unward --inplace src | ||
# commit the result to prevent the docomp test from triggering | ||
git commit -m "Unward" src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rbehrends (CC: @stevelinton @ChrisJefferson ) regarding your first two points, it seems you already added --enable-guards
/ --disable-guards
(which is perfectly fine by me). Note sure around which inline functions you want(ed?) to add #ifdef HPCGAP
, but I guess it's fine, too.
Regarding how to proceed: My main concern about merging this PR is how it would affect people who are using HPC-GAP right now, but who do not yet know about unward and thus are not using it. Will they get a horribly broken HPC-GAP, or just one that's more or less on the same level as what we have in master
right now anyway?
If it was the latter, there is indeed nothing major stopping this from being merged (my remaining remarks are all very minor nitpicks and should be easy to resolve).
However, it seems to be the former (starting GAP built from your branch w/o running unward causes an immediate segfault due to violated guards). That makes me somewhat more reserved about the idea of merging this as-is. It'd be still OK by me if there was a very concrete timeline for it, and also a plan B should we determined that result of the merge is unusable. But my main concern is who the "couple of people" would be we'd ask to test it, and what makes us think they'd be more likely to test it then, than they would be to test this PR / your branch if we asked them explicitly to test it?
Until we come up with a good strategy to deal with that, it might indeed be best to split this commit into two: a non-controversial commit we could merge right away (which would contain the TLS changes, the renaming of HAVE_NATIVE_TLS
to USE_NATIVE_TLS
and a bunch of other minor changes), and the rest (mainly: the guards changes).
I'll address the remaining comments tomorrow, but about how to go forward with a merge: we could flip the default for |
d9203e4
to
4167799
Compare
4167799
to
5e87c82
Compare
Still has guards failures here:
gives me
Using
The method cache is per-thread, so the access should be fine; but I think we never set a region on those lists, hence the access error. @rbehrends can you take a look? |
@fingolfin Yes, I'm working on that. There are a couple more (and oddly enough, one I see only locally, not on Travis). |
15d1803
to
cd73786
Compare
There is still an error in the same place as before (the wt := qs!.collector![SCP_WEIGHTS];
wt{n+[1..nhwg]} := [1..nhwg] * 0 + class; # <- guard is violated here The guard is violated because |
(This also means there is an easy workaround, but we should first better understand what's going on, so we don't break multi threaded stuff) diff --git a/lib/pquot.gi b/lib/pquot.gi
index e30e26d0c..6c39cf5c8 100644
--- a/lib/pquot.gi
+++ b/lib/pquot.gi
@@ -502,8 +502,9 @@ UpdateWeightInfo := function( qs )
class := class + 1;
qs!.collector![SCP_CLASS] := class;
- wt := qs!.collector![SCP_WEIGHTS];
+ wt := ShallowCopy(qs!.collector![SCP_WEIGHTS]);
wt{n+[1..nhwg]} := [1..nhwg] * 0 + class;
+ qs!.collector![SCP_WEIGHTS] := MakeImmutable(wt);
avc2 := [1..n]+0;
for g in [1..n] do |
There is also a problem with the FactInt package in |
@rbehrends some optimization opportunities:
My suggestion here would be to change T_FUNCTION and T_BODY to not be in the public region by default; instead, we create them thread local, and then once we are done setting them up, change them to the readonly region (resp. for T_BODY, we might even be ok marking them as read-only, with the caveat that atomically setting those profiling bits must still work) |
This is a general optimization opportunity for all objects that are immutable. Currently, we have these tagged via I would, however, propose to postpone such optimizations until everything is working and integrated, unless you think there are good reasons to include them in this PR? |
No, there is no need to include those in this PR. BTW, there are more reasons for this than optimization: also safety / correctness. Right now, we create a lot of bags in the public region, then modify them extensively. This is OK because those objects are not visible to the outside while we are modifying them; and then later on, we never modify them (or only in a "safe") fashion. Or so we claim, but there is no good way to actually check that. But if we instead created those objects thread local, and only made them public once the setup is complete, then we could (at least in principle) forbid |
9db1e68
to
3fd7fb5
Compare
Reminder: once this PR is merged, we need to update https://github.com/gap-system/pkg-ci-scripts/blob/master/build_gap.sh so that |
e36cbb5
to
92c3aa3
Compare
92c3aa3
to
e478074
Compare
e478074
to
e1913b8
Compare
This is a tracking PR for progress on HPC-GAP changes. Aside from older changes in this PR, we also aim at improving our implementations of thread-local storage, specifically so that HPC-GAP can be used as a library.
Our current implementation requires HPC-GAP to be run as the main program, which in particular prevents its use as a library. Alternative portable implementations are considerably slower on systems that do not properly support efficient thread-local storage (such as macOS and CygWin).
We're attempting to address this problem in two ways.
One issue is implementations that rely on
pthread_getspecific()
directly or indirectly do not treat such function calls as pure functions and thus do not get used for common subexpression elimination or hoisted out of loops. As those systems all rely on gcc/clang, we can tagpthread_getspecific()
as__attribute__((pure))
, which will immediately lead to a measurable performance improvement.On macOS, we can go one step further. While there is no guaranteed stable ABI,
pthread_getspecific()
has consistently been implemented bymovq %gs:(,%rdi, 8), %rax
. We therefore can verify thatpthread_getspecific()
actually uses this implementation (memcmp()
against the machine code) and then use a direct assembly language implementation that has nearly the same performance characteristics as our current implementation.In addition, we store the key in a static variable that is initialized with an
__attribute__((constructor)__
function. This is generally slightly faster than putting it in a non-static variable on macOS in particular, as we're saving one level of indirection.Also, in the worst case, this code functions with just
pthread_getspecific()
and no other mechanism available, it is just slower.Todo:
__attribute__((pure))
and definition of a macro for it.__attribute__((constructor))
and definition of a macro for it.__pthread_getspecific()
as pure.