249 alp passes 0 as max calls to lpf collectives initialisation #302
Conversation
…lectives. Decouple the non-standard (internal) collectives implementation and use from the grb::collectives implementation.
Force-pushed from 53c2765 to 01c4158.
Cross-reference: #303
Concept release notes while waiting for tests and reviews: This MR fixes a bug where ALP passed 0 as the maximum number of calls to the LPF collectives initialisation. While fixing the bug, this MR furthermore detaches the implementation of the public grb::collectives from the internal (non-standard) collectives. As always, the MR includes both related and unrelated code style fixes.
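For context, the fix amounts to initialising the LPF collectives with a strictly positive maximum number of calls. The following is a minimal sketch only: the parameter names and exact shape of lpf_collectives_init are assumptions based on the issue title and should be checked against lpf/collectives.h.

```cpp
#include <cstddef>

#include <lpf/core.h>
#include <lpf/collectives.h>

// Initialise the LPF collectives with a nonzero max_calls (the bug was that
// 0 was passed here). All bounds below are illustrative placeholders.
lpf_err_t initCollectives(
	lpf_t ctx, const lpf_pid_t s, const lpf_pid_t p, lpf_coll_t &coll
) {
	const size_t max_calls = 2;                    // strictly positive, as required
	const size_t max_elem_size = sizeof( double ); // assumed per-element bound
	const size_t max_byte_size = 1024;             // assumed per-call byte bound
	return lpf_collectives_init(
		ctx, s, p, max_calls, max_elem_size, max_byte_size, &coll
	);
}
```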
…ublic collectives and internal collectives. Fixed, together with its code style
Some issues with smoke tests show that the previous pre/post-amble programming pattern for collective communication is wholly incompatible with driving the LPF collectives as their spec requires, sadly to the point that #303 should be resolved now. Getting on with that now. (At this point, though, all unit tests with LPF are OK.)
…tionalities: memcpy and folding a raw matrix column-wise into a raw vector.
…ble. Rationale: that coding pattern could not correctly capture the number of calls, the number of bytes that may be reduced, nor the number of bytes that may be subject to other collectives. Removed the implementation of reduce/allreduce on vector data: the vector data should first be reduced locally, after which a scalar (all)reduce should follow (see the sketch below). TODO: remove the disabled code related to the latter removal in bsp1d/vector.hpp, tests/smoke/hook/collectives_blas1.cpp, and tests/smoke/hook/collectives_blas1_raw.cpp
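The local-then-scalar pattern could look as follows. This is a minimal sketch assuming ALP/GraphBLAS-style primitives; the monoid-based collectives signature reflects the direction this MR takes and is not verbatim from the code base.

```cpp
#include <cstddef>

#include <graphblas.hpp>

// Reduce distributed data to one scalar: first reduce the locally held part,
// then allreduce the single resulting scalar across all processes.
template< typename T >
grb::RC sumAcrossProcesses( T &out, const T * const localData, const size_t n ) {
	out = static_cast< T >( 0 );   // identity of the plus monoid
	for( size_t i = 0; i < n; ++i ) {
		out += localData[ i ];     // local reduction first
	}
	grb::Monoid<
		grb::operators::add< T >,
		grb::identities::zero
	> plusMonoid;
	// one scalar collective replaces the removed vector (all)reduce:
	return grb::collectives<>::allreduce( out, plusMonoid );
}
```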
Handled #303 as well with the latest commits. Now running the full test suite with LPF (including perftests) to make sure no snafus snuck in for the BSP1D and hybrid backends.
There are snafus. The traces for process A and process B diverge: process B is still in daemon mode, a state that process A is supposed to resolve via its corresponding call, while the trace confirms the mismatch.
…casting between input and i/o domains on the one hand, and operator domains on the other. Also, the specification of grb::collectives<>::{allreduce,reduce} was broken, in that sometimes a monoid is needed. TODO: bring the base specification up to date with this new BSP1D implementation, once testing has completed.
After merging in the latest develop, I found that not all LPF tests complete successfully. This was due to a bug regarding casting behaviour in the new allreduce implementation, fixed by creating a generic allreduce/reduce implementation with corrected casting. Fixing it, however, revealed that sometimes a monoid identity is required for proper behaviour, which leads to a specification change (see the sketch below). Before making that final in the base spec, I am first making sure the tests are OK again. Another TODO remains: apply Alberto's suggested fixes.
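To see why an identity becomes necessary, consider a stripped-down, sequential stand-in for a generic (all)reduce with casting. All names here are hypothetical; this is not the MR's actual code.

```cpp
#include <functional>
#include <vector>

// `IOType` is the user's I/O domain; `OpDomain` is the operator's domain.
// The values of `perProcessValues` stand in for one contribution per process.
template< typename IOType, typename OpDomain >
IOType allreduceSketch(
	const std::vector< IOType > &perProcessValues,
	const std::function< OpDomain( OpDomain, OpDomain ) > &op,
	const OpDomain identity // a bare operator provides no sound start value
) {
	OpDomain acc = identity; // initialise in the operator domain, not the I/O domain
	for( const IOType &v : perProcessValues ) {
		acc = op( acc, static_cast< OpDomain >( v ) ); // cast into the operator domain
	}
	return static_cast< IOType >( acc );               // cast back to the I/O domain
}
```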
Quickly, before flying out: the last commit is bugged again; pinnedVector for BSP1D at P=2 deadlocks, while several other unit tests fail. The easiest to debug are likely set and capacity.
…ectives-initialisation
…. Also updated the base spec.
…n-by-monoid (where a monoid was indeed available)
Updated concept release notes: This MR fixes a bug where ALP passed 0 as the maximum number of calls to the LPF collectives initialisation. Within LPF, there are two classes of collectives: the one exposed via the public grb::collectives API, and the internal (non-standard) collectives; this MR decouples the two.

The revision of the collectives now also includes a systematic choice whereby collectives on scalars are orchestrated through the global buffer, so as to avoid registering the scalar memory addresses every time a collective is called. This is likely to cut latency costs significantly compared to the status prior to this MR. For collectives on arrays, the source and/or destination arrays will be registered, which incurs a latency cost; assuming large enough array sizes, however, that cost is much lower than that of copying the payload to and from the global buffer (see the sketch after these notes).

Features this MR introduces in realising the refactored collectives:
Other issues this MR addresses:
As always, the MR includes both related and unrelated code style fixes.
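The scalar-versus-array trade-off described above might be sketched as follows. Every name here is hypothetical; only the design trade-off is taken from the notes, none of these types or calls are ALP's actual internals.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical communication layer, for illustration only.
struct Comm {
	void * globalBuffer;                                       // registered once, at init
	void allreduceRegistered( void *, std::size_t ) {}         // collective on a registered slot
	void * registerSlot( void * p, std::size_t ) { return p; } // per-call registration latency
	void allreduceSlot( void *, std::size_t ) {}
	void deregisterSlot( void * ) {}
};

// Scalars: stage through the pre-registered global buffer, avoiding a
// memory registration on every call (lower latency).
template< typename T >
void scalarAllreduce( Comm &comm, T &inout ) {
	std::memcpy( comm.globalBuffer, &inout, sizeof( T ) );
	comm.allreduceRegistered( comm.globalBuffer, sizeof( T ) );
	std::memcpy( &inout, comm.globalBuffer, sizeof( T ) );
}

// Arrays: register the user's buffer instead; the registration latency
// amortises over large payloads, and the double copy is avoided.
template< typename T >
void arrayAllreduce( Comm &comm, T * const data, const std::size_t n ) {
	void * const slot = comm.registerSlot( data, n * sizeof( T ) );
	comm.allreduceSlot( slot, n * sizeof( T ) );
	comm.deregisterSlot( slot );
}
```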
Hi @alberto-scolari @KADichev, this one is ready, pending CIs and a manual test run. @alberto-scolari, your comments have been handled in line with the (resolved) discussions above. If one of you could flag it as approved by tomorrow morning, we can merge it (if tests are indeed all OK :)
All tests and CIs OK at this point.
alberto-scolari left a comment:
Looks good to me; the previous issues were indeed addressed.
Hi @KADichev: this MR has been merged and is closed, but I'll retain the branch so you can still refer to the above comments in case they are useful for the LPF collectives.
Closes #249.
Closes #303.