Skip to content

Commit 40f33ca

Browse files
authored
249 alp passes 0 as max calls to lpf collectives initialisation (#302)
This MR fixes a bug where `lpf_coll_t` was not initialised properly. This bug never resulted in a noticeable issue, since the present implementation of the LPF collectives ignores the erroneous argument that ALP was passing in. The nevertheless not-up-to-spec initialisation of LPF collectives happened repeatedly throughout the code base. This MR totally overhauls the implementation of the collectives to address the issue. Within LPF, there are two classes of collectives: the one exposed via the public API `grb::collectives<>`, and raw internal collectives that are used in the implementation of e.g. the `BSP1D` and `hybrid` backends. Prior to this MR, the implementations of these collectives were disjoint, while with this MR, both types of collectives are managed by a shared context (internally encapsulated by the `BSP1D_Data` object). This central management means every process maintains but a single `lpf_coll_t` instance that is reused every time a public or internal collective is called. The revision of the collectives also now includes a systematic choice where collectives on scalars are orchestrated through the global buffer so as to prevent having to register the scalar memory addresses every time a collective is called. This is likely to cut latency costs significantly compared to the status prior to this MR. For collectives on arrays, the source and/or destination arrays will be registered which incurs a latency costs-- but a cost that, assuming large enough array sizes, is much lower than that of having to copy the payload to and from the global buffer. Features this MR introduces in realising the refactored collectives: - a notion of a final backend, which is the final backend a composed backend relies on. The FinalBackend class collects a set of minimal and low-level functions that all final backends should provide, such as memcpy; and - a utility that translates LPF error codes into corresponding ALP error codes. Other issues this MR addresses: - BSP1D did not seem to handle some border cases of `grb::setElement` correctly -- these would trigger only in out-of-memory conditions, or in erroneous use of the primitive, so impact of the bug is expected to be minor; - the `grb::setElement` and `grb::select` base specifications did not list all error conditions or did not do so correctly, herewith fixed; - a metabug in `grb::select` had in `_DEBUG` mode an attempt to print an iterator to `stdout`, instead of first dereferencing it-- herewith fixed. As always, the MR includes both related and unrelated code style fixes.
1 parent 88d1950 commit 40f33ca

36 files changed

+3831
-2796
lines changed

include/CMakeLists.txt

Lines changed: 10 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -44,15 +44,16 @@ set( root_files
4444
"graphblas/backends.hpp" "graphblas/benchmark.hpp"
4545
"graphblas/blas0.hpp" "graphblas/blas1.hpp" "graphblas/blas2.hpp"
4646
"graphblas/blas3.hpp" "graphblas/collectives.hpp" "graphblas/config.hpp"
47-
"graphblas/coordinates.hpp" "graphblas/descriptors.hpp" "graphblas/distribution.hpp"
48-
"graphblas/exec.hpp" "graphblas/identities.hpp" "graphblas/init.hpp"
49-
"graphblas/internalops.hpp" "graphblas/io.hpp" "graphblas/iomode.hpp"
50-
"graphblas/matrix.hpp" "graphblas/monoid.hpp" "graphblas/ops.hpp"
51-
"graphblas/phase.hpp" "graphblas/pinnedvector.hpp" "graphblas/properties.hpp"
52-
"graphblas/rc.hpp" "graphblas/semiring.hpp" "graphblas/spmd.hpp"
53-
"graphblas/tags.hpp" "graphblas/type_traits.hpp" "graphblas/utils.hpp"
54-
"graphblas/vector.hpp" "graphblas/synchronizedNonzeroIterator.hpp"
55-
"graphblas/nonzeroStorage.hpp" "graphblas/selection_ops.hpp"
47+
"graphblas/coordinates.hpp" "graphblas/descriptors.hpp"
48+
"graphblas/distribution.hpp" "graphblas/exec.hpp" "graphblas/final.hpp"
49+
"graphblas/identities.hpp" "graphblas/init.hpp" "graphblas/internalops.hpp"
50+
"graphblas/io.hpp" "graphblas/iomode.hpp" "graphblas/matrix.hpp"
51+
"graphblas/monoid.hpp" "graphblas/ops.hpp" "graphblas/phase.hpp"
52+
"graphblas/pinnedvector.hpp" "graphblas/properties.hpp" "graphblas/rc.hpp"
53+
"graphblas/semiring.hpp" "graphblas/spmd.hpp" "graphblas/tags.hpp"
54+
"graphblas/type_traits.hpp" "graphblas/utils.hpp" "graphblas/vector.hpp"
55+
"graphblas/synchronizedNonzeroIterator.hpp" "graphblas/nonzeroStorage.hpp"
56+
"graphblas/selection_ops.hpp"
5657
)
5758

5859
set( GRB_INCLUDE_INSTALL_DIR "${INCLUDE_INSTALL_DIR}/graphblas")

include/graphblas/base/blas3.hpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -511,7 +511,7 @@ namespace grb {
511511
* not match. All input data containers are left
512512
* untouched if this exit code is returned; it will be
513513
* be as though this call was never made.
514-
* @return #grb::FAILED If \a phase is #grb::EXECUTE, indicates that the
514+
* @return #grb::ILLEGAL If \a phase is #grb::EXECUTE, indicates that the
515515
* capacity of \a B was insufficient. The output
516516
* matrix \a B is cleared, and the call to this function
517517
* has no further effects.

include/graphblas/base/collectives.hpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -120,7 +120,7 @@ namespace grb {
120120
>
121121
static RC allreduce( IOType &inout, const Operator op = Operator() ) {
122122
(void) inout;
123-
(void)op;
123+
(void) op;
124124
return PANIC;
125125
}
126126

include/graphblas/base/final.hpp

Lines changed: 126 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,126 @@
1+
2+
/*
3+
* Copyright 2021 Huawei Technologies Co., Ltd.
4+
*
5+
* Licensed under the Apache License, Version 2.0 (the "License");
6+
* you may not use this file except in compliance with the License.
7+
* You may obtain a copy of the License at
8+
*
9+
* http://www.apache.org/licenses/LICENSE-2.0
10+
*
11+
* Unless required by applicable law or agreed to in writing, software
12+
* distributed under the License is distributed on an "AS IS" BASIS,
13+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14+
* See the License for the specific language governing permissions and
15+
* limitations under the License.
16+
*/
17+
18+
/**
19+
* @file
20+
*
21+
* Describes raw functionalities that a so-called <em>final backend</em> should
22+
* provide.
23+
*
24+
* A final backend is one that other backends rely on for implementing core
25+
* computations.
26+
*
27+
* @author A. N. Yzelman
28+
* @date 13th of February, 2024
29+
*/
30+
31+
#ifndef _H_GRB_BASE_FINAL
32+
#define _H_GRB_BASE_FINAL
33+
34+
#include <string>
35+
36+
#include <graphblas/rc.hpp>
37+
#include <graphblas/ops.hpp>
38+
#include <graphblas/backends.hpp>
39+
#include <graphblas/descriptors.hpp>
40+
41+
42+
namespace grb::internal {
43+
44+
/**
45+
* This class gathers raw functionalities that non-final backends cannot
46+
* implement directly because it is unaware whether final computations should
47+
* occur in parallel or not, while, if it should execute in parallel, it is
48+
* unaware which parallelisation scheme it should employ.
49+
*
50+
* The base implementation defines all functions every final backend should
51+
* implement and provides a sequential implementation for each such function.
52+
* Therefore, only parallel final backends should override this class.
53+
*/
54+
template< grb::Backend backend >
55+
class FinalBackend {
56+
57+
public:
58+
59+
/**
60+
* Provides a basic memory copy.
61+
*
62+
* @param[out] out The area where to write to.
63+
* @param[in] in The memory area to copy from.
64+
* @param[in] size How many bytes should be copied.
65+
*
66+
* The input and output memory areas are not allowed to overlap.
67+
*/
68+
static void memcpy(
69+
void * const __restrict__ out,
70+
const void * const __restrict__ in,
71+
const size_t size
72+
) {
73+
(void) std::memcpy( out, in, size );
74+
}
75+
76+
/**
77+
* Folds (reduces) every column of a matrix into a vector.
78+
*
79+
* @tparam descr The descriptor to be taken into account.
80+
* @tparam IOType The type of vector \em and matrix elements.
81+
* @tparam OP The operator used for reduction.
82+
*
83+
* @param[in,out] out The output vector.
84+
*
85+
* Pre-existing values in \a out are reduced into.
86+
*
87+
* @param[in] matrix The matrix which should be column-wise reduced into
88+
* \a out.
89+
* @param[in] cols The colums of \a matrix.
90+
* @param[in] rows The rows of \a matrix.
91+
* @param[in] skip Which column of \a matrix to skip.
92+
*
93+
* Taking \a skip higher or equal to \a cols will mean no column is
94+
* skipped.
95+
*
96+
* @param[in] op The operator by which to reduce.
97+
*/
98+
template< grb::Descriptor descr, typename IOType, typename OP >
99+
static void foldMatrixToVector(
100+
IOType * const __restrict__ out,
101+
const IOType * const __restrict__ matrix,
102+
const size_t cols, const size_t rows,
103+
const size_t skip,
104+
const OP &op
105+
) {
106+
(void) op;
107+
if( skip >= cols ) {
108+
for( size_t j = 0; j < cols; ++j ) {
109+
OP::eWiseFoldlAA( out, matrix + j * rows, rows );
110+
}
111+
} else {
112+
for( size_t j = 0; j < skip; ++j ) {
113+
OP::eWiseFoldlAA( out, matrix + j * rows, rows );
114+
}
115+
for( size_t j = skip + 1; j < cols; ++j ) {
116+
OP::eWiseFoldlAA( out, matrix + j * rows, rows );
117+
}
118+
}
119+
}
120+
121+
};
122+
123+
}
124+
125+
#endif // end ``_H_GRB_BASE_FINAL´´
126+

include/graphblas/base/internalops.hpp

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2836,6 +2836,7 @@ namespace grb {
28362836
/** The output domain. */
28372837
typedef typename OP::result_type D3;
28382838

2839+
28392840
public:
28402841

28412842
/** @return Whether this operator is mathematically associative. */
@@ -2868,7 +2869,10 @@ namespace grb {
28682869
* @param[out] z The output element.
28692870
*/
28702871
template< typename InputType1, typename InputType2, typename OutputType >
2871-
static void apply( const InputType1 & x, const InputType2 & y, OutputType & z ) {
2872+
static void apply(
2873+
const InputType1 &x, const InputType2 &y,
2874+
OutputType &z
2875+
) {
28722876
const D1 a = static_cast< D1 >( x );
28732877
const D2 b = static_cast< D2 >( y );
28742878
D3 temp;
@@ -2881,7 +2885,7 @@ namespace grb {
28812885
* casting is required. This version will be automatically caled whenever
28822886
* possible.
28832887
*/
2884-
static void apply( const D1 & x, const D2 & y, D3 & out ) {
2888+
static void apply( const D1 &x, const D2 &y, D3 &out ) {
28852889
OP::apply( &x, &y, &out );
28862890
}
28872891

@@ -2946,7 +2950,7 @@ namespace grb {
29462950
* @param[in,out] y The value \a x is to be applied against.
29472951
*/
29482952
template< typename InputType, typename IOType >
2949-
static void foldr( const InputType & x, IOType & y ) {
2953+
static void foldr( const InputType &x, IOType &y ) {
29502954
typedef typename OperatorBase< OP >::D2 D2;
29512955
const D2 cache = static_cast< D2 >( y );
29522956
OperatorBase< OP >::apply( x, cache, y );
@@ -3110,6 +3114,7 @@ namespace grb {
31103114
typedef typename OperatorBase< OP >::D3 D3;
31113115
static constexpr size_t blocksize = OperatorBase< OP >::blocksize;
31123116

3117+
31133118
public:
31143119

31153120
/**
@@ -3127,7 +3132,7 @@ namespace grb {
31273132
* @param[in] x The value that is to be applied to \a y.
31283133
* @param[in,out] y The value \a x is to be applied against.
31293134
*/
3130-
static void foldr( const D1 & x, D3 & y ) {
3135+
static void foldr( const D1 &x, D3 &y ) {
31313136
OP::foldr( &x, &y );
31323137
}
31333138

include/graphblas/base/io.hpp

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1305,6 +1305,14 @@ namespace grb {
13051305
* the default is #grb::EXECUTE.
13061306
*
13071307
* @return #grb::SUCCESS Upon successful execution of this operation.
1308+
* @return #grb::OUTOFMEM If the capacity of \a x could not be resized to
1309+
* store a new value at coordinate \a i while \a phase
1310+
* is #grb::RESIZE.
1311+
* @return #grb::ILLEGAL If the capacity of \a x is insufficient to store a
1312+
* new value at coordinate \a i while \a phase is
1313+
* #grb::EXECUTE.
1314+
* @return #grb::ILLEGAL If \a x on input is not dense while the
1315+
* #grb::descriptors::dense descriptor was given.
13081316
* @return #grb::MISMATCH If \a i is greater or equal than the dimension of
13091317
* \a x.
13101318
*

include/graphblas/base/spmd.hpp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -88,7 +88,9 @@ namespace grb {
8888
* initiating messages.
8989
* \endinternal
9090
*/
91-
static enum RC sync( const size_t msgs_in = 0, const size_t msgs_out = 0 ) noexcept {
91+
static enum RC sync(
92+
const size_t msgs_in = 0, const size_t msgs_out = 0
93+
) noexcept {
9294
(void) msgs_in;
9395
(void) msgs_out;
9496
return PANIC;

0 commit comments

Comments
 (0)