\chapter{View: Multidimensional array}\label{C:View}
After reading this chapter, you should understand the following:
\begin{itemize}
\item A Kokkos View is an array of zero or more dimensions
\item How to use View's first template parameter to specify the type
of entries, the number of dimensions, and whether the dimensions are
determined at run time or compile time
\item Kokkos handles array deallocation automatically
\item Kokkos chooses array layout at compile time for best overall
performance, as a function of the computer architecture
\item How to set optional template parameters of View for low-level
control of execution space, layout, and how Kokkos accesses array
entries
\end{itemize}
In all code examples in this chapter, we assume that all classes in
the \lstinline!Kokkos! namespace have been imported into the working
namespace.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Why Kokkos needs multidimensional arrays}\label{S:View:Why}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Many scientific and engineering codes spend a lot of time computing
with arrays of data. Programmers invest a lot of effort making array
computations as fast as possible. This effort is often intimately
bound to details of the computer architecture, run-time environment,
language, and programming model. For example, the optimal array layout
may differ between architectures, and choosing the wrong one can incur a
large integer-factor performance penalty. Low-level issues like pointer alignment, array
layout, indexing overhead, and initialization all affect performance.
This is true even with sequential codes. Thread parallelism adds even
more pitfalls, like first-touch allocation and false sharing.
For best performance, coders need to tie details of how they manage
arrays to details of how parallel code accesses and manages those
arrays. Programmers who write architecture-specific code then need to
mix low-level features of the architecture and programming model into
high-level algorithms. This makes it hard to port codes between
architectures, or to write a code that still performs well as
architectures evolve.
Kokkos aims to relieve some of this burden, by optimizing array
management and access for the specific architecture. Tying arrays to
shared-memory parallelism lets Kokkos optimize the former to the
latter. For example, Kokkos can easily do first-touch allocation,
because it controls the threads that initialize arrays.
Kokkos' architecture awareness lets it pick optimal layout and pad
allocations for good alignment. Expert coders can also use Kokkos to
access low-level or more architecture-specific optimizations in a more
user-friendly way. For instance, Kokkos makes it easy to experiment
with different array layouts.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Creating and using a View}\label{S:View:CreateUse}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Constructing a View}
A View is an array of zero or more dimensions. Programmers set both
the type of entries, and the number of dimensions, at compile time, as
part of the type of the View. For example, the following specifies
and allocates a View with four dimensions, whose entries have type
\texttt{int}:
\begin{lstlisting}
const size_t N0 = ...;
const size_t N1 = ...;
const size_t N2 = ...;
const size_t N3 = ...;
View<int****> a ("some label", N0, N1, N2, N3);
\end{lstlisting}
The string argument is a label, which Kokkos uses for debugging.
Different Views may have the same label. The ellipses indicate some
integer dimensions specified at run time. Users may also set some
dimensions at compile time. For example, the following View has two
dimensions. The first one (represented by the asterisk) is a run-time
dimension, and the second (represented by [3]) is a compile-time
dimension. Thus, the View is an N by 3 array of type double, where N
is specified at run time in the View's constructor.
\begin{lstlisting}
const size_t N = ...;
View<double*[3]> b ("another label", N);
\end{lstlisting}
Views may have up to (at least) 8 dimensions, and any number of these
may be run-time or compile-time. The only limitation is that the
run-time dimensions (if any) must appear first, followed by all the
compile-time dimensions (if any). For example, the following are
valid three-dimensional View types:
\begin{itemize}
\item \lstinline!View<int***>! (3 run-time dimensions)
\item \lstinline!View<int**[8]>! (2 run-time, 1 compile-time)
\item \lstinline!View<int*[3][8]>! (1 run-time, 2 compile-time)
\item \lstinline!View<int[4][3][8]>! (3 compile-time)
\end{itemize}
and the following are \emph{not} valid three-dimensional View types:
\begin{itemize}
\item \lstinline!View<int[4]**>!
\item \lstinline!View<int[4][3]*>!
\item \lstinline!View<int[4]*[8]>!
\item \lstinline!View<int*[3]*>!
\end{itemize}
This limitation comes from the implementation of View using C++
templates. View's first template parameter must be a valid C++ type.
Note that the constructor used above is not necessarily available
for all View types. Specific layouts or memory spaces may require more
specialized allocators; this is discussed later.
Another important thing to keep in mind is that a \lstinline|View| handle is a
stateful object. It is not legal to create a \lstinline|View| handle from
raw memory by typecasting a pointer. Before calling any operator on a
\lstinline|View|, including the assignment operator, its constructor must have
been called. If it is necessary to construct a \lstinline|View| handle in raw
memory, one can legally do so with placement \lstinline|new|.
The above has nothing to do with the data a \lstinline|View| is referencing.
It is completely legal to give a typecast pointer to the constructor of an
unmanaged \lstinline|View|.
\begin{lstlisting}
View<int*>* a_ptr = (View<int*>*) malloc(10*sizeof(View<int*>));
a_ptr[0] = View<int*>("A0",1000); // This is illegal
new(&a_ptr[1]) View<int*>("A1",10000); // This is legal
\end{lstlisting}
\subsection{What types of data may a View contain?}
C++ lets users construct data types that may ``look like'' numbers in
terms of syntax, but do arbitrarily complicated things inside. Some
of those things may not be thread safe, like unprotected updates to
global state. Others may perform badly in parallel, like fine-grained
dynamic memory allocation. Therefore it is strongly advised to use
only simple data types inside Kokkos Views.
Users may always construct a View whose entries are
\begin{itemize}
\item built-in data types (``plain old data''), like \texttt{int} or
\texttt{double}, or
\item structs of built-in data types.
\end{itemize}
While it is in principle possible to have Kokkos Views of arbitrary
objects, there are a number of restrictions. The element type
\lstinline|T| must have a default constructor and a destructor. For
portability, \lstinline|T| may not have virtual functions, and it may not
call functions that perform dynamic memory allocation inside kernel code.
Furthermore, its assignment operators, default constructor, and destructor
must be marked with \lstinline|KOKKOS_[INLINE_]FUNCTION|.
Keep in mind that the semantics of the resulting View are a combination of the
View's view semantics and the behavior of the element type.
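As an illustration, here is a minimal sketch of such an element type, using a
hypothetical \lstinline|Force| struct; the struct and its member names are not
part of Kokkos.
\begin{lstlisting}
// Sketch: a plain struct usable as a View element type. No virtual
// functions, no dynamic allocation; the special member functions are
// marked as Kokkos functions so they can run inside kernels.
struct Force {
  double x, y, z;

  KOKKOS_INLINE_FUNCTION
  Force () : x (0.0), y (0.0), z (0.0) {}

  KOKKOS_INLINE_FUNCTION
  Force (const Force& rhs) : x (rhs.x), y (rhs.y), z (rhs.z) {}

  KOKKOS_INLINE_FUNCTION
  Force& operator= (const Force& rhs) {
    x = rhs.x; y = rhs.y; z = rhs.z;
    return *this;
  }

  KOKKOS_INLINE_FUNCTION
  ~Force () {}
};

View<Force*> forces ("forces", 1000);
\end{lstlisting}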
\subsection{Const Views}
A View can have const data semantics (i.e., its entries are read-only) by
specifying a \lstinline|const| data type. It is a compile-time error to assign
to an entry of such a ``const View.'' The assignment semantics are equivalent
to those of a pointer to const data: a const View means the \emph{entries} are
const, but you may still assign to the View itself.
\lstinline!View<const double*>! corresponds exactly to
\lstinline!const double*!, and \lstinline!const View<double*>! to
\lstinline!double* const!.
Therefore it does not make sense to allocate a const View, since you could
neither obtain a non-const View of the same data nor assign to its entries.
You can, however, assign a non-const View to a const View. Here is an example:
\begin{lstlisting}
const size_t N0 = ...;
View<double*> a_nonconst ("a_nonconst", N0);
// Assign a nonconst View to a const View
View<const double*> a_const = a_nonconst;
// Pass the const View to some read-only function.
const double result = readOnlyFunction (a_const);
\end{lstlisting}
Const Views often enable the compiler to optimize more aggressively, by allowing it
to reason about possible write conflicts and data aliasing. For example, in a vector
update \lstinline|a(i+1)+=b(i)| with skewed indexing, it is safe to vectorize if
\lstinline|b| is a View of const data, as illustrated in the sketch below.
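A sketch of such a kernel (the function name is illustrative only):
\begin{lstlisting}
// Sketch: b is a View of const data, so (as noted above) the
// skewed update a(i+1) += b(i) is safe to vectorize.
void skewedUpdate (const View<double*>& a,
                   const View<const double*>& b)
{
  const size_t n = a.dimension_0 ();
  for (size_t i = 0; i + 1 < n; ++i) {
    a(i+1) += b(i);
  }
}
\end{lstlisting}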
\subsection{Accessing entries (indexing)}
You may access an entry of a View using parentheses enclosing a
comma-delimited list of integer indices. This looks just like a
Fortran multidimensional array access. For example:
\begin{lstlisting}
const size_t N = ...;
View<double*[3][4]> a ("some label", N);
// KOKKOS_LAMBDA macro includes capture-by-value specifier [=].
parallel_for (N, KOKKOS_LAMBDA (const ptrdiff_t i) {
  const size_t j = ...;
  const size_t k = ...;
  const double a_ijk = a(i,j,k);
  /* rest of the loop body */
});
\end{lstlisting}
Note how in the above example, we only access the View's entries in a
parallel loop body. In general, you may only access a View's entries
in an execution space which is allowed to access that View's memory
space. For example, if the default execution space is \lstinline!Cuda!,
a View for which no specific memory space was given may not be accessed
in host code.\footnote{An exception is if you configured CUDA compilation
so that the default memory space is \lstinline|CudaUVMSpace|, which can be
accessed from the host.}
Furthermore, access costs (e.g., latency and
bandwidth) may vary, depending on the View's ``native'' memory and
execution spaces, and the execution space from which you access it.
CUDA UVM may work, but it may also be slow, depending on your access
pattern and performance requirements. Thus, best practice is to
access the View only in a Kokkos parallel for, reduce, or scan, using
the same execution space as the View. This also ensures that access
to the View's entries respects first-touch allocation. The first
(leftmost) dimension of the View is the \emph{parallel dimension},
over which it is most efficient to do parallel array access if the
default memory layout is used (i.e., if no specific memory layout is
specified).
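For example, the following sketch assumes that the default execution space is
\lstinline|Cuda| and that the default memory space is \lstinline|CudaSpace|
(not \lstinline|CudaUVMSpace|):
\begin{lstlisting}
const size_t N = ...;
View<double*> a ("a", N); // lives in CudaSpace by default
// a(0) = 1.0; // illegal: host code may not access CudaSpace
parallel_for (N, KOKKOS_LAMBDA (const ptrdiff_t i) {
  a(i) = 1.0; // legal: executes in the View's execution space
});
\end{lstlisting}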
\subsection{Reference counting}
Kokkos automatically manages deallocation of Views, through a
reference-counting mechanism. Otherwise, Views behave like raw
pointers. Copying or assigning a View does a shallow copy and changes the
reference counts: the allocation being copied has its reference count
incremented, and the allocation previously referenced by the assigned-to View
has its reference count decremented. A View's destructor (called when the View
falls out of scope or during a stack unwind due to an exception) decrements the
reference count of its allocation. Once an allocation's reference count
reaches zero, Kokkos may deallocate it.
For example, the following code allocates two Views, then assigns one
to the other. That assignment may deallocate the first View's underlying
memory, since it reduces its reference count to zero. It also increases the
reference count of the second View's allocation, since both Views now point to it.
\begin{lstlisting}
View<int*> a ("a", 10);
View<int*> b ("b", 10);
a = b; // assignment does shallow copy
\end{lstlisting}
For efficiency, View allocation and reference counting are turned off inside
Kokkos' parallel for, reduce, and scan operations. This affects
what you can do with Views inside Kokkos' parallel operations.
\subsection{Resizing}
Kokkos Views can be resized using the \lstinline|resize| non-member function.
It takes an existing View by reference, followed by the new dimensions in the
same form as the constructor arguments. A new View with the new dimensions is
created, and a kernel is run in the View's execution space to copy the data
element by element from the old View to the new one. Note that the old
allocation is only deleted if the View being resized was the {\it only} View
referencing the underlying allocation.
\begin{lstlisting}
// Allocate a view with 100x50x4 elements
View<int**[4]> a( "a", 100,50);
// Resize a to 200x50x4 elements; the original allocation is freed
resize(a, 200,50);
// Create a second view b viewing the same data as a
View<int**[4]> b = a;
// Resize a again to 300x60x4 elements; b is still 200x50x4
resize(a,300,60);
\end{lstlisting}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Layout}\label{S:View:Layout}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Strides and dimensions}\label{SS:View:Layout:Strides}
\emph{Layout} refers to the mapping from a logical multidimensional
index $(i, j, k, \dots)$ to a physical memory offset. Different
programming languages may have different layout conventions. For
example, Fortran uses \emph{column-major} or ``left'' layout, where
consecutive entries in the same column of a 2-D array are contiguous
in memory. Kokkos calls this \lstinline!LayoutLeft!. C, C++, and Java use
\emph{row-major} or ``right'' layout, where consecutive entries in the
same row of a 2-D array are contiguous in memory. Kokkos calls this
\lstinline!LayoutRight!.
The generalization of both left and right layouts is ``strided.'' For
a strided layout, each dimension has a \emph{stride}. The stride for
that dimension determines how far apart in memory two array entries
are, whose indices in that dimension differ only by one, and whose
other indices are all the same. For example, with a 3-D strided view
with strides $(s_1, s_2, s_3)$, entries $(i, j, k)$ and $(i, j+1, k)$
are $s_2$ entries (not bytes) apart in memory. Kokkos calls this
\lstinline!LayoutStride!.
Strides may differ from dimensions. For example, Kokkos reserves the
right to pad each dimension for cache or vector alignment. You may
access the dimensions of a View using the \lstinline!dimension! method,
which takes the index of the dimension.
Strides are accessed using the \lstinline!stride! method. It takes a raw
integer array, and only fills in as many entries as the rank of the View.
For example:
\begin{lstlisting}
const size_t N0 = ...;
const size_t N1 = ...;
const size_t N2 = ...;
View<int***> a ("a", N0, N1, N2);
int dim1 = a.dimension (1); // returns dimension 1
size_t strides[3];
a.stride (strides); // fill 'strides' with the strides
\end{lstlisting}
You may also refer to specific dimensions without a runtime parameter:
\begin{lstlisting}
const size_t n0 = a.dimension_0 ();
const size_t n2 = a.dimension_2 ();
\end{lstlisting}
Note that the return type of \lstinline|dimension_N()| is the
\lstinline|size_type| of the View's memory space. This can cause issues for
warning-free compilation, since it is typically necessary to cast the return
value. In particular, when the \lstinline|size_type| is more conservative than
required, it can be beneficial to cast the value to \lstinline|int|, since
signed 32-bit integers typically give the best performance when used as index
types. In index-heavy code this performance difference can be significant
compared to using \lstinline|size_t|: on many architectures the vector length
is twice as long for 32-bit values as for 64-bit values, and signed integers
have less stringent overflow requirements than unsigned integers.
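For example, one might cast the extent once, outside the loop (a sketch,
assuming the dimension fits in an \lstinline|int| and reusing \lstinline|a|
from above):
\begin{lstlisting}
// Sketch: cast the extent to a signed 32-bit index type once,
// assuming the dimension fits in an int.
const int n0 = static_cast<int> (a.dimension_0 ());
for (int i = 0; i < n0; ++i) {
  /* ... use a(i, ...) ... */
}
\end{lstlisting}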
Users of the BLAS and LAPACK libraries may be familiar with the ideas
of layout and stride. These libraries only accept matrices in
column-major format. The stride between consecutive entries in the
same column is 1, and the stride between consecutive entries in the
same row is \lstinline!LDA! (``leading dimension of the matrix A''). The
number of rows may be less than \lstinline!LDA!, but may not be greater.
\subsection{Other layouts}\label{SS:View::Layout::Other}
Other layouts are possible. For example, Kokkos has a ``tiled''
layout, where a tile's entries are stored contiguously (in either row-
or column-major order) and tiles have compile-time dimensions. One
may also use Kokkos to implement Morton ordering or variants thereof.
In order to write a custom layout, one has to define a new layout class and
specialize the \lstinline|ViewMapping| class for that layout.
The \lstinline|ViewMapping| class implements the offset operator as well
as the stride calculation for regular layouts.
A good way to start such a customization is to copy the implementation of
\lstinline|LayoutLeft| and its associated \lstinline|ViewMapping| specialization,
rename the layout, and then change the offset operator.
\subsection{Default layout depends on execution space}\label{SS:View:Layout:Default}
Kokkos selects a View's default layout for optimal parallel access
over the leftmost dimension, based on its execution space. For
example, \lstinline!View<int**, Cuda>! has \lstinline!LayoutLeft!, so that
consecutive threads in the same warp access consecutive entries in
memory. This \emph{coalesced access} gives the code better memory
bandwidth.
In contrast, \lstinline!View<int**, OpenMP>! has
\lstinline!LayoutRight!, so that a single thread accesses contiguous
entries of the array. This avoids wasting cache lines and helps
prevent false sharing of a cache line between threads.
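A sketch of this, assuming both the \lstinline|Cuda| and \lstinline|OpenMP|
execution spaces are enabled:
\begin{lstlisting}
// Sketch: the default layout is chosen from the execution space
// and exposed as the View's array_layout typedef.
static_assert (std::is_same<View<int**, Cuda>::array_layout,
                            LayoutLeft>::value,
               "Cuda Views default to LayoutLeft");
static_assert (std::is_same<View<int**, OpenMP>::array_layout,
                            LayoutRight>::value,
               "OpenMP Views default to LayoutRight");
\end{lstlisting}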
More details are discussed in Section~\ref{S:View:Placement}.
\subsection{Explicitly specifying layout}\label{SS:View:Layout:Explicit}
We prefer that users let Kokkos determine a View's layout, based on
its execution space. However, sometimes you really need to specify
the layout. For example, the BLAS and LAPACK libraries only accept
column-major arrays. If you want to give a View to the BLAS or
LAPACK library, that View must be \lstinline!LayoutLeft!. You may specify the
layout as a template parameter of View. For example:
\begin{lstlisting}
const size_t N0 = ...;
const size_t N1 = ...;
View<double**, LayoutLeft> A ("A", N0, N1);
// Get 'LDA' for BLAS / LAPACK
int strides[2]; // any integer type works in stride()
A.stride (strides);
const int LDA = strides[1];
\end{lstlisting}
You may ask a View for its layout via its \lstinline!array_layout! typedef.
This can be helpful for C++ template metaprogramming. For example:
\begin{lstlisting}
template<class ViewType>
void callBlas (const ViewType& A) {
  typedef typename ViewType::array_layout array_layout;
  if (std::is_same<array_layout, LayoutLeft>::value) {
    callSomeBlasFunction (A.data(), ...);
  } else {
    throw std::invalid_argument ("A is not LayoutLeft");
  }
}
\end{lstlisting}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Managing Data Placement}\label{S:View:Placement}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Memory spaces}
Views are allocated by default, in the default execution space's
default memory space. You may access the View's execution space via
its \lstinline!execution_space! typedef, and its memory space via its
\lstinline!memory_space! typedef. You may also specify the memory space
explicitly as a template parameter. For example, the following
allocates a View in CUDA device memory:
\begin{lstlisting}
View<int*, CudaSpace> a ("a", 100000);
\end{lstlisting}
and the following allocates a View in ``host'' memory, using the default
host execution space for first-touch initialization:
\begin{lstlisting}
View<int*, HostSpace> a ("a", 100000);
\end{lstlisting}
Since there is no bijective association between execution spaces and memory
spaces, Kokkos provides a way to explicitly provide both to a View as a
\lstinline|Device|.
\begin{lstlisting}
View<int*, Device<Cuda,CudaUVMSpace> > a ("a", 100000);
View<int*, Device<OpenMP,CudaUVMSpace> > b ("b", 100000);
\end{lstlisting}
In this case \lstinline|a| and \lstinline|b| will live in the same memory space,
but \lstinline|a| will be initialized on the GPU while \lstinline|b| will be
initialized on the host.
The \lstinline|Device| type can be accessed as a View's \lstinline|device_type|
typedef. A \lstinline|Device| has only three typedef members:
\lstinline|device_type|, \lstinline|execution_space|, and \lstinline|memory_space|.
The \lstinline|execution_space| and \lstinline|memory_space| typedefs of a View
and of its \lstinline|device_type| are the same.
It is important to understand that accessibility of a View does not
depend on its execution space directly. It is only determined by its
memory space. Therefore both \lstinline|a| and \lstinline|b| have the
same access properties. They differ only in how they are initialized,
as well as in where parallel kernels associated with operations such
as resizing or deep copies are run.
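For example, a sketch of querying these typedefs for \lstinline|b| from the
example above:
\begin{lstlisting}
// Sketch: the space typedefs of a View with an explicit Device.
typedef View<int*, Device<OpenMP, CudaUVMSpace> > view_type;
typedef view_type::device_type device_type;     // Device<OpenMP, CudaUVMSpace>
typedef view_type::execution_space exec_space;  // OpenMP
typedef view_type::memory_space mem_space;      // CudaUVMSpace
\end{lstlisting}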
\subsection{Initialization}
A View's entries are initialized to zero by default. Initialization
happens in parallel over the first (leftmost) dimension of the View,
using the View's execution space; this provides first-touch allocation.
You may allocate a View without initializing. For example:
\begin{lstlisting}
View<int*> x (ViewAllocateWithoutInitializing ("x"), 100000);
\end{lstlisting}
This is useful in situations where your dominant use of the View
exhibits a complicated access pattern. In this case it is best
to run the most costly kernel directly after allocation, so that its
access pattern establishes the first touch and gives optimal memory affinity.
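For example, here is a sketch in which the dominant access pattern is a simple
streaming kernel:
\begin{lstlisting}
// Sketch: allocate without initializing, then let the dominant
// kernel's access pattern perform the first touch.
const size_t N = ...;
View<double*> x (ViewAllocateWithoutInitializing ("x"), N);
parallel_for (N, KOKKOS_LAMBDA (const ptrdiff_t i) {
  x(i) = 0.0; // first touch in the pattern of the dominant kernel
});
\end{lstlisting}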
\subsection{Deep copy and HostMirror}
Copying data from one view to another, in particular between views
in different memory spaces, is called deep copy.
Kokkos never performs a hidden deep copy. To perform a deep copy, the user has
to call the \lstinline!deep_copy! function. For example:
\begin{lstlisting}
View<int*> a ("a", 10);
View<int*> b ("b", 10);
deep_copy (a, b); // copy contents of b into a
\end{lstlisting}
Deep copies can only be performed between views with an identical
memory layout and padding. For example the following two operations
are not valid:
\begin{lstlisting}
View<int*[3], CudaSpace> a ("a", 10);
View<int*[3], HostSpace> b ("b", 10);
deep_copy (a, b); // This will give a compiler error
View<int*[3], LayoutLeft, CudaSpace> c ("c", 10);
View<int*[3], LayoutLeft, HostSpace> d ("d", 10);
deep_copy (c, d); // This might give a runtime error
\end{lstlisting}
The first one will not work because the default layouts of \lstinline|CudaSpace|
and \lstinline|HostSpace| are different. The compiler will catch this, since
no overload of the \lstinline|deep_copy| function exists to copy Views from
one layout to another. The second case will fail at run time if the padding
settings of the two memory spaces differ, which would result in different
allocation sizes and thus prevent a direct memcpy.
The reasoning for allowing only direct bitwise copies is that a deep copy
between different memory spaces would otherwise require allocating a temporary
copy of the data, performing a bitwise copy to it, and then running a parallel
kernel to transfer the data element by element.
Kokkos provides the following way to work around those limitations.
First, every View has a \lstinline|HostMirror| typedef, which is a View type
with a compatible layout in \lstinline|HostSpace|. Additionally, the
\lstinline|create_mirror| and \lstinline|create_mirror_view| functions
allocate Views of a View's \lstinline|HostMirror| type.
The difference between the two is that \lstinline|create_mirror| always
allocates a new View, while \lstinline|create_mirror_view| only creates a
new View if the original is not in \lstinline|HostSpace|.
\begin{lstlisting}
View<int*[3], MemorySpace> a ("a", 10);
// Allocate a view in HostSpace with the
// layout and padding of a
typename View<int*[3], MemorySpace>::HostMirror b =
  create_mirror (a);
// This is always a bitwise copy (memcpy)
deep_copy (a, b);
typename View<int*[3], MemorySpace>::HostMirror c =
  create_mirror_view (a);
// This is a no-op if MemorySpace is HostSpace
deep_copy (a, c);
\end{lstlisting}
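The common usage pattern, sketched below (reusing \lstinline|a| and
\lstinline|MemorySpace| from the example above), is to fill the host mirror on
the host and then copy it into the device View:
\begin{lstlisting}
// Sketch: fill the host mirror, then copy it into the device View.
typename View<int*[3], MemorySpace>::HostMirror h_a =
  create_mirror_view (a);
for (size_t i = 0; i < a.dimension_0 (); ++i) {
  h_a(i,0) = static_cast<int> (i); // fill on the host
}
deep_copy (a, h_a); // copy host data to the device View
\end{lstlisting}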
\subsection{How do I get the raw pointer?}
We discourage access to a View's ``raw'' pointer. This circumvents
reference counting. That is, the memory may be deallocated once the
View's reference count goes to zero, so holding on to a raw pointer
may result in invalid memory access. Furthermore, it may not even be
possible to access the View's memory from a given execution space.
For example, a View in the \lstinline!Cuda! space points to CUDA device
memory. Also, using raw pointers would normally defeat the usability
of polymorphic layouts and automatic padding.
Nevertheless, sometimes you really need access to the
pointer. For such cases, we provide the \lstinline!data()!
method. For example:
\begin{lstlisting}
// Legacy function that takes a raw pointer.
extern void legacyFunction (double* x_raw, const size_t len);
// Your function that takes a View.
void myFunction (const View<double*>& x) {
  // DON'T DO THIS UNLESS YOU MUST
  double* x_raw = x.data ();
  const size_t N = x.dimension_0 ();
  legacyFunction (x_raw, N);
}
\end{lstlisting}
A user is in most cases also allowed to obtain a pointer to a specific
element via the usual \lstinline|&| operator. For example
\begin{lstlisting}
// Legacy function that takes a raw pointer.
void someLibraryFunction (double* x_raw);
KOKKOS_INLINE_FUNCTION
void foo (const View<double*>& x) {
  someLibraryFunction (&x(3));
}
\end{lstlisting}
This is only valid if the View's reference type is an lvalue.
That property can be queried statically at compile time from the View through
its \lstinline|reference_is_lvalue| member.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Access traits}\label{S:View:AccessTraits}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Another way to get optimized data accesses is to specify a memory trait.
These traits declare the intended use of the particular View of
an allocation. For example, a particular kernel might use a View only for
streaming writes. By declaring that intention, Kokkos can insert the
appropriate store intrinsics on each architecture, if available. Access traits
are specified through an optional template parameter which comes last in the list
of parameters. Multiple traits can be combined with the binary ``or'' operator:
\begin{lstlisting}
View<double*, MemoryTraits<SomeTrait> > a;
View<const double*, MemoryTraits<SomeTrait | SomeOtherTrait> > b;
View<int*, LayoutLeft, MemoryTraits<SomeTrait | SomeOtherTrait> > c;
View<int*, MemorySpace, MemoryTraits<SomeTrait | SomeOtherTrait> > d;
View<int*, LayoutLeft, MemorySpace, MemoryTraits<SomeTrait> > e;
\end{lstlisting}
\subsection{Atomic access}
The \lstinline!Atomic! access trait lets you create a View of data such that every
read or write to any entry uses an atomic update. Kokkos supports atomics for
all data types, independent of size. The restrictions are that (i) you are not
allowed to alias data on which atomic operations are performed, and (ii) the
results of non-atomic accesses (including reads) to data that is concurrently
accessed atomically are undefined.
The performance characteristics of atomic operations depend on the data type.
Some types (in particular, integer types) are natively supported and might even
provide asynchronous atomic operations. Others (such as 32-bit and 64-bit
atomics for non-integer types) are often implemented using compare-and-swap
(CAS) loops on integers. Everything else is implemented with a locking
approach, where an atomic operation acquires a lock based on a hash of the
pointer value of the data element.
Types on which atomic accesses are performed must support the necessary
operators (such as \lstinline|=|, \lstinline|+=|, \lstinline|-=|,
\lstinline|+|, and \lstinline|-|), and must have \lstinline|volatile|
overloads of functions such as the assignment operator and copy constructor.
\begin{lstlisting}
View<int*> a("a" , 100);
View<int*, MemoryTraits<Atomic> > a_atomic = a;
a_atomic(1) += 1; // This access will do an atomic addition
\end{lstlisting}
\subsection{Random Access}
The \lstinline|RandomAccess| trait declares the intent to access a View
irregularly (in particular, non-consecutively). If used for a const View
in the \lstinline|CudaSpace| or \lstinline|CudaUVMSpace|, Kokkos will use
texture fetches for accesses when executing in the \lstinline|Cuda| execution
space.
For example:
\begin{lstlisting}
const size_t N0 = ...;
View<int*> a_nonconst ("a", N0); // allocate nonconst View
// Assign to const, RandomAccess View
View<const int*, MemoryTraits<RandomAccess> > a_ra = a_nonconst;
\end{lstlisting}
If the default execution space is \lstinline!Cuda!, access to a
\lstinline!RandomAccess! View may use CUDA texture fetches. Texture
fetches are not cache coherent with respect to writes, which is why
you must use read-only access. The texture cache is optimized for
noncontiguous access, since it has a shorter cache line than the
regular cache.
While \lstinline!RandomAccess! is valid for other execution spaces,
currently no specific optimizations are performed. In the future, however, a
View allocated with the \lstinline!RandomAccess! attribute might, for example,
use a larger page size, thus reducing page faults in the memory system.
\subsection{Standard idiom for specifying access traits}
The standard idiom for View is to pass it around using as few template
parameters as possible. Then, assign to a View with the desired
access traits only at the ``last moment,'' when those access traits
are needed just before entering a computational kernel. This lets you
template C++ classes and functions on the View type, without
proliferating instantiations. Here is an example:
\begin{lstlisting}
// Compute a sparse matrix-vector product, for a sparse
// matrix stored in compressed sparse row (CSR) format.
void
spmatvec (const View<double*>& y,
          const View<const size_t*>& ptr,
          const View<const int*>& ind,
          const View<const double*>& val,
          const View<const double*>& x)
{
  // Access to x has less locality than access to y.
  View<const double*, MemoryTraits<RandomAccess> > x_ra = x;
  typedef View<const size_t*>::size_type size_type;

  parallel_for (y.dimension_0 (), KOKKOS_LAMBDA (const size_type i) {
    double y_i = y(i);
    for (size_t k = ptr(i); k < ptr(i+1); ++k) {
      y_i += val(k) * x_ra(ind(k));
    }
    y(i) = y_i;
  });
}
\end{lstlisting}
\subsection{Unmanaged Views}
It's always better to let Kokkos control memory allocation, but
sometimes you don't have a choice. You might have to work with an
application or an interface that returns a raw pointer, for example.
Kokkos lets you wrap raw pointers in an \emph{unmanaged View}.
``Unmanaged'' means that Kokkos does \emph{not} do reference counting
or automatic deallocation for those Views. The following example
shows how to create an unmanaged View of host memory. You may do this
for CUDA device memory too, or indeed for memory allocated in any
memory space, by specifying the View's execution or memory space
accordingly.
\begin{lstlisting}
// Sometimes other code gives you a raw pointer, ...
const size_t N0 = ...;
double* x_raw = (double*) malloc (N0 * sizeof (double));
{
  // ... but you want to access it with Kokkos.
  //
  // malloc() returns host memory, so we use the
  // host memory space HostSpace. Unmanaged
  // Views have no label, because labels work with
  // the reference counting system.
  View<double*, HostSpace, MemoryTraits<Unmanaged> >
    x_view (x_raw, N0);
  functionThatTakesKokkosView (x_view);
  // It's safest for unmanaged Views to fall out of
  // scope, before freeing their memory.
}
free (x_raw);
\end{lstlisting}
\subsection{Conversion Rules and Function Specialization}
Not all View types can be assigned to each other. In summary,
the data type and dimensions have to match, the layout must be
compatible, and the memory space has to match. Examples illustrating
the rules are listed below; a short code sketch follows the list.
\begin{enumerate}
\item Memory Spaces must match
\subitem \lstinline{ Kokkos::View<int*> -> Kokkos::View<int*,HostSpace>}
\subitem \lstinline{ //ok if default memory space is HostSpace}
\item Data type and rank have to match
\subitem \lstinline{int* -> int* //ok}
\subitem \lstinline{int* -> const int* //ok}
\subitem \lstinline{const int* -> int* //not ok, const violation}
\subitem \lstinline{int** -> int* //not ok, rank mismatch}
\subitem \lstinline{int*[3] -> int** //ok}
\subitem \lstinline{int** -> int*[3] //ok if runtime dimension check matches}
\subitem \lstinline{int* -> long* //not ok, type mismatch}
\item Layouts must be compatible
\subitem \lstinline{LayoutRight -> LayoutRight //ok}
\subitem \lstinline{LayoutLeft -> LayoutRight //not ok}
\subitem \lstinline{LayoutLeft -> LayoutStride //ok}
\subitem \lstinline{LayoutStride -> LayoutLeft //ok if runtime dimensions allow assignment}
\item Memory Traits
\end{enumerate}
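The following sketch (assuming the default memory space is
\lstinline|HostSpace|) shows a few of these rules in code:
\begin{lstlisting}
// Sketch of a few of the assignment rules above.
View<int*>       a ("a", 10);
View<const int*> b = a;   // ok: adding const to the data type
View<int*[3]>    c ("c", 10);
View<int**>      d = c;   // ok: compile-time to run-time dimension
// View<int*>  e = b;     // not ok: would discard const
// View<int**> f = a;     // not ok: rank mismatch
\end{lstlisting}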