|
112 | 112 | "cell_type": "markdown",
|
113 | 113 | "metadata": {},
|
114 | 114 | "source": [
|
115 |
| - "## Exercise 0: from raw DAXPY loop to serial C++ transform algorithm\n", |
| 115 | + "## Exercise 1: from raw DAXPY loop to serial C++ transform algorithm\n", |
116 | 116 | "\n",
|
117 | 117 | "The goal of this first exercise is to rewrite the raw DAXPY loop using the C++ standard library `transform` algorithm (see the [transform] documentation and pick overload (3)).\n",
|
118 | 118 | "\n",
|
119 | 119 | "[transform]: https://en.cppreference.com/w/cpp/algorithm/transform\n",
|
120 | 120 | "\n",
|
121 |
| - "A template for the solution is provided in [exercise0.cpp]. The `TODO`s indicate the parts of the template that must be completed.\n", |
| 121 | + "A template for the solution is provided in [exercise1.cpp]. The `TODO`s indicate the parts of the template that must be completed.\n", |
122 | 122 | "To complete this first exercise, rewrite the `daxpy` function using the C++ standard library algorithms; this requires adding some headers:\n",
|
123 | 123 | "\n",
|
124 | 124 | "```c++\n",
|
|
133 | 133 | "}\n",
|
134 | 134 | "```\n",
|
135 | 135 | "\n",
|
136 |
| - "[exercise0.cpp]: ./exercise0.cpp\n", |
| 136 | + "[exercise1.cpp]: ./exercise1.cpp\n", |
137 | 137 | "\n",
|
138 | 138 | "The example compiles and runs as provided, but it produces incorrect results due to the incomplete `daxpy` implementation.\n",
|
139 | 139 | "Once you fix it, the following blocks should compile and run correctly:\n"
|
|
145 | 145 | "metadata": {},
|
146 | 146 | "outputs": [],
|
147 | 147 | "source": [
|
148 |
| - "!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise0.cpp\n", |
| 148 | + "!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise1.cpp\n", |
149 | 149 | "!./daxpy 1000000"
|
150 | 150 | ]
|
151 | 151 | },
|
|
155 | 155 | "metadata": {},
|
156 | 156 | "outputs": [],
|
157 | 157 | "source": [
|
158 |
| - "!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise0.cpp\n", |
| 158 | + "!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise1.cpp\n", |
159 | 159 | "!./daxpy 1000000"
|
160 | 160 | ]
|
161 | 161 | },
|
|
165 | 165 | "metadata": {},
|
166 | 166 | "outputs": [],
|
167 | 167 | "source": [
|
168 |
| - "!nvc++ -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy exercise0.cpp\n", |
| 168 | + "!nvc++ -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy exercise1.cpp\n", |
169 | 169 | "!./daxpy 1000000"
|
170 | 170 | ]
|
171 | 171 | },
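As a minimal sketch of the Exercise 1 approach, assuming a `daxpy(a, x, y)` signature over `std::vector<double>` (the actual template in exercise1.cpp may differ), the binary overload (3) of `std::transform` reads one element from `x` and one from `y` and writes `a * x[i] + y[i]` back into `y`:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of y = a * x + y with the binary std::transform overload.
// The signature is an assumption, not necessarily the one in exercise1.cpp.
void daxpy(double a, std::vector<double> const& x, std::vector<double>& y) {
  assert(x.size() == y.size());  // both input ranges must have the same length
  std::transform(x.begin(), x.end(),  // first input range: x
                 y.begin(),           // second input range: y
                 y.begin(),           // output range: overwrite y in place
                 [a](double xi, double yi) { return a * xi + yi; });
}
```

Writing the output to `y.begin()` is allowed: `std::transform` permits the output range to start at the beginning of one of its input ranges.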
|
172 | 172 | {
|
173 | 173 | "cell_type": "markdown",
|
174 | 174 | "metadata": {},
|
175 | 175 | "source": [
|
176 |
| - "### Solutions Exercise 0\n", |
| 176 | + "### Solutions Exercise 1\n", |
177 | 177 | "\n",
|
178 | 178 | "The solutions for each example are available in the [`solutions/`] sub-directory.\n",
|
179 | 179 | "\n",
|
180 | 180 | "[`solutions/`]: ./solutions\n",
|
181 | 181 | "\n",
|
182 |
| - "The solution for this first exercise is in [`solutions/exercise0.cpp`].\n", |
| 182 | + "The solution for this first exercise is in [`solutions/exercise1.cpp`].\n", |
183 | 183 | "\n",
|
184 |
| - "[`solutions/exercise0.cpp`]: ./solutions/exercise0.cpp\n", |
| 184 | + "[`solutions/exercise1.cpp`]: ./solutions/exercise1.cpp\n", |
185 | 185 | "\n",
|
186 |
| - "The following blocks compile and run the solutions for Exercise 0 using different compilers." |
| 186 | + "The following blocks compile and run the solutions for Exercise 1 using different compilers." |
187 | 187 | ]
|
188 | 188 | },
|
189 | 189 | {
|
|
192 | 192 | "metadata": {},
|
193 | 193 | "outputs": [],
|
194 | 194 | "source": [
|
195 |
| - "!g++ -std=c++17 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise0.cpp\n", |
| 195 | + "!g++ -std=c++17 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise1.cpp\n", |
196 | 196 | "!./daxpy 1000000"
|
197 | 197 | ]
|
198 | 198 | },
|
|
202 | 202 | "metadata": {},
|
203 | 203 | "outputs": [],
|
204 | 204 | "source": [
|
205 |
| - "!clang++ -std=c++17 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise0.cpp\n", |
| 205 | + "!clang++ -std=c++17 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise1.cpp\n", |
206 | 206 | "!./daxpy 1000000"
|
207 | 207 | ]
|
208 | 208 | },
|
|
212 | 212 | "metadata": {},
|
213 | 213 | "outputs": [],
|
214 | 214 | "source": [
|
215 |
| - "!nvc++ -std=c++17 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy solutions/exercise0.cpp\n", |
| 215 | + "!nvc++ -std=c++17 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy solutions/exercise1.cpp\n", |
216 | 216 | "!./daxpy 1000000"
|
217 | 217 | ]
|
218 | 218 | },
|
219 | 219 | {
|
220 | 220 | "cell_type": "markdown",
|
221 | 221 | "metadata": {},
|
222 | 222 | "source": [
|
223 |
| - "# Exercise 1\n", |
| 223 | + "# Exercise 2: from raw initialization to `std::fill_n` and `std::for_each_n`\n", |
224 | 224 | "\n",
|
225 |
| - "In Exercise 2 we will parallelize `daxpy` to allow it to run on accelerator devices like a GPUs.\n", |
| 225 | + "In Exercise 3 we will parallelize `daxpy` so that it can run on accelerator devices such as GPUs.\n", |
226 | 226 | "When doing so, it is important to avoid unnecessary memory migrations across devices.\n",
|
227 | 227 | "\n",
|
228 |
| - "The goal of this exercise is to initialize the memory using the standard library algorithms, so that when we parallelize the initialization in Exercise 2, it will happen on the accelerator device itself.\n", |
| 228 | + "The goal of this exercise is to initialize the memory using the standard library algorithms, so that when we parallelize the initialization in Exercise 3, it will happen on the accelerator device itself.\n", |
229 | 229 | "\n",
|
230 | 230 | "Since we need to initialize two vectors, `x` and `y`, let's use a different approach for each:\n",
|
231 | 231 | "\n",
|
|
236 | 236 | "[for_each_n]: https://en.cppreference.com/w/cpp/algorithm/for_each_n \n",
|
237 | 237 | "[iota_view]: https://en.cppreference.com/w/cpp/ranges/iota_view\n",
|
238 | 238 | "\n",
|
239 |
| - "* `std::for_each_n` algorithms with `std::views::iota` for ind (see [for_each\n", |
240 |
| - "\n", |
241 |
| - "A template for the solution is provided in [exercise1.cpp]. The `TODO`s indicate the parts of the template that must be completed.\n", |
| 239 | + "A template for the solution is provided in [exercise2.cpp]. The `TODO`s indicate the parts of the template that must be completed.\n", |
242 | 240 | "To complete this exercise, rewrite the `initialize` function using the C++ standard library algorithms; this requires adding some headers, including `<ranges>` for `std::views::iota`:\n",
|
243 | 241 | "\n",
|
244 | 242 | "```c++\n",
|
|
253 | 251 | "}\n",
|
254 | 252 | "```\n",
|
255 | 253 | "\n",
|
256 |
| - "[exercise1.cpp]: ./exercise1.cpp\n", |
| 254 | + "[exercise2.cpp]: ./exercise2.cpp\n", |
257 | 255 | "\n",
|
258 | 256 | "The example compiles and runs as provided, but it produces incorrect results due to the incomplete `initialize` implementation.\n",
|
259 | 257 | "In the compilation commands below, the C++ standard version is now C++20, to enable the use of `views::iota`.\n",
|
|
267 | 265 | "metadata": {},
|
268 | 266 | "outputs": [],
|
269 | 267 | "source": [
|
270 |
| - "!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise1.cpp\n", |
| 268 | + "!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise2.cpp\n", |
271 | 269 | "!./daxpy 1000000"
|
272 | 270 | ]
|
273 | 271 | },
|
|
277 | 275 | "metadata": {},
|
278 | 276 | "outputs": [],
|
279 | 277 | "source": [
|
280 |
| - "!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -isystem/usr/local/range-v3/include -o daxpy exercise1.cpp\n", |
| 278 | + "!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -isystem/usr/local/range-v3/include -o daxpy exercise2.cpp\n", |
281 | 279 | "!./daxpy 1000000"
|
282 | 280 | ]
|
283 | 281 | },
|
|
287 | 285 | "metadata": {},
|
288 | 286 | "outputs": [],
|
289 | 287 | "source": [
|
290 |
| - "!nvc++ -std=c++20 -O4 -fast -march=native -Mllvm-fast -o daxpy exercise1.cpp\n", |
| 288 | + "!nvc++ -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy exercise2.cpp\n", |
291 | 289 | "!./daxpy 1000000"
|
292 | 290 | ]
|
293 | 291 | },
|
294 | 292 | {
|
295 | 293 | "cell_type": "markdown",
|
296 | 294 | "metadata": {},
|
297 | 295 | "source": [
|
298 |
| - "### Solutions Exercise 1\n", |
| 296 | + "### Solutions Exercise 2\n", |
299 | 297 | "\n",
|
300 |
| - "The solution for this exercise is in [`solutions/exercise1.cpp`].\n", |
| 298 | + "The solution for this exercise is in [`solutions/exercise2.cpp`].\n", |
301 | 299 | "\n",
|
302 |
| - "[`solutions/exercise1.cpp`]: ./solutions/exercise1.cpp\n", |
| 300 | + "[`solutions/exercise2.cpp`]: ./solutions/exercise2.cpp\n", |
303 | 301 | "\n",
|
304 | 302 | "The following blocks compile and run the solutions for Exercise 2 using different compilers."
|
305 | 303 | ]
|
|
311 | 309 | "outputs": [],
|
312 | 310 | "source": [
|
313 | 311 | "# Using iota range for initialize \n",
|
314 |
| - "!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise1.cpp\n", |
| 312 | + "!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise2.cpp\n", |
315 | 313 | "!./daxpy 1000000"
|
316 | 314 | ]
|
317 | 315 | },
|
|
321 | 319 | "metadata": {},
|
322 | 320 | "outputs": [],
|
323 | 321 | "source": [
|
324 |
| - "!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise1.cpp\n", |
| 322 | + "!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise2.cpp\n", |
325 | 323 | "!./daxpy 1000000"
|
326 | 324 | ]
|
327 | 325 | },
|
|
331 | 329 | "metadata": {},
|
332 | 330 | "outputs": [],
|
333 | 331 | "source": [
|
334 |
| - "!nvc++ -std=c++20 -O4 -fast -march=native -Mllvm-fast -o daxpy solutions/exercise1.cpp\n", |
| 332 | + "!nvc++ -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy solutions/exercise2.cpp\n", |
335 | 333 | "!./daxpy 1000000"
|
336 | 334 | ]
|
337 | 335 | },
|
338 | 336 | {
|
339 | 337 | "cell_type": "markdown",
|
340 | 338 | "metadata": {},
|
341 | 339 | "source": [
|
342 |
| - "## Exercise 2: parallelizing DAXPY using C++ parallel algorithms\n", |
| 340 | + "## Exercise 3: parallelizing DAXPY and initialization using C++ parallel algorithms\n", |
343 | 341 | "\n",
|
344 | 342 | "The goal of this final exercise in this section is to parallelize the `initialize` and `daxpy` functions to compute the results in parallel using CPUs or GPUs.\n",
|
345 | 343 | "\n",
|
346 |
| - "A template for the solution is provided in [exercise2.cpp].\n", |
| 344 | + "A template for the solution is provided in [exercise3.cpp].\n", |
347 | 345 | "\n",
|
348 | 346 | "```c++\n",
|
349 | 347 | "#include <ranges>\n",
|
|
368 | 366 | "}\n",
|
369 | 367 | "```\n",
|
370 | 368 | "\n",
|
371 |
| - "[exercise2.cpp]: ./exercise2.cpp\n", |
| 369 | + "[exercise3.cpp]: ./exercise3.cpp\n", |
372 | 370 | "\n",
|
373 | 371 | "Compiling with support for the parallel algorithms requires:\n",
|
374 | 372 | "* `g++` and `clang++`: link against Intel TBB with `-ltbb`\n",
|
|
387 | 385 | "metadata": {},
|
388 | 386 | "outputs": [],
|
389 | 387 | "source": [
|
390 |
| - "!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise2.cpp -ltbb\n", |
| 388 | + "!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise3.cpp -ltbb\n", |
391 | 389 | "!./daxpy 1000000"
|
392 | 390 | ]
|
393 | 391 | },
|
|
397 | 395 | "metadata": {},
|
398 | 396 | "outputs": [],
|
399 | 397 | "source": [
|
400 |
| - "!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise2.cpp -ltbb\n", |
| 398 | + "!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise3.cpp -ltbb\n", |
401 | 399 | "!./daxpy 1000000"
|
402 | 400 | ]
|
403 | 401 | },
|
|
407 | 405 | "metadata": {},
|
408 | 406 | "outputs": [],
|
409 | 407 | "source": [
|
410 |
| - "!nvc++ -stdpar=multicore -std=c++20 -O4 -fast -march=native -Mllvm-fast -o daxpy exercise2.cpp\n", |
| 408 | + "!nvc++ -stdpar=multicore -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy exercise3.cpp\n", |
411 | 409 | "!./daxpy 1000000"
|
412 | 410 | ]
|
413 | 411 | },
|
|
417 | 415 | "metadata": {},
|
418 | 416 | "outputs": [],
|
419 | 417 | "source": [
|
420 |
| - "!nvc++ -stdpar=gpu -std=c++20 -O4 -fast -march=native -Mllvm-fast -o daxpy exercise2.cpp\n", |
| 418 | + "!nvc++ -stdpar=gpu -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy exercise3.cpp\n", |
421 | 419 | "!./daxpy 1000000"
|
422 | 420 | ]
|
423 | 421 | },
|
424 | 422 | {
|
425 | 423 | "cell_type": "markdown",
|
426 |
| - "metadata": {}, |
| 424 | + "metadata": { |
| 425 | + "tags": [] |
| 426 | + }, |
427 | 427 | "source": [
|
428 |
| - "### Solutions for Exercise 2\n", |
| 428 | + "### Solutions for Exercise 3\n", |
429 | 429 | "\n",
|
430 |
| - "The solution for this exercise is in [`solutions/exercise2.cpp`].\n", |
| 430 | + "The solution for this exercise is in [`solutions/exercise3.cpp`].\n", |
431 | 431 | "\n",
|
432 |
| - "[`solutions/exercise2.cpp`]: ./solutions/exercise2.cpp\n", |
| 432 | + "[`solutions/exercise3.cpp`]: ./solutions/exercise3.cpp\n", |
433 | 433 | "\n",
|
434 |
| - "The following blocks compile and run the solutions for Exercise 2 using different compilers on the CPU.\n", |
| 434 | + "The following blocks compile and run the solutions for Exercise 3 using different compilers on the CPU.\n", |
435 | 435 | "\n",
|
436 |
| - "The last block compiles and runs the solution for Exercise 2 on the GPU. If you get an error, make sure that the lambda captures are captiruing scalars by value, and that when capturing a vector to access its data, one captures a pointer to its data by value as well using `[x = x.data()]`." |
| 436 | + "The last block compiles and runs the solution for Exercise 3 on the GPU. If you get an error, make sure that the lambdas capture scalars by value and that, to access a vector's data, they capture a pointer to that data by value, e.g. `[x = x.data()]`." |
437 | 437 | ]
|
438 | 438 | },
|
439 | 439 | {
|
|
442 | 442 | "metadata": {},
|
443 | 443 | "outputs": [],
|
444 | 444 | "source": [
|
445 |
| - "!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise2.cpp -ltbb\n", |
| 445 | + "!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise3.cpp -ltbb\n", |
446 | 446 | "!./daxpy 1000000"
|
447 | 447 | ]
|
448 | 448 | },
|
|
452 | 452 | "metadata": {},
|
453 | 453 | "outputs": [],
|
454 | 454 | "source": [
|
455 |
| - "!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise2.cpp -ltbb\n", |
| 455 | + "!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise3.cpp -ltbb\n", |
456 | 456 | "!./daxpy 1000000"
|
457 | 457 | ]
|
458 | 458 | },
|
|
462 | 462 | "metadata": {},
|
463 | 463 | "outputs": [],
|
464 | 464 | "source": [
|
465 |
| - "!nvc++ -stdpar=multicore -std=c++20 -O4 -fast -march=native -Mllvm-fast -o daxpy solutions/exercise2.cpp\n", |
| 465 | + "!nvc++ -stdpar=multicore -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy solutions/exercise3.cpp\n", |
466 | 466 | "!./daxpy 1000000"
|
467 | 467 | ]
|
468 | 468 | },
|
|
472 | 472 | "metadata": {},
|
473 | 473 | "outputs": [],
|
474 | 474 | "source": [
|
475 |
| - "!nvc++ -stdpar=gpu -std=c++20 -O4 -fast -march=native -Mllvm-fast -o daxpy solutions/exercise2.cpp\n", |
| 475 | + "!nvc++ -stdpar=gpu -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy solutions/exercise3.cpp\n", |
476 | 476 | "!./daxpy 1000000"
|
477 | 477 | ]
|
478 | 478 | }
|
|