
# DotOpt - Optimized matrix multiplication

## Development environment configuration

1. Update the container:

   ```sh
   ./scripts/update-all.sh
   ```

2. Install the required dependencies:

   ```sh
   ./scripts/install-deps.sh
   ```

3. Fetch the third-party libraries:

   ```sh
   ./scripts/fetch-libs.sh
   ```

**Warning:** Benchmarking requires the `CAP_SYS_ADMIN` capability.

## Project overview

### Main library

The main library is implemented as a Python module that computes matrix multiplication operations using NumPy types.

### Testing

Tests are implemented in Python and registered in `test/CMakeLists.txt` so that they are executed as part of the test suite.

### Benchmarking

Benchmarks are implemented in C/C++ using Google's benchmarking library.
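
As an illustration, a Google Benchmark harness for a matrix multiplication kernel might look like the sketch below. The kernel and the registered sizes are assumptions for the example, not the repository's actual benchmarks:

```cpp
#include <benchmark/benchmark.h>

#include <algorithm>
#include <cstddef>
#include <vector>

// Stand-in kernel; the real benchmarks would call the library's kernels.
static void matmul(const float *A, const float *B, float *C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

static void BM_MatMul(benchmark::State &state) {
    const std::size_t n = static_cast<std::size_t>(state.range(0));
    std::vector<float> A(n * n, 1.0f), B(n * n, 2.0f), C(n * n);
    for (auto _ : state) {
        std::fill(C.begin(), C.end(), 0.0f);
        matmul(A.data(), B.data(), C.data(), n);
        benchmark::DoNotOptimize(C.data());  // keep the result alive
    }
}
// Hypothetical matrix sizes; the real suite would pick its own.
BENCHMARK(BM_MatMul)->Arg(128)->Arg(256)->Arg(512);
BENCHMARK_MAIN();
```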

**Warning:** CPU frequency scaling should be disabled during the benchmarks.

```sh
# To query the current CPU frequency governor:
sudo cpupower frequency-info

# To force the CPU to run at full power:
sudo cpupower frequency-set --governor performance

# To restore the previous CPU frequency governor:
sudo cpupower frequency-set --governor <previous governor>
```

## Algorithm implementations

  1. Sequential algorithm (a minimal sketch follows).
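
For reference, a minimal sketch of what the sequential kernel could look like. The function name and the row-major `float` layout are assumptions for the example:

```cpp
#include <cstddef>

// Naive sequential kernel: C += A * B for row-major n x n matrices.
// The i-k-j loop order walks the rows of B and C contiguously, which is
// friendlier to the cache than the textbook i-j-k order.
void matmul_seq(const float *A, const float *B, float *C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const float a = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
}
```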

### OpenMP parallelism

  1. Parallel algorithm using OpenMP's loop parallelism.

  2. Parallel algorithm using OpenMP's task parallelism.

  3. Parallel algorithm with row/col tiling using OpenMP's loop parallelism.

  4. Parallel algorithm with row/col tiling using OpenMP's task parallelism.

  5. Parallel algorithm with 2D tiling using OpenMP's loop parallelism.

  6. Parallel algorithm with 2D tiling using OpenMP's task parallelism (items 1 and 6 are sketched after this list).
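
A minimal sketch of the loop-parallel variant (item 1) and the 2D-tiled task-parallel variant (item 6). Function names and the tile parameter are hypothetical; compile with `-fopenmp`:

```cpp
#include <algorithm>
#include <cstddef>

// Item 1: loop parallelism - one parallel for over the output rows.
void matmul_omp_for(const float *A, const float *B, float *C, std::size_t n) {
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

// Item 6: task parallelism - one task per 2D tile of the output matrix.
// Tasks write disjoint C tiles, so no synchronization is needed.
void matmul_omp_tasks(const float *A, const float *B, float *C,
                      std::size_t n, std::size_t tile) {
    #pragma omp parallel
    #pragma omp single
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t jj = 0; jj < n; jj += tile) {
            #pragma omp task firstprivate(ii, jj)
            for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
                for (std::size_t k = 0; k < n; ++k)
                    for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                        C[i * n + j] += A[i * n + k] * B[k * n + j];
        }
}
```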

### Our IMTS implementation

  1. Parallel algorithm using 1-level IMTS with random tile ordering.

  2. Parallel algorithm using 1-level IMTS with tile ordering.

  3. Parallel algorithm using 2-level IMTS with random tile ordering.

  4. Parallel algorithm using 2-level IMTS with tile ordering.

  5. Parallel algorithm using 3-level IMTS with random tile ordering.

  6. Parallel algorithm using 3-level IMTS with tile ordering.

### Extras

  1. The best version to date, with hand-crafted matrix multiplication assembly code.
  2. The hand-crafted version with 4-level IMTS tiling, where the smallest tiles fit into the vector registers. In other words, a hand-crafted tiling level for our hand-crafted matrix multiplication implementation (a register-level micro-kernel is sketched after this list).
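
As a rough illustration of what "tiles that fit into the vector registers" means, here is a hypothetical AVX2/FMA micro-kernel that keeps a 4x8 tile of C in YMM registers for the whole k loop. This is an assumption for the example, not the repository's actual assembly:

```cpp
#include <immintrin.h>

#include <cstddef>

// Computes C[0..3][0..7] += A[0..3][0..K-1] * B[0..K-1][0..7].
// The 4x8 output tile lives in four YMM registers for the whole loop.
// lda/ldb/ldc are the row strides; compile with -mavx2 -mfma.
void micro_kernel_4x8(const float *A, const float *B, float *C,
                      std::size_t K, std::size_t lda, std::size_t ldb,
                      std::size_t ldc) {
    __m256 c0 = _mm256_loadu_ps(C + 0 * ldc);
    __m256 c1 = _mm256_loadu_ps(C + 1 * ldc);
    __m256 c2 = _mm256_loadu_ps(C + 2 * ldc);
    __m256 c3 = _mm256_loadu_ps(C + 3 * ldc);
    for (std::size_t k = 0; k < K; ++k) {
        const __m256 b = _mm256_loadu_ps(B + k * ldb);  // 8 columns of B
        c0 = _mm256_fmadd_ps(_mm256_broadcast_ss(A + 0 * lda + k), b, c0);
        c1 = _mm256_fmadd_ps(_mm256_broadcast_ss(A + 1 * lda + k), b, c1);
        c2 = _mm256_fmadd_ps(_mm256_broadcast_ss(A + 2 * lda + k), b, c2);
        c3 = _mm256_fmadd_ps(_mm256_broadcast_ss(A + 3 * lda + k), b, c3);
    }
    _mm256_storeu_ps(C + 0 * ldc, c0);
    _mm256_storeu_ps(C + 1 * ldc, c1);
    _mm256_storeu_ps(C + 2 * ldc, c2);
    _mm256_storeu_ps(C + 3 * ldc, c3);
}
```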

### GPU using Vulkan compute shaders

  1. (Not implemented) A GPU version where each thread computes a single output element.
  2. (Not implemented) A tiling GPU version where each thread computes a 4x4 tile (GLSL).
  3. (Not implemented) A tiling GPU version where each SM computes a tile using the shared memory as a scratchpad for inputs and/or outputs.

### GPU using OpenCL (if we can make it work)

  1. (Not implemented) Same as the first Vulkan version.
  2. (Not implemented) Same as the second Vulkan version.
  3. (Not implemented) Same as the third Vulkan version.

### External implementations (for reference when comparing)

  1. (Testing only) NumPy implementation.
  2. (Not used) TensorFlow implementation.
  3. (Not used) OpenBLAS (maybe?).
  4. (Not used) clBLAS (maybe?).
  5. (Not used) ArrayFire (OpenCL; maybe?).

## Intelligent Multilevel Tiling Scheduler (IMTS)

The IMTS scheduler divides the workload into 2D tiles sized to fit the last-level cache (LLC). Those tiles are then passed on to another tiler, which further divides them into smaller tiles that fit into the next-to-last level cache. The same process is repeated until the first-level cache is reached.

The order in which the tiles are computed is predefined to minimize the number of transfers between cache levels. For example, a cache needs to hold both input tiles and the output tile. When switching tiles, the ordering ensures that only one of them has to be loaded, preferably one of the inputs, since output tiles also have to be transferred back to the next level.
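
As a sketch of the ordering idea (hypothetical code, not the scheduler's actual interface): walking the 2D grid of output tiles in a serpentine order keeps the corresponding row panel of A resident across consecutive tiles, so each switch loads only the next B panel while the finished C tile is written back.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Enumerate the 2D grid of output tiles in serpentine (boustrophedon) order.
// Consecutive entries stay on the same tile row as long as possible, so the
// row panel of A remains in cache; only the B panel (an input) has to be
// loaded when switching, while the finished C tile is flushed.
std::vector<std::pair<std::size_t, std::size_t>>
serpentine_order(std::size_t tile_rows, std::size_t tile_cols) {
    std::vector<std::pair<std::size_t, std::size_t>> order;
    order.reserve(tile_rows * tile_cols);
    for (std::size_t i = 0; i < tile_rows; ++i) {
        if (i % 2 == 0)
            for (std::size_t j = 0; j < tile_cols; ++j)
                order.emplace_back(i, j);  // left to right
        else
            for (std::size_t j = tile_cols; j-- > 0;)
                order.emplace_back(i, j);  // right to left
    }
    return order;
}
```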