Clang transpiler integration (#756)
* development based clang transpiler integration

* added missing GitSubmodules.cmake

* fixes for code review & OpenMP/Serial bug fix of non-polymorphic call was used

* refactoring of integration, use function composition & callbacks strategy

* make unchanged files unchanged

* fix hipDeviceProp_t type to be the same as original HIP & revert back buildIncludes implementation

* fix package build without occa-transpiler

* update occa-transpiler version to v1.1

* update occa-transpiler to latest devel(fix cuda/hip intrinsics)

* update occa-transpiler tagged version

* move to tag v1.1 occa-transpiler

* added example with occa-transpiler and C++ featured okl kernel

* fixes for code review, move getTranspilerVersion from options to bin/occa.cpp as local function

* update INSTALL.md & README.md documentation files

* update occa-transpiler repo

* add option to build new transpiler with local installed clang

* fix example of new oklt to support serial, openmp modes; remove debug print

* add unsigned int to OCCA builtin types

* update README and deps

* update occa-transpiler to v1.1

* Remove occa-transpiler as a submodule

* Make changes to link occa-transpiler as a library

* Add a link to occa-transpiler README in INSTALL.md

* Fix a few typos

* Add a link to occa-transpiler repo

---------

Co-authored-by: Viktor Yastrebov <v.yastrebov90@gmail.com>
Co-authored-by: Iurii Kobein <ikobein@softserveinc.com>
Co-authored-by: Thilina Ratnayaka <thilinarmtb@gmail.com>
Co-authored-by: Iurii Kobein <61540607+IuriiKobein@users.noreply.github.com>
5 people authored Nov 20, 2024
1 parent 6c2e7d3 commit 0e177a1
Showing 21 changed files with 937 additions and 116 deletions.
Empty file added .gitmodules
Empty file.
13 changes: 12 additions & 1 deletion CMakeLists.txt
@@ -37,6 +37,7 @@ option(OCCA_ENABLE_DPCPP "Build with SYCL/DPCPP if available" ON)
option(OCCA_ENABLE_TESTS "Build tests" OFF)
option(OCCA_ENABLE_EXAMPLES "Build simple examples" OFF)
option(OCCA_ENABLE_FORTRAN "Enable Fortran interface" OFF)
option(OCCA_CLANG_BASED_TRANSPILER "Build with occa-transpiler dependency" OFF)

if(OCCA_ENABLE_FORTRAN)
enable_language(Fortran)
@@ -67,6 +68,11 @@ else()
set(OCCA_OS "OCCA_WINDOWS_OS")
endif()

# INFO: order is important, deps should not apply compiler flags
if (OCCA_CLANG_BASED_TRANSPILER)
find_package(oklt REQUIRED)
endif()

include(SetCompilerFlags)
include(CheckCXXCompilerFlag)

@@ -113,6 +119,11 @@ target_include_directories(libocca PRIVATE
$<BUILD_INTERFACE:${OCCA_SOURCE_DIR}/src>)

target_compile_definitions(libocca PRIVATE -DUSE_CMAKE)
if (OCCA_CLANG_BASED_TRANSPILER)
target_link_libraries(libocca PRIVATE occa::occa-transpiler)
target_compile_definitions(libocca PRIVATE -DBUILD_WITH_CLANG_BASED_TRANSPILER)
endif()

#=======================================

#---[ OpenMP ]--------------------------
@@ -231,7 +242,7 @@ if(OCCA_ENABLE_METAL AND APPLE)
endif()
endif()
#=======================================

if(NOT OCCA_IS_TOP_LEVEL)
# OCCA is being built as a subdirectory in another project
set(OCCA_OPENMP_ENABLED ${OCCA_OPENMP_ENABLED} PARENT_SCOPE)
26 changes: 23 additions & 3 deletions INSTALL.md
@@ -10,18 +10,19 @@

### Optional

- Fortan 90 compiler
- Fortran 90 compiler
- CUDA 9 or later
- HIP 3.5 or later
- SYCL 2020 or later
- OpenCL 2.0 or later
- OpenMP 4.0 or later
- Support Clang based transpiler

## Linux

### **Configure**

OCCA uses the [CMake] build system. For convenience, the shell script `configure-cmake.sh` has been provided to drive the Cmake build. The following table gives a list of build parameters which are set in the file. To override the default value, it is only necessary to assign the variable an alternate value at the top of the script or at the commandline.
OCCA uses the [CMake] build system. For convenience, the shell script `configure-cmake.sh` has been provided to drive the CMake build. The following table gives a list of build parameters which are set in the file. To override the default value, it is only necessary to assign the variable an alternate value at the top of the script or at the commandline.

Example
```shell
@@ -46,6 +47,7 @@ $ CC=clang CXX=clang++ OCCA_ENABLE_OPENMP="OFF" ./configure-cmake.sh
| OCCA_ENABLE_TESTS | Build OCCA's test harness | `ON` |
| OCCA_ENABLE_EXAMPLES | Build OCCA examples | `ON` |
| OCCA_ENABLE_FORTRAN | Build the Fortran language bindings | `OFF`|
| OCCA_CLANG_BASED_TRANSPILER | Build with the Clang-based transpiler, which supports C++ in OKL | `OFF`|
| FC | Fortran 90 compiler | `gfortran` |
| FFLAGS | Fortran compiler flags | *empty* |

@@ -67,7 +69,25 @@ After CMake configuration is complete, OCCA can be built with the command
$ cmake --build build --parallel <number-of-threads>
```

When cross compiling for a different platform, the targeted hardware doesn't need to be available; however all dependencies&mdash;e.g., headers, libraries&mdash;must be present. Commonly this is the case for large HPC systems, where code is compiled on login nodes and run on compute nodes.
When cross compiling for a different platform, the targeted hardware doesn't need to be available; however all dependencies&mdash;e.g., headers, libraries&mdash;must be present. Commonly this is the case for large HPC systems, where code is compiled on login nodes and run on compute nodes.


#### Building with Clang transpiler

The occa-transpiler repository can be found at [libocca/occa-transpiler](https://github.com/libocca/occa-transpiler/).
Please refer to the [occa-transpiler README](https://github.com/libocca/occa-transpiler/blob/main/README.md) for instructions on how to
build and install occa-transpiler.
You can then use the following commands to install OCCA with occa-transpiler enabled,
replacing `<occa-transpiler install dir>` with the root directory of your
occa-transpiler installation.

```shell
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release -DOCCA_CLANG_BASED_TRANSPILER=ON -DCMAKE_PREFIX_PATH=<occa-transpiler install dir>/lib/cmake ..
$ cmake --build . --parallel <number-of-threads>
$ cmake --install . --prefix install
```

### Testing

4 changes: 2 additions & 2 deletions README.md
@@ -42,12 +42,13 @@ Mission critical computational science and engineering applications from the pub

### Optional

- Fortan 90 compiler
- Fortran 90 compiler
- CUDA 9 or later
- HIP 4.2 or later
- SYCL 2020 or later
- OpenCL 2.0 or later
- OpenMP 4.0 or later
- C++ support for OKL with clang based transpiler [new-okl-transpiler](https://github.com/libocca/occa-transpiler)

## Build, Test, Install

@@ -67,7 +68,6 @@ $ cmake --install build --prefix install

If dependencies are installed in a non-standard location, set the corresponding [environment variable](INSTALL.md#dependency-paths) to this path.


## Use

### Environment
10 changes: 10 additions & 0 deletions examples/cpp/31_oklt_v3_moving_avg/CMakeLists.txt
@@ -0,0 +1,10 @@
compile_cpp_example_with_modes(oklt_v3_moving_avg main.cpp)

add_custom_target(cpp_example_oklt_v3_moving_avg_cpy ALL
COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_SOURCE_DIR}/constants.h constants.h
COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_SOURCE_DIR}/movingAverage.okl movingAverage.okl)
add_dependencies(examples_cpp_oklt_v3_moving_avg cpp_example_oklt_v3_moving_avg_cpy)
target_sources(examples_cpp_oklt_v3_moving_avg
PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/movingAverage.okl
PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/constants.h
)
5 changes: 5 additions & 0 deletions examples/cpp/31_oklt_v3_moving_avg/constants.h
@@ -0,0 +1,5 @@
#pragma once

constexpr const int THREADS_PER_BLOCK = 1024;
//INFO: it's not possible to set up a dynamically sized extern @shared array for CUDA
constexpr const int WINDOW_SIZE = 16;
93 changes: 93 additions & 0 deletions examples/cpp/31_oklt_v3_moving_avg/main.cpp
@@ -0,0 +1,93 @@
#include <iostream>
#include <occa.hpp>
#include <vector>
#include "constants.h"

std::vector<float> buildData(std::size_t size,
float initialValue,
float fluctuation)
{
std::vector<float> buffer(size);
float currentValue = initialValue;
float longIncrement = 1.0f;
float fluctuationIncrement = fluctuation;
for(std::size_t i = 0; i < buffer.size(); ++i) {
buffer[i] = currentValue;
fluctuationIncrement = -fluctuationIncrement;
if(i % WINDOW_SIZE == 0) {
longIncrement = -longIncrement;
}
currentValue += longIncrement + fluctuationIncrement;
}
return buffer;
}

std::vector<float> goldMovingAverage(const std::vector<float> &hostVector) {
std::vector<float> result(hostVector.size() - WINDOW_SIZE);
for(std::size_t i = 0; i < result.size(); ++i) {
float value = 0.0f;
for(std::size_t j = 0; j < WINDOW_SIZE; ++j) {
value += hostVector[i + j];
}
result[i] = value / WINDOW_SIZE;
}
return result;
}

bool starts_with(const std::string &str, const std::string &substring) {
return str.rfind(substring, 0) == 0;
}

occa::json getDeviceOptions(int argc, const char **argv) {
for(int i = 0; i < argc; ++i) {
std::string argument(argv[i]);
if((starts_with(argument,"-d") || starts_with(argument, "--device")) && i + 1 < argc)
{
std::string value(argv[i + 1]);
return occa::json::parse(value);
}
}
return occa::json::parse("{mode: 'Serial'}");
}

int main(int argc, const char **argv) {

occa::json deviceOpts = getDeviceOptions(argc, argv);
auto inputHostBuffer = buildData(THREADS_PER_BLOCK * WINDOW_SIZE + WINDOW_SIZE, 10.0f, 4.0f);
std::vector<float> outputHostBuffer(inputHostBuffer.size() - WINDOW_SIZE);

occa::device device(deviceOpts);
occa::memory deviceInput = device.malloc<float>(inputHostBuffer.size());
occa::memory deviceOutput = device.malloc<float>(outputHostBuffer.size());

occa::json buildProps({
{"transpiler-version", 3}
});

occa::kernel movingAverageKernel = device.buildKernel("movingAverage.okl", "movingAverage32f", buildProps);

deviceInput.copyFrom(inputHostBuffer.data(), inputHostBuffer.size());

movingAverageKernel(deviceInput,
static_cast<int>(inputHostBuffer.size()),
deviceOutput,
static_cast<int>(deviceOutput.size()));

// Copy result to the host
deviceOutput.copyTo(&outputHostBuffer[0], outputHostBuffer.size());

auto goldValue = goldMovingAverage(inputHostBuffer);

constexpr const float EPSILON = 0.001f;
for(std::size_t i = 0; i < outputHostBuffer.size(); ++i) {
bool isValid = std::abs(goldValue[i] - outputHostBuffer[i]) < EPSILON;
if(!isValid) {
std::cout << "Comparison with gold values has failed" << std::endl;
return 1;
}
}
std::cout << "Comparison with gold has passed" << std::endl;
std::cout << "Moving average finished" << std::endl;

return 0;
}
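
The host-side check in `main.cpp` boils down to a sliding-window mean compared against the device result. As a minimal standalone sketch of that reference computation (not part of the commit), the helper below takes the window as a parameter instead of reading `WINDOW_SIZE`; the name `movingAverageRef` is hypothetical. It keeps the example's convention of producing `size - window` outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone helper; mirrors goldMovingAverage from main.cpp,
// but takes the window length as an argument (an assumption for illustration).
std::vector<float> movingAverageRef(const std::vector<float> &input,
                                    std::size_t window) {
  assert(window > 0 && input.size() >= window);
  // Same convention as the example: size - window outputs.
  std::vector<float> result(input.size() - window);
  for (std::size_t i = 0; i < result.size(); ++i) {
    float sum = 0.0f;
    for (std::size_t j = 0; j < window; ++j) {
      sum += input[i + j];
    }
    result[i] = sum / static_cast<float>(window);
  }
  return result;
}
```

For the input 1, 2, 3, 4, 5 with a window of 2, this yields the three values 1.5, 2.5 and 3.5.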
85 changes: 85 additions & 0 deletions examples/cpp/31_oklt_v3_moving_avg/movingAverage.okl
@@ -0,0 +1,85 @@
#include "constants.h"

template<class T,
int THREADS,
int WINDOW>
struct MovingAverage {
MovingAverage(int inputSize,
int outputSize,
T *shared_input,
T *shared_output)
:_inputSize(inputSize)
,_outputSize(outputSize)
,_shared_data(shared_input)
,_result_data(shared_output)
{}

void syncCopyFrom(const T *input, int block_idx, int thread_idx) {
int linearIdx = block_idx * THREADS + thread_idx;
//INFO: copy base chunk
if(linearIdx < _inputSize) {
_shared_data[thread_idx] = input[linearIdx];
}
//INFO: copy WINDOW chunk
int tailIdx = (block_idx + 1) * THREADS + thread_idx;
if(tailIdx < _inputSize && thread_idx < WINDOW) {
_shared_data[THREADS + thread_idx] = input[tailIdx];
}
@barrier;
}

void process(int thread_idx) {
T sum = T();
for(int i = 0; i < WINDOW; ++i) {
sum += _shared_data[thread_idx + i];
}
_result_data[thread_idx] = sum / WINDOW;
@barrier;
}

void syncCopyTo(T *output, int block_idx, int thread_idx) {
int linearIdx = block_idx * THREADS + thread_idx;
if(linearIdx < _outputSize) {
output[linearIdx] = _result_data[thread_idx];
}
@barrier;
}
private:
int _inputSize;
int _outputSize;

//INFO: not supported
// @shared T _data[THREADS_PER_BLOCK + WINDOW_SIZE];
// @shared T _result[THREADS_PER_BLOCK];

T *_shared_data;
T *_result_data;
};

@kernel void movingAverage32f(@restrict const float *inputData,
int inputSize,
@restrict float *outputData,
int outputSize)
{
@outer(0) for (int block_idx = 0; block_idx < outputSize / THREADS_PER_BLOCK + 1; ++block_idx) {
@shared float blockInput[THREADS_PER_BLOCK + WINDOW_SIZE];
@shared float blockResult[THREADS_PER_BLOCK];
MovingAverage<float, THREADS_PER_BLOCK, WINDOW_SIZE> ma{
inputSize,
outputSize,
blockInput,
blockResult
};
@inner(0) for(int thread_idx = 0; thread_idx < THREADS_PER_BLOCK; ++thread_idx) {
ma.syncCopyFrom(inputData, block_idx, thread_idx);
}

@inner(0) for(int thread_idx = 0; thread_idx < THREADS_PER_BLOCK; ++thread_idx) {
ma.process(thread_idx);
}

@inner(0) for(int thread_idx = 0; thread_idx < THREADS_PER_BLOCK; ++thread_idx) {
ma.syncCopyTo(outputData, block_idx, thread_idx);
}
}
}
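
The kernel above stages `THREADS_PER_BLOCK + WINDOW_SIZE` input elements per block in `@shared` memory, then averages and writes back in separate `@inner` passes. The following plain C++ sketch emulates that tiling on the host so the indexing can be checked without a device. It is an illustration under assumptions, not the transpiler's output: `tiledMovingAverage` and its parameters are hypothetical names, a local `std::vector` stands in for the `@shared` array, and sequential loops stand in for the barriers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of the tiled scheme; all names here are hypothetical.
std::vector<float> tiledMovingAverage(const std::vector<float> &input,
                                      std::size_t threads,
                                      std::size_t window) {
  // The tail copy below is done by the first `window` threads only,
  // so the scheme needs window <= threads.
  assert(threads > 0 && window > 0 && window <= threads);
  assert(input.size() >= window);
  const std::size_t outputSize = input.size() - window;
  std::vector<float> output(outputSize);

  // Same block count as the @outer loop bound in the kernel.
  const std::size_t blocks = outputSize / threads + 1;
  for (std::size_t block = 0; block < blocks; ++block) {
    // Stands in for the @shared blockInput array.
    std::vector<float> tile(threads + window, 0.0f);

    // Pass 1: each "thread" copies one base element; the first `window`
    // threads also copy the tail that overlaps the next block.
    for (std::size_t t = 0; t < threads; ++t) {
      const std::size_t idx = block * threads + t;
      if (idx < input.size()) tile[t] = input[idx];
      const std::size_t tail = (block + 1) * threads + t;
      if (tail < input.size() && t < window) tile[threads + t] = input[tail];
    }

    // Passes 2 and 3: average over the window and write back in range.
    for (std::size_t t = 0; t < threads; ++t) {
      float sum = 0.0f;
      for (std::size_t i = 0; i < window; ++i) sum += tile[t + i];
      const std::size_t idx = block * threads + t;
      if (idx < outputSize) output[idx] = sum / static_cast<float>(window);
    }
  }
  return output;
}
```

With the input 1 through 7, two threads per block and a window of 2, the tail copy is what lets the last thread of each block read the first element of the next one.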
4 changes: 4 additions & 0 deletions examples/cpp/CMakeLists.txt
@@ -18,6 +18,10 @@ add_subdirectory(19_stream_tags)
add_subdirectory(20_native_dpcpp_kernel)
add_subdirectory(30_device_function)


if (OCCA_CLANG_BASED_TRANSPILER)
add_subdirectory(31_oklt_v3_moving_avg)
endif()
# Don't force-compile OpenGL examples
# add_subdirectory(16_finite_difference)
# add_subdirectory(17_mandelbulb)
2 changes: 2 additions & 0 deletions include/occa/dtype/builtins.hpp
@@ -21,7 +21,9 @@ namespace occa {
extern const dtype_t char_;
extern const dtype_t short_;
extern const dtype_t int_;
extern const dtype_t uint_;
extern const dtype_t long_;
extern const dtype_t ulong_;
extern const dtype_t float_;
extern const dtype_t double_;

4 changes: 3 additions & 1 deletion src/dtype/builtins.cpp
@@ -11,7 +11,9 @@ namespace occa {
const dtype_t char_("char", sizeof(char), true);
const dtype_t short_("short", sizeof(short), true);
const dtype_t int_("int", sizeof(int), true);
const dtype_t uint_("unsigned int", sizeof(unsigned int), true);
const dtype_t long_("long", sizeof(long), true);
const dtype_t ulong_("unsigned long", sizeof(unsigned long), true);
const dtype_t float_("float", sizeof(float), true);
const dtype_t double_("double", sizeof(double), true);

@@ -111,7 +113,7 @@ namespace occa {
}

template <> dtype_t get<unsigned long>() {
- return long_;
+ return ulong_;
}

template <> dtype_t get<long long>() {
2 changes: 2 additions & 0 deletions src/dtype/dtype.cpp
@@ -413,6 +413,8 @@ namespace occa {
dtypeMap["long"] = &dtype::long_;
dtypeMap["float"] = &dtype::float_;
dtypeMap["double"] = &dtype::double_;
dtypeMap["unsigned long"] = &dtype::ulong_;
dtypeMap["unsigned int"] = &dtype::uint_;

// Sized primitives
dtypeMap["int8"] = dtype::get<int8_t>().ref;
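
The dtype changes in this commit make `get<unsigned long>()` return `ulong_` instead of `long_` and register `unsigned int` and `unsigned long` by name. A rough, self-contained sketch of why the distinction matters follows; `TypeInfo`, `typeInfo` and `signednessPreserved` are hypothetical stand-ins, not OCCA's `dtype_t` API. Each C++ type maps to its own descriptor via explicit specialization, so signed and unsigned variants never share a name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a dtype descriptor; not OCCA's dtype_t.
struct TypeInfo {
  std::string name;
  std::size_t bytes;
};

// Primary template is declared only, so an unmapped type fails to link
// instead of silently falling back to another type's descriptor.
template <class T> TypeInfo typeInfo();

template <> TypeInfo typeInfo<int>() {
  return {"int", sizeof(int)};
}
template <> TypeInfo typeInfo<unsigned int>() {
  return {"unsigned int", sizeof(unsigned int)};
}
template <> TypeInfo typeInfo<long>() {
  return {"long", sizeof(long)};
}
template <> TypeInfo typeInfo<unsigned long>() {
  // Returning the signed descriptor here would be the kind of mismatch
  // the commit fixes.
  return {"unsigned long", sizeof(unsigned long)};
}

// True when signed and unsigned variants map to distinct names.
bool signednessPreserved() {
  return typeInfo<int>().name != typeInfo<unsigned int>().name &&
         typeInfo<long>().name != typeInfo<unsigned long>().name;
}
```

A name-keyed lookup table, like the `dtypeMap` entries above, can only tell the two apart if each specialization reports its own name.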