updated README.md

rgiduthuri · rgiduthuri · commit f868f04c6838 · 2017-09-29T21:23:52.000-07:00
diff --git a/README.md b/README.md
@@ -1,9 +1,10 @@
-﻿# AMD OpenVX (AMDOVX)
-AMD OpenVX (beta preview) is a highly optimized open source implementation of the [Khronos OpenVX](https://www.khronos.org/registry/vx/) computer vision specification. It allows for rapid prototyping as well as fast execution on a wide range of computer hardware, including small embedded x86 CPUs and large workstation discrete GPUs.
+# AMD OpenVX (AMDOVX)
+AMD OpenVX (beta) is a highly optimized open source implementation of the [Khronos OpenVX](https://www.khronos.org/registry/vx/) computer vision specification. It allows for rapid prototyping as well as fast execution on a wide range of computer hardware, including small embedded x86 CPUs and large workstation discrete GPUs.
 
 The amdovx-core project consists of two components:
 * [OpenVX](openvx/README.md): AMD OpenVX library
 * [RunVX](runvx/README.md): command-line utility to execute OpenVX graph described in GDF text file
+* [RunCL](runcl/README.md): command-line utility to build, execute, and debug OpenCL programs
 
 The OpenVX framework provides a mechanism to add new vision functions to OpenVX by 3rd party vendors. Look into github [amdovx-modules](https://github.com/GPUOpen-ProfessionalCompute-Libraries/amdovx-modules) project for additional OpenVX modules and utilities.
 * **vx_nn**: OpenVX neural network module that was built on top of [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen)
@@ -22,26 +23,29 @@ This software is provided under a MIT-style license,  see the file COPYRIGHT.txt
 
 ## Pre-requisites
 * CPU: SSE4.1 or above CPU, 64-bit.
-* GPU: Radeon R7 Series or above (Kaveri+ APU), Radeon 3xx Series or above (optional)
-  * DRIVER: AMD Catalyst 15.7 or higher (version 15.20) with OpenCL 2.0 runtimes
-  * AMD APP SDK 3.0 [download](http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/).
+* GPU: Radeon Professional Graphics Cards or Vega Family of Products (16GB required for vx_loomsl and vx_nn libraries)
+  * Windows: install the latest drivers and OpenCL SDK [download](https://github.com/GPUOpen-LibrariesAndSDKs/OCL-SDK/releases)
+  * Linux: install [ROCm](https://rocm.github.io/ROCmInstall.html)
+* OpenCV 3 (optional) [download](https://github.com/opencv/opencv/releases) for RunVX
+  * Set OpenCV_DIR environment variable to OpenCV/build folder
 
 ## Build Instructions
 Build this project to generate AMD OpenVX library and RunVX executable. 
 * Refer to [openvx/include/VX](openvx/include/VX) for Khronos OpenVX standard header files.
 * Refer to [openvx/include/vx_ext_amd.h](openvx/include/vx_ext_amd.h) for vendor extensions in AMD OpenVX library.
 * Refer to [runvx/README.md](runvx/README.md) for RunVX details. 
+* Refer to [runcl/README.md](runcl/README.md) for RunCL details. 
 
 ### Build using Visual Studio Professional 2013 on 64-bit Windows 10/8.1/7
 * Install OpenCV 3 with contrib [download](https://github.com/opencv/opencv/releases) for RunVX tool to support camera capture and image display (optional)
 * OpenCV_DIR environment variable should point to OpenCV/build folder
 * Use amdovx-core/amdovx.sln to build for x64 platform
-* If AMD GPU (or OpenCL 2.0) is not available, set build flag ENABLE_OPENCL=0 in openvx/openvx.vcxproj and runvx/runvx.vcxproj.
+* If AMD GPU (or OpenCL) is not available, set build flag ENABLE_OPENCL=0 in openvx/openvx.vcxproj and runvx/runvx.vcxproj.
 
 ### Build using CMake
 * Install CMake 2.8 or newer [download](http://cmake.org/download/).
 * Install OpenCV 3 with contrib [download](https://github.com/opencv/opencv/releases) for RunVX tool to support camera capture and image display (optional)
 * OpenCV_DIR environment variable should point to OpenCV/build folder
 * Install libssl-dev on linux (optional)
 * Use CMake to configure and generate Makefile
-* If AMD GPU (or OpenCL 2.0) is not available, use build flag -DCMAKE_DISABLE_FIND_PACKAGE_OpenCL=TRUE.
+* If AMD GPU (or OpenCL) is not available, use build flag -DCMAKE_DISABLE_FIND_PACKAGE_OpenCL=TRUE.
diff --git a/runcl/README.md b/runcl/README.md
@@ -0,0 +1,67 @@
+# AMD RunCL
+RunCL is a command-line tool to build, execute, and debug OpenCL programs, with a simple, easy-to-use interface.
+
+## RunCL Usage
+
+    Usage: runcl [platform-options] [-I<include-dir>] [[-D<name>=<value>] ...]
+                 <kernel.[cl|elf]> [kernel-arguments] 
+                 <arguments> <num_work_items>[/<work_group_size>]
+    
+       [platform-options]
+           -v                    verbose
+           -gpu                  use GPU device (default)
+           -cpu                  use CPU device
+           -device <name>|#<num> use specified device
+           -bo <string>          OpenCL build option
+    
+       [kernel-options]
+           -k <kernel-name>      kernel name
+           -p                    use persistence flag
+           -r[link] <exec-count> execution count
+           -w <msec>             waiting time
+           -dumpcl               dump OpenCL code after pre-processing
+           -dumpilisa            dump ISA of kernel and show ISA statistics
+           -dumpelf              dump ELF binary
+    
+       The <arguments> shall be given in the order as required by the kernel.
+         For value arguments use   
+             iv#<int/float>[,<int/float>...] or 
+             iv:<file> (e.g., iv#10.2,10,0x10)
+         For local memory use      
+             lm#<local-memory-size> (e.g., lm#8192)
+         For input buffer use      
+             if[#<buffer-size>]:[<file>][#[[<checksum>][/<file>[@<offset>#<end>]]]]
+             (e.g., if:input.bin)
+         For output (or RW) buffer 
+             of[#<buffer-size>]:[#]<file>[@<ofile>][#[[<checksum>][/[+<float-tolerance>]<file>[@<offset>#<end>]]]] 
+             (e.g., of#16384:output.bin)
+         For input image  use      
+             ii#<width>x<height>,<stride>,<u8/s16/u16/bgra/rgba/argb>:<file> 
+             (e.g., ii#1920x1080,7680,bgra:screen1920x1080.rgb)
+         For output image  use     
+             oi#<width>x<height>,<stride>,<u8/s16/u16/bgra/rgba/argb>:<file> 
+             (e.g., oi#1920x1080,7680,bgra:screen1920x1080.rgb)
+
+## Example
+
+    % cat subtract.cl
+    __kernel __attribute__((reqd_work_group_size(64, 1, 1)))
+    void subtract(
+        __global float * a, 
+        __global float * b, 
+        __global float * c, 
+        uint count)
+    {
+        uint id = get_global_id(0);
+        if(id < count) {
+            c[id] = a[id] - b[id];
+        }
+    }
+    % runcl subtract.cl if#4000:a.f32 if#4000:b.f32 of#4000:#out.f32 iv#1000 1024,1,1/64,1,1
+    OK: Using GPU device#0 [...]
+    OK: COMPILATION on GPU took   0.1268 sec for subtract
+    OK: kernel subtract info reqd_work_group_size(64,1,1)
+    OK: kernel subtract info work_group_size(256)
+    OK: kernel subtract info local_mem_size(0)
+    OK: kernel subtract info local_private_size(0)
+    OK: RUN SUCCESSFUL on GPU work:{1024,1,1}/{64,1,1} [  0.00025 sec/exec] subtract (1st execution)
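
The `runcl` example above passes three 4000-byte buffers and `iv#1000`, which corresponds to 1000 `float` values of 4 bytes each. It launches 1024 work items in groups of 64 (16 groups), and the `id < count` guard masks the 24 extra items. The sketch below is one way to prepare `a.f32` and `b.f32` and to check a resulting `out.f32`. It assumes the buffer files are headerless little-endian float32 data, which the README does not state. The helper names (`write_f32`, `read_f32`, `make_inputs`, `check_output`) are hypothetical and not part of RunCL.

```python
import struct
from pathlib import Path

COUNT = 1000              # element count, passed to the kernel as iv#1000
BUFFER_BYTES = 4 * COUNT  # 4000 bytes, the size given in if#4000 and of#4000


def write_f32(path, values):
    """Write values as headerless little-endian float32 (assumed file layout)."""
    Path(path).write_bytes(struct.pack(f"<{len(values)}f", *values))


def read_f32(path):
    """Read a headerless little-endian float32 file into a list of floats."""
    data = Path(path).read_bytes()
    return list(struct.unpack(f"<{len(data) // 4}f", data))


def make_inputs(directory="."):
    """Create a.f32 and b.f32 using values that are exact in float32."""
    a = [float(i) for i in range(COUNT)]
    b = [0.5 * i for i in range(COUNT)]
    write_f32(Path(directory) / "a.f32", a)
    write_f32(Path(directory) / "b.f32", b)
    return a, b


def check_output(directory, a, b):
    """Compare out.f32 with a[i] - b[i], the result subtract.cl computes."""
    c = read_f32(Path(directory) / "out.f32")
    return len(c) == COUNT and all(ci == ai - bi for ci, ai, bi in zip(c, a, b))
```

Call `make_inputs()` before running the `runcl` command. If the run leaves an `out.f32` file in the same directory, `check_output(".", a, b)` reports whether it matches. The input values are chosen so that every difference is exactly representable in float32, which lets the check use exact equality rather than a tolerance.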