Commit ba55cda

Updating samples- Mandelbrot, DCT and MonteCarlo with newest coding guidelines
1 parent 84cfcbf commit ba55cda

16 files changed

+797
-0
lines changed

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
if(WIN32)
2+
set(CMAKE_CXX_COMPILER "dpcpp-cl")
3+
else()
4+
set(CMAKE_CXX_COMPILER "dpcpp")
5+
endif()
6+
# Set default build type to RelWithDebInfo if not specified
7+
if (NOT CMAKE_BUILD_TYPE)
8+
message (STATUS "Default CMAKE_BUILD_TYPE not set using Release with Debug Info")
9+
set (CMAKE_BUILD_TYPE "RelWithDebInfo" CACHE
10+
STRING "Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel"
11+
FORCE)
12+
endif()
13+
cmake_minimum_required (VERSION 3.0)
14+
project (DiscreteCosineTransform)
15+
add_subdirectory (src)
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
2+
Microsoft Visual Studio Solution File, Format Version 12.00
3+
# Visual Studio Version 16
4+
VisualStudioVersion = 16.0.30011.22
5+
MinimumVisualStudioVersion = 10.0.40219.1
6+
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "DCT", "msvs\DCT.vcxproj", "{B2789FC1-D75F-4C6E-BF93-05BE084FBC79}"
7+
EndProject
8+
Global
9+
GlobalSection(SolutionConfigurationPlatforms) = preSolution
10+
Intel Performance Test|x64 = Intel Performance Test|x64
11+
Intel Release|x64 = Intel Release|x64
12+
EndGlobalSection
13+
GlobalSection(ProjectConfigurationPlatforms) = postSolution
14+
{B2789FC1-D75F-4C6E-BF93-05BE084FBC79}.Intel Performance Test|x64.ActiveCfg = Intel Performance Test|x64
15+
{B2789FC1-D75F-4C6E-BF93-05BE084FBC79}.Intel Performance Test|x64.Build.0 = Intel Performance Test|x64
16+
{B2789FC1-D75F-4C6E-BF93-05BE084FBC79}.Intel Release|x64.ActiveCfg = Intel Release|x64
17+
{B2789FC1-D75F-4C6E-BF93-05BE084FBC79}.Intel Release|x64.Build.0 = Intel Release|x64
18+
EndGlobalSection
19+
GlobalSection(SolutionProperties) = preSolution
20+
HideSolutionNode = FALSE
21+
EndGlobalSection
22+
GlobalSection(ExtensibilityGlobals) = postSolution
23+
SolutionGuid = {55E32F12-32F5-4567-B736-8B38914AAB2E}
24+
EndGlobalSection
25+
EndGlobal
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
1+
# `DPC++ Discrete Cosine Transform` Sample
2+
3+
Discrete Cosine Transform (DCT) and quantization are two of the early steps in JPEG compression. This sample demonstrates how the DCT and quantization stages can be made to run faster with Data Parallel C++ (DPC++) by offloading the image processing to a GPU or other device.
4+
5+
For comprehensive instructions regarding DPC++ Programming, go to https://software.intel.com/en-us/oneapi-programming-guide and search based on relevant terms noted in the comments.
6+
7+
| Optimized for | Description
8+
|:--- |:---
9+
| OS | Linux* Ubuntu* 18.04; Windows 10
10+
| Hardware | Skylake with GEN9 or newer
11+
| Software | Intel® oneAPI DPC++/C++ Compiler
12+
| What you will learn | How to process image data in parallel using DPC++ to produce a Discrete Cosine Transform
13+
| Time to complete | 15 minutes
14+
15+
16+
## Purpose
17+
18+
DCT is a transform used in lossy compression. It represents a set of data points as a weighted sum of cosine functions that are mutually orthogonal. The transform itself is reversible; the loss of information comes from the quantization step that follows it. The program shows the possible reduction in image quality when one performs DCT followed by quantization, as found in JPEG compression.
19+
20+
This program generates an output image by first processing an input image using DCT and quantization, undoing the process through Inverse DCT and de-quantizing, and then writing the output to a BMP image file. The processing stage is performed on 8x8 subsections of the pixel image, referred to as 'blocks' in the code sample.
21+
22+
Since individual blocks of data can be processed independently, the overall image can be decomposed and processed in parallel, where each block represents a work item. Using DPC++, parallelization can be implemented relatively quickly and with few changes to a serial version. The blocks are processed in parallel using the SYCL parallel_for(). The code first attempts to execute on an available GPU and falls back to the system's CPU if a compatible GPU is not detected. The device used for execution is displayed in the output, along with the elapsed time to process the image.
23+
24+
The DCT process converts image data from the pixel representation (where the color value of each pixel is stored) to a sum of cosine representation, where the color pattern of subsets of the image is represented as the sum of multiple cosine functions. In an 8x8 block, eight discrete cosine functions per dimension are enough, which gives 64 two-dimensional patterns in total, and the only information needed to reconstruct the block is the coefficient associated with each pattern. The image is processed in 8x8 blocks, so the DCT process converts an 8x8 matrix of pixels into an 8x8 matrix of these coefficients.
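
For reference, the 8x8 transform described above is commonly written as the two-dimensional DCT-II below. This is the textbook JPEG form, not a formula quoted from the sample's source. The symbols $f(x,y)$ (pixel value), $F(u,v)$ (coefficient), and $\alpha$ (normalization factor) are notation introduced here for illustration.

$$
F(u,v) = \frac{1}{4}\,\alpha(u)\,\alpha(v) \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16},
\qquad
\alpha(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
$$

Each of the 64 index pairs $(u,v)$ selects one product of two cosines, so an 8x8 block of pixels maps to an 8x8 block of coefficients. For a block in which every pixel equals 100, only $F(0,0) = \frac{1}{4}\cdot\frac{1}{2}\cdot 6400 = 800$ is non-zero.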
25+
26+
The quantizing process is what allows this data to be compressed to a smaller size than the original image. Each element of the matrix yielded by the DCT process is divided by the corresponding element of a quantizing matrix, and the result is rounded. The quantizing matrix is designed to reduce the number of coefficients required to represent the image by prioritizing the cosine functions that are most significant to the image's definition. Read diagonally, the resulting matrix looks like a series of numbers followed by many zeros. JPEG stores the coefficients in this diagonal (zigzag) order, which allows the long run of zeros to be compressed.
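
The quantization step can be sketched in plain C++ as follows. This is a minimal illustration, not the sample's code: `MakeExampleQuant()` builds a hypothetical quantizing matrix whose step size simply grows with frequency, and the names `Block`, `Quantize()`, `Dequantize()`, and `CountZeros()` are assumptions made for this example. The sample's own `quant[]` tables are not reproduced here.

```cpp
#include <array>
#include <cmath>

constexpr int kBlock = 8;
using Block = std::array<std::array<float, kBlock>, kBlock>;

// Hypothetical quantizing matrix, for illustration only: the step size grows
// with the frequency index (i + j), so high-frequency coefficients are
// divided by larger values than low-frequency ones.
Block MakeExampleQuant() {
  Block q{};
  for (int i = 0; i < kBlock; ++i)
    for (int j = 0; j < kBlock; ++j)
      q[i][j] = 1.0f + 4.0f * static_cast<float>(i + j);
  return q;
}

// Quantize: divide each coefficient by its step and round to the nearest
// integer value. Small high-frequency coefficients become zero here.
Block Quantize(const Block& coeffs, const Block& quant) {
  Block out{};
  for (int i = 0; i < kBlock; ++i)
    for (int j = 0; j < kBlock; ++j)
      out[i][j] = std::round(coeffs[i][j] / quant[i][j]);
  return out;
}

// De-quantize: multiply back by the same step. Whatever was lost to rounding
// is not recovered, which is why the overall process is lossy.
Block Dequantize(const Block& levels, const Block& quant) {
  Block out{};
  for (int i = 0; i < kBlock; ++i)
    for (int j = 0; j < kBlock; ++j)
      out[i][j] = levels[i][j] * quant[i][j];
  return out;
}

// Counts the entries that quantization reduced to zero.
int CountZeros(const Block& b) {
  int zeros = 0;
  for (const auto& row : b)
    for (float v : row)
      if (v == 0.0f) ++zeros;
  return zeros;
}
```

With this example matrix, a coefficient of 10 at position (7, 7) is divided by 57 and rounds to zero, while a coefficient of 30 at position (0, 1) is divided by 5 and survives as 6.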
27+
28+
The Code Sample can be run in two different modes, based on preprocessor definitions supplied at compile-time:
29+
30+
* With no preprocessor definition, the code will run the DCT process one time on the input image and write it to the output file.
31+
* With PERF_NUM enabled, the code will process the input image five times, write it to the output, and print the average execution time to the console.
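
The two modes above can be sketched with a compile-time switch and a small timing helper. This is an assumption about the general shape, not the sample's actual code: `AverageSeconds()` and `kRuns` are hypothetical names, and only the `PERF_NUM` macro and the run count of five come from the description above.

```cpp
#include <chrono>
#include <functional>

// Five timed runs when PERF_NUM is defined at compile time, one run otherwise.
#ifdef PERF_NUM
constexpr int kRuns = 5;
#else
constexpr int kRuns = 1;
#endif

// Hypothetical helper: calls `work` `runs` times (runs must be > 0) and
// returns the mean wall-clock time per call in seconds.
double AverageSeconds(int runs, const std::function<void()>& work) {
  using Clock = std::chrono::steady_clock;
  double total = 0.0;
  for (int i = 0; i < runs; ++i) {
    const auto start = Clock::now();
    work();
    const auto stop = Clock::now();
    total += std::chrono::duration<double>(stop - start).count();
  }
  return total / runs;
}
```

A caller would pass the image-processing step as the callable, for example `AverageSeconds(kRuns, [&] { /* process the image */ });`, and print the returned value.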
32+
33+
Because the image undergoes de-quantizing and IDCT before being written to a file, the output image data will not be more compact than the input. However, it will reflect the image artifacts caused by lossy compression methods such as JPEG.
34+
35+
36+
## Key Implementation Details
37+
38+
The code comments explain the basic DPC++ building blocks used in the implementation: the device selector, buffers, accessors, kernels, and command groups.
39+
40+
The ProcessImage() function uses a parallel_for() to calculate the index of each 8x8 block. It passes that index to the ProcessBlock() function, which performs the DCT and Quantization steps, along with de-quantization and IDCT.
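
The index arithmetic involved can be sketched in plain C++. The helper names below (`BlockCount()`, `OriginOfBlock()`, `PixelOffset()`) are hypothetical, and the sketch assumes a row-major pixel layout and image dimensions that are multiples of 8. The sample's own ProcessImage() may organize this differently.

```cpp
#include <cstddef>

constexpr int kBlockSize = 8;

// Top-left pixel coordinates of one 8x8 block.
struct BlockOrigin {
  int row;
  int col;
};

// Number of whole 8x8 blocks in the image. Assumes width and height are
// multiples of kBlockSize.
int BlockCount(int width, int height) {
  return (width / kBlockSize) * (height / kBlockSize);
}

// Maps a flat block index (the kind of index a one-dimensional parallel
// loop would supply) to the top-left pixel of that block.
BlockOrigin OriginOfBlock(int block_index, int width) {
  const int blocks_per_row = width / kBlockSize;
  return {(block_index / blocks_per_row) * kBlockSize,
          (block_index % blocks_per_row) * kBlockSize};
}

// Offset of the block's first pixel in a row-major pixel array.
std::size_t PixelOffset(BlockOrigin origin, int width) {
  return static_cast<std::size_t>(origin.row) * width + origin.col;
}
```

For the 5184 x 3456 image shown in the example output, this gives 648 x 432 = 279936 independent blocks, and block 649 starts at row 8, column 8.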
41+
42+
The DCT representation is calculated by multiplying a DCT matrix (created by calling the CreateDCT() function) by a given color channel's data matrix, and then multiplying the result by the inverse of the DCT matrix. For an orthonormal DCT matrix, the inverse is simply the transpose. The quantization calculation divides each element of the resulting matrix by its corresponding element in the chosen quantization matrix. The inverse operations are then performed to produce the de-quantized matrix and, from it, the raw image data.
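
The matrix form of the transform can be sketched as below. This is a standalone illustration under stated assumptions, not the sample's CreateDCT() or ProcessBlock(): `MakeDctMatrix()` builds the standard orthonormal DCT-II matrix, and `Mat`, `Multiply()`, `Transpose()`, `ForwardDct()`, and `InverseDct()` are names chosen for this example. It uses `double` for clarity; the sample may use a different type.

```cpp
#include <array>
#include <cmath>

constexpr int kN = 8;
using Mat = std::array<std::array<double, kN>, kN>;

// Standard orthonormal DCT-II matrix C, so that C * C^T is the identity.
Mat MakeDctMatrix() {
  const double pi = std::acos(-1.0);
  Mat c{};
  for (int u = 0; u < kN; ++u) {
    const double scale = (u == 0) ? std::sqrt(1.0 / kN) : std::sqrt(2.0 / kN);
    for (int x = 0; x < kN; ++x)
      c[u][x] = scale * std::cos((2 * x + 1) * u * pi / (2.0 * kN));
  }
  return c;
}

Mat Transpose(const Mat& m) {
  Mat t{};
  for (int i = 0; i < kN; ++i)
    for (int j = 0; j < kN; ++j)
      t[j][i] = m[i][j];
  return t;
}

Mat Multiply(const Mat& a, const Mat& b) {
  Mat out{};
  for (int i = 0; i < kN; ++i)
    for (int j = 0; j < kN; ++j)
      for (int k = 0; k < kN; ++k)
        out[i][j] += a[i][k] * b[k][j];
  return out;
}

// Forward transform: Y = C * X * C^T.
Mat ForwardDct(const Mat& x) {
  const Mat c = MakeDctMatrix();
  return Multiply(Multiply(c, x), Transpose(c));
}

// Inverse transform: X = C^T * Y * C.
Mat InverseDct(const Mat& y) {
  const Mat c = MakeDctMatrix();
  return Multiply(Multiply(Transpose(c), y), c);
}
```

Without quantization in between, `InverseDct(ForwardDct(x))` reproduces `x` up to floating-point rounding, which shows that the loss in the full pipeline comes from the quantization step rather than from the transform.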
43+
44+
45+
## License
46+
47+
This code sample is licensed under the MIT license.
48+
49+
50+
## Building the `DPC++ Discrete Cosine Transform` Program for CPU and GPU
51+
52+
### Running Samples In DevCloud
53+
If running a sample in the Intel DevCloud, remember that you must specify the compute node (CPU, GPU, FPGA) as well as whether to run in batch or interactive mode. For more information, see the Intel® oneAPI Base Toolkit Get Started Guide (https://devcloud.intel.com/oneapi/get-started/base-toolkit/).
54+
55+
### On a Linux* System
56+
Perform the following steps:
57+
1. Build the program with `cmake` using the following shell commands.
58+
From the root directory of the DCT project:
59+
```
60+
$ mkdir build
61+
$ cd build
62+
$ cmake .. (or "cmake -D PERF_NUM=1 .." to enable performance test)
63+
$ make
64+
```
65+
66+
2. Run the program:
67+
```
68+
make run
69+
```
70+
71+
3. Clean the program using:
72+
```
73+
make clean
74+
```
75+
76+
### On a Windows* System Using Visual Studio* Version 2017 or Newer
77+
* Build the program using VS2017 or VS2019
78+
Right-click on the solution file and open using either VS2017 or VS2019 IDE.
79+
Set the configuration to 'Intel Release' for normal execution or 'Intel Performance Test' to take performance metrics.
80+
Right-click on the project in Solution Explorer and select Rebuild.
81+
82+
To run:
83+
From the top menu, select Debug -> Start without Debugging.
84+
85+
* Build the program using MSBuild
86+
Open "Intel oneAPI command prompt for Microsoft Visual Studio 2019" and use your shell of choice to navigate to the DCT sample directory
87+
Run command - MSBuild DCT.sln /t:Rebuild /p:Configuration="Intel Release" (or Configuration="Intel Performance Test" for performance tabulation)
88+
89+
To run:
90+
Run command - '.\x64\Intel Release\DCT.exe' ./res/willyriver.bmp ./res/willyriver_processed.bmp
91+
92+
93+
## Running the Sample
94+
95+
### Application Parameters
96+
Different levels of quantization can be set by changing which of the quant[] array definitions is used inside of ProcessBlock(). Uncomment the chosen quantization level and leave the others commented out.
97+
98+
The queue definition in ProcessImage() uses the SYCL default selector, which will prioritize offloading to GPU but will run on the host device if none is found. You can force the code to run on the CPU by changing default_selector{} to cpu_selector{} on line 220.
99+
100+
### Example of Output
101+
```
102+
Filename: ..\res\willyriver.bmp W: 5184 H: 3456
103+
104+
Start image processing with offloading to GPU...
105+
Running on Intel(R) UHD Graphics 620
106+
--The processing time is 6.27823 seconds
107+
108+
DCT successfully completed on the device.
109+
The processed image has been written to ..\res\willyriver_processed.bmp
110+
```
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
Copyright Intel Corporation
2+
3+
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4+
5+
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6+
7+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
8+
