wanjawischmeier/pre-rendering

What is this?

A testing ground for various niche rendering techniques and approaches. Mostly focused on Unity and the idea of precomputing certain aspects of the rendering pipeline. It is not well organized and has no clear roadmap; it's just the results of messing around with and learning about computer graphics. Interactive demos of some of these things can be found here.

Hybrid rasterizer

Reprojection of high-res depth maps is a very computationally intensive task. But a hybrid rasterizer (a tile-binned software rasterizer for regions of small tris and the hardware rasterization pipeline for the rest) might be able to provide a fast and efficient way to achieve this. The basic flow of the compute shader is as follows:

Basic flow

  1. Pass 1 (dispatched with one thread per pixel of the input texture)
    • Sample the depth texture for each pixel
    • Compute and transform the respective 3D vertex position
    • Store it in a lookup texture (shown as "Vertex Buffer" in the example images)
  2. Pass 2 (dispatched with one thread per entry in the vertex buffer)
    • Each point in the vertex buffer is responsible for handling the quad that it shares with its bottom, right and bottom-right neighbors. This ensures complete coverage of the surface.
    • A thread samples the 4 vertices that make up the quad
      • Backfacing quads get culled here
      • Degenerate quads get filled with a single InterlockedMin operation on the target texture
    • The quad type gets computed (allows for some optimizations if none or only one of the quad's triangles needs to be rasterized)
    • The target texture is covered by 2 grids that are offset by half the tile size. A method checks if the quad fits into a cell of either of those grids.
      • If it fits into Grid A or Grid B, it is small enough to get tile-binned and therefore efficiently software rasterized in the next compute shader pass. The quad gets stored in the respective tile buffer (before that, a packed AABB gets calculated and stored alongside it).
      • Otherwise, the quad gets pushed to an AppendStructuredBuffer (which has support for atomic operations) for rasterization using Unity's standard rendering pipeline.
  3. Pass 3 (dispatched with one thread per pixel of the output texture)
    • Each thread just has to iterate over all quads in the tile it is contained in (just the AABBs could be loaded into groupshared memory in the future, one per thread). A rough sketch of this loop follows the list.
    • For each quad, it first checks if it contains the point with an extremely fast test against the quad's AABB. The actual vertices only need to be sampled from the vertex buffer if a quad passes that check.
    • If the actual quad vertices also contain the texel, an InterlockedMin write to the output texture is performed (this could be done within a tile's groupshared memory in the future to reduce write operations on the global target texture to a single non-atomic write).
  4. A final post-processing shader combines the hardware and software rasterizer output.
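
As a rough illustration of the third pass, here is a minimal HLSL sketch of that per-pixel loop. Everything in it is an assumption for illustration: the buffer names, the 8-bit AABB packing, the tile addressing (only one of the two offset grids is shown) and the per-tile capacity, and the full quad coverage test is stubbed out. The actual layouts are described under Memory layout below.

#define TILE_SIZE 16              // assumed tile width in pixels
#define MAX_QUADS_PER_TILE 256    // assumed capacity of one tile's bin

RWTexture2D<uint> _Target;            // depth output, resolved via InterlockedMin
StructuredBuffer<uint3> _TileQuads;   // software vertex buffer, binned per tile
StructuredBuffer<uint> _TileCounters; // number of quads binned into each tile
uint _TileCountX;                     // tiles per row of the target texture

// Stub for the full coverage test: the real version samples the quad's four
// vertices from the vertex buffer, tests the pixel against its one or two
// valid triangles and interpolates depth (omitted here).
bool QuadCoversPixel(uint3 quad, uint2 pixel, out float depth)
{
    depth = 1.0;
    return false;
}

[numthreads(TILE_SIZE, TILE_SIZE, 1)]
void RasterizeTiles(uint3 id : SV_DispatchThreadID)
{
    uint2 tileCoord = id.xy / TILE_SIZE;
    uint tile = tileCoord.y * _TileCountX + tileCoord.x;
    uint quadCount = _TileCounters[tile];

    for (uint i = 0; i < quadCount; i++)
    {
        uint3 quad = _TileQuads[tile * MAX_QUADS_PER_TILE + i];

        // Cheap reject first: unpack the AABB (assumed here to be four 8-bit
        // pixel offsets relative to the tile origin) and test the pixel.
        uint4 aabb = (quad.zzzz >> uint4(0, 8, 16, 24)) & 0xFF;
        uint2 local = id.xy - tileCoord * TILE_SIZE;
        if (any(local < aabb.xy) || any(local > aabb.zw))
            continue;

        // Only surviving quads fetch their vertices and run the real test;
        // covered texels are resolved with an atomic min on the target.
        float depth;
        if (QuadCoversPixel(quad, id.xy, depth))
            InterlockedMin(_Target[id.xy], asuint(depth));
    }
}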

Memory layout

The hardware vertex buffer is comprised of simple uints, where each one holds the quad type (which partial triangles are valid) in the last 2 bits and the index into the vertex buffer in the remaining bits. The software vertex buffer packs tile index, vertex buffer index, quad type and AABB into a uint3. The uint tile counters are responsible for keeping track of the number of quads being binned into each tile (using a simple InterlockedAdd).
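
As a rough illustration of that layout and the binning step (assuming the "last 2 bits" are the two least significant ones; the names and the per-tile capacity are made up):

#define MAX_QUADS_PER_TILE 256   // assumed capacity of one tile's bin

// Hardware vertex buffer entry: vertex buffer index in the upper 30 bits,
// quad type (which partial triangles are valid) in the two lowest bits.
uint PackHardwareQuad(uint vertexIndex, uint quadType)
{
    return (vertexIndex << 2) | (quadType & 0x3);
}

void UnpackHardwareQuad(uint packedEntry, out uint vertexIndex, out uint quadType)
{
    vertexIndex = packedEntry >> 2;
    quadType = packedEntry & 0x3;
}

// Binning during pass 2: the tile counter hands out a slot via InterlockedAdd,
// then the packed uint3 (tile index, vertex buffer index + quad type, packed
// AABB) is written into that slot of the tile buffer.
RWStructuredBuffer<uint> _TileCounters;
RWStructuredBuffer<uint3> _TileQuads;

void BinQuad(uint tile, uint3 packedQuad)
{
    uint slot;
    InterlockedAdd(_TileCounters[tile], 1, slot);
    _TileQuads[tile * MAX_QUADS_PER_TILE + slot] = packedQuad;
}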

Images

(screenshot, 2025-10-09)

Video decoder

A simple video decoder using OpenCV in C++. The idea was to create a decoder that can be integrated asynchronously into a potential Unity render pipeline: it runs in another thread and passes decoded data to a shader running within the Unity environment with minimal overhead. This actually ended up working pretty well by utilizing the following structure:

  • A C++ DLL that holds the OpenCV instance, implements some callbacks and provides the necessary wrapper functionality
  • A C# script in Unity that uses an atomic safety handle and a native array to directly pass the frame data that was decoded by the DLL to the shader without the need for a single copy operation on the CPU side
  • An HLSL shader that is able to read the frame buffer provided by OpenCV and render the decoded image based on it

The C++ DLL exposes the following methods and callbacks:

struct VideoInfo
{
	int width, height, fps;
	size_t frame_count;
};

FrameCallback frame_ready;
ErrorMessage error_callback;
VideoInfo video_info;

extern "C" DECODER uchar** InitializeDecoder(
	char* videoPath, int threads,
	FrameCallback frameCallback, ErrorMessage errorCallback,
	VideoInfo &rInfo);
extern "C" DECODER size_t CurrentFrame(int threadIdx);
extern "C" DECODER bool Seek(size_t frameIdx, int threadIdx);
extern "C" DECODER bool Read(size_t frameIdx, int threadIdx);
extern "C" DECODER bool ReadImage(char* path, int threadIdx);
extern "C" DECODER void ReleaseDecoder();
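
On the Unity side, a shader then only needs to interpret the raw frame memory it gets handed. Below is a minimal sketch assuming the frames arrive as a tightly packed 8-bit BGR buffer (OpenCV's default layout) bound as a ByteAddressBuffer; all names here are made up and this is not necessarily how the project's shader does it:

// Reads one BGR pixel out of a tightly packed 8-bit frame buffer. Since a
// pixel is 3 bytes, loads are not 4-byte aligned, so two words are fetched
// and the relevant bytes are shifted out.
ByteAddressBuffer _FrameData;
uint _FrameWidth;

float3 LoadFramePixel(uint2 coord)
{
    uint byteIndex = (coord.y * _FrameWidth + coord.x) * 3;
    uint wordIndex = byteIndex & ~3u;      // align down to a 4-byte boundary
    uint shift = (byteIndex & 3u) * 8;

    uint lo = _FrameData.Load(wordIndex) >> shift;
    uint hi = shift == 0 ? 0 : _FrameData.Load(wordIndex + 4) << (32 - shift);
    uint pixelBytes = lo | hi;             // the pixel's 3 bytes in the low 24 bits

    float b = (pixelBytes & 0xFF) / 255.0;
    float g = ((pixelBytes >> 8) & 0xFF) / 255.0;
    float r = ((pixelBytes >> 16) & 0xFF) / 255.0;
    return float3(r, g, b);
}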

Getting the shader right was actually quite tricky; here are some of the iterations it took (more can be found here):

(iteration images: chunk_outimg3, chunk_outimg5, chunk_outimg7, outimg, outimg3, outimg5)

Optimal chunk width and decoding times

The ideal size of a chunk is heavily dependent on seek and frame times (frame time = time to decode one frame). Below is a look at how the optimal chunk size changes with different resolutions and decoding stats:

Installation and Setup

Setting up the build environment
  1. Download and install the OpenCV binaries (tested with v4.5.5)

  2. Download and install Visual Studio (tested with Visual Studio 2022 Community)

    • Make sure to include the Desktop Development with C++ workload when running the installer

  3. Open the pre-rendering/src/video-decoder/video-decoder.sln solution in Visual Studio

  4. Go to View > Other Windows > Property Manager

  5. Expand any configuration and open the LibraryPaths Property Sheet

  6. Go to Common Properties > User Macros

  7. Enter the path of your OpenCV installation as the value for the OpenCV macro (e.g. C:/libraries/opencv)

  8. Hit OK, Apply, and then OK again

  9. Open a terminal inside the repo and run the following command

    git update-index --assume-unchanged .\src\video-decoder\LibraryPaths.props

  10. Add the path of your OpenCV binaries as an environment variable (e.g. C:/libraries/opencv/build/x64/vc15/bin)

The installation process should be complete now. Try building the video-decoder solution by pressing Ctrl + B.


Downhill simplex shader

So in retrospect I don't really see why I thought this could work (the algorithm just ended up not converging nearly fast enough), but it still made for a really interesting experiment. The visuals especially were stunning. More images can be found here.

The basic downhill simplex algorithm used
// ALPHA, BETA, GAMMA and ITERATIONS (and FAC, OFF, X1, X2 further below), as
// well as objective(), are defined elsewhere in the shader.
float2 downhillSimplex(float2 x0, float2 x1, float2 x2) {
  // initialization
  float3 b = float3(x0, objective(x0));
  float3 g = float3(x1, objective(x1));
  float3 w = float3(x2, objective(x2));

  [unroll(ITERATIONS)] for (int i = 0; i < ITERATIONS; i++) {
    // sort
    float3 t;

    if (b.z > g.z) {
      t = g;
      g = b;
      b = t;
    }

    if (g.z > w.z) {
      t = g;
      g = w;
      w = t;

      if (b.z > g.z) {
        t = g;
        g = b;
        b = t;
      }
    }

    // midpoint
    float3 m;
    m.xy = (g.xy + b.xy) / 2;

    // reflection
    float3 r;
    r.xy = m.xy + ALPHA * (m.xy - w.xy);
    r.z = objective(r.xy);

    if (r.z < g.z)
      w = r;

    else {
      if (r.z < w.z) w = r;

      float3 h;
      h.xy = (w.xy + m.xy) / 2.0;  // try int 2
      h.z = objective(h.xy);

      if (h.z < w.z) w = h;
    }

    // expansion
    if (r.z < b.z) {
      float3 e;
      e.xy = m.xy + GAMMA * (r.xy - m.xy);
      e.z = objective(e.xy);

      if (e.z < r.z)
        w = e;

      else
        w = r;
    }

    // contraction
    if (r.z > g.z) {
      float3 c;
      c.xy = m.xy + BETA * (w.xy - m.xy);
      c.z = objective(c.xy);

      if (c.z < w.z) w = c;
    }
  }

  return b.xy;
}

fixed4 frag(v2f i) : SV_Target {
  // fixed4 col = tex2D(_MainTex, i.uv);
  float err = objective(i.uv);
  float2 opt = downhillSimplex(i.uv, X1, X2);

  fixed4 col = fixed4(opt.xy * FAC + OFF, tan(1 - opt.x), 1);
  return col;
}

More details can be found here.

(images: downhill_simplex_lowd2, downhill_simplex_abstract3 to downhill_simplex_abstract7)

Camera robot

A way to generate panorama images along a scanline path (as the Blender plugin does virtually) in the real world would be nice. I tried building a Lego Mindstorms robot with a 2-axis camera arm that can still drive around using tank steering, but I only had 3 motors. I eventually got it working using a ratcheting mechanism, but it was pretty janky and wobbly.


Blender plugin

A Blender plugin that allows you to

  • Dynamically create scanline paths for a camera to take
  • Set up a node network for the camera to render with equirectangular projection
  • Create a compositor group that allows users to generate "map" files from a video render
  • Write a config file to be read as part of that map by a Unity loader script

Builds for various iterations of this plugin can be found here.


Depth encoding in video files

I wanted to be able to encode the depth information required for reprojection in video files for quick hardware-accelerated decoding. But those formats were often limited to a precision of 8 bits, which is insufficient for depth information. So I experimented with splitting 16-bit depth information into two 8-bit channels (and later recombining them).
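
A minimal sketch of what such a split and recombination can look like in HLSL (a straight high-byte/low-byte split; the names are illustrative and the project's actual scheme may differ):

// Splits a normalized 16-bit depth value into two 8-bit channels and
// recombines them. Note that a naive split like this makes the low byte very
// high-frequency, which lossy video compression tends to hit hard.
float2 EncodeDepth16(float depth01)
{
    uint d = (uint)round(saturate(depth01) * 65535.0);
    return float2((d >> 8) / 255.0, (d & 0xFF) / 255.0);   // high byte, low byte
}

float DecodeDepth16(float2 encoded)
{
    uint high = (uint)round(encoded.x * 255.0);
    uint low = (uint)round(encoded.y * 255.0);
    return ((high << 8) | low) / 65535.0;
}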


Storing alternative color channels

The idea was to maybe not store diffuse color, as is usually the case, but rather raw properties, and then apply them dynamically in the rendering pipeline. This never went very far though.

(images: diffuse, emission, intensity, inverted)

Codec testing

Lossless storage isn't viable in most cases, so depth information stored in images will also be subject to compression artifacts. I tried to take a look at how different file formats compress depth and normal information (particularly at the edges). The files can be found here.


More data

More extensive data from various tests (about 10GB as of now) can be found here. Feel free to check it out (the Images folder has lots of interesting screenshots from the different experiments).

Credit

https://github.com/bodhid/UnityEquiCam

@inproceedings{zhang2018single,
  title     = {Single Image Reflection Separation with Perceptual Losses},
  author    = {Zhang, Xuaner and Ng, Ren and Chen, Qifeng},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2018}
}
