wanjawischmeier/pre-rendering

What is this?

A testing ground for various niche rendering techniques and approaches. Mostly focused on Unity and the idea of precomputing certain aspects of the rendering pipeline. It is not well organized and has no clear roadmap; it's just the results of messing around with and learning about computer graphics. Interactive demos of some of these things can be found here.

Hybrid rasterizer

Reprojection of high-res depth maps is a very computationally intensive task. But a hybrid rasterizer (a tile-binned software rasterizer for regions of small tris and the hardware rasterization pipeline for the rest) might be able to provide a fast and efficient way to achieve this. The basic flow of the compute shader is as follows:

Basic flow

  1. Pass 1 (dispatched with one thread per pixel of the input texture)
    • Sample the depth texture for each pixel
    • Compute and transform the respective 3D vertex position
    • Store it in a lookup texture (shown as "Vertex Buffer" in the example images)
  2. Pass 2 (dispatched with one thread per entry in the vertex buffer)
    • Each point in the vertex buffer is responsible for handling the quad that it shares with its bottom, right and bottom-right neighbors. This ensures complete coverage of the surface.
    • A thread samples the 4 vertices that make up the quad
      • Backfacing quads get culled here
      • Degenerate quads get filled with a single InterlockedMin operation on the target texture
    • The quad type gets computed (allows for some optimizations if none or only one of the quad's triangles needs to be rasterized)
    • The target texture is covered by 2 grids that are offset by half the tile size. A method checks if the quad fits into a cell of either of those grids.
      • If it fits into Grid A or Grid B, it is small enough to get tile-binned and therefore efficiently software rasterized in the next compute shader pass. The quad gets stored in the respective tile buffer (before that, a packed AABB gets calculated and stored alongside it).
      • Otherwise, the quad gets pushed to an AppendStructuredBuffer (which has support for atomic operations) for rasterization using Unity's standard rendering pipeline.
  3. Pass 3 (dispatched with one thread per pixel of the output texture)
    • Each thread just has to iterate over all quads in the tile it is contained in (just the AABBs could be loaded into groupshared memory in the future, one per thread). A rough sketch of this loop follows the list.
    • For each quad, it first checks if it contains the point with an extremely fast test against the quad's AABB. The actual vertices only need to be sampled from the vertex buffer if a quad passes that check.
    • If the actual quad vertices also contain the texel, an InterlockedMin write to the output texture is performed (this could be done within a tile's groupshared memory in the future to reduce write operations on the global target texture to a single non-atomic write).
  4. A final post-processing shader combines the hardware and software rasterizer output.
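
As a rough illustration of the third pass, here is a minimal HLSL sketch of that per-pixel loop. Everything in it is an assumption for illustration: the buffer names, the 8-bit AABB packing, the tile addressing (only one of the two offset grids is shown) and the per-tile capacity, and the full quad coverage test is stubbed out. The actual layouts are described under Memory layout below.

#define TILE_SIZE 16              // assumed tile width in pixels
#define MAX_QUADS_PER_TILE 256    // assumed capacity of one tile's bin

RWTexture2D<uint> _Target;            // depth output, resolved via InterlockedMin
StructuredBuffer<uint3> _TileQuads;   // software vertex buffer, binned per tile
StructuredBuffer<uint> _TileCounters; // number of quads binned into each tile
uint _TileCountX;                     // tiles per row of the target texture

// Stub for the full coverage test: the real version samples the quad's four
// vertices from the vertex buffer, tests the pixel against its one or two
// valid triangles and interpolates depth (omitted here).
bool QuadCoversPixel(uint3 quad, uint2 pixel, out float depth)
{
    depth = 1.0;
    return false;
}

[numthreads(TILE_SIZE, TILE_SIZE, 1)]
void RasterizeTiles(uint3 id : SV_DispatchThreadID)
{
    uint2 tileCoord = id.xy / TILE_SIZE;
    uint tile = tileCoord.y * _TileCountX + tileCoord.x;
    uint quadCount = _TileCounters[tile];

    for (uint i = 0; i < quadCount; i++)
    {
        uint3 quad = _TileQuads[tile * MAX_QUADS_PER_TILE + i];

        // Cheap reject first: unpack the AABB (assumed here to be four 8-bit
        // pixel offsets relative to the tile origin) and test the pixel.
        uint4 aabb = (quad.zzzz >> uint4(0, 8, 16, 24)) & 0xFF;
        uint2 local = id.xy - tileCoord * TILE_SIZE;
        if (any(local < aabb.xy) || any(local > aabb.zw))
            continue;

        // Only surviving quads fetch their vertices and run the real test;
        // covered texels are resolved with an atomic min on the target.
        float depth;
        if (QuadCoversPixel(quad, id.xy, depth))
            InterlockedMin(_Target[id.xy], asuint(depth));
    }
}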

Memory layout

The hardware vertex buffer is comprised of simple uints, where each one holds the quad type (which partial triangles are valid) in the last 2 bits and the index into the vertex buffer in the remaining bits. The software vertex buffer packs tile index, vertex buffer index, quad type and AABB into a uint3. The uint tile counters are responsible for keeping track of the number of quads being binned into each tile (using a simple InterlockedAdd).
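
As a rough illustration of that layout and the binning step (assuming the "last 2 bits" are the two least significant ones; the names and the per-tile capacity are made up):

#define MAX_QUADS_PER_TILE 256   // assumed capacity of one tile's bin

// Hardware vertex buffer entry: vertex buffer index in the upper 30 bits,
// quad type (which partial triangles are valid) in the two lowest bits.
uint PackHardwareQuad(uint vertexIndex, uint quadType)
{
    return (vertexIndex << 2) | (quadType & 0x3);
}

void UnpackHardwareQuad(uint packedEntry, out uint vertexIndex, out uint quadType)
{
    vertexIndex = packedEntry >> 2;
    quadType = packedEntry & 0x3;
}

// Binning during pass 2: the tile counter hands out a slot via InterlockedAdd,
// then the packed uint3 (tile index, vertex buffer index + quad type, packed
// AABB) is written into that slot of the tile buffer.
RWStructuredBuffer<uint> _TileCounters;
RWStructuredBuffer<uint3> _TileQuads;

void BinQuad(uint tile, uint3 packedQuad)
{
    uint slot;
    InterlockedAdd(_TileCounters[tile], 1, slot);
    _TileQuads[tile * MAX_QUADS_PER_TILE + slot] = packedQuad;
}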

Images

(screenshot, 2025-10-09)

Video decoder

A simple video decoder using OpenCV in C++. The idea was to create a decoder that can be integrated asynchronously into a potential Unity render pipeline: it runs in another thread and passes decoded data to a shader running within the Unity environment with minimal overhead. This actually ended up working pretty well by utilizing the following structure:

  • A C++ DLL that holds the OpenCV instance, implements some callbacks and provides the necessary wrapper functionality
  • A C# script in Unity that uses an atomic safety handle and a native array to directly pass the frame data that was decoded by the DLL to the shader without the need for a single copy operation on the CPU side
  • An HLSL shader that is able to read the frame buffer provided by OpenCV and render the decoded image based on it

The C++ DLL exposes the following methods and callbacks:

struct VideoInfo
{
	int width, height, fps;
	size_t frame_count;
};

FrameCallback frame_ready;
ErrorMessage error_callback;
VideoInfo video_info;

extern "C" DECODER uchar** InitializeDecoder(
	char* videoPath, int threads,
	FrameCallback frameCallback, ErrorMessage errorCallback,
	VideoInfo &rInfo);
extern "C" DECODER size_t CurrentFrame(int threadIdx);
extern "C" DECODER bool Seek(size_t frameIdx, int threadIdx);
extern "C" DECODER bool Read(size_t frameIdx, int threadIdx);
extern "C" DECODER bool ReadImage(char* path, int threadIdx);
extern "C" DECODER void ReleaseDecoder();
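
On the Unity side, a shader then only needs to interpret the raw frame memory it gets handed. Below is a minimal sketch assuming the frames arrive as a tightly packed 8-bit BGR buffer (OpenCV's default layout) bound as a ByteAddressBuffer; all names here are made up and this is not necessarily how the project's shader does it:

// Reads one BGR pixel out of a tightly packed 8-bit frame buffer. Since a
// pixel is 3 bytes, loads are not 4-byte aligned, so two words are fetched
// and the relevant bytes are shifted out.
ByteAddressBuffer _FrameData;
uint _FrameWidth;

float3 LoadFramePixel(uint2 coord)
{
    uint byteIndex = (coord.y * _FrameWidth + coord.x) * 3;
    uint wordIndex = byteIndex & ~3u;      // align down to a 4-byte boundary
    uint shift = (byteIndex & 3u) * 8;

    uint lo = _FrameData.Load(wordIndex) >> shift;
    uint hi = shift == 0 ? 0 : _FrameData.Load(wordIndex + 4) << (32 - shift);
    uint pixelBytes = lo | hi;             // the pixel's 3 bytes in the low 24 bits

    float b = (pixelBytes & 0xFF) / 255.0;
    float g = ((pixelBytes >> 8) & 0xFF) / 255.0;
    float r = ((pixelBytes >> 16) & 0xFF) / 255.0;
    return float3(r, g, b);
}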

Getting the shader right was actually quite tricky; here are some of the iterations it took (more can be found here):

(iteration images: chunk_outimg3, chunk_outimg5, chunk_outimg7, outimg, outimg3, outimg5)

Optimal chunk width and decoding times

The ideal size of a chunk is heavily dependent on seek and frame times (frame time = time to decode one frame). Below is a look at how the optimal chunk size changes with different resolutions and decoding stats:

Installation and Setup

Setting up the build environment
  1. Download and install the OpenCV binaries (tested with v4.5.5)

  2. Download and install Visual Studio (tested with Visual Studio 2022 Community)

    • Make sure to include the Desktop Development with C++ workload when running the installer

  3. Open the pre-rendering/src/video-decoder/video-decoder.sln solution in Visual Studio

  4. Go to View > Other Windows > Property Manager

  5. Expand any configuration and open the LibraryPaths Property Sheet

  6. Go to Common Properties > User Macros

  7. Enter the path of your OpenCV installation as the value for the OpenCV macro (e.g. C:/libraries/opencv)

  8. Hit OK, Apply, and then OK again

  9. Open a terminal inside the repo and run the following command

    git update-index --assume-unchanged .\src\video-decoder\LibraryPaths.props

  10. Add the path of your OpenCV binaries as an environment variable (e.g. C:/libraries/opencv/build/x64/vc15/bin)

The installation process should be complete now. Try building the video-decoder solution by pressing Ctrl + B.


Downhill simplex shader

So in retrospect I don't really see why I thought this could work (the algorithm just ended up not converging nearly fast enough), but it still made for a really interesting experiment. The visuals especially were stunning. More images can be found here.

The basic downhill simplex algorithm used
// ALPHA, BETA, GAMMA and ITERATIONS (and FAC, OFF, X1, X2 further below), as
// well as objective(), are defined elsewhere in the shader.
float2 downhillSimplex(float2 x0, float2 x1, float2 x2) {
  // initialization
  float3 b = float3(x0, objective(x0));
  float3 g = float3(x1, objective(x1));
  float3 w = float3(x2, objective(x2));

  [unroll(ITERATIONS)] for (int i = 0; i < ITERATIONS; i++) {
    // sort
    float3 t;

    if (b.z > g.z) {
      t = g;
      g = b;
      b = t;
    }

    if (g.z > w.z) {
      t = g;
      g = w;
      w = t;

      if (b.z > g.z) {
        t = g;
        g = b;
        b = t;
      }
    }

    // midpoint
    float3 m;
    m.xy = (g.xy + b.xy) / 2;

    // reflection
    float3 r;
    r.xy = m.xy + ALPHA * (m.xy - w.xy);
    r.z = objective(r.xy);

    if (r.z < g.z)
      w = r;

    else {
      if (r.z < w.z) w = r;

      float3 h;
      h.xy = (w.xy + m.xy) / 2.0;  // try int 2
      h.z = objective(h.xy);

      if (h.z < w.z) w = h;
    }

    // expansion
    if (r.z < b.z) {
      float3 e;
      e.xy = m.xy + GAMMA * (r.xy - m.xy);
      e.z = objective(e.xy);

      if (e.z < r.z)
        w = e;

      else
        w = r;
    }

    // contraction
    if (r.z > g.z) {
      float3 c;
      c.xy = m.xy + BETA * (w.xy - m.xy);
      c.z = objective(c.xy);

      if (c.z < w.z) w = c;
    }
  }

  return b.xy;
}

fixed4 frag(v2f i) : SV_Target {
  // fixed4 col = tex2D(_MainTex, i.uv);
  float err = objective(i.uv);
  float2 opt = downhillSimplex(i.uv, X1, X2);

  fixed4 col = fixed4(opt.xy * FAC + OFF, tan(1 - opt.x), 1);
  return col;
}

More details can be found here.

(images: downhill_simplex_lowd2, downhill_simplex_abstract3 to downhill_simplex_abstract7)

Camera robot

A way to generate panorama images along a scanline path (as the Blender plugin does virtually) in the real world would be nice. I tried building a Lego Mindstorms robot with a 2-axis camera arm that can still drive around using tank steering, but I only had 3 motors. I eventually got it working using a ratcheting mechanism, but it was pretty janky and wobbly.


Blender plugin

A Blender plugin that allows you to

  • Dynamically create scanline paths for a camera to take
  • Set up a node network for the camera to render with equirectangular projection
  • Create a compositor group that allows users to generate "map" files from a video render
  • Write a config file to be read as part of that map by a Unity loader script

Builds for various iterations of this plugin can be found here.


Depth encoding in video files

I wanted to be able to encode the depth information required for reprojection in video files for quick hardware-accelerated decoding. But those formats were often limited to a precision of 8 bits, which is insufficient for depth information. So I experimented with splitting 16-bit depth information into two 8-bit channels (and later recombining them).
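
A minimal sketch of what such a split and recombination can look like in HLSL (a straight high-byte/low-byte split; the names are illustrative and the project's actual scheme may differ):

// Splits a normalized 16-bit depth value into two 8-bit channels and
// recombines them. Note that a naive split like this makes the low byte very
// high-frequency, which lossy video compression tends to hit hard.
float2 EncodeDepth16(float depth01)
{
    uint d = (uint)round(saturate(depth01) * 65535.0);
    return float2((d >> 8) / 255.0, (d & 0xFF) / 255.0);   // high byte, low byte
}

float DecodeDepth16(float2 encoded)
{
    uint high = (uint)round(encoded.x * 255.0);
    uint low = (uint)round(encoded.y * 255.0);
    return ((high << 8) | low) / 65535.0;
}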


Storing alternative color channels

The idea was to maybe not store diffuse color, as is usually the case, but rather raw properties, and then apply them dynamically in the rendering pipeline. This never went very far though.

(images: diffuse, emission, intensity, inverted)

Codec testing

Lossless storage isn't viable in most cases, so depth information stored in images will also be subject to compression artifacts. I tried to take a look at how different file formats compress depth and normal information (particularly at the edges). The files can be found here.


More data

More extensive data from various tests (about 10GB as of now) can be found here. Feel free to check it out (the Images folder has lots of interesting screenshots from the different experiments).

Credit

https://github.com/bodhid/UnityEquiCam

@inproceedings{zhang2018single,
  title     = {Single Image Reflection Separation with Perceptual Losses},
  author    = {Zhang, Xuaner and Ng, Ren and Chen, Qifeng},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2018}
}
