Eric Mellino

RenderDoc Integration in Veldrid

2019-01-23T06:00:00+00:00

This post describes an upcoming feature in Veldrid 4.6.0.

Due to the complexity of modern graphics APIs and techniques, it can often be difficult to identify the source of rendering bugs. APIs like Vulkan include validation layers that alert you to invalid usage at runtime. However, a large class of programming errors can still produce problems like the dreaded “black screen”.

Where’s my triangle?

Tools like RenderDoc let you inspect fine-grained details about the state of your system at each draw call, which can save a lot of valuable time that might otherwise be spent agonizing over the details of your code. Problems like a bad depth test, bad face culling, or an invalid scissor rectangle can easily be identified and dealt with.

Although most of its functionality is easily accessible through the graphical UI, RenderDoc also exposes an in-application API. This allows you to configure and trigger RenderDoc captures from inside any part of your application. While RenderDoc’s graphical UI can only capture entire frames at once (which may include lots of extraneous information), the in-application API is more flexible, and can be used to capture a small portion of a frame, or some process that doesn’t occur strictly within a frame: perhaps a one-time render pass for an environment map (performed during startup), or a periodic compute pass. Capturing a smaller part of your code, and having the flexibility to start and stop the capture when you want, makes it much easier to isolate potentially problematic areas from each other.

Veldrid Support

The upcoming Veldrid 4.6.0 release includes a new helper library, Veldrid.RenderDoc, which is a simple .NET wrapper for RenderDoc’s in-application API. With it, you can configure, collect, and save RenderDoc captures from inside your application. Additionally, you can launch and manage the “Replay UI” from within your app to quickly debug a capture.

Veldrid.RenderDoc expects the RenderDoc shared library to be available on the library load path; alternatively, its path can be passed in explicitly:

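public class RenderDoc
{
    // Loads RenderDoc from the default locations (the library load path, or the standard install path on Windows).
    public static bool Load(out RenderDoc renderDoc);
    // Loads RenderDoc from an explicit path to the shared library.
    public static bool Load(string renderDocLibPath, out RenderDoc renderDoc);
}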

Once a RenderDoc instance is obtained, a variety of settings can be tweaked, and then captures can be taken quite easily:

// Load RenderDoc from the default locations. Load returns false if the library cannot be found.
if (!RenderDoc.Load(out RenderDoc rd))
{
    throw new InvalidOperationException("RenderDoc could not be loaded.");
}

rd.SetCaptureSavePath("my/special/path"); // Save captures into a particular folder.

rd.TriggerCapture(); // Capture the next frame.

rd.StartFrameCapture(); // Start capturing.
// Submit graphics or compute work to a Veldrid.GraphicsDevice.
rd.EndFrameCapture(); // Stop capturing and save.

rd.LaunchReplayUI(); // Launch the replay UI, with previous captures already loaded in.

For RenderDoc to hook into the graphics API being used, it must be loaded (that is, a RenderDoc instance must be obtained) before the Veldrid.GraphicsDevice you wish to debug is created. If your application allows it, you can re-create your GraphicsDevice after loading RenderDoc so that RenderDoc can hook the necessary functions. The Veldrid sample application does this, which allows RenderDoc to be loaded at any time, even after startup.
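Because every StartFrameCapture call needs a matching EndFrameCapture, it can be convenient to tie the pair to a `using` block, so that the capture is closed even if the captured work throws. The sketch below is hypothetical: `IFrameCaptureApi`, `CaptureScope`, and `RecordingCaptureApi` are illustrative names, not part of Veldrid.RenderDoc. The interface only mirrors the two calls shown above, so the code compiles without the package.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface mirroring the two capture calls shown above, so
// this sketch compiles without the Veldrid.RenderDoc package.
public interface IFrameCaptureApi
{
    void StartFrameCapture();
    void EndFrameCapture();
}

// Starts a capture when created and ends it when disposed, so the capture
// is closed even if the captured work throws.
public sealed class CaptureScope : IDisposable
{
    private readonly IFrameCaptureApi _api;
    private bool _ended;

    public CaptureScope(IFrameCaptureApi api)
    {
        _api = api ?? throw new ArgumentNullException(nameof(api));
        _api.StartFrameCapture();
    }

    public void Dispose()
    {
        if (_ended) return; // tolerate a second Dispose call
        _ended = true;
        _api.EndFrameCapture();
    }
}

// Test double that records the order of calls.
public sealed class RecordingCaptureApi : IFrameCaptureApi
{
    public List<string> Calls { get; } = new List<string>();
    public void StartFrameCapture() => Calls.Add("start");
    public void EndFrameCapture() => Calls.Add("end");
}
```

In an application, the scope would wrap something like a one-time environment-map pass at startup, with a small adapter forwarding the two interface calls to the loaded RenderDoc instance.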

Platform Support

Veldrid.RenderDoc can be used on Windows and Linux. On Windows, RenderDoc.Load will attempt to load the shared library from the standard global install path, if it exists. On Linux, librenderdoc.so should be on the library load path, or the Load overload that accepts a full path should be used.

RenderDoc currently supports capturing and debugging Vulkan, Direct3D, OpenGL, and OpenGL ES applications. When using Veldrid, the Metal backend is the only one that RenderDoc cannot capture and debug.
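The platform and backend rules above can be folded into a small check before deciding whether to load RenderDoc at all. This is a sketch: the `Backend` enum is a local stand-in for Veldrid's backend enumeration, and `RenderDocSupport.CanCapture` is a hypothetical helper, not part of Veldrid.RenderDoc.

```csharp
using System;
using System.Runtime.InteropServices;

// Local stand-in for the graphics backends discussed above (hypothetical;
// this sketch does not reference Veldrid's own types).
public enum Backend { Direct3D11, Vulkan, OpenGL, OpenGLES, Metal }

public static class RenderDocSupport
{
    // Veldrid.RenderDoc runs on Windows and Linux, and RenderDoc can capture
    // every Veldrid backend except Metal.
    public static bool CanCapture(Backend backend, OSPlatform os)
    {
        bool platformSupported = os == OSPlatform.Windows || os == OSPlatform.Linux;
        return platformSupported && backend != Backend.Metal;
    }

    // Convenience overload for the OS the process is currently running on.
    public static bool CanCapture(Backend backend)
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) return CanCapture(backend, OSPlatform.Windows);
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux)) return CanCapture(backend, OSPlatform.Linux);
        return false; // any other OS: Veldrid.RenderDoc is not supported
    }
}
```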

Overhauling ImGui.NET

2018-10-12T01:00:00+00:00

I’m happy to finally release the new, overhauled version of ImGui.NET. The library has been re-built from the ground up, utilizing the new auto-generated cimgui library and its associated tools. Previously, updating ImGui.NET was a very time-consuming and elaborate process, done by hand, and it often lagged behind official Dear ImGui releases. Instead of continuing that, I’ve implemented a code generator which processes cimgui’s pre-parsed data files and spits out a bunch of C# code automatically. There are a lot of benefits to this new approach, and it will allow me to keep the library up to date much more easily and painlessly. I’ve also taken this opportunity to improve the usability of the library a great deal, and to line up the C# interface and versioning more closely with C++.
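To give a feel for what turning pre-parsed data into C# looks like, here is a deliberately tiny sketch that converts one function description into a P/Invoke declaration. The record types, the type map, and the function name `igExampleFunc` are hypothetical; this is neither ImGui.NET's actual generator nor cimgui's real file format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified shape for one parsed C function. cimgui's real
// data files carry much more information; this only illustrates the idea.
public sealed record NativeParam(string Name, string CType);
public sealed record NativeFunction(string Name, string ReturnCType, IReadOnlyList<NativeParam> Params);

public static class BindingGenerator
{
    // A tiny C-to-C# type map; a real generator needs far more entries.
    private static readonly Dictionary<string, string> TypeMap = new Dictionary<string, string>
    {
        ["void"] = "void",
        ["bool"] = "byte",          // C bool is one byte wide, so pass it as byte
        ["int"] = "int",
        ["float"] = "float",
        ["const char*"] = "byte*",  // pointer to a UTF-8 string
    };

    public static string MapType(string cType) =>
        TypeMap.TryGetValue(cType, out var csType)
            ? csType
            : throw new NotSupportedException($"No C# mapping for '{cType}'.");

    // Emits one raw P/Invoke declaration for the low-level (unsafe) layer.
    public static string EmitPInvoke(NativeFunction fn, string libraryName)
    {
        string args = string.Join(", ", fn.Params.Select(p => $"{MapType(p.CType)} {p.Name}"));
        return $"[DllImport(\"{libraryName}\", CallingConvention = CallingConvention.Cdecl)]\n"
             + $"public static extern {MapType(fn.ReturnCType)} {fn.Name}({args});";
    }
}
```

The same description could drive a second emitter for the safe layer, which is what keeps the two layers from drifting apart.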

Layers and Design

As before, there are two layers to ImGui.NET: the raw, unsafe native layer (ImGuiNative), and the safe, C#-friendly layer (ImGui). Previously, a very common problem was some functionality being available in the low-level layer but not in the friendly high-level layer. Since both layers are automatically generated now, there are no gaps between the two. Additionally, new features added to Dear ImGui will automatically surface in both places in ImGui.NET. In most cases, users of ImGui.NET should never need to touch unsafe code, and the high-level ImGui class will suffice. For advanced scenarios, low-level access may still be more convenient, and the option to drop down to ImGuiNative remains.

It is also painless to utilize the auto-generation machinery to create a different version of ImGui.NET for an experimental branch of Dear ImGui – for example, the docking branch. Pointing ImGui.NET’s code generator at the processed output from that branch will give you a fully-usable library exposing all of the new functions and types, including safe wrappers.

Safety

Some care has been taken to automatically generate safe wrapper code for the library. In many places, Dear ImGui expects that you will interact with it through pointers to structures. In order to simplify and protect these patterns in C#, I’ve introduced a number of “Ptr”-suffixed structures, each of which represents a specific typed pointer. ImGui.GetIO, for example, now returns an ImGuiIOPtr, which is a safe struct wrapper over the native ImGuiIO* that the function returns. It provides safe access to all of that type’s members and functions, and requires no unsafe code to use. Unlike the previous version, it allocates no garbage-collected memory at all, and does not make any copies when fields are accessed, utilizing managed references instead. These “Ptr” structures should be viewed as a thin wrapper over a pointer, and are implicitly convertible back and forth with those native types.
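A minimal sketch of the “Ptr” pattern, assuming a hypothetical native struct `NativeIO` standing in for something like ImGuiIO. The wrapper holds only a pointer, exposes fields as `ref` returns so that reads and writes go straight to native memory without copies, and converts implicitly to and from the raw pointer. The block itself needs unsafe code enabled (`AllowUnsafeBlocks`); callers that construct the wrapper from an `IntPtr` and use its properties do not.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical native struct standing in for a type such as ImGuiIO.
[StructLayout(LayoutKind.Sequential)]
public struct NativeIO
{
    public float DeltaTime;
    public int FrameCount;
}

// A "Ptr"-style wrapper: it stores only the pointer, so copying the wrapper
// never copies the native data and allocates nothing on the GC heap.
public unsafe struct NativeIOPtr
{
    public NativeIO* NativePtr { get; }

    public NativeIOPtr(NativeIO* nativePtr) => NativePtr = nativePtr;
    public NativeIOPtr(IntPtr nativePtr) => NativePtr = (NativeIO*)nativePtr.ToPointer();

    // Implicit conversions back and forth with the raw pointer type.
    public static implicit operator NativeIOPtr(NativeIO* nativePtr) => new NativeIOPtr(nativePtr);
    public static implicit operator NativeIO*(NativeIOPtr wrapper) => wrapper.NativePtr;

    // ref-returning properties: reads and writes go directly to native memory.
    public ref float DeltaTime => ref NativePtr->DeltaTime;
    public ref int FrameCount => ref NativePtr->FrameCount;
}
```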

Previously-opaque structures like ImVector<T> are also much more friendly to C# than they were before, and give you safe, copy-free access to individual elements inside the vector.
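Dear ImGui's ImVector is laid out as an `int Size`, an `int Capacity`, and a pointer to the elements, so a C# view over it can hand out references into native memory. The sketch below is illustrative only: `NativeVector<T>` is a hypothetical name, not ImGui.NET's ImVector type, and it needs unsafe code enabled.

```csharp
using System;
using System.Runtime.InteropServices;

// A view over the native ImVector layout: Size, Capacity, then Data.
[StructLayout(LayoutKind.Sequential)]
public unsafe struct NativeVector<T> where T : unmanaged
{
    public int Size;
    public int Capacity;
    public T* Data;

    public NativeVector(int size, int capacity, IntPtr data)
    {
        Size = size;
        Capacity = capacity;
        Data = (T*)data.ToPointer();
    }

    // Bounds-checked indexer that returns a reference into native memory,
    // so elements can be read or modified in place without copying.
    public ref T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Size)
                throw new IndexOutOfRangeException();
            return ref Data[index];
        }
    }
}
```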

Breaking Changes

ImGui.NET 1.65.0 introduces some breaking changes if you are upgrading from 0.4.7. Many structures, enums, methods, and parameters have been renamed so that they are identical to their C++ counterparts. These should be viewed as one-time breaks; future versions of ImGui.NET will continue to match the C++ naming as closely as possible.

As a result of the bump to Dear ImGui 1.65, there are also several functions that have been removed or deprecated. This category of breaking change is documented well in Dear ImGui’s release notes.

Versioning

Going forward, ImGui.NET will use a versioning scheme that more closely lines up with native Dear ImGui. To start, this initial NuGet package will be versioned 1.65.0, corresponding to v1.65 of Dear ImGui.

Note that previous releases of ImGui.NET were versioned from 0.1.0+. Version 1.65.0 contains breaking changes from 0.4.7 (the last release of the previous series), and will not necessarily maintain compatibility between updates. Going forward, I intend to inherit the deprecations and removals from Dear ImGui itself, rather than maintain strict binary compatibility between versions of ImGui.NET.

Veldrid Support for SPIR-V Shaders

2018-06-25T19:00:00+00:00

Veldrid is a low-level graphics library written in C# that allows you to create GPU-accelerated applications targeting a wide variety of platforms, without dealing with platform-specific graphics APIs. Although Veldrid aims to be as portable as possible, one pain point has always been shader code, which differs between platforms. Writing your shaders multiple times is error-prone, limits your portability, and can quickly become a big hassle. Other portable graphics libraries and game engines take different approaches to tackling this problem. Many libraries support a single “official” shading language (often HLSL or a variant, but occasionally a custom shading language) and translate it into a number of shading languages, depending on the graphics APIs being targeted. My ShaderGen project can be seen as one such custom shading language.

A very promising new option for portable shaders is the Khronos Group’s SPIR-V language.

SPIR-V is a binary intermediate language for representing graphical-shader stages and compute kernels.

SPIR-V is a simple bytecode language for graphics and compute that can be targeted from several languages (including GLSL and HLSL), with more in development. There are also a variety of post-processing tools, optimizers, and debugging utilities available for it. Overall, it is a well-supported language with a very healthy and productive ecosystem developing around it.

Today, I’m releasing a Veldrid extension library called Veldrid.SPIRV, which provides support for loading SPIR-V shaders on all of Veldrid’s supported backends. Veldrid.SPIRV is built on top of SPIRV-Cross, a library for translating SPIR-V bytecode into several high-level shading languages. With Veldrid.SPIRV, you can write your shaders in any language targeting SPIR-V and use them easily with Veldrid.

Veldrid.SPIRV is available on NuGet.org.

Veldrid Support

Veldrid.SPIRV exposes several extension methods on ResourceFactory which allow you to create Shaders from SPIR-V bytecode. In order to create a Veldrid Pipeline from SPIR-V, you need to provide the bytecode for all shader stages being used. This is because Veldrid.SPIRV needs to be aware of the full set of shader resources (Buffers, Textures, and Samplers) used by a Pipeline in order to assign the correct “slots” for each resource.
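To see why the full set of stages matters, consider a scheme that numbers target-language slots per resource kind in (set, binding) order. The sketch below is a hypothetical illustration of that idea, not Veldrid.SPIRV's actual algorithm: a texture used only by the vertex shader gets a different slot depending on whether the fragment shader's resources are taken into account.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ResourceKind { UniformBuffer, Texture, Sampler }

// Hypothetical reflection record: one resource used by one shader stage.
public sealed record StageResource(uint Set, uint Binding, ResourceKind Kind, string Name);

public static class SlotAssigner
{
    // Merges the resources from every stage, then hands out slots per
    // resource kind in (set, binding) order. Because each slot depends on
    // every resource that sorts before it, numbering one stage in isolation
    // could give a shared resource different slots in different stages.
    public static Dictionary<(uint Set, uint Binding), int> Assign(params IEnumerable<StageResource>[] stages)
    {
        var merged = stages.SelectMany(s => s)
            .GroupBy(r => (r.Set, r.Binding))
            .Select(g => g.First())
            .OrderBy(r => r.Set).ThenBy(r => r.Binding);

        var nextSlot = new Dictionary<ResourceKind, int>();
        var slots = new Dictionary<(uint Set, uint Binding), int>();
        foreach (var r in merged)
        {
            nextSlot.TryGetValue(r.Kind, out int slot); // 0 if this kind is new
            slots[(r.Set, r.Binding)] = slot;
            nextSlot[r.Kind] = slot + 1;
        }
        return slots;
    }
}
```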

Based on the type of ResourceFactory passed in, Veldrid.SPIRV will figure out which target language is needed, and will automatically generate the appropriate shader code and compile it for you. Most people will just need these two extension methods:

// Create a set of Shaders usable in a graphics Pipeline.
public static Shader[] CreateFromSpirv(
    this ResourceFactory factory,
    ShaderDescription vertexShaderDescription,
    ShaderDescription fragmentShaderDescription,
    CrossCompileOptions options);

// Create a Shader usable in a compute Pipeline.
public static Shader CreateFromSpirv(
    this ResourceFactory factory,
    ShaderDescription computeShaderDescription,
    CrossCompileOptions options);
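Since these methods consume SPIR-V bytecode, a cheap sanity check on bytes loaded from disk can catch a wrong file (for example, GLSL source text) before it reaches shader creation. The helper below is hypothetical and not part of Veldrid.SPIRV; it relies only on the SPIR-V header layout: five 32-bit words, the first being the magic number 0x07230203, stored in either byte order.

```csharp
using System;
using System.Buffers.Binary;

public static class SpirvBytes
{
    // Every SPIR-V module begins with this magic number.
    public const uint MagicNumber = 0x07230203;

    // A module must be a whole number of 32-bit words, at least as long as
    // the 5-word header, and start with the magic number. SPIR-V permits
    // either endianness, so both byte orders are accepted.
    public static bool LooksLikeSpirv(ReadOnlySpan<byte> bytes)
    {
        if (bytes.Length < 20 || bytes.Length % 4 != 0) return false;
        return BinaryPrimitives.ReadUInt32LittleEndian(bytes) == MagicNumber
            || BinaryPrimitives.ReadUInt32BigEndian(bytes) == MagicNumber;
    }
}
```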

Specialization Constants

SPIR-V and Vulkan have support for “Specialization Constants”, which are an interesting feature providing greater flexibility to shaders. Specialization Constants are constants within a shader program that can be substituted with new values when a Pipeline is created. Likewise, Metal shaders can contain “function constants”, which serve roughly the same purpose. I’ve added support for both of these concepts to Veldrid through a new SpecializationConstant type. When constructing a new Pipeline, you can provide an array of SpecializationConstants which will influence the behavior of your shaders.

Here is an example fragment shader which contains several Specialization Constants. When you create one or more Pipelines with this shader, you can override these values without generating new SPIR-V bytecode or re-compiling your shader at all.

#version 450

layout (set = 0, binding = 0) uniform texture2D Tex;
layout (set = 0, binding = 1) uniform sampler Smp;

layout (constant_id = 0) const bool UseTexture = false;
layout (constant_id = 1) const bool FlipTexture = false;

layout (constant_id = 2) const float RedChannel = 0.1f;
layout (constant_id = 3) const float GreenChannel = 0.1f;
layout (constant_id = 4) const float BlueChannel = 0.1f;

layout (location = 0) in vec2 fsin_TexCoords;
layout (location = 0) out vec4 fsout_Color0;

void main()
{
    if (UseTexture)
    {
        vec2 uv = fsin_TexCoords;
        if (FlipTexture) { uv.y = 1 - uv.y; }
        fsout_Color0 = texture(sampler2D(Tex, Smp), uv);
    }
    else
    {
        fsout_Color0 = vec4(RedChannel, GreenChannel, BlueChannel, 1.0);
    }
}


If you want to enable the “UseTexture” and “FlipTexture” flags and substitute different color channels in, you can write code like the following:

ShaderSetDescription shaderSetDesc = new ShaderSetDescription(
    vertexLayoutDescriptions,
    new Shader[] { vertexShader, fragmentShader },
    new SpecializationConstant[]
    {
        new SpecializationConstant(0, true), // UseTexture = true
        new SpecializationConstant(1, true), // FlipTexture = true
        new SpecializationConstant(2, 0.95f), // RedChannel = 0.95f
        new SpecializationConstant(3, 0.0f), // GreenChannel = 0f
        new SpecializationConstant(4, 0.5f), // BlueChannel = 0.5f
    });

If this ShaderSetDescription is used to create a Vulkan or Metal Pipeline, then the SpecializationConstant values listed in the array will replace the pre-defined constants in the shader. It is therefore trivial to create another Pipeline which substitutes different constant values by passing in a different array. SPIR-V Specialization Constants always contain default values, so providing SpecializationConstants is optional. You may override a subset (or none) of the Specialization Constants defined in the shader.
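The override rule described above (start from the shader's defaults, replace any subset by constant ID) can be sketched in plain C#. This is a hypothetical helper, not part of Veldrid. Values are boxed as `object` only to keep bools and floats in one table, and rejecting unknown IDs is a design choice of this sketch rather than documented Veldrid behavior.

```csharp
using System;
using System.Collections.Generic;

public static class SpecializationResolver
{
    // Copies the defaults declared in the shader (constant_id -> value), then
    // replaces only the IDs the caller supplies. IDs the shader does not
    // declare are rejected so that typos surface early.
    public static Dictionary<uint, object> Resolve(
        IReadOnlyDictionary<uint, object> shaderDefaults,
        IEnumerable<(uint Id, object Value)> overrides)
    {
        var result = new Dictionary<uint, object>();
        foreach (var kv in shaderDefaults)
            result[kv.Key] = kv.Value;

        foreach (var (id, value) in overrides)
        {
            if (!result.ContainsKey(id))
                throw new ArgumentException($"Shader declares no constant_id {id}.");
            result[id] = value;
        }
        return result;
    }
}
```

With the fragment shader above, overriding only IDs 0 and 2 leaves FlipTexture, GreenChannel, and BlueChannel at their declared defaults.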

Unfortunately, HLSL and OpenGL-style GLSL do not support any kind of specialization constants. All constant values used in the shader must be baked into the shader itself when it is compiled. However, Veldrid.SPIRV allows you to substitute new values in for each Specialization Constant before the shader is translated from SPIR-V into the target language. In practice, this allows you to use SPIR-V shaders with all of Veldrid’s backends and still take advantage of the flexibility of Specialization Constants. If you want to produce GLSL or HLSL (bytecode) at build-time for your application (e.g. to improve load time), you will need to manage the “specialization matrix” yourself.
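For boolean constants such as UseTexture and FlipTexture, the “specialization matrix” is every on/off combination, so n flags produce 2^n variants to translate and compile ahead of time. A hypothetical sketch of enumerating them follows. Float constants like RedChannel cannot be enumerated this way; you would have to pick a finite set of values for them yourself.

```csharp
using System;
using System.Collections.Generic;

public static class SpecializationMatrix
{
    // Yields every on/off combination of the given boolean specialization
    // constant IDs, one dictionary (constant_id -> value) per variant.
    // Bit i of 'mask' decides the value of boolConstantIds[i].
    public static IEnumerable<IReadOnlyDictionary<uint, bool>> Enumerate(IReadOnlyList<uint> boolConstantIds)
    {
        int count = boolConstantIds.Count;
        if (count > 30) throw new ArgumentOutOfRangeException(nameof(boolConstantIds));

        for (int mask = 0; mask < (1 << count); mask++)
        {
            var variant = new Dictionary<uint, bool>();
            for (int i = 0; i < count; i++)
                variant[boolConstantIds[i]] = (mask & (1 << i)) != 0;
            yield return variant;
        }
    }
}
```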

Extras

Veldrid.SPIRV also supports compiling GLSL code into SPIR-V, by wrapping Google’s shaderc compiler library. This gives you even more flexibility at runtime with your shaders. It’s possible to defer all shader compilation until runtime and still maintain full portability with Veldrid.

Limitations

Veldrid.SPIRV relies on a native shared library (libveldrid-spirv), currently packaged for Windows, Linux, and macOS. This is a fairly small component with few dependencies, and is not difficult to compile for additional platforms.

Writing a Portable CPU/GPU Ray Tracer in C#

2018-05-19T17:07:00+00:00

Ray tracing is getting a lot of hype lately. Lots of advanced rendering techniques are emerging that involve scene tracing of some kind, there are several frameworks for high-performance ray tracing, D3D12 is getting integrated support for ray tracing, and there’s a proposed Vulkan extension for the same. I’ve written (bad) ray tracers in the past, but it’s been a while and I wanted to get back up to speed with how things are done. Lots of people are following along with Ray Tracing in One Weekend by Peter Shirley, which is a nice, short book going over the fundamentals of ray tracing. The structure of my ray tracer is adapted roughly from the C++ code in that book.

However, I wanted to try something more interesting than just a simple translation to C#. Although .NET is quite fast these days, it still can’t compare to how fast a GPU will chew through computations for a ray tracer. Enter ShaderGen, a project which lets you author portable shader code in C#. You give it regular C# structures and methods, and it automatically converts them to HLSL, GLSL SPIR-V, and Metal shaders. I wanted to see how far I could get by pushing my C# code through ShaderGen and using the resulting compute shaders with Veldrid. In theory, I can use the same C# code to run my ray tracer on the CPU and the GPU, across a bunch of graphics APIs – Vulkan, Direct3D, Metal, and OpenGL.

Goals

Ahead of time, I knew of a few snags that I was going to hit when doing this, but these were my primary goals:

Write all of the ray tracing logic in C#, including the compute shaders running on the GPU.
As much as possible, use the same C# code on the CPU and GPU.
Use the same code to run on Vulkan, Direct3D, Metal, and OpenGL.

GPU-friendly Tracing

The structure in Peter Shirley’s book is a good starting point, but it needs to be changed a bit to run in a compute shader. The original code’s “Sphere” class contains a pointer to an abstract “Material” object which has a virtual function controlling how instances behave. This pattern won’t work in a compute shader. In my version, a Material is another simple, flat structure, and there’s an array of them sitting next to the array of Spheres – one Material per Sphere. Instead of a virtual function, the Material struct just contains an enumerated value identifying its type, and the tracing logic switches on that. All of these patterns work perfectly fine in C#, but you might not reach for them if you were writing a regular CPU ray tracer.
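A sketch of the flat-data approach, under stated assumptions: only two material types, a Lambertian bounce of `normal + randomUnit`, and a metal reflection without fuzz. The names are hypothetical rather than the repo's code, and the sketch avoids `in` parameters and readonly structs, which ShaderGen does not support yet. The random vector is passed in so the function stays deterministic.

```csharp
using System;
using System.Numerics;

// Flat, GPU-friendly data: an enum tag instead of a virtual method, and no
// references to other objects.
public enum MaterialType { Lambertian, Metal }

public struct Material
{
    public MaterialType Type;
    public Vector3 Albedo;
}

public static class MaterialLogic
{
    // Switches on the material's type tag to pick a scattering behavior.
    // Returns false if the ray is absorbed.
    public static bool Scatter(Material material, Vector3 rayDir, Vector3 normal, Vector3 randomUnit,
                               out Vector3 scatteredDir, out Vector3 attenuation)
    {
        attenuation = material.Albedo;
        switch (material.Type)
        {
            case MaterialType.Lambertian:
                scatteredDir = normal + randomUnit; // diffuse bounce around the normal
                return true;
            case MaterialType.Metal:
                scatteredDir = Vector3.Reflect(Vector3.Normalize(rayDir), normal); // mirror reflection
                return Vector3.Dot(scatteredDir, normal) > 0; // absorbed if it points into the surface
            default:
                throw new ArgumentOutOfRangeException(nameof(material));
        }
    }
}
```

With `spheres[i]` and `materials[i]` stored in parallel arrays, the index of the sphere a ray hits is all that is needed to find its material.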

Another obvious limitation is that GPU code cannot recurse. A simple ray tracer will recurse for each reflection and refraction ray, up to a max depth, because it’s elegant and clean. In a GPU tracer, you’ll instead need to use an explicit loop, bounded by the max depth, which accumulates the color of reflected and refracted rays.
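Here is a CPU-side sketch of that loop, with assumptions: a hypothetical `BounceFunc` callback reports whether the ray hit anything and, if so, the next ray and its attenuation; a miss returns the sky color scaled by the accumulated attenuation; and reaching the depth limit contributes black, as the book's recursive version does. In an actual compute shader the callback would simply be inlined code, since shaders have no delegates.

```csharp
using System;
using System.Numerics;

public static class IterativeTracer
{
    // Hypothetical per-bounce callback: returns false when the ray misses
    // every object; otherwise outputs the next ray and the attenuation.
    public delegate bool BounceFunc(Vector3 origin, Vector3 dir,
        out Vector3 newOrigin, out Vector3 newDir, out Vector3 attenuation);

    // Replaces recursive color(ray, depth) with a loop bounded by maxDepth.
    // 'throughput' is the product of attenuations so far: the factor the
    // recursive version multiplies in on the way back up the call stack.
    public static Vector3 Trace(Vector3 origin, Vector3 dir, int maxDepth,
                                BounceFunc bounce, Func<Vector3, Vector3> skyColor)
    {
        Vector3 throughput = Vector3.One;
        for (int depth = 0; depth < maxDepth; depth++)
        {
            if (!bounce(origin, dir, out Vector3 nextOrigin, out Vector3 nextDir, out Vector3 atten))
                return throughput * skyColor(dir); // ray escaped: light from the sky

            throughput *= atten;
            origin = nextOrigin;
            dir = nextDir;
        }
        return Vector3.Zero; // depth limit reached: no more light contributed
    }
}
```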

Results

All of the code is available in the Veldrid Ray Tracer repo on GitHub. Check out the README file there for instructions on how to run the program.

Ultimately, I was able to share most of the code between the two versions. There were some limits on how much code could be shared (see below), some of which I already knew about. Obviously, there is some baseline “entry point” code that differs between a compute shader and a C# application. A compute shader runs inside a “thread” set up by the graphics API and drivers. Thread scheduling and dispatch are handled automatically; all that’s needed is the code that grabs the predefined dispatch ID, converts it to a screen coordinate, and traces a ray from it. In the C# app, though, job scheduling is handled manually. For simplicity, I’ve just used the built-in Parallel.For method to loop over each row of pixels in the output texture. Each job then loops over its row of pixels and traces rays through the scene in the same way as the compute shader. There’s not much of a difference here, and this is only a small part of the ray tracer.
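A sketch of the CPU side described above: `Parallel.For` hands out one row per iteration, and each iteration walks its pixels, calling a per-pixel function that plays the role of the compute shader body. The `tracePixel` delegate, the single float per pixel, and the pixel-center mapping in `ToScreen` are simplifying assumptions, not the repo's code.

```csharp
using System;
using System.Threading.Tasks;

public static class CpuDispatch
{
    // CPU stand-in for a compute dispatch: one Parallel.For job per row.
    // 'tracePixel' receives the same (x, y) a compute shader would derive
    // from its dispatch ID. It runs on several threads at once, so it must
    // not share mutable state (such as a single random number generator).
    public static float[] Render(int width, int height, Func<int, int, float> tracePixel)
    {
        var pixels = new float[width * height];
        Parallel.For(0, height, y =>
        {
            for (int x = 0; x < width; x++)
                pixels[y * width + x] = tracePixel(x, y);
        });
        return pixels;
    }

    // Maps a pixel coordinate to normalized screen coordinates in [0, 1),
    // sampling the pixel's center.
    public static (float U, float V) ToScreen(int x, int y, int width, int height) =>
        ((x + 0.5f) / width, (y + 0.5f) / height);
}
```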

I set up two test scenes: one from the final chapter of Peter Shirley’s book, and the other from Aras Pranckevičius’s ToyPathTracer project (image above). I’ve run both scenes on a couple of my machines, on the CPU and on several graphics APIs. I wasn’t really trying to squeeze the best performance out of this project, but I think it’s interesting to see how the code runs in these different contexts. It’s a “brute-force” ray tracer, and it lacks some obvious optimizations that would make it a lot faster. Also, ShaderGen doesn’t yet support some code patterns that could have made the CPU version quite a bit faster, like readonly structs and “in” parameters.

Book scene: 100 spheres, 1280x720, 4 samples per pixel

Windows 10, Nvidia GTX 770, Core i7-4770K

Vulkan: 96 million rays / sec
Direct3D 11: 98 million rays / sec
OpenGL: 85 million rays / sec
CPU: 2 million rays / sec

macOS 10.13, Intel Iris Plus 640, Intel Core i5 2.3 GHz

Metal: 58 million rays / sec
CPU: 1.15 million rays / sec

ToyPathTracer scene: 46 spheres, 1280x720, 4 samples per pixel

Windows 10, Nvidia GTX 770, Core i7-4770K

Vulkan: 196 million rays / sec
Direct3D 11: 182 million rays / sec
OpenGL: 165 million rays / sec
CPU: 3.5 million rays / sec

macOS 10.13, Intel Iris Plus 640, Intel Core i5 2.3 GHz

Metal: 118 million rays / sec
CPU: 2.4 million rays / sec

Each graphics API on Windows seems to be in roughly the same ballpark, with OpenGL underperforming a little bit, as expected. All versions are using 100% of my GPU or CPU. I was actually surprised at how well the Metal version runs, since my MacBook has a pretty weak GPU.

Limitations

One of my goals was “use as much of the same code as possible on the CPU and GPU”. In any ray tracer, you need to loop over multiple collections of structures: shapes, materials, lights, etc. In my code, these are stored in Veldrid “StructuredBuffer” objects. These correspond to StructuredBuffers in HLSL, storage buffer blocks in GLSL, and buffer-backed “constant pointers” in Metal. All of these shader concepts are roughly equivalent and work well for fine-grained methods operating on individual elements. However, there’s no way to pass one of these buffer blocks around as a method parameter in GLSL, unless it is “fixed-size”. This doesn’t work so well for my ray tracer, unless I lock the size in at compile time. I’d really like to be able to resize these buffers and pass them around freely to different chunks of tracing logic in individual functions, but GLSL doesn’t support it.

What this means is that I wasn’t able to share all of the tracing logic. Any methods that access the scene data need to be duplicated between my CPU and GPU code, at least until I figure out a clever solution to the above. Luckily, in my current version, the only method that this affects is the top-level “Color” method. That’s the one that actually loops over all of the spheres and determines which one intersects with the current ray. Once the sphere and its “Material” are identified, the remaining logic doesn’t need to worry about any of the other objects in the scene. I have two versions of this method which are roughly identical. However, things would get ugly if I wanted to add a collection of lights or light-emitting spheres, for example. Even after I know which object is hit, I would then still need to loop over the collection of lights (or emissive objects) to determine how each affects the sphere. I’d have to pull out even more code into GPU- and CPU-specific chunks.
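For reference, the part that has to stay CPU-specific looks roughly like the sketch below: a loop over every sphere that keeps the nearest intersection, using the standard ray-sphere quadratic. It takes a `ReadOnlySpan<Sphere>` of any length, which is exactly the kind of runtime-sized collection parameter GLSL cannot express. The names are hypothetical; this is not the repo's `Color` method.

```csharp
using System;
using System.Numerics;

public struct Sphere
{
    public Vector3 Center;
    public float Radius;
    public Sphere(Vector3 center, float radius) { Center = center; Radius = radius; }
}

public static class SceneQuery
{
    // Returns the index of the nearest sphere hit in (tMin, tMax), or -1.
    public static int ClosestHit(ReadOnlySpan<Sphere> spheres, Vector3 origin, Vector3 dir,
                                 float tMin, float tMax, out float tHit)
    {
        int hitIndex = -1;
        tHit = tMax;
        for (int i = 0; i < spheres.Length; i++)
        {
            // Solve |origin + t*dir - center|^2 = r^2 for t ("half b" form).
            Vector3 oc = origin - spheres[i].Center;
            float a = Vector3.Dot(dir, dir);
            float halfB = Vector3.Dot(oc, dir);
            float c = Vector3.Dot(oc, oc) - spheres[i].Radius * spheres[i].Radius;
            float discriminant = halfB * halfB - a * c;
            if (discriminant < 0) continue; // ray misses this sphere

            float sqrtD = MathF.Sqrt(discriminant);
            float t = (-halfB - sqrtD) / a;                      // nearer root first
            if (t <= tMin || t >= tHit) t = (-halfB + sqrtD) / a; // then the farther one
            if (t <= tMin || t >= tHit) continue;

            tHit = t;
            hitIndex = i;
        }
        return hitIndex;
    }
}
```

The returned index then selects the matching entry in the material array, after which the rest of the shading logic can be shared.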

The “OpenGL ES flavor” of GLSL has another quirky limitation: you can’t both read from and write to a Texture unless it has a single-channel pixel format. Perhaps this limitation could be worked around, but for the time being my shaders don’t work on OpenGL ES.

Practicality

So how useful would this be in an actual game? Well, I’m not sure, but the possibilities are interesting. For most problems utilizing a compute shader, a CPU fallback won’t be fast enough. Algorithms designed for a GPU are also likely to be different from the optimal algorithm for a CPU, at least for more complex problems. For lighter-weight applications, like a very simple particle system, maybe it would work well enough on both sides. For systems that don’t support compute shaders (OpenGL on macOS, for example), you could fall back to updating your particle buffers on the CPU, with the same code powering both versions. If your particles aren’t doing anything particularly taxing, then the performance could be acceptable, and you would no longer need to maintain two versions of the same particle system code.
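As an illustration of that idea, here is a hypothetical particle update written as a per-element function, plus a CPU loop that applies it to a whole array. The `Particle` layout and the simple gravity integration are assumptions, and whether ShaderGen can translate this exact method is not something this sketch verifies.

```csharp
using System;
using System.Numerics;
using System.Threading.Tasks;

// Hypothetical particle layout: a flat struct that could live in a GPU
// buffer or in a managed array.
public struct Particle
{
    public Vector3 Position;
    public Vector3 Velocity;
    public float Life;
}

public static class ParticleLogic
{
    // Per-element update: the part that could be shared between a compute
    // shader and a CPU fallback, since it only touches one particle.
    public static Particle Update(Particle p, Vector3 gravity, float dt)
    {
        p.Velocity += gravity * dt;
        p.Position += p.Velocity * dt;
        p.Life -= dt;
        return p;
    }

    // CPU fallback: apply the same function to every element in parallel.
    public static void UpdateAll(Particle[] particles, Vector3 gravity, float dt)
    {
        Parallel.For(0, particles.Length, i => particles[i] = Update(particles[i], gravity, dt));
    }
}
```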

From another angle, it’s a lot easier to debug regular C# than it is to debug anything running on your GPU. Having the option to debug problematic areas in the regular Visual Studio debugger could be valuable in itself.

Most of the code in my project is based on Ray Tracing in One Weekend, which is a great little intro book on ray tracing.
Aras Pranckevičius has a fun series on ray tracing. In it, he has built a C# tracer that is much more optimized than mine, as well as various versions in other languages.
