Programmer's fault: Crazy Forward Renderer

I want to share a rather crazy idea for rendering many lights with Forward Rendering. We all know that if we want many lights we use LPP (Light Pre-Pass) or DS (Deferred Shading). But GPU memory bandwidth is a scarce resource, so a DS/LPP implementation can become bandwidth-bound (and with MSAA the situation gets even worse). OK, we can optimize the implementation with various tricks: the Depth Bounds Test (in hardware or emulated in shaders), stencil testing, scissor rectangles, tiling on the CPU or in a Compute Shader, etc. But what about lightweight scenes? Yes, LPP and DS let us have tons of lights, but do we always need tons of lights in a scene? In real scenes we typically have 8-12 lights in the camera's frustum, while LPP and DS can handle thousands. Because of the bandwidth cost, it can be better to use Forward Rendering (FR) instead of LPP or DS and get better frame rates (not always, but sometimes this holds).
In FR we use an uber-shader to handle many lights within one shader program, but there is one problem. In one scene, different objects can be influenced by different numbers of lights, so we need to set up a shader per object. When we draw another object influenced by a different number of lights, we switch the renderer to another shader. This operation is very expensive because it causes a lot of state switching inside the driver. In a real-world application we can "sit" inside the driver for half of the rendering time, up to 10 ms, and that's really bad.

If we want to use multiple lights within a shader, the first thing that comes to mind is the 'for' loop. Let's imagine something like this:

/// Iterate through all lights to calculate lighting
/// MAXLIGHTS is the maximum number of lights per object
for( int i = 0; i < MAXLIGHTS; i++ ) {
    if( pixel influenced by light[i] ) {
        /// Some calculations to do lighting
        ...
        /// Write result
        ...
    }
}
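
To make the pseudocode above concrete, here is a minimal CPU-side sketch of the same loop in C++. It is not real shader code: the names Vec3, PointLight, dist and shadeLoop are made up for illustration, and I assume "influenced" means "closer than the light's radius" with a simple linear falloff.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// All names below are illustrative, not from a real engine.
constexpr std::size_t MAXLIGHTS = 4;  // maximum lights per object

struct Vec3 { float x, y, z; };

struct PointLight {
    Vec3  position;
    float radius;     // the light has no effect at or beyond this distance
    float intensity;
};

// Euclidean distance between two points.
inline float dist(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Loop version: visit every light slot and branch on whether it reaches the pixel.
float shadeLoop(const std::array<PointLight, MAXLIGHTS>& lights, const Vec3& pixelPos) {
    float result = 0.0f;
    for (std::size_t i = 0; i < MAXLIGHTS; ++i) {
        const float d = dist(lights[i].position, pixelPos);
        if (d < lights[i].radius) {                             // pixel influenced by light[i]
            const float falloff = 1.0f - d / lights[i].radius;  // assumed linear falloff
            result += lights[i].intensity * falloff;            // write result
        }
    }
    return result;
}
```

Unused light slots are left with radius 0 here, so they never pass the influence test.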

This is a really basic light loop, and it has some caveats: a 'for' loop with an 'if' condition nested inside it. Let's recall some technical details about GPUs. First and foremost, a GPU DOESN'T have a command stack the way a CPU does, so nested, interdependent conditions are painful to execute. Second is the "wavefront" problem: the processors in a wavefront run together, so when they take different branches they end up waiting for each other (this is true when some processors go into a deeper branch than the others; it is not a problem when all processors take the same branch). These are problems of current GPUs (though I believe the IHVs will rework their GPUs and get past them). A programmer may think "The problem is the N-th branch/loop! F*** it!" and delete the unnecessary loop/branch. But stop! Let's think: Parallax Occlusion Mapping uses branching with a lot of 'if's and 'while's and works well on old video cards like the NVIDIA GeForce 6xxx/7xxx and ATI Radeon X1xxx, so something else must be going on.
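
The "wavefront" point can be illustrated with a toy cost model in C++. This is only a sketch under a big assumption: all lanes of a wavefront run in lockstep, so a per-light branch costs the whole wavefront if ANY lane takes it. The function names and the bitmask encoding (bit i set means light i reaches that lane's pixel) are my own, and this is not a measurement of real hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Count the set bits in a mask (one bit per light).
int countBits(std::uint32_t mask) {
    int count = 0;
    while (mask != 0) {
        mask &= mask - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}

// Toy model: the wavefront pays for a light's branch body if at least one
// lane needs it, so the cost is the size of the union of all lane masks.
int lockstepLightIterations(const std::vector<std::uint32_t>& laneMasks) {
    std::uint32_t unionMask = 0;
    for (std::uint32_t mask : laneMasks) {
        unionMask |= mask;
    }
    return countBits(unionMask);
}
```

In this model, four lanes that all see the same two lights pay for two branch bodies, while four lanes that each see one different light pay for four, even though every single lane only needs one.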

Another option is to unroll the loop into first-level branches with no nested branches inside them, so the code becomes linear. This is how it could look:

/// Check first light
if( ( lights > 0 ) && ( pixel influenced by light[0] ) ) {
    /// Some calculations to do lighting
    ...
    /// Write result
    ...
}
/// Check second light
if( ( lights > 1 ) && ( pixel influenced by light[1] ) ) {
    /// Some calculations to do lighting
    ...
    /// Write result
    ...
}
...
/// Check N-th light
if( ( lights > N - 1 ) && ( pixel influenced by light[N - 1] ) ) {
    /// Some calculations to do lighting
    ...
    /// Write result
    ...
}
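
Here is the unrolled idea as a minimal CPU-side C++ sketch, fixed at four light slots. Again this is not real shader code: LightBlock, influences, lighting and shadeUnrolled are hypothetical names, and the radius test with linear falloff is my assumption for what "influenced" and "lighting" mean.

```cpp
#include <cassert>
#include <cmath>

// All names below are illustrative, not from a real engine.
constexpr int MAXLIGHTS = 4;

struct Vec3 { float x, y, z; };
struct PointLight { Vec3 position; float radius; float intensity; };

// Per-object light data: a fixed-size array plus the number of slots in use.
struct LightBlock {
    PointLight lights[MAXLIGHTS];
    int        lightCount;
};

inline float dist(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// "Pixel influenced by light": assumed to mean the pixel is inside the radius.
inline bool influences(const PointLight& l, const Vec3& p) {
    return dist(l.position, p) < l.radius;
}

// Assumed linear falloff; only called when influences() is true.
inline float lighting(const PointLight& l, const Vec3& p) {
    return l.intensity * (1.0f - dist(l.position, p) / l.radius);
}

// Unrolled version: one flat branch per light slot, no loop, no nesting.
float shadeUnrolled(const LightBlock& cb, const Vec3& p) {
    float result = 0.0f;
    if (cb.lightCount > 0 && influences(cb.lights[0], p)) result += lighting(cb.lights[0], p);
    if (cb.lightCount > 1 && influences(cb.lights[1], p)) result += lighting(cb.lights[1], p);
    if (cb.lightCount > 2 && influences(cb.lights[2], p)) result += lighting(cb.lights[2], p);
    if (cb.lightCount > 3 && influences(cb.lights[3], p)) result += lighting(cb.lights[3], p);
    return result;
}
```

The same function serves objects with 0 to 4 lights; only lightCount changes, so nothing has to be recompiled or switched between objects.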

This is quite ugly, but the performance of this method is about the same as Deferred Shading, because we do not suffer from shader and uniform switching. This shader is already an uber-shader, so we don't need millions of shader compilations for the various material-light combinations. We also use a Constant Buffer with this method, so we write everything once on the CPU and read it once on the GPU. This is how we can keep the renderer from becoming CPU-bound.
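
The CPU side of the Constant Buffer could look roughly like the C++ sketch below. The struct names, the field layout and MAX_LIGHTS = 8 are assumptions of mine, chosen so that every member falls on a 16-byte boundary as constant buffers usually require; the actual upload call depends on the graphics API and is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout, not taken from a real engine.
constexpr std::size_t MAX_LIGHTS = 8;

// Two 16-byte rows per light: (position.xyz, radius) and (color.rgb, intensity).
struct GpuLight {
    float position[3];
    float radius;
    float color[3];
    float intensity;
};
static_assert(sizeof(GpuLight) == 32, "GpuLight must be two 16-byte rows");

// The whole block is written once on the CPU and read by the shader.
struct LightConstants {
    GpuLight     lights[MAX_LIGHTS];
    std::int32_t lightCount;   // number of slots actually in use
    std::int32_t padding[3];   // pad the last row to 16 bytes
};
static_assert(sizeof(LightConstants) == 32 * MAX_LIGHTS + 16, "unexpected padding");

// Copy up to MAX_LIGHTS visible lights into the block; extra lights are dropped.
LightConstants packLights(const std::vector<GpuLight>& visible) {
    LightConstants cb{};  // zero-initialize unused slots
    const std::size_t n = std::min(visible.size(), MAX_LIGHTS);
    for (std::size_t i = 0; i < n; ++i) {
        cb.lights[i] = visible[i];
    }
    cb.lightCount = static_cast<std::int32_t>(n);
    return cb;
}
```

Dropping lights beyond MAX_LIGHTS is the simplest policy for a sketch; a real renderer would probably pick the most important ones first.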

I know this is really crazy, but it works. I don't know whether THIS type of Forward Renderer is used by any company, and my method may well not be new.

26.4.12
