Today I was at work for the first time since August and nobody seems to have logged into my shared workstation since then: my account name was still in the login field.

The first weird thing that happened was that suddenly the application I’ve been working on didn’t run correctly anymore.

First let me tell you a bit about the app: It renders a volume texture with some custom raytracing shaders in one window and, in another window, allows editing of the transfer function, which is a function (or texture) that maps a value in the volume texture to a color. The transfer function is made up of shapes that you can colorize and stretch or deform any way you want, which makes editing easy. Look at this picture to see what a transfer function (editor) could look like - this isn’t ours though. The TF editor uses an FBO to render the transfer function into the transfer texture.

While the rest of the app was running mostly fine (that is, AntTweakBar rendered fine and so did the background of the transfer function editor, which is a histogram of the volume texture), the actual transfer function rendering was corrupted. On my workstation you could move the primitive around, and while you did so it would either disappear completely, invert its color or show small light green or purple blocks that stayed stationary inside the window as you moved the primitive around. The whole thing looked like corrupted memory that stayed corrupted and was only visible when the primitive rendered over it.

It’s a pity I didn’t take any screenshots but I didn’t think of that back then.

I was totally at a loss with this behavior, so I rebooted, which didn’t fix it. Somehow I was hoping that the gfx card had broken during the last month and that it wasn’t my code which caused this totally inexplicable behavior. To find out, I logged into another workstation in our lab, checked out my code and ran the program. To my horror it had similar problems… not quite the same, but still strangely colored sprinkles all over the editor window when the primitive was rendered there. Both workstations are using graphics cards from Nvidia (8800 GTX if I recall correctly).

I tried to determine the revision which broke rendering by doing a binary-search-like trial and error through our revision history at work, and found out that it broke when I changed the way the primitives were rendered to the texture.
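That kind of search can be sketched in a few lines. This is only an illustration, not what I actually ran: `firstBrokenRevision` and the `isBroken` callback are hypothetical names, and the sketch assumes that once a revision is broken, every later revision is broken too.

```cpp
#include <functional>

// Returns the first broken revision in the range (good, bad].
// Assumptions (hypothetical setup): isBroken(good) is false, isBroken(bad)
// is true, and every revision after the first broken one is broken as well.
int firstBrokenRevision(int good, int bad,
                        const std::function<bool(int)>& isBroken) {
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;  // check out and test the midpoint
        if (isBroken(mid)) {
            bad = mid;    // the breakage happened at mid or earlier
        } else {
            good = mid;   // the breakage happened after mid
        }
    }
    return bad;
}
```

In practice `isBroken` would stand for "check out that revision, build it, run it and look at the editor window", which is why halving the range each time is worth the bookkeeping.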

The old code simply iterated through all primitives and rendered each of them once into the FBO.

The paper I’m implementing at the moment required a special solution for overlapping primitives though, so I switched to a more complicated 2.5 pass rendering.

If \(c_i\) is the color value of the \(i\)th primitive and \(\alpha_i\) its alpha value, then the resulting color and alpha are calculated as follows: \[\left< c, \alpha \right> = \left< \frac{\displaystyle\sum_{i=1}^{n}{c_i \alpha_i}}{\displaystyle\sum_{i=1}^{n}{\alpha_i}},\max_{1 \le i \le n}{\alpha_i} \right>.\]
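As a quick sanity check of the formula, take two overlapping primitives with a single color channel. The numbers are made up for illustration: \(c_1 = 1.0\), \(\alpha_1 = 0.5\) and \(c_2 = 0.0\), \(\alpha_2 = 0.25\). Then

\[\left< c, \alpha \right> = \left< \frac{1.0 \cdot 0.5 + 0.0 \cdot 0.25}{0.5 + 0.25},\ \max(0.5, 0.25) \right> = \left< \tfrac{2}{3},\ 0.5 \right>.\]

The color is the alpha-weighted average of the two primitives, while the alpha is simply the larger of the two alphas.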

For this I do three passes and use two floating point textures:

  1. Render the primitives into a temp texture with

       glBlendFuncSeparate( GL_SRC_ALPHA, GL_ONE, GL_ONE, GL_ONE );

    to get \[\left< {\displaystyle\sum_{i=1}^{n}{c_i \alpha_i}},{\displaystyle\sum_{i=1}^{n}{\alpha_i}} \right>.\]

  2. Next I set the FBO to the real transfer function texture and use the temp texture as source and divide the color values by their alpha using a custom shader.

  3. Finally I use

      glBlendEquation( GL_MAX );

    with a color mask that disables every channel except alpha, and render all primitives again into the transfer function texture. This gets the correct term into the alpha channel of the texture.
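The three passes can be emulated on the CPU for a single texel to check that the blend setup really produces the formula. This is a minimal sketch and not our actual code: `Sample` and the function names are made up, only one color channel is modeled, and I am assuming that the divide shader in pass 2 writes 0 into the alpha channel, since otherwise `GL_MAX` in pass 3 would compare against the alpha sum. `GL_MAX` ignores the blend factors, so pass 3 is a plain maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One texel with a single color channel (hypothetical helper type).
struct Sample {
    float c;  // color
    float a;  // alpha
};

// Pass 1: GL_FUNC_ADD with glBlendFuncSeparate(GL_SRC_ALPHA, GL_ONE, GL_ONE, GL_ONE)
// on a cleared texture: C = srcC * srcA + dstC, A = srcA + dstA.
Sample accumulatePass(const std::vector<Sample>& primitives) {
    Sample dst{0.0f, 0.0f};
    for (const Sample& src : primitives) {
        assert(src.a >= 0.0f);  // alpha is expected to be non-negative
        dst.c = src.c * src.a + dst.c;
        dst.a = src.a + dst.a;
    }
    return dst;
}

// Pass 2: divide the accumulated color by the accumulated alpha.
// Assumption: alpha is written as 0 and uncovered texels get color 0.
Sample dividePass(const Sample& temp) {
    float c = temp.a > 0.0f ? temp.c / temp.a : 0.0f;
    return Sample{c, 0.0f};
}

// Pass 3: GL_MAX with only the alpha channel unmasked.
Sample maxAlphaPass(Sample dst, const std::vector<Sample>& primitives) {
    for (const Sample& src : primitives) {
        dst.a = std::max(dst.a, src.a);  // color is masked out and stays untouched
    }
    return dst;
}

// All three passes for one texel covered by the given primitives.
Sample bakeTransferSample(const std::vector<Sample>& primitives) {
    return maxAlphaPass(dividePass(accumulatePass(primitives)), primitives);
}
```

For two primitives with color 1.0 at alpha 0.5 and color 0.0 at alpha 0.25, `bakeTransferSample` yields a color of about 2/3 and an alpha of 0.5, which is what the formula predicts.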

The three-pass approach usually works and, although it’s a bit cumbersome, it yields the correct result. Only now I had identified it as the source of all my trouble.

Here’s the relevant part in our source code:

_checkGLErrors();
m_fbo.AttachTexture( GL_COLOR_ATTACHMENT0_EXT, m_tempTexture.glTarget, m_tempTexture.glTexID );
assert( m_fbo.IsValid() );
m_fbo.Bind();
_checkGLErrors();

glClearColor( 0.0, 0.0, 0.0, 0.0 );
glClear( GL_COLOR_BUFFER_BIT );

// C = srcC * srcA + dstC = C * A + C' * A' + ...
// A = srcA + dstA = A + A' + A'' + ...
glBlendFuncSeparate( GL_SRC_ALPHA, GL_ONE, GL_ONE, GL_ONE );
glBlendEquationSeparate( GL_FUNC_ADD, GL_FUNC_ADD );

for( Primitive::PList::const_iterator i = m_primitives.begin() ; i != m_primitives.end() ; i++ ) {
    (*i)->bake();
}

m_fbo.Unattach( GL_COLOR_ATTACHMENT0_EXT );
m_fbo.Disable();
_checkGLErrors();

// divide color by alpha
glMatrixMode( GL_PROJECTION );
glLoadIdentity();

glBlendFunc( GL_ONE, GL_ZERO );

procInf.imageOp( m_tempTexture, m_transferFunction, m_program );

// render again this time to figure out the max alpha value

glMatrixMode( GL_PROJECTION );
glLoadIdentity();
glOrtho( 0.0, 1.0, 0.0, 1.0, -1.0, 1.0 );

m_fbo.AttachTexture( GL_COLOR_ATTACHMENT0_EXT, m_transferFunction.glTarget, m_transferFunction.glTexID );
assert( m_fbo.IsValid() );
m_fbo.Bind();
_checkGLErrors();

glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_TRUE );
glBlendEquation( GL_MAX );

for( Primitive::PList::const_iterator i = m_primitives.begin() ; i != m_primitives.end() ; i++ ) {
    (*i)->bake();
}

Now I was ready to hunt the bug, but the problem was that this revision had been written more than a few days before I left in August, and it was working back then. I refused to believe that it was broken, so I went to my boss, who had been using his rig over the last few days, and indeed it worked there. I couldn’t really believe my eyes. Two workstations appeared to be broken for some reason (driver issues or even the gfx cards? - who would know..).

I went back and somehow managed to change the appearance of the bug on one of the workstations - all the framebuffer corruptions went away and were replaced with what seemed like an unchanging alpha channel of one of the FBO textures with alpha being 0 on the initial position of the primitive and 1.0 everywhere else, so it masked it out at its initial position.

For some reason I thought that using glBlendFuncSeparate and the normal glBlendFunc together might have left OGL’s state machine in a mess, so I added calls to disable and re-enable blending to the code, which makes the call to glBlendFunc irrelevant (the two added lines are marked in the listing below).

_checkGLErrors();
m_fbo.AttachTexture( GL_COLOR_ATTACHMENT0_EXT, m_tempTexture.glTarget, m_tempTexture.glTexID );
assert( m_fbo.IsValid() );
m_fbo.Bind();
_checkGLErrors();

glClearColor( 0.0, 0.0, 0.0, 0.0 );
glClear( GL_COLOR_BUFFER_BIT );

// C = srcC * srcA + dstC = C * A + C' * A' + ...
// A = srcA + dstA = A + A' + A'' + ...
glBlendFuncSeparate( GL_SRC_ALPHA, GL_ONE, GL_ONE, GL_ONE );
glBlendEquationSeparate( GL_FUNC_ADD, GL_FUNC_ADD );

for( Primitive::PList::const_iterator i = m_primitives.begin() ; i != m_primitives.end() ; i++ ) {
    (*i)->bake();
}

m_fbo.Unattach( GL_COLOR_ATTACHMENT0_EXT );
m_fbo.Disable();
_checkGLErrors();

// divide color by alpha
glMatrixMode( GL_PROJECTION );
glLoadIdentity();

glDisable( GL_BLEND ); // <--- added
glBlendFunc( GL_ONE, GL_ZERO );

procInf.imageOp( m_tempTexture, m_transferFunction, m_program );

// render again this time to figure out the max alpha value

glMatrixMode( GL_PROJECTION );
glLoadIdentity();
glOrtho( 0.0, 1.0, 0.0, 1.0, -1.0, 1.0 );

m_fbo.AttachTexture( GL_COLOR_ATTACHMENT0_EXT, m_transferFunction.glTarget, m_transferFunction.glTexID );
assert( m_fbo.IsValid() );
m_fbo.Bind();
_checkGLErrors();

glEnable( GL_BLEND ); // <--- added
glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_TRUE );
glBlendEquation( GL_MAX );

for( Primitive::PList::const_iterator i = m_primitives.begin() ; i != m_primitives.end() ; i++ ) {
    (*i)->bake();
}

m_fbo.Disable();

When I ran the program, it decided to work normally and render everything the way it should. I was stunned once again. Now the good thing about causality is that everything is supposed to be repeatable, so I removed the statements again and expected to see the bug reappear.. - but guess what, it did not, and suddenly the machine was working the way it was supposed to.

I tried the same fix with the other workstation in our lab and it fixed it before I had time to take some screenshots of the persisting framebuffer corruption.

I’m still at a loss here and I guess I won’t be able to come up with an explanation of my own for this, so if anyone has an idea, feel free to tell me :-)

Cheers,
 Andreas