Tag Archives: OpenGL

My Bachelor Thesis

Screenshot of Utah

For the last two months I have been working on my bachelor thesis at the Chair of Computer Graphics and Visualization. It is about "Multi-Tile Terrain Rendering with OGL/Equalizer".
The chair has a very nice Direct3D 10 terrain rendering engine and wants to run it at the newly founded KAUST (King Abdullah University of Science and Technology) in a massive CAVE environment. A CAVE is a room whose walls are actually screens.
The CAVE at KAUST even supports stereoscopic rendering, so in total 12 views have to be rendered.

My job was to port said terrain engine from Direct3D to OpenGL and afterwards to the Equalizer framework, which is an open-source framework for parallelizing OpenGL applications.

You can find/download an online version of my bachelor thesis here. I'll upload the LaTeX at a later date and update this post.

I've spent the last months writing about all this, so I don't feel like talking about the thesis itself anymore. Instead the remainder of this post will contain a post-mortem of it.

From Vertex Positions to Packed Element Arrays

At my new workplace at university I'm currently porting an advanced terrain rendering engine from DirectX to OpenGL.
One of the performance optimizations the engine uses is that it draws the terrain tiles right from the index buffer without using a vertex buffer at all - that is, it packs the vertex position into the 32-bit index and unpacks it in the vertex shader.

Why is this faster than rendering using a vertex buffer and no index buffer?

When using an index buffer the graphics card can make use of a cache of already transformed (vertex-shaded) vertices and when an index is reused, it can use the cached result instead of running the vertex shader again. Of course, this only works if there exists a certain temporal locality, but that is given.
If no index buffer is used, the vertex cache won't be used, because the implicit index is different for each vertex.
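To make the effect concrete, here is a minimal sketch that counts how often the vertex shader would run for a grid of quads drawn as an indexed triangle list. It assumes a simple FIFO cache of transformed vertices; real hardware differs in cache size and replacement policy, and the names `makeGridIndices` and `countShaderRuns` are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Index list for a grid of quadsX x quadsY quads, two triangles per quad.
// Hypothetical helper, not taken from the terrain engine.
std::vector<std::uint32_t> makeGridIndices(std::uint32_t quadsX, std::uint32_t quadsY) {
    assert(quadsX > 0 && quadsY > 0);
    std::vector<std::uint32_t> indices;
    const std::uint32_t stride = quadsX + 1; // vertices per grid row
    for (std::uint32_t y = 0; y < quadsY; ++y) {
        for (std::uint32_t x = 0; x < quadsX; ++x) {
            const std::uint32_t v0 = y * stride + x;
            const std::uint32_t v1 = v0 + 1;
            const std::uint32_t v2 = v0 + stride;
            const std::uint32_t v3 = v2 + 1;
            indices.insert(indices.end(), {v0, v1, v2, v2, v1, v3});
        }
    }
    return indices;
}

// Counts vertex shader runs under an assumed FIFO cache of cacheSize entries.
// cacheSize == 0 models the case where no cached result can ever be reused.
std::size_t countShaderRuns(const std::vector<std::uint32_t>& indices, std::size_t cacheSize) {
    std::deque<std::uint32_t> cache;
    std::size_t runs = 0;
    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end())
            continue; // cache hit: the transformed vertex is reused
        ++runs;       // cache miss: the vertex shader has to run
        cache.push_back(index);
        if (cache.size() > cacheSize)
            cache.pop_front();
    }
    return runs;
}
```

For a 4x4 grid, `countShaderRuns(makeGridIndices(4, 4), 32)` counts one run per distinct vertex, while a cache size of 0 counts one run per index.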

Since I need to port the engine from DirectX to OpenGL, I did some research to see if it is possible to do the same in OpenGL.
It's not really possible but you can achieve something quite similar in OpenGL 3.0.

Using the Input-Assembler Stage without Buffers (OpenGL)

This is meant as the OpenGL analogue of Using the Input-Assembler Stage without Buffers (Direct3D 10).

I think the title is self-explanatory but for greater clarity let me rephrase it:
The aim is to render something without using vertex or index buffers, that is (in OpenGL speak) using neither vertex data nor an elements array to render something.
Instead the automatically supplied gl_VertexID input (vertexId in the DirectX example, where it is bound to the SV_VertexID semantic) is used to determine the vertex the shader is currently processing.

The example in MSDN simply draws a triangle using vertexId:

VSOut VSmain(VSIn input)
{
    VSOut output;

    if (input.vertexId == 0)
        output.pos = float4(0.0, 0.5, 0.5, 1.0);
    else if (input.vertexId == 2)
        output.pos = float4(0.5, -0.5, 0.5, 1.0);
    else if (input.vertexId == 1)
        output.pos = float4(-0.5, -0.5, 0.5, 1.0);

    output.color = clamp(output.pos, 0, 1);

    return output;
}

If you want to do the same thing in OpenGL, you have at least two problems:

  • OpenGL uses glVertex* as end marker for the specification of a single vertex and the array rendering commands (glDrawArrays, glDrawElements, etc.) all require an enabled vertex array.
  • gl_VertexID is only supplied if:
    • the vertex comes from a vertex array command that specifies a complete primitive (e.g. DrawArrays, DrawElements)
    • all enabled vertex arrays have non-zero buffer object bindings, and
    • the vertex does not come from a display list, even if the display list was compiled using DrawArrays / DrawElements with data sourced from buffer objects.

    (from GL_EXT_gpu_shader4)

There is no way around these requirements, but what you can do is create a dummy vertex buffer with one element, bind it as the vertex array and simply draw as many vertices as you want. If you don't access gl_Vertex, there is no way that uninitialized data can affect the shader. Rendering beyond the vertex buffer size is generally undefined behavior in OpenGL, but it has worked on every system I've tested this on so far.
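The shader from the download is not reproduced in this post, so purely as an illustration: a vertex shader can derive a circle from gl_VertexID alone. The sketch below emulates such a mapping on the CPU in C++. The function name `circleVertex`, the segment count and the triangle-fan layout (vertex 0 at the centre) are assumptions, not the code from the download.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Emulates what a vertex shader could compute from gl_VertexID alone when a
// circle is drawn as a triangle fan: ID 0 is the centre, IDs 1..segments+1
// walk once around the rim (the last one closes the fan).
Vec2 circleVertex(int vertexId, int segments, float radius) {
    assert(segments >= 3 && vertexId >= 0);
    if (vertexId == 0)
        return {0.0f, 0.0f};
    const float twoPi = 6.28318530718f;
    const float angle = twoPi * float(vertexId - 1) / float(segments);
    return {radius * std::cos(angle), radius * std::sin(angle)};
}
```

With the dummy vertex buffer bound, a fan like this would be drawn with something like glDrawArrays( GL_TRIANGLE_FAN, 0, segments + 2 ), and no real vertex data is ever read.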

Drawing a circle without accessing gl_Vertex

You can download the source code here.

Packing Vertex Positions into the Elements Array

The next step is to start packing and unpacking data in the gl_VertexID. For this an integer type and bit operations (shifting and masking at least) are required in the vertex shader, so it requires GLSL 1.30 at least.

The shader from my proof-of-concept project is quite short, so I'm pasting it here:

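#version 130
#extension GL_EXT_gpu_shader4 : enable

out vec4 color;

// Unpacks three 8-bit components from the index and maps them to [-1, 1].
vec3 unpackVertex(int index) {
    return vec3( index & 0xFF, (index >> 8) & 0xFF, (index >> 16) & 0xFF ) * (2 / 255.0) - vec3(1.0);
}

void main()
{
    // gl_VertexID holds the packed position, not a real index into a vertex array.
    vec3 unpackedData = unpackVertex( gl_VertexID );
    gl_Position = vec4( unpackedData, 1.0 );
    // Map the position back to [0, 1] to use it as a color.
    color = vec4( (unpackedData + 1.0) / 2.0, 1.0 );
}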

In main.cpp the equivalent can be found for setting up the elements array:

#define packFloat(v)      (int(((v) + 1.0) / 2 * 255) & 255)
#define packVertex(x,y,z) (packFloat( x ) + (packFloat( y ) << 8) + (packFloat( z ) << 16))

void display(void) {
    [...]

    unsigned indices[] = {
        packVertex( 0.0, 0.0, -1.0 ), packVertex( 1.0, 0.0, -1.0 ), packVertex( 1.0, 1.0, -1.0 ),
        packVertex( 0.0, 0.0, -1.0 ), packVertex( -1.0, 0.0, -1.0 ), packVertex( -1.0, -1.0, -1.0 )
    };
    glDrawElements( GL_TRIANGLES, sizeof( indices ) / sizeof( *indices ), GL_UNSIGNED_INT, indices );

    [...]
}

For the code to actually make sense I should initialize an index buffer/elements array buffer in OpenGL and upload the indices into it but this was just for testing.
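To check that the packing macros and the shader agree, the round trip can be emulated on the CPU. The following is a sketch only: `packComponent`, `packVertex` and `unpackVertex` are C++ stand-ins I wrote for the macros and the GLSL function above, and `Vec3` is a hypothetical helper type.

```cpp
#include <cassert>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Same quantization as the packFloat macro: [-1, 1] -> [0, 255], truncating.
std::uint32_t packComponent(float v) {
    return static_cast<std::uint32_t>(static_cast<int>((v + 1.0f) / 2.0f * 255.0f) & 255);
}

// Same layout as the packVertex macro: x in bits 0-7, y in 8-15, z in 16-23.
std::uint32_t packVertex(float x, float y, float z) {
    return packComponent(x) | (packComponent(y) << 8) | (packComponent(z) << 16);
}

// CPU mirror of unpackVertex() from the vertex shader.
Vec3 unpackVertex(std::uint32_t index) {
    const float scale = 2.0f / 255.0f;
    return { float(index & 0xFFu) * scale - 1.0f,
             float((index >> 8) & 0xFFu) * scale - 1.0f,
             float((index >> 16) & 0xFFu) * scale - 1.0f };
}
```

Note that the round trip is lossy: with 8 bits per component the step size is 2/255, and because the macro truncates, an input of 0.0 comes back as roughly -0.004 rather than exactly 0.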

Drawing two triangles by packing their position into the index buffer

You can download the source code here.

Note:
It has been brought to my attention (thanks Yagero) that on NVIDIA cards the driver currently reduces the element array data size from int to short when the indices are passed from client memory. This cuts off everything above the lower 16 bits, so the third byte (i.e. the z values in this packing scheme) is always 0.
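Why z is the component that gets lost can be shown with a few lines of C++. This is only a model of the reported behavior, not driver code; `zByte`, `truncateToShort` and `checkZIsLost` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Byte that holds z in the original packing scheme (bits 16-23).
std::uint32_t zByte(std::uint32_t packedIndex) {
    return (packedIndex >> 16) & 0xFFu;
}

// What remains of an index after it is narrowed to 16 bits and widened again.
std::uint32_t truncateToShort(std::uint32_t packedIndex) {
    return static_cast<std::uint16_t>(packedIndex);
}

// Small self-check of the claim in the note above.
void checkZIsLost() {
    const std::uint32_t packed = 0x00FF7FFFu; // x = 0xFF, y = 0x7F, z = 0xFF
    assert(zByte(packed) == 0xFFu);
    assert(zByte(truncateToShort(packed)) == 0u);
}
```

The x and y bytes sit in the lower 16 bits and survive the narrowing, which matches the observation that only z is affected.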

A fix is to use an element array buffer (index buffer) which prevents the driver from messing with the data size.
You can download the adapted source code here. To show that it works, I have swapped the x and z component in the packing scheme.

Weird Things Happen

Today I was at work for the first time since August, and nobody seems to have logged into my shared workstation in the meantime, because my account name was still in the login field.
The first weird thing that happened was that suddenly the application I've been working on didn't run correctly anymore.
First let me tell you a bit about the app:
It renders a volume texture with some custom raytracing shaders in one window and allows editing of the transfer function in another window. The transfer function is a function (stored as a texture) that maps a scalar value in the volume texture to a color. It is made up of primitives/shapes that you can colorize and stretch or deform the way you want to allow for easy editing. Look at this picture to see what a transfer function (editor) could look like - this isn't ours though.
The TF editor uses an FBO to render the transfer function into the transfer texture.

While the rest of the app was running mostly fine (that is, AntTweakBar rendered fine, and so did the background of the transfer function editor, which is a histogram of the volume texture), the actual transfer function rendering was corrupted.
On my workstation you could move the primitive around, and while you did so it would either totally disappear, invert its color or show small light green or purple blocks that stayed stationary inside the window as you moved the primitive around - the whole thing looked like corrupted memory that stayed corrupted and was only visible when the primitive rendered over it.

It's a pity I didn't take any screenshots but I didn't think of that back then.

I was totally at a loss with this behavior, so I rebooted, which didn't fix it. Somehow I was hoping that the gfx card had broken during the last month and that it wasn't my code which caused this totally inexplicable behavior.
To determine this I logged into another workstation in our lab, checked out my code and ran the program. To my horror it had similar problems... not quite the same, but still strangely colored sprinkles all over the editor window when the primitive was rendered there. Both workstations are using graphics cards from Nvidia (8800 GTX if I recall correctly).

I tried to determine the revision which broke rendering with a binary-search-like trial and error through our revision history at work and found out that it broke when I changed the way the primitives were rendered to the texture.

It used to simply iterate through all primitives and render them once to the FBO and it was done.

The paper I'm implementing at the moment required a special solution for overlapping primitives though, so I switched to a more complicated 2.5-pass rendering.

If $latex c_i $ is the color value of the ith primitive and $latex \alpha _ i $ its alpha value, then the resulting color and alpha are calculated as follows: $latex \left< c, \alpha \right> = \left< \frac{\displaystyle\sum_i^n{c_i \alpha_i}}{\displaystyle\sum_{i}^n{\alpha_i}},\max_{i=1..n}{\alpha_i} \right>$.

For this I do three passes and use two floating point textures:

  1. Render the primitives into a temp texture with
    glBlendFuncSeparate( GL_SRC_ALPHA, GL_ONE, GL_ONE, GL_ONE );

    to get $latex \left< {\displaystyle\sum_i^n{c_i \alpha_i}},{\displaystyle\sum_{i}^n{\alpha_i}} \right> $.

  2. Next I set the FBO to the real transfer function texture and use the temp texture as source and divide the color values by their alpha using a custom shader.
  3. Finally I use
    glBlendEquation( GL_MAX );

    and mask out all channels except alpha, then render all primitives again into the transfer function texture. This gets the correct term into the alpha channel of the texture.
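As a reference for what the three passes should produce per texel, the formula can also be evaluated on the CPU. This is a minimal sketch under my own assumptions: `Rgba` and `combine` are hypothetical names, and returning transparent black when no primitive covers the texel is a choice made here, not something the paper or our shader prescribes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rgba { float r, g, b, a; };

// Combines all primitives that overlap at one texel:
// color = sum(c_i * a_i) / sum(a_i), alpha = max(a_i).
Rgba combine(const std::vector<Rgba>& primitives) {
    Rgba sum{0.0f, 0.0f, 0.0f, 0.0f}; // pass 1: additive blending into the temp texture
    float maxAlpha = 0.0f;              // pass 3: GL_MAX on the alpha channel
    for (const Rgba& p : primitives) {
        assert(p.a >= 0.0f);
        sum.r += p.r * p.a;
        sum.g += p.g * p.a;
        sum.b += p.b * p.a;
        sum.a += p.a;
        maxAlpha = std::max(maxAlpha, p.a);
    }
    if (sum.a <= 0.0f)
        return {0.0f, 0.0f, 0.0f, 0.0f}; // assumption: uncovered texels stay transparent black
    // pass 2: divide the accumulated color by the accumulated alpha
    return {sum.r / sum.a, sum.g / sum.a, sum.b / sum.a, maxAlpha};
}
```

For example, a red and a blue primitive that both have alpha 0.5 combine to the color (0.5, 0, 0.5) with alpha 0.5.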

This usually works, and although it's a bit cumbersome, it yields the correct result. Only now I had identified it as the source of all my trouble.

Here's the relevant part in our source code:

_checkGLErrors();
m_fbo.AttachTexture( GL_COLOR_ATTACHMENT0_EXT, m_tempTexture.glTarget, m_tempTexture.glTexID );
assert( m_fbo.IsValid() );
m_fbo.Bind();
_checkGLErrors();

glClearColor( 0.0, 0.0, 0.0, 0.0 );
glClear( GL_COLOR_BUFFER_BIT );

// C = srcC * srcA + dstC = C * A + C' * A' + ...
// A = srcA + dstA = A + A' + A'' + ...
glBlendFuncSeparate( GL_SRC_ALPHA, GL_ONE, GL_ONE, GL_ONE );
glBlendEquationSeparate( GL_FUNC_ADD, GL_FUNC_ADD );

for( Primitive::PList::const_iterator i = m_primitives.begin() ; i != m_primitives.end() ; i++ ) {
    (*i)->bake();
}

m_fbo.Unattach( GL_COLOR_ATTACHMENT0_EXT );
m_fbo.Disable();
_checkGLErrors();

// divide color by alpha
glMatrixMode( GL_PROJECTION );
glLoadIdentity();

glBlendFunc( GL_ONE, GL_ZERO );

procInf.imageOp( m_tempTexture, m_transferFunction, m_program );

// render again this time to figure out the max alpha value

glMatrixMode( GL_PROJECTION );
glLoadIdentity();
glOrtho( 0.0, 1.0, 0.0, 1.0, -1.0, 1.0 );

m_fbo.AttachTexture( GL_COLOR_ATTACHMENT0_EXT, m_transferFunction.glTarget, m_transferFunction.glTexID );
assert( m_fbo.IsValid() );
m_fbo.Bind();
_checkGLErrors();

glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_TRUE );
glBlendEquation( GL_MAX );

for( Primitive::PList::const_iterator i = m_primitives.begin() ; i != m_primitives.end() ; i++ ) {
    (*i)->bake();
}

Now I was ready to hunt the bug, but the problem was that the revision had been written more than a few days before I left in August, and it was working back then. I refused to believe that it was broken, so I went to my boss, who had been using his rig in the last few days, and indeed it worked there. I couldn't really believe my eyes. Two workstations appeared to be broken for some reason (driver issues or even the gfx cards? - who would know..).

I went back and somehow managed to change the appearance of the bug on one of the workstations - all the framebuffer corruptions went away and were replaced with what seemed like an unchanging alpha channel of one of the FBO textures with alpha being 0 on the initial position of the primitive and 1.0 everywhere else, so it masked it out at its initial position.

I suspected that using glBlendFuncSeparate together with the normal glBlendFunc might leave OGL's state machine in a mess, so I added calls to disable and enable blending, which makes the call to glBlendFunc irrelevant (compare the code above with the version below; the added lines are marked with arrows).

_checkGLErrors();
m_fbo.AttachTexture( GL_COLOR_ATTACHMENT0_EXT, m_tempTexture.glTarget, m_tempTexture.glTexID );
assert( m_fbo.IsValid() );
m_fbo.Bind();
_checkGLErrors();

glClearColor( 0.0, 0.0, 0.0, 0.0 );
glClear( GL_COLOR_BUFFER_BIT );

// C = srcC * srcA + dstC = C * A + C' * A' + ...
// A = srcA + dstA = A + A' + A'' + ...
glBlendFuncSeparate( GL_SRC_ALPHA, GL_ONE, GL_ONE, GL_ONE );
glBlendEquationSeparate( GL_FUNC_ADD, GL_FUNC_ADD );

for( Primitive::PList::const_iterator i = m_primitives.begin() ; i != m_primitives.end() ; i++ ) {
    (*i)->bake();
}

m_fbo.Unattach( GL_COLOR_ATTACHMENT0_EXT );
m_fbo.Disable();
_checkGLErrors();

// divide color by alpha
glMatrixMode( GL_PROJECTION );
glLoadIdentity();

---> glDisable(GL_BLEND); <---
glBlendFunc( GL_ONE, GL_ZERO );

procInf.imageOp( m_tempTexture, m_transferFunction, m_program );

// render again this time to figure out the max alpha value

glMatrixMode( GL_PROJECTION );
glLoadIdentity();
glOrtho( 0.0, 1.0, 0.0, 1.0, -1.0, 1.0 );

m_fbo.AttachTexture( GL_COLOR_ATTACHMENT0_EXT, m_transferFunction.glTarget, m_transferFunction.glTexID );
assert( m_fbo.IsValid() );
m_fbo.Bind();
_checkGLErrors();

---> glEnable( GL_BLEND ); <---
glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_TRUE );
glBlendEquation( GL_MAX );

for( Primitive::PList::const_iterator i = m_primitives.begin() ; i != m_primitives.end() ; i++ ) {
    (*i)->bake();
}

m_fbo.Disable();

When I ran the program, it decided to work normally and render everything the way it should. I was stunned once again.
Now the good thing about causality is that everything is supposed to be repeatable, so I removed the statements again, expecting the bug to reappear - but guess what: it did not, and suddenly the machine was working again as it was supposed to.

I tried the same fix with the other workstation in our lab and it fixed it before I had time to take some screenshots of the persisting framebuffer corruption.

I'm still at a loss here and I guess I won't be able to come up with an explanation of my own for this, so if anyone has an idea, feel free to tell me :-)

Cheers,
Andreas