From Vertex Positions to Packed Element Arrays

At my new workplace at university I'm currently porting an advanced terrain rendering engine from DirectX to OpenGL.
One of the performance optimizations the engine uses is that it draws the terrain tiles right from the index buffer without using a vertex buffer at all - that is it packs the vertex position into the 32-bit index and unpacks it in the vertex shader.

Why is this faster than rendering using a vertex buffer and no index buffer?

When using an index buffer the graphics card can make use of a cache of already transformed (vertex-shaded) vertices and when an index is reused, it can use the cached result instead of running the vertex shader again. Of course, this only works if there exists a certain temporal locality, but that is given.
If no index buffer is used, the vertex cache won't be used, because the implicit index is different for each vertex.

Since I need to port the engine from DirectX to OpenGL, I did some research to see if it is possible to do the same in OpenGL.
It's not really possible but you can achieve something quite similar in OpenGL 3.0.

Using the Input-Assembler Stage without Buffers (OpenGL)

This is meant as OpenGL analogon for Using the Input-Assembler Stage without Buffers (Direct3D 10).

I think the title is self-explanatory but for greater clarity let me rephrase it:
The aim is to render something without using vertex or index buffers, that is (in OpenGL speak) using neither vertex data nor an elements array to render something.
Instead the automatically supplied gl_VertexID attribute  (vertexId in DirectX) is used to determine the vertex the shader is currently processing.

The example in MSDN simply draws a triangle using vertexId:

VSOut VSmain(VSIn input)
{
VSOut output;

if (input.vertexId == 0)
output.pos = float4(0.0, 0.5, 0.5, 1.0);
else if (input.vertexId == 2)
output.pos = float4(0.5, -0.5, 0.5, 1.0);
else if (input.vertexId == 1)
output.pos = float4(-0.5, -0.5, 0.5, 1.0);

output.color = clamp(output.pos, 0, 1);

return output;
}

If you want to do the same thing in OpenGL, you have at least two problems:

  • OpenGL uses glVertex* as end marker for the specification of a single vertex and the array rendering commands (glDrawArrays, glDrawElements, etc.) all require an enabled vertex array.
  • gl_VertexID is only supplied if:
    • the vertex comes from a vertex array command that specifies a complete primitive (e.g. DrawArrays, DrawElements)
    • all enabled vertex arrays have non-zero buffer object bindings, and
    • the vertex does not come from a display list, even if the display list was compiled using DrawArrays / DrawElements with data sourced from buffer objects.

    (from GL_EXT_gpu_shader_4)

There is no way around these requirements, but what you can do is to create dummy vertex buffer with one element, bind it as vertex array and simply draw as many vertices as you want. If you don't access gl_Vertex there is no way that uninitialized data can affect the shader and although behavior is generally undefined in OpenGL, if you render beyond the vertex buffer size, it has worked so far that I've test this on.

Drawing a circle without accessing gl_Vertex

Drawing a circle without accessing gl_Vertex

You can download the source code here.

Packing Vertex Positions into the Elements Array

The next step is to start packing and unpacking data in the gl_VertexID. For this an integer type and bit operations (shifting and masking at least) are required in the vertex shader, so it requires GLSL 1.30 at least.

The code is quite short from my proof of concept project, so I'm pasting the shader here:

#version 130
#extension GL_EXT_gpu_shader4 : enable

out vec4 color;

vec3 unpackVertex(int index) {
return vec3( index & 0xFF, (index >> 8 ) & 0xFF, (index >> 16) & 0xFF ) * (2 / 255.0) - vec3(1.0);
}

void main()
{
vec3 unpackedData = unpackVertex( gl_VertexID );
gl_Position = vec4( unpackedData, 1.0 );
color = vec4( (unpackedData + 1.0) / 2.0, 1.0 );
}

In main.cpp the equivalent can be found for setting up the elements array:

#define packFloat(v)	(int(((v) + 1.0) / 2 * 255) & 255)
#define packVertex(x,y,z) (packFloat( x ) + (packFloat( y ) << 8 ) + (packFloat( z ) << 16))

void display(void) {
[...]

unsigned indices[] = {
packVertex( 0.0, 0.0, -1.0 ), packVertex( 1.0, 0.0, -1.0 ), packVertex( 1.0, 1.0, -1.0 ),
packVertex( 0.0, 0.0, -1.0 ), packVertex( -1.0, 0.0, -1.0 ), packVertex( -1.0, -1.0, -1.0 )
};
glDrawElements( GL_TRIANGLES, sizeof( indices ) / sizeof( *indices ), GL_UNSIGNED_INT, indices );

[...]
}

For the code to actually make sense I should initialize an index buffer/elements array buffer in OpenGL and upload the indices into it but this was just for testing.

Drawing two triangles by packing their position into the index buffer

Drawing two triangles by packing their position into the index buffer

You can download the source code here.

Note:
It has been brought to my attention (thanks Yagero), that on NVIDIA cards the driver currently reduces the element array data size from int to short, which causes the third byte (ie z values in this packing scheme) to always be 0.

A fix is to use an element array buffer (index buffer) which prevents the driver from messing with the data size.
You can download the adapted source code here. To show that it works, I have swapped the x and z component in the packing scheme.

  1. I've tried this method, however it seems that attributes don't work. For instance a vec3 attribute. Or did you get this to work?

  2. I'm not exactly sure what you mean. The idea is not to use attributes but generate all data using the vertex id.

    Otherwise, sure attributes won't work, because you pack the vertex data into the index buffer, so additional attribute data would have to be at quite random positions in the additional vertex buffers (and these would have to be quite big, too - up to 4 GB ofc).

    Cheers,
    Andreas

  3. Thanks for this very interesting post. I tried it and I have an issue. In fact if I don't change anything in your source code, it will work on both ATI and NVIDIA GPU. But I have an issue on NVIDIA GPU: if I try to pack some data on unsigned int indices, I will be able to unpack only the first 16 bits of data.

    With your sample, it will work because the x and y coordinates are packed on the first 16 bits and are unpacked successfully. But the z coordinate is not unpacked successfully and will have wrong values (= 0.0) but nobody will see this issue because triangles are drawn even if z coordinate = 0.0.

    Try this with your sample: just switch y and z coordinates:

    #define packVertex(x,y,z) (packFloat( x ) + (packFloat( z ) << 8) + (packFloat( y ) << 16))


    vec3 unpackVertex(int index) {
    return vec3( index & 0xFF, (index >> 16) & 0xFF, (index >> 8) & 0xFF ) * (2 / 255.0) - vec3(1.0);
    }

    With an ATI GPU, it will work just fine. But with a NVIDIA GPU, nothing will be drawn because the y coordinate won't be unpacked successfully and will be equal to zero.

    It seems that gl_VertexID is stored like an unsigned short variable, even if your draw call uses GL_UNSIGNED_INT.

    Do you have the same issue? Do you know how to fix it?

    Thanks,
    Yagero

    • Hello Yagero,

      this is interesting and I was able to verify it on my own NVIDIA card, so I guess something is wrong with the driver here - maybe some optimization that automatically reduces the element array data size depending on the size of the vertex buffer when the data is sent to the GPU (when not using an element array buffer).

      I've looked into the issue for a bit and if you try and use an index buffer (element array buffer), it will work. I will update the article accordingly.

      If we go with the optimization hypothesis, this behavior makes sense, since the driver won't be able to 'optimize' anything when an element array buffer is used.

      So the only solution/fix I can recommend is to store the data in an element array buffer, which you would probably do anyway, and maybe report this to NVIDIA as a possible unwanted optimization bug in the driver.

      Cheers,
      Andreas

Leave a Comment


NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>