
In this post I describe the process of tracking down a strange inconsistency when passing large integer values as floats to a fragment shader in GLSL.

An OpenGL triangle with a red, a green and a blue vertex.

This is a new topic for this blog, but something there might be more of in the future. I’m not a shader pro, but I have spent some time on shaders while debugging an issue on a project that pushes WebGL to its limits. We are talking millions of nodes, streaming, grid systems, octrees, preprocessing and so on, to display CAD models online.

Pixel Picking 🟥🟩🟦

One of the most important things in such a project is being able to identify individual parts. But WebGL has no chance of handling millions of individually interactable parts. To solve this, the project uses a method known as “Pixel Picking” (or “Color Picking”) so that you can click anything in the 3D model and get back its ID.

Every vertex in the mesh is assigned an ID. When the mouse is clicked we (in the background) draw every ID as a color, check which color is under the mouse coordinates, translate it back to a number, and then we know which part was clicked.

A normal Red Green Blue (RGB) color has 256 different values (0 to 255) in each channel. If we represent ID 1 as R0 G0 B1 we have coded ID 1 into a blackish color! ID 256 would be R0 G1 B0, and so on. This gives us a total of 256^3 == 16 777 216 different colors! We could never tell them apart by sight, but the computer has no doubt!

The Problem

When IDs exceeded 2 million we noticed something strange. Some picks seemed to return the wrong value, but not consistently.

We dug deeper and noticed that the problem grew worse for larger IDs. Above 10 000 000 it was almost random whether the correct ID or one of its neighbours was returned.

In addition, the issue did not occur on all computers. Some Intel machines worked fine for all IDs in our range, while NVIDIA and Apple devices were less accurate.

The diagnostics

Our (veeery simplified) shader looked like this:

// vert.glsl
in highp float a_nodeId; // <- the input from the "Vertex" (a vertex is a point in a 3D model)
// highp denotes that we need a high precision float. Known as Float32 in some languages.

out highp float v_nodeId; // the Node Id as output

void main() {
  v_nodeId = a_nodeId;
}

// frag.glsl
in highp float v_nodeId; // the same Node Id as above, sent to the fragment shader
out vec4 f_color; // the final color displayed on screen

void main() {
  f_color = idToColor(v_nodeId);
}
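The idToColor function itself is not shown in this post. A minimal sketch of such a packing, splitting the ID into three base-256 digits with one digit per color channel, might look like this (the exact implementation in the project may differ):

// Sketch: pack the integer ID into the RGB channels, normalised to 0.0 to 1.0.
vec4 idToColor(highp float id) {
  highp float r = floor(id / 65536.0);             // 256 * 256
  highp float g = floor(mod(id, 65536.0) / 256.0);
  highp float b = mod(id, 256.0);
  return vec4(r, g, b, 255.0) / 255.0;             // e.g. ID 1 -> (0, 0, 1/255, 1)
}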

Somewhere along the way, the nodeId loses precision.

There were several hypotheses, but the following one stuck around:

Is the precision of WebGL floats enough to support integer values over 2 million?

This was investigated. While floating-point precision is up to the implementation, the lower bound for the highp qualifier is a 32-bit float.

This should allow us to represent 24 bits of sequential whole integer values, equal to a max of 256^3 == 16 777 216. The same amount we get from the colors!

But at numbers this high, floats can no longer represent anything between whole integers. Float32 has no way of representing a value between 8 400 000 and 8 400 001.
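For intuition, here is how the gap between neighbouring representable float32 values grows. This is general IEEE 754 behaviour, not anything WebGL-specific:

// float32: 23 stored mantissa bits + 1 implicit bit = 24 bits of integer precision.
// The gap between neighbouring representable values doubles at every power of two:
//   [2^22, 2^23) == [ 4 194 304,  8 388 608)  ->  gap 0.5
//   [2^23, 2^24) == [ 8 388 608, 16 777 216)  ->  gap 1.0  (only whole integers left)
//   [2^24, 2^25) == [16 777 216, 33 554 432)  ->  gap 2.0  (odd whole integers are lost)

So 8 400 000 and 8 400 001 are both exactly representable, but nothing in between them is.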

But the code above does no math? And this somehow works on some machines?

The debugging

Ok, so how do you debug a shader? You will not get any printf or similar. You just return colors, and hope for the best.

And what we noticed was that the float received in the fragment shader varied by up to 1 across a triangle whose three vertices all carried the same ID of around 3 million. If we hard-coded the value in the fragment shader instead, it was always correct.
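To illustrate the approach, here is a hypothetical debug fragment shader along those lines; the hard-coded 3 000 000 is only an example ID, not a value from the project:

// frag.glsl (debug variant): paint the pixel green if the interpolated ID still
// matches the value we expect, and red if it has drifted. Our stand-in for printf.
in highp float v_nodeId;
out vec4 f_color;

void main() {
  highp float expected = 3000000.0;     // hard-coded test ID
  f_color = abs(v_nodeId - expected) < 0.5
      ? vec4(0.0, 1.0, 0.0, 1.0)        // green: still the right integer
      : vec4(1.0, 0.0, 0.0, 1.0);       // red: precision was lost
}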

What could be happening?

If you go back to the top of this article you will find a nice triangle with three corners: one red, one blue and one green. These are vertices. For every pixel between them, the fragment shader automatically receives interpolated values. What seems to be happening is that the interpolation never checks whether the three high-valued floats are identical. The IDs are multiplied by interpolation weights in a way that introduces inaccuracies. Those weights also include the depth from the camera, which explains why the same triangle randomly returned wrong values. The Intel implementation may check that all three values are equal and skip the math, which would explain why it worked there.
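Roughly, for every fragment the hardware computes something like the following (perspective-correct barycentric interpolation; the exact formulation is up to the implementation, this is only a sketch of where the rounding sneaks in):

// b0, b1 and b2 are the fragment's barycentric weights, each divided by the
// clip-space w of its vertex and renormalised, which is where the camera depth enters.
// Even when v0 == v1 == v2, the weighted sum is not guaranteed to round back to the
// exact input once the values are so large that one ulp is about 1.0.
v_nodeId = b0 * v0 + b1 * v1 + b2 * v2;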

What is the fix?

// vert.glsl: add the `flat` qualifier to the output
flat out highp float v_nodeId;
// frag.glsl: the matching input must be declared flat as well
flat in highp float v_nodeId;

The flat qualifier removes interpolation, so every fragment gets the float from the last (provoking) vertex. In the image above that would leave us with a totally blue triangle. Everything we need, right? End of story!

No. flat breaks performance on every Apple Metal device, so we cannot use this fix!

Why? Short story: Apple Metal devices have to emulate the WebGL spec’s behaviour for flat, and that is super slow. Let’s hope EXT_provoking_vertex is merged soon.

The current fix is to split the float in two, to avoid large values, and add the parts back together at a later point. Thousands and sub-thousands, for instance, is a quick and dirty hack that gets the right result in our circumstances! Better solutions are possible, but this works fast enough for now.
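A minimal sketch of that idea, assuming the two halves are recombined in the fragment shader; the attribute names and the exact split are mine, not the project’s actual code:

// vert.glsl: the CPU uploads floor(id / 1000) and mod(id, 1000) as two small attributes.
in highp float a_nodeIdThousands;  // e.g. 3 141 592 -> 3141.0
in highp float a_nodeIdRest;       // e.g. 3 141 592 ->  592.0
out highp float v_nodeIdThousands;
out highp float v_nodeIdRest;

void main() {
  v_nodeIdThousands = a_nodeIdThousands;
  v_nodeIdRest = a_nodeIdRest;
}

// frag.glsl: both parts stay small, so any interpolation noise is far below 0.5,
// and rounding recovers the exact integers before the ID is rebuilt.
in highp float v_nodeIdThousands;
in highp float v_nodeIdRest;
out vec4 f_color;

void main() {
  highp float nodeId = floor(v_nodeIdThousands + 0.5) * 1000.0 + floor(v_nodeIdRest + 0.5);
  f_color = idToColor(nodeId);
}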

That’s all for this time. 🔺

// Nils Henrik


Code Snippets are licensed under MIT No Attribution
