You know, I always wondered why people have trouble understanding why float comparisons need epsilon tests: float calculations lose precision, so two floats that ought to be equal often aren’t equal bit-wise, and you can’t simply use == to compare them. Instead you use an epsilon test and only check whether the two values are almost the same (there are many techniques for this, though).
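Just to make the idea concrete, here is a minimal sketch of one such technique, a combined relative/absolute tolerance check. The names nearlyEqual and sumSteps and the default tolerances are made up for illustration; sensible tolerances depend on the magnitudes you are working with.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true if a and b differ by at most absEps,
// or by at most relEps relative to the larger magnitude.
// The default tolerances are assumptions, not recommendations.
bool nearlyEqual(float a, float b, float relEps = 1e-5f, float absEps = 1e-8f) {
    assert(relEps >= 0.0f && absEps >= 0.0f);
    const float diff = std::fabs(a - b);
    if (diff <= absEps) {
        return true;  // handles values close to zero
    }
    const float largest = std::max(std::fabs(a), std::fabs(b));
    return diff <= largest * relEps;
}

// Adds `step` to a running sum n times, accumulating rounding error.
float sumSteps(float step, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += step;
    }
    return sum;
}
```

Something like nearlyEqual(sumSteps(0.1f, 10), 1.0f) then accepts a sum that has drifted slightly away from 1.0f, where a plain == may not.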

I think this concept is easily understandable, but what about comparisons for floats that are bit-wise equal?

Yes, yes, yes! No, no, no!

So let’s take a look at this code:

// Update Current Time.
mCurrentTime += elapsedTime;

// Check if the animation has finished.
if (!mConfigDataBlock->mAnimationCycle && mCurrentTime >= mTotalIntegrationTime) {
    // Animation has finished.
    mAnimationFinished = true;

    // Are we restoring the animation?
    if (mAutoRestoreAnimation) {
        // Yes, so play last animation.
        playAnimation(mLastAnimationName, false);
    } else {
        // No, so fix Animation at end of frames.
        mCurrentTime = mTotalIntegrationTime - (mFrameIntegrationTime * 0.5);
    }
}

// Update Current Mod Time.
mCurrentModTime = mFmod(mCurrentTime, mTotalIntegrationTime);

// Calculate Current Frame.
mCurrentFrameIndex = (S32)(mCurrentModTime / mFrameIntegrationTime);

This is from TGB’s t2dAnimationController::updateAnimation function. I had a very weird bug: an animation with 5 frames (indexed from 0) that shouldn’t cycle ran through the following frames: start/0 -> 1 -> 2 -> 3 -> 0 -> 4/end

Just so you know some initial conditions:

  • mConfigDataBlock->mAnimationCycle = false, so !mConfigDataBlock->mAnimationCycle = true

  • mAutoRestoreAnimation = false

So obviously the if-block doesn’t get executed for some unknown reason, but the mFmod call still wraps the time around and the index jumps to 0 for one frame. Then mCurrentTime increases even more, the if-block gets executed and the time is reset to a value that makes it jump to the last frame (i.e. it doesn’t wrap now), and then the animation ends correctly.
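To see why the index drops back to 0, here is a small stand-in for the last two statements of the excerpt. frameIndex is a hypothetical function, not TGB code, and the timings in the note below (5 frames of 0.5 seconds each, 2.5 seconds total) are assumed values chosen because they are exactly representable as floats.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the frame lookup in the excerpt: wrap the
// time into [0, totalTime) and divide by the per-frame duration.
int frameIndex(float currentTime, float totalTime, float frameTime) {
    assert(totalTime > 0.0f && frameTime > 0.0f);
    const float modTime = std::fmod(currentTime, totalTime);
    return static_cast<int>(modTime / frameTime);
}
```

With those assumed timings, frameIndex(1.75f, 2.5f, 0.5f) is 3, frameIndex(2.5f, 2.5f, 0.5f) wraps to 0 because the finished check was skipped, and the clamped time 2.5f - 0.5f * 0.5f = 2.25f gives frame 4, which matches the 3 -> 0 -> 4 jump.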

Now this sounds easy to fix and it is. Simply add an _epsilon test_ and everything works correctly. But to make sure that I’d actually fixed the bug for the right reason, I did some more test runs with the original code and echoed the values of the floats to determine when the bug occurred.
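Roughly, such a check could look like the sketch below. This is not the exact change made to TGB: hasFinished is a hypothetical helper and the default epsilon is an assumed value that would have to be tuned to the time scale of the animation.

```cpp
#include <cassert>

// Hypothetical helper: treats the animation as finished once the
// current time is within `epsilon` of the total time.
// The default epsilon is an assumption, not a value taken from TGB.
bool hasFinished(float currentTime, float totalTime, float epsilon = 1e-4f) {
    assert(epsilon >= 0.0f);
    return currentTime >= totalTime - epsilon;
}
```

In the excerpt this would stand in for the mCurrentTime >= mTotalIntegrationTime half of the condition. The price is that the animation may be declared finished up to epsilon early, which is harmless as long as epsilon is much smaller than one frame.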

The first little shock was: it echoed: "3.00000 >= 3.00000".

But this is not bad, is it? It’s simply the precision error in the float that causes it. The printing algorithm rounds it to a certain decimal for which both values are equal but actually they aren’t, right?

Well, so I changed the format string to "%f %f %u %u" and did a pointer conversion to output the integer representation of the two floats, too. They ought to be different now. But the output now was: "3.00000 3.00000 1050253722 1050253722"
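For reference, here is a sketch of how that kind of output can be produced. It uses memcpy instead of the pointer conversion I used, since that avoids aliasing problems; floatBits and echoComparison are made-up names, and the code assumes a 32-bit float.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Returns the raw bit pattern of a float as an unsigned 32-bit integer.
std::uint32_t floatBits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Prints both values followed by their integer representations.
void echoComparison(float a, float b) {
    std::printf("%f %f %u %u\n", a, b,
                static_cast<unsigned>(floatBits(a)),
                static_cast<unsigned>(floatBits(b)));
}
```

If the two integers match, the two floats are identical in memory, whatever the decimal output looks like.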

Which was very bad for my state of mind. I actually started the debugger and stepped through the function in the disassembler and somehow the greater than or equal comparison would fail to recognize bit-wise equal floats as equal!

The interesting thing was that when I added an additional float comparison check a few lines above, the bug wouldn’t happen anymore, so I’d found a Heisenbug. It gets even nastier: I don’t know if you know any assembly, but maybe you have heard of the asm op FINIT, which resets the FPU state and all registers. Now if I add an __asm { FINIT } right before the if-check, the bug disappears, too.

So it seems it’s an FPU bug. Some calculations that TGB performs confuse the FPU, and it isn’t in a valid state when the float comparison happens.

I’ve also talked to other coders, and the general consensus was that you shouldn’t really trust the FPU too much and should always use epsilon tests (that is, some redundancy) just to be on the safe side, because you can’t really foresee the state of the FPU at any given time.

TTimo has also mentioned that the FPU state can be corrupted by other tasks and across task switches, too, so all I’m really certain of now is that this is some kind of Pandora’s box that I don’t want to open ever again…

So to sum it up: always use epsilon checks, and don’t ask too many questions when your code breaks without them (if that’s possible), because you might not like what you find out.

Anyway, stay tuned,
Black

PS: I’ve moved to Munich because of university and I’m only at home on the weekends and only then I have continuous internet access, so updates might become rare again.