Floating-Point Comparison Functions in C

Key Takeaways
  • Floating-point numbers should be tested for equality using relative error, not ==, to avoid unexpected results due to rounding errors and limited precision.
  • The C language standard library provides specific macros to compare floating-point numbers safely, taking into account special cases like NaN and infinities.
  • The fmax, fmin, and fdim functions compute the maximum, minimum, and positive difference of two floating-point numbers respectively. When exactly one argument is NaN, fmax and fmin return the other argument, while fdim returns NaN.

The problem of comparing floating-point numbers

When working with integers, comparing two values is straightforward: just use the standard relational operators ==, !=, <, >, <=, and >=.

Although these operators also work with floating-point numbers (float, double, and long double types), direct comparison between these data types can lead to unexpected results due to the approximate nature of floating-point number representation.

For example, due to rounding errors and limited precision, two numbers that should be equal might not be exactly equal when represented in floating-point. This can cause problems when writing code like this:

double x;
double y;

// Suppose x and y are calculated somehow

if (x == y) {
    // Do something if x is equal to y
} else {
    // Do something else if x is not equal to y
}

The problem in this code is that x and y might not be exactly equal due to small differences caused by floating-point representation, even if mathematically they should be. Therefore the equality comparison x == y might return false even when you expect it to be true.

For this reason, a more reliable way to test two floating-point numbers for equality is to use relative error.

Mathematically, the relative error between two numbers a and b is defined as:

\text{err}_r = \frac{|a - b|}{|b|}

This value represents the ratio between the absolute error (that is, the absolute difference between a and b, |a - b|) and the magnitude of the reference value b, which must be non-zero. In simple terms, it tells us how much a deviates from b as a fraction of b; multiplied by 100, it gives the deviation as a percentage.

To compare two floating-point numbers x and y, you can calculate the relative error between them and check if this error is less than a predefined tolerance threshold, often called "epsilon". If the relative error is less than this threshold, the two numbers are considered "practically equal". So the code above can be rewritten this way:

#include <math.h> // for fabs

double x;
double y;

// Suppose x and y are calculated somehow, with y != 0

double epsilon = 1e-10; // Relative tolerance threshold

if (fabs(x - y) / fabs(y) < epsilon) {
    // Do something if x is practically equal to y
} else {
    // Do something else if x is not practically equal to y
}

To see the problem in practice, consider the following example:

#include <stdio.h>

int main() {
    double a = 0.1 + 0.2;
    double b = 0.3;

    if (a == b) {
        printf("a and b are equal\n");
    } else {
        printf("a and b are not equal\n");
    }

    return 0;
}

In this example, we are performing the sum:

a = 0.1 + 0.2

We then compare a with b, which is set to 0.3. Although mathematically a and b should be equal, on a typical IEEE 754 implementation the direct comparison a == b evaluates to false, because 0.1 + 0.2 rounds to a value slightly larger than the double closest to 0.3.

In fact, if you run this code, the output will be:

a and b are not equal

The reason for this behavior lies in the fact that 0.1, 0.2, and 0.3 cannot be represented exactly in binary, leading to small imprecisions in calculations.

Therefore, to correctly compare a and b, you should use relative error as described previously:

#include <stdio.h>
#include <math.h>

int main() {
    double a = 0.1 + 0.2;
    double b = 0.3;
    double epsilon = 1e-10; // Tolerance threshold

    if (fabs(a - b) / fabs(b) < epsilon) {
        printf("a and b are practically equal\n");
    } else {
        printf("a and b are not practically equal\n");
    }

    return 0;
}

Note

When testing for equality, compare floating-point numbers using relative error

Because floating-point arithmetic introduces rounding errors and has limited precision, equality tests should use relative error rather than the == and != operators. This approach avoids unexpected results and makes comparisons more reliable:

\frac{|a - b|}{|b|} < \epsilon

where \epsilon is a predefined tolerance threshold.

The C standard library has no ready-made tolerance-based comparison, but it does provide macros and functions, declared in the <math.h> header, that compare floating-point numbers while correctly handling special values such as NaN and infinities.

Let's see them in detail.

Comparison macros for floating-point numbers

The C language standard library defines several macros to compare floating-point numbers safely. These macros are designed to handle the peculiarities of floating-point numbers, such as NaN (Not a Number) and infinities.

These function-like macros are:

Macro                                Description
int isgreater(real x, real y);       Returns a non-zero value if x is greater than y.
int isgreaterequal(real x, real y);  Returns a non-zero value if x is greater than or equal to y.
int isless(real x, real y);          Returns a non-zero value if x is less than y.
int islessequal(real x, real y);     Returns a non-zero value if x is less than or equal to y.
int islessgreater(real x, real y);   Returns a non-zero value if x is less than or greater than y (that is, if neither is NaN and x and y are not equal).
int isunordered(real x, real y);     Returns a non-zero value if at least one of x and y is NaN (Not a Number).

Table 1: Macros for comparing floating-point numbers in C language

The first observation is that these macros accept arguments of any real floating-point type: float, double, or long double. The word real in the prototypes above is not an actual C type but a placeholder for one of these types, which the compiler deduces from the arguments passed. The result always has type int and is 0 (false) or non-zero (true), depending on the outcome of the comparison.

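The isgreater, isgreaterequal, isless, and islessequal macros return the same results as the relational operators >, >=, <, and <=. The difference lies in how NaN is handled: when an argument is NaN, the operators may raise the "invalid" floating-point exception, while the macros simply return 0 without raising it. Infinities are ordered as usual: negative infinity is less than every finite value and positive infinity is greater.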

The islessgreater macro checks whether two floating-point numbers are ordered and different from each other. It has two peculiarities:

  1. It correctly handles cases where one or both arguments are NaN, returning 0 (false) in such situations, whereas x != y evaluates to true.
  2. It applies no tolerance: the comparison is exact, equivalent to (x < y) || (x > y) with each argument evaluated only once. The constants DBL_EPSILON, FLT_EPSILON, and LDBL_EPSILON from <float.h> play no part in it, so a tolerance-based test still has to be written by hand.

The last macro, isunordered, checks whether at least one of the two floating-point numbers is NaN. This is important because every comparison with NaN using <, <=, >, >=, or == evaluates to false (only != evaluates to true), so this macro lets you detect and handle such situations explicitly.

Minimum, Maximum, and Positive Difference Functions

In addition to comparison macros, the C language standard library also provides functions to calculate the minimum, maximum, and positive difference between two floating-point numbers. These functions are also defined in the <math.h> header.

The functions are:

Function                                          Description
double fmax(double x, double y);                  Returns the maximum between x and y. If one of them is NaN, returns the other value.
float fmaxf(float x, float y);                    float version of fmax.
long double fmaxl(long double x, long double y);  long double version of fmax.
double fmin(double x, double y);                  Returns the minimum between x and y. If one of them is NaN, returns the other value.
float fminf(float x, float y);                    float version of fmin.
long double fminl(long double x, long double y);  long double version of fmin.
double fdim(double x, double y);                  Returns the positive difference between x and y, that is x - y if x > y, otherwise 0. If one of them is NaN, returns NaN.
float fdimf(float x, float y);                    float version of fdim.
long double fdiml(long double x, long double y);  long double version of fdim.

Table 2: Minimum, Maximum, and Positive Difference Functions for floating-point numbers in C language

Some observations about these functions:

  • Each function has three versions, one for each floating-point data type: float, double, and long double. The f suffix indicates the float version, while the l suffix indicates the long double version.
  • The fmax and fmin functions treat a NaN argument as missing data: if exactly one argument is NaN they return the other value, and the result is NaN only when both arguments are NaN. This is useful to avoid unexpected results when working with floating-point numbers that might be NaN.
  • The fdim function calculates the positive difference between two floating-point numbers, returning 0 if the first number is less than or equal to the second. Also in this case, if one of the arguments is NaN, the function returns NaN. Mathematically, fdim(x, y) can be expressed as:

    \text{fdim}(x, y) = \begin{cases} x - y & \text{if } x > y \\ +0 & \text{otherwise} \end{cases}