Classification and Manipulation Functions for Floating-Point Numbers in C

Key Takeaways
  • The C standard provides macros to classify floating-point numbers into categories such as normal, subnormal, infinite and NaN.
  • Classification macros include fpclassify, isfinite, isinf, isnan, isnormal and signbit.
  • The C standard also offers functions to manipulate floating-point numbers at a low level, such as copysign, scalbn, nextafter and nexttoward.
  • Manipulation functions allow copying the sign, multiplying by powers of two and finding the next representable number after a given floating-point number in a chosen direction.

Classification Macros for Floating-Point Numbers

The C standard provides several macros to classify floating-point numbers. These macros are defined in the <math.h> header.

The purpose of these function-like macros is to determine whether a floating-point number is a normal number or represents a special value such as zero, a subnormal number, infinity or NaN (Not a Number).

These macros take as input an argument of type double, float or long double and return an integer value. In the prototypes below, the word real is not an actual C type: it is a placeholder for any of these three floating-point types.

These macros are:

  • int fpclassify(real x);

    This macro takes as input a floating-point number x of any type (double, float or long double) and returns an integer value indicating the classification of the number. The possible return values are:

    • FP_INFINITE: if x represents positive or negative infinity.
    • FP_NAN: if x represents a NaN (Not a Number) value.
    • FP_NORMAL: if x is a normal number.
    • FP_SUBNORMAL: if x is a subnormal (denormalized) number.
    • FP_ZERO: if x is zero (positive or negative).

  • int isfinite(real x);

    This macro returns a non-zero value (true) if the floating-point number x is finite (that is, it is neither infinite nor NaN), otherwise it returns zero (false).

  • int isinf(real x);

    This macro returns a non-zero value (true) if the floating-point number x represents positive or negative infinity, otherwise it returns zero (false).

  • int isnan(real x);

    This macro returns a non-zero value (true) if the floating-point number x represents a NaN (Not a Number) value, otherwise it returns zero (false).

  • int isnormal(real x);

    This macro returns a non-zero value (true) if the floating-point number x is a normal number (that is, it is neither zero, subnormal, infinite nor NaN), otherwise it returns zero (false).

  • int signbit(real x);

    This last macro is slightly different from the others. It returns a non-zero value (true) if the sign bit of the floating-point number x is set, otherwise it returns zero (false). Unlike the comparison x < 0, it inspects the sign bit itself, so it can also be used to determine the sign of special values such as zero, infinity and NaN. For example, it reports negative zero as signed even though -0.0 < 0 is false.

These macros are useful whenever code must distinguish between normal, subnormal, infinite and NaN values, for example to reject invalid inputs or to detect an overflow before a result is used in further calculations.

Manipulation Functions for Floating-Point Numbers

In addition to classification macros, the C standard also provides several functions to manipulate floating-point numbers.

Here, "manipulating" a floating-point number means operating directly on the components of its representation (sign, exponent and mantissa) rather than performing ordinary arithmetic on its value.

Manipulation operations include:

  • Sign copy:

    The functions:

    • double copysign(double x, double y);
    • float copysignf(float x, float y);
    • long double copysignl(long double x, long double y);

    These functions return a floating-point number with the absolute value of x and the sign of y. In other words, the result will have the same magnitude as x, but its sign (positive or negative) will be that of y.

  • Multiplication by powers of two:

    The functions:

    • double scalbn(double x, int n);
    • float scalbnf(float x, int n);
    • long double scalbnl(long double x, int n);

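    These functions return the result of x multiplied by FLT_RADIX raised to the power of n, where FLT_RADIX is the base of the floating-point representation, defined in <float.h>. On almost all current systems FLT_RADIX is 2, so the result is x multiplied by 2 raised to the power of n. In practice, these functions modify the exponent of the floating-point number x by adding n to the current exponent.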

    For example, if x is 1.5 (which is represented as 1.5 × 2^0) and n is 2, the result will be 6.0 (which is represented as 1.5 × 2^2).

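    This operation can be more efficient than computing the power of two explicitly (for example with pow) and then multiplying, since it only needs to adjust the exponent of the floating-point number, which is stored in base 2. As long as the result neither overflows nor underflows, the scaling is also exact, because the mantissa is left unchanged.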

  • Representable number search functions:

    Even though floating-point numbers are used to represent real numbers, not all real numbers can be represented exactly in this format due to precision limitations. Indeed, an infinite quantity of real numbers cannot be represented exactly in floating-point.

    Since double, float and long double numbers have a limited number of bits to represent the mantissa and exponent, there exist only a finite number of values that can be represented exactly. Therefore, given a finite floating-point number, it is possible to identify the representable numbers adjacent to it, that is, the next representable value above it and the next one below it.

    In this sense, the C standard provides the following functions:

    • double nextafter(double x, double y);
    • float nextafterf(float x, float y);
    • long double nextafterl(long double x, long double y);
    • double nexttoward(double x, long double y);
    • float nexttowardf(float x, long double y);
    • long double nexttowardl(long double x, long double y);

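    The nextafter and nexttoward functions return the next representable floating-point number after x in the direction of y: the smallest representable value greater than x if y is greater than x, or the largest representable value smaller than x if y is smaller than x. If x is equal to y, y is returned (converted to the return type in the case of nexttoward).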

    The difference between the two families is that the nextafter functions take both arguments of the same type (for example, both double), while the nexttoward functions take a first argument of type double, float or long double and a second argument that is always of type long double. Because y is not narrowed to the type of x, the direction is determined correctly even when y differs from x by less than the precision of the type of x.