Classification and Manipulation Functions for Floating-Point Numbers in C
- The C standard provides macros to classify floating-point numbers into categories such as normal, subnormal, infinite and NaN.
- Classification macros include fpclassify, isfinite, isinf, isnan, isnormal and signbit.
- The C standard also offers functions to manipulate floating-point numbers at a low level, such as copysign, scalbn, nextafter and nexttoward.
- Manipulation functions allow copying the sign, multiplying by powers of two and finding the representable number adjacent to a given floating-point number.
Classification Macros for Floating-Point Numbers
The C standard provides several macros to classify floating-point numbers. These macros are defined in the <math.h> header.
The purpose of these type-generic macros is to determine whether a floating-point number is a normal number or represents a special value such as zero, a subnormal number, infinity or NaN (Not a Number).
They accept an argument of type float, double or long double and return an integer value indicating the classification of the number. In the signatures below, real is a placeholder for any of these three types; it is not an actual C type.
These macros are:
- int fpclassify(real x);
  This macro takes a floating-point number x of any real floating type (float, double or long double) and returns an integer value indicating its classification. The possible return values are:
  - FP_INFINITE: if x represents positive or negative infinity.
  - FP_NAN: if x represents a NaN (Not a Number) value.
  - FP_NORMAL: if x is a normal number.
  - FP_SUBNORMAL: if x is a subnormal (denormalized) number.
  - FP_ZERO: if x is zero (positive or negative).
- int isfinite(real x);
  This macro returns a non-zero value (true) if x is finite (that is, neither infinite nor NaN), otherwise it returns zero (false).
- int isinf(real x);
  This macro returns a non-zero value (true) if x represents positive or negative infinity, otherwise it returns zero (false).
- int isnan(real x);
  This macro returns a non-zero value (true) if x represents a NaN (Not a Number) value, otherwise it returns zero (false).
- int isnormal(real x);
  This macro returns a non-zero value (true) if x is a normal number (that is, neither zero, subnormal, infinite nor NaN), otherwise it returns zero (false).
- int signbit(real x);
  This last macro is slightly different from the others. It returns a non-zero value (true) if the sign bit of x is set, otherwise it returns zero (false). Unlike the comparison x < 0, it inspects the sign bit itself, so it can also report the sign of special values such as negative zero, infinities and NaN.
These macros are useful for correctly handling floating-point numbers in situations where it is necessary to distinguish between normal, subnormal, infinite and NaN numbers, thus ensuring predictable and safe behavior in mathematical operations.
Manipulation Functions for Floating-Point Numbers
In addition to classification macros, the C standard also provides several functions to manipulate floating-point numbers.
By "manipulate" we mean performing operations that act at a low level on the components of a floating-point number, such as its sign and its exponent.
Manipulation operations include:
- Sign copy:
  The functions:
    double copysign(double x, double y);
    float copysignf(float x, float y);
    long double copysignl(long double x, long double y);
  These functions return a floating-point number with the absolute value of x and the sign of y. In other words, the result has the same magnitude as x, but its sign (positive or negative) is that of y.
- Multiplication by powers of two:
  The functions:
    double scalbn(double x, int n);
    float scalbnf(float x, int n);
    long double scalbnl(long double x, int n);
  These functions return x multiplied by 2 raised to the power of n (more precisely, by FLT_RADIX raised to the power of n, where FLT_RADIX is 2 on virtually all current platforms). In practice, they adjust the exponent of the floating-point number x by adding n to it.
  For example, if x is 1.5 (which is represented as $1.1_2 \times 2^{0}$) and n is 2, the result will be 6.0 (which is represented as $1.1_2 \times 2^{2}$). Because the operation acts directly on the base-2 exponent, it can be more efficient than computing the power of two explicitly (for example with pow) and then multiplying, and the result is exact unless it overflows or underflows.
- Representable number search functions:
  Even though floating-point numbers are used to represent real numbers, not all real numbers can be represented exactly in this format due to precision limitations. Indeed, infinitely many real numbers have no exact floating-point representation.
  Since the float, double and long double types have a limited number of bits for the mantissa and the exponent, only a finite number of values can be represented exactly. Therefore, given a floating-point number, it is always possible to find the representable number adjacent to it in a given direction.
  To this end, the C standard provides the following functions:
    double nextafter(double x, double y);
    float nextafterf(float x, float y);
    long double nextafterl(long double x, long double y);
    double nexttoward(double x, long double y);
    float nexttowardf(float x, long double y);
    long double nexttowardl(long double x, long double y);
  The nextafter and nexttoward functions return the next representable floating-point value after x in the direction of y. If x is equal to y, y is returned.
  The difference between the two families is that nextafter takes both arguments of the same type (for example, both double), while nexttoward takes a first argument of type double, float or long double and a second argument that is always of type long double. Since y is not converted to a narrower type, nexttoward can determine the direction correctly even when y differs from x by less than the precision of the type of x.