Determining the Characteristics of Floating Point Numbers in C

For those who write algorithms and numerical programs in C, it is essential to know the characteristics of the floating point numbers supported by the compiler in use. These characteristics include precision, the range of representable values and the behavior of arithmetic operations.

Although the average developer often does not care about these details, for those involved in numerical calculation they are crucial to ensure the accuracy and reliability of the results obtained.

Before delving into the mathematical functionalities offered by the C language and its standard library, in this lesson we will see how to determine the characteristics of floating point numbers supported by the C compiler in use.

Key Takeaways

To check whether a C compiler declares conformance to IEEE 754 (IEC 60559) floating point arithmetic, we can test for the presence of the __STDC_IEC_559__ macro.
The specific characteristics of floating point numbers, such as precision and the range of representable values, can be obtained through the macros defined in the <float.h> header file.
The behavior of floating point expression evaluation can be determined using the FLT_EVAL_METHOD macro, which indicates the evaluation method adopted by the compiler.

Determining if a C compiler supports floating point numbers

When we studied floating point numbers we saw the types that the C language makes available to represent these numbers: float, double and long double. Each of these types has specific characteristics in terms of precision and range of representable values.

The C language standard leaves two aspects of floating point support open.

First, the three floating point types need not be distinct, and in practice a compiler might not support them fully, or at all. The standard requires float, double and long double to exist, but it allows them to share a representation (long double is often identical to double), and compilers for constrained targets do not always conform fully.

Consider, for example, C compilers for simpler microcontrollers (for example Microchip PIC18, Atmel AVR and the smaller STMicroelectronics STM32 parts), where floating point support is often limited for reasons of space and performance. Indeed, many such microcontrollers do not have a floating point unit (FPU).

In these cases, the compiler either rejects code that uses floating point numbers, or it emulates floating point operations in software, which can be very slow.

Second, even when floating point types are supported, they are not required to comply with the IEEE 754 standard.

The IEEE 754 standard defines how floating point numbers must be represented, which operations must be supported and how they must behave in exceptional cases (such as overflow, underflow and division by zero). However, a C compiler could choose to implement floating point numbers differently.

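In general, there is no universal way to know if a C compiler supports floating point numbers. However, it is at least possible to check whether the implementation declares that its floating point arithmetic complies with the IEEE 754 standard (also published as IEC 60559): in that case it defines the __STDC_IEC_559__ macro. The absence of the macro does not prove the opposite, because a compiler may use IEEE 754 formats without declaring full conformance.

For example, the following code checks whether the compiler declares conformance to the IEEE 754 standard: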

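#include <stdio.h>

int main(void) {
    /* __STDC_IEC_559__ is defined only when the implementation
       declares conformance to IEC 60559 (IEEE 754). */
    #ifdef __STDC_IEC_559__
        printf("The compiler declares conformance to IEEE 754.\n");
    #else
        printf("The compiler does NOT declare conformance to IEEE 754.\n");
    #endif
    return 0;
}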

Determining the characteristics of floating point numbers

Having said that, once we have established that the compiler supports floating point numbers, it is useful to know their specific characteristics, such as precision and the range of representable values. To do this, we can use the <float.h> header file, which defines a series of macros that provide this information.

These macros have one of three possible prefixes, depending on the floating point type they refer to:

FLT_ for the float type
DBL_ for the double type
LDBL_ for the long double type

The following table lists some of the most common macros defined in <float.h>:

**Table 1:** Main macros defined in <float.h>
Macro	Description
`FLT_RADIX`	Specifies the base of the exponent representation (typically 2 for binary numbers). This macro is common to all floating point types.
`FLT_MANT_DIG`, `DBL_MANT_DIG`, `LDBL_MANT_DIG`	Number of digits in the mantissa (significand) for `float`, `double` and `long double`, respectively. The digits are counted in the base specified by `FLT_RADIX`, not in base 10, so with `FLT_RADIX` equal to 2 this is a number of bits.
`FLT_MIN_EXP`, `DBL_MIN_EXP`, `LDBL_MIN_EXP`	The smallest negative integer such that `FLT_RADIX` raised to one less than that power is a normalized number of type `float`, `double` and `long double`, respectively.
`FLT_MAX_EXP`, `DBL_MAX_EXP`, `LDBL_MAX_EXP`	The largest integer such that `FLT_RADIX` raised to one less than that power is a representable finite number of type `float`, `double` and `long double`, respectively.
`FLT_MIN`, `DBL_MIN`, `LDBL_MIN`	The smallest representable positive normalized number for the `float`, `double` and `long double` types, respectively.
`FLT_MAX`, `DBL_MAX`, `LDBL_MAX`	The largest representable finite number for the `float`, `double` and `long double` types, respectively.
`FLT_EPSILON`, `DBL_EPSILON`, `LDBL_EPSILON`	The so-called machine epsilon, which represents the smallest difference between 1 and the next representable number for the `float`, `double` and `long double` types, respectively.
`FLT_DIG`, `DBL_DIG`, `LDBL_DIG`	Number of decimal digits that survive a round trip: any number with that many decimal digits can be converted to `float`, `double` or `long double`, respectively, and back without changing those digits.

Among the listed macros, some are particularly useful for understanding the precision and range of floating point numbers:

FLT_MANT_DIG, DBL_MANT_DIG, LDBL_MANT_DIG: These macros indicate how many digits, in base FLT_RADIX, the mantissa can hold. For example, if FLT_MANT_DIG is 24 and FLT_RADIX is 2, a number of type float has a mantissa of 24 binary digits.
FLT_EPSILON, DBL_EPSILON, LDBL_EPSILON: These macros give the difference between 1 and the next larger representable number, which measures the relative spacing of the values. With rounding to nearest, the relative rounding error of a single operation is at most half of this value. For example, if FLT_EPSILON is 1.19209290e-07, it means that the difference between 1 and the next representable number is approximately 1.19 x 10^-7 for numbers of type float.

Let's try to write a simple program that prints some of these characteristics for the float, double and long double types:

#include <stdio.h>
#include <float.h>

int main(void) {
    printf("Characteristics of floating point numbers:\n\n");

    printf("Type float:\n");
    printf("  Base of the exponent: %d\n", FLT_RADIX);
    printf("  Significant digits: %d\n", FLT_MANT_DIG);
    printf("  Min exponent: %d\n", FLT_MIN_EXP);
    printf("  Max exponent: %d\n", FLT_MAX_EXP);
    printf("  Min normalized positive: %e\n", FLT_MIN);
    printf("  Max representable: %e\n", FLT_MAX);
    printf("  Machine epsilon: %e\n", FLT_EPSILON);
    printf("  Decimal digits representable with precision: %d\n\n", FLT_DIG);

    printf("Type double:\n");
    printf("  Base of the exponent: %d\n", FLT_RADIX);
    printf("  Significant digits: %d\n", DBL_MANT_DIG);
    printf("  Min exponent: %d\n", DBL_MIN_EXP);
    printf("  Max exponent: %d\n", DBL_MAX_EXP);
    printf("  Min normalized positive: %e\n", DBL_MIN);
    printf("  Max representable: %e\n", DBL_MAX);
    printf("  Machine epsilon: %e\n", DBL_EPSILON);
    printf("  Decimal digits representable with precision: %d\n\n", DBL_DIG);

    printf("Type long double:\n");
    printf("  Base of the exponent: %d\n", FLT_RADIX);
    printf("  Significant digits: %d\n", LDBL_MANT_DIG);
    printf("  Min exponent: %d\n", LDBL_MIN_EXP);
    printf("  Max exponent: %d\n", LDBL_MAX_EXP);
    printf("  Min normalized positive: %Le\n", LDBL_MIN);
    printf("  Max representable: %Le\n", LDBL_MAX);
    printf("  Machine epsilon: %Le\n", LDBL_EPSILON);
    printf("  Decimal digits representable with precision: %d\n\n", LDBL_DIG);

    return 0;
}

Compiling and running this program produces an overview of the characteristics of the floating point numbers supported by our C compiler. For example, on a Linux system with GCC on an x86_64 machine, the output could look like this:

Characteristics of floating point numbers:

Type float:
  Base of the exponent: 2
  Significant digits: 24
  Min exponent: -125
  Max exponent: 128
  Min normalized positive: 1.175494e-38
  Max representable: 3.402823e+38
  Machine epsilon: 1.192093e-07
  Decimal digits representable with precision: 6

Type double:
  Base of the exponent: 2
  Significant digits: 53
  Min exponent: -1021
  Max exponent: 1024
  Min normalized positive: 2.225074e-308
  Max representable: 1.797693e+308
  Machine epsilon: 2.220446e-16
  Decimal digits representable with precision: 15

Type long double:
  Base of the exponent: 2
  Significant digits: 64
  Min exponent: -16381
  Max exponent: 16384
  Min normalized positive: 3.362103e-4932
  Max representable: 1.189731e+4932
  Machine epsilon: 1.084202e-19
  Decimal digits representable with precision: 18

Behavior of Floating Point Evaluation

A fundamental detail to consider when working with floating point numbers in C is the behavior of expression evaluation.

In fact, the C language standard allows compilers to use a greater precision during the evaluation of floating point expressions compared to that of the data types involved. This means that, for example, an expression involving variables of type float could be evaluated with a precision equivalent to that of a double or even a long double. This behavior can lead to unexpected results, especially when comparing the results of floating point operations with expected values.

For those developing numerical calculation algorithms, it is important to be aware of this aspect. Therefore, the C standard provides an additional macro, FLT_EVAL_METHOD, defined in <float.h>, which indicates the evaluation method used by the compiler:

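0: Operations and constants are evaluated to the range and precision of their own type.
1: Operations and constants of type float and double are evaluated to the range and precision of double; those of type long double are evaluated as long double.
2: All operations and constants are evaluated to the range and precision of long double.
-1: The evaluation method is indeterminable.
Other negative values: The behavior is implementation-defined.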

For example, we can write a simple program to print the value of FLT_EVAL_METHOD:

#include <stdio.h>
#include <float.h>

int main(void) {
    printf("Evaluation method of floating point expressions: %d\n",
           FLT_EVAL_METHOD);
    return 0;
}

Compiling and running this program prints the evaluation method of floating point expressions used by our C compiler. For example, on a Linux system with GCC 15 on an x86_64 machine, the output could be:

Evaluation method of floating point expressions: 0

This means that floating point expressions are evaluated with the precision of the data type involved.