Determining the Characteristics of Floating Point Numbers in C
For those who write algorithms and numerical programs in C, it is essential to know the characteristics of the floating point numbers supported by the compiler in use. These characteristics include precision, the range of representable values, and the behavior of arithmetic operations.
Although the average developer often does not care about these details, for those involved in numerical calculation they are crucial to ensure the accuracy and reliability of the results obtained.
Before delving into the mathematical functionalities offered by the C language and its standard library, in this lesson we will see how to determine the characteristics of floating point numbers supported by the C compiler in use.
- To check whether a C compiler declares conformance to the IEEE 754 standard for floating point numbers, we can test for the presence of the __STDC_IEC_559__ macro.
- The specific characteristics of floating point numbers, such as precision and the range of representable values, can be obtained through the macros defined in the <float.h> header file.
- The behavior of floating point expression evaluation can be determined using the FLT_EVAL_METHOD macro, which indicates the evaluation method adopted by the compiler.
Determining if a C compiler supports floating point numbers
When we studied floating point numbers we saw the types that the C language makes available to represent these numbers: float, double and long double. Each of these types has specific characteristics in terms of precision and range of representable values.
Two caveats apply, however:
-
A compiler need not give the three floating point types distinct representations (the standard only requires that double be at least as precise as float, and long double at least as precise as double). In practice, a compiler for a constrained target might offer limited floating point support or none at all.
Consider, for example, C compilers for small microcontrollers (for example Microchip PIC18, Atmel AVR and the lower-end STMicroelectronics STM32 parts), which may restrict or omit floating point support for reasons of space and performance. Indeed, many of these microcontrollers do not have a floating point unit (FPU).
In these cases, the compiler either rejects code that uses floating point numbers or emulates floating point operations in software, which can be very slow.
-
It is not mandatory that, even if floating point types are supported, they must comply with the IEEE 754 standard.
The IEEE 754 standard defines how floating point numbers must be represented, which operations must be supported and how they must behave in case of errors (such as overflow, underflow, division by zero, etc.). However, a C compiler could choose to implement floating point numbers differently.
In general, there is no universal way to know if a C compiler supports floating point numbers. However, it is at least possible to verify that they comply with the IEEE 754 standard (also called IEC 60559 standard) using the __STDC_IEC_559__ macro.
For example, the following code verifies if the compiler supports the IEEE 754 standard:
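#include <stdio.h>

int main(void) {
    /* The macro is defined when the implementation declares
       conformance to IEC 60559 (IEEE 754). */
#ifdef __STDC_IEC_559__
    printf("The compiler declares support for the IEEE 754 standard.\n");
#else
    printf("The compiler does NOT declare support for the IEEE 754 standard.\n");
#endif
    return 0;
}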
Determining the characteristics of floating point numbers
Having said that, once we have established that the compiler supports floating point numbers, it is useful to know their specific characteristics, such as precision and the range of representable values. To do this, we can use the <float.h> header file, which defines a series of macros that provide this information.
These macros have one of three possible prefixes, depending on the floating point type they refer to:
- FLT_ for the float type
- DBL_ for the double type
- LDBL_ for the long double type
The following table lists some of the most common macros defined in <float.h>:
| Macro | Description |
|---|---|
| FLT_RADIX | The base (radix) of the exponent representation, typically 2. This macro is common to all floating point types. |
| FLT_MANT_DIG, DBL_MANT_DIG, LDBL_MANT_DIG | Number of digits in the mantissa for float, double and long double, respectively. The digits are counted in the base given by FLT_RADIX (so they are bits when the base is 2), not in base 10. |
| FLT_MIN_EXP, DBL_MIN_EXP, LDBL_MIN_EXP | The minimum exponent (in base FLT_RADIX) of a normalized number for the float, double and long double types, respectively. |
| FLT_MAX_EXP, DBL_MAX_EXP, LDBL_MAX_EXP | The maximum exponent (in base FLT_RADIX) of a finite number for the float, double and long double types, respectively. |
| FLT_MIN, DBL_MIN, LDBL_MIN | The smallest representable positive normalized number for the float, double and long double types, respectively. |
| FLT_MAX, DBL_MAX, LDBL_MAX | The largest representable finite number for the float, double and long double types, respectively. |
| FLT_EPSILON, DBL_EPSILON, LDBL_EPSILON | The so-called machine epsilon: the difference between 1 and the next larger representable number for the float, double and long double types, respectively. |
| FLT_DIG, DBL_DIG, LDBL_DIG | Number of decimal digits that can be converted to the float, double and long double types, respectively, and back to decimal without change. |
Among the listed macros, some are particularly useful for understanding the precision and range of floating point numbers:
- FLT_MANT_DIG, DBL_MANT_DIG, LDBL_MANT_DIG: These macros indicate how many significant digits the mantissa holds. For example, if FLT_MANT_DIG is 24, a number of type float carries 24 binary digits of precision.
- FLT_EPSILON, DBL_EPSILON, LDBL_EPSILON: These macros give the spacing between 1 and the next larger representable number, which bounds the relative error introduced by rounding (with round-to-nearest, the error is at most half of epsilon). For example, if FLT_EPSILON is 1.19209290e-07, the difference between 1 and the next representable float is approximately 1.19 x 10^-7.
Let's try to write a simple program that prints some of these characteristics for the float, double and long double types:
#include <stdio.h>
#include <float.h>

int main() {
    printf("Characteristics of floating point numbers:\n\n");

    printf("Type float:\n");
    printf(" Base of the exponent: %d\n", FLT_RADIX);
    printf(" Significant digits: %d\n", FLT_MANT_DIG);
    printf(" Min exponent: %d\n", FLT_MIN_EXP);
    printf(" Max exponent: %d\n", FLT_MAX_EXP);
    printf(" Min normalized positive: %e\n", FLT_MIN);
    printf(" Max representable: %e\n", FLT_MAX);
    printf(" Machine epsilon: %e\n", FLT_EPSILON);
    printf(" Decimal digits representable with precision: %d\n\n", FLT_DIG);

    printf("Type double:\n");
    printf(" Base of the exponent: %d\n", FLT_RADIX);
    printf(" Significant digits: %d\n", DBL_MANT_DIG);
    printf(" Min exponent: %d\n", DBL_MIN_EXP);
    printf(" Max exponent: %d\n", DBL_MAX_EXP);
    printf(" Min normalized positive: %e\n", DBL_MIN);
    printf(" Max representable: %e\n", DBL_MAX);
    printf(" Machine epsilon: %e\n", DBL_EPSILON);
    printf(" Decimal digits representable with precision: %d\n\n", DBL_DIG);

    printf("Type long double:\n");
    printf(" Base of the exponent: %d\n", FLT_RADIX);
    printf(" Significant digits: %d\n", LDBL_MANT_DIG);
    printf(" Min exponent: %d\n", LDBL_MIN_EXP);
    printf(" Max exponent: %d\n", LDBL_MAX_EXP);
    printf(" Min normalized positive: %Le\n", LDBL_MIN);
    printf(" Max representable: %Le\n", LDBL_MAX);
    printf(" Machine epsilon: %Le\n", LDBL_EPSILON);
    printf(" Decimal digits representable with precision: %d\n\n", LDBL_DIG);

    return 0;
}
Compiling and running this program produces an overview of the characteristics of the floating point numbers supported by our C compiler. For example, on a Linux system with GCC on an x86_64 machine, the output could be similar to the following:
Characteristics of floating point numbers:
Type float:
Base of the exponent: 2
Significant digits: 24
Min exponent: -125
Max exponent: 128
Min normalized positive: 1.175494e-38
Max representable: 3.402823e+38
Machine epsilon: 1.192093e-07
Decimal digits representable with precision: 6
Type double:
Base of the exponent: 2
Significant digits: 53
Min exponent: -1021
Max exponent: 1024
Min normalized positive: 2.225074e-308
Max representable: 1.797693e+308
Machine epsilon: 2.220446e-16
Decimal digits representable with precision: 15
Type long double:
Base of the exponent: 2
Significant digits: 64
Min exponent: -16381
Max exponent: 16384
Min normalized positive: 3.362103e-4932
Max representable: 1.189731e+4932
Machine epsilon: 1.084202e-19
Decimal digits representable with precision: 18
Behavior of Floating Point Evaluation
A fundamental detail to consider when working with floating point numbers in C is the behavior of expression evaluation.
In fact, the C language standard allows compilers to use a greater precision during the evaluation of floating point expressions compared to that of the data types involved. This means that, for example, an expression involving variables of type float could be evaluated with a precision equivalent to that of a double or even a long double. This behavior can lead to unexpected results, especially when comparing the results of floating point operations with expected values.
For those developing numerical calculation algorithms, it is important to be aware of this aspect. Therefore, the C standard provides an additional macro, FLT_EVAL_METHOD, defined in <float.h>, which indicates the evaluation method used by the compiler:
- 0: Floating point expressions are evaluated with the range and precision of the types involved.
- 1: Expressions of type float and double are evaluated with the range and precision of double; long double expressions are evaluated as long double.
- 2: All floating point expressions are evaluated with the range and precision of long double.
- -1: The evaluation method is indeterminable.
For example, we can write a simple program to print the value of FLT_EVAL_METHOD:
#include <stdio.h>
#include <float.h>

int main() {
    printf("Evaluation method of floating point expressions: %d\n",
           FLT_EVAL_METHOD);
    return 0;
}
Compiling and running this program prints the evaluation method of floating point expressions used by our C compiler. For example, on a Linux system with GCC 15 on an x86_64 machine, the output could be:
Evaluation method of floating point expressions: 0
This means that floating point expressions are evaluated with the precision of the data type involved.