Introduction to the Preprocessor in C
So far, in the previous lessons, we have used preprocessing directives without going too much into the details of how they work. We used the #define and #include directives.
These directives in the C language are handled by the Preprocessor. The preprocessor is a program that runs before the actual compiler. Its job is to modify the source code before it is passed to the compiler, which will then generate the executable code.
The preprocessor is a very versatile tool that makes the C language (and C++ as well) unique and powerful. However, its use can be counterproductive if one does not understand how it works. The purpose of this lesson is precisely to understand the role of the preprocessor in the compilation process of a program written in C.
How the Preprocessor Works
Typically, the behavior of the preprocessor is controlled by Preprocessing Directives. These directives are instructions that the preprocessor interprets and that modify the source code. The result of this operation is then passed to the compiler, which transforms it into object code, that is, machine instructions for the processor. The key point is that the source code is transformed before the compiler ever sees it.
Preprocessing directives always start with the # (hash) character and end at the end of the line they are on. In previous lessons, we have already encountered two types of directives: #define and #include.
The #define directive allows us to define what is known as a macro, short for macro-instruction. Macros are not the focus of this lesson, but they are useful for understanding the mechanisms behind the preprocessor. A macro can be thought of as a symbolic name we give to a value or an expression. This symbolic name can then be used throughout the source code in place of the value or expression it represents.
A simple macro defined with the #define directive looks like this:
#define PI 3.141592
With this macro, we have defined a symbolic name PI and associated it with an expression (specifically a literal value) 3.141592. When we feed a source file to the preprocessor, it performs two operations:
- It first stores the macro name along with the value or expression it represents;
- Then, every time it encounters the macro name in the source code, it replaces it with the value or expression it represents. In technical terms, we say the preprocessor expands the macro.
The #include directive, on the other hand, allows us to include the contents of one source file into another. This directive is very useful for organizing source code across multiple files. So far, we have used it to include the stdio.h file, which contains the declarations of standard input/output functions.
When the preprocessor encounters a #include directive, it performs the following operations:
- It searches for the file named in the #include directive; for a name in angle brackets, such as <stdio.h>, it looks in the system include directories;
- If the file is found, it opens it and reads its contents;
- It replaces the #include directive with the contents of the file.
In both cases, whether we use #define or #include, the original source file is modified at the text level: #define replaces a label with an expression, while #include inserts the entire contents of another file into the code.
To clarify the role of the preprocessor in compilation, let's look at the following figure:
The preprocessor receives as input the file source.c, which may contain preprocessor directives. In practice, almost every C source file contains at least one directive.
The preprocessor scans the source file and, each time it encounters a preprocessor directive, it interprets and executes it. The important thing to note is that when a directive is executed, the preprocessor modifies the source file and removes the directive. In fact, the final result of the preprocessor is another modified source file that is, above all, free of directives.
This modified source file is then passed directly to the actual compiler, which analyzes it and produces as output the executable object code.
This whole process is almost always hidden. In practice, we do not have to invoke the preprocessor directly: we just invoke the compiler, which takes care of running the preprocessor on the source file. For example, the gcc compiler, when invoked on a source file, automatically preprocesses it first; the same preprocessor is also available as a separate command named cpp (C PreProcessor).
To better clarify, let's examine a sample program. Suppose we want to compile the following program, which takes an angle in degrees as input and converts it to radians:
[Code listing: source.c]
This program contains two preprocessing directives:
- On line 1, the #include directive includes the stdio.h file, which contains the declarations of standard input/output functions;
- On line 3, the #define directive defines the macro CONVERSION_FACTOR. This macro is used on line 15 to convert the angle from degrees to radians and represents the expression 3.141592 / 180.0.
Now let's try to compile the program by invoking the preprocessor and compiler separately. On Linux, we can use the cpp preprocessor directly on the source file and save the result to a temporary file:
$ cpp source.c > source.i
If we examine the contents of the source.i file, we see that the preprocessor has replaced the #include directive with the contents of the stdio.h file and has expanded the CONVERSION_FACTOR macro into the expression 3.141592 / 180.0:
[Code listing: source.i]
We have omitted the content of stdio.h for brevity.
The key thing to note is that the preprocessor removed the preprocessing directives and left blank lines in their place. In the resulting file, the #include and #define directives are missing. However, on line 14, instead of the macro CONVERSION_FACTOR, we have the expression 3.141592 / 180.0.
Additionally, the preprocessor also performed another hidden operation: it removed all comments from the source code and replaced them with blank spaces.
To summarize:
Preprocessor
The Preprocessor is a program or component that takes a C source file as input and modifies it before passing it to the compiler. The modifications applied by the preprocessor are controlled by preprocessing directives.
When the preprocessor encounters a preprocessing directive, it interprets and executes it. In particular, when a directive is executed, the preprocessor modifies the source text and removes the directive itself, leaving a blank line in its place.
Moreover, the preprocessor removes all comments from the source code.
Preprocessing Directives
Preprocessing directives can be grouped into three categories:
- Macro Definition: the #define and #undef directives allow you to define and remove macros;
- File Inclusion: the #include directive allows you to include the contents of one source file into another;
- Conditional Compilation: the directives #if, #ifdef, #ifndef, #elif, #else, and #endif allow you to compile only certain parts of the source code based on conditions.
In addition to these, there are other more specialized and less commonly used directives: #error, #line, and #pragma.
We will analyze these directives in detail in the next lessons. For the rest of this lesson, we will focus on the common rules that all preprocessing directives must follow.
Common Rules
Although used for different purposes, preprocessing directives must follow some common rules. Specifically:
- They must begin with the # (hash) character;
Each directive always starts with the # character, but it doesn't have to be at the beginning of the line. The # must be the first non-whitespace character on the line, so a directive can be indented. For example:
#define PI 3.141592
    #define E 2.718281
This code is perfectly valid. The #define directive on line 2 is indented by 4 spaces.
- They end at the end of the current line unless the backslash \ character is used;
Each directive ends at the end of the line on which it appears. This means a directive cannot span multiple lines on its own. If a directive is too long for a single line, it must be split using the backslash \ character at the end of every line except the last. For example, suppose we want to define a macro for the value of π/2. We could write the macro on a single line like this:
#define PI_2 3.141592 / 2.0
If we want to split the macro across two lines, we must explicitly use the backslash \ at the end of the first line:
#define PI_2 3.141592 \
             / 2.0
If we had omitted the backslash \ at the end of the first line, the compiler would have produced a compilation error. The preprocessor would have handed the compiler the following code:

             / 2.0
In this code, the first line is empty because the directive was removed, while the second line remains and represents an incomplete expression.
- They can appear anywhere in the program;
Preprocessing directives can appear anywhere in the program. They don’t have to be at the beginning of the source file. However, as a programming best practice, preprocessing directives are usually grouped at the top of the source file.
- Comments can appear on the same line as the directive;
Preprocessing directives can be followed by comments. Comments can appear on the same line as the directive or on subsequent lines. For example:
#define PI_2 3.141592 / 2.0 /* Pi divided by 2 */
This code is perfectly valid.
- The tokens of a directive can be separated by an arbitrary number of spaces and tabs;
A directive, such as a #define, can be written like this:
#   define   PI   3.141592
This code is perfectly valid. The tokens of the #define directive are separated by an arbitrary number of spaces and tabs.
Summary
This lesson serves as a starting point for understanding how the preprocessor works. In this lesson, we saw that:
- The preprocessor is a program that takes a source file as input and modifies it before passing it to the actual compiler;
- The preprocessor is controlled by preprocessing directives;
- Preprocessing directives are instructions that begin with the # (hash) character and end at the end of the current line;
- Once the directives are executed, they are removed from the source file.
These four points are fundamental to understanding how to properly use the preprocessor. Many common errors stem from forgetting one of the points above.
At this point, we can dive in and begin studying individual preprocessing directives. In the next lesson, we will analyze the #define directive, which allows us to define macros.