File Positioning in C
- The current position within a file is managed automatically during sequential read and write operations.
- Positioning functions allow moving the current position flexibly.
- The main positioning functions are fseek(), ftell(), rewind(), fgetpos() and fsetpos().
- fseek() moves the current position relative to a specified reference position.
- ftell() returns the current position within the file.
- rewind() returns the current position to the beginning of the file.
- fgetpos() and fsetpos() use the abstract type fpos_t to manage positions in large files.
Position Within a File
An important concept when reading and writing files in C is the position within a file.
Each stream has an associated current position, or file position. When a file is opened, the current position is initialized at the beginning of the file. If the file is opened in append mode, instead, every write takes place at the end of the file; whether the initial position is the beginning or the end of the file is implementation-defined.
From then on, each read or write operation automatically advances the current position by the number of bytes read or written. This is what lets us access the data within the file sequentially.
To better understand, let's consider the example of a text file read, character by character, using the fgetc() function:
#include <stdio.h>

int main(void) {
    FILE *file = fopen("example.txt", "r");
    if (file == NULL) {
        // Error handling
        return 1;
    }
    int ch;
    while ((ch = fgetc(file)) != EOF) {
        putchar(ch); // Prints the read character
    }
    fclose(file);
    return 0;
}
In this example, we first open the file for reading through the fopen() function. The current position is initialized at the beginning of the file. Each time we call fgetc(), we read a character from the file and the current position advances by one byte. This process continues until we reach the end of the file, indicated by EOF.
From our point of view, we can imagine the current position as a cursor that moves forward as we read or write data in the file. We don't need to worry about explicitly indicating which character to read or write, since the current position automatically manages this aspect for us.
Exactly the same thing happens during writing. Let's consider the example of a program that writes the integers from 1 to 10 to a text file, using the fprintf() function:
#include <stdio.h>

int main(void) {
    FILE *file = fopen("example.txt", "w");
    if (file == NULL) {
        // Error handling
        return 1;
    }
    for (int i = 1; i <= 10; i++) {
        fprintf(file, "%d\n", i);
    }
    fclose(file);
    return 0;
}
In this case too, the current position is initialized at the beginning of the file when we open it in write mode. Each time we call fprintf(), we write an integer to the file and the current position advances by as many bytes as were needed to represent the number and the newline character. This process continues until we have written all the desired numbers to the file.
Similarly to reading, we can imagine the current position as a cursor that moves forward as we write data to the file, without having to worry about explicitly specifying where to write each time.
In general, the Input/Output functions seen so far, whether character-based, line-based, formatted or binary, are designed to be sequential, that is they operate in such a way as to read or write data starting from the current position and automatically advancing the position itself.
For many applications, sequential input and output are more than sufficient.
However, many programs need to move the current position within the file more flexibly, for example to re-read a part of the file or to overwrite data at a specific position.
Let's consider an example where a file contains data related to a list of students, with the relevant personal data and grades. Although the personal data remains unchanged, grades may be subject to changes. At this point one could act in two ways:
- Operating sequentially: the entire file is read, grades are modified in memory and the entire file is rewritten with the new grades. This approach is simple but inefficient, especially if the file is large. In fact, every time a grade is modified, it is necessary to re-read and rewrite the entire file, even if only a small part of the data has changed.
- Using positioning functions: the file can be opened, students' personal data read and, when reaching the grades section, positioning functions can be used to move the current position directly to the part of the file containing the grades. In this way, it is possible to modify only the grades without having to re-read or rewrite the entire file, greatly improving the efficiency of the operation.
This second mode of file access, known as direct access or random access, is made possible by the positioning functions that the C language makes available.
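The second approach can be sketched as follows. This is only an illustration under stated assumptions: the struct student layout, the fixed record size and the update_grade() helper are hypothetical and are not part of the lesson's data format. The stream is assumed to be a binary file opened for update (for example with "r+b"), and the sketch relies on fseek(), which is described in detail below.

```c
#include <stdio.h>

/* Hypothetical fixed-size record: the layout is an assumption of this sketch. */
struct student {
    char name[32];
    int  grade;
};

/* Overwrites the grade of the record number `index` (0-based) in a binary
   stream opened for update. Returns 0 on success, -1 on error. */
int update_grade(FILE *file, long index, int new_grade) {
    struct student s;
    long offset = index * (long)sizeof s;

    /* Jump directly to the record and read it. */
    if (fseek(file, offset, SEEK_SET) != 0) return -1;
    if (fread(&s, sizeof s, 1, file) != 1) return -1;

    s.grade = new_grade;

    /* Go back to the start of the same record and overwrite it.
       A positioning call is required when switching from reading to writing. */
    if (fseek(file, offset, SEEK_SET) != 0) return -1;
    if (fwrite(&s, sizeof s, 1, file) != 1) return -1;

    return fflush(file) == 0 ? 0 : -1;
}
```

Only one record is read and one is written, regardless of how many students the file contains. This works because every record has the same size, so the byte offset of a record can be computed from its index.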
There are five main positioning functions, all declared in the standard header <stdio.h>:
- fseek()
- ftell()
- rewind()
- fgetpos()
- fsetpos()
Let's see them in detail.
The fseek function
The fseek() function allows moving the current position within a file flexibly. Its signature is as follows:
int fseek(FILE *stream, long offset, int whence);
The first argument is the pointer to the file (stream) on which you want to operate.
Of particular importance are the next two arguments. The function does not move the current position to an absolute point in the file, but relative to a reference position specified by the third argument, whence.
The possible constants for whence are:
- SEEK_SET: the current position is set to offset bytes from the beginning of the file.
- SEEK_CUR: the current position is moved by offset bytes relative to the current position.
- SEEK_END: the current position is set to offset bytes from the end of the file.
These three constants are defined as macros in the <stdio.h> header.
So offset can be a positive or negative number, depending on where you want to move the current position.
For example, to move the current position to the beginning of the file, you can use:
fseek(file, 0L, SEEK_SET);
We specified an offset of 0 bytes from the beginning of the file.
If, instead, we want to move the current position 10 bytes forward relative to the current position, we can use:
fseek(file, 10L, SEEK_CUR);
Finally, to move the current position to 5 bytes before the end of the file, we can use a negative offset:
fseek(file, -5L, SEEK_END);
An important observation concerns the type of the offset parameter, which is long. A long is at least 32 bits wide, so an offset can always span about 2 GB in either direction, and much more on platforms where long is 64 bits. In the examples above we used the suffix L so that the constants 0, 10 and -5 have type long.
On success, the fseek() function returns 0. On error, for example when the specified position is not valid, it returns a value other than 0.
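To see the three reference positions and the return value at work, here is a minimal sketch. The byte_at() helper is a hypothetical name introduced only for this illustration, and the stream is assumed to be a binary one.

```c
#include <stdio.h>

/* Moves the current position with fseek() and returns the byte found there.
   Returns EOF if the seek fails or if there is no byte to read. */
int byte_at(FILE *file, long offset, int whence) {
    if (fseek(file, offset, whence) != 0) {
        return EOF; /* for example, a position before the start of the file */
    }
    return fgetc(file);
}
```

Assuming a file that contains the ten bytes ABCDEFGHIJ, byte_at(file, 0L, SEEK_SET) returns 'A' and leaves the position at offset 1. A following byte_at(file, 2L, SEEK_CUR) moves to offset 3 and returns 'D', while byte_at(file, -1L, SEEK_END) returns the last byte, 'J'.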
fseek and text files
The fseek function, like the other positioning functions, was designed to work well with binary files.
When using them with text files you need to be careful, especially depending on the operating system on which the program runs.
On UNIX and Linux systems, the fseek function makes no distinction between text files and binary files, so it can be used safely on both types.
Something different happens, however, on MS-DOS and Windows. On such systems the line break in a text file is not stored as a single character, but as the two-character sequence \r\n, that is carriage return followed by line feed. When a file is opened in text mode, the C library translates each \r\n into a single \n on reading, and each \n into \r\n on writing.
This detail, apparently of little importance, creates a whole series of problems. First of all, the number of characters the program reads or writes no longer matches the number of bytes actually occupied in the file, so an offset counted in characters does not identify a precise byte position.
For example, if we have a text file with the following content:
Hello\r\n
World\r\n
And we want to move the position to the letter W of World, we must take into account that, although W is the eighth byte of the file (byte offset 7), it is only the seventh character (character offset 6), since the sequence \r\n occupies two bytes but represents a single newline character.
This is why files can be opened either in binary mode or in text mode. Systems such as MS-DOS and Windows differentiate the two modes precisely to handle these differences, whereas UNIX and Linux systems treat them identically.
The consequence is that, on such systems, when using fseek with text files there are limitations:
- The offset parameter must always be 0 when using a whence value equal to SEEK_CUR or SEEK_END.
- If using SEEK_SET, you can use a non-zero offset, but only if that value was previously returned by a call to ftell() (see below).
Obviously, these limitations matter only if we want our code to behave correctly both under Linux and UNIX and under other systems. Otherwise, we can safely ignore them.
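A portable way to respect these limitations is to never compute offsets by hand on a text stream, and to use only values obtained from ftell(). The following is a minimal sketch of this idea; skip_line() and go_back_to() are hypothetical helper names chosen for this illustration.

```c
#include <stdio.h>

/* Reads and discards characters up to and including the next newline,
   then returns the position of the following line as reported by ftell()
   (-1L on error). */
long skip_line(FILE *file) {
    int ch;
    while ((ch = fgetc(file)) != EOF && ch != '\n') {
        /* discard the character */
    }
    return ftell(file);
}

/* Returns to a position previously obtained from ftell() on the same file.
   With SEEK_SET, this is the only kind of non-zero offset that is portable
   on a text stream. */
int go_back_to(FILE *file, long saved) {
    return fseek(file, saved, SEEK_SET);
}
```

For example, after long mark = skip_line(file); the program can read the second line, call go_back_to(file, mark) and read the same line again, without ever knowing how many bytes the first line occupies on disk.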
Another limitation concerns the fact that, with binary files, fseek does not necessarily have to guarantee support for the SEEK_END macro. In fact, in some special cases, it might not be possible to determine the size of the binary file. Also in this case, however, under UNIX and Linux systems this limitation does not arise.
The ftell function
The ftell() function allows obtaining the current position within a file. Its declaration is as follows:
long ftell(FILE *stream);
In case of success, the function returns the current position as a value of type long. In case of error, instead, it returns -1L and sets the global variable errno to indicate the type of error (we will study the errno variable in the next lessons).
We must pay attention to the value returned by the function:
- If we are working with a binary file, the returned value represents the number of bytes from the beginning of the file to the current position.
- If we are working with a text file, the C standard does not specify what the returned value represents: it is only guaranteed that the value can later be passed to fseek() with SEEK_SET to return to the same position.
The second point is important. In fact, as seen above, in a text file the number of characters read might not match the number of bytes actually occupied in the file, especially on systems like MS-DOS and Windows, so the difference between two values returned by ftell() is not necessarily a meaningful count of characters or bytes.
Therefore, it is not always a good idea to use the value returned by ftell() to calculate an offset in bytes within a text file, especially if you expect the program may be run on different operating systems. The problem does not arise, however, if you work with binary files.
Application: Calculating file size
An important combined application of the fseek() and ftell() functions is calculating the size of a file.
In fact, to obtain the size of a file, we can follow these steps:
- We open the file whose size we want to calculate in binary read mode.
- We use fseek() to move the current position to the end of the file.
- We use ftell() to obtain the current position, which corresponds to the file size in bytes.
- Finally, we close the file.
Here is an example of code that implements this procedure:
#include <stdio.h>

long calculate_file_size(const char *file_name) {
    FILE *file = fopen(file_name, "rb");
    if (file == NULL) {
        // Error handling
        return -1L;
    }
    // We move the current position to the end of the file
    if (fseek(file, 0L, SEEK_END) != 0) {
        // Error handling
        fclose(file);
        return -1L;
    }
    // We obtain the current position, which corresponds to the file size
    long size = ftell(file);
    if (size == -1L) {
        // Error handling
        fclose(file);
        return -1L;
    }
    fclose(file);
    return size;
}
In this example, the calculate_file_size function opens a file in binary read mode, moves the current position to the end of the file using fseek(), and then obtains the file size with ftell(). Finally, it closes the file and returns the calculated size. In case of error, the function returns -1L.
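The same technique can be applied to a stream that is already open, as long as the original position is restored afterwards. The following variant is only a sketch: stream_size() is a hypothetical name, and the stream is assumed to be a seekable binary stream.

```c
#include <stdio.h>

/* Returns the size in bytes of an already open binary stream and then
   restores the position the stream had before the call.
   Returns -1L on error. */
long stream_size(FILE *file) {
    /* Remember where the caller was. */
    long saved = ftell(file);
    if (saved == -1L) return -1L;

    if (fseek(file, 0L, SEEK_END) != 0) return -1L;
    long size = ftell(file); /* -1L if ftell() fails */

    /* Restore the caller's position, even if ftell() failed. */
    if (fseek(file, saved, SEEK_SET) != 0) return -1L;

    return size;
}
```

Compared to calculate_file_size(), this version does not open or close anything, so the caller can continue reading from where it left off.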
The rewind function
The rewind() function is a utility function that allows returning the current position to the beginning of a file. Its declaration is as follows:
void rewind(FILE *stream);
The call to rewind() returns no value and is almost completely equivalent to calling fseek() with an offset of 0 and whence equal to SEEK_SET:
fseek(stream, 0L, SEEK_SET);
The only difference is that rewind() also clears the error indicator of the stream, if it is set.
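A typical use of rewind() is reading the same file twice, for example a first pass to measure the data and a second pass to process it. Here is a minimal sketch of the first pass; count_and_rewind() is a hypothetical name used only for this illustration.

```c
#include <stdio.h>

/* Counts the characters in a stream by reading it up to the end, then
   rewinds it so that the caller can read the data again from the start. */
long count_and_rewind(FILE *file) {
    long count = 0;
    while (fgetc(file) != EOF) {
        count++;
    }
    /* Here the end-of-file indicator (or, after a read failure, the error
       indicator) is set. rewind() goes back to the beginning and leaves
       both indicators cleared. */
    rewind(file);
    return count;
}
```

Since rewind() returns no value, a program that needs to know whether the repositioning succeeded would have to call fseek(file, 0L, SEEK_SET) and check its result instead.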
The fgetpos and fsetpos functions
The fseek and ftell functions, although very useful, have an important limitation: the offset used in fseek is of type long, which may not be sufficient to handle large files on some platforms.
To handle files of enormous sizes, such that a simple long is not enough, the C language provides two additional functions: fgetpos() and fsetpos().
These functions are able to handle positions within huge files because they use an abstract data type called fpos_t, which is specifically defined to represent positions in files more flexibly compared to a simple long.
The fpos_t type is defined in the standard header <stdio.h>, and its implementation can vary depending on the platform and compiler used. In fact, there is no guarantee that fpos_t is an integer type; it could be a complex structure containing additional information necessary to represent positions in large files.
The declarations of the two functions are as follows:
int fgetpos(FILE *stream, fpos_t *pos);
int fsetpos(FILE *stream, const fpos_t *pos);
The fgetpos() function saves the current position of the file associated with the stream stream in the object of type fpos_t pointed to by pos. In case of success, the function returns 0; in case of error, it returns a value other than 0 and sets the global variable errno to indicate the type of error.
Similarly, the fsetpos() function sets the current position of the file associated with the stream stream to the position specified by the object of type fpos_t pointed to by pos. Also in this case, in case of success the function returns 0, while in case of error it returns a value other than 0 and sets errno.
Below is an example of using the fgetpos() and fsetpos() functions, in which we save the current position, read some data, and then return to the saved position to re-read the same data:
fpos_t position;
/* ... */
// We save the current position
if (fgetpos(file, &position) != 0) {
    // Error handling
}
// We read some data from the file
/* ... */
// We return to the saved position
if (fsetpos(file, &position) != 0) {
    // Error handling
}
// We re-read the same data
/* ... */
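The fragment above can be turned into a small complete function. The following sketch uses the save and restore pattern to look at the next character without consuming it; peek_char() is a hypothetical name, and the standard ungetc() function would be another way to obtain a similar effect.

```c
#include <stdio.h>

/* Returns the next character of the stream without consuming it: the
   position is saved with fgetpos() and restored with fsetpos().
   Returns EOF at end of file or on error. */
int peek_char(FILE *file) {
    fpos_t position;

    /* Save the current position. */
    if (fgetpos(file, &position) != 0) {
        return EOF;
    }

    int ch = fgetc(file); /* this advances the position by one character */

    /* Return to the saved position. */
    if (fsetpos(file, &position) != 0) {
        return EOF;
    }
    return ch;
}
```

Note that the fpos_t object is only ever passed by address to the two functions. The code never inspects it or does arithmetic on it, which is what allows the type to be something other than an integer.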