Detecting End of File and I/O Errors in C

Key Takeaways
  • When working with files in C, it is important to be able to detect the end of the file and handle any input/output (I/O) errors.
  • The feof(), ferror() and clearerr() functions are used to handle these situations.
  • The feof() function checks whether a read operation has tried to go past the end of the file.
  • The ferror() function checks if an I/O error has occurred.
  • The clearerr() function resets the status flags of a file.

Error Conditions During Reading

In the previous lesson we saw how to read and write data to files in a formatted way using the fprintf() and fscanf() functions. When reading a file, the fscanf() function returns the number of elements successfully read, so this number is expected to be equal to the number of requested elements.

The correct way to detect any errors during reading is to compare the value returned by fscanf() with the number of requested elements. For example, if we are reading two values (an integer and a float), we expect fscanf() to return 2. If it returns a lower value, the read did not fully succeed:

int intValue;
float floatValue;
int result = fscanf(file, "%d %f", &intValue, &floatValue);
if (result != 2) {
    // Error handling
}

In this example, if fscanf() fails to read both values, it means that something went wrong.

There are three possible causes:

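  1. End of File (EOF): If the end of the file is reached before all the requested values have been read, fscanf() returns a value lower than the number of requested elements. If it is reached before the first value has been read, fscanf() returns the constant EOF, which is negative.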
  2. Read Error: for reasons such as hardware problems or corrupted files, fscanf() might not be able to read the data correctly.
  3. Invalid Data: if the data in the file does not match the specified format (for example, if you try to read an integer but the file contains a string), fscanf() will fail to convert the data and will return a lower value.
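
To make these return values concrete, here is a minimal sketch. The helper scanPair() is a hypothetical name introduced only for this illustration: it writes a string to a temporary file created with tmpfile(), rewinds it, and returns what a single fscanf() call reports. The sentinel -100 is an arbitrary choice for the case where no temporary file can be created.

```c
#include <stdio.h>

/* Hypothetical helper (not a library function): writes `text` to a
   temporary file, rewinds it, and returns the result of one fscanf()
   call that asks for an integer followed by a float. */
int scanPair(const char *text) {
    FILE *file = tmpfile();
    if (file == NULL) {
        return -100;   /* arbitrary sentinel: no temporary file available */
    }

    fputs(text, file);
    rewind(file);      /* go back to the start before reading */

    int intValue;
    float floatValue;
    int result = fscanf(file, "%d %f", &intValue, &floatValue);

    fclose(file);
    return result;
}
```

Under these assumptions, scanPair("42 3.5") returns 2, scanPair("42 abc") returns 1 because only the integer could be converted, scanPair("abc") returns 0, and scanPair("") returns EOF. None of these values, taken alone, says which of the three causes was responsible for a partial read.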

The problem is that simply comparing the value returned by fscanf() with the number of requested elements does not allow distinguishing between these three causes. In many cases this is not a problem: if a read has failed, the best solution is often simply to stop processing the file, or even to terminate the program.

Other times, however, it can be useful or necessary to distinguish between the various causes of error. For this purpose, the C language provides three specific functions: feof(), ferror() and clearerr(). Let's see them in detail.

File Status Flags and clearerr Function

Each file stream in C maintains an internal state that indicates not only the current position in the file, but also particular conditions. In detail, two status flags (indicators) are associated with each stream:

  1. End of File Flag (EOF): This flag is set when a read operation tries to go past the end of the file.
  2. Error Flag: This flag is set when an error occurs during an input/output (I/O) operation on the file.

These two flags are initially not set (i.e., they are set to false) when a file is opened with the fopen() function.

The first flag, the End of File (EOF) flag, is set automatically when an attempt is made to read beyond the end of the file. The second flag, the Error flag, is set when an error occurs during an I/O operation, such as a hardware error or a permission problem.

Furthermore, the error flag can also be set during writing, for example if you try to write to a read-only file.

An invalid data error, by contrast, does not set the error flag; it simply causes the read functions to return a value lower than the number of requested elements.

Once one of these flags has been set, it remains so until it is explicitly reset. For this reason, it is important to check the status of these flags after each I/O operation to properly handle any errors or end of file conditions. To reset these flags, you can use the clearerr() function.

The syntax of the clearerr() function is as follows:

void clearerr(FILE *stream);

It requires as a parameter a pointer to an object of type FILE, which represents the file stream on which you want to reset the status flags. The function does not return any value.

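In practice, clearerr() is rarely needed in ordinary programs, partly because some library functions reset the flags themselves: rewind() clears both flags, while a successful call to fseek(), fsetpos() or ungetc() clears the End of File flag.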
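
The life cycle of the End of File flag can be sketched as follows. The function eofFlagDemo() is a hypothetical name used only for this illustration, and the example assumes that tmpfile() can create an empty temporary file on the system where it runs.

```c
#include <stdio.h>

/* Hypothetical demo (not a library function). Returns 1 if the End of
   File flag behaves as described in the text, 0 if it does not, and -1
   if no temporary file could be created. */
int eofFlagDemo(void) {
    FILE *file = tmpfile();              /* a new, empty file */
    if (file == NULL) {
        return -1;
    }

    int before = feof(file);             /* flag not set right after opening */
    int c = fgetc(file);                 /* reading an empty file yields EOF */
    int afterRead = feof(file);          /* flag is now set */
    clearerr(file);                      /* explicitly reset both flags */
    int afterClear = feof(file);         /* flag is clear again */

    fclose(file);
    return before == 0 && c == EOF && afterRead != 0 && afterClear == 0;
}
```

The flag is not set merely because the file is empty; it is set only once fgetc() has actually tried to read past the end, and it stays set until clearerr() resets it.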

feof and ferror Functions

To check the status of the End of File (EOF) and Error flags, the C language provides two specific functions: feof() and ferror().

The feof() function checks if the End of File flag has been set for a specific file stream. Its syntax is as follows:

int feof(FILE *stream);

It requires as a parameter a pointer to an object of type FILE, which represents the file stream to check. The function returns an integer value: a non-zero value (true) if the End of File flag has been set, or 0 (false) if it has not been set. Note that feof() returns a true value only after a read operation has tried to go past the end of the file, not as soon as the last byte has been read.

Similarly, the ferror() function checks if the Error flag has been set for a specific file stream. Its syntax is as follows:

int ferror(FILE *stream);

Also in this case, it requires as a parameter a pointer to an object of type FILE, which represents the file stream to check. The function returns an integer value: a non-zero value (true) if the Error flag has been set, or 0 (false) if it has not been set.

When fscanf() (or scanf(), which reads from standard input) returns a number of elements lower than the number requested, we can use the feof() and ferror() functions to determine the cause. There are three possible cases:

  1. If feof() returns true, it means that the end of the file has been reached.
  2. If ferror() returns true, it means that a read error has occurred.
  3. If both functions return false, it means that the data in the file was not valid for the specified format.
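
These three cases can be folded into a single helper. The sketch below is only one possible arrangement: the ReadStatus type and the readInt() function are hypothetical names introduced for this illustration. The error flag is checked first so that a read error is reported as an error.

```c
#include <stdio.h>

/* Hypothetical status codes, introduced only for this example. */
typedef enum { READ_OK, READ_EOF, READ_ERROR, READ_INVALID } ReadStatus;

/* Tries to read one integer from `file` and reports why it failed, if it did. */
ReadStatus readInt(FILE *file, int *value) {
    if (fscanf(file, "%d", value) == 1) {
        return READ_OK;          /* one element requested, one element read */
    }
    if (ferror(file)) {
        return READ_ERROR;       /* the Error flag is set */
    }
    if (feof(file)) {
        return READ_EOF;         /* the End of File flag is set */
    }
    return READ_INVALID;         /* neither flag: the data did not match %d */
}
```

A caller can then switch on the returned status instead of repeating the feof() and ferror() checks after every read.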

Example of Using feof and ferror

Let's put this into practice with a function that takes the name of a text file as input and returns the number of the first line that starts with a valid positive integer. If it cannot report such a line, the function instead returns:

  • -1 if the file cannot be opened,
  • -2 if a read error occurs,
  • -3 if no line starts with a valid positive integer.

The skeleton of the function can be as follows:

#include <stdio.h>

int findLine(const char *fileName) {
    FILE *file = fopen(fileName, "r");
    if (file == NULL) {
        // Error opening file
        return -1;
    }

    // File line reading loop
    int lineNumber = 1;
    int value;
    while (fscanf(file, "%d", &value) != 1 || value <= 0) {

        // No positive integer at the start of this line
        // Check conditions
        // ...

        // Discard current line
        // ...

        lineNumber++;
    }

    fclose(file);
    return lineNumber;
}

As you can see, the function opens the file and then enters a loop that examines the file one line at a time. If fscanf() succeeds in reading a positive integer, the loop terminates and the function returns the current line number.

Conversely, if fscanf() fails to read an integer, we must check the error conditions using feof() and ferror(), as shown below:

if (ferror(file)) {
    // Read error
    fclose(file);
    return -2;
}

if (feof(file)) {
    // End of file reached
    fclose(file);
    // No valid line found
    return -3;
}

Having done this, we must discard the current line in order to proceed with reading the next one. To do this, we can use a scanset to skip everything up to the end of the current line, and then read the newline character itself:

// Discard current line
fscanf(file, "%*[^\n]");
// Discard newline character
fgetc(file);

In the format string, the * after % tells fscanf() to read the matching characters without storing them. The scanset [^\n] matches every character except the newline character (\n). In this way, fscanf() reads and discards all characters up to, but not including, the newline; the fgetc() call that follows consumes the newline itself, so the next iteration of the loop starts at the beginning of the next line.
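
Because the scanset stops in front of the newline, the newline still has to be consumed separately. A minimal sketch of both steps wrapped in one helper follows; skipLine() is a hypothetical name, not a standard function.

```c
#include <stdio.h>

/* Hypothetical helper: discards the rest of the current line, including
   its terminating newline. Returns 0 if a newline was consumed, or EOF
   if the stream ended (or failed) before one was found. */
int skipLine(FILE *file) {
    /* Skip everything up to, but not including, the newline.
       fscanf() returns EOF here only if nothing at all could be read. */
    if (fscanf(file, "%*[^\n]") == EOF) {
        return EOF;
    }

    /* Consume the newline itself, if there is one. */
    int c = fgetc(file);
    return (c == '\n') ? 0 : EOF;
}
```

On an empty line the scanset matches nothing, fscanf() returns 0, and fgetc() still removes the newline, so the helper moves forward by exactly one line in that case too.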

The complete code of the program therefore becomes the following:

#include <stdio.h>

int findLine(const char *fileName) {
    FILE *file = fopen(fileName, "r");
    if (file == NULL) {
        // Error opening file
        return -1;
    }

    // File line reading loop
    int lineNumber = 1;
    int value;
    while (fscanf(file, "%d", &value) != 1 || value <= 0) {

        // No positive integer at the start of this line
        // Check conditions
        if (ferror(file)) {
            // Read error
            fclose(file);
            return -2;
        }

        if (feof(file)) {
            // End of file reached
            fclose(file);
            // No valid line found
            return -3;
        }

        // Discard current line
        fscanf(file, "%*[^\n]");
        // Discard newline character
        fgetc(file);

        lineNumber++;
    }

    fclose(file);
    return lineNumber;
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s fileName\n", argv[0]);
        return 1;
    }

    int result = findLine(argv[1]);
    if (result > 0) {
        printf("First line with positive integer found at line: %d\n", result);
    } else {
        // Negative codes: -1 open failure, -2 read error, -3 no match
        fprintf(stderr, "Error: %d\n", result);
        return 1;
    }

    return 0;
}

When this program is compiled and run with a text file as its argument, it prints the number of the first line that starts with a valid positive integer, or an error code if something goes wrong.

For example, suppose we have a text file called data.txt with the following content:

Hello, how are you?
Everything good?
10 example
Sample data

Running the program with this file as an argument:

$ ./findLine data.txt

The program prints:

First line with positive integer found at line: 3
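
One limitation of the fscanf()-based approach is worth noting: the %d conversion skips leading whitespace, including newline characters, so blank lines are skipped without being counted. The sketch below shows a different technique, reading whole lines with fgets() and parsing each one with sscanf(). The name findLineByLine() is hypothetical, the function takes an already opened stream instead of a file name, and it assumes that every line is shorter than 256 characters.

```c
#include <stdio.h>

/* Hypothetical variant: returns the number of the first line that starts
   with a positive integer, -2 on a read error, -3 if there is no such line.
   Assumption: every line is shorter than 256 characters; a longer line
   would be read in several pieces and counted more than once. */
int findLineByLine(FILE *file) {
    char buffer[256];
    int lineNumber = 0;

    while (fgets(buffer, sizeof buffer, file) != NULL) {
        int value;
        lineNumber++;
        /* sscanf() parses the line already held in memory */
        if (sscanf(buffer, "%d", &value) == 1 && value > 0) {
            return lineNumber;
        }
    }

    /* fgets() returned NULL: either a read error or the end of the file */
    return ferror(file) ? -2 : -3;
}
```

Here ferror() is still what separates a read error from a normal end of file once fgets() returns NULL.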