Passing File Names as Command Line Arguments in C

Key Takeaways

File names can be passed to a C program as command line arguments.
The main function can be defined with two parameters, int argc and char *argv[], to receive command line arguments.
argc is the number of command line arguments, counting the program name, while argv is an array of strings that holds the individual arguments.
argv[0] normally contains the program name, while argv[1], argv[2], ..., argv[argc-1] contain the arguments supplied by the user.
Taking file names from the command line makes a program more flexible and easier to use.

Specifying File Names in a Program

When a C program must operate on one or more files, it needs some way of knowing which files to open and use.

There are three common ways to specify the names of the files to open in a C program:

Hardcoding: insert the file names directly in the program's source code. This method is simple but not very flexible, since it requires modifying the code every time you want to use a different file.

Moreover, a hardcoded path is fragile: a relative path depends on the directory from which the program is run, while an absolute path can differ from one system to another.
Interactive request: the program can ask the user to enter the file names during execution. This method is more flexible compared to the first, but requires user interaction every time the program is executed.

One could, for example, use the scanf() function to read file names from the keyboard. However, this approach is inconvenient for the user, especially if the program must be run many times with different files.
Command line arguments: the most flexible and commonly used method consists of passing file names as command line arguments when starting the program. This way, the user can specify the files to use without having to modify the code or interact during execution.

This is, for example, the approach used by the vast majority of command line programs in Unix/Linux and Windows operating systems. The gcc compiler itself uses this method to specify the source files to compile.
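
For comparison, the interactive approach from the second option could be sketched as follows. This is only an illustration: the helper name read_file_name is hypothetical, and it uses fgets() rather than scanf() because fgets() bounds the read to the buffer size and keeps file names that contain spaces intact.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into `buffer` and strips
   the trailing newline. Returns 1 if a non-empty name was read, 0 on end
   of input, read error, or an empty line. */
int read_file_name(FILE *in, char *buffer, size_t size) {
    if (fgets(buffer, (int)size, in) == NULL) {
        return 0;
    }
    /* Cut the string at the first newline, if there is one */
    buffer[strcspn(buffer, "\n")] = '\0';
    return buffer[0] != '\0';
}
```

A program would typically print a prompt and then call read_file_name(stdin, name, sizeof name) with a local buffer such as char name[256]. The user has to retype the name on every run, which is exactly the inconvenience described above.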

Since the third option is the most flexible and powerful, in this lesson we will focus on how to implement it in a C program.

Specifying Files via the Command Line

We have already seen in previous lessons that the main function in C can be defined in two ways:

Without arguments:
```
int main(void);
```
In this case, the main function does not receive any arguments from the command line.
With arguments:
```
int main(int argc, char *argv[]);
```
In this case, the main function receives two arguments, argc and argv: argc is an integer holding the number of command line arguments, including the program name, while argv is an array of strings (an array of pointers to char) that contains the individual arguments.

In previous lessons we have seen how to use these arguments to read program configuration parameters. In particular, we have seen that argv is a jagged array of strings: each element of argv is a string of variable length, that is, an array of characters terminated by the null character \0. Moreover, argv has other important characteristics:

argv[0], the first element of the array, contains the name with which the program was executed (which may include the complete or relative path of the executable file). The C standard allows it to be an empty string if the host environment cannot provide the name.
argv[1], argv[2], ..., argv[argc-1] contain the arguments passed to the program from the command line.
argv[argc] is always NULL, that is, a null pointer that marks the end of the array and can be used as a sentinel.
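
The properties above can be checked with a small sketch. The function names count_until_null and print_arguments are hypothetical and chosen only for illustration; the first walks argv up to the NULL sentinel without using argc, the second prints each entry with its index.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts the entries of argv by walking the array
   until the NULL sentinel, without looking at argc. */
int count_until_null(char *argv[]) {
    int count = 0;
    while (argv[count] != NULL) {
        count++;
    }
    return count;
}

/* Hypothetical helper: prints every entry of argv together with its index. */
void print_arguments(int argc, char *argv[]) {
    for (int i = 0; i < argc; i++) {
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
    }
}
```

Called from main as print_arguments(argc, argv), the second function lists the program name followed by the user's arguments, and count_until_null(argv) should return the same value as argc.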

Having said this, we can exploit these arguments to specify the names of files to open in our program.

For example, suppose we want to write a program that opens a file and copies it to a second file. We can specify the names of the input and output files as command line arguments like this:

$ ./my_program input.txt output.txt

In this example, input.txt is the name of the input file from which to read data, while output.txt is the name of the output file to which to write the copied data.

To create the skeleton of this program in C, we can define the main function with the argc and argv parameters, and then use argv[1] and argv[2] to obtain the names of the input and output files. Here is an example of how the code might look:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    // Check that file names have been provided
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <input_file> <output_file>\n", argv[0]);
        return EXIT_FAILURE;
    }

    const char *input_file = argv[1];
    const char *output_file = argv[2];

    // Open the input file
    FILE *infile = fopen(input_file, "r");
    if (infile == NULL) {
        fprintf(stderr, "Error opening input file: %s\n", input_file);
        return EXIT_FAILURE;
    }

    // Open the output file
    FILE *outfile = fopen(output_file, "w");
    if (outfile == NULL) {
        fprintf(stderr, "Error opening output file: %s\n", output_file);
        // The input file must be closed before terminating
        fclose(infile);
        return EXIT_FAILURE;
    }

    // Here would go the code to copy data from infile to outfile

    // Close the files
    fclose(infile);
    fclose(outfile);

    return EXIT_SUCCESS;
}

In this example, the program checks that exactly two arguments have been provided in addition to the program name, which is why argc is compared with 3. If the number of arguments is incorrect, it prints a usage message to stderr and terminates with an error code.

The program then opens the input and output files using the names provided via argv[1] and argv[2]. If opening one of the files fails, an error message is printed and the program terminates. If it is the output file that cannot be opened, the input file, which is already open, is closed first.
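
The skeleton leaves the actual copy as a placeholder comment. One possible way to fill it in is sketched below. The function name copy_stream and the 4096-byte buffer size are assumptions made for this example, not part of the original program.

```c
#include <stdio.h>

/* Hypothetical helper: copies everything from `in` to `out` in fixed-size
   chunks. Returns 0 on success, -1 if a read or write error occurs. */
int copy_stream(FILE *in, FILE *out) {
    char buffer[4096]; /* chunk size chosen arbitrarily for this sketch */
    size_t bytes_read;

    while ((bytes_read = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, bytes_read, out) != bytes_read) {
            return -1; /* write error */
        }
    }
    /* fread returns 0 both at end of file and on error: tell them apart */
    return ferror(in) ? -1 : 0;
}
```

In the skeleton, the placeholder comment could then become a call such as copy_stream(infile, outfile), with a nonzero result treated as a failure. Note that the skeleton opens the files in text mode ("r" and "w"); on systems that distinguish text and binary streams, copying arbitrary files would call for "rb" and "wb" instead.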

Example: Program to Verify if a File Can Be Opened

Let's put together what we have learned so far (opening files, closing files and command line arguments) to write a simple program that verifies whether one or more files can be opened for reading.

This program will accept as command line arguments the names of files to verify, separated by spaces. For each of them, it will verify two things:

If the file exists.
If the file can be opened for reading.

The second point is quite important, because a file might exist but the user might not have the necessary permissions to open it for reading. Existence alone therefore does not guarantee that the file can actually be opened. A single call to fopen in read mode covers both conditions, since it fails if either one is not met.

Finally, the program will print the verification result for each file to the screen.

Here is an example of how the code for this program might look:

#include <stdio.h>
#include <stdlib.h>

int verify_file(const char *file_name) {
    FILE *file = fopen(file_name, "r");
    if (file == NULL) {
        return 0; // The file cannot be opened
    }
    fclose(file);
    return 1; // The file can be opened
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        // Insufficient number of arguments
        fprintf(stderr, "Usage: %s <file1> [<file2> ... <fileN>]\n", argv[0]);
        return EXIT_FAILURE;
    }

    for (int i = 1; i < argc; i++) {
        const char *file_name = argv[i];
        if (verify_file(file_name)) {
            printf("The file '%s' can be opened for reading.\n", file_name);
        } else {
            printf("The file '%s' CANNOT be opened for reading.\n", file_name);
        }
    }

    return EXIT_SUCCESS;
}

In this example, the verify_file function attempts to open the specified file in read mode. If the open succeeds, the file is closed immediately and the function returns 1, indicating that the file can be opened. If the open fails, the function returns 0.

In the main function, the program checks that at least one file name has been provided as an argument. Then, for each file name passed, it calls the verify_file function and prints the verification result. This way, the user can easily verify if the specified files can be opened for reading simply by executing the program with the file names as command line arguments:

$ ./verify_file file1.txt file2.txt file3.txt
The file 'file1.txt' can be opened for reading.
The file 'file2.txt' CANNOT be opened for reading.
The file 'file3.txt' can be opened for reading.
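
The program above reports that a file cannot be opened, but not why. A possible refinement, sketched here under the assumption that fopen sets errno on failure (POSIX requires this, ISO C does not), is to report the reason with strerror. The function names open_error_code and report_file are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: tries to open `file_name` for reading.
   Returns 0 if it can be opened, otherwise the errno value left by fopen
   (or -1 if the implementation did not set errno). */
int open_error_code(const char *file_name) {
    errno = 0;
    FILE *file = fopen(file_name, "r");
    if (file == NULL) {
        return errno != 0 ? errno : -1;
    }
    fclose(file);
    return 0;
}

/* Hypothetical helper: prints a one-line report for the file and
   returns 1 if it can be opened for reading, 0 otherwise. */
int report_file(const char *file_name) {
    int code = open_error_code(file_name);
    if (code == 0) {
        printf("The file '%s' can be opened for reading.\n", file_name);
        return 1;
    }
    printf("The file '%s' CANNOT be opened for reading: %s\n", file_name,
           code > 0 ? strerror(code) : "unknown reason");
    return 0;
}
```

In the loop inside main, report_file(argv[i]) could replace the call to verify_file and the two printf calls. On a POSIX system the message would then typically distinguish a missing file from one the user is not permitted to read, although the exact wording of the reason depends on the platform.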