String Variables in C
String variables, represented by character arrays, constitute one of the cornerstone elements of programming in C.
They are fundamental for managing alphanumeric data and text. Knowing how to declare and initialize string variables correctly is essential for writing effective and efficient C programs. In this lesson we will see how to do it.
String Variables
In many programming languages, for example in Python, C#, Java, JavaScript, PHP, etc., there exists a data type called string. This data type is used to represent a sequence of characters.
The C language uses a different approach to represent strings. In C, a string is an array of characters terminated by a special character called the string terminator. This character has the value 0 and, written as a character constant with an escape sequence, it is '\0'.
This approach has its advantages and disadvantages. The main advantage is that strings can be manipulated as character arrays.
The main disadvantages are:
- When dealing with strings you must always take into account the string terminator;
- The length of a string is not stored anywhere: to compute it you must scan the characters one by one until you find the terminator.
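The second point can be made concrete with a small sketch. The function name count_chars is made up for this illustration; the standard library already provides strlen in <string.h>, which does the same job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters that precede the terminator.
   It has to visit every character, so its cost grows with the length
   of the string. */
size_t count_chars(const char *s)
{
    assert(s != NULL);          /* the caller must pass a valid string */

    size_t n = 0;
    while (s[n] != '\0') {      /* '\0' is the character with value 0 */
        n++;
    }
    return n;
}
```

For example, count_chars("Hello") returns 5, because the terminator is not counted.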
Declaration of a String Variable
To declare a string variable in C, we declare an array of char and specify its dimension. The dimension of the array must be large enough to contain the longest string we might use, plus the terminator.
Let's see with an example. Suppose we want to declare a string that can contain at most 100 characters. The declaration will be as follows:
#define STRING_MAX_LENGTH 100
char my_string[STRING_MAX_LENGTH + 1];
In the example above, we have defined a constant STRING_MAX_LENGTH that indicates the maximum length of the string. We then declared a variable my_string as an array of char of dimension STRING_MAX_LENGTH + 1. The dimension of the array is STRING_MAX_LENGTH + 1 because we must take into account the terminator.
Having defined the array my_string in this way does not mean that it will always contain strings of maximum length STRING_MAX_LENGTH. In reality, the variable my_string can contain strings of any length, as long as the length does not exceed STRING_MAX_LENGTH. The length of a string always and only depends on the position of the terminator.
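As a sketch of this idea, the hypothetical function cut_string below shortens a string simply by writing the terminator in an earlier cell. The array itself keeps the same dimension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shortens the string stored in s to at most
   new_length characters by moving the terminator.
   Returns the resulting length of the string. */
size_t cut_string(char *s, size_t new_length)
{
    assert(s != NULL);

    if (new_length < strlen(s)) {
        s[new_length] = '\0';   /* the cells after this one are now ignored */
    }
    return strlen(s);
}
```

If my_string is an array of 101 characters that contains "Hello world!", calling cut_string(my_string, 5) leaves the string "Hello" in it, while sizeof my_string is still 101.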
Character Arrays and String Terminator
When declaring a character array to be used as a string, the most common error made is not taking into account the string terminator.
You must always ensure that the length of the array is equal to the maximum length of the string plus one. This is because, if the array contains a string of maximum length, the string terminator occupies the last cell of the array.
When you do not leave the necessary space for the string terminator, the behavior of the program might be unpredictable.
String Variables
In the C language a string variable is a character array.
A character array of length n can contain a string of maximum length n - 1 since the last character must always be the string terminator: '\0'.
The length of a string always and only depends on the position of the string terminator.
char variable_name[maximum_length + 1];
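The rule can be turned into a simple check. The following is only a sketch: fits_in_array is a hypothetical helper, not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the string s, terminator included,
   fits in an array of array_size elements, 0 otherwise. */
int fits_in_array(const char *s, size_t array_size)
{
    assert(s != NULL);

    /* strlen does not count the terminator, hence the + 1. */
    return strlen(s) + 1 <= array_size;
}
```

For example, fits_in_array("Hello", 6) returns 1, while fits_in_array("Hello", 5) returns 0 because there would be no room for the terminator.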
Initialization of a String Variable
A string variable can be initialized at the very moment it is declared. Let's see an example:
char my_string[13] = "Hello world!";
In the example above, we have declared and initialized a string variable named my_string of dimension 13. The compiler fills the array with the characters of the string "Hello world!" and places the string terminator right after the last character.
So, after this line of code the array my_string will contain the following values:
[ 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!', '\0' ]
At first glance the part to the right of the equal sign is just a string literal. In this context, however, it acts as an abbreviation of an array initializer. From the point of view of the C compiler, the initialization line above is completely equivalent to:
char my_string[13] = { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!', '\0' };
In the example above we declared the array my_string with exactly 13 elements. In fact the string "Hello world!" is 12 characters long, or 13 including the string terminator. What happens, then, if the string literal is shorter or longer than the dimension of the array?
Let's examine the first case: suppose the string literal is shorter. For example:
char my_string[12] = "Hello";
In this case the compiler will fill the array with the characters of the string literal and position the string terminator at the end of the string. The remaining characters will be initialized to zero, that is the terminator. So the result will be as follows:
[ 'H', 'e', 'l', 'l', 'o', '\0', '\0', '\0', '\0', '\0', '\0', '\0' ]
This behavior is consistent with the behavior that the compiler uses for normal arrays. That is, when the initialization list is shorter than the dimension of the array, the C compiler fills the remaining elements with the value of zero, which in the case of strings is the terminator.
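The zero filling can be observed directly. The sketch below uses a hypothetical helper, count_trailing_zeros, that counts how many cells at the end of an array hold the value zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the consecutive zero-valued cells found
   at the end of an array of array_size characters. */
size_t count_trailing_zeros(const char *array, size_t array_size)
{
    assert(array != NULL);

    size_t count = 0;
    /* Walk backwards from the last cell until a non-zero character appears. */
    while (count < array_size && array[array_size - 1 - count] == '\0') {
        count++;
    }
    return count;
}
```

For an array declared as char my_string[12] = "Hello"; the call count_trailing_zeros(my_string, sizeof my_string) returns 7: one terminator plus six cells filled with zero.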
In the case where, instead, the string literal is longer than the dimension of the array, the C compiler cannot fit all the characters of the string literal into the array. This violates a rule of the language, so the compiler must report it: depending on the compiler and its options, the report is a warning or a compilation error.
However, there is one special case: if the string literal, not counting the terminator, has exactly as many characters as the array has elements, the C compiler will not emit any error. Let's see an example:
char my_string[12] = "Hello world!";
In this case the C compiler will fill the array with the characters of the string literal but will not be able to position the string terminator at the end of the string. So the result will be as follows:
[ 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!' ]
So although the C compiler does not emit any error, the array cannot be used as a string.
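One way to check whether a character array actually contains a terminator is to search for it within the bounds of the array. The sketch below assumes a hypothetical helper, has_terminator, built on the standard function memchr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a terminator is found within the
   first array_size cells of array, 0 otherwise. Unlike strlen, memchr
   never reads past the size it is given. */
int has_terminator(const char *array, size_t array_size)
{
    assert(array != NULL);

    return memchr(array, '\0', array_size) != NULL;
}
```

For an array of 12 elements that holds the 12 characters of Hello world! and nothing else, has_terminator returns 0. For an array of 13 elements initialized with the same literal it returns 1.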
Initialization of a String Variable and String Terminator
When initializing a character array to be used as a string, if the length of the string literal (not counting the terminator) is exactly equal to the dimension of the array, the C compiler will not emit any error. However, the array will not contain the terminator and therefore cannot be used as a string.
As in the case of normal arrays, also in the case of string variables it is possible to omit the initial length of the array. In this case the C compiler will calculate the length of the array based on the length of the string literal plus one for the string terminator. Let's see an example:
char my_string[] = "Hello world!";
In this way the C compiler will ensure that the length of my_string is exactly equal to 13.
Using this technique reduces the possibility of making errors since calculating by hand the length of a string literal for the initialization of a character array is an error-prone operation.
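The dimension chosen by the compiler can be read back with the sizeof operator. In the sketch below the array greeting and the two helper functions are names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The dimension is omitted: the compiler counts 12 characters
   plus 1 cell for the terminator. */
static const char greeting[] = "Hello world!";

/* Hypothetical helper: number of cells in the array, terminator included. */
size_t greeting_array_size(void)
{
    /* Sanity check: the array is exactly one cell longer than the string. */
    assert(sizeof greeting == strlen(greeting) + 1);
    return sizeof greeting;
}

/* Hypothetical helper: number of characters, terminator excluded. */
size_t greeting_length(void)
{
    return strlen(greeting);
}
```

Here greeting_array_size() returns 13 and greeting_length() returns 12. Note that sizeof gives the dimension of the array only when it is applied to the array itself, not to a pointer to it.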
In summary:
Initialization of a String Variable
A string variable can be initialized at the very moment it is declared. To initialize a string variable the following syntax is used:
char string_variable_name[array_length] = "literal_string";
If array_length is omitted, the C compiler will calculate the length of the array based on the length of the string literal plus one for the string terminator.
In the case where the string literal is longer than the dimension of the array, the C compiler will report the problem with a warning or a compilation error, depending on the compiler and its options.
Conversely, if the string literal has a length less than the dimension of the array, the C compiler will fill the array with the characters of the string literal and with the terminator while the remaining characters will be initialized to zero.
Character Arrays and Character Pointers
We have seen in a previous lesson that there exists a close relationship between arrays and pointers in the C language.
In the case of strings, that is character arrays, we can have declarations of this type:
char my_string[] = "Hello world!";
char *string_pointer = "Hello world!";
The two declarations are very similar. In the first case we declared a character array and initialized it with a string literal. In the second case we declared a character pointer and initialized it making it point to the string literal.
From this point of view the two declarations can be used interchangeably. For example, we can use my_string and string_pointer interchangeably in the following expressions:
printf("%s\n", my_string);
printf("%s\n", string_pointer);
The printf function, as we will see, accepts the conversion specifier %s to print a string. It expects a pointer to the first character of the string: string_pointer already is one, while the array name my_string is automatically converted to a pointer to its first element. So the printf function treats both in the same way and uses them to print the string.
However, you must pay attention because there are important differences between the two declarations. Let's see what they are:
- In the case of the array, the characters of the string literal are copied into the array itself during initialization. For this reason we can modify the characters contained in the array at a later time. Let's see an example:
char my_string[] = "Hello world!";
my_string[0] = 'h';
printf("%s\n", my_string);
In this case the printf function prints the string hello world! because the character in position 0 has been modified.
In the second case, instead, the pointer points to the string literal itself, whose characters must not be modified. Let's see an example:
char *string_pointer = "Hello world!";
/* ERROR: undefined behavior */
string_pointer[0] = 'h';
In this case the C compiler usually accepts the code, but modifying the characters of a string literal is undefined behavior, and on many systems the program crashes at run time. Declaring the pointer as const char * lets the compiler catch the mistake.
- In the case of the array, my_string is the name of an array and as such it cannot refer to another array during the execution of the program. Conversely, string_pointer is a pointer and as such it can point to another array during the execution of the program.
When you want to declare a string that can be modified during the execution of the program it is always the programmer's responsibility to create an array capable of containing it. Declaring a pointer to char is not sufficient. In the future we will see how to dynamically allocate a character array.
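The two differences can be sketched as follows. Both functions are hypothetical helpers written for this illustration: the first changes where a pointer points, the second modifies the characters of a writable array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a pointer can be made to point to a different
   string literal at run time. The const qualifier documents that the
   characters must not be modified through it. */
const char *choose_greeting(int formal)
{
    const char *greeting = "Hi!";
    if (formal) {
        greeting = "Good morning!";   /* re-pointing is allowed */
    }
    return greeting;
}

/* Hypothetical helper: replaces the first character of a string stored
   in a writable array. Never pass a string literal to this function. */
void set_first_char(char *s, char c)
{
    assert(s != NULL);

    if (s[0] != '\0') {   /* leave an empty string untouched */
        s[0] = c;
    }
}
```

Given char my_string[] = "Hello world!"; the call set_first_char(my_string, 'h') is valid because the array holds its own copy of the characters. Passing the result of choose_greeting to set_first_char would be rejected or flagged by the compiler thanks to const.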
Uninitialized Character Pointers
Using an uninitialized character pointer is a very serious error and can lead to unpredictable behaviors. For example:
char *string_pointer;
string_pointer[0] = 'h';
string_pointer[1] = 'e';
string_pointer[2] = 'l';
string_pointer[3] = 'l';
string_pointer[4] = 'o';
string_pointer[5] = '\0';
Since string_pointer has not been initialized, its value is indeterminate, that is, it could hold any memory address. The program therefore writes the characters h, e, l, l, o and the string terminator at an unknown address: the behavior is undefined and the program might terminate unexpectedly.
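A possible correction, shown here only as a sketch, is to make the pointer refer to an existing array before writing through it. The function write_hello is a hypothetical helper that also checks that the array is large enough.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes "hello" into the array that dest points to.
   Returns 1 on success, 0 if the array has fewer than the 6 cells needed
   (5 characters plus the terminator). */
int write_hello(char *dest, size_t dest_size)
{
    assert(dest != NULL);   /* dest must point to real, writable storage */

    if (dest_size < 6) {
        return 0;
    }
    dest[0] = 'h';
    dest[1] = 'e';
    dest[2] = 'l';
    dest[3] = 'l';
    dest[4] = 'o';
    dest[5] = '\0';
    return 1;
}
```

With char buffer[6]; and char *string_pointer = buffer; the call write_hello(string_pointer, sizeof buffer) writes into memory that belongs to the program, so the result is well defined.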
In Summary
In this lesson we have seen how to declare and initialize a string variable. In particular we have studied that:
- A string variable is a character array terminated by the character '\0'.
- A string variable can be initialized at the very moment it is declared.
- A string variable can be initialized with a string literal.
In the next lesson we will see how to read and write strings in C.